

Z and T distribution values using R

Hello Data Experts,

Let me continue from my last blog (http://outstandingoutlier.blogspot.in/2017/08/normality-test-for-data-using-r.html) on the normality test using R as part of advanced Exploratory Data Analysis, where I covered the four moments of statistics and the key concepts around probability distributions, the normal distribution and the standard normal distribution. Finally, I also touched upon how to transform data to run a normality test. Let me recap those four moments of statistics.

  • The first moment covers Mean (alongside Median and Mode); it is a measure of central tendency.
  • The second moment covers Variance (alongside Standard Deviation and Range); it is a measure of dispersion.
  • The third moment covers Skewness; it is a measure of asymmetry.
  • The fourth moment covers Kurtosis; it is a measure of peakedness.
To get standardized data, use the “scale” command in R; run the “pnorm” command to get the cumulative probability of a value under the Z distribution. To check whether data follows normality, we can execute the “qqnorm” and “qqline” commands in R.
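
As a quick refresher, here is a minimal sketch of those commands. The data is simulated purely for illustration (it is not from the earlier blog), and the variable names are my own assumptions.

```r
# Illustrative data only: 200 simulated values, not real customer data.
set.seed(42)
purchases <- rnorm(200, mean = 5, sd = 6)

# scale() centres on the sample mean and divides by the sample SD,
# so the standardized values have mean 0 and SD 1.
z <- as.numeric(scale(purchases))
mean(z)  # approximately 0
sd(z)    # 1

# pnorm() returns the cumulative probability P(Z <= z) under the standard normal.
pnorm(z[1])

# Normal Q-Q plot with a reference line; points close to the line suggest normality.
# Guarded so the sketch also runs in a non-interactive session.
if (interactive()) {
  qqnorm(purchases)
  qqline(purchases)
}
```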

So far we have learned that, for a continuous distribution, the probability of any single exact value is always zero, but we can get the probability of a value being less than or greater than a given point on the standard normal distribution using pnorm. In industry, 95% is the usual starting benchmark for confidence that the expected outcome will fall within a given range. In statistical terms, this is called the confidence level. Put simply, if we drew many samples and built an interval from each one, about 95% of those intervals would contain the population mean.
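
To make the "less than or greater than" idea concrete, here is a small sketch using 1.96 as an example point on the standard normal distribution. The variable names are only illustrative.

```r
# Cumulative probability to the left of 1.96 under the standard normal.
p_less <- pnorm(1.96)                      # about 0.975

# Probability to the right of the same point
# (equivalent to pnorm(1.96, lower.tail = FALSE)).
p_greater <- 1 - pnorm(1.96)               # about 0.025

# Probability of falling between -1.96 and 1.96: the central 95%.
p_between <- pnorm(1.96) - pnorm(-1.96)    # about 0.95
```

The central area of roughly 95% is where the 95% confidence benchmark comes from.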

We will now touch upon the Z distribution and T distribution techniques. There is always an open question of when to use which technique. From experience, I follow this guiding principle: if the sample size is < 30 (a sample of fewer than 30 is categorized as small in the statistical world) and the standard deviation of the population is unknown, the T distribution should be the first choice, whereas if the sample size is large, i.e., > 30, and the SD of the population is known, the Z distribution should be the technique. As the sample size increases, the two techniques give increasingly similar results.
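
One way to see the two techniques converge is to compare their critical values for 95% confidence as the degrees of freedom grow. This is only a sketch; the degrees of freedom below are arbitrary values I picked for illustration.

```r
# Arbitrary degrees of freedom, from a small sample to a large one.
df <- c(5, 10, 29, 199, 1000)

# Two-sided 95% critical values: T for each df, and the single Z value.
t_crit <- qt(0.975, df)
z_crit <- qnorm(0.975)

# The gap between the T and Z critical values shrinks as df increases.
round(data.frame(df = df, t_crit = t_crit, gap = t_crit - z_crit), 4)
```

The T critical value always stays above the Z value, so T intervals are slightly wider, but the difference becomes negligible for large samples.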

Confidence Interval    =          Sample mean ± Margin of Error
Z Distribution               =          Sample mean ± Z(1-α/2) value * (Population SD / square root of (sample size))
T Distribution               =          Sample mean ± T(1-α/2, n-1) value * (Sample SD / square root of (sample size))
Here α = 1 - confidence level, i.e., α = 0.05 for 95% confidence.

Let us consider an e-retailer with 10,500 registered customers to whom she wants to launch a new offer, but before doing so she would like to know how confident she can be of its success. Before going for the launch, she chose 200 customers and granted them access to the new promotion, where on average 5 new products were purchased during this pilot launch, with a standard deviation of 6. The e-retailer typically launches a new promotion every month, hence she has an SD from the last launch to the full population, which is 5.5. Before the new full launch, she wants a 95% confidence level to go full scale.

Here we have a sample size > 30 (a big sample size) and the population SD is also known, thus the Z distribution is the appropriate option.

We can take the manual route and use a Z table to come up with the Z score for the 95% confidence level and then calculate the confidence interval, but using R it is simple to get the Z score by executing the “qnorm” command.

# For 95% confidence, use 0.975 (easy to remember: 95 + (100 - 95)/2 = 97.5%)
qnorm(.975)
# result: 1.959964

Once we have the Z value (1.959964), the sample mean (5), the population SD (5.5) and the sample size (200), applying the formula gives the confidence interval.

5 - (1.959964*(5.5/Square root (200))) to 5 + (1.959964*(5.5/Square root (200)))

The pilot launch tells the e-retailer with 95% confidence that the average sale will fall in the range from 4.24 to 5.76.
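
The same Z-based calculation can be reproduced in R. This is a minimal sketch using the figures from the e-retailer example (sample mean 5, population SD 5.5, sample size 200); the variable names are my own.

```r
# Figures from the e-retailer example.
sample_mean <- 5
pop_sd <- 5.5
n <- 200

# Z score for a two-sided 95% confidence level.
z_crit <- qnorm(0.975)

# Margin of error = Z score * (population SD / square root of sample size).
margin <- z_crit * pop_sd / sqrt(n)

z_interval <- c(lower = sample_mean - margin, upper = sample_mean + margin)
round(z_interval, 2)   # lower 4.24, upper 5.76
```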

Let us assume there was no earlier launch, so this is the first time the e-retailer is trying to launch a promotion and the population SD is unknown. In this case, the only change is that instead of using the population SD, it is recommended to use the sample SD with degrees of freedom. Degrees of freedom can be taken as n-1 because once the mean and n-1 of the values are known, the last value is fixed.
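
The idea behind n-1 degrees of freedom can be seen with a tiny example. The numbers below are made up for illustration: once the mean of 5 values is known to be 5 and any 4 values are chosen freely, the fifth has no freedom left.

```r
# Made-up example: 5 values whose mean is known to be 5.
n <- 5
sample_mean <- 5

# Any n - 1 values can be chosen freely.
known <- c(3, 7, 4, 6)

# The last value is forced by the known mean.
last_value <- n * sample_mean - sum(known)   # 5
mean(c(known, last_value))                   # 5
```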

We can take the manual route and use a T table to come up with the T score for 95% confidence and 199 degrees of freedom and then calculate the confidence interval, but using R it is simple to get the T score by executing the “qt” command.

# For 95% confidence, use 0.975 (easy to remember: 95 + (100 - 95)/2 = 97.5%); degrees of freedom are 199, i.e., sample size minus 1
qt(.975, 199)
# result: 1.971957

Once we have the T value (1.971957), the sample mean (5), the sample SD (6) and the sample size (200), applying the formula gives the confidence interval.

5 - (1.971957*(6/Square root (200))) to 5 + (1.971957*(6/Square root (200)))

The pilot launch tells the e-retailer with 95% confidence that the average sale will fall in the range from 4.16 to 5.84.
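
Here is the T-based version of the calculation as a minimal sketch, using the same pilot figures (sample mean 5, sample SD 6, sample size 200). The variable names are my own.

```r
# Figures from the pilot launch.
sample_mean <- 5
sample_sd <- 6
n <- 200

# T score for a two-sided 95% confidence level with n - 1 degrees of freedom.
t_crit <- qt(0.975, df = n - 1)

# Margin of error = T score * (sample SD / square root of sample size).
margin <- t_crit * sample_sd / sqrt(n)

t_interval <- c(lower = sample_mean - margin, upper = sample_mean + margin)
round(t_interval, 2)   # lower 4.16, upper 5.84
```

The interval is slightly wider than the Z-based one, because the sample SD (6) is larger than the population SD (5.5) and the T score is a little larger than the Z score.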

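If we know the benchmark confidence level, we can proceed with the range, but if we would like to find the confidence level that corresponds to an LCL or UCL (lower or upper confidence limit) score, we can use the “pt” command.
pt(1.971957, 199)
# result: 0.975, i.e., 2.5% is left in each tail, which corresponds to 95% two-sided confidence.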
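
As a small sketch of how “pt” and “qt” relate: one is the inverse of the other, and the two-sided confidence level can be derived from the cumulative probability. The variable names are only illustrative.

```r
t_score <- 1.971957
df <- 199

# Cumulative probability to the left of the T score.
p_left <- pt(t_score, df)      # about 0.975

# Two-sided confidence level: remove the same tail area from both ends.
two_sided <- 2 * p_left - 1    # about 0.95

# qt() is the inverse of pt(), so it recovers the original T score.
qt(p_left, df)
```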

I hope this topic was helpful in understanding the Z and T distribution concepts and how to derive the Z score and T score using R. Sample size and whether the standard deviation of the population is known play a key role in deciding which technique to opt for.

Thank you for going through this blog. I hope it helped you build a sound foundation in Z and T distributions using R. Kindly share your valuable opinion, and please do not forget to suggest what you would like to hear from me in my future blogs.

Thank you...
Outstanding Outliers:: "AG".  

 

 
