
Statistical Programming using R

Hello Data Experts,

Let me continue from my last blog, “Dataset using R” (http://outstandingoutlier.blogspot.in/2017/08/dataset-using-r.html), where we discussed how to load a dataset from a CSV file and perform basic operations on it.

Since R is a language built for statistics, it is important for us to unleash its power for statistical analysis. This blog will help you learn statistical calculations such as mean, median, mode, variance, standard deviation (SD) and a few other measures. Let us move forward and understand how to get these values.

Before we delve into statistical programming, let us understand why we need to calculate all three of mean, median and mode, and not just one of them. The mean represents the average value, which can be pulled up or down by an outlier (an unusually high or low value). The median reflects the middle value and is much less likely to be influenced by outliers. The mode is the most frequently repeated value in the dataset. The standard deviation measures the level of variation, i.e., how widely values spread around the mean. In quality terms it is commonly used as 1 Sigma, 2 Sigma, 3 Sigma, 4 Sigma, 5 Sigma and 6 Sigma. Industries such as healthcare and airlines are expected to meet 6 Sigma; in general, the bigger the potential impact of a failure on life, or the more severe a possible disaster, the higher the sigma level an industry should target.
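To see the outlier effect described above, here is a minimal sketch in R using made-up numbers (the vectors weeklyMileage and withOutlier, and the extreme value 40, are hypothetical and not part of this blog's dataset). Adding a single extreme value pulls the mean up sharply, while the median does not move.

```r
# Hypothetical mileage values, used only for this illustration
weeklyMileage <- c(12, 13, 12, 14, 13)
withOutlier <- c(weeklyMileage, 40)  # same data plus one extreme (outlier) value

# The mean jumps when the outlier is added ...
mean(weeklyMileage)    # 12.8
mean(withOutlier)      # 17.33333

# ... while the median stays where it was
median(weeklyMileage)  # 13
median(withOutlier)    # 13
```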

To keep this session simple, I will populate an object, CarsMileage, with a set of values and then derive statistical values from it.

Let us populate CarsMileage with 20 sample values, representing the car's mileage for each of the last 20 weeks.

CarsMileage <- c(12, 14, 12.5, 13.5, 15, 10, 11, 12, 12, 14, 12, 11.5, 12.5, 13.5, 15, 10.5, 15, 12, 14, 14)

Let us calculate the statistical values:

**********
The minimum is the smallest of all values in the dataset.
min(CarsMileage)
This returns the minimum value, i.e., 10.
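If you also want to know in which week the minimum occurred, base R's which.min() returns the position of the first smallest value. A small sketch, assuming the values are stored in week order:

```r
CarsMileage <- c(12, 14, 12.5, 13.5, 15, 10, 11, 12, 12, 14,
                 12, 11.5, 12.5, 13.5, 15, 10.5, 15, 12, 14, 14)

# Position (week number) of the first occurrence of the minimum
which.min(CarsMileage)               # 6

# Indexing with that position gives back the minimum itself
CarsMileage[which.min(CarsMileage)]  # 10, same as min(CarsMileage)
```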

**********
The maximum is the largest of all values in the dataset.
max(CarsMileage)
This returns the maximum value, i.e., 15.
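Base R can also return both extremes in one call with range(), and which.max() gives the position of the first largest value. A small sketch on the same data:

```r
CarsMileage <- c(12, 14, 12.5, 13.5, 15, 10, 11, 12, 12, 14,
                 12, 11.5, 12.5, 13.5, 15, 10.5, 15, 12, 14, 14)

# range() returns c(minimum, maximum)
range(CarsMileage)        # 10 15

# Difference between the extremes (the spread of the data)
diff(range(CarsMileage))  # 5

# Position of the first occurrence of the maximum
which.max(CarsMileage)    # 5
```

Note that 15 appears three times in the data, so which.max() reports only the first of them (week 5).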

**********
The mean is the average of all values: their sum divided by how many values there are.
mean(CarsMileage)
This returns the mean value, i.e., 12.8.
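The mean is simply the sum divided by the count, which the sketch below checks. As an optional aside, it also shows mean()'s trim argument, which drops a fraction of the smallest and largest values before averaging and so softens the outlier effect discussed earlier.

```r
CarsMileage <- c(12, 14, 12.5, 13.5, 15, 10, 11, 12, 12, 14,
                 12, 11.5, 12.5, 13.5, 15, 10.5, 15, 12, 14, 14)

# Mean by hand: sum of values divided by number of values
sum(CarsMileage) / length(CarsMileage)  # 12.8

# Trimmed mean: drop 10% of values from each end (2 of 20 here), then average
mean(CarsMileage, trim = 0.1)           # 12.84375
```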

**********
The median is the middle value of the dataset. The values are first sorted, and then the middle value is taken. With an even number of data points, the median is the average of the two middle values; with an odd number of data points, the single middle value is the median.
median(CarsMileage)
The median of this dataset is 12.5.
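The median rule described above can be written out by hand. The helper manualMedian() below is hypothetical and only for illustration; in practice median() is the function to use.

```r
CarsMileage <- c(12, 14, 12.5, 13.5, 15, 10, 11, 12, 12, 14,
                 12, 11.5, 12.5, 13.5, 15, 10.5, 15, 12, 14, 14)

# Hypothetical helper that follows the median rule step by step
manualMedian <- function(x) {
  s <- sort(x)                        # step 1: sort the values
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                    # odd count: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2     # even count: average of the two middle values
  }
}

manualMedian(CarsMileage)  # 12.5 (20 values, so the 10th and 11th are averaged)
manualMedian(c(3, 1, 2))   # 2 (3 values, so the 2nd sorted value)
```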

***********
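The mode is the value that occurs the most times in the dataset, and it applies to both numeric and string data. Note, however, that R's built-in mode() function does not compute this: it returns the storage mode (data type) of an object.
mode(CarsMileage)
This returns “numeric”, the data type of CarsMileage, not its statistical mode. The value that actually occurs most often in this dataset is 12, which appears five times.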
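Since base R has no built-in function for the statistical mode, here is a minimal sketch of one. The helper name statMode() is hypothetical; it counts how often each distinct value occurs and returns every value tied for the highest count, so it works for numeric as well as string data.

```r
CarsMileage <- c(12, 14, 12.5, 13.5, 15, 10, 11, 12, 12, 14,
                 12, 11.5, 12.5, 13.5, 15, 10.5, 15, 12, 14, 14)

# Hypothetical helper: statistical mode (most frequent value or values)
statMode <- function(x) {
  ux <- unique(x)                  # distinct values, in order of first appearance
  freq <- tabulate(match(x, ux))   # how many times each distinct value occurs
  ux[freq == max(freq)]            # keep all values tied for the highest count
}

statMode(CarsMileage)            # 12
max(table(CarsMileage))          # 5, the number of times 12 occurs
statMode(c("a", "b", "a"))       # "a", works for strings too
```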
 
**********
Standard deviation of the dataset (sd() computes the sample standard deviation, dividing by n - 1)
sd(CarsMileage)
The standard deviation of this dataset is 1.499123.
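To see what sd() computes, here is the sample standard deviation worked out by hand: the square root of the sum of squared deviations from the mean, divided by n - 1. The variable names are illustrative only.

```r
CarsMileage <- c(12, 14, 12.5, 13.5, 15, 10, 11, 12, 12, 14,
                 12, 11.5, 12.5, 13.5, 15, 10.5, 15, 12, 14, 14)

n <- length(CarsMileage)
deviations <- CarsMileage - mean(CarsMileage)  # distance of each value from the mean
manualSd <- sqrt(sum(deviations^2) / (n - 1))  # sample SD divides by n - 1

manualSd                              # 1.499123 (printed to 7 significant digits)
all.equal(manualSd, sd(CarsMileage))  # TRUE
```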

**********
Variance of the dataset (var() computes the sample variance, which is the square of the standard deviation)
var(CarsMileage)
The variance of this dataset is 2.247368.
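var() divides by n - 1 (sample variance). If you need the population variance, which divides by n instead, one way is to rescale the result of var(). The sketch below does that and also confirms that the variance is the square of the standard deviation.

```r
CarsMileage <- c(12, 14, 12.5, 13.5, 15, 10, 11, 12, 12, 14,
                 12, 11.5, 12.5, 13.5, 15, 10.5, 15, 12, 14, 14)

n <- length(CarsMileage)
sampleVar <- var(CarsMileage)             # divides by n - 1: 2.247368
populationVar <- sampleVar * (n - 1) / n  # rescaled to divide by n: 2.135

# Variance is the standard deviation squared
all.equal(sampleVar, sd(CarsMileage)^2)   # TRUE
```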

**********

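Let us find the probability of a mileage of 10 or less, assuming mileage follows a normal distribution with this dataset's mean and standard deviation. pnorm() returns this cumulative probability, P(X ≤ 10).
pnorm(10, mean(CarsMileage), sd(CarsMileage))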
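The link to the standard normal distribution is standardization: subtract the mean and divide by the standard deviation to get a z-score, then evaluate the standard normal CDF. A short sketch showing that both routes give the same probability, plus the complementary probability of a mileage above 10:

```r
CarsMileage <- c(12, 14, 12.5, 13.5, 15, 10, 11, 12, 12, 14,
                 12, 11.5, 12.5, 13.5, 15, 10.5, 15, 12, 14, 14)

m <- mean(CarsMileage)
s <- sd(CarsMileage)

z <- (10 - m) / s     # z-score: about -1.87 standard deviations below the mean
pnorm(z)              # standard normal CDF at z, about 0.031
pnorm(10, m, s)       # same probability, passing mean and SD directly
1 - pnorm(10, m, s)   # probability of a mileage above 10
```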

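Similarly, to check whether the values approximately follow a normal distribution, draw a normal Q-Q plot: qqnorm() plots the sample values against the quantiles expected from a normal distribution, and qqline() adds a reference line. Points lying close to the line suggest the data are roughly normal.
qqnorm(CarsMileage)
qqline(CarsMileage)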
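If you prefer numbers to a picture, qqnorm() can return the plotted points instead of drawing them via its plot.it argument. The sketch below uses the correlation between theoretical and sample quantiles as a rough indicator of how straight the Q-Q plot is; this is an informal check, not a formal normality test.

```r
CarsMileage <- c(12, 14, 12.5, 13.5, 15, 10, 11, 12, 12, 14,
                 12, 11.5, 12.5, 13.5, 15, 10.5, 15, 12, 14, 14)

# Compute the Q-Q points without drawing the plot
qq <- qqnorm(CarsMileage, plot.it = FALSE)

# qq$x: theoretical normal quantiles, qq$y: the sample values
head(cbind(theoretical = qq$x, sample = qq$y))

# Close to 1 when the points lie near a straight line
cor(qq$x, qq$y)
```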

***********
If we would like a summary of the key statistical values in one shot, it is very easy to get using the R programming language.

summary(CarsMileage)

This gives us the following details:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   10.0    12.0    12.5    12.8    14.0    15.0
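The quartiles in summary() come from quantile(), which uses R's default method (type 7). The sketch below reproduces them, computes the interquartile range with IQR(), and shows fivenum(), Tukey's five-number summary, which happens to give the same numbers for this dataset.

```r
CarsMileage <- c(12, 14, 12.5, 13.5, 15, 10, 11, 12, 12, 14,
                 12, 11.5, 12.5, 13.5, 15, 10.5, 15, 12, 14, 14)

# 1st quartile, median and 3rd quartile (default type 7)
quantile(CarsMileage, probs = c(0.25, 0.5, 0.75))  # 12.0 12.5 14.0

# Interquartile range: 3rd Qu. minus 1st Qu.
IQR(CarsMileage)                                   # 2

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
fivenum(CarsMileage)                               # 10.0 12.0 12.5 14.0 15.0
```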

I hope this first glimpse of R's statistical power has been helpful. Now that we have the key statistical formulas handy, we will explore more of R's statistical capabilities in coming sessions. In my next blog, I will cover “Graphical representation of statistical values using R Studio”.

Thank you for sparing the time to go through this blog; I hope it helped you grasp the basics of statistics using R. Kindly share your valuable opinion, and please do not forget to suggest what you would like to understand and hear from me in my future blogs.

Thank you...
Outstanding Outliers:: "AG".    
