
Is today's world all about creativity and ideation?

Are they the seeds to be nurtured to bring in automation, innovation and transformation? There is a saying that necessity is the mother of invention. I would say innovation is an amalgamation of creativity and necessity. We need to understand the ecosystem in order to apply creativity and identify the ideas that bring change. We need to keep pace with a changing ecosystem and think beyond the possible. What is the biggest challenge in doing this? "Unlearning and learning": we tend to think the current ecosystem is the best. Be it health, financial services, agriculture or the mechanical domain, we need to empathize with the stakeholders to come up with a strategy to drive change. An evident example is how quickly the quality of life is changing. A few decades back a phone connection was limited to a few, but today almost every millennial has a mobile phone. The phone is no longer just a medium to talk; it is a device powerful enough that innovative solutions can be developed on it.

Data Types and Visualization using R

Hello Data Experts,

Let me continue from my last blog, “Statistical Programming using R Part 1” (http://outstandingoutlier.blogspot.in/2017/08/statistical-programming-using-r-part-1.html), where we discussed basic built-in statistical functions such as Mean, Median and Mode, along with the summary of statistical observations of a dataset.

Let us move to an interesting use of R: data visualization, keeping the statistical scope in view. Visualization here means graphs such as box plots, scatter diagrams, pie charts, histograms, line graphs, bar charts and many more.

Before we get into the visual representation of statistical data, it is important to understand the different data and object types.

R has different data types such as Numeric, Character (string), Integer and Logical (TRUE/FALSE). There are also different types of objects that can store these data types: Vector, List, Matrix, Array, Factor and Data Frame.

A vector can hold values of only a single data type. Let us look at a few examples:

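This will hold a logical value
W <- TRUE
W

This will hold a numeric value
N <- 5.5
N

This will hold an integer value (the L suffix asks R for an integer)
G <- 4L
G

The example below covers the Character data type. Note that F is also R's built-in shorthand for FALSE, so this assignment masks it for the rest of the session.
F <- "FALSE"
F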

Let us try assigning a set of numeric values to an object A. Since all the values assigned are numeric (a homogeneous set), the data type of the vector will be numeric.
A <- c(1,2,3,4,5,6,7,8,9)
A
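To confirm which type R has actually assigned, the values can be inspected with class(), typeof() and length(). The following is a minimal sketch; the lowercase object names are my own choice for illustration (they also avoid overwriting F).

```r
# Inspect how R classifies each kind of value used above.
w <- TRUE       # logical
n <- 5.5        # numeric (stored as a double)
g <- 4L         # integer, because of the L suffix
f <- "FALSE"    # character, because of the quotes
a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

class(w)    # "logical"
class(n)    # "numeric"
class(g)    # "integer"
class(f)    # "character"
class(a)    # "numeric"
typeof(n)   # "double"
length(a)   # 9
```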

If we put a heterogeneous set of values in a vector, R converts every value to one common type. When any of the values is a string, as in the example below, every value becomes Character. The original data types are lost in a vector.

G <- c("QW", 1, FALSE)
G
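The conversion follows a fixed order: logical gives way to integer, integer to numeric, and numeric to character. A small sketch of this, using hypothetical object names v1 to v4:

```r
# Each vector mixes two types; R keeps the more general one.
v1 <- c(TRUE, 2L)         # logical + integer   -> integer
v2 <- c(2L, 3.5)          # integer + numeric   -> numeric
v3 <- c(3.5, "QW")        # numeric + character -> character
v4 <- c("QW", 1, FALSE)   # the vector from the text

class(v1)   # "integer"
class(v2)   # "numeric"
class(v3)   # "character"
v4          # "QW" "1" "FALSE"
```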

To retain the data type of each value, we should use a list.

GLIST <- list("QW", 1, FALSE)
GLIST
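One way to verify that the list kept each type is to ask for the class of every element. This is only an illustrative check, with glist as my own lowercase name for the same list:

```r
glist <- list("QW", 1, FALSE)

# Class of each element: the types are preserved
sapply(glist, class)   # "character" "numeric" "logical"

# Double brackets extract a single element with its own type,
# so arithmetic still works on the numeric one
glist[[2]] + 1         # 2

length(glist)          # 3
```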

So far we have seen how a vector differs from a list. Sorting works well on a vector but not on a list. Let us try sort on both now: the first sort will execute well, whereas the second one will error out as expected.

GSORT <- sort(G)
GSORT

GLISTSORT <- sort(GLIST)
GLISTSORT
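When running these lines as part of a script, the error from the second sort stops execution. A possible way to see the error without halting is tryCatch(); flattening the list with unlist() is one workaround if sorting is really needed, at the cost of losing the element types again. This is a sketch, not the only approach, and the object names are my own.

```r
g     <- c("QW", 1, FALSE)
glist <- list("QW", 1, FALSE)

# Sorting the character vector works
gsorted <- sort(g)

# Sorting the list fails; capture the message instead of stopping
result <- tryCatch(
  sort(glist),
  error = function(e) conditionMessage(e)
)
result   # the error message as a string

# Workaround: flatten to a (character) vector first, then sort
flatsorted <- sort(unlist(glist))
flatsorted
```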

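Let us use the merge command next. merge() is designed for data frames, so it first converts its arguments to data frames. Each vector becomes a single-column data frame, and because the two columns share no name, the result is a Cartesian product: 3*3 = 9 rows for our two vectors. A list of three values, in contrast, becomes a one-row data frame with three columns, so merge does not concatenate lists (and GLISTSORT was never created, because the sort above failed). To join two lists end to end, giving 3+3 = 6 elements, use c(). This is important to understand.

VECTORMERGE <- merge(G, GSORT)
VECTORMERGE

LISTMERGE <- c(GLIST, GLIST)
LISTMERGE

To summarize the attributes of Vector and List:
Vector: converts mixed data to a single type (character here); merge gives a Cartesian product.
List: retains the data type of each value; c() gives a simple concatenation.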

Let us discuss a new object type, the array. An array holds data of a single type in rows, columns and, optionally, further dimensions.

A1 <- c(1,2,3,4,5)
A2 <- c(9,8,7,6,5,4)
A1
A2

Array2 <- array(c(A1, A2),dim = c(3,5,2))
Array2

This will form a three-dimensional array: 2 layers, each with 3 rows and 5 columns. The 11 supplied values are recycled to fill all 30 cells.

Array2 <- array(c(A1 ,A2),dim = c(6,2,4))
Array2

This will form a three-dimensional array: 4 layers, each with 6 rows and 2 columns.
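To see how the values land in the array, individual cells can be read with [row, column, layer] indexing. R fills arrays column by column and starts again from the first value once the data runs out. A minimal sketch, with lowercase names of my own choosing:

```r
a1  <- c(1, 2, 3, 4, 5)
a2  <- c(9, 8, 7, 6, 5, 4)
arr <- array(c(a1, a2), dim = c(3, 5, 2))

dim(arr)       # 3 5 2
length(arr)    # 30 cells in total

arr[1, 1, 1]   # 1, the first value of a1
arr[3, 2, 1]   # 9, the sixth value overall (first value of a2)
arr[3, 4, 1]   # 1, the twelfth cell, where recycling restarts

# A whole layer is an ordinary 3 x 5 matrix
layer2 <- arr[, , 2]
dim(layer2)    # 3 5
```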

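Let us move on to the matrix. It is a simple two-dimensional rectangular layout. The fourth argument, byrow, takes TRUE or FALSE and controls how the data is arranged: TRUE fills the matrix row by row and FALSE fills it column by column. The default value is FALSE. In the examples below the 11 values are recycled to fill all 55 cells.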

A1 <- c(1,2,3,4,5)
A2 <- c(9,8,7,6,5,4)

MAT <- matrix(c(A1, A2),11,5)
MAT

MAT <- matrix(c(A1, A2),11,5, TRUE)
MAT

MAT <- matrix(c(A1, A2),5, 11)
MAT

MAT <- matrix(c(A1, A2), 5,11, TRUE)
MAT
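The effect of byrow is easier to see on a small matrix without recycling. The sketch below uses six values and hypothetical names by_col and by_row:

```r
vals <- 1:6

# Default (byrow = FALSE): values run down each column
by_col <- matrix(vals, nrow = 2, ncol = 3)

# byrow = TRUE: values run across each row
by_row <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)

by_col[1, ]   # 1 3 5
by_row[1, ]   # 1 2 3
dim(by_col)   # 2 3
```

Naming the arguments (nrow, ncol, byrow) instead of relying on their position makes the call easier to read.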

Another object type is the factor, which stores categorical data. Its levels are the unique values of the data in sorted order.

A3 <- c(9,8,8,7,9,4,4,2,8,6,5,4)
FV <- factor(A3)
FV
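A few helper functions show what the factor holds. One point to be careful about: as.integer() on a factor returns the level codes, not the original numbers. The sketch below uses lowercase names of my own.

```r
a3 <- c(9, 8, 8, 7, 9, 4, 4, 2, 8, 6, 5, 4)
fv <- factor(a3)

levels(fv)    # "2" "4" "5" "6" "7" "8" "9"
nlevels(fv)   # 7
table(fv)     # how many times each level occurs

# Level codes: the first value, 9, is the 7th level
as.integer(fv)[1]   # 7

# To get the original numbers back, convert via character
original <- as.numeric(as.character(fv))
original[1]         # 9
```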

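Let us now talk about the most important object type, the data frame. It is a table structure with rows and columns, where each column can have its own data type. Note that IncidentDesc below has only two values; data.frame() recycles them to fill the four rows, which works because 4 is a multiple of 2.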

IncidentNumber <- c(111,114,143,456)
IncidentDesc <- c("AA", "GG")
IncidentPri <- c("Low", "Low", "Medium", "Complex")

IncidentDet <- data.frame(IncidentNumber,IncidentDesc, IncidentPri)
IncidentDet
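Once the data frame exists, its structure and contents can be explored with str(), the $ operator and row filtering. The following sketch rebuilds the same table under the hypothetical name incidents, with the description column written out in full rather than recycled:

```r
incidents <- data.frame(
  IncidentNumber = c(111, 114, 143, 456),
  IncidentDesc   = c("AA", "GG", "AA", "GG"),
  IncidentPri    = c("Low", "Low", "Medium", "Complex")
)

str(incidents)     # column names and types
nrow(incidents)    # 4
ncol(incidents)    # 3

# One column as a vector
incidents$IncidentPri

# Only the rows with Low priority
low <- incidents[incidents$IncidentPri == "Low", ]
low$IncidentNumber   # 111 114
```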

Now that we have covered the various types of objects, we will start with graphical visualization.

For a pie chart, let us populate two objects with values: one for the slice sizes and one for the labels.

PLOTVAL <- c(2,5,9,4,9)
PIEDESC <- c("A","B","C","D","E")
pie(PLOTVAL, labels = PIEDESC, col = rainbow(length(PLOTVAL)))
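A pie chart is often easier to read when each label also shows the share of the slice. One possible way to do that is to compute the percentages first and paste them into the labels. The names pct and pielabels, and the chart title, are assumptions made for this sketch.

```r
plotval <- c(2, 5, 9, 4, 9)
piedesc <- c("A", "B", "C", "D", "E")

# Share of each slice, rounded to whole percentages
pct <- round(100 * plotval / sum(plotval))
pct   # 7 17 31 14 31

# Combine the category name and its share into one label per slice
pielabels <- paste0(piedesc, " (", pct, "%)")

pie(plotval, labels = pielabels, col = rainbow(length(plotval)),
    main = "Share by category")
```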

Let us try to plot a bar chart:

PLOTVAL <- c(2,5,9,4,9)
barplot(PLOTVAL)

To draw a histogram, we need an object with values:

PLOTVAL <- c(2,5,9,4,9)
hist(PLOTVAL, col = "green", border = "red")

A line graph can be drawn from a single object as well:

PLOTVAL <- c(2,5,9,4,9)
plot(PLOTVAL, type = "o", col = "red")

To draw a scatter plot, we need two sets of values so that they represent the X and Y axes.

PLOTVAL <- c(2,5,9,4,9)
PLOTVAL1 <- c(2,5,9,4,9)
plot(PLOTVAL, PLOTVAL1, col = "blue")

A box plot can be drawn from one or more sets of values; here we draw two sets side by side.

PLOTVAL <- c(120,50,90,14,9)
PLOTVAL1 <- c(120,25,19,175,29)
boxplot(PLOTVAL, PLOTVAL1, col ="blue")
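The numbers behind these pictures can also be read directly, which helps when checking what a plot is showing. As a rough sketch, boxplot.stats() returns the five values a box plot draws (lower whisker, lower hinge, median, upper hinge, upper whisker), and hist() with plot = FALSE returns the bins without drawing anything. The lowercase names are my own.

```r
plotval  <- c(120, 50, 90, 14, 9)
plotval1 <- c(120, 25, 19, 175, 29)

# The five values drawn for the first box
five <- boxplot.stats(plotval)$stats
five   # 9 14 50 90 120

# Points that would be drawn individually as outliers in the second box
out1 <- boxplot.stats(plotval1)$out
out1

# Bin edges and counts of the histogram used earlier
h <- hist(c(2, 5, 9, 4, 9), plot = FALSE)
h$breaks
h$counts
```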

I hope this first look at how one can generate graphs was a good experience. Now that you have had exposure to key statistical formulas and visualization, you are ready to explore advanced statistical problems. In my next blog, I will cover “Advance statistical formulas using R Studio”.

Thank you for sparing the time to go through this blog. I hope it helped you build a sound foundation of statistics using R. Kindly share your valuable and kind opinion. Please do not forget to suggest what you would like to understand and hear from me in my future blogs.

Thank you...
Outstanding Outliers:: "AG".  
