Hello Data Experts,
Let me continue from my last blog,
http://outstandingoutlier.blogspot.in/2017/08/statistical-programming-using-r-part-1.html “Statistical Programming using R Part 1”,
where we discussed basic statistical functions such as mean, median and mode,
along with the statistical summary of a dataset.
Let us move to an interesting
use of R: data visualization, keeping the statistical scope in view. Visualization
here means graphs such as box plots, scatter diagrams, pie charts, histograms, line
graphs, bar charts and many more.
Before we get into the visual
representation of statistical data, it is important to understand
the different data and object types.
R has several data types,
such as numeric, integer, character (string) and logical (TRUE/FALSE). There are also
different types of objects that can store these data types: vectors, lists, matrices, arrays, factors and data frames.
A vector can hold values of only a single data
type. Let us look at a few examples:
This will hold a logical value:
W <- TRUE
W
This will hold a numeric value:
N <- 5.5
N
This will hold an integer value (the L suffix marks it as an integer):
G <- 4L
G
The example below covers the character
data type. The quotes make this a string, not a logical value:
F <- "FALSE"
F
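To confirm which type each of these values has, R's class() function can be applied to it. The following is a minimal sketch; the variable names are hypothetical and chosen only for this illustration, so they do not overwrite the objects above.

```r
# Hypothetical variable names, used only for this illustration
logical_val   <- TRUE
numeric_val   <- 5.5
integer_val   <- 4L
character_val <- "FALSE"

class(logical_val)    # "logical"
class(numeric_val)    # "numeric"
class(integer_val)    # "integer"
class(character_val)  # "character"

# The is.*() functions test for one specific type
is.numeric(integer_val)      # TRUE: integers also count as numeric
is.character(character_val)  # TRUE
```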
Let us try assigning a set of
numeric values to an object A. Since all the values assigned are numeric (a homogeneous set), the data type of the vector will be numeric.
A <- c(1,2,3,4,5,6,7,8,9)
A
If we put a heterogeneous set of
values in a vector, R converts them all to one common type. Here a character value is present, so every value becomes character. The original
data types are lost in a vector.
G <- c("QW", 1, FALSE)
G
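A small sketch of this conversion, using hypothetical variable names. It also shows that when no character value is present, logical values are converted to numbers instead.

```r
# Mixed values including a string: everything becomes character
mixed <- c("QW", 1, FALSE)
class(mixed)  # "character"
mixed         # "QW" "1" "FALSE"

# Numbers mixed with logicals only: everything becomes numeric
num_logical <- c(1, FALSE, TRUE)
class(num_logical)  # "numeric"
num_logical         # 1 0 1
```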
To retain the data type of each value, we
should go for a LIST.
GLIST <- list("QW", 1, FALSE)
GLIST
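One way to check that a list keeps each element's type is to apply class() to every element with sapply(). This is a sketch with hypothetical names, not part of the original walkthrough.

```r
typed_list <- list("QW", 1, FALSE)

# class() of each element, collected into a character vector
element_types <- sapply(typed_list, class)
element_types  # "character" "numeric" "logical"

# The second element is still a number, so arithmetic works on it
typed_list[[2]] + 1  # 2
```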
So far we have seen how a
vector differs from a list. Sorting works well on a vector but not on a list. Let
us try sort() on both: the first call will run fine, whereas the second one
will raise an error, as expected.
GSORT <- sort(G)
GSORT
GLISTSORT <- sort(GLIST)
GLISTSORT
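If you want a script to keep running past the failing call, the error can be caught with tryCatch(). The sketch below uses hypothetical names. It also shows one possible workaround, under the assumption that the list holds values of a single type: flatten it with unlist() and sort the resulting vector.

```r
# Capture the error message instead of stopping the script
sort_result <- tryCatch(
  sort(list("QW", 1, FALSE)),
  error = function(e) conditionMessage(e)
)
sort_result  # the error message text, not a sorted list

# A list of same-type values can be flattened into a vector and then sorted
flat_sorted <- sort(unlist(list(3, 1, 2)))
flat_sorted  # 1 2 3
```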
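Let us use the merge command for both
types. merge() is designed for data frames, so it first converts its arguments with as.data.frame(). Each vector becomes a one-column data frame, and since the two have no column in common, the result is a Cartesian product: 3*3 = 9 rows.
A list does not behave the same way: it becomes a single-row data frame with one column per element, so merge() does not concatenate lists. To join two lists into one with 3+3 = 6 elements, use c(). Note also that GLISTSORT was never created, because sort(GLIST) failed above, so we combine GLIST with itself.
This is important to understand.
VECTORMERGE <- merge(G, GSORT)
VECTORMERGE
LISTMERGE <- c(GLIST, GLIST)
LISTMERGE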
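The 3*3 and 3+3 sizes can be checked directly. This is a minimal sketch with hypothetical names; c() is the function used here to concatenate the two lists.

```r
vec_a <- c("QW", "1", "FALSE")
vec_b <- sort(vec_a)

# merge() on two vectors: every value of vec_a paired with every value of vec_b
vec_merge <- merge(vec_a, vec_b)
dim(vec_merge)  # 9 rows, 2 columns

# Concatenating two lists keeps every element of both
list_a <- list("QW", 1, FALSE)
list_concat <- c(list_a, list_a)
length(list_concat)  # 6
```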
To summarize the attributes of vectors
and lists:
Vector: holds a single data type, so mixed values are converted (here, to
character); merge() on two vectors gives a Cartesian product.
List: retains the data type of
each element; two lists are concatenated with c().
Let us discuss a new object type,
ARRAY. An array holds data of a single type in rows and columns, and can extend to more than two dimensions.
A1 <- c(1,2,3,4,5)
A2 <- c(9,8,7,6,5,4)
A1
A2
Array2 <- array(c(A1, A2), dim = c(3,5,2))
Array2
This will form a three-dimensional array of 3
rows, 5 columns and 2 layers. The 11 supplied values are recycled to fill all 30 cells.
Array2 <- array(c(A1, A2), dim = c(6,2,4))
Array2
This will form a three-dimensional array of 6
rows, 2 columns and 4 layers.
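To inspect an array, dim() returns its shape and square brackets pick out single cells by row, column and layer. The following is a sketch with hypothetical names, using the same 11 values as above.

```r
values <- c(1, 2, 3, 4, 5, 9, 8, 7, 6, 5, 4)  # the 11 values of A1 and A2
arr <- array(values, dim = c(3, 5, 2))

dim(arr)     # 3 5 2
length(arr)  # 30 cells, filled by recycling the 11 values

# Index as [row, column, layer]; cells are filled column by column
arr[2, 1, 1]  # 2
arr[1, 1, 2]  # 5 (the 16th cell, which is the 5th value after recycling)
```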
Let us move on to Matrix. It is a
simple two-dimensional rectangular layout. The byrow parameter (TRUE or FALSE) controls whether
the data is filled in by row or by column: TRUE fills row by row and FALSE fills column by column. The default
value is FALSE.
A1 <- c(1,2,3,4,5)
A2 <- c(9,8,7,6,5,4)
MAT <- matrix(c(A1, A2), 11, 5)
MAT
MAT <- matrix(c(A1, A2), 11, 5, byrow = TRUE)
MAT
MAT <- matrix(c(A1, A2), 5, 11)
MAT
MAT <- matrix(c(A1, A2), 5, 11, byrow = TRUE)
MAT
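The effect of byrow is easier to see on a small matrix. This is a sketch with hypothetical names, using the values 1 to 6 in a 2 by 3 layout.

```r
# Default (byrow = FALSE): values go down each column first
m_col <- matrix(1:6, nrow = 2, ncol = 3)
m_col[1, ]  # 1 3 5

# byrow = TRUE: values go across each row first
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_row[1, ]  # 1 2 3
```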
There is another data type,
FACTOR, which is used to represent categorical data. Its LEVELS are the unique
values of the data, in sorted order.
A3 <- c(9,8,8,7,9,4,4,2,8,6,5,4)
FV <- factor(A3)
FV
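The levels of a factor can be listed with levels() and counted with nlevels(), while table() shows how often each level occurs. This is a sketch with hypothetical names, using the same values as A3.

```r
scores <- c(9, 8, 8, 7, 9, 4, 4, 2, 8, 6, 5, 4)
score_factor <- factor(scores)

levels(score_factor)   # "2" "4" "5" "6" "7" "8" "9"
nlevels(score_factor)  # 7

# Frequency of each level
score_counts <- table(score_factor)
score_counts
```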
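Let us now talk about the most
important object type, the Data Frame. It is a table structure with rows and
columns, where each column can hold a different data type. Note that IncidentDesc below has only two values, so data.frame() recycles them to fill the four rows.
IncidentNumber <- c(111,114,143,456)
IncidentDesc <- c("AA", "GG")
IncidentPri <- c("Low", "Low", "Medium", "Complex")
IncidentDet <- data.frame(IncidentNumber, IncidentDesc, IncidentPri)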
IncidentDet
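Once a data frame exists, str() summarises its columns, $ pulls out one column, and a logical condition inside square brackets filters rows. The sketch below rebuilds a similar table under a hypothetical name, with the description column written out in full.

```r
incidents <- data.frame(
  IncidentNumber = c(111, 114, 143, 456),
  IncidentDesc   = c("AA", "GG", "AA", "GG"),
  IncidentPri    = c("Low", "Low", "Medium", "Complex")
)

str(incidents)   # column names, types and first values
nrow(incidents)  # 4

# Keep only the rows whose priority is "Low"
low_incidents <- incidents[incidents$IncidentPri == "Low", ]
low_incidents$IncidentNumber  # 111 114
```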
Since we have covered various
types of objects, we will start with graphical visualization now.
For a pie chart, let us populate 2
objects, one with the values and one with the labels:
PLOTVAL <- c(2,5,9,4,9)
PIEDESC <- c("A","B","C","D","E")
pie(PLOTVAL, PIEDESC, col = rainbow(length(PLOTVAL)))
Let us try to plot a BAR CHART:
PLOTVAL <- c(2,5,9,4,9)
barplot(PLOTVAL)
To draw a histogram, we need an
object with values:
PLOTVAL <- c(2,5,9,4,9)
hist(PLOTVAL, col = "green", border = "red")
A line graph can be drawn with a
single object as well:
PLOTVAL <- c(2,5,9,4,9)
plot(PLOTVAL, type = "o", col = "red")
To draw a scatter graph, we need two
sets of values, one for the X axis and one for the Y axis:
PLOTVAL <- c(2,5,9,4,9)
PLOTVAL1 <- c(2,5,9,4,9)
plot(PLOTVAL, PLOTVAL1, col = "blue")
A box plot can be drawn for a single set of values; here we use
2 sets so that they can be compared side by side:
PLOTVAL <- c(120,50,90,14,9)
PLOTVAL1 <- c(120,25,19,175,29)
boxplot(PLOTVAL, PLOTVAL1, col = "blue")
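A box plot is a picture of the five-number summary (minimum, lower hinge, median, upper hinge, maximum), which fivenum() returns directly. The sketch below uses hypothetical names; the title, axis label and group names are illustrative choices, not part of the original example.

```r
set_a <- c(120, 50, 90, 14, 9)
set_b <- c(120, 25, 19, 175, 29)

# The five numbers behind each box
summary_a <- fivenum(set_a)
summary_a  # 9 14 50 90 120
summary_b <- fivenum(set_b)
summary_b  # 19 25 29 120 175

# Same plot with a title, an axis label and a name under each box
boxplot(set_a, set_b,
        names = c("Set A", "Set B"),
        main  = "Comparison of two sets",
        ylab  = "Value",
        col   = "blue")
```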
I hope this first look at how one can
generate graphs has been a good experience. Now that you have had exposure to key
statistical formulas and visualization, you are ready to explore
advanced statistical problems. In my next blog, I will cover “Advanced
statistical formulas using R Studio”.
Thank you for sparing the time and
going through this blog. I hope it helped you build a sound foundation of statistics
using R. Kindly share your valuable opinion. Please do not forget to
suggest what you would like to understand and hear from me in my future blogs.
Thank you...
Outstanding Outliers::
"AG".