Univariate statistics, as the name suggests, is the study of individual variables. It's not always possible to go through the entire data set; instead, you look at a few summary points that give an overview of a variable's behaviour.
Facets of the data: To arrive at proper summary statistics, we need to understand which facets of the data to look at for a complete overview:
Central Tendency: Central tendency means the average or representative behaviour of the data.
Measures of central tendency are:
➢Mean: sum of all the values divided by the number of values
➢Median: middle-most value when all the values are sorted
➢In case of an even number of values, the average of the two middle-most values is taken as the median
➢Mode: the most frequent value. It is mostly used for categorical variables
➢The mean is sensitive to extreme values; the median isn't
➢The mean and the median each take a single, unique value
➢The mode can take multiple values
➢Equal central tendency doesn't mean similar all-round behaviour
➢Data values can differ in how they are spread around the central tendency
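The measures of central tendency above can be sketched with Python's standard `statistics` module. The sample values are made up for illustration, and `multimode` is used so that every most-frequent value is returned.

```python
import statistics

# Hypothetical sample data, chosen so that two values tie for most frequent.
values = [2, 3, 3, 4, 5, 5, 6]

mean_value = statistics.mean(values)      # arithmetic average
median_value = statistics.median(values)  # middle-most value after sorting
modes = statistics.multimode(values)      # all most-frequent values

# Adding one extreme value pulls the mean up sharply,
# while the median barely moves.
with_extreme = values + [100]
mean_extreme = statistics.mean(with_extreme)
# Even number of values: the median is the average of the two middle ones.
median_extreme = statistics.median(with_extreme)

print("mean:", mean_value, "->", mean_extreme)
print("median:", median_value, "->", median_extreme)
print("modes:", modes)
```

On this sample the mean moves from 4 to 16 once the extreme value is added, while the median only moves from 4 to 4.5.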
Measures of Variability:
➢Range (largest value minus smallest value)
➢Standard Deviation / Variance
➢Mean Absolute Deviation (MAD)
➢Inter Quartile Range (IQR): Q2 (the second quartile, aka the median) divides the sorted data into two equal parts. Q1, the first quartile, divides the first part into two equal parts, and Q3, the third quartile, does the same for the second part. IQR = Q3 - Q1.
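As a minimal sketch, these measures can be computed with the standard `statistics` module. Two assumptions are made here that the notes do not specify: the population formulas (dividing by n) are used for variance and standard deviation, and quartiles use the "inclusive" interpolation method, which is only one of several conventions. The sample values are made up.

```python
import statistics

# Hypothetical sample data.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(values)

# Population variance and standard deviation (assumption: divide by n).
variance = statistics.pvariance(values)
std_dev = statistics.pstdev(values)

# Mean absolute deviation: average absolute distance from the mean.
mad = sum(abs(x - mean_value) for x in values) / len(values)

# Quartiles (assumption: "inclusive" method); q2 is the median.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print("variance:", variance, "std:", std_dev)
print("MAD:", mad)
print("Q1, Q2, Q3:", q1, q2, q3, "IQR:", iqr)
```

Other quartile conventions can give slightly different Q1 and Q3 on small samples, so the IQR may differ from tool to tool.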
Properties of Measures:
➢Range is the most sensitive to extreme values
➢Std, variance and MAD are sensitive to extreme values as well
➢IQR ignores the leading and trailing values, hence it is not sensitive to extreme values
➢MAD is as good a measure of variability as variance; the only issue is that it's difficult to manipulate algebraically
➢Central tendency and variability summary statistics give you an idea of how your data is centered and spread, but they don't tell you how frequent particular values are in your data
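The sensitivity properties listed above can be checked on a small example. This is a sketch under assumptions: `spread_summary` is a hypothetical helper, the data is made up, and it uses the population standard deviation and the "inclusive" quartile method.

```python
import statistics

def spread_summary(values):
    """Return range, standard deviation, MAD and IQR for a list of numbers."""
    mean_value = statistics.mean(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "std": statistics.pstdev(values),
        "mad": sum(abs(x - mean_value) for x in values) / len(values),
        "iqr": q3 - q1,
    }

# Hypothetical data, then the same data with one extreme value appended.
base = [10, 12, 13, 14, 15, 16, 18]
with_extreme = base + [90]

before = spread_summary(base)
after = spread_summary(with_extreme)

# Print each measure before and after adding the extreme value.
for name in before:
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```

On this sample the range, standard deviation and MAD grow several times over, while the IQR changes only slightly.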
Types of shapes & Numerical Measure
➢They are defined for numeric variables
➢Outliers are values in the data that are too different from the rest of the data, meaning they are either too small or too large
➢We can quantify these “too small” and “too large” values by defining an acceptable range of values for the data
➢Standard examples of ranges:
➢Mean ± N * Std
➢[Q1 - N*IQR, Q3 + N*IQR]
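Both ranges can be sketched as small helper functions. The function names, the sample data and the default multipliers (3 for the standard deviation rule, 1.5 for the IQR rule) are assumptions for illustration. They are common choices, but the notes leave N open.

```python
import statistics

def bounds_std(values, multiplier=3):
    """Acceptable range: mean ± multiplier * std (population std assumed)."""
    mean_value = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    return mean_value - multiplier * std_dev, mean_value + multiplier * std_dev

def bounds_iqr(values, multiplier=1.5):
    """Acceptable range: [Q1 - multiplier*IQR, Q3 + multiplier*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

def find_outliers(values, bounds):
    """Return the values that fall outside the (low, high) bounds."""
    low, high = bounds
    return [x for x in values if x < low or x > high]

# Hypothetical data with one value far from the rest.
data = [10, 12, 13, 14, 15, 16, 18, 90]

outliers_iqr = find_outliers(data, bounds_iqr(data))
outliers_std3 = find_outliers(data, bounds_std(data, multiplier=3))
outliers_std2 = find_outliers(data, bounds_std(data, multiplier=2))

print("IQR rule:", outliers_iqr)
print("mean ± 3 std:", outliers_std3)
print("mean ± 2 std:", outliers_std2)
```

On this small sample the IQR rule flags 90, while mean ± 3 std flags nothing, because the extreme value itself inflates the mean and the standard deviation. With a multiplier of 2 the standard deviation rule flags it as well, so the choice of N matters.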