Z-Score (Statistics for Data Science)

Subhradeep Guha
4 min read · Feb 24, 2022
Z-Score

A Z-score is a numerical measurement that describes how far an observation lies from the mean of its data set. It is used to find out whether an observation is common or exceptional.

A Z-score is measured in standard deviations from the mean. Z-scores may be positive or negative: a positive value indicates the score is above the mean, and a negative value indicates it is below the mean. In other words, the Z-score tells us how many standard deviations above or below the mean a given point lies.

Importance and a brief overview of Z-Score:

As we said, a Z-score is measured in standard deviations from the mean. To take an easy example, a Z-score of 0 indicates that the data point's score is identical to the mean score. A Z-score of 1.0 indicates a value that is one standard deviation above the mean.

Z-scores reveal to statisticians and traders whether a score is typical or atypical for a specified data set. Z-scores also make it possible for analysts to standardize scores from different data sets so that they can be compared with one another more accurately.

The formula for calculating a Z-score is z = (x - μ)/σ, where x is the raw score, μ is the mean, and σ is the standard deviation.
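The formula above translates almost directly into code. The following is a minimal sketch in Python; the function name `z_score` and the check for a non-positive standard deviation are illustrative choices, not part of any particular library.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    if std_dev <= 0:
        # A Z-score is undefined when there is no spread in the data.
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# A score equal to the mean has a Z-score of 0.
print(z_score(60, 60, 15))  # 0.0

# A score one standard deviation above the mean has a Z-score of 1.
print(z_score(75, 60, 15))  # 1.0
```

A score below the mean simply comes out negative, for example `z_score(45, 60, 15)` gives -1.0.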

Whether a certain Z-score counts as high or low depends on the context and on the distribution.

1. If it is a normal distribution.

In this case, a Z-score above 3 or below -3 can be considered rather exceptional.

2. If it is skewed to the right.

Large positive Z-scores are more common, as the more extreme values are on the right side of the distribution.

3. If it is skewed to the left.

Large negative Z-scores are more common, as the more extreme values are on the left side of the distribution.
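The rule of thumb from the first case (a Z-score beyond ±3 is exceptional) can be sketched as a simple outlier check. The data set, the function name `find_outliers`, and the choice of the population standard deviation (`statistics.pstdev`) are all assumptions made for illustration; the threshold of 3 is only a convention and may need adjusting for skewed data.

```python
from statistics import fmean, pstdev


def find_outliers(values, threshold=3.0):
    """Return the values whose Z-score is above +threshold or below -threshold."""
    mean = fmean(values)
    std_dev = pstdev(values)  # population standard deviation (an assumption)
    return [v for v in values if abs((v - mean) / std_dev) > threshold]


# Hypothetical data: 19 values close to 50 and one unusually large value.
data = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
        50, 52, 48, 50, 51, 49, 50, 52, 48, 120]

print(find_outliers(data))  # [120]
```

Note that a single extreme value also inflates the mean and the standard deviation, so in very small samples even an obvious outlier may not reach a Z-score of 3.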

For any distribution, regardless of its shape, at least 75% of the data must lie between Z-scores of -2 and +2, and at least about 89% between -3 and +3. This guarantee comes from Chebyshev's inequality.
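The 75% and 89% figures come from Chebyshev's inequality, which says that at least 1 - 1/k² of any data set lies within k standard deviations of the mean (for k greater than 1). A short sketch that reproduces the two numbers, with a hypothetical function name:

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        # The bound is only informative for k greater than 1.
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


for k in (2, 3):
    print(f"within {k} standard deviations: at least {chebyshev_lower_bound(k):.1%}")
# within 2 standard deviations: at least 75.0%
# within 3 standard deviations: at least 88.9%
```

These are lower bounds that hold for every distribution. For a normal distribution the actual shares are much higher (roughly 95% and 99.7%).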

Example of Z-Score:

Let's say Rick scored 70 out of 100, the mean score was 60, and the standard deviation was 15. Then:

Score (x): 70

Mean (μ): 60

Standard Deviation (σ): 15

In terms of z-scores, this gives us:

z = (x - μ)/σ = (70 - 60)/15 ≈ 0.67

The z-score is 0.67 (to 2 decimal places), but now we need to work out the percentage (or number) of students that scored higher and lower than Rick. To do this, we need to refer to the standard normal distribution table.

Now let's say he scored 90 out of 100, with the mean and standard deviation again 60 and 15 respectively. Then the Z-score is (90 - 60)/15 = 2, which means the score is 2 standard deviations to the right of the mean.
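Instead of looking the value up in a printed table, the share of students below a given score can be computed with Python's standard library. This sketch assumes the exam scores are normally distributed, which the example does not state explicitly; the function name `fraction_below` is hypothetical.

```python
from statistics import NormalDist


def fraction_below(x, mean, std_dev):
    """Share of a normal distribution that lies below the raw score x."""
    z = (x - mean) / std_dev
    # NormalDist() with no arguments is the standard normal (mean 0, sd 1),
    # so its cdf plays the role of the standard normal table.
    return NormalDist().cdf(z)


below_70 = fraction_below(70, 60, 15)
below_90 = fraction_below(90, 60, 15)

print(f"Score 70: {below_70:.1%} below, {1 - below_70:.1%} above")
print(f"Score 90: {below_90:.1%} below, {1 - below_90:.1%} above")
```

Under the normality assumption, a score of 70 puts Rick ahead of roughly three quarters of the students, while a score of 90 puts him ahead of close to 97.7% of them.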

Pros & Cons:

Advantages of Z scores:

One major advantage of standard or z scores is that they can be used to compare raw scores taken from different tests, especially when the data are at the interval level of measurement.

The advantage of the z score transformation is that it takes into account both the mean value and the variability in a set of raw scores.
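To illustrate the comparison across tests, the sketch below puts two raw scores on the same scale. The first test reuses Rick's numbers from the example; the second test (score 65, mean 55, standard deviation 5) is entirely made up for illustration.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev


# Test A: the numbers from the example above.
z_a = z_score(70, 60, 15)

# Test B: hypothetical numbers, chosen only to show the effect of variability.
z_b = z_score(65, 55, 5)

print(f"Test A: raw score 70, z = {z_a:.2f}")
print(f"Test B: raw score 65, z = {z_b:.2f}")

# The lower raw score is the stronger result relative to its own group.
better = "B" if z_b > z_a else "A"
print(f"Relatively better performance: Test {better}")
```

Both scores are 10 points above their means, but because Test B has much less spread, the same 10 points correspond to a larger Z-score there.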

Disadvantages of Z scores:

The main disadvantage of standard scores is that interpreting them as proportions of a distribution assumes the distribution is normal. If this assumption is not met, the scores cannot be interpreted as a standard proportion of the distribution from which they were calculated. For example, if the distribution is skewed, the area within one standard deviation to the left of the mean is not equal to the area within the same distance to the right of the mean.
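The asymmetry described above can be checked on simulated data. This sketch uses an exponential distribution as a stand-in for a right-skewed data set; the choice of distribution, the sample size, and the random seed are all assumptions made for illustration.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Right-skewed sample: exponential distribution with rate 1.
sample = [random.expovariate(1.0) for _ in range(10_000)]

mu = fmean(sample)
sigma = pstdev(sample)

# Share of observations within one standard deviation on each side of the mean.
left = sum(mu - sigma <= v < mu for v in sample) / len(sample)
right = sum(mu <= v <= mu + sigma for v in sample) / len(sample)

print(f"within 1 sd to the left of the mean:  {left:.1%}")
print(f"within 1 sd to the right of the mean: {right:.1%}")
```

For a normal distribution both shares would be about 34%. For this skewed sample the left share is far larger than the right one, so a Z-score of -1 and a Z-score of +1 do not cut off matching proportions of the data.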

Subhradeep Guha

Data Scientist with strong analytical, Python, SAS, ML, and statistics skills.