Statistics - Standard Deviation


Standard deviation is the most commonly used measure of variation, which describes how spread out the data is.


Standard Deviation

Standard deviation (σ) measures how far a 'typical' observation is from the average of the data (μ).

Standard deviation is important for many statistical methods.

Here is a histogram of the age of all 934 Nobel Prize winners up to the year 2020, showing standard deviations:

Histogram of the age of Nobel Prize winners with standard deviations shown.

Each dotted line in the histogram shows a shift of one extra standard deviation.

If the data is normally distributed:

  • Roughly 68.3% of the data is within 1 standard deviation of the average (from μ-1σ to μ+1σ)
  • Roughly 95.5% of the data is within 2 standard deviations of the average (from μ-2σ to μ+2σ)
  • Roughly 99.7% of the data is within 3 standard deviations of the average (from μ-3σ to μ+3σ)

Note: A normal distribution has a "bell" shape and spreads out equally on both sides.
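
The percentages above can be checked numerically. This is a minimal sketch, assuming a perfectly normal distribution, where the share of data within k standard deviations of the mean equals erf(k/√2). The helper name share_within is made up for this illustration.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the share for 1, 2 and 3 standard deviations
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

Real data is rarely perfectly normal, so the shares found in an actual data set will usually differ a little from these values.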


Calculating the Standard Deviation

You can calculate the standard deviation for both the population and the sample.

The formulas are almost the same, but they use different symbols: \(\sigma\) for the population standard deviation and \(s\) for the sample standard deviation.

The population standard deviation (\(\sigma\)) is calculated with this formula:

\(\displaystyle \sigma = \sqrt{\frac{\sum (x_{i}-\mu)^2}{n}}\)

The sample standard deviation (\(s\)) is calculated with this formula:

\(\displaystyle s = \sqrt{\frac{\sum (x_{i}-\bar{x})^2}{n-1}}\)

\(n\) is the total number of observations.

\(\sum \) is the symbol for adding together a list of numbers.

\(x_{i}\) is the list of values in the data: \(x_{1}, x_{2}, x_{3}, \ldots \)

\(\mu\) is the population mean and \(\bar{x}\) is the sample mean (average value).

\( (x_{i} - \mu ) \) and \( (x_{i} - \bar{x} ) \) are the differences between the values of the observations (\(x_{i}\)) and the mean.

Each difference is squared, and the squared differences are added together.

The sum is then divided by \(n\) (population) or \( n - 1 \) (sample), and finally we take the square root.
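
The two formulas can be sketched as one small Python function. This is only an illustration: the function name standard_deviation, its sample parameter, and the list of values are made up for this example and are not part of any library.

```python
import math

def standard_deviation(values, sample=False):
    """Population standard deviation, or sample standard deviation if sample=True."""
    n = len(values)
    mean = sum(values) / n
    # Sum of the squared differences between each value and the mean
    squared_differences = sum((x - mean) ** 2 for x in values)
    # Divide by n for a population, or by n - 1 for a sample
    divisor = n - 1 if sample else n
    return math.sqrt(squared_differences / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values, for illustration only

print(standard_deviation(data))               # population
print(standard_deviation(data, sample=True))  # sample
```

The only difference between the two results is the divisor, so the sample standard deviation is always slightly larger than the population standard deviation for the same values.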

We use these 4 example values to calculate the population standard deviation:

4, 11, 7, 14

We must first find the mean:

\(\displaystyle \mu = \frac{\sum x_{i}}{n} = \frac{4 + 11 + 7 + 14}{4} = \frac{36}{4} = \underline{9} \)

Then we find the difference between each value and the mean \( (x_{i}- \mu)\):

  • \( 4-9 = -5 \)
  • \( 11-9 = 2 \)
  • \( 7-9 = -2 \)
  • \( 14-9 = 5 \)

Each difference is then squared, or multiplied by itself \( ( x_{i}- \mu )^2\):

  • \( (-5)^2 = (-5)(-5) = 25 \)
  • \( 2^2 = 2 \cdot 2 = 4 \)
  • \( (-2)^2 = (-2)(-2) = 4 \)
  • \( 5^2 = 5 \cdot 5 = 25 \)

All of the squared differences are then added together \( \sum (x_{i} -\mu )^2\):

\( 25 + 4 + 4 + 25 = 58\)

Then the sum is divided by the total number of observations, \( n \):

\( \displaystyle \frac{58}{4} = 14.5\)

Finally, we take the square root of this number:

\( \sqrt{14.5} \approx \underline{3.81} \)

So, the standard deviation of the example values is roughly: \(3.81 \)
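
The steps above can also be followed one at a time in plain Python. This is a sketch of the hand calculation, not a recommended way to compute the standard deviation in practice. The variable names are chosen for this example.

```python
import math

values = [4, 11, 7, 14]
n = len(values)

mean = sum(values) / n                     # 36 / 4 = 9.0
differences = [x - mean for x in values]   # [-5.0, 2.0, -2.0, 5.0]
squared = [d ** 2 for d in differences]    # [25.0, 4.0, 4.0, 25.0]
total = sum(squared)                       # 58.0
variance = total / n                       # 58 / 4 = 14.5
sigma = math.sqrt(variance)                # about 3.81

print(round(sigma, 2))
```

Dividing total by n - 1 instead of n gives the sample standard deviation, which is about 4.40 for these values.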



Calculating the Standard Deviation with Programming

The standard deviation can easily be calculated with many programming languages.

Using software and programming to calculate statistics is more common for bigger sets of data, as calculating by hand becomes difficult.

Population Standard Deviation

Example

With Python, use the NumPy library's std() function to find the population standard deviation of the values 4, 11, 7, 14:

import numpy

values = [4,11,7,14]

x = numpy.std(values)

print(x)

Example

R's built-in sd() function calculates the sample standard deviation, so use a formula to find the population standard deviation of the values 4, 11, 7, 14:

values <- c(4,11,7,14)

sqrt(mean((values-mean(values))^2))

Sample Standard Deviation

Example

With Python, use the NumPy library's std() function with ddof=1 to find the sample standard deviation of the values 4, 11, 7, 14:

import numpy

values = [4,11,7,14]

x = numpy.std(values, ddof=1)

print(x)

Example

Use the R sd() function to find the sample standard deviation of the values 4, 11, 7, 14:

values <- c(4,11,7,14)

sd(values)
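
If NumPy is not available, Python's built-in statistics module offers both calculations. As a small sketch using the same four values, pstdev() gives the population standard deviation and stdev() gives the sample standard deviation:

```python
import statistics

values = [4, 11, 7, 14]

population_sd = statistics.pstdev(values)  # divides by n
sample_sd = statistics.stdev(values)       # divides by n - 1

print(population_sd)
print(sample_sd)
```

The first result should agree with the hand calculation of roughly 3.81. The second is larger because the sum of squared differences is divided by n - 1 instead of n.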

Statistics Symbol Reference

Symbol Description
\( \sigma \) Population standard deviation. Pronounced 'sigma'.
\( s \) Sample standard deviation.
\( \mu \) The population mean. Pronounced 'mu'.
\( \bar{x} \) The sample mean. Pronounced 'x-bar'.
\( \sum \) The summation operator, 'capital sigma'.
\( x \) The variable 'x' we are calculating the mean and standard deviation for.
\( i \) The index 'i' of the variable 'x'. This identifies each observation for a variable.
\( n \) The number of observations.

Copyright 1999-2023 by Refsnes Data. All Rights Reserved.