In statistics, quartiles and their graphical representation in boxplots are useful when comparing samples and checking whether data are distributed symmetrically. Boxplots also offer one way to identify outliers in a data set graphically.
A boxplot, or box-and-whisker plot, provides a pictorial representation of the following five statistics: maximum, 75th percentile, median (50th percentile), 25th percentile and minimum. Some variants also mark the mean.
In this first post I explore quartiles and boxplots using the TI-83 Plus Graphics Calculator; later posts will cover Python and Excel.
It should be noted that there is no universal agreement on how to select the quartile values. Each of the three tools I use applies a slightly different method, and the differences will be highlighted.
The concept of quartiles is easy to understand: you take a sorted data set and divide it into four equal groups. The first quartile (Q1) is defined as the middle number between the smallest number and the median of the data set. The second quartile (Q2) is the median of the data. The third quartile (Q3) is the middle value between the median and the highest value of the data set.
For example, given X = 1, 2, 3, 4, 5, 6, 7
then Q1 = 2, Q2 (median) = 4, Q3 = 6
For an even number of data-points such as Y = 1, 2, 3, 4, 5, 6, 7, 8
Q1 = (2+3)/2 = 2.5, Q2 (median) = (4+5)/2 = 4.5, Q3 = (6+7)/2 = 6.5
The Interquartile Range (IQR) is defined as the difference between Q3 and Q1. For the two data-sets X and Y above, the IQRs are 6 - 2 = 4 and 6.5 - 2.5 = 4 respectively.
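The worked examples above can be checked with a few lines of Python. This is only a minimal sketch of the "median of each half" rule described here, in which the overall median is left out of both halves when the number of data-points is odd. The function name `quartiles` is my own, and other tools may use a different rule and return slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    When the number of data-points is odd, the overall median is
    excluded from both halves. Needs at least two data-points.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Skip the middle value when n is odd.
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)

X = [1, 2, 3, 4, 5, 6, 7]
Y = [1, 2, 3, 4, 5, 6, 7, 8]

for name, data in (("X", X), ("Y", Y)):
    q1, q2, q3 = quartiles(data)
    print(name, "Q1 =", q1, "Q2 =", q2, "Q3 =", q3, "IQR =", q3 - q1)
```

Running this should reproduce the quartiles quoted for X and Y, and an IQR of 4 for both.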
The IQR may be used to characterise the data when there may be extreme values that skew it. Such values fall outside 'fences' that can be determined as follows:
Lower fence = Q1 - 1.5 × IQR
Upper fence = Q3 + 1.5 × IQR
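As a quick illustration of the fence formulas, here is a small Python sketch. The function name `fences` and the list of candidate values are made up for the example; the quartiles passed in are those of the data-set X above.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Q1 = 2 and Q3 = 6 are the quartiles of X = 1, 2, 3, 4, 5, 6, 7.
lower, upper = fences(2, 6)
print("Lower fence:", lower, "Upper fence:", upper)

# Hypothetical values tested against the fences; a value lying exactly
# on a fence is not treated as an outlier in this sketch.
candidates = [-5, 0, 12, 13]
flagged = [v for v in candidates if v < lower or v > upper]
print("Outside the fences:", flagged)
```

For X the fences work out at -4 and 12, so of the candidate values only -5 and 13 fall outside.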
A boxplot is a method for graphically depicting groups of numerical data through their quartiles. Lines (whiskers) may extend from the box to indicate variability outside the upper and lower quartiles, and outliers may be plotted as individual points.
The TI-83 Graphing Calculator
Various summary stats for a batch of data are calculated and displayed using the 1-Var Stats function. These include the mean and median, the quartiles and the min and max values.
For the dataset X = 1, 2, 3, 4, 5, 6, 7 above, these are the steps to calculate the quartiles and display them with a boxplot, using List L1 as the data list:
- From the STAT -> EDIT menu, clear the required list if necessary using ClrList
- Enter the data-points in list L1, using STAT -> EDIT -> 1:Edit…
- Display the quartiles using the STAT -> CALC menu selection 1:1-Var Stats with the data list, i.e. 1-Var Stats (2nd)[L1]
Note that it is not necessary to sort the data-points as this will be carried out automatically by the TI-83 for the purposes of the calculation.
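For anyone wanting to cross-check the calculator output on a computer, the sketch below builds a similar summary in Python. The function name `one_var_stats` and the dictionary keys are my own labels, loosely following the names shown on the 1-Var Stats screen, and the quartiles assume the median-of-halves rule used in the worked examples earlier. As on the calculator, the data does not need to be sorted first.

```python
from statistics import mean, median, stdev, pstdev

def one_var_stats(data):
    """Summary statistics similar to those listed by 1-Var Stats."""
    values = sorted(data)          # sorting is done here, not by the caller
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return {
        "mean": mean(values),
        "sum_x": sum(values),
        "sum_x2": sum(v * v for v in values),
        "Sx": stdev(values),        # sample standard deviation
        "sigma_x": pstdev(values),  # population standard deviation
        "n": n,
        "minX": values[0],
        "Q1": median(lower),
        "Med": median(values),
        "Q3": median(upper),
        "maxX": values[-1],
    }

# The data-set X, deliberately entered out of order.
L1 = [7, 3, 1, 5, 2, 6, 4]
for name, value in one_var_stats(L1).items():
    print(f"{name:8} {value}")
```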
To obtain a boxplot for the above you will need to use the five graphing keys situated in the top row of the keyboard. To set up the plot press (2nd)[STAT PLOT], then select Plot 1 and set it to use List L1 as shown, ensuring the Type ‘boxplot’ is active.
To display the boxplot use the blue ZOOM key, and then scroll down to option 9: ZoomStat.
The resulting boxplot should then be displayed as shown.
Notice that the five statistical values are not displayed on the boxplot. This is where using the TRACE facility comes in. With the boxplot displayed press the TRACE button to see the following annotations.
P1:L1 indicates that the point marked is on Plot 1 and represents data from L1. This feature is useful when you are displaying more than one boxplot.
A flashing cursor appears at the median of the boxplot and Med=4 is written at the bottom of the screen. As you press the right and left cursor keys this cursor will move across the quartiles and the extremes and display the corresponding values at the bottom of the screen.
The up and down cursor keys will move the cursor between boxplots if more than one is being displayed.
The TI-83 will automatically display the plots so that they fill the screen sensibly but you can change how the boxplot(s) are displayed via the WINDOW facility.
Xmin and Xmax are the values corresponding to the left and right-hand sides of the screen. This can be a little confusing, since minX and maxX in the 1-Var Stats output are the extreme values of the list.
As alluded to in the intro, boxplots can be useful for highlighting outliers graphically. These are shown on the screen as individual points beyond the ends of the whiskers.
However, for this to happen the ModBoxplot (modified box plot) option must be selected in the StatPlot options as shown. As an exercise, take your chosen data list and add an outlier data point, using the fence formulas given above to choose a suitable value. Don’t forget to rerun the CALC -> 1:1-Var Stats function before displaying the boxplot using the blue GRAPH key. You may also need to adjust the screen Xmin and Xmax settings using the blue WINDOW key to get the outliers to show on the screen.
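To see what to expect from this exercise, here is a rough Python sketch of the usual modified-boxplot convention: the whiskers stop at the most extreme data-points that still lie inside the fences, and anything beyond is listed separately as an outlier. The function name `modified_boxplot` and the added value 20 are my own choices for illustration, and the quartiles again assume the median-of-halves rule from the worked examples.

```python
from statistics import median

def modified_boxplot(data, k=1.5):
    """Return the quartiles, whisker ends and outliers for a data list."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    q1 = median(values[:half])
    q3 = median(values[half + 1:] if n % 2 else values[half:])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the furthest data-points still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "Q1": q1,
        "Med": median(values),
        "Q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# The data-set X with one hypothetical extreme value added.
result = modified_boxplot([1, 2, 3, 4, 5, 6, 7, 20])
print(result)
```

Note that adding the extra point changes the quartiles themselves (Q1 becomes 2.5 and Q3 becomes 6.5 here), which is why the summary statistics need to be recalculated after editing the list.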
As a further exercise for the reader, add a second data list, in L2 say, and display both boxplots on the screen. After adding the new list you will need to press the ZOOM key and select the ZoomStat option to get both boxplots on the screen.
Use the TRACE facility to move between the two boxplots and check for sensible values. You may also need to adjust the WINDOW values to get the best fit.
Obtaining boxplots on the graphing screen and using TRACE to display the statistics helps when transferring them to paper, because you then know what they look like and what axis values to use.
In the next part we will look at how to create boxplots using Excel.