From July 2017

Article image

Quartiles & Boxplots (Part 2)

In  Part 1 of this series of posts I looked at Quartiles and their associated Boxplots using the TI-83 Graphing Calculator. In this second part I will explain how to display boxplots in Excel. Unless you are fortunate enough to have Excel 2016 or later then displaying boxplots involves a little jiggery-pokery to manipulate a stacked column chart to look like a boxplot. Essentially, you stack the four quartiles as a column and then make the top and bottom quartiles into error bars to give the familiar boxplot shape.

To demonstrate this I will use the following three data sets to produce three boxplots side-by-side for ease of comparison. I am using Excel 2010 here so depending on what your version of Excel the steps may be slightly different but the principle holds.

image

After entering these data sets the next step is to calculate the following which will be used for the sections of a stacked column chart. The yellow values giving the Q2 and Q3 sections of the boxplot and the orange values will become the error bars representing the Q1 and Q4 sections.

image

Min =MIN(B1:B8),
Q1-Min = QUARTILE(B1:B8,1)-B10
Q1 = QUARTILE(B1:B8,1)
Med-Q1 = MEDIAN(B1:B8)-QUARTILE(B1:B8,1)
Q3-Med = QUARTILE(B1:B8,3)-MEDIAN(B1:B8)
Max-Q3 = MAX(B1:B8)-QUARTILE(B1:B8,3)

It should be noted that there are two versions of the Quartile function in Excel, these being QUARTILE.INC or QUARTILE and QUARTILE.EXC. The interpolation method used to determine the Quartiles is slightly different.  For both methods Q2 will always be the same, because this is the median of the set. However, QUARTILE.EXC excludes the median in the interpolation and this results in quartiles that are further from the center of the set. On the other hand QUARTILE.INC includes the median, which results in more symmetrical quartiles but at the same time shrinks the Interquartile Range and so making it more difficult to identify outliers.

The next step is to generate the stacked column chart and for this select the highlighted cells and insert a STACKED COLUMN CHART.

imageimage

Now select the bottom section of each stacked column and under CHART TOOLS –>LAYOUT->ERROR BARS select ‘More Error Bar Options’.

imageimage

Make sure that the radio boxes are set as shown and click ‘Specify value’. Select the range B11:D11 (Q1-Min) for the NEGATIVE Error Value. The result should be something like this:

imageimage

Then removing gridlines and formatting the bottom sections to have no fill colour to give the result showing the bottom whiskers.

To add the top whiskers select the top sections of the stacked bars and add error bars as before but using the data values B15:D15 (Max-Q3) and under ‘more error bar options’ select ‘ Plus’ for the Display Direction  and in the ‘Specify Value’ select B15:D15 as the POSITIVE error Value.

image
Hit OK and then Close and the result should be the desired Boxplot as shown below.

image

Phew! That was quite a to-do getting all that in place but fortunately there is a much easier way (now he tells me) and I don’t mean just splashing out on Excel 2016. A nice gentlemen by the name of Charles Zaiontz has made a Statistics Data Analysis Tool freely available on his website and this incorporates the facility to easily generate boxplots in various versions of Excel. You will however need to install Excel’s Solver Add-in first. Follow these instructions if it hasn’t already previously been added.

Using this tool I produced the following boxplot and associated summary statistics with only two or three mouse clicks! This tool can also be used to plot outliers as well.

image

Far easier I think you will agree, but nevertheless I did find going through the process manually instructive.

In the third and final part of this series I will be looking at how to generate summary statistics and associated boxplots using Python.

facebooktwittergoogle_plusredditpinterestlinkedinmail

Quartiles and Boxplots (Part 1)

In statistics, quartiles and their graphical representations in boxplots is useful when comparing samples and testing whether data is distributed symmetrically. They can also be one way to graphically determine outliers in a data set.

A boxplot or  box and whiskers plot provides a pictorial representation of the following statistics: maximum, 75th percentile, median (50th percentile), mean, 25th percentile and minimum.

In this first post I explore quartiles and boxplots using the TI-83 Plus Graphics Calculator and in later posts using Python and  Excel.

It should be noted there is no universal agreement on selecting the quartile values.  Indeed for each of the three ways I implement they use slightly  different methods and these will be highlighted.

The concept of Quartiles is easy to understand; basically you take a sorted data set and divide it into four equal groups. The first quartile (Q1) is defined as the middle number between the smallest number and the median of the data set. The second quartile (Q2) is the median of the data. The third quartile (Q3) is the middle value between the median and the highest value of the data set.

For example, given X = 1, 2, 3, 4, 5, 6, 7

then Q1 = 2, Q2 (median) = 4, Q3 = 6

For an odd number of data-points such as Y = 1, 2, 3, 4, 5, 6, 7, 8

Q1 = (2+3)/2 = 2.5,  Q2(median) = (4+5)/2 = 4.5, Q3  =(5+6)/2 = 6.5

The Interquartile Range (IQR) is defined as the difference between the Q3 and Q1 quartiles. So for the above two data-sets X and Y the IQRs are (6-2)  and (6.5 – 2.5) or 4 and 4 respectively.

Outliers

The IQR may be used to characterise the data when there may be extremities that skew the data. These would appear outside of ‘fences’ that can determined as follows:

Lower fence = Q1 – 1.5 (IQR)

Upper fence = Q3 + 1.5 (IQR)

Boxplots

A boxplot is a method for graphically depicting groups of numerical data through their quartiles and they may have lines (whiskers) extending vertically from the boxes indicating variability outside the upper and lower quartiles. Outliers may be plotted as individual points.

boxplot

The TI-83 Graphing Calculator

Various summary stats for a batch of data are calculated and displayed using the 1-Var Stats. These include the mean and median, the quartiles and the min and max values.

For the above dataset above Y = 1, 2, 3, 4, 5, 6, 7 these are the steps to calculate the quartiles and display them with a boxplot using a data list in List L1 as an example

  1. From the STAT –> EDIT menu clear the required list if necessary using ClrList
    image
  2. Enter the data-points in list  L1, using STAT -> EDIT –> 1:Edit…
    imageimage
  3. Display the quartiles using the STAT-CALC menu selection 1: –Var Stats with the data list i.e 1: – Var Stats (2nd) [L1]
    image
    image
  4. Press ENTER and scroll down to the lower set of statistics and read off the quartiles Q1 = 2, Med = 4, Q3 = 6image

 

 

 

Note that it is not necessary to sort the data-points as this will be carried out automatically by the TI-83 for the purposes of the calculation.

To obtain a boxplot for the above you will need to use the five graphing keys situated in the top row of the keyboard. To setup the plot press (2nd)[STAT PLOT], then select Plot 1 and set it up as shown to use boxplot and List L1, ensuring the Type ‘boxplot’ is active.

imageimage

 

To display the boxplot use the blue ZOOM key, and then scroll down to option 9: ZoomStat.

image

The resulting boxplot should then be displayed as shown.

image

Notice that the five statistical values are not displayed on the boxplot. This is where using the TRACE facility comes in. With the boxplot displayed press the TRACE button to see the following annotations.

imageimage

P1:LI indicates that the point marked is on Plot 1 and represents data from L1. This feature is useful where you may be displaying more than a single boxplot.

A flashing cursor appears at the median of the boxplot and Med=4 is written at the bottom of the screen. As you move the right and left cursor keys this cursor will move across the quartiles and the extremes and display the corresponding values in the bottom of the screen.

The up and down cursor arrow will move the cursor between boxplots if there is more than one being displayed.

The TI-83 will automatically display the plots so that they fill the screen sensibly but you can change how the boxplot(s) are displayed via the WINDOW facility.

image

Xmin and Ymin are the values corresponding to the left and right-hand side of the screen. This can be a little confusing since the minX and minY are the extreme values of the list.

As alluded to in the intro, boxplots can be useful to graphically highlight outliers. These are shown on the screen as extending beyond the whisker line.

imageimage

However, for this to happen the ModBoxplot (modified box plot) option must be selected in the StatPlot options as shown.  As an exercise, take your chosen data list and add an outlier data point using the formula given above. Don’t forget to rerun the CALC- 1:1-Var Stats function again before displaying the boxplot using the GRAPH blue key. You may also need to adjust the screen Xmin and Xmax settings using the blue WINDOW key to get the outliers to show on the screen.

As a further exercise for the reader, add a data list in L2 say and display both boxplots on the screen. You will need to press the ZOOM key and zoomstat option to get both boxplots on the screen after you have added another data list.

Use the TRACE facility to move between the two boxplots and check for sensible values. You may also need to adjust the WINDOW values to get the best fit.

Having obtained boxplots on the Graphing Screen and using TRACE to display the statistics helps with transferring them to paper if required because you now know what they look like and what axis values to use.

In the next part we will look at how to create boxplots using Excel.

facebooktwittergoogle_plusredditpinterestlinkedinmail