In Part 1 of this series of posts I looked at Quartiles and their associated Boxplots using the TI-83 Graphing Calculator. In this second part I will explain how to display boxplots in Excel. Unless you are fortunate enough to have Excel 2016 or later then displaying boxplots involves a little jiggery-pokery to manipulate a stacked column chart to look like a boxplot. Essentially, you stack the four quartiles as a column and then make the top and bottom quartiles into error bars to give the familiar boxplot shape.
To demonstrate this I will use the following three data sets to produce three boxplots side-by-side for ease of comparison. I am using Excel 2010 here so depending on what your version of Excel the steps may be slightly different but the principle holds.
After entering these data sets the next step is to calculate the following which will be used for the sections of a stacked column chart. The yellow values giving the Q2 and Q3 sections of the boxplot and the orange values will become the error bars representing the Q1 and Q4 sections.
Q1-Min = QUARTILE(B1:B8,1)-B10
Q1 = QUARTILE(B1:B8,1)
Med-Q1 = MEDIAN(B1:B8)-QUARTILE(B1:B8,1)
Q3-Med = QUARTILE(B1:B8,3)-MEDIAN(B1:B8)
Max-Q3 = MAX(B1:B8)-QUARTILE(B1:B8,3)
It should be noted that there are two versions of the Quartile function in Excel, these being QUARTILE.INC or QUARTILE and QUARTILE.EXC. The interpolation method used to determine the Quartiles is slightly different. For both methods Q2 will always be the same, because this is the median of the set. However, QUARTILE.EXC excludes the median in the interpolation and this results in quartiles that are further from the center of the set. On the other hand QUARTILE.INC includes the median, which results in more symmetrical quartiles but at the same time shrinks the Interquartile Range and so making it more difficult to identify outliers.
The next step is to generate the stacked column chart and for this select the highlighted cells and insert a STACKED COLUMN CHART.
Now select the bottom section of each stacked column and under CHART TOOLS –>LAYOUT->ERROR BARS select ‘More Error Bar Options’.
Make sure that the radio boxes are set as shown and click ‘Specify value’. Select the range B11:D11 (Q1-Min) for the NEGATIVE Error Value. The result should be something like this:
Then removing gridlines and formatting the bottom sections to have no fill colour to give the result showing the bottom whiskers.
To add the top whiskers select the top sections of the stacked bars and add error bars as before but using the data values B15:D15 (Max-Q3) and under ‘more error bar options’ select ‘ Plus’ for the Display Direction and in the ‘Specify Value’ select B15:D15 as the POSITIVE error Value.
Phew! That was quite a to-do getting all that in place but fortunately there is a much easier way (now he tells me) and I don’t mean just splashing out on Excel 2016. A nice gentlemen by the name of Charles Zaiontz has made a Statistics Data Analysis Tool freely available on his website and this incorporates the facility to easily generate boxplots in various versions of Excel. You will however need to install Excel’s Solver Add-in first. Follow these instructions if it hasn’t already previously been added.
Using this tool I produced the following boxplot and associated summary statistics with only two or three mouse clicks! This tool can also be used to plot outliers as well.
Far easier I think you will agree, but nevertheless I did find going through the process manually instructive.
In the third and final part of this series I will be looking at how to generate summary statistics and associated boxplots using Python.