Article image

Quartiles & Boxplots (Part 2)

In  Part 1 of this series of posts I looked at Quartiles and their associated Boxplots using the TI-83 Graphing Calculator. In this second part I will explain how to display boxplots in Excel. Unless you are fortunate enough to have Excel 2016 or later then displaying boxplots involves a little jiggery-pokery to manipulate a stacked column chart to look like a boxplot. Essentially, you stack the four quartiles as a column and then make the top and bottom quartiles into error bars to give the familiar boxplot shape.

To demonstrate this I will use the following three data sets to produce three boxplots side-by-side for ease of comparison. I am using Excel 2010 here so depending on what your version of Excel the steps may be slightly different but the principle holds.

image

After entering these data sets the next step is to calculate the following which will be used for the sections of a stacked column chart. The yellow values giving the Q2 and Q3 sections of the boxplot and the orange values will become the error bars representing the Q1 and Q4 sections.

image

Min =MIN(B1:B8),
Q1-Min = QUARTILE(B1:B8,1)-B10
Q1 = QUARTILE(B1:B8,1)
Med-Q1 = MEDIAN(B1:B8)-QUARTILE(B1:B8,1)
Q3-Med = QUARTILE(B1:B8,3)-MEDIAN(B1:B8)
Max-Q3 = MAX(B1:B8)-QUARTILE(B1:B8,3)

It should be noted that there are two versions of the Quartile function in Excel, these being QUARTILE.INC or QUARTILE and QUARTILE.EXC. The interpolation method used to determine the Quartiles is slightly different.  For both methods Q2 will always be the same, because this is the median of the set. However, QUARTILE.EXC excludes the median in the interpolation and this results in quartiles that are further from the center of the set. On the other hand QUARTILE.INC includes the median, which results in more symmetrical quartiles but at the same time shrinks the Interquartile Range and so making it more difficult to identify outliers.

The next step is to generate the stacked column chart and for this select the highlighted cells and insert a STACKED COLUMN CHART.

imageimage

Now select the bottom section of each stacked column and under CHART TOOLS –>LAYOUT->ERROR BARS select ‘More Error Bar Options’.

imageimage

Make sure that the radio boxes are set as shown and click ‘Specify value’. Select the range B11:D11 (Q1-Min) for the NEGATIVE Error Value. The result should be something like this:

imageimage

Then removing gridlines and formatting the bottom sections to have no fill colour to give the result showing the bottom whiskers.

To add the top whiskers select the top sections of the stacked bars and add error bars as before but using the data values B15:D15 (Max-Q3) and under ‘more error bar options’ select ‘ Plus’ for the Display Direction  and in the ‘Specify Value’ select B15:D15 as the POSITIVE error Value.

image
Hit OK and then Close and the result should be the desired Boxplot as shown below.

image

Phew! That was quite a to-do getting all that in place but fortunately there is a much easier way (now he tells me) and I don’t mean just splashing out on Excel 2016. A nice gentlemen by the name of Charles Zaiontz has made a Statistics Data Analysis Tool freely available on his website and this incorporates the facility to easily generate boxplots in various versions of Excel. You will however need to install Excel’s Solver Add-in first. Follow these instructions if it hasn’t already previously been added.

Using this tool I produced the following boxplot and associated summary statistics with only two or three mouse clicks! This tool can also be used to plot outliers as well.

image

Far easier I think you will agree, but nevertheless I did find going through the process manually instructive.

In the third and final part of this series I will be looking at how to generate summary statistics and associated boxplots using Python.

facebooktwittergoogle_plusredditpinterestlinkedinmail

Quartiles and Boxplots (Part 1)

In statistics, quartiles and their graphical representations in boxplots is useful when comparing samples and testing whether data is distributed symmetrically. They can also be one way to graphically determine outliers in a data set.

A boxplot or  box and whiskers plot provides a pictorial representation of the following statistics: maximum, 75th percentile, median (50th percentile), mean, 25th percentile and minimum.

In this first post I explore quartiles and boxplots using the TI-83 Plus Graphics Calculator and in later posts using Python and  Excel.

It should be noted there is no universal agreement on selecting the quartile values.  Indeed for each of the three ways I implement they use slightly  different methods and these will be highlighted.

The concept of Quartiles is easy to understand; basically you take a sorted data set and divide it into four equal groups. The first quartile (Q1) is defined as the middle number between the smallest number and the median of the data set. The second quartile (Q2) is the median of the data. The third quartile (Q3) is the middle value between the median and the highest value of the data set.

For example, given X = 1, 2, 3, 4, 5, 6, 7

then Q1 = 2, Q2 (median) = 4, Q3 = 6

For an odd number of data-points such as Y = 1, 2, 3, 4, 5, 6, 7, 8

Q1 = (2+3)/2 = 2.5,  Q2(median) = (4+5)/2 = 4.5, Q3  =(5+6)/2 = 6.5

The Interquartile Range (IQR) is defined as the difference between the Q3 and Q1 quartiles. So for the above two data-sets X and Y the IQRs are (6-2)  and (6.5 – 2.5) or 4 and 4 respectively.

Outliers

The IQR may be used to characterise the data when there may be extremities that skew the data. These would appear outside of ‘fences’ that can determined as follows:

Lower fence = Q1 – 1.5 (IQR)

Upper fence = Q3 + 1.5 (IQR)

Boxplots

A boxplot is a method for graphically depicting groups of numerical data through their quartiles and they may have lines (whiskers) extending vertically from the boxes indicating variability outside the upper and lower quartiles. Outliers may be plotted as individual points.

boxplot

The TI-83 Graphing Calculator

Various summary stats for a batch of data are calculated and displayed using the 1-Var Stats. These include the mean and median, the quartiles and the min and max values.

For the above dataset above Y = 1, 2, 3, 4, 5, 6, 7 these are the steps to calculate the quartiles and display them with a boxplot using a data list in List L1 as an example

  1. From the STAT –> EDIT menu clear the required list if necessary using ClrList
    image
  2. Enter the data-points in list  L1, using STAT -> EDIT –> 1:Edit…
    imageimage
  3. Display the quartiles using the STAT-CALC menu selection 1: –Var Stats with the data list i.e 1: – Var Stats (2nd) [L1]
    image
    image
  4. Press ENTER and scroll down to the lower set of statistics and read off the quartiles Q1 = 2, Med = 4, Q3 = 6image

 

 

 

Note that it is not necessary to sort the data-points as this will be carried out automatically by the TI-83 for the purposes of the calculation.

To obtain a boxplot for the above you will need to use the five graphing keys situated in the top row of the keyboard. To setup the plot press (2nd)[STAT PLOT], then select Plot 1 and set it up as shown to use boxplot and List L1, ensuring the Type ‘boxplot’ is active.

imageimage

 

To display the boxplot use the blue ZOOM key, and then scroll down to option 9: ZoomStat.

image

The resulting boxplot should then be displayed as shown.

image

Notice that the five statistical values are not displayed on the boxplot. This is where using the TRACE facility comes in. With the boxplot displayed press the TRACE button to see the following annotations.

imageimage

P1:LI indicates that the point marked is on Plot 1 and represents data from L1. This feature is useful where you may be displaying more than a single boxplot.

A flashing cursor appears at the median of the boxplot and Med=4 is written at the bottom of the screen. As you move the right and left cursor keys this cursor will move across the quartiles and the extremes and display the corresponding values in the bottom of the screen.

The up and down cursor arrow will move the cursor between boxplots if there is more than one being displayed.

The TI-83 will automatically display the plots so that they fill the screen sensibly but you can change how the boxplot(s) are displayed via the WINDOW facility.

image

Xmin and Ymin are the values corresponding to the left and right-hand side of the screen. This can be a little confusing since the minX and minY are the extreme values of the list.

As alluded to in the intro, boxplots can be useful to graphically highlight outliers. These are shown on the screen as extending beyond the whisker line.

imageimage

However, for this to happen the ModBoxplot (modified box plot) option must be selected in the StatPlot options as shown.  As an exercise, take your chosen data list and add an outlier data point using the formula given above. Don’t forget to rerun the CALC- 1:1-Var Stats function again before displaying the boxplot using the GRAPH blue key. You may also need to adjust the screen Xmin and Xmax settings using the blue WINDOW key to get the outliers to show on the screen.

As a further exercise for the reader, add a data list in L2 say and display both boxplots on the screen. You will need to press the ZOOM key and zoomstat option to get both boxplots on the screen after you have added another data list.

Use the TRACE facility to move between the two boxplots and check for sensible values. You may also need to adjust the WINDOW values to get the best fit.

Having obtained boxplots on the Graphing Screen and using TRACE to display the statistics helps with transferring them to paper if required because you now know what they look like and what axis values to use.

In the next part we will look at how to create boxplots using Excel.

facebooktwittergoogle_plusredditpinterestlinkedinmail
WordPress logo

WordPress Workflow with VVV & Variable VVV

Lately I have been looking at WordPress theme design and I came across Varying Vagrant Vagrants (VVV). Despite the rather strange tongue-twister of a name this is a neat environment for developing with WordPress. Up to this point I have been doing my WordPress development using XAMPP as a local server but it appears that this is now rather an old hat way of doing things. Apparently, those in the ‘know’ use the likes of VVV for the many added benefits it brings. However when researching how to go about implementing VVV on Windows 10 I couldn’t find all the information in a single place. Therefore I’ve put this post together for anybody else for which this is new territory, though I might add its raison d’etre is that these days I tend to forget things rather more quickly than I would like and so this post serves as a crib-sheet more than anything else, but if it saves any head scratching for others than that’s just a bonus. In this first post I will cover the installation of the prerequisite software and in subsequent posts I will detail how to use VVV and associated tools. Note that the following is specifically for Windows but it is a similar procedure for other OS.

So, what exactly is Vagrant and VVV and how will it help you develop WP themes and plugins more efficiently? The benefits of using Vagrant as a developer, is that it will isolate dependencies and their configuration within a single disposable, consistent environment. While VVV allows you to run many different sites on one VVV configuration. It’s a great environment to develop themes and plugins on, and it makes contributing to WordPress core easier.  VVV installs a lot of components. git, composer, wp-cli, grunt-cli, nodeJs and a lot of other tools to help you work faster and smarter. It  also includes a lot of debugging and profiling tools:  Webgrind, Opcache Status and Xdebug, it also uses nginx by default.

The prerequisite software required to run VVV is VirtualBox (other VM software also supported, such as VMWare Fusion  or Hyper-V) to provide a VM sandboxed from your local environment and Vagrant, which is a tool for building and distributing development environments. Start off by installing VirtualBox first and then Vagrant .  Installation is straightforward, just follow the install prompts. Check that everything installed correctly by running vagrant –v from the command line which will show the version number.

You may also want to install the vagrant-hostsupdater plugin at this point. This will make things easier by adding the necessary redirects to your hosts file, otherwise you will need to manually add these. This plugin updates your hosts file when you execute vagrant up, reload and resume.

  • vagrant plugin install vagrant-hostsupdater

Next install Varying Vagrant Vagrants by cloning into a directory named vagrant-local, i.e. c:/vagrant-local/:

  • git clone git://github.com/Varying-Vagrant-Vagrants/VVV.git vagrant-local

Finally, install Variable VVV, which is a site wizard that makes it easy to create, delete, and list your WordPress sites. It also supports site creation with different options via site blueprints – a great time saver.

  • git clone https://github.com/bradp/vv.git

Once installed you now need to add this directory to your PATH variable. In Windows 10 the quickest way to do this:

Right click start menu –>System->Advanced->System Settings->Environment Variables->Edit path variable.

Just to recap we installed the required software in this order:

  1. VirtualBox into default install directory
  2. Vagrant into default install directory
  3. Vagrant-hostupdater plugin (optional)
  4. Vagrant-triggers plugin (optional  – see https://github.com/emyl/vagrant-triggers)
  5. Varying-Vagrant-Vagrants (VVV) into c:\vagrant-local
  6. Variable VVV into c:\vv
  7. Added c:\vv to the PATH variable

We are now ready to fire up all the software and start flexing it – in a command prompt cd into c:/vagrant-local and start the Vagrant environment with vagrant up.
This initial vagrant up may take quite a while to do its stuff as it has to download all the required files – subsequent runs should be considerably quicker. You may see a few warnings that the remote connection is not connected, this is normal. However, should the operation time-out after several warnings and Vagrant is not able to connect to the virtual machine then you may need to check that you have Intel virtualization enabled in your BIOS/UEFI settings. This was the case on both PCs that I installed this software on. You may also see a warning to this effect in the Preview windows of the VirtualBox Manager. Another reason being that these programs are running under different privilege levels – it may be that the ‘hosts’ file requires admin privileges. This can readily be checked by opening them as an ‘administrator’ if you trust the software.

image

You can check the status at any time with the command vagrant status or halt it with vagrant halt

Now open up a browser and navigate to vvv.dev* and if everything is working you will see all the default sites listed.

image

In subsequent posts I will be using Variable VVV to create and delete sites; adding a blueprint for creating new sites and exploring WP-CLI. Oh, and just to make life a little bit easier still we will add a VVV dashboard .

*It appears that Google have snaffled the .dev gTLD for themselves so this will change at some point. ‘Do no evil ‘– yea, right!

facebooktwittergoogle_plusredditpinterestlinkedinmail