Box and Whisker Plot (2024)

Overview #

A box and whisker plot (also known as a boxplot) shows the distribution of a dataset using a small set of summary statistics.

The dividing lines along the “box” part of a box and whisker plot typically represent the median (the middle observation in a sorted dataset), the upper quartile (the middle observation of the upper half of the dataset), and the lower quartile (the middle observation of the lower half of the dataset).

The “box” part captures what is known as the interquartile range of the dataset.

The whiskers usually extend to the most extreme observations that lie within some multiple of the interquartile range from the edges of the box, commonly 1.5x the interquartile range.

Outliers beyond the extreme ends of the whiskers are typically represented as individual points.
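As a rough illustration of the 1.5x rule, the sketch below uses base R's boxplot.stats() on a made-up vector. The values, including the extreme 30, are assumptions chosen only for this example.

```r
# Hypothetical data: five ordinary values plus one extreme value
x <- c(1, 2, 5, 7, 9, 30)

# boxplot.stats() computes the numbers a boxplot draws;
# coef = 1.5 is the whisker multiplier (it is also the default)
s <- boxplot.stats(x, coef = 1.5)

s$stats # lower whisker end, lower hinge, median, upper hinge, upper whisker end
s$out   # observations beyond the whiskers, drawn as individual points
```

Here the box runs from 2 to 9, so the interquartile range is 7 and the upper limit is 9 + 1.5 x 7 = 19.5. The upper whisker therefore stops at 9, the largest value inside the limit, and 30 is reported as an outlier.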

Box and Whisker Plot (1)

For instance, in the sequence 1, 2, 5, 7, 9 (counting the median as part of both halves):

  • 5 is the median
  • 7 is the upper quartile
  • 2 is the lower quartile
  • 1 is the end of the lower whisker
  • 9 is the end of the upper whisker
  • The interquartile range is the difference between 7 (the upper quartile) and 2 (the lower quartile), or 5 (7-2).
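The figures in the list above can be checked in base R. This is a minimal sketch: fivenum() returns Tukey's five-number summary, whose hinges count the median as part of both halves when the number of observations is odd, and IQR() uses the default quantile method, which happens to agree for this sequence.

```r
x <- c(1, 2, 5, 7, 9)

fivenum(x) # minimum, lower hinge, median, upper hinge, maximum
IQR(x)     # interquartile range: upper quartile minus lower quartile
```

Other quartile conventions can give slightly different values for the same data, so the numbers shown by a given tool depend on the method it uses.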

Each group of data is shown within its own box and whisker block. On a single plot, there can be many groups of data shown.

Advantages #

A box and whisker plot is extremely simple compared with something like a histogram or a density plot.

In fact, the concept underlying a box and whisker plot lends itself well to simplification. Edward Tufte takes the simplification to an extreme by reducing the classic box and whisker plot further to a line-dot-line plot (The Visual Display of Quantitative Information, pp. 123-124), or what he refers to as a quartile plot.

Box and Whisker Plot (2)
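A line-dot-line plot of this kind can be approximated in ggplot2. The sketch below is not Tufte's own construction: it assumes the lines run from the minimum to the lower quartile and from the upper quartile to the maximum, with a dot at the median, and it uses R's built-in iris dataset purely as sample data. The summ data frame and its column names are made up for this example.

```r
library(ggplot2)

# Five-number summary of Sepal.Length for each species (Tukey's hinges)
summ <- do.call(rbind, lapply(split(iris$Sepal.Length, iris$Species), fivenum))
summ <- data.frame(Species = rownames(summ), summ)
names(summ) <- c("Species", "min", "q1", "med", "q3", "max")

quartile_plot <- ggplot(summ, aes(x = Species)) +
  geom_linerange(aes(ymin = min, ymax = q1)) + # lower line
  geom_linerange(aes(ymin = q3, ymax = max)) + # upper line
  geom_point(aes(y = med)) +                   # dot at the median
  labs(y = "Sepal Length") +
  theme_minimal()

quartile_plot
```

The gap between the two lines takes the place of the box, so the interquartile range is read from the white space around the dot.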

Disadvantages #

Because a box and whisker plot reduces each group to a handful of summary statistics, much of the underlying detail is lost. For example, the plot cannot show whether a distribution has one peak or several. Whether this matters depends on the context.
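To make the loss of detail concrete, here is a small sketch with two made-up samples (a and b are hypothetical vectors chosen for this example). Their shapes differ, yet a boxplot would draw them identically.

```r
# Two hypothetical samples with different shapes
a <- c(1, 2, 3, 5, 5, 5, 7, 8, 9) # values pile up around 5
b <- c(1, 3, 3, 3, 5, 7, 7, 7, 9) # values cluster at 3 and at 7

# The statistics that a boxplot would draw for each sample
boxplot.stats(a)$stats
boxplot.stats(b)$stats
identical(boxplot.stats(a)$stats, boxplot.stats(b)$stats)

# The raw counts show the difference that the summary hides
table(a)
table(b)
```

A histogram or a layer of individual points would reveal the two clusters in b, which is one reason to combine a boxplot with the raw data when the shape matters.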

Data #

At a minimum, a box and whisker plot requires one continuous numerical data field.

continuous
1
2
5
7
9

A discrete categorical variable can be added to enable the display of separate box and whisker plots for different groups of data.

continuous  group
1           A
2           A
5           A
7           A
9           A
2           B
4           B
5           B
8           B
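The table above can be entered in R as follows. The article later prints this data under the name example_dat as a tibble but does not show how it was built, so the construction here is an assumption. A plain data.frame() is used to avoid extra packages.

```r
# A plain data frame holding the values from the table above
example_dat <- data.frame(
  continuous = c(1, 2, 5, 7, 9, 2, 4, 5, 8),
  group      = c(rep("A", 5), rep("B", 4))
)

str(example_dat)
```

Passing the same two columns to tibble::tibble() instead should give a tibble like the one printed in the R section.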

R #

Box and whisker plots can be rendered in R using either base R or the ggplot2 package.

Base R #

In base R, a simple boxplot can be generated using the boxplot(formula, data) command, where formula specifies what goes into the boxplot and is of the form continuous ~ group, and data refers to the source data frame.

example_dat
## # A tibble: 9 × 2
##   continuous group
##        <dbl> <chr>
## 1          1 A
## 2          2 A
## 3          5 A
## 4          7 A
## 5          9 A
## 6          2 B
## 7          4 B
## 8          5 B
## 9          8 B
boxplot(continuous ~ group, data = example_dat)

Box and Whisker Plot (3)
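To see the numbers behind the drawing, boxplot() can be called with plot = FALSE, in which case it returns the computed statistics instead of rendering anything. The sketch below rebuilds the example data as a plain data frame (an assumption, since the article only shows the printed tibble) so that it runs on its own.

```r
# Same values as example_dat, rebuilt here so the block is self-contained
example_dat <- data.frame(
  continuous = c(1, 2, 5, 7, 9, 2, 4, 5, 8),
  group      = c(rep("A", 5), rep("B", 4))
)

# plot = FALSE returns the statistics without drawing the plot
b <- boxplot(continuous ~ group, data = example_dat, plot = FALSE)

b$names # group labels, one per box
b$stats # one column per group: whisker end, hinge, median, hinge, whisker end
```

For group B, which has an even number of observations, the median and the hinges fall between data points (4.5 for the median, 3 and 6.5 for the hinges).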

ggplot2 #

The ggplot2 package can also be used to generate more refined box and whisker plots.

library(ggplot2)

A basic box and whisker plot using the synthetic data from above:

ggplot(example_dat) + geom_boxplot( aes( x = group, y = continuous ) )

Box and Whisker Plot (4)

Great! We now have a box and whisker plot in ggplot2, but that’s not really stretching the potential of the ggplot2 package. Let’s challenge ourselves a bit.

Let’s try making another, more sophisticated plot using the built-in sample iris dataset.

library(knitr)    # provides kable()
library(magrittr) # provides the %>% pipe
# generate a preview of the iris dataset, limited to 10 records
head(iris, 10) %>% kable()

Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
5.1           3.5          1.4           0.2          setosa
4.9           3.0          1.4           0.2          setosa
4.7           3.2          1.3           0.2          setosa
4.6           3.1          1.5           0.2          setosa
5.0           3.6          1.4           0.2          setosa
5.4           3.9          1.7           0.4          setosa
4.6           3.4          1.4           0.3          setosa
5.0           3.4          1.5           0.2          setosa
4.4           2.9          1.4           0.2          setosa
4.9           3.1          1.5           0.1          setosa

There’s a Species categorical field and four continuous numerical fields. For simplicity, let’s pick one numerical field: Sepal.Length.

ggplot(data = iris) + geom_boxplot( aes( x = Species, y = Sepal.Length ) )

Box and Whisker Plot (5)

I think we can do better.

ggplot(data = iris) +
  geom_boxplot(
    aes(
      x = Species,
      y = Sepal.Length,
      fill = Species # color the boxes by species
    )
  ) +
  coord_flip() + # turn it sideways
  labs( # give the plot some labels
    title = "Box and whisker plot of Iris Species",
    x = "Species",
    y = "Sepal Length"
  ) +
  theme(
    legend.position = "none" # remove the legend since it doesn't really convey any real useful information
  )

Box and Whisker Plot (6)

Let’s enhance that even more by adding the individual data points. We’ll use the geom_jitter() function in ggplot2, which adds a small amount of random variation to the point positions so that overlapping points are easier to tell apart.

ggplot(
  data = iris,
  # note that the aes() aesthetic mappings were moved out from geom_boxplot() to ggplot().
  # They are now shared across layers, namely geom_boxplot() and geom_jitter()
  aes(
    x = Species,
    y = Sepal.Length
  )
) +
  geom_boxplot(
    aes(
      fill = Species # fill the boxes by species
    ),
    alpha = .5 # make the box and whisker plots semi-transparent
  ) +
  geom_jitter(
    aes(
      color = Species # color the points by species
    ),
    alpha = .9 # a fixed transparency belongs outside aes(), where it is set rather than mapped
  ) +
  coord_flip() + # turn it sideways
  labs( # give the plot some labels
    title = "Box and whisker plot of Iris Species",
    x = "Species",
    y = "Sepal Length"
  ) +
  theme(
    legend.position = "none" # remove the legend since it doesn't really convey any real useful information
  )

Box and Whisker Plot (7)

This is still a fairly basic plot, but it’s much richer in detail than what we started with.
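One possible refinement, offered as a sketch and not as part of the original walkthrough: when every observation is already drawn as a jittered point, the boxplot's own outlier points duplicate some of them. The variant below hides those with outlier.shape = NA and fixes the jitter with position_jitter() so the plot is reproducible. The width and seed values are arbitrary choices for this example.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(
    aes(fill = Species),
    alpha = .5,
    outlier.shape = NA # hide the boxplot's outlier points; the jitter layer already shows them
  ) +
  geom_jitter(
    aes(color = Species),
    alpha = .9,
    # jitter along the Species axis only, so Sepal.Length values stay exact;
    # the seed makes the random offsets repeatable
    position = position_jitter(width = 0.2, height = 0, seed = 1)
  ) +
  coord_flip() +
  theme(legend.position = "none")

p
```

Setting height = 0 matters here because any jitter along the measurement axis would move points away from their true Sepal.Length values.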

Article information

Author: Gov. Deandrea McKenzie
