VizW(h)iz 2: Boxes, Violins, and Histograms

R-Ladies Sydney

2018/12/20

When you want to capture the distribution of your data in a plot, without getting too far away from the raw data, box and whisker plots, violin plots, and histograms are likely to be useful. In this lesson, we’re tackling how to create these plots using various geom commands!

Lesson Outcomes

By the end of the lesson, you should:

  2.1 Be able to use geom_boxplot and geom_violin to plot the distribution of raw data
  2.2 Be able to use geom_histogram to eyeball whether your data is normally distributed
  2.3 Be able to layer more than one geom to gain extra insight about the distribution of your data

2.1 Boxes and violins

I don’t think I have used a box plot since primary school. In fact, I had to google what the lines on the box represent. Definitely check out the ggplot documentation here, and ignore me when I try to convince you in the video that the interquartile range represents 75% of the data; the box spans the 25th to the 75th percentile, so it’s definitely the middle 50%.
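If you want to convince yourself about that 50%, here is a quick sketch. The numbers are simulated, not the real beachbugs data, and the variable names are made up for illustration.

```r
# Simulated, hypothetical bacteria counts (NOT the real beachbugs data)
set.seed(42)
bugs <- rlnorm(1000, meanlog = 3, sdlog = 1)

# The box in a box plot runs from the 25th to the 75th percentile
box_edges <- quantile(bugs, probs = c(0.25, 0.75))

# Proportion of observations that fall inside the box
inside_box <- mean(bugs >= box_edges[1] & bugs <= box_edges[2])
inside_box
```

The proportion should land at, or very close to, 0.5, which is the interquartile range doing its job.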

Boxplots are so 1980 anyway; boxplots are out and violin plots are in.

Image credit: https://xkcd.com/1967/

In this screencast, we’ll review:

Here’s the plot for reference:

Watch the video and then carry out the following steps:

  1. Use geom_boxplot to plot the log-transformed buglevels by site
  2. Use geom_violin to plot log-transformed buglevels by year
  3. Use filter to only plot buggier than average days and add a facet_wrap to look at the violin plots for each site separately
  4. What happens when you filter for buggier_all? Does that change your plot?
  5. Play around with colour and fill aesthetics. Do they work on the geom_boxplot too?

Helpful hint: You can find ggplot documentation about violin plots here
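If you get stuck, the steps above might look something like the sketch below. It runs on a small simulated data frame so that it works on its own; the data frame name `plotbugs` and the columns `site`, `year`, `beachbugs`, `logbeachbugs` and `buggier_all` are assumptions, so swap in whatever your own cleaned data uses.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for the beachbugs data (assumed column names)
set.seed(1)
plotbugs <- data.frame(
  site = rep(c("Clovelly Beach", "Coogee Beach", "Bronte Beach"), each = 60),
  year = factor(rep(rep(2016:2018, each = 20), times = 3)),
  beachbugs = rlnorm(180, meanlog = 3, sdlog = 1)
) %>%
  mutate(
    logbeachbugs = log(beachbugs),
    # TRUE on days that are buggier than the overall average
    buggier_all = beachbugs > mean(beachbugs)
  )

# Step 1: box plot of log-transformed bug levels by site
box_plot <- ggplot(plotbugs, aes(x = site, y = logbeachbugs)) +
  geom_boxplot()

# Steps 2 to 4: violins by year, buggier-than-average days only,
# with one panel per site
violin_plot <- plotbugs %>%
  filter(buggier_all) %>%
  ggplot(aes(x = year, y = logbeachbugs, colour = site, fill = site)) +
  geom_violin() +
  facet_wrap(~ site)

box_plot
violin_plot
```

The `colour` aesthetic controls the outline and `fill` controls the inside of each violin. Try moving them into the `geom_boxplot` version to answer step 5 for yourself.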

2.2 Histograms

Often the quickest way to get an idea of whether your data is normally distributed is to plot a histogram. Let’s learn how to do that.

In this screencast, we’ll review:

Here’s the plot for reference:

Watch the video and then carry out the following steps:

  1. Use base graphics to plot the log-transformed beachbugs data in a histogram. Does that look better?
  2. Use geom_histogram to plot log-transformed buglevels for Clovelly in 2018
  3. Compare this plot to one that uses the raw rather than log-transformed data. What is the most appropriate binwidth for this raw data?
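Here is one way those three steps could look. Again, this is a sketch on simulated data: the data frame `clovelly_2018` and its column names are assumptions, and the bin widths are starting points to play with rather than the right answers. Note that the ggplot2 argument is spelled `binwidth`, with no underscore.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for Clovelly in 2018 (assumed column names)
set.seed(2)
clovelly_2018 <- data.frame(
  site = "Clovelly Beach",
  year = 2018,
  beachbugs = rlnorm(100, meanlog = 3, sdlog = 1)
) %>%
  mutate(logbeachbugs = log(beachbugs))

# Step 1: base graphics, one line and no extra packages
hist(clovelly_2018$logbeachbugs)

# Step 2: the ggplot2 version of the same histogram
log_hist <- ggplot(clovelly_2018, aes(x = logbeachbugs)) +
  geom_histogram(binwidth = 0.5)

# Step 3: raw values are on a much bigger scale, so they need wider bins
raw_hist <- ggplot(clovelly_2018, aes(x = beachbugs)) +
  geom_histogram(binwidth = 10)

log_hist
raw_hist
```

If you leave `binwidth` out, ggplot2 falls back to 30 bins and prints a message nudging you to pick something better.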

2.3 Combination plots

Each time you add a + to a ggplot, you are adding a layer, and there is no reason why those layers can’t be extra geoms!

In this screencast, we’ll review:

Here’s the plot for reference:

Watch the video and then carry out the following steps:

  1. Filter for days that are buggier than average and then plot the log-transformed beach bugs values for each site by combining geom_boxplot and geom_point
  2. Use geom_violin to plot the log-transformed beach bugs values and layer geom_point on top; this time try colouring by council
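As a rough guide, layering could look like the sketch below. The data is simulated and the names `plotbugs`, `council`, `site`, `beachbugs`, `logbeachbugs` and `buggier_all` are assumptions, so adjust them to match your own data frame.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for the beachbugs data (assumed column names)
set.seed(3)
plotbugs <- data.frame(
  council = rep(c("Randwick Council", "Randwick Council", "Waverley Council"),
                each = 60),
  site = rep(c("Clovelly Beach", "Coogee Beach", "Bronte Beach"), each = 60),
  beachbugs = rlnorm(180, meanlog = 3, sdlog = 1)
) %>%
  mutate(
    logbeachbugs = log(beachbugs),
    buggier_all = beachbugs > mean(beachbugs)
  )

# Step 1: box plot with the raw points layered on top,
# buggier-than-average days only
box_points <- plotbugs %>%
  filter(buggier_all) %>%
  ggplot(aes(x = site, y = logbeachbugs)) +
  geom_boxplot() +
  geom_point(aes(colour = site))

# Step 2: violins with points on top, coloured by council
violin_points <- ggplot(plotbugs, aes(x = site, y = logbeachbugs)) +
  geom_violin() +
  geom_point(aes(colour = council))

box_points
violin_points
```

Layers are drawn in the order you add them, so putting `geom_point` last keeps the points visible on top of the boxes and violins.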

ggplot Inspo

Check out the results of a Google image search for ‘ggplot violin’ here to get inspired!

Now, apply that inspiration to your own data! Don’t forget ggsave() from VizW(h)iz 1 so you can show others your fantastic outputs!

As per usual, Sydney-based R-Ladies are encouraged to share (and vent) at #ryouwithme_3_vizwhiz!

Now, we all know there are times when you need (read: are forced) to create boring bar or column plots! That’s what Lesson 3 is for! We also cover scatterplots, so all is not for naught! Head on to Lesson 3!