# VizWhiz 3

The `ggplot` fanciness we covered in Lesson 1 and Lesson 2 is great, but sometimes your boss/grant agency/publication outlet insists on a bar or column graph. Or perhaps you simply need to visualise two continuous variables. In this lesson, we will show you how to plot bar and column graphs with error bars and how to plot a scatter plot.

## Lesson Outcomes

By the end of the lesson, you should:

• 3.1 Understand the difference between `geom_bar` and `geom_col` and use them to plot frequency vs. summary data
• 3.2 Be able to use `summarise` to calculate standard error and `geom_errorbar` to add error bars to a plot
• 3.3 Be able to use `geom_point` to create a scatter plot of two continuous variables

# 3.1 Bars vs. columns

There are two geoms in ggplot that draw bar plots, `geom_bar`and `geom_col`.

When you want to plot frequency/count data and are happy to let R to do the counting autoamtically, use `geom_bar`. It only requires that you tell it what you want on the x axis, and it will put frequency on the y axis.

If you want the height of the bar to represent a value you have calculated, then use `geom_col`. For this geom, you need to tell it what you want for both the x axis and the y axis.

In this screencast, we’ll review:

• How to use `geom_bar` to plot count/frequency data
• How to combine `summarise` and `geom_col` to plot mean bug levels by year
• How to `group_by` more than one variable and use `facet_wrap` to plot mean bug levels by year, separately for each site

Here’s the plot for reference: Watch the video and then carry out the following steps:

1. Use `geom_bar` and `facet_wrap` to plot the number of readings that were taken each year, separately for each site
2. Use `group_by` and `summarise` with `geom_col` to plot the mean beach bug levels averaged across all the sites each year
3. Use `group_by` and `summarise` with `geom_col` to plot the mean beach bug levels each year, using `facet_wrap` to plot each site separately

# 3.2 Error bars

Of course, good practice dictats that you need error bars on those columns. Never fear! Using `summarise`, it is easy to calculate standard error.

In this screencast, we’ll review:

• How to use `summarise` to calculate the mean, standard deviation and standard error
• How to add a `geom_errorbar` layer to your plot to display the mean beach bugs data in a column graph with error bars

Here’s the plot for reference: Watch the video and then carry out the following steps:

1. Use `group_by` and `summarise` to calculate the mean, standard deviation, N, and standard error for each site
2. Pipe that `summarise` into a `geom_col` adding a `coorid_flip` and `geom_errorbar` layer

# 3.3 Scatter plots

Sometimes you want to visualise the relationship between two continuous variables using a scatterplot. Our original beachbugs dataset doesn’t include any interesting variables that might be correlated with the bacteria levels, so we have pulled in some weather data to see whether bacteria levels might be related to rainfall or temperature, or some combination of the two.

In this screencast, we’ll review:

• How to import the rain_temp_beachbugs.csv dataset into R
• How to use `geom_point` and `geom_smooth` to plot a scatter plot and best fit line
• How to use point color to illustrate potential interactions in your data

Here’s the plot for reference: Watch the video and then carry out the following steps:

1. use `read_csv` and `here` to read the rain_temp data into RStudio (need help? revisit Unit 1 Basic Basics Lesson 3: Loading Data
2. use `geom_point` to plot the relation between rainfall and beach bugs
3. `filter` the data for values more than 500 and add a `geom_smooth` layer to see a regression line
4. colour the points by the temperature variable