Making New Variables

What if the data you are really interested in isn’t in your dataframe yet? Perhaps you want to break data contained in one variable across many variables, or to combine data from several columns into one. Maybe you want to transform your data or compute difference scores.

In this lesson, we will continue to explore the sydneybeaches data, learning how to compute new variables using separate and unite from tidyr, and mutate and other functions from dplyr.

Lesson Outcomes

By the end of the lesson, you should be able to:

use separate and unite to create new variables in your data
use mutate to compute new variables (numeric and logical)
pipe filter, arrange, group_by, and mutate together to accomplish a lot, with relatively few lines of code.

Use separate and unite to create new variables

We are going to cheat a little bit with the date column here. We will learn how to use the lubridate package eventually, but for now, we can capitalise on the fact that R thinks our date column contains characters to practice splitting a single variable into several variables using the separate function.

Tip

I ran a “Date Night” event for RLadies Sydney all about the lubridate package. Here is a short video covering 3 super useful things that the lubridate package can do to make working with dates easier.

In this screencast, we’ll review:

How to separate the date column into day, month, year
How to unite data from the site and council columns to create a new variable called site_council
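If you'd like to see the shape of the code before watching, here is a minimal sketch on a made-up dataframe. The name toy_beaches, its values, and the dd/mm/yyyy date format are all assumptions for illustration; check how dates are written in your own data before choosing a separator.

```r
library(tidyr)
library(dplyr)

# Toy data standing in for the beaches dataframe (values are invented;
# the "/" separated date format is an assumption).
toy_beaches <- data.frame(
  date = c("02/01/2013", "06/01/2013"),
  site = c("Clovelly Beach", "Bondi Beach"),
  council = c("Randwick Council", "Waverley Council"),
  stringsAsFactors = FALSE
)

# Split the character date column into three new columns at each "/".
# remove = FALSE keeps the original date column alongside the new ones.
toy_separated <- toy_beaches %>%
  separate(date, into = c("day", "month", "year"), sep = "/", remove = FALSE)

# Paste site and council together into one new column, joined by "_".
toy_united <- toy_separated %>%
  unite(site_council, site, council, sep = "_", remove = FALSE)
```

Note that the new day, month, and year columns are still characters, not numbers. Setting remove = FALSE is a choice made here so you can compare the new columns against the originals; leave it out if you would rather drop the source columns.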

Your turn

Watch the video and then carry out the following steps:

Split the date column into a day, month, and year column
Combine the site and council columns into a single variable

Use `mutate` to compute new variables

Sometimes the data you are most interested in are not in your dataframe yet; you need to compute them. The mutate function allows you to compute a new variable and add it to your dataframe.

In this screencast, we’ll review:

How to use the mutate function to
- transform your data
- compute numeric variables
- compute logical variables
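The three uses above might look something like this on a small made-up dataframe. The name toy_bugs and the new column names are hypothetical, and defining a difference score as "this reading minus the previous reading" is an assumption; your own difference score may compare something else.

```r
library(dplyr)

# Toy readings (invented values) standing in for the beachbugs column.
toy_bugs <- data.frame(
  site = c("A", "A", "B", "B"),
  beachbugs = c(10, 40, 5, 25),
  stringsAsFactors = FALSE
)

toy_computed <- toy_bugs %>%
  mutate(
    # Transform: natural log of each reading
    logbugs = log(beachbugs),
    # Numeric: each reading minus the reading in the row before it
    # (the first row has no previous row, so it is NA)
    bugsdiff = beachbugs - lag(beachbugs),
    # Logical: TRUE when a reading is above the overall mean
    bugs_above_mean = beachbugs > mean(beachbugs, na.rm = TRUE)
  )
```

Two things to watch for with real data: log() of a zero reading gives -Inf, and mean() returns NA if there are missing values unless you include na.rm = TRUE.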

Your turn

Watch the video and then carry out the following steps:

Compute a variable that log transforms the beachbugs data
Compute a variable that contains beachbugs difference scores
Compute a variable that contains TRUE/FALSE according to whether each reading is greater than the mean bug levels

Pipe it all together

In Clean It Up Lesson 1 you learned about the pipe, %>%, which helps you string a whole series of wrangling functions together. To review, the pipe allows you to take your data, apply a function, take that output, apply another function, and so on until you have added a series of new variables, all in a single chunk of code.

In this screencast, we’ll review:

How to pipe together a sequence of dplyr functions and assign the output to a new object in your environment
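As a rough sketch of what a piped sequence can look like, here is a short chain on made-up data that is assigned to a new object. The names toy_bugs, toy_piped, and above_site_mean are hypothetical, and the particular steps are just one possible combination.

```r
library(dplyr)

# Toy readings (invented values), including one missing reading.
toy_bugs <- data.frame(
  site = c("A", "B", "A", "B", "A"),
  beachbugs = c(10, 5, 40, 25, NA),
  stringsAsFactors = FALSE
)

toy_piped <- toy_bugs %>%
  filter(!is.na(beachbugs)) %>%   # drop rows with missing readings
  arrange(site) %>%               # sort rows by site
  group_by(site) %>%              # later steps now work within each site
  mutate(above_site_mean = beachbugs > mean(beachbugs)) %>%
  ungroup()                       # remove the grouping when finished
```

Because mutate comes after group_by, mean(beachbugs) is calculated separately for each site rather than across the whole dataframe. Ending with ungroup() is a habit worth considering so that the grouping doesn't silently affect whatever you do with the object next.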

Your turn

Watch the video and then create a new dataframe called cleanbeaches_new by piping together the following steps…

Separate the date column into day, month, year
Create a new column that contains the log-transformed beachbugs data
Create a new column that contains the difference scores
Create a new column that contains a logical vector indicating whether each beachbugs reading is higher than average
Group by site
Create a new column that contains a logical vector indicating whether each beachbugs reading is higher than average, for each site

Now have a go with your own data!

Choose a variable in character format and separate it into several columns
Pick two character vectors, and combine them using the unite function
Use mutate to transform your data, compute a new numeric variable, and compute a new logical variable.

Next up - Clean It Up Lesson 4: Wide to Long
