Making New Variables
What if the data you are really interested in isn’t in your dataframe yet? Perhaps you want to break data contained in one variable across many variables, or to combine data from several columns into one. Maybe you want to transform your data or compute difference scores.
In this lesson, we will continue to explore the sydneybeaches data, learning how to compute new variables using separate and unite from tidyr, and mutate and other functions from dplyr.
Lesson Outcomes
By the end of the lesson, you should be able to:
- use separate and unite to create new variables in your data
- use mutate to compute new variables (numeric and logical)
- pipe filter, arrange, group_by, and mutate together to accomplish a lot, with relatively few lines of code.
Use separate and unite to create new variables
We are going to cheat a little bit with the date column here. We will learn how to use the lubridate package eventually, but for now, we can capitalise on the fact that R thinks our date column contains characters to practice splitting a single variable into several variables using the separate function.
I ran a “Date Night” event for RLadies Sydney all about the lubridate package. Here is a short video covering 3 super useful things that the lubridate package can do to make working with dates easier.
In this screencast, we’ll review:
- How to separate the date column into day, month, year
- How to unite data from the site and council columns to create a new variable called site_council
Your turn
Watch the video and then carry out the following steps:
- Split the date column into a day, month, and year column
- Combine the site and council columns into a single variable
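If you get stuck, here is a minimal sketch of what those two steps might look like. It uses a tiny made-up data frame as a stand-in for the beaches data (the values are invented for illustration), and it assumes the dates are stored as day/month/year text separated by "/". The object names beaches_sep and beaches_united are hypothetical.

```r
library(tidyr)

# Toy stand-in for the beaches data; these values are invented for illustration
cleanbeaches <- data.frame(
  council = c("Randwick Council", "Waverley Council"),
  site = c("Clovelly Beach", "Bondi Beach"),
  date = c("02/01/2013", "06/01/2013"),
  beachbugs = c(19, 3),
  stringsAsFactors = FALSE
)

# Split the character date column into three new columns at each "/"
# (assumes day/month/year order; by default the original date column is dropped)
beaches_sep <- separate(cleanbeaches, date,
                        into = c("day", "month", "year"), sep = "/")

# Paste site and council together into one new column called site_council;
# remove = FALSE keeps the original site and council columns as well
beaches_united <- unite(beaches_sep, site_council, site, council,
                        sep = "_", remove = FALSE)
```

Note that the new day, month, and year columns are still character vectors. If you want numbers, separate has a convert = TRUE argument that will try to convert them for you.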
Use mutate to compute new variables
Sometimes the data you are most interested in are not in your dataframe yet; you need to compute them. The mutate function allows you to compute a new variable and add it to your dataframe.
In this screencast, we’ll review:
- How to use the mutate function to
  - transform your data
  - compute numeric variables
  - compute logical variables
Your turn
Watch the video and then carry out the following steps:
- Compute a variable that log transforms the beachbugs data
- Compute a variable that contains beachbugs difference scores
- Compute a variable that contains TRUE/FALSE according to whether each reading is greater than the mean bug levels
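As a rough guide, the three computations could be sketched like this on a small made-up data frame. The new column names (logbeachbugs, beachbugsdiff, buggier) are hypothetical, and the sketch assumes that "difference score" means each reading minus the reading in the row before it, which may not be exactly how the video defines it.

```r
library(dplyr)

# Toy data; these readings are invented for illustration
beaches <- data.frame(
  site = c("Clovelly Beach", "Clovelly Beach", "Bondi Beach", "Bondi Beach"),
  beachbugs = c(19, 3, 40, 8)
)

beaches_new <- mutate(
  beaches,
  # Transform: natural log of each reading (a reading of 0 would give -Inf)
  logbeachbugs = log(beachbugs),
  # Numeric: this reading minus the previous row's reading (first row is NA)
  beachbugsdiff = beachbugs - lag(beachbugs),
  # Logical: TRUE when a reading is above the overall mean
  buggier = beachbugs > mean(beachbugs, na.rm = TRUE)
)
```

The na.rm = TRUE argument matters with real data: if any reading is missing, mean would otherwise return NA and every value in the logical column would be NA too.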
Pipe it all together
In Clean It Up Lesson 1 you learned about the pipe, %>%, which can help you to string a whole series of wrangling functions together. To review, the pipe allows you to take your data, apply a function, take that output, apply another function, and so on, until you have added a series of new variables, all in a single chunk of code.
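To make the idea concrete, here is a small sketch comparing the same two steps written with and without the pipe. The readings data frame and the cut-off of 5 are invented for illustration.

```r
library(dplyr)

# Toy data; these values are invented for illustration
readings <- data.frame(
  site = c("A", "B", "C"),
  beachbugs = c(19, 3, 40)
)

# Without the pipe, the calls are nested and you read from the inside out
nested <- arrange(filter(readings, beachbugs > 5), desc(beachbugs))

# With the pipe, each step takes the output of the step before it
piped <- readings %>%
  filter(beachbugs > 5) %>%
  arrange(desc(beachbugs))
```

Both versions keep the readings above 5 and sort them from highest to lowest; the piped version simply lists the steps in the order they happen.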
In this screencast, we’ll review:
- How to pipe together a sequence of dplyr functions and assign the output to a new object in your environment
Your turn
Watch the video and then create a new dataframe called cleanbeaches_new by piping together the following steps…
- Separate the date column into day, month, year
- Create a new column that contains the log transformed beach bugs data
- Create a new column that contains the difference scores
- Create a new column that contains a logical vector indicating whether each beachbug reading is higher than average
- Group by site (using group_by)
- Create a new column that contains a logical vector indicating whether each beachbug reading is higher than average for its site
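One possible shape for this pipeline is sketched below on a toy version of cleanbeaches (the values are invented for illustration). The new column names are hypothetical, the difference score is assumed to be each reading minus the previous row's reading, and the final ungroup step is an optional extra that is not part of the exercise.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for cleanbeaches; these values are invented for illustration
cleanbeaches <- data.frame(
  site = c("Clovelly Beach", "Clovelly Beach", "Bondi Beach", "Bondi Beach"),
  date = c("02/01/2013", "08/01/2013", "02/01/2013", "08/01/2013"),
  beachbugs = c(10, 30, 2, 6),
  stringsAsFactors = FALSE
)

cleanbeaches_new <- cleanbeaches %>%
  # Split date into day, month, year, keeping the original date column
  separate(date, into = c("day", "month", "year"), sep = "/", remove = FALSE) %>%
  # Log transform the readings
  mutate(logbeachbugs = log(beachbugs)) %>%
  # Difference score: this reading minus the previous row's reading
  mutate(beachbugsdiff = beachbugs - lag(beachbugs)) %>%
  # Higher than the average across all sites?
  mutate(buggier_all = beachbugs > mean(beachbugs, na.rm = TRUE)) %>%
  # From here on, mean() is computed separately within each site
  group_by(site) %>%
  # Higher than the average for that reading's own site?
  mutate(buggier_site = beachbugs > mean(beachbugs, na.rm = TRUE)) %>%
  # Optional: drop the grouping so later steps work on the whole data frame
  ungroup()
```

In this toy data the last Bondi Beach reading (6) is below the overall average of 12 but above the Bondi Beach average of 4, so it is FALSE in buggier_all and TRUE in buggier_site. That is the difference group_by makes: the same mutate code gives a different answer once the data are grouped.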
Now have a go with your own data!
- Choose a variable in character format and separate it into several columns
- Pick two character vectors, and combine them using the unite function
- Use mutate to transform your data, compute a new numeric variable, and compute a new logical variable.
Next up - Clean It Up Lesson 4: Wide to Long