Clean It Up 3: Making New Variables

R-Ladies Sydney

2018/11/20

What if the data you are really interested in isn’t in your dataframe yet? Perhaps you want to break data contained in one variable across many variables, or to combine data from several columns into one. Maybe you want to transform your data or compute difference scores.

In this lesson, we will continue to explore the sydneybeaches data, learning how to make new variables using separate and unite from tidyr, and mutate and other functions from dplyr.

Lesson Outcomes

By the end of the lesson, you should:

    3.1 Know how to use separate and unite to create new variables in your data
    3.2 Know how to use mutate to compute new variables (numeric and logical)
    3.3 Know how to pipe filter, arrange, group_by, and mutate together to accomplish a lot with relatively few lines of code.

3.1 Use separate and unite to create new variables

We are going to cheat a little bit with the date column here. We will learn how to use the lubridate package eventually, but for now, we can capitalise on the fact that R thinks our date column contains characters to practice splitting a single variable into several variables using the separate function.
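Here is a minimal sketch of separate on a date column. It assumes the dates are stored as "dd/mm/yyyy" character strings; the tibble below is a hypothetical three-row stand-in for the beaches data with made-up readings, not the real sydneybeaches values.

```r
library(dplyr)
library(tidyr)

# Hypothetical mini version of the beaches data; dates are character strings
beaches <- tibble(
  site      = c("Bondi Beach", "Coogee Beach", "Bondi Beach"),
  council   = c("Waverley Council", "Randwick Council", "Waverley Council"),
  date      = c("02/01/2013", "06/01/2013", "12/01/2013"),
  beachbugs = c(19, 8, 46)
)

# Split each "dd/mm/yyyy" string at "/" into three new character columns;
# remove = FALSE keeps the original date column alongside them
beaches_split <- beaches %>%
  separate(date, into = c("day", "month", "year"), sep = "/", remove = FALSE)

beaches_split
```

The new columns stay as text ("02", "01", "2013"). Adding convert = TRUE to separate asks tidyr to convert them to numbers instead.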


Watch the video and then carry out the following steps:

  1. Split the date column into day, month, and year columns
  2. Combine the site and council columns into a single variable
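For step 2, a sketch using tidyr's unite. The two rows and the new column name council_site are illustrative assumptions; sep sets the character placed between the combined values.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows standing in for the beaches data
beaches <- tibble(
  site    = c("Bondi Beach", "Coogee Beach"),
  council = c("Waverley Council", "Randwick Council")
)

# Paste council and site into one new column, joined by "_";
# remove = FALSE keeps the original council and site columns too
beaches_united <- beaches %>%
  unite(council_site, council, site, sep = "_", remove = FALSE)

beaches_united
```

The order you list the columns in unite is the order their values appear in the combined string, so this gives "Waverley Council_Bondi Beach" rather than "Bondi Beach_Waverley Council".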

3.2 Use mutate to compute new variables

Sometimes the data you are most interested in are not in your dataframe yet; you need to compute them. The mutate function allows you to compute a new variable and add it to your dataframe.
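A minimal sketch of mutate, using a log transform as the new variable. The readings are made up, and the name logbeachbugs is just an illustrative choice.

```r
library(dplyr)

# Hypothetical bug readings; NA marks a missing reading
beaches <- tibble(
  site      = c("Bondi Beach", "Bondi Beach", "Coogee Beach"),
  beachbugs = c(19, 46, NA)
)

# mutate() keeps every existing row and column and appends the new one;
# log(NA) is simply NA, so missing readings stay missing
beaches_log <- beaches %>%
  mutate(logbeachbugs = log(beachbugs))

beaches_log
```

One thing to watch: log(0) is -Inf in R, so if your readings include zeros you may prefer log1p(beachbugs), which computes log(1 + beachbugs).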


Watch the video and then carry out the following steps:

  1. Compute a variable that log transforms the beachbugs data
  2. Compute a variable that contains beachbugs difference scores
  3. Compute a variable that contains TRUE/FALSE according to whether each reading is greater than the mean bug level
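For steps 2 and 3, a sketch that computes both in one mutate call. The lesson text doesn't define "difference score", so this assumes it means the change since the previous reading, using dplyr's lag(); the single-site data, already in date order, is hypothetical.

```r
library(dplyr)

# Hypothetical readings for one site, already in date order
beaches <- tibble(
  date      = c("02/01/2013", "06/01/2013", "12/01/2013", "18/01/2013"),
  beachbugs = c(19, 46, 8, NA)
)

beaches_new <- beaches %>%
  mutate(
    # Assumed definition: change since the previous row
    # (the first row has no previous value, so it gets NA)
    beachbugsdiff = beachbugs - lag(beachbugs),
    # TRUE when a reading is above the mean; na.rm = TRUE stops the
    # single NA from turning the whole mean into NA
    buggier = beachbugs > mean(beachbugs, na.rm = TRUE)
  )

beaches_new
```

Comparing a number to a mean gives a logical (TRUE/FALSE) column directly, so no if/else is needed. The missing reading still comes out as NA in buggier, because NA > 24.3 is NA.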

3.3 Pipe it all together

In Clean It Up Lesson 1 you learned about the pipe, %>%, which lets you string a whole series of wrangling functions together. To review: you take your data, apply a function, take that output, apply another function, and so on, until you have added a series of new variables, all in a single chunk of code.
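As a quick refresher, here is a small sketch showing that a piped chain and the equivalent nested calls give the same result. The data is a hypothetical stand-in for the beaches data.

```r
library(dplyr)

# Hypothetical readings, including one missing value
beaches <- tibble(
  site      = c("Bondi Beach", "Coogee Beach", "Bondi Beach", "Coogee Beach"),
  beachbugs = c(19, 8, NA, 120)
)

# Each %>% passes the result of the previous step in as the
# first argument of the next function
top_readings <- beaches %>%
  filter(!is.na(beachbugs)) %>%   # drop missing readings
  arrange(desc(beachbugs))        # largest readings first

# The same thing without the pipe, as nested calls read inside-out
top_readings_nested <- arrange(filter(beaches, !is.na(beachbugs)), desc(beachbugs))

top_readings
```

Both versions do the same work; the piped one reads top to bottom in the order the steps happen, which is why it scales better as you add more steps.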


Watch the video and then create a new dataframe called cleanbeaches_new by piping together the following steps…

  1. Separate the date column into day, month, year
  2. Create a new column that contains the log transformed beach bugs data
  3. Create a new column that contains the difference scores
  4. Create a new column that contains a logical vector indicating whether each beachbug reading is higher than average
  5. Group by site (group_by)
  6. Create a new column that contains a logical vector indicating whether each beachbug reading is higher than the average for its site
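One way the whole pipeline might look is sketched below. It is not the screencast's exact code. The four-row cleanbeaches tibble is hypothetical, dates are assumed to be "dd/mm/yyyy" strings, and "difference score" is assumed to mean the change since the previous row. The column names are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for cleanbeaches: two sites, character dates
cleanbeaches <- tibble(
  site      = c("Bondi Beach", "Bondi Beach", "Coogee Beach", "Coogee Beach"),
  date      = c("02/01/2013", "06/01/2013", "02/01/2013", "06/01/2013"),
  beachbugs = c(10, 30, 50, 70)
)

cleanbeaches_new <- cleanbeaches %>%
  # 1. split the date into day, month, year (keeping the original)
  separate(date, into = c("day", "month", "year"), sep = "/", remove = FALSE) %>%
  mutate(
    # 2. log-transformed readings
    logbeachbugs  = log(beachbugs),
    # 3. assumed difference score: change since the previous row
    beachbugsdiff = beachbugs - lag(beachbugs),
    # 4. above the overall mean (40 here)?
    buggier_all   = beachbugs > mean(beachbugs, na.rm = TRUE)
  ) %>%
  # 5. from here on, summaries are computed within each site
  group_by(site) %>%
  # 6. above that site's own mean (20 for Bondi, 60 for Coogee here)?
  mutate(buggier_site = beachbugs > mean(beachbugs, na.rm = TRUE)) %>%
  # drop the grouping so later steps work on the whole dataframe again
  ungroup()

cleanbeaches_new
```

Order matters here. Because group_by comes after step 3, lag() looks across site boundaries, so the first Coogee reading is compared with the last Bondi one. Moving group_by(site) earlier in the pipe would make the difference scores per-site as well.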

Now have a go with your own data!

Next up - Clean It Up Lesson 4: Wide to Long

Sydney-based R-Ladies - share your successes and any challenges you’ve faced in the #ryouwithme_2_cleaning Slack channel!