# CleanItUp 3

# Making New Variables

What if the data you are really interested in isn’t in your dataframe yet? Perhaps you want to break data contained in one variable across many variables, or to combine data from several columns into one. Maybe you want to transform your data or compute difference scores.

In this lesson, we will continue to explore the sydneybeaches data, learning how to make new variables using `separate`, `unite`, `mutate`, and other functions from `tidyr` and `dplyr`.

## Lesson Outcomes

By the end of the lesson, you should:

- 3.1 Know how to use `separate` and `unite` to create new variables in your data
- 3.2 Know how to use `mutate` to compute new variables (numeric and logical)
- 3.3 Know how to pipe `filter`, `arrange`, `group_by`, and `mutate` together to accomplish a lot, with relatively few lines of code.

## 3.1 Use separate and unite to create new variables

We are going to cheat a little bit with the date column here. We will learn how to use the `lubridate` package eventually, but for now, we can capitalise on the fact that R thinks our date column contains characters to practice splitting a single variable into several variables using the `separate` function.

In this screencast, we’ll review:

- How to separate the date column into day, month, year
- How to unite data from the site and council columns to create a new variable called site_council

Watch the video and then carry out the following steps:

- Split the date column into a day, month, and year column
- Combine the site and council columns into a single variable
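
The two steps above can be sketched as follows. This is a minimal example on a toy data frame, not the screencast's exact code: the object name `toy_beaches`, its values, and the assumption that dates are stored as `day/month/year` text are all made up for illustration.

```r
library(tidyr)

# Toy stand-in for the beaches data (names and values are assumptions)
toy_beaches <- data.frame(
  council = c("Randwick Council", "Waverley Council"),
  site = c("Clovelly Beach", "Bondi Beach"),
  date = c("02/01/2013", "15/03/2014"),
  beachbugs = c(19, 3),
  stringsAsFactors = FALSE
)

# Split the character date column into three new columns at each "/".
# remove = FALSE keeps the original date column alongside the new ones.
separated <- separate(toy_beaches, date,
                      into = c("day", "month", "year"),
                      sep = "/", remove = FALSE)

# Paste site and council into one new column, joined by an underscore.
united <- unite(toy_beaches, site_council, site, council,
                sep = "_", remove = FALSE)
```

Note that the new day, month, and year columns are still character vectors; `separate` splits the text but does not convert it to numbers unless you ask it to.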

## 3.2 Use `mutate` to compute new variables

Sometimes the data you are most interested in are not in your dataframe yet; you need to compute them. The `mutate` function allows you to compute a new variable and add it to your dataframe.

In this screencast, we’ll review:

- How to use the `mutate` function to:
  - transform your data
  - compute numeric variables
  - compute logical variables

Watch the video and then carry out the following steps:

- Compute a variable that log transforms the beachbugs data
- Compute a variable that contains beachbugs difference scores
- Compute a variable that contains TRUE/FALSE according to whether each reading is greater than the mean bug levels
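
Here is one way those three computations might look, again on a toy data frame. The new column names (`logbeachbugs`, `beachbugsdiff`, `buggier`) are hypothetical, and "difference score" is assumed here to mean the change from the previous row's reading; your own data may call for a different definition.

```r
library(dplyr)

# Toy readings (values are made up for illustration)
toy_bugs <- data.frame(
  site = c("A", "A", "B", "B"),
  beachbugs = c(10, 100, 1000, 50),
  stringsAsFactors = FALSE
)

computed <- toy_bugs %>%
  mutate(
    # Transform: natural log of each reading
    logbeachbugs = log(beachbugs),
    # Numeric: difference from the previous row (NA for the first row)
    beachbugsdiff = beachbugs - lag(beachbugs),
    # Logical: TRUE when a reading is above the overall mean
    buggier = beachbugs > mean(beachbugs, na.rm = TRUE)
  )
```

The `na.rm = TRUE` argument matters with real data: if any reading is missing, `mean` returns `NA` without it, and every comparison would be `NA` too.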

## 3.3 Pipe it all together

In Clean It Up Lesson 1 you learned about the pipe, `%>%`, which can help you to string a whole series of wrangling functions together. To review: you take your data, apply a function, take that output, apply another function, and so on, until you have added a series of new variables, all in a single chunk of code.
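
As a quick refresher, the sketch below writes the same two-step operation twice, once nested and once piped. The data frame `toy_bugs` and the cut-off of 10 are made up for illustration.

```r
library(dplyr)

# Toy readings (values are made up for illustration)
toy_bugs <- data.frame(
  site = c("A", "B", "C"),
  beachbugs = c(30, 5, 12),
  stringsAsFactors = FALSE
)

# Nested version: has to be read from the inside out
nested <- arrange(filter(toy_bugs, beachbugs > 10), desc(beachbugs))

# Piped version: reads top to bottom, one step per line
piped <- toy_bugs %>%
  filter(beachbugs > 10) %>%
  arrange(desc(beachbugs))
```

Both versions give the same result; the piped one is simply easier to read and to extend with further steps.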

In this screencast, we’ll review:

- How to pipe together a sequence of dplyr functions and assign the output to a new object in your environment

Watch the video and then create a new dataframe called cleanbeaches_new by piping together the following steps…

- Separate the date column into day, month, year
- Create a new column that contains the log transformed beach bugs data
- Create a new column that contains the difference scores
- Create a new column that contains a logical vector indicating whether each beachbug reading is higher than average
- Group by site
- Create a new column that contains a logical vector indicating whether each beachbug reading is higher than average, for each site
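
The whole sequence could be sketched like this. It runs on a toy data frame rather than the real beaches data, so `toy_beaches`, its values, and the new column names are assumptions. The final `ungroup()` is an optional extra step, not part of the list above.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the beaches data (names and values are assumptions)
toy_beaches <- data.frame(
  site = c("Bondi Beach", "Bondi Beach", "Clovelly Beach", "Clovelly Beach"),
  date = c("02/01/2013", "08/01/2013", "02/01/2013", "08/01/2013"),
  beachbugs = c(10, 100, 1000, 50),
  stringsAsFactors = FALSE
)

cleanbeaches_new <- toy_beaches %>%
  # Split date into day, month, year, keeping the original column
  separate(date, into = c("day", "month", "year"), sep = "/", remove = FALSE) %>%
  # Log transform
  mutate(logbeachbugs = log(beachbugs)) %>%
  # Difference from the previous row (assumed meaning of "difference score")
  mutate(beachbugsdiff = beachbugs - lag(beachbugs)) %>%
  # Compared against the mean of ALL readings
  mutate(buggier_all = beachbugs > mean(beachbugs, na.rm = TRUE)) %>%
  # From here on, summaries such as mean() are computed within each site
  group_by(site) %>%
  mutate(buggier_site = beachbugs > mean(beachbugs, na.rm = TRUE)) %>%
  # Optional: drop the grouping so later steps are not grouped by accident
  ungroup()
```

The two logical columns use identical code; the only difference is that the second `mutate` comes after `group_by(site)`, so its mean is calculated separately for each site.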

## Now have a go with **your own data!**

- Choose a variable in character format and separate it into several columns
- Pick two character vectors, and combine them using the unite function
- Use mutate to transform your data, compute a numeric variable, and compute a new logical variable.

Next up - Clean It Up Lesson 4: Wide to Long

**Sydney-based R-Ladies** - share your successes and any challenges you’ve faced in the #ryouwithme_2_cleaning Slack channel!