Before you begin wrangling data, you should be able to:
@@ -127,6 +127,19 @@ This follows the *tidy data* style, an approach to handling data in R that aims
 The bundle of tidy-associated packages is called the `tidyverse`, and it's a 🔥 hot topic 🔥 in the R world. In fact, `ggplot2`, a package you have already used, is part of the `tidyverse`! Most data wrangling problems can be solved with either `tidyverse` or base (default) R functions. This can lead to some headaches for beginners, as there are multiple ways to accomplish the same thing!
 :::
 
+Review the datasets below. Given the above criteria, are they tidy? If not, write out in words what you would need to do. The first one is done as an example.
+
+```{r}
+library(tidyverse)
+head(relig_income)
+```
+
+This data is not tidy because the column names are values of a variable (income bracket) rather than variable names. A tidy dataset would have three columns: religion, income, and number of respondents (n). We would need to pivot the data longer so that the income brackets move into a new `income` column and the counts into a new `n` column.
+
+```{r}
+head(billboard)
+```
+
 ### `dplyr` verbs
 
 One of the most popular `tidyverse` packages, `dplyr`, offers a suite of helpful and readable functions for data manipulation. Let's get started with how it can help you see your data:
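
Reviewer note on the `relig_income` example added in the hunk above: the pivot it describes in words could look roughly like the sketch below. This is not part of the PR; the object name `relig_tidy` is made up for illustration, and the column names `income` and `n` are taken from the added paragraph.

```r
library(tidyr)  # provides pivot_longer() and the relig_income dataset

# Move every column except `religion` into name/value pairs:
# the old column headers (income brackets) go into `income`,
# and the cell counts go into `n`.
relig_tidy <- relig_income |>
  pivot_longer(
    cols = !religion,
    names_to = "income",
    values_to = "n"
  )

head(relig_tidy)
```

Each religion now appears once per income bracket, so the result has one row per (religion, income) pair.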
@@ -190,7 +203,22 @@ More information about functions like this can be found [here](https://r4ds.hadl
 `dplyr` verbs work great as a team!
 :::
 
-Although these were basic examples, hopefully you feel a little more confident about working with vectors, and data frames using `dplyr` verbs to clean and manipulate data. Happy Wrangling!
+Although these were basic examples, hopefully you feel a little more confident about working with vectors and data frames, and about using `dplyr` verbs to clean and manipulate data. Give some of them a try with the `billboard` dataset below. Happy Wrangling!
+
+```{r}
+# First, let's make this data set tidy :)
+billboard2 <- billboard |>
+  pivot_longer(
+    wk1:wk76,
+    names_to = "week",
+    values_to = "rank",
+    values_drop_na = TRUE
+  )
+```
+
+1. Use `mutate()` to add a new column called `week_number` that holds the week as an integer (e.g., `wk1` becomes 1).
+2. Use `filter()` to get all the songs by Eve.
+3. Use `mutate()` to add a new column called `year` with the year derived from the date in the column `date.entered`.
 
 ### Functions on functions
 
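
Reviewer note on exercises 1 and 3 in the hunk above: a hint on toy values might help learners without giving away the `billboard2` answers. The sketch below is not part of the PR. It assumes `readr::parse_number()` and `lubridate::year()` are acceptable tools here (the lesson may intend different ones), and the vectors `weeks` and `entered` are invented for illustration.

```r
library(readr)      # parse_number()
library(lubridate)  # year()

# Toy week labels shaped like the values in the `week` column.
weeks <- c("wk1", "wk10", "wk76")

# parse_number() drops the non-numeric prefix; as.integer() makes it an integer.
week_number <- as.integer(parse_number(weeks))

# Toy dates standing in for `date.entered`.
entered <- as.Date(c("2000-02-26", "1999-11-06"))

# year() extracts the calendar year from a Date.
entry_year <- year(entered)
```

Inside `mutate()`, the same calls would be applied to the column names instead of the toy vectors.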
@@ -334,3 +362,13 @@ x <- 10
 
 To summarize, `%>%` is slightly more lenient than `|>` when it comes to the placeholder operator, the right-hand side (RHS), and anonymous functions.
 :::::::::
+
+Using the same `billboard2` dataset from above, practice using pipes on the following exercises:
+
+1. Use `filter()`, `group_by()`, and a `slice` function (read the documentation linked above to determine which one!) to create a new data frame called `number_one_hits_2000` that has the top-ranked song for each week of the year 2000.
+
+<!---->
+
+2. Use some of the same functions to create a new data frame called `number_one_hits` that has the top-ranked song for each week from *each year.*
+3. What was the highest ranking Creed's "Higher" achieved?
+4. Using `group_by()` and `summarize()`, how many unique songs did Whitney Houston have on the charts?
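
Reviewer note on exercise 4 in the hunk above: the `group_by()` plus `summarize()` pattern it names can be shown on a toy table, so the real answer stays an exercise. The sketch below is not part of the PR. The data frame `plays`, its columns, and the use of `n_distinct()` are assumptions for illustration only.

```r
library(dplyr)

# Toy chart entries: artist "A" has two distinct tracks, one of them listed twice.
plays <- data.frame(
  artist = c("A", "A", "A", "B"),
  track  = c("x", "x", "y", "z")
)

# Count distinct tracks per artist using the native pipe.
tracks_per_artist <- plays |>
  group_by(artist) |>
  summarize(n_tracks = n_distinct(track))

tracks_per_artist
```

Repeated rows for the same track collapse into a single count, which is why `n_distinct()` is used here rather than `n()`.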
index.qmd: 2 additions & 4 deletions
@@ -7,15 +7,13 @@ We are an organization that hopes to make learning to program approachable, acce
 This is our curriculum for learning R programming in the context of data analysis. Our curriculum development team has worked tirelessly to develop this new curriculum. We are constantly improving and updating our curricula, so if you're interested in contributing or have suggestions, please visit <https://howtolearntocode.web.unc.edu/> for our most up-to-date contact information. Feel free to submit an issue or pull request at <https://github.com/How-to-Learn-to-Code/Rclass-DataScience>.