Commit 763016c

updating class 5
1 parent e79ad74 commit 763016c

2 files changed: +42 -6 lines changed

class5.qmd

Lines changed: 40 additions & 2 deletions
@@ -19,7 +19,7 @@ format:
 
 - Understand what tidy data is and what it looks like
 
-- Understand piping basics
+- Understand piping basics: `mutate()`, `filter()`, `group_by()`, and `summarize()`
 
 ::: {.callout-note title="Measure twice, cut once"}
 Before you begin wrangling data, you should be able to:
@@ -127,6 +127,19 @@ This follows the *tidy data* style, an approach to handling data in R that aims
 The bundle of tidy-associated packages is called the `tidyverse`, and it's a 🔥 hot-topic 🔥 in the R world. In fact, `ggplot` is a package that you have already used that is part of the `tidyverse`! Most data wrangling problems can be solved with `tidy` or base (default) R functions. This can lead to some headaches for beginners, as there are multiple ways to accomplish the same thing!
 :::
 
+Review the below datasets. Given the above criteria, are they tidy? If not, write out in words what you would need to do. The first one is done as an example.
+
+```{r}
+library(tidyverse)
+head(relig_income)
+```
+
+This data is not tidy because there are variables (income) in the columns. A tidy dataset would have three columns: religion, income, and number of respondents (n). We would need to pivot the data to create new columns called income and n.
+
+```{r}
+head(billboard)
+```
+
 ### `dplyr` verbs
 
 One of the most popular `tidyverse` packages, `dplyr`, offers a suite of helpful and readable functions for data manipulation. Let's get started with how it can help you see your data:
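
The `relig_income` note added in this hunk describes the fix only in words ("pivot the data to create new columns called income and n"). A minimal sketch of that pivot is shown here; it is not part of the commit, and the object name `relig_tidy` is a hypothetical chosen for illustration. It assumes the `tidyr` package, which ships the `relig_income` dataset, and R 4.1 or later for the native pipe.

```r
library(tidyr)  # provides relig_income and pivot_longer()

# Gather every column except `religion` into two columns:
# `income` holds the former column names, `n` holds the counts.
relig_tidy <- relig_income |>
  pivot_longer(
    cols = -religion,
    names_to = "income",
    values_to = "n"
  )

head(relig_tidy)
```

Each row of `relig_tidy` is then one religion and income bracket pair, which matches the three-column layout the note describes.
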
@@ -190,7 +203,22 @@ More information about functions like this can be found [here](https://r4ds.hadl
 `dplyr` verbs work great as a team!
 :::
 
-Although these were basic examples, hopefully you feel a little more confident about working with vectors, and data frames using `dplyr` verbs to clean and manipulate data. Happy Wrangling!
+Although these were basic examples, hopefully you feel a little more confident about working with vectors, and data frames using `dplyr` verbs to clean and manipulate data. Give some of them a try with the `billboard` dataset below. Happy Wrangling!
+
+```{r}
+# First, let's make this data set tidy :)
+billboard2 <- billboard |>
+  pivot_longer(
+    wk1:wk76,
+    names_to = "week",
+    values_to = "rank",
+    values_drop_na = TRUE
+  )
+```
+
+1. Use `mutate()` to add a new column called `week_number` that is the week as integer (i.e. wk1 is 1)
+2. Use `filter()` to get all the songs by Eve.
+3. Use `mutate()` to add a new column called `year` with the year derived from the date in the column `date.entered`
 
 ### Functions on functions
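
The three `billboard2` exercises added in this hunk have no worked answers in the commit. The sketch below shows one possible approach, not the authors' solution. The name `eve_songs` is a hypothetical introduced here, and the choices of `stringr::str_remove()` and `lubridate::year()` are assumptions; other functions would work equally well.

```r
library(tidyverse)

# Rebuild billboard2 as in the exercise setup.
billboard2 <- billboard |>
  pivot_longer(
    wk1:wk76,
    names_to = "week",
    values_to = "rank",
    values_drop_na = TRUE
  )

# Exercises 1 and 3: strip the "wk" prefix to get an integer week number,
# and extract the year from the date.entered column.
billboard2 <- billboard2 |>
  mutate(
    week_number = as.integer(str_remove(week, "^wk")),
    year = lubridate::year(date.entered)
  )

# Exercise 2: keep only the rows where the artist is Eve.
eve_songs <- billboard2 |>
  filter(artist == "Eve")
```

Calling `lubridate::year()` with its namespace avoids depending on whether the installed `tidyverse` version attaches `lubridate` automatically.
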

@@ -334,3 +362,13 @@ x <- 10
 
 To summarize, `%>%` is slightly more lenient than `|>` when it comes to the Placeholder operator, the Right Hand Side (RHS) and Anonymous functions.
 :::::::::
+
+Using the same `billboard2` dataset from above, try out using pipes on the following:
+
+1. Use `filter()`, `group_by(),` and a `slice` function (read the documentation linked above to determine which one!) to create a new dataframe called `number_one_hits_2000` that has the top ranked song for each week from the year 2000.
+
+<!-- -->
+
+2. Use some of the same functions to create a new dataframe called `number_one_hits` that has the top ranked song for each week from *each year.*
+3. What was the highest ranking Creed's "Higher" achieved?
+4. Using `group_by()` and `summarize()` how many unique songs did Whitney Houston have on the charts?
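
The piping exercises added in this hunk can be approached in several ways; the sketch below is one of them and is not taken from the commit. It rests on several assumptions: `year` is derived from `date.entered`, "week" means the week-on-chart column produced by the pivot, "top ranked" means the smallest `rank` value, and `slice_min()` is the intended `slice` function. The names `creed_best` and `whitney_songs` are hypothetical, and matching the artist with `str_detect()` is a guess at how the name is stored, so check `distinct(billboard2, artist)` first.

```r
library(tidyverse)

# Tidy data plus a year column (assumed to come from date.entered).
billboard2 <- billboard |>
  pivot_longer(
    wk1:wk76,
    names_to = "week",
    values_to = "rank",
    values_drop_na = TRUE
  ) |>
  mutate(year = lubridate::year(date.entered))

# 1. Best-ranked song in each chart week, among songs from 2000.
#    slice_min() keeps ties by default, so a week can return several rows.
number_one_hits_2000 <- billboard2 |>
  filter(year == 2000) |>
  group_by(week) |>
  slice_min(rank, n = 1) |>
  ungroup()

# 2. The same idea, grouped by year as well as week.
number_one_hits <- billboard2 |>
  group_by(year, week) |>
  slice_min(rank, n = 1) |>
  ungroup()

# 3. Best (numerically lowest) rank reached by Creed's "Higher".
creed_best <- billboard2 |>
  filter(artist == "Creed", track == "Higher") |>
  summarize(best_rank = min(rank))

# 4. Number of distinct tracks per matching artist.
whitney_songs <- billboard2 |>
  filter(str_detect(artist, "Whitney")) |>
  group_by(artist) |>
  summarize(n_songs = n_distinct(track))
```

Ending each grouped pipeline with `ungroup()` keeps later operations on these data frames from silently inheriting the grouping.
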

index.qmd

Lines changed: 2 additions & 4 deletions
@@ -7,15 +7,13 @@ We are an organization that hopes to make learning to program approachable, acce
 This is our curriculum for learning R programming in the context of data analysis. Our curriculum development team has worked tirelessly to develop this new curriculum. We are constantly improving and updating our curricula, so if you're interested in contributing or have suggestions, please visit <https://howtolearntocode.web.unc.edu/> for our most up-to-date contact information. Feel free to submit an issue or pull request at <https://github.com/How-to-Learn-to-Code/Rclass-DataScience>.
 
 | Class Day | Topic | Link |
-|:----------------:|:----------------------------:|:----------------------:|
+|:-----------------:|:----------------------------:|:----------------------:|
 | 0 | Welcome to How to Learn to Code! | [Introduction](class0.qmd) |
 | 1 | R Coding Basics | [Coding Basics 1](class1.qmd) |
 | 2 | Applying Coding Basics | [Coding Basics 2](class2.qmd) |
 | 3 | Let's Get Plotting! | [Data Visualization 1](class3.qmd) |
 | 4 | Applying Visualization Methods | [Data Vizualization 2](class4.qmd) |
 | 5 | Data Wrangling Basics | [Data Wrangling 1](class5.qmd) |
-| 6 | Data Wrangling with Real Experimental Data | [Data Wrangling 2](class6.qmd) |
+| 6 | Applying Data Wrangling Basics | [Data Wrangling 2](class6.qmd) |
 | 7 | Practicing on Real World Data | [Project 1](class7.qmd) |
 | 8 | Bonus Lessons | [Bonus Lessons](Extras.qmd) |
-
-: Table of Contents
