R for Lunch

Data wrangling with {dplyr}

John Little

Duke University Libraries

Center for Data & Visualization Sciences

2024-09-12

Today’s topics

  • Five essential {dplyr} data wrangling verbs

  • Data pipes inside code-chunks

Yesterday (video)

  • Import data

  • Tour of RStudio IDE

  • Coding notebooks (Quarto)

Housekeeping

  • Drew / Lauren / breakout rooms
  • CDVS
    • Themes
      • Data Management (Plans, Reproducibility, Repositories)

      • Data Science

      • Data Visualization

      • GIS and Spatial Analysis

      • Data Sources

Housekeeping continued

R for Lunch as a series

R for Lunch is a series of 8 sessions (running through the end of February). After today, it meets regularly on Thursdays at noon.

  • Sign up for each workshop individually

  • Each episode has a unique Zoom link

Eat your own dog food


Model how R can work for practical reproducible workflows

Pipes and Assignments

 

Operator     Operator Name   Keystroke Shortcut   Mnemonic
<-           assignment      Alt-dash             “Gets value from”
|> or %>%    pipe            Ctrl-Shift-M         “And then”
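
A minimal sketch of the two operators, using base R only. The `scores` vector and the other object names are made up for illustration, and the native pipe `|>` assumes R 4.1 or later.

```r
# Assignment: `scores` "gets value from" the vector on the right
scores <- c(90, 85, 77)

# Pipe: take scores, "and then" sort them, "and then" keep the first two
top_two <- scores |>
  sort(decreasing = TRUE) |>
  head(2)

# The piped version is equivalent to nesting the same calls
nested <- head(sort(scores, decreasing = TRUE), 2)

identical(top_two, nested)
```

Reading the pipe aloud as "and then" makes the order of the steps match the order of the code, which the nested version reverses.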

Tidyverse and Tidy data

Foundation

 

Together, the Tidyverse and Quarto form the most practical and well-developed reproducible workflow available for scientific analysis and publishing.

Tidy data

  • Every row is a single observation
  • Every column is a variable
  • The cells are single data values
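
As a rough illustration of the three rules, the sketch below reshapes a small table so that each row holds one observation. The `wide` table, its city names, and the `visits` column are hypothetical values invented for this example.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: the year is spread across column names
wide <- tibble(
  city   = c("Durham", "Raleigh"),
  `2022` = c(10, 20),
  `2023` = c(11, 21)
)

# Tidy version: one row per city-year, one column per variable
tidy <- wide |>
  pivot_longer(cols = -city,
               names_to = "year",
               values_to = "visits")

tidy
```

In the tidy version, `year` is a variable with its own column rather than a value hidden in the column headers.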

Wide data

library(tidyverse)
library(gt)
library(gtExtras)

tidyr::relig_income |> 
  gt::gt_preview() |> 
  gtExtras::gt_theme_dark()

       religion            <$10k  $10-20k  $20-30k  $30-40k  $40-50k  $50-75k  $75-100k  $100-150k  >150k  Don't know/refused
1      Agnostic               27       34       60       81       76      137       122        109     84                  96
2      Atheist                12       27       37       52       35       70        73         59     74                  76
3      Buddhist               27       21       30       34       33       58        62         39     53                  54
4      Catholic              418      617      732      670      638     1116       949        792    633                1489
5      Don’t know/refused     15       14       15       11       10       35        21         17     18                 116
6..17
18     Unaffiliated          217      299      374      365      341      528       407        321    258                 597

Tall data

relig_income |> 
  pivot_longer(cols = -religion, 
               names_to = "income_category", 
               values_to = "income") |> 
  gt::gt_preview() |> 
  gtExtras::gt_theme_dark()

        religion        income_category       income
1       Agnostic        <$10k                     27
2       Agnostic        $10-20k                   34
3       Agnostic        $20-30k                   60
4       Agnostic        $30-40k                   81
5       Agnostic        $40-50k                   76
6..179
180     Unaffiliated    Don't know/refused       597

relig_income |> 
  pivot_longer(cols = -religion, 
               names_to = "income_category", 
               values_to = "income") |> 
  mutate(religion = fct_relevel(religion, "Evangelical Prot", "Mainline Prot", "Catholic", "Unaffiliated", "Historically Black Prot")) |> 
  mutate(income_category = fct_rev(as_factor(income_category))) |>
  ggplot(aes(income, income_category)) +
  geom_col(fill = "#eee8d5") +
  facet_wrap(vars(
    fct_other(
      religion, 
      keep = c("Evangelical Prot", "Mainline Prot", "Catholic", "Unaffiliated", "Historically Black Prot")))) +
  theme(plot.background = element_rect(fill = "#002b36"),
        text = element_text(color = "#eee8d5"),
        axis.text = element_text(color = "#eee8d5"), 
        panel.background = element_rect(fill = "#002b36"),
        panel.grid = element_line(color = "#002b36"),
        strip.background = element_rect(fill = "#7b9c9f"))

relig_income |> 
  pivot_longer(cols = -religion, names_to = "income_category") |> 
  ggplot(aes(value, income_category)) +
  geom_col() +
  facet_wrap(vars(religion))

Image Credit: apreshill | CC BY 4.0 | https://github.com/apreshill/teachthat/blob/master/pivot/pivot_longer_smaller.gif

Polls

dplyr

https://intro2r.library.duke.edu/wrangle.html
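
The slides do not list the five verbs by name. The sketch below assumes a commonly taught set: `filter()`, `select()`, `mutate()`, `arrange()`, and `summarise()`. The `lunches` table is hypothetical toy data, not part of the workshop exercises.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
lunches <- tibble(
  dish  = c("soup", "salad", "sandwich", "pasta"),
  price = c(4, 6, 8, 10),
  vegan = c(TRUE, TRUE, FALSE, FALSE)
)

result <- lunches |>
  filter(price > 4) |>                  # keep rows that match a condition
  select(dish, price) |>                # keep only the named columns
  mutate(price_cents = price * 100) |>  # add a column computed from another
  arrange(desc(price))                  # sort rows, highest price first

result

# summarise() collapses many rows into a single summary row
summary_row <- lunches |>
  summarise(mean_price = mean(price))

summary_row
```

Each verb takes a data frame and returns a data frame, which is why the steps chain together with the pipe.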

We are here to help

  • askData@duke.edu

  • https://library.duke.edu/data

  • https://is.gd/littleconsult

Let’s do it

Two things for today

Exercises

  1. https://intro2r.library.duke.edu/ > Exercises > Link out > Green Code button > Download ZIP

  2. Then unzip (i.e., expand) the folder on your local file system

  3. Then double-click the rforlunch_exercises.Rproj file

  4. From the RStudio Files tab, open 01_dplyr.qmd

    • The answer file is in the RStudio rforlunch_exercises project > Files Tab > Answers folder

Closing

Pipes and Assignments

 

Operator     Operator Name   Keystroke Shortcut   Mnemonic
<-           assignment      Alt-dash             “Gets value from”
|> or %>%    pipe            Ctrl-Shift-M         “And then”

Citation management

 

RStudio > Quarto Notebook > Insert > Citation

Example DOI: 10.18637/jss.v059.i10

AI-paired coding

 

  • Data science concepts: Microsoft Copilot (“More precise” setting)

  • Code completion: GitHub Copilot with RStudio (IDE) or VS Code (IDE)

Bye for now