R for Lunch

Import data and install RStudio / Tidyverse

John Little

Duke University Libraries

Center for Data & Visualization Sciences

2025-01-14

Today’s topics

How to import data
Tour of RStudio IDE
Coding notebooks

Preceded by where to download RStudio and R

Housekeeping

Drew / Lauren / breakout rooms
CDVS
- Themes
  - Data Management (Plans, Reproducibility, Repositories)
  - Data Science
  - Data Visualization
  - GIS and Spatial Analysis
  - Data Sources

Housekeeping continued

Website - https://library.duke.edu/data
Workshops
- https://library.duke.edu/data/workshops
Consulting in the Lab
- askData@duke.edu
- my schedule: https://is.gd/littleconsult

R for Lunch as a series

R for Lunch is a series that meets 8 times (through March). After today it will meet regularly on Thursdays at noon.

Sign up for each workshop individually
Find recordings, data, slides, and code from previous CDVS workshop episodes
- CDVS Online Learning Tutorials
- Specific R for Lunch videos: https://warpwire.duke.edu/w/bQAEAA/

Eat your own dog food

Model how R can work for practical reproducible workflows

Code in RStudio
One kind of report is these slides (Quarto Presentation slidedeck - hosted)
Another report is the Introduction to R/Tidyverse/Quarto text.

Definitions

R/Tidyverse/Quarto

R/Tidyverse/Quarto represents the state of the art for practical reproducibility

R & RStudio

R is a data-first programming language

RStudio is an IDE

Reproducibility

Independently and transparently achieve reliable results with the same data and the same workflow
- Transparency with reproducible workflows
Best workflow and ecosystem to achieve reproducible work is to “do everything with code”
- Import data, analyze, visualize, and publish/share
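A minimal sketch of "do everything with code", assuming the dplyr and ggplot2 packages are installed. The built-in `mtcars` data frame stands in for a file you would normally import, and the object names (`cars_tbl`, `mpg_by_cyl`, `mpg_plot`) are illustrative only.

```r
library(dplyr)
library(ggplot2)

# Import: a built-in dataset stands in for a file read from disk
cars_tbl <- as_tibble(mtcars, rownames = "model")

# Analyze: mean miles per gallon for each cylinder count
mpg_by_cyl <- cars_tbl |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg), .groups = "drop")

# Visualize: one bar per cylinder count
mpg_plot <- ggplot(mpg_by_cyl, aes(factor(cyl), mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean miles per gallon")

mpg_plot
```

Because every step is code, rerunning the script on the same data reproduces the same table and the same plot.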

Tidyverse

An opinionated set of packages for data manipulation and analysis
A meta-package of nine symbiotic core packages (as of tidyverse 2.0.0, which added lubridate)

Packages

Extend R into your subject domain
And/or make it easier to accomplish a computational task
There are thousands
- MetaCRAN, CRAN, BioConductor, GitHub
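As a rough illustration of how a package is installed and loaded, here is a sketch that uses the tidyverse meta-package as the example. The install line is commented out on the assumption that the package is already present on your machine.

```r
# Install once per machine (left commented out so rerunning does not reinstall)
# install.packages("tidyverse")

# Load once per R session
library(tidyverse)

# List the packages bundled by the tidyverse meta-package
tidyverse_packages()
```

Installing happens once; `library()` is needed in every new session or notebook that uses the package.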

Quarto

Works with R and Python

A scientific publishing system (workflow)
- dashboards, manuscripts, MSWord, slides, website, e-book, PDF
Coding Notebooks: Code chunks interspersed with explanatory text (Natural language)
- Render reproducible, shareable reports
A next-gen (or modern) Markdown

Quarto notebook

Figure: a side-by-side view of the Quarto editor and the rendered report

Opinionated

Tidyverse and Quarto together are the most practical and well-developed workflow available for reproducible scientific analysis and publishing.

Tidy data

Tidy data

Every row is a single observation
Every column is a variable
The cells are single data values
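The three rules can be checked against small example tables that ship with tidyr. This is a sketch assuming tidyr is installed: `table1` already follows the rules, while `table4a` stores a variable (the year) in its column names.

```r
library(tidyr)

# Tidy: one row per country-year observation, one column per variable
table1

# Not tidy: the years 1999 and 2000 are spread across column names
table4a

# Reshape so that year becomes a variable and each row is one observation
tidy_cases <- table4a |>
  pivot_longer(cols = -country,
               names_to = "year",
               values_to = "cases")

tidy_cases
```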

Wide data

Code

library(tidyverse)
library(gt)
library(gtExtras)

tidyr::relig_income |> 
  gt::gt_preview() |> 
  gtExtras::gt_theme_dark()

	religion	<$10k	$10-20k	$20-30k	$30-40k	$40-50k	$50-75k	$75-100k	$100-150k	>150k	Don't know/refused
1	Agnostic	27	34	60	81	76	137	122	109	84	96
2	Atheist	12	27	37	52	35	70	73	59	74	76
3	Buddhist	27	21	30	34	33	58	62	39	53	54
4	Catholic	418	617	732	670	638	1116	949	792	633	1489
5	Don’t know/refused	15	14	15	11	10	35	21	17	18	116
6..17
18	Unaffiliated	217	299	374	365	341	528	407	321	258	597

Tall data

Code

relig_income |> 
  pivot_longer(cols = -religion, 
               names_to = "income_category", 
               values_to = "income") |> 
  gt::gt_preview() |> 
  gtExtras::gt_theme_dark()

	religion	income_category	income
1	Agnostic	<$10k	27
2	Agnostic	$10-20k	34
3	Agnostic	$20-30k	60
4	Agnostic	$30-40k	81
5	Agnostic	$40-50k	76
6..179
180	Unaffiliated	Don't know/refused	597

Code

relig_income |> 
  pivot_longer(cols = -religion, 
               names_to = "income_category", 
               values_to = "income") |> 
  mutate(religion = fct_relevel(religion, "Evangelical Prot", "Mainline Prot", "Catholic", "Unaffiliated", "Historically Black Prot")) |> 
  mutate(income_category = fct_rev(as_factor(income_category))) |>
  ggplot(aes(income, income_category)) +
  geom_col(fill = "#eee8d5") +
  facet_wrap(vars(
    fct_other(
      religion, 
      keep = c("Evangelical Prot", "Mainline Prot", "Catholic", "Unaffiliated", "Historically Black Prot")))) +
  theme(plot.background = element_rect(fill = "#002b36"),
        text = element_text(color = "#eee8d5"),
        axis.text = element_text(color = "#eee8d5"), 
        panel.background = element_rect(fill = "#002b36"),
        panel.grid = element_line(color = "#002b36"),
        strip.background = element_rect(fill = "#7b9c9f"))

Code

relig_income |> 
  pivot_longer(cols = -religion, names_to = "income_category") |> 
  ggplot(aes(value, income_category)) +
  geom_col() +
  facet_wrap(vars(religion))

Image Credit: apreshill | CC BY 4.0 | https://github.com/apreshill/teachthat/blob/master/pivot/pivot_longer_smaller.gif

Polls

Grammar (data and graphics)

By next week you’ll have the basic building blocks to

Leverage reproducible data workflows: import data, analyze data, and generate visualizations.

Along the way

Rendering reproducible reports (Quarto)
Practical techniques

Pro tips that build fluency in reproducible data analysis

We are here to help

askData@duke.edu
https://library.duke.edu/data
https://is.gd/littleconsult

Let’s do it

Three things for today

Tour of the RStudio IDE (Projects)
How to import data
Coding notebooks
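A minimal sketch of importing data with readr, assuming the readr and tibble packages are installed. To stay self-contained it first writes a small CSV to a temporary folder; the file name `mtcars_example.csv` is made up for illustration, and in practice you would point `read_csv()` at your own file.

```r
library(readr)

# Write a small CSV to a temporary folder so the example is self-contained
csv_path <- file.path(tempdir(), "mtcars_example.csv")
write_csv(tibble::as_tibble(mtcars, rownames = "model"), csv_path)

# Import the CSV as a tibble; show_col_types = FALSE silences the column summary
cars_tbl <- read_csv(csv_path, show_col_types = FALSE)

cars_tbl
```

Inside an RStudio project, a relative path such as a file in a `data` folder works the same way, which keeps the import step portable across machines.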

Exercises

https://intro2r.library.duke.edu/ > Exercises > Link out > Green Code button > Download ZIP
Then, Unzip (i.e. Expand) the folder (on your local file system)
Then, double click the rforlunch_exercises.Rproj file
From the RStudio Files tab, open 00_import_answers.qmd
- The answer file is in the RStudio rforlunch_exercises project > Files Tab > Answers folder

Closing

Pipes and Assignments

Operator	Operator Name	Keystroke shortcut	Mnemonic
`<-`	assignment	Alt-dash	“Gets value from”
`|>` or `%>%`	pipe	Ctrl-Shift-M	“And then”
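A short sketch of the two operators working together, assuming dplyr is installed. The object names `heavy_cars` and `heavy_cars_nested` are illustrative only.

```r
library(dplyr)

# `<-` assigns: heavy_cars "gets value from" the pipeline on the right.
# `|>` pipes: take mtcars, "and then" filter, "and then" arrange.
heavy_cars <- mtcars |>
  filter(wt > 4) |>
  arrange(desc(wt))

# The same steps written as nested calls, which read inside-out
heavy_cars_nested <- arrange(filter(mtcars, wt > 4), desc(wt))

# Both forms produce the same result
identical(heavy_cars, heavy_cars_nested)
```

The piped version reads top to bottom in the order the steps happen, which is the main reason to prefer it.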

Citation management

RStudio > Quarto Notebook > Insert > Citation

Example DOI: 10.18637/jss.v059.i10

AI-paired coding

Data science concepts: Microsoft Copilot (“More precise” setting)
Code completion: GitHub Copilot with RStudio (IDE) or VS Code (IDE)

Bye for now