[1] "Timestamp"
[2] "Email Address"
[3] "What is your name?"
[4] "Which days of the bootcamp will you attend?"
[5] "Any meal/food restrictions?"
[6] "Workshop session 1 - Day 1 @ 1:45 pm"
[7] "Workshop session 2 - Day 1 @ 3:00 pm"
[8] "Workshop session 3 - Day 2 @ 1:15 pm"
[9] "Workshop session 4 - Day 2 @ 2:45 pm"
[10] "Workshop session 5 - Day 3 @ 10:45 1m"
The imported CSV file has n=1 rows.
Note
The first row represents data generated by Rick Gilmore to test this workflow. We can delete that row, but only when there is more than one row. The chunk below leaves the data unchanged if there are fewer than 2 rows.
Code
# Drop the first (test) row only when other responses follow it
if (nrow(confirmations) > 1) {
  confirmations <- confirmations[2:nrow(confirmations), ]
} else {
  warning("Only one row in `confirmations`; leaving data intact")
}
We want to capture the “raw” or full question name and the short variable name in a data dictionary.
Code
confirmations_qs <- names(confirmations)

# Rename the full question text to short variable names
confirmations_clean <- confirmations |>
  dplyr::rename(
    timestamp = "Timestamp",
    attend_days = "Which days of the bootcamp will you attend?",
    food_restrictions = "Any meal/food restrictions?",
    name = "What is your name?",
    psu_email = "Email Address",
    day_1_session_1 = "Workshop session 1 - Day 1 @ 1:45 pm",
    day_1_session_2 = "Workshop session 2 - Day 1 @ 3:00 pm",
    day_2_session_3 = "Workshop session 3 - Day 2 @ 1:15 pm",
    day_2_session_4 = "Workshop session 4 - Day 2 @ 2:45 pm",
    day_3_session_5 = "Workshop session 5 - Day 3 @ 10:45 1m"
  )

# rename() keeps the column order, so taking the short names from the
# renamed data keeps them aligned with confirmations_qs
confirmations_short <- names(confirmations_clean)

# Flag the name and email columns as pid
confirmations_pid <- confirmations_short %in% c("name", "psu_email")

confirmations_dd <- data.frame(
  qs = confirmations_qs,
  qs_short = confirmations_short,
  pid = confirmations_pid
)

confirmations_dd |>
  knitr::kable(format = 'html')

readr::write_csv(
  confirmations_dd,
  file = file.path(params$csv_dir, "confirmations-2025-data-dict.csv")
)

Table 10.1: A minimal data dictionary.

| qs | qs_short | pid |
|---|---|---|
| Timestamp | timestamp | FALSE |
| Email Address | psu_email | TRUE |
| What is your name? | name | TRUE |
| Which days of the bootcamp will you attend? | attend_days | FALSE |
| Any meal/food restrictions? | food_restrictions | FALSE |
| Workshop session 1 - Day 1 @ 1:45 pm | day_1_session_1 | FALSE |
| Workshop session 2 - Day 1 @ 3:00 pm | day_1_session_2 | FALSE |
| Workshop session 3 - Day 2 @ 1:15 pm | day_2_session_3 | FALSE |
| Workshop session 4 - Day 2 @ 2:45 pm | day_2_session_4 | FALSE |
| Workshop session 5 - Day 3 @ 10:45 1m | day_3_session_5 | FALSE |
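One possible use of the pid column is to drop flagged columns before sharing the data. The sketch below assumes that pid marks columns holding personally identifying data, which the workflow above does not state explicitly. The `dd` and `toy` data frames and their values are invented stand-ins for the data dictionary and the cleaned responses.

```r
# Toy stand-ins for the data dictionary and the cleaned data
# (hypothetical values, for illustration only).
dd <- data.frame(
  qs_short = c("timestamp", "psu_email", "name", "attend_days"),
  pid = c(FALSE, TRUE, TRUE, FALSE)
)
toy <- data.frame(
  timestamp = "2025-01-01 09:00:00",
  psu_email = "abc123@example.edu",
  name = "A. Person",
  attend_days = "Day 1, Day 2"
)

# Keep only the columns the dictionary does not flag.
shareable_cols <- dd$qs_short[!dd$pid]
toy_deid <- toy[, shareable_cols, drop = FALSE]

names(toy_deid)
```

Selecting by name from the dictionary, rather than by column position, keeps the step valid if the form questions are later reordered.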
Then, we want to shorten the responses in the workshop session columns (i.e., day_n_session_m) to brief codes for easier visualization.
Code
# Recode each session's full workshop title to a short label
confirmations_clean <- confirmations_clean |>
  dplyr::mutate(
    day_1_session_1 = dplyr::case_match(
      day_1_session_1,
      "Harnessing advanced cyberinfrastructure for research: An introduction to Roar and ICDS resources" ~ "intro_roar",
      "Getting credit for sharing your data (Part I): Good enough data management practices" ~ "data_mgmt"
    ),
    day_1_session_2 = dplyr::case_match(
      day_1_session_2,
      "Quarto (Part I): A tool for open scholarship" ~ "quarto_I",
      "Questionable research practices" ~ "qrps"
    ),
    day_2_session_3 = dplyr::case_match(
      day_2_session_3,
      "Introduction to Jupyter notebooks" ~ "jupyter_intro",
      "Quarto (Part II): Reproducible research reports" ~ "quarto_II"
    ),
    day_2_session_4 = dplyr::case_match(
      day_2_session_4,
      "Getting credit (Part II): Sharing your data" ~ "sharing_data",
      "LLMs with Jupyter notebooks" ~ "jupyter_llms"
    ),
    day_3_session_5 = dplyr::case_match(
      day_3_session_5,
      "Where to start? Early career panel" ~ "early_career",
      "Getting credit (Part III): Data papers" ~ "data_papers"
    )
  )
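One thing to keep in mind with this recoding: when no `.default` is supplied, `dplyr::case_match()` returns `NA` for any value that matches none of the listed titles, so a reworded form option would silently become missing. The sketch below illustrates this on an invented `responses` vector (the third value is hypothetical and does not come from the form); it assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Invented responses: two real session titles, one unexpected value, one blank.
responses <- c(
  "Questionable research practices",
  "Quarto (Part I): A tool for open scholarship",
  "Something else entirely",
  NA
)

# Same recoding pattern as above; unmatched values fall through to NA.
recoded <- case_match(
  responses,
  "Quarto (Part I): A tool for open scholarship" ~ "quarto_I",
  "Questionable research practices" ~ "qrps"
)

recoded

# Counting with useNA makes any unmatched or missing responses visible.
table(recoded, useNA = "ifany")
```

Tabulating each recoded column this way is a quick check that every form option was matched.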