Confirmation

Modified

July 30, 2025

Warning

This page is under construction.

Not all functions described work properly.

About

This page documents the workflow for downloading, cleaning, and visualizing the Bootcamp attendance confirmation data.

Setup

We load some packages into memory for convenience.

Code
suppressPackageStartupMessages(library('tidyverse'))
suppressPackageStartupMessages(library('ggplot2'))
suppressPackageStartupMessages(library('dplyr'))
suppressPackageStartupMessages(library('tidyr'))
suppressPackageStartupMessages(library('stringr'))
suppressPackageStartupMessages(library('lubridate'))

Import

The Google Form generates a Google Sheet that we download to a protected directory (include/csv) that is not synched to GitHub.

Important

The directory is kept out of GitHub because the sheet contains personally identifying information.

Code
if (!dir.exists(params$csv_dir)) {
  message("Creating missing directory: ", params$csv_dir)
  # recursive = TRUE also creates a missing parent such as `include/`.
  dir.create(params$csv_dir, recursive = TRUE)
}

options(gargle_oauth_email = Sys.getenv("GMAIL_SURVEY"))
googledrive::drive_auth()

googledrive::drive_download(
  params$sheets_fn,
  path = file.path(params$csv_dir, params$data_csv_fn),
  type = "csv",
  overwrite = TRUE
)
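
Before moving on, it can help to confirm that the directory exists and that the download produced a usable file. The following is a minimal sketch in base R, not part of the documented workflow: `ensure_dir()` and `check_download()` are hypothetical helpers, and a temporary directory stands in for `params$csv_dir`.

```r
# Hypothetical helper: create a directory (and any missing parents) if needed.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    message("Creating missing directory: ", path)
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}

# Hypothetical helper: stop early if an expected file is missing or empty.
check_download <- function(path) {
  if (!file.exists(path)) stop("Expected file not found: ", path)
  if (file.size(path) == 0) stop("Downloaded file is empty: ", path)
  invisible(path)
}

# Demonstration in a temporary location standing in for params$csv_dir.
demo_dir <- ensure_dir(file.path(tempdir(), "include", "csv"))
demo_csv <- file.path(demo_dir, "demo.csv")
writeLines("Timestamp,Email Address", demo_csv)
check_download(demo_csv)
```

Failing loudly at this point keeps a silent authentication or network problem from surfacing later as a confusing import error.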

Clean

We reimport the saved CSV file and then clean it.

Code
confirmations <- readr::read_csv(file.path(params$csv_dir, params$data_csv_fn),
                                 show_col_types = FALSE)

names(confirmations)
 [1] "Timestamp"                                  
 [2] "Email Address"                              
 [3] "What is your name?"                         
 [4] "Which days of the bootcamp will you attend?"
 [5] "Any meal/food restrictions?"                
 [6] "Workshop session 1 - Day 1 @ 1:45 pm"       
 [7] "Workshop session 2 - Day 1 @ 3:00 pm"       
 [8] "Workshop session 3 - Day 2 @ 1:15 pm"       
 [9] "Workshop session 4 - Day 2 @ 2:45 pm"       
[10] "Workshop session 5 - Day 3 @ 10:45 1m"      

The imported CSV file has n=1 rows.

Note

The first row represents data generated by Rick Gilmore to test this workflow. We delete that row only when there is more than one row; otherwise the chunk below leaves the data intact and issues a warning.

Code
if (nrow(confirmations) > 1) {
  # Drop the first (test) row and keep the real responses.
  confirmations <- confirmations[-1, ]
} else {
  warning("Only one row in `confirmations`; leaving data intact.")
}
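
The same guard can be wrapped in a small function so that it is easy to check on toy data. This is a minimal sketch: `drop_test_row()` is a hypothetical helper, and the values in `toy` are made up.

```r
# Hypothetical helper: drop the leading test row only when other rows follow it.
drop_test_row <- function(df) {
  if (nrow(df) > 1) {
    df[-1, , drop = FALSE]
  } else {
    warning("Only one row; leaving data intact.")
    df
  }
}

# Toy data standing in for the form responses (values are made up).
toy <- data.frame(name = c("test entry", "Attendee A", "Attendee B"))

# Three rows in, two rows out: the test row is gone.
with_responses <- drop_test_row(toy)

# A single-row data frame is returned unchanged (with a warning).
only_test <- suppressWarnings(drop_test_row(toy[1, , drop = FALSE]))
```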

We want to capture the “raw” or full question name and the short variable name in a data dictionary.

Code
confirmations_qs <- names(confirmations)

confirmations_clean <- confirmations |>
  dplyr::rename(
    timestamp = "Timestamp",
    attend_days = "Which days of the bootcamp will you attend?",
    food_restrictions = "Any meal/food restrictions?",
    name = "What is your name?",
    psu_email = "Email Address",
    day_1_session_1 = "Workshop session 1 - Day 1 @ 1:45 pm",
    day_1_session_2 = "Workshop session 2 - Day 1 @ 3:00 pm",
    day_2_session_3 = "Workshop session 3 - Day 2 @ 1:15 pm",
    day_2_session_4 = "Workshop session 4 - Day 2 @ 2:45 pm",
    day_3_session_5 = "Workshop session 5 - Day 3 @ 10:45 1m"
  )

# `rename()` keeps the original column order, so the short names line up
# row by row with the full question text in `confirmations_qs`.
confirmations_short <- names(confirmations_clean)

# Flag the columns that hold personally identifying information.
confirmations_pid <- confirmations_short %in% c("name", "psu_email")

confirmations_dd <- data.frame(qs = confirmations_qs, qs_short = confirmations_short, pid = confirmations_pid)

confirmations_dd |>
  knitr::kable(format = 'html')
readr::write_csv(confirmations_dd,
                 file = file.path(params$csv_dir,
                                  "confirmations-2025-data-dict.csv"))

Table 10.1: A minimal data dictionary.
qs                                           qs_short           pid
Timestamp                                    timestamp          FALSE
Email Address                                psu_email          TRUE
What is your name?                           name               TRUE
Which days of the bootcamp will you attend?  attend_days        FALSE
Any meal/food restrictions?                  food_restrictions  FALSE
Workshop session 1 - Day 1 @ 1:45 pm         day_1_session_1    FALSE
Workshop session 2 - Day 1 @ 3:00 pm         day_1_session_2    FALSE
Workshop session 3 - Day 2 @ 1:15 pm         day_2_session_3    FALSE
Workshop session 4 - Day 2 @ 2:45 pm         day_2_session_4    FALSE
Workshop session 5 - Day 3 @ 10:45 1m        day_3_session_5    FALSE
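
One way such a dictionary can be put to use is to drop the flagged columns before data leave the protected directory. The following is a minimal sketch in base R, assuming a toy dictionary and toy responses; all values are made up, and this de-identification step is an illustration rather than part of the documented workflow.

```r
# Toy data dictionary with the same three columns as Table 10.1.
toy_dd <- data.frame(
  qs = c("Timestamp", "Email Address", "What is your name?"),
  qs_short = c("timestamp", "psu_email", "name"),
  pid = c(FALSE, TRUE, TRUE)
)

# Toy cleaned responses (made-up values).
toy_clean <- data.frame(
  timestamp = c("2025-07-01 09:00:00", "2025-07-02 10:30:00"),
  psu_email = c("a@example.org", "b@example.org"),
  name = c("Attendee A", "Attendee B")
)

# Keep only the columns that the dictionary does not flag as identifying.
shareable_cols <- toy_dd$qs_short[!toy_dd$pid]
toy_shareable <- toy_clean[, shareable_cols, drop = FALSE]

names(toy_shareable)
```

Driving the selection from the `pid` column means that a newly flagged variable is excluded without touching the selection code.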

Then, we shorten the workshop responses in the day_n_session_m columns to brief labels for easier visualization.

Code
# Recode each session's full workshop title to a short label.
# Responses that match no listed title, including missing ones, become NA.
confirmations_clean <- confirmations_clean |>
  mutate(
    day_1_session_1 = case_match(
      day_1_session_1,
      "Harnessing advanced cyberinfrastructure for research: An introduction to Roar and ICDS resources" ~ "intro_roar",
      "Getting credit for sharing your data (Part I): Good enough data management practices" ~ "data_mgmt"
    ),
    day_1_session_2 = case_match(
      day_1_session_2,
      "Quarto (Part I): A tool for open scholarship" ~ "quarto_I",
      "Questionable research practices" ~ "qrps"
    ),
    day_2_session_3 = case_match(
      day_2_session_3,
      "Introduction to Jupyter notebooks" ~ "jupyter_intro",
      "Quarto (Part II): Reproducible research reports" ~ "quarto_II"
    ),
    day_2_session_4 = case_match(
      day_2_session_4,
      "Getting credit (Part II): Sharing your data" ~ "sharing_data",
      "LLMs with Jupyter notebooks" ~ "jupyter_llms"
    ),
    day_3_session_5 = case_match(
      day_3_session_5,
      "Where to start? Early career panel" ~ "early_career",
      "Getting credit (Part III): Data papers" ~ "data_papers"
    )
  )
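
Because `case_match()` returns `NA` for any value that is not listed, a quick tally after recoding can reveal titles that were reworded on the form. The following is a minimal sketch on a made-up vector, assuming dplyr 1.1.0 or later; the response "A title that is not on the list" is hypothetical.

```r
suppressPackageStartupMessages(library(dplyr))

# Made-up responses: two known titles, one unknown title, one missing answer.
responses <- c(
  "Questionable research practices",
  "Quarto (Part I): A tool for open scholarship",
  "A title that is not on the list",
  NA
)

recoded <- case_match(
  responses,
  "Quarto (Part I): A tool for open scholarship" ~ "quarto_I",
  "Questionable research practices" ~ "qrps"
)

# Non-missing responses that failed to match any listed title.
unmatched <- responses[is.na(recoded) & !is.na(responses)]
unmatched
```

An empty `unmatched` vector suggests that every submitted title was recognized; anything else points to a title that needs a new mapping.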

We then create a separate logical variable for each planned attendance day.

Code
confirmations_clean <- confirmations_clean |>
  mutate(
    plan_wed = stringr::str_detect(attend_days, "Wed"),
    plan_thu = stringr::str_detect(attend_days, "Thu"),
    plan_fri = stringr::str_detect(attend_days, "Fri")
  )
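
The day flags rely on substring matching, which presumably works because a multi-select answer arrives as a single string per respondent. The following is a minimal sketch, assuming made-up answers (the actual response text on the form may differ), that builds the flags and counts planned attendance per day.

```r
suppressPackageStartupMessages(library(stringr))

# Made-up multi-select answers; the last respondent left the question blank.
attend_days <- c(
  "Wednesday, Thursday, Friday",
  "Thursday",
  "Thursday, Friday",
  NA
)

plans <- data.frame(
  plan_wed = str_detect(attend_days, "Wed"),
  plan_thu = str_detect(attend_days, "Thu"),
  plan_fri = str_detect(attend_days, "Fri")
)

# A missing answer yields NA flags, so drop NAs when counting.
day_counts <- colSums(plans, na.rm = TRUE)
day_counts
```

Keeping the missing answer as `NA` rather than `FALSE` preserves the distinction between "not attending" and "did not answer".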

It’s a good idea to save the cleaned file.

Code
# Escape the dot and anchor the pattern so only a trailing ".csv" is removed.
readr::write_csv(confirmations_clean,
                 file = file.path(params$csv_dir,
                                  paste0(stringr::str_remove(params$data_csv_fn, "\\.csv$"),
                                         "-clean.csv")))