Bootcamp registration data: Visualizing

Author: Rick Gilmore

Published: May 12, 2026

About

This page documents code used to visualize the Bootcamp 2026 registration data.

Setup

library(gt)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyr)

Import

We have saved an anonymized version of the data in data_public.

bootcamp_26 <- readr::read_csv("data_public/bootcamp-2026-registrations-public.csv", show_col_types = FALSE)
dim(bootcamp_26)
[1] 94  8
str(bootcamp_26)
spc_tbl_ [94 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ timestamp  : POSIXct[1:94], format: "2026-03-05 10:40:52" "2026-03-05 10:42:42" ...
 $ attend_days: chr [1:94] "Mon May 11, Tue May 12" "Mon May 11, Tue May 12" "Mon May 11, Tue May 12" "Mon May 11, Tue May 12" ...
 $ dept       : chr [1:94] "Psychology" "Psychology" "Psychology" "Psychology" ...
 $ position   : chr [1:94] "Graduate student" "Graduate student" "Graduate student" "Graduate student" ...
 $ dropped_out: chr [1:94] NA NA NA NA ...
 $ college    : chr [1:94] "CLA" "CLA" "CLA" "CLA" ...
 $ .default   : chr [1:94] "Unknown" "Unknown" "Unknown" "Unknown" ...
 $ .missing   : chr [1:94] "Unknown" "Unknown" "Unknown" "Unknown" ...
 - attr(*, "spec")=
  .. cols(
  ..   timestamp = col_datetime(format = ""),
  ..   attend_days = col_character(),
  ..   dept = col_character(),
  ..   position = col_character(),
  ..   dropped_out = col_character(),
  ..   college = col_character(),
  ..   .default = col_character(),
  ..   .missing = col_character()
  .. )
 - attr(*, "problems")=<pointer: 0xae0fad360> 
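The str() output also lists two columns, .default and .missing, whose first few values are all "Unknown". Their names suggest that they may be leftovers from the cleaning script, for example arguments meant for a recoding function that ended up as new columns. We have not confirmed this. The sketch below shows one way to check whether such columns carry any information before dropping them. It uses a small hypothetical toy tibble rather than the real file:

```r
library(dplyr)

# Hypothetical toy stand-in for the imported data (NOT the real registrations).
# The two suspect columns reuse the names reported by str().
toy <- dplyr::tibble(
  college  = c("CLA", "HHD", NA),
  .default = c("Unknown", "Unknown", "Unknown"),
  .missing = c("Unknown", "Unknown", "Unknown")
)

# Count distinct values in each suspect column; 1 means the column is constant
suspect_counts <- toy |>
  dplyr::summarise(dplyr::across(c(.default, .missing), dplyr::n_distinct))
suspect_counts

# If the columns turn out to carry no information, drop them during cleaning
toy_clean <- toy |>
  dplyr::select(-c(.default, .missing))
toy_clean
```

On the real data, the same two steps would run on bootcamp_26. A count of 1 for both columns would support dropping them.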

Tabular summaries

Positions

bootcamp_26 |>
  dplyr::group_by(position) |>
  dplyr::summarise(n_registrants = n())
# A tibble: 7 × 2
  position                    n_registrants
  <chr>                               <int>
1 Graduate student                       47
2 Instructor/Teaching Faculty             3
3 Postdoc/Research Faculty               10
4 Staff                                  10
5 Tenure-stream Faculty                  12
6 Undergraduate student                  10
7 <NA>                                    2
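The group_by() plus summarise(n()) pattern above is common enough that {dplyr} offers count() as a shorthand. Here is a minimal sketch on hypothetical toy data that checks whether the two approaches give the same counts:

```r
library(dplyr)

# Hypothetical toy data (not the registration file)
toy <- dplyr::tibble(position = c("Staff", "Graduate student", "Staff", NA))

# The pattern used above: group, then count rows per group
by_summarise <- toy |>
  dplyr::group_by(position) |>
  dplyr::summarise(n_registrants = dplyr::n())

# The same table in one step; `name` sets the name of the count column
by_count <- toy |>
  dplyr::count(position, name = "n_registrants")

# Compare column by column
same <- identical(by_summarise$position, by_count$position) &&
  identical(by_summarise$n_registrants, by_count$n_registrants)
same
```

Both versions keep the NA group as its own row. That is how the two registrants without a position showed up in the table above.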
Note: Visualization can lead to more data cleaning

We note that two registrants have no position recorded (the <NA> row above).

We will need to go back to the cleaning steps to diagnose and fix that problem manually.

In this case, I might look these people up in the Penn State directory.

Then, if I found their positions, I would add the correction as a step in my cleaning protocol, leaving the original raw data untouched.
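One way to record such a manual fix without editing the raw file is to keep the corrections in a separate table and patch them in during cleaning. The sketch below uses dplyr::rows_patch() (dplyr 1.0.0 or later), which overwrites only values that are NA. It assumes a hypothetical registrant_id key column, which the public file does not have. The corrected positions are placeholders, not real directory lookups:

```r
library(dplyr)

# Hypothetical toy data; `registrant_id` is an assumed key column
registrations <- dplyr::tibble(
  registrant_id = 1:4,
  position = c("Graduate student", NA, "Staff", NA)
)

# Manual corrections kept in their own table (placeholder values).
# The raw data frame itself is never modified.
position_fixes <- dplyr::tibble(
  registrant_id = c(2L, 4L),
  position = c("Postdoc/Research Faculty", "Graduate student")
)

# rows_patch() fills only the NA values in `registrations`, matched by key
registrations_fixed <- dplyr::rows_patch(
  registrations, position_fixes, by = "registrant_id"
)
registrations_fixed
```

Because the fixes live in their own table, they can be reviewed, versioned, and re-applied whenever the raw data are re-imported.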

Colleges

bootcamp_26 |>
  dplyr::group_by(college) |>
  dplyr::summarise(n_registrants = n()) |>
  kableExtra::kbl()
college n_registrants
AgSci 4
CLA 22
Comm 1
ECoS 7
EMS 2
Education 4
Engineering 16
HHD 22
ICDS 1
IST 2
Libraries 2
Medicine 1
OVPR 1
NA 9
Note: Missing colleges

We see that n=9 people have missing values for the college variable. Let’s see if we can learn more about these.

bootcamp_26 |>
  dplyr::filter(is.na(college)) |>
  dplyr::select(position, dept, college)
# A tibble: 9 × 3
  position                 dept                               college
  <chr>                    <chr>                              <chr>  
1 Staff                    NARC                               <NA>   
2 Tenure-stream Faculty    Psychology, CLA                    <NA>   
3 Graduate student         <NA>                               <NA>   
4 Postdoc/Research Faculty Meteorology & Atmospheric Sciences <NA>   
5 Undergraduate student    <NA>                               <NA>   
6 <NA>                     <NA>                               <NA>   
7 Postdoc/Research Faculty Meteorology & Atmospheric Sciences <NA>   
8 Graduate student         Energy Technology and Management   <NA>   
9 Graduate student         Nutrition Sciences                 <NA>   
Note: Back to the drawing board

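We see one odd department (“NARC”) and one non-standard department (“Psychology, CLA”) that we can standardize. One department (“Meteorology & Atmospheric Sciences”, which appears twice) should be easy to assign to a college. Two more departments (“Energy Technology and Management” and “Nutrition Sciences”) still need a college. Three registrants have no department at all, and for those we’ll need names to understand further.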
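Here is a minimal sketch of the two easy fixes, using hypothetical toy rows rather than the real data. It first standardizes “Psychology, CLA” to “Psychology” and then fills missing colleges from a department-to-college lookup table. The lookup (dept_to_college) is an illustration, and its mappings should be checked against the university directory before use:

```r
library(dplyr)

# Hypothetical toy rows mirroring the problem cases (not the real data)
regs <- dplyr::tibble(
  dept    = c("Psychology, CLA", "Meteorology & Atmospheric Sciences", "Psychology", NA),
  college = c(NA, NA, "CLA", NA)
)

# Assumed, illustrative lookup: department -> college
dept_to_college <- dplyr::tibble(
  dept           = c("Psychology", "Meteorology & Atmospheric Sciences"),
  college_lookup = c("CLA", "EMS")
)

regs_clean <- regs |>
  # Standardize the non-standard department label
  dplyr::mutate(dept = dplyr::if_else(dept == "Psychology, CLA", "Psychology", dept)) |>
  # Attach the looked-up college; rows with no match get NA
  dplyr::left_join(dept_to_college, by = "dept") |>
  # Keep an existing college and fill only where it is missing
  dplyr::mutate(college = dplyr::coalesce(college, college_lookup)) |>
  dplyr::select(-college_lookup)
regs_clean
```

Rows with no department, like the last toy row, stay NA. Those are the cases that need a manual lookup.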

Departments

bootcamp_26 |>
  dplyr::group_by(dept) |>
  dplyr::summarise(n_registrants = n())
# A tibble: 47 × 2
   dept                                  n_registrants
   <chr>                                         <int>
 1 Agricultural & Biological Engineering             1
 2 BBH                                               4
 3 Biology                                           2
 4 Biomedical Engineering                            1
 5 CTSI                                              1
 6 Chemical Engineering                              4
 7 Chemical/Biomedical Engineering                   1
 8 Civil Engineering                                 1
 9 College of Education                              1
10 Communication Arts & Sciences                     1
# ℹ 37 more rows
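Tibbles print only the first 10 rows, which is why 37 departments are hidden above. The sketch below uses hypothetical toy data. It sorts departments by frequency with count(sort = TRUE) and prints every row with print(n = Inf):

```r
library(dplyr)

# Hypothetical toy data (not the registration file)
toy <- dplyr::tibble(
  dept = c("Biology", "BBH", "BBH", NA, "Psychology", "Psychology", "Psychology")
)

# Most frequent departments first
dept_counts <- toy |>
  dplyr::count(dept, name = "n_registrants", sort = TRUE)

# n = Inf overrides the default 10-row limit when printing a tibble
print(dept_counts, n = Inf)
```

A frequency-sorted list also puts rare spellings and abbreviations (such as “BBH” or “CTSI” above) near the bottom. That makes them easier to review during cleaning.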

Position by college

Because xtabs() drops, by default, any row in which college or position is missing, this cross-tabulation counts only 84 of the 94 registrants. Let’s make one anyway.

Using the xtabs() function from base R’s {stats} package:

xtabs(formula = ~ college + position, data = bootcamp_26)
             position
college       Graduate student Instructor/Teaching Faculty
  AgSci                      1                           0
  CLA                       19                           1
  Comm                       1                           0
  ECoS                       2                           2
  Education                  3                           0
  EMS                        0                           0
  Engineering                4                           0
  HHD                       13                           0
  ICDS                       0                           0
  IST                        1                           0
  Libraries                  0                           0
  Medicine                   0                           0
  OVPR                       0                           0
             position
college       Postdoc/Research Faculty Staff Tenure-stream Faculty
  AgSci                              2     1                     0
  CLA                                0     1                     1
  Comm                               0     0                     0
  ECoS                               0     2                     0
  Education                          0     0                     0
  EMS                                1     0                     1
  Engineering                        2     0                     4
  HHD                                1     2                     4
  ICDS                               0     1                     0
  IST                                1     0                     0
  Libraries                          0     1                     1
  Medicine                           1     0                     0
  OVPR                               0     1                     0
             position
college       Undergraduate student
  AgSci                           0
  CLA                             0
  Comm                            0
  ECoS                            1
  Education                       1
  EMS                             0
  Engineering                     6
  HHD                             1
  ICDS                            0
  IST                             0
  Libraries                       0
  Medicine                        0
  OVPR                            0
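xtabs() can also keep the missing values. The sketch below uses hypothetical toy data. With addNA = TRUE, the missing college becomes its own <NA> level and is counted, and stats::addmargins() appends row and column totals:

```r
# Hypothetical toy data with one missing college (not the registration data)
toy <- data.frame(
  college  = c("CLA", "CLA", "HHD", NA),
  position = c("Staff", "Graduate student", "Staff", "Staff")
)

# Default behavior: the row with a missing college is silently dropped
tab_default <- xtabs(~ college + position, data = toy)

# addNA = TRUE keeps missing values as a separate <NA> level
tab_na <- xtabs(~ college + position, data = toy, addNA = TRUE)

sum(tab_default)  # 3
sum(tab_na)       # 4

# Append a "Sum" row and column with the marginal totals
addmargins(tab_na)
```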

Or using {tidyverse} functions, which keep the missing values as an explicit NA row and column.

bootcamp_26 |>
  dplyr::count(college, position) |>
  tidyr::pivot_wider(names_from = position, values_from = n, values_fill = 0) |>
  gt()
college      Graduate student  Postdoc/Research Faculty  Staff  Instructor/Teaching Faculty  Tenure-stream Faculty  Undergraduate student  NA
AgSci                       1                         2      1                            0                      0                      0   0
CLA                        19                         0      1                            1                      1                      0   0
Comm                        1                         0      0                            0                      0                      0   0
ECoS                        2                         0      2                            2                      0                      1   0
EMS                         0                         1      0                            0                      1                      0   0
Education                   3                         0      0                            0                      0                      1   0
Engineering                 4                         2      0                            0                      4                      6   0
HHD                        13                         1      2                            0                      4                      1   1
ICDS                        0                         0      1                            0                      0                      0   0
IST                         1                         1      0                            0                      0                      0   0
Libraries                   0                         0      1                            0                      1                      0   0
Medicine                    0                         1      0                            0                      0                      0   0
OVPR                        0                         0      1                            0                      0                      0   0
NA                          3                         2      1                            0                      1                      1   1
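In the {tidyverse} version, the missing position becomes a column literally named NA. The sketch below uses hypothetical toy data. It gives missing values an explicit label with tidyr::replace_na() before counting and then adds a row total. The label "Unknown" is an arbitrary choice:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with missing values in both variables
toy <- dplyr::tibble(
  college  = c("CLA", "CLA", NA),
  position = c("Staff", NA, "Staff")
)

wide <- toy |>
  # Label missing values so they get a readable row/column name
  tidyr::replace_na(list(college = "Unknown", position = "Unknown")) |>
  dplyr::count(college, position) |>
  tidyr::pivot_wider(names_from = position, values_from = n, values_fill = 0) |>
  # Row totals across all position columns
  dplyr::mutate(total = rowSums(dplyr::across(-college)))
wide
```

The totals column also gives a quick check that every registrant was counted. On the real data, the totals should sum to 94.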