Data Visualization

This section presents simple visualizations of the survey data.

Filter “test” responses

Responses submitted before 2022-12-22 were used to test the survey, so they are excluded from the analysis.

suppressPackageStartupMessages(library(tidyverse))

message("Reimporting clean data file: `",
        "csv/open-science-survey-2022-fall-clean.csv`")
## Reimporting clean data file: `csv/open-science-survey-2022-fall-clean.csv`
survey <-
  readr::read_csv("csv/open-science-survey-2022-fall-clean.csv",
                  show_col_types = FALSE)

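# Keep responses submitted on or after 2022-12-22.
# Convert the timestamp to a date so that dates are compared with dates.
survey <- survey |>
  dplyr::filter(lubridate::as_date(timestamp) >= lubridate::as_date("2022-12-22"))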

There are \(n = 34\) questions in total.

As of 2023-05-08 08:28:50, we have had \(n = 100\) responses.
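The response count reported above depends on the date filter applied earlier. The following is a minimal sketch of that filter-then-count step in base R, using a small invented data frame (`toy_survey` and its rows are assumptions for illustration, not the survey data):

```r
# Toy stand-in for `survey`; the rows are invented for illustration.
toy_survey <- data.frame(
  timestamp = as.POSIXct(
    c("2022-12-20 09:00:00", "2022-12-22 10:30:00", "2023-01-15 14:00:00"),
    tz = "UTC"
  ),
  campus = c("A", "B", "B")
)

# Keep responses submitted on or after the cutoff date.
cutoff <- as.Date("2022-12-22")
kept <- toy_survey[as.Date(toy_survey$timestamp, tz = "UTC") >= cutoff, ]

# Number of retained responses
n_responses <- nrow(kept)
n_responses
```

Converting the timestamp to a date before comparing keeps both sides of the comparison in the same class, which avoids relying on how R compares date-times with dates.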

Time series of responses by date

survey <- survey |>
  mutate(resp_index = seq_along(timestamp))

survey |>
  ggplot() +
  aes(timestamp, resp_index) +
  geom_point() +
  geom_line()

Figure 22.1: Time series of responses
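The response index plotted above only reads as a cumulative count if the rows are in chronological order. As a hedged sketch, assuming the rows might not already be sorted, the index could be built like this (the timestamps below are invented):

```r
# Invented timestamps, deliberately out of order.
ts <- as.POSIXct(
  c("2023-01-03 12:00:00", "2023-01-01 08:00:00", "2023-01-02 17:30:00"),
  tz = "UTC"
)

# Sort first, then number the responses 1..n so the index is a cumulative count.
ts_sorted <- sort(ts)
resp_index <- seq_along(ts_sorted)

data.frame(timestamp = ts_sorted, resp_index = resp_index)
```

If the cleaned CSV is already stored in submission order, the sort is a no-op and the original code gives the same result.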

Penn State campus

Question 1: What Penn State campus do you represent?

survey |>
  dplyr::filter(!is.na(campus)) |>
  ggplot2::ggplot() +
  ggplot2::aes(campus) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.2: What Penn State campus do you represent?

Primary department or unit

Question 2: What is your primary department/unit?

survey |>
  dplyr::filter(!is.na(department)) |>
  ggplot2::ggplot() +
  ggplot2::aes(department) +
  ggplot2::geom_bar() +
  labs(x = NULL, y = "N responses") +
  coord_flip() +
  theme_light()

Figure 22.3: What is your primary department or unit?

Position

Question 3: What is your position at Penn State?

survey |>
  ggplot2::ggplot() +
  ggplot2::aes(position) +
  ggplot2::geom_bar() +
  labs(x = NULL, y = "N responses") +
  coord_flip() +
  theme_light()

Figure 22.4: What is your position at Penn State?

Highest post-secondary degree

Question 33: What is the highest post-secondary degree you have earned?

survey |>
  dplyr::filter(!is.na(highest_degree_earned)) |>
  ggplot2::ggplot() +
  ggplot2::aes(highest_degree_earned) +
  ggplot2::geom_bar() +
  labs(x = NULL, y = "N responses") +
  coord_flip() +
  theme_light()

Figure 22.5: What is the highest post-secondary degree you have earned?

Years since highest degree

Question 4: How many years have passed since you completed that degree?

survey |>
  dplyr::mutate(years_since_degree = factor(
    years_since_degree,
    c("< 2 years", "2-5 years", "5-10 years", "10+"),
    ordered = TRUE
  )) |>
  dplyr::filter(!is.na(years_since_degree)) |>
  ggplot2::ggplot() +
  ggplot2::aes(years_since_degree) +
  ggplot2::geom_bar() +
  labs(x = NULL, y = "N responses") +
  coord_flip() +
  theme_light()

Figure 22.6: How many years have passed since you completed that degree?
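The `factor()` call above silently converts any response that does not exactly match one of the listed levels to `NA`, and the following `filter()` then drops it. A minimal sketch of a check for such mismatches, using invented responses (the value `"10+ years"` is a made-up example of a label that does not match the level `"10+"`):

```r
# Invented responses; "10+ years" deliberately does not match the level "10+".
responses <- c("2-5 years", "< 2 years", "10+ years", "5-10 years", NA)
lvls <- c("< 2 years", "2-5 years", "5-10 years", "10+")

# Values present in the data but missing from the level set
unmatched <- setdiff(unique(responses[!is.na(responses)]), lvls)
unmatched

# factor() turns the unmatched value into NA, in addition to the true NA
f <- factor(responses, levels = lvls, ordered = TRUE)
sum(is.na(f))
```

Running a check like this before plotting makes it easier to tell genuinely missing answers apart from answers lost to a spelling or capitalization mismatch in the level labels.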

Primary types of data

Question 5: What are the primary types of digital data that are used in your research? (choose all that apply)

data_types <- survey |>
  dplyr::select(contains("collect_")) |>
  tidyr::pivot_longer(
    cols = c(
      'collect_audio',
      'collect_video',
      'collect_photos',
      'collect_computer_data',
      'collect_sensor',
      'collect_docs',
      'collect_models',
      'collect_obs',
      'collect_sims',
      'collect_procedures',
      'collect_txt',
      'collect_genomic',
      'collect_image',
      'collect_surveys',
      'collect_spreadsheets',
      'collect_interviews',
      'collect_gis',
      'collect_sketches',
      'collect_vr',
      'collect_xml_json',
      'collect_web_social'
    ),
    names_to = "data_collect_types",
    values_to = "data_collect_vals"
  ) |>
  dplyr::mutate(data_collect_types = str_remove(data_collect_types, "collect_"))

data_types |>
  ggplot2::ggplot() +
  ggplot2::aes(data_collect_types,
               as.numeric(data_collect_vals)) +
  ggplot2::geom_col() +
  xlab("Types of data collected") +
  ylab("N responses") +
  coord_flip() +
  theme_light()
## Warning: Removed 21 rows containing missing values
## (`position_stack()`).

Figure 22.7: What are the primary types of digital data that are used in your research?
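The warning above comes from missing values reaching `geom_col()`. As a sketch of an alternative, assuming the `collect_*` columns are logical or 0/1 indicators (an assumption based on the `as.numeric()` call, not confirmed by the source), the selections can be tallied first with missing values ignored. The toy data frame below is invented:

```r
# Invented checkbox columns: TRUE = selected, NA = question skipped.
toy <- data.frame(
  collect_audio = c(TRUE, FALSE, NA),
  collect_video = c(TRUE, TRUE, NA),
  collect_gis   = c(FALSE, FALSE, NA)
)

# Count selections per column, ignoring skipped responses
counts <- colSums(toy, na.rm = TRUE)

# Strip the shared prefix to get readable labels
names(counts) <- sub("^collect_", "", names(counts))
counts
```

Plotting pre-computed counts gives the same bar heights as stacking one row per respondent, without the missing-value warning.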

Restricted data

Question 6: Do you collect data that have legal or ethical restrictions governing who may access it or how it may be used?

restricted <- survey |>
  dplyr::select(-restricted_data) |>
  tidyr::pivot_longer(
    cols = contains("restricted"),
    names_to = "restricted_data_types",
    values_to = "restricted_data_vals"
  ) |>
  dplyr::mutate(restricted_data_types = str_remove(restricted_data_types, "restricted_"))

restricted |>
  ggplot2::ggplot() +
  ggplot2::aes(restricted_data_types,
               as.numeric(restricted_data_vals)) +
  ggplot2::geom_col() +
  xlab("Types of restrictions") +
  ylab("N responses") +
  coord_flip() +
  theme_light()
## Warning: Removed 4 rows containing missing values
## (`position_stack()`).

Figure 22.8: Respondents who collect restricted data of varied types

Where are data for active projects stored?

Question 7: Where do you store data for active projects where data collection and analysis is still ongoing?

storage_active <- survey |>
  dplyr::select(-storage_active_projects) |>
  tidyr::pivot_longer(cols = contains("store"),
                      names_to = "store_where",
                      values_to = "store_where_vals") |>
  dplyr::mutate(store_where = str_remove(store_where, "store_"))

storage_active |>
  ggplot2::ggplot() +
  ggplot2::aes(store_where,
               as.numeric(store_where_vals)) +
  ggplot2::geom_col() +
  # scale_y_continuous(name = "N responses",
  #                    breaks = c(0:length(storage_active$store_where_vals))) +
  xlab("Where data is stored") +
  ylab("N responses") +
  coord_flip() +
  theme_light()

Figure 22.9: Locations for storing active research data

Importance of sharing with research collaborators

Question 8: How important to you is sharing data from active projects with research collaborators at Penn State or outside of Penn State?

survey |>
  dplyr::mutate(importance_sharing_collab = factor(
    importance_sharing_collab,
    c(
      "Not Important",
      "Slightly important",
      "Moderately important",
      "Important",
      "Very important"
    )
  )) |>
  dplyr::filter(!is.na(importance_sharing_collab)) |>
  ggplot2::ggplot() +
  ggplot2::aes(importance_sharing_collab) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.10: How important to you is sharing data from active projects with research collaborators at Penn State or outside of Penn State?

Convenience of sharing with research collaborators

Question 9: How convenient is it for you to share data from active projects with research collaborators at Penn State or outside of Penn State?

survey |>
  dplyr::mutate(convenience_sharing_collab = factor(
    convenience_sharing_collab,
    c(
      "Very inconvenient",
      "Inconvenient",
      "Neither",
      "Convenient",
      "Very convenient",
      "Not applicable"
    )
  )) |>
  dplyr::filter(!is.na(convenience_sharing_collab)) |>
  ggplot2::ggplot() +
  ggplot2::aes(convenience_sharing_collab) +
  ggplot2::geom_bar() +
  theme_light()

Barriers to sharing with research collaborators

Question 10: What are the main barriers to sharing data from active projects with research collaborators?

survey |>
  dplyr::filter(!is.na(barriers_sharing_collab)) |>
  dplyr::select(barriers_sharing_collab) |>
  knitr::kable(format = 'html')
barriers_sharing_collab
Different people use different ways of sharing/storing data (Box, Dropbox, Github, Drive, etc.)
getting access approved for external users
ensuring that users adhere to data security and the time required to prepare data sets for sharing
Lots of different file types, collaborators not used to using file structure/naming structure
I need to use dropbox since nobody has box or one drive at other institutions.
Onedrive has a bizarre sharing UI. I am starting to understand it but it is quite counterintuitive compared to google drive
Getting permissions correctly established on the ICDS system. Adding members on ICDS. Getting liberal arts computers to communicate with ICDS systems (i.e., mapping drives). For whatever reason this seems to take days-weeks, every time we have a new person.
Creating clean public use files and clear documentation.
Access to different platforms that might not allow colleagues at other universities to access the data
No technological or legal barriers. But sharing is not always necessary or desirable prior to a project’s completion, so the barrier (if you could call it that) is one deliberately created by me as researcher.
size
The ethical approvals required and the types of data that can be shared. I think things are getting better since we are learning how to share these types of data
having a shared box outside people can access that is not a Google drive
There are no barriers; my data is publicly available and if it goes missing / gets deleted I share the backups I make with collaborators.
Sharing legal administrative data and PII are difficult
na
Getting them on OneDrive if they are not already using it
None really.
N/A
When my collaborators are outside of Penn State sometimes they have challenges to log in to OneDrive.
No big barriers. Email etc. is fine when needed.
OneDrive can be confusing; sometimes hard to share with people outside of PSU
Data organization - When you share a folder, the person who receives it is unable to organize that folder within their system in the place they want it to appear. So, I have a large list of shared folders, none of which are organized in a way that makes sense to me.
File size and collaborators’ lack of access to/use of OneDrive
That PSU changes from Box to One Drive to Fill-In-The-Blank on a semi-frequent basis.
IRB
Tools that make it easy. We fight with Globus (and it doesn’t work for some other partner Universities) and using NCBI SRA database is clunky, with limited metadata entry that can occur.
Large file sizes and PSU’s data security protocols
office of sponsored projects/legal places unreasonable restrictions on data sharing and data in general; their default is to be hypercautious and treat all data as if it were highly sensitive; for convincing them otherwise, the process takes MONTHS. This is the biggest impediment to human subjects research at PSU. Not IRB. Not our external data providers or collaborators. it is the risk-averse legal/bureaucratic environment at PSU, which is out of line with our peer institutions.
Not applicable
It can be cumbersome to make large numbers of image datasets available through globus or by transferring to Sharepoint. Many major collaborators of ours are set up on ICDS with sponsored accounts, but even then it isn’t always easy to get others set up with globus or access.
The IRB requirements for data sharing sometimes get in the way
As an ethnographer, the primary concern is sharing data due to ethical concerns.
OSF is not user friendly, and there is some concern that materials will be used before we have a chance to get our research published.
No major barriers. File-sharing and version control software isn’t perfect, but it’s come a long way!
Analysis of large video datasets is too slow over the web.
lack of stability- moving from one system to another (Box to Onedrive) causes confusion, wastes time
None
my own organization skills; laziness; need for me to spend extra time getting labels and format into shape that others can use
We often resort to google drives, but I find cloud-based repositories of data often difficult to navigate
Confidentiality
Access to PSU server; PSU and IRB changing cloud services and where it is acceptable to house data
Security of online systems
Knowing which server to use
providing collaborators outside Penn State with access to secure Penn State resources like OneDrive/Sharepoint is possible but requires quite a bit of effort. Data use agreements between institutions sometimes take a long time to finalize.
Getting collaborators access to shared drives or folders (e.g. having to request someone have access to folders on Roar).
Concerns of deductive disclosure, the time to build the needed infrastructure, time to properly make data FAIR without funds to support it
Sharing my own data is no problem. It is getting data from others that is difficult.
Office of Research Protections Human Subject data research restrictions
Confidentiality
Penn State changing their storage company too often
Usability of sharepoint and microsoft drive
Biggest barrier is to external collaborators
Regulatory restrictions (e.g. HIPAA, FERPA), and interoperability issues (e.g. ROAR->OneDrive). Plus, just tracking who can access what when data are sensitive in parts, and tracking who has which version of versioned data sets.
Person time to provide data and documentation
Granting/maintaining access
TIME, data management knowledge and skills, versioning and changes after you’ve shared the data, knowing what to share (raw vs. cleaned vs. scored), knowing where to store shared files, participant privacy concerns
Technical infrastructure that supports a single source of truth for data and metadata
scale of data
The Sharepoint file sharing process isn’t fantastic. Links that were created correctly stop working at random times, or permissions change.
volume of data (few files, large sizes)
Penn State switching data sharing sources (i.e. Box to One Drive) every couple of years.
None
Size of the data
Sometimes onedrive does not work well for collaborating with institutions outside of penn state.
If I was a researcher: knowing who to collaborate with, scheduling and timelines with busy collaborators, and some data may not be shared easily (if done electronically).
the default is that you should not do this
Understandability
Interview and survey results have PII
The microsoft products (OneDrive etc) are much worse than our former Box system. I avoid using them. In contrast, Box was fine. Dropbox also is fine (I have a self-paid Dropbox account I use now since Box was discontinued at PSU).

Importance of sharing with research community

Question 11: How important to you is sharing data from completed projects with the broader research community (i.e., not direct collaborators)?

survey |>
  dplyr::mutate(importance_share_community = factor(
    importance_share_community,
    c(
      "Not Important",
      "Slightly important",
      "Moderately important",
      "Important",
      "Very important"
    )
  )) |>
  dplyr::filter(!is.na(importance_share_community)) |>
  ggplot2::ggplot() +
  ggplot2::aes(importance_share_community) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.11: How important to you is sharing data from completed projects with the broader research community (i.e., not direct collaborators)?

Obstacles to sharing with research community

Question 12: Which of the following obstacles make sharing data with the research community harder for you? Mark all that apply.

sharing_comm_obstacles <- survey |>
  dplyr::select(-barriers_share_community, -barriers_sharing_collab) |>
  tidyr::pivot_longer(cols = contains("barriers"),
                      names_to = "barriers_comm_what",
                      values_to = "barriers_comm_vals") |>
  dplyr::mutate(barriers_comm_what = str_remove(barriers_comm_what, "barriers_sharing_"))

sharing_comm_obstacles |>
  ggplot2::ggplot() +
  ggplot2::aes(barriers_comm_what,
               as.numeric(barriers_comm_vals)) +
  ggplot2::geom_col() +
  xlab("Barriers to sharing with the research community") +
  ylab("N responses") +
  coord_flip() +
  theme_light()
## Warning: Removed 35 rows containing missing values
## (`position_stack()`).

Figure 22.12: Obstacles to sharing with research community

Requirements for data sharing from funders

Question 13: Do research funders in your field require data sharing?

survey |>
  ggplot2::ggplot() +
  ggplot2::aes(funders_require_data_sharing) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.13: Do research sponsors/funders in your field require data sharing?

Requirements for data sharing from journals

Question 14: Do journals in your field require data sharing?

survey |>
  ggplot2::ggplot() +
  ggplot2::aes(journals_require_data_sharing) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.14: Do journals in your field require data sharing?

Where has data been shared?

Question 15: If you have shared data with the research community, where have you shared it?

sharing_where <- survey |>
  dplyr::select(
    -where_shared_community,
    -share_analysis_code_collab,
    -share_analysis_code_community
  ) |>
  tidyr::pivot_longer(
    cols = c(
      'share_inst_repo',
      'share_journal_suppl',
      'share_lab_web',
      'share_ext_repo',
      'share_govt_repo',
      'share_consortia'
    ),
    names_to = "share_where_target",
    values_to = "share_where_vals"
  ) |>
  dplyr::mutate(share_where_target = str_remove(share_where_target, "share_"))

sharing_where |>
  ggplot2::ggplot() +
  ggplot2::aes(share_where_target,
               as.numeric(share_where_vals)) +
  ggplot2::geom_col() +
  xlab("Where data are shared") +
  ylab("N responses") +
  coord_flip() +
  theme_light()
## Warning: Removed 114 rows containing missing values
## (`position_stack()`).

Figure 22.15: Where data has been shared

How well-equipped are we to meet data management and sharing requirements?

Question 16: How well-equipped do you feel you, your colleagues, and trainees are to meet data management and sharing requirements of sponsors/funders or journals?

survey |>
  dplyr::mutate(equipped_data_mgmt_sharing = factor(
                  equipped_data_mgmt_sharing,
                  c(
                    "Not equipped at all",
                    "Slightly equipped",
                    "Moderately equipped",
                    "Equipped",
                    "Very well equipped"
                  )
                )) |>
  dplyr::filter(!is.na(equipped_data_mgmt_sharing)) |>
  ggplot2::ggplot() +
  ggplot2::aes(equipped_data_mgmt_sharing) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.16: How well-equipped do you feel you, your colleagues, and trainees are to meet data management and sharing requirements of sponsors/funders or journals?


Frequency of code generation

Question 17: How often do you create computer scripts or data analysis code in the conduct of your research?

survey |>
  dplyr::mutate(create_analysis_code = factor(
    create_analysis_code,
    c("Never", "Rarely",
      "Sometimes", "Often", "Always")
  )) |>
  dplyr::filter(!is.na(create_analysis_code)) |>
  ggplot2::ggplot() +
  ggplot2::aes(create_analysis_code) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.17: How often do you create computer scripts or data analysis code in the conduct of your research?

Frequency of code sharing with collaborators

Question 18: How often do you share computer scripts or data analysis code with direct research collaborators?

survey |>
  dplyr::mutate(share_analysis_code_collab = factor(
                  share_analysis_code_collab,
                  c("Never", "Rarely",
                    "Sometimes", "Often", "Always")
                )) |>
  dplyr::filter(!is.na(share_analysis_code_collab)) |>
  ggplot2::ggplot() +
  ggplot2::aes(share_analysis_code_collab) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.18: How often do you share computer scripts or data analysis code with direct research collaborators?

Creation of other types of software

Question 19: Do you create other kinds of software in the conduct of your research?

survey |>
  ggplot2::ggplot() +
  ggplot2::aes(create_other_code) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.19: Do you create other kinds of software in the conduct of your research?

Frequency of use of open source code-sharing tools

Question 20: How often do you use open source code sharing tools (e.g., GitHub, GitLab, BitBucket)?

survey |>
  dplyr::mutate(use_code_sharing_tools = factor(
    use_code_sharing_tools,
    c("Never", "Rarely",
      "Sometimes", "Often", "Always")
  )) |>
  dplyr::filter(!is.na(use_code_sharing_tools)) |>
  ggplot2::ggplot() +
  ggplot2::aes(use_code_sharing_tools) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.20: How often do you use open source code sharing tools (e.g., GitHub, GitLab, BitBucket)?

Requirements for code sharing from funders

Question 21: Do funders in your field require code sharing?

survey |>
  dplyr::filter(!is.na(funders_require_code_sharing)) |>
  ggplot2::ggplot() +
  ggplot2::aes(funders_require_code_sharing) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.21: Do sponsors/funders in your field require code sharing?

Requirements for code sharing from journals

Question 22: Do journals in your field require code sharing?

survey |>
  # Filter on the journals column that is plotted below
  dplyr::filter(!is.na(journals_require_code_sharing)) |>
  ggplot2::ggplot() +
  ggplot2::aes(journals_require_code_sharing) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.22: Do journals in your field require code sharing?

Frequency of open code sharing

Question 34: How often do you share computer scripts or data analysis code openly?

survey |>
  dplyr::mutate(share_analysis_code_community = factor(
                  share_analysis_code_community,
                  c("Never", "Rarely",
                    "Sometimes", "Often", "Always")
                )) |>
  dplyr::filter(!is.na(share_analysis_code_community)) |>
  ggplot2::ggplot() +
  ggplot2::aes(share_analysis_code_community) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.23: How often do you share computer scripts or data analysis code openly?

Share other materials

Question 23: How often do you openly share other materials related to your research (protocols, reagents, samples, apparatus, designs, etc.) with other researchers?

survey |>
  dplyr::mutate(share_materials_community = factor(
                  share_materials_community,
                  c(
                    "Never",
                    "Rarely",
                    "Sometimes",
                    "Often",
                    "Always",
                    "Not applicable"
                  )
                )) |>
  dplyr::filter(!is.na(share_materials_community)) |>
  ggplot2::ggplot() +
  ggplot2::aes(share_materials_community) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.24: How often do you openly share other materials related to your research (protocols, reagents, samples, apparatus, designs, etc.) with other researchers?


Experience/Knowledge of Open Science

Question 24: What is your experience with/knowledge of open science practices?

survey |>
  dplyr::mutate(knowledge_open_science = factor(
    knowledge_open_science,
    c("None", "Limited",
      "Some", "Considerable", "Extensive")
  )) |>
  dplyr::filter(!is.na(knowledge_open_science)) |>
  ggplot2::ggplot() +
  ggplot2::aes(knowledge_open_science) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.25: What is your experience with/knowledge of open science practices?

Awareness of FAIR principles

Question 25: Describe your awareness of the FAIR (findable, accessible, interoperable, reusable) principles pertaining to research data.

survey |>
  dplyr::mutate(awareness_FAIR = factor(
                  awareness_FAIR,
                  c("None", "Limited",
                    "Some", "Considerable", "Extensive")
                )) |>
  dplyr::filter(!is.na(awareness_FAIR)) |>
  ggplot2::ggplot() +
  ggplot2::aes(awareness_FAIR) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.26: Describe your awareness of the FAIR (findable, accessible, interoperable, reusable) principles pertaining to research data.

Application of FAIR principles in your own work

Question 26: Do you apply FAIR principles in your own data management and sharing practices?

survey |>
  dplyr::mutate(apply_FAIR = factor(
                  apply_FAIR,
                  c("Never", "Rarely",
                    "Sometimes", "Often", "Always", "Not applicable")
                )) |>
  dplyr::filter(!is.na(apply_FAIR)) |>
  ggplot2::ggplot() +
  ggplot2::aes(apply_FAIR) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.27: Do you apply FAIR principles in your own data management and sharing practices?

Heard of reproducibility crisis?

Question 27: Have you heard of the “reproducibility crisis” in science?

survey |>
  dplyr::filter(!is.na(heardof_reproducibility_crisis)) |>
  ggplot2::ggplot() +
  ggplot2::aes(heardof_reproducibility_crisis) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.28: Have you heard of the ‘reproducibility crisis’ in science?

Is there a crisis in your area?

Question 28: Is there a reproducibility crisis in your area of research?

survey |>
  dplyr::mutate(my_area_reproducibility_crisis = factor(
    my_area_reproducibility_crisis,
    c("Don't know", "No, there is no crisis", "Yes, a slight crisis", "Yes, a significant crisis")
  )) |>
  dplyr::filter(!is.na(my_area_reproducibility_crisis)) |>
  ggplot2::ggplot() +
  ggplot2::aes(my_area_reproducibility_crisis) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.29: Is there a reproducibility crisis in your area of research?

Benefit from PSU center

Question 29: How much benefit would you derive from a center at Penn State focused on supporting the adoption of best practices in data management and sharing, code sharing, open science, and reproducible research?

survey |>
  dplyr::mutate(benefit_psu_center = factor(
                  benefit_psu_center,
                  levels = c(
                    "None",
                    "Minimal",
                    "Some",
                    "Considerable",
                    "Extensive",
                    "Not applicable"
                  )
                )) |>
  dplyr::filter(!is.na(benefit_psu_center)) |>
  ggplot2::ggplot() +
  ggplot2::aes(benefit_psu_center) +
  ggplot2::geom_bar() +
  theme_light()

Figure 22.30: How much benefit would you derive from a center at Penn State focused on supporting the adoption of best practices in data management and sharing, code sharing, open science, and reproducible research?

Services from PSU Center

Question 30: Select the services that would most benefit your research if offered by such a center.

services_center <- survey |>
  dplyr::select(contains("help_")) |>
  tidyr::pivot_longer(
    cols = c(
      'help_data_review_qa',
      'help_data_mgmt_plan',
      'help_data_doc',
      'help_data_analysis_verif',
      'help_student_staff_train',
      'help_data_deidentif',
      'help_funder_compliance',
      'help_where_to_share'
    ),
    names_to = "center_services_types",
    values_to = "center_services_vals"
  ) |>
  dplyr::mutate(center_services_types = str_remove(center_services_types, "help_"))

services_center |>
  ggplot2::ggplot() +
  ggplot2::aes(center_services_types,
               as.numeric(center_services_vals)) +
  ggplot2::geom_col() +
  xlab("Services that would benefit research") +
  ylab("N responses") +
  coord_flip() +
  theme_light()
## Warning: Removed 72 rows containing missing values
## (`position_stack()`).

Figure 22.31: Select the services that would most benefit your research if offered by such a center.

Comments

Question 31: Any final comments about data management, data sharing, and open science?

survey |>
  dplyr::filter(!is.na(comments)) |>
  dplyr::select(comments) |>
  knitr::kable(format = 'html')
comments
I think that this is a particularly challenging problem for longitudinal research studies and studies collecting data from vulnerable or sensitive populations. A thoughtful approach that takes developmental and contextual concerns would be very helpful! Overall I think you have hit on the key aspects of open science issues/questions. I’m not 100% sure if there is anything specific about developmental research that should be included, but that is something rather unique that might be worth considering as an additional focus/question
The practices of data and materials sharing is very important to me! I think in most cases this training should be a required part of advanced degrees in fields dealing with data. This center would be a huge step in that direction.
The Assoc Dean for Research in the College of Liberal Arts asked the college’s faculty to complete this survey, but neither its concepts and language nor its goals and purpose fit what humanities faculty do. "Open science" or "open access" concepts are not usefully applicable to the humanities. In fact, they undermine the very foundations of the international system of research and publication that sustains our scholarship. If such concepts were imposed upon the humanities, all its disciplines would be rapidly be destroyed.
I work in the areas on open science, specifically on the (reproducibility and) replicability of research and in linguistics this is a major concern for the field. Progress is slow, but you can see movement. I incorporate open science initiatives into my graduate teaching training because I think a lot of this has to do with education (e.g., why preregister, why share materials/data, questionable research practices)
In my field I have to create new data and I can not rely on the quality of data that others say they use or have. It is usually not up to the standards that I feel are needed for accurate social science research that addresses historic questions.
none
How would this work for the Humanities?
A center at Penn State would be great, but especially if they could provide actual direct help in terms of time and work with the data. Open Science is a great goal, but the amount of time and energy it can take to make things accessible is a major barrier since it’s ON TOP of all the other tasks of research which have not changed.
My field requires sharing code and data for virtually all journals, so this isn’t an issue for me. 5-10 years ago it was a problem.
Knowledge about resources, especially here at Penn State but also more broadly, would be really useful.
information and technical knowledge are relatively easy barriers to overcome; the bureaucratic/compliance/hyperlegal environment at PSU is the real barrier to open science and data sharing. good luck changing that…
I’m committed to sharing data, protocols, and code. I’ve been somewhat less effective than I would like in implementing plans and protocols throughout data collection and processing that would ensure effective sharing of raw data through online repositories (e.g., MorphoSource). We also generate a lot of code for our work and share most of it on GitHub but we don’t have the resources or skill to make that code easy to implement for novices or researchers outside our collaborative group (i.e., no software engineering or program development abilities that would make sharing code really useful for others).
Although these issues are not applicable to my research directly, they are certainly relevant to colleagues whose work I support.
Our field of education faces the tension between strict IRB requirements and the emerging requirements for data sharing. PIs often feel caught between these two ethical principles for quality research.
Perhaps change needs to come from the publishers of research. If all required open science then there would be greater incentives to prepare for that at outset of research
This seems more applicable to bio-behavioral sciences, RCTs, and so forth, none of which I really do.
need funding for open access
I want to participate in open science and teach my students to do so but it seems hard and confusing. I have attended a few trainings/workshops, but a center offering comprehensive, end-to-end support and services would be extremely valuable to me. I can’t attend the meeting on Feb 17 because I teach, I hope it will be recorded.
Although I always try to share data, code, protocols, etc., the time it requires takes away from other areas of productivity and isn’t recognized in any way. It often feels like I am shooting myself in the foot by taking the time to prepare for sharing data because it takes away time from papers and grants that are recognized in promotion and job applications.
For decades I have tried to employ what is now called "open science." And I still do.
It’s very difficult to say how helpful this kind of initiative would be without knowing what it would do. People associated with this process need to be embedded in pre-award services, and should be available to help when people come in to cost out computation resources and set up Data Monitoring Plans. Openness should be built in from the start, and included in the grant costs.
Our team has been focused on a lot of these topics! I am part of a data management support group for women in education research, which has been extremely helpful. I’ve networked with and learned from people from other universities in child development and education sciences facing similar challenges. I’d be happy to share those resources and knowledge if helpful. Here at Penn State, one of our team members is a computer programmer and scientist and has been creating an automatic way for us to de-identify, process, clean, merge, and create documentation for our data. It has been extremely helpful as a prototype, and we are hoping to share it with others because we think it could help others, too! They recently won 2nd place in Penn State’s Techcelerator competition for this work.
I am an education researcher with a background in software engineering, and have been working on software to facilitate data management workflows in large studies. I see a lot of barriers in the research community to using best data management practices (e.g. git, data pipelines like snakemake) and am passionate about writing software that bridges these gaps. I can speak both researcher and software engineer! Feel free to reach out if you’d like to chat more.
No
Large data is hard to share
It is incredibly important!
No
Good!
While my field does not benefit as much from these services, they are vital for many STEM fields
Too many different sites/rules/repositories. Having just one would be much better.