Data Management and Sharing Plan

The following was updated on 2023-03-01. It is based on document OMB No. 0925-0001 and 0925-0002 (Rev. 07/2022 Approved Through TBD).

Note that the following contains some URLs provided by the user. These are prohibited in a document submitted to NIH. Since this document is not part of an NIH study, we provide them here for transparency.

If any of the proposed research in the application involves the generation of scientific data, this application is subject to the NIH Policy for Data Management and Sharing and requires submission of a Data Management and Sharing Plan. If the proposed research in the application will generate large-scale genomic data, the Genomic Data Sharing Policy also applies and should be addressed in this Plan. Refer to the detailed instructions in the application guide for developing this plan as well as to additional guidance on https://sharing.nih.gov. The Plan is recommended not to exceed two pages. Text in italics should be deleted. There is no “form page” for the Data Management and Sharing Plan. The DMS Plan may be provided in the format shown below.

Public reporting burden for this collection of information is estimated to average 2 hours per response, including the time for reviewing instructions, searching existing data sources, gathering, and maintaining the data needed, and completing and reviewing the collection of information. An agency may not conduct or sponsor, and a person is not required to respond to, a collection of information unless it displays a currently valid OMB control number. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to: NIH, Project Clearance Branch, 6705 Rockledge Drive, MSC 7974, Bethesda, MD 20892-7974, ATTN: PRA (0925-0001 and 0925-0002). Do not return the completed form to this address.

Element 1: Data Type

A. Types and amount of scientific data expected to be generated in the project:

Summarize the types and estimated amount of scientific data expected to be generated in the project.

This project will collect survey responses from approximately 100-500 students, faculty, administrators, and staff at The Pennsylvania State University. The responses will be stored in a single comma-separated values (CSV) text file that is less than 1 GB in size. Derivative CSV files containing subsets of the data may also be generated. The specific questions asked and the IRB application protocol will also be stored as text files, as will the R language code used to analyze the data.

B. Scientific data that will be preserved and shared, and the rationale for doing so:

Describe which scientific data from the project will be preserved and shared and provide the rationale for this decision.

The Penn State Institutional Review Board determined that the project was exempt. Summary results are shared publicly via the web.

At a later date, we may seek permission from the IRB to share non-identifiable case-level data with the public. However, we did not request permission for individual-level sharing from participants at the outset of the study.

C. Metadata, other relevant data, and associated documentation:

Briefly list the metadata, other relevant data, and any associated documentation (e.g., study protocols and data collection instruments) that will be made accessible to facilitate interpretation of the scientific data.

The full protocol, including the survey questions, is shared publicly via a website hosted on GitHub, a code and documentation sharing platform.

Element 3: Standards

State what common data standards will be applied to the scientific data and associated metadata to enable interoperability of datasets and resources, and provide the name(s) of the data standards that will be applied and describe how these data standards will be applied to the scientific data generated by the research proposed in this project. If applicable, indicate that no consensus standards exist.

To our knowledge, no consensus standards apply to these data. However, our publicly shared data cleaning code documents the ways that we modified the original dataset to make it more human-readable. CSV text files are considered to be interoperable.

Element 4: Data Preservation, Access, and Associated Timelines

A. Repository where scientific data and metadata will be archived:

Provide the name of the repository(ies) where scientific data and metadata arising from the project will be archived; see Selecting a Data Repository.

When the project is complete and we receive permission to do so, the data may be archived on Databrary (https://databrary.org). Metadata and summary data/visualizations will be archived on GitHub, https://penn-state-open-science.github.io/survey-fall-2022/.

B. How scientific data will be findable and identifiable:

Describe how the scientific data will be findable and identifiable, i.e., via a persistent unique identifier or other standard indexing tools.

If the project data are shared publicly, Databrary will create a persistent identifier (e.g., DOI).

C. When and how long the scientific data will be made available:

Describe when the scientific data will be made available to other users (i.e., no later than time of an associated publication or end of the performance period, whichever comes first) and for how long data will be available.

Summary data are available now. Individual-level data, if shared, will be made immediately available to the public.

Element 5: Access, Distribution, or Reuse Considerations

A. Factors affecting subsequent access, distribution, or reuse of scientific data:

NIH expects that in drafting Plans, researchers maximize the appropriate sharing of scientific data. Describe and justify any applicable factors or data use limitations affecting subsequent access, distribution, or reuse of scientific data related to informed consent, privacy and confidentiality protections, and any other considerations that may limit the extent of data sharing. See Frequently Asked Questions for examples of justifiable reasons for limiting sharing of data.

Some respondents elected to provide us with contact information for follow-up discussions. Those data elements will be removed from the data before sharing. The identifying information (email addresses, names) is stored on a local, password-protected computer that is used to process the data and render the analyses.

B. Whether access to scientific data will be controlled:

State whether access to the scientific data will be controlled (i.e., made available by a data repository only after approval).

Databrary does not approve data shared by researchers. Authorization to upload and share data is governed by a formal Databrary Access Agreement that binds Databrary and New York University with a researcher’s home institution. The DAA gives institutionally authorized researchers the right to share data and materials with Databrary consistent with the policies of their institution, ethics board approvals, and the permission of research participants. Similarly, while researchers can share data on Databrary and control or approve individual access themselves, Databrary does not require it.

Once shared, data from this project will be available to anyone.

C. Protections for privacy, rights, and confidentiality of human research participants:

If generating scientific data derived from humans, describe how the privacy, rights, and confidentiality of human research participants will be protected (e.g., through de-identification, Certificates of Confidentiality, and other protective measures).

We will remove names, phone numbers, and email addresses (if provided) before sharing any individual-level data.
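
The removal step described above can be illustrated with a minimal sketch. This is not the project's code: the project's analysis code is written in R, and this Python version, the function names, and the column names (name, email, phone) are all assumptions made for illustration. It shows only the dropping of directly identifying columns from a CSV file. It does not address identifying details that might appear in free-text responses.

```python
import csv
import io

# Hypothetical column names; the real survey export may label these differently.
IDENTIFYING_COLUMNS = {"name", "email", "phone"}


def deidentify(rows, drop=IDENTIFYING_COLUMNS):
    """Return copies of the rows with the identifying columns removed."""
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


def deidentify_csv(text, drop=IDENTIFYING_COLUMNS):
    """Read CSV text, drop identifying columns, and return new CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    kept = [f for f in (reader.fieldnames or []) if f not in drop]
    out = io.StringIO()
    # extrasaction="ignore" skips any stray fields from malformed rows.
    writer = csv.DictWriter(
        out, fieldnames=kept, lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(deidentify(reader, drop))
    return out.getvalue()


if __name__ == "__main__":
    # Made-up example row, not real survey data.
    sample = (
        "role,name,email,phone,q1\n"
        "faculty,A. Person,a@example.org,555-0100,agree\n"
    )
    print(deidentify_csv(sample))
```

Dropping columns by name before writing the shared file keeps the original, identifiable file untouched on the local computer, so the shared copy is always derived rather than edited in place.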
