ai_img <- openai::create_image(prompt = "good enough data management practices", size = "512x512", response_format = "url")
About
This page has links and images that support the material presented in the workshop.
Information sources
Here are links to some of the sources mentioned during the workshop presentations.
Good enough data management and project management practices (Alaina Pearce)
Talk slides (PDF).
The Turing Way handbook to reproducible, ethical and collaborative data science.
Goodman, A., Pepe, A., Blocker, A. W., Borgman, C. L., Cranmer, K., Crosas, M., Di Stefano, R., Gil, Y., Groth, P., Hedstrom, M., Hogg, D. W., Kashyap, V., Mahabal, A., Siemiginowska, A., & Slavkovic, A. (2014). Ten simple rules for the care and feeding of scientific data. PLoS Computational Biology, 10(4), e1003542. https://doi.org/10.1371/journal.pcbi.1003542.
Version control
- GitHub, a web service for version control using git.
- Subversion (SVN), a centralized version control system and an alternative to git.
Penn State Open Science GitHub repo
Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10.
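In a tidy table, each variable forms a column and each observation forms a row. The sketch below uses a small hypothetical data set (the `subject`, `visit1`, `visit2`, and `score` names are made up for illustration) and base R's `reshape()` to turn a wide table, with one column per visit, into a long table with one row per subject and visit.

```r
# Hypothetical wide ("untidy") table: the visit number is hidden in the column names
wide <- data.frame(
  subject = c("s01", "s02"),
  visit1  = c(5.1, 4.8),
  visit2  = c(5.4, 4.9)
)

# Reshape to long ("tidy") form: one row per subject-visit observation,
# with the visit stored in its own column instead of in the column names
tidy <- reshape(
  wide,
  direction = "long",
  varying   = c("visit1", "visit2"),
  v.names   = "score",
  timevar   = "visit",
  times     = 1:2,
  idvar     = "subject"
)

print(tidy)
```

If you work in the tidyverse, `tidyr::pivot_longer()` does the same job with a different interface.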
Data management policies (Briana Wham and Ana Enriquez)
Talk slides (PPTX).
The 2022 White House Office of Science and Technology Policy (OSTP) memo by Dr. Alondra Nelson (the “Nelson memo”), “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research”
Data Management Plan Tool (DMPTool)
Penn State Policy IP01 “Ownership and Management of Intellectual Property (Formerly RA11 - Patents and Copyrights [Intellectual Property])”
Case study
- Sample “untidy” data
- Problems caused by Microsoft Excel automatically converting gene names to dates (e.g., SEPT2 becomes 2-Sep)
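As a rough illustration of the gene-name problem, the sketch below flags values in a hypothetical gene-symbol column that look like Excel's day-month date text (for example, SEPT2 shown as 2-Sep). The `genes` vector is made-up example data, and the check covers only this one pattern; dates stored in other forms would need their own checks.

```r
# Hypothetical gene-symbol column after a round trip through Excel:
# SEPT2 and MARCH1 were auto-converted to dates and exported as text
genes <- c("TP53", "2-Sep", "BRCA1", "1-Mar", "MYC")

# Build a pattern for "<1-2 digit day>-<month abbreviation>", e.g. "2-Sep"
month_pattern <- paste(month.abb, collapse = "|")
date_regex <- paste0("^[0-9]{1,2}-(", month_pattern, ")$")

# TRUE for values that look like converted dates rather than gene symbols
looks_like_date <- grepl(date_regex, genes)

genes[looks_like_date]
```

A check like this is cheap to run whenever a file has passed through a spreadsheet, before the mangled values reach an analysis.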
Discussion
- Catch-22 and Joseph Heller’s career at Penn State
Consulting OpenAI
Just for fun, we asked one of OpenAI's models for some insights about data management practices.
The code below is not evaluated on this page and will not work as-is; it only sketches how such a request can be made.
The response shown after it was generated from the prompt “suggestions about good enough data management practices”.
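library(openai)  # R client for the OpenAI API; reads the key from the OPENAI_API_KEY env var

response <- create_chat_completion(
  model = "gpt-3.5-turbo",
  messages = list(
    list(role = "user",
         content = "suggestions about good enough data management practices")
  )
)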
- Regularly backup your data to prevent data loss in case of system failure or accidental deletion.
- Keep your data organized by creating folders and subfolders with clear and consistent naming conventions.
- Use encryption to protect sensitive data from unauthorized access.
- Implement access controls to ensure that only authorized individuals can view or modify certain data.
- Regularly review and update data security protocols to stay ahead of potential threats.
- Dispose of outdated data securely and in line with data protection regulations.
- Conduct regular data audits to ensure the accuracy and completeness of your data.
- Train employees on data management best practices to ensure everyone is following proper procedures.
- Utilize data management software or tools to automate and streamline data processes.
- Seek guidance from data management experts or consultants for additional strategies and recommendations.