About

This page has links and images that support the material presented in the workshop.

Information sources

Here are links to some of the sources mentioned during the workshop presentations.

Good enough data management and project management practices (Alaina Pearce)

  • Talk slides (PDF).

  • The Turing Way handbook to reproducible, ethical and collaborative data science.

  • Goodman, A., Pepe, A., Blocker, A. W., Borgman, C. L., Cranmer, K., Crosas, M., Di Stefano, R., Gil, Y., Groth, P., Hedstrom, M., Hogg, D. W., Kashyap, V., Mahabal, A., Siemiginowska, A., & Slavkovic, A. (2014). Ten simple rules for the care and feeding of scientific data. PLoS Computational Biology, 10(4), e1003542. https://doi.org/10.1371/journal.pcbi.1003542.

  • Version control

    • GitHub, a web service for version control using git.
    • Subversion, a centralized version control system that is an alternative to git.
  • Penn State Open Science GitHub repo

  • Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10). https://doi.org/10.18637/jss.v059.i10.

Data management policies (Briana Wham and Ana Enriquez)

Case study

  • Sample “untidy” data

Comments on untidy data sample

Discussion

Consulting OpenAI

Just for fun, we thought we would ask OpenAI for some insights about data management practices.

The following code is not run on this page and will not work as-is (for example, it requires an OpenAI API key). It is only a sketch of how to request an image with the openai R package.

ai_img <- openai::create_image(prompt = "good enough data management practices",
                               size = "512x512",
                               response_format = "url")

OpenAI response to “good enough data management practices”

Similarly, the response below was generated from the prompt “suggestions about good enough data management practices”.

openai::create_chat_completion(model = "gpt-3.5-turbo",
                               messages = list(
                                 list(role = "user",
                                      content = "suggestions about good enough data management practices")
                               ))

  1. Regularly backup your data to prevent data loss in case of system failure or accidental deletion.
  2. Keep your data organized by creating folders and subfolders with clear and consistent naming conventions.
  3. Use encryption to protect sensitive data from unauthorized access.
  4. Implement access controls to ensure that only authorized individuals can view or modify certain data.
  5. Regularly review and update data security protocols to stay ahead of potential threats.
  6. Dispose of outdated data securely and in line with data protection regulations.
  7. Conduct regular data audits to ensure the accuracy and completeness of your data.
  8. Train employees on data management best practices to ensure everyone is following proper procedures.
  9. Utilize data management software or tools to automate and streamline data processes.
  10. Seek guidance from data management experts or consultants for additional strategies and recommendations.
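
The chat call returns a structured object, not plain text, so the reply has to be pulled out before it can be shown as a list like the one above. Here is a minimal sketch of one way to do that. It assumes the openai R package returns the reply in a `choices` data frame with a `message.content` column; that shape is an assumption and may differ between package versions. `extract_reply()` and `mock_response` are hypothetical names used only for illustration, and the mock stands in for a real response so the sketch runs without an API key or network access.

```r
# Hypothetical helper: return the reply text from a chat completion object.
# Assumes the response has a `choices` data frame with a `message.content`
# column (an assumption about the openai R package's return value).
extract_reply <- function(response) {
  response$choices$message.content
}

# Mock object standing in for the value returned by
# openai::create_chat_completion(); the text is a placeholder, not real output.
mock_response <- list(
  model = "gpt-3.5-turbo",
  choices = data.frame(
    index = 0L,
    finish_reason = "stop",
    message.role = "assistant",
    message.content = "placeholder reply text",
    stringsAsFactors = FALSE
  )
)

# Extract the reply and print it as plain text.
reply <- extract_reply(mock_response)
cat(reply, "\n")
```

With a real response, the same helper would be applied to the object returned by the chat completion call. That call needs a valid API key, which the package is assumed to read from the `OPENAI_API_KEY` environment variable.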