Reproducible workflows

…with R Markdown, Quarto

Rick Gilmore

Psychology/CSC

Jennifer Valcin

PSU Libraries

Motivation

What’s your project’s bus number?

  • Could a colleague regenerate/reproduce your work?
  • Methods reproducibility (Goodman, Fanelli, & Ioannidis, 2016)
    • Precursor to/prerequisite for other kinds of reproducibility

DRY WIT

  • Don’t Repeat Yourself
    • Script/automate
    • Use functions
  • Write It Down
    • It = what you did and how
    • In a form someone else can make use of
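
A minimal sketch of the "use functions" idea in R. The function name `summarize_scores` and the example vectors are hypothetical, made up for illustration; the point is only that one documented function replaces several copy-pasted blocks.

```r
# Hypothetical helper: one function instead of repeated copy-paste.
summarize_scores <- function(x) {
  # Drop missing values once, here, rather than in every analysis step.
  x <- x[!is.na(x)]
  c(n = length(x), mean = mean(x), sd = sd(x))
}

# Made-up example data for two conditions.
control   <- c(10, 12, NA, 11)
treatment <- c(14, 15, 13, NA)

# Apply the same steps to every condition; the result is a small matrix
# with one column per condition.
results <- sapply(list(control = control, treatment = treatment),
                  summarize_scores)
print(results)
```

If the handling of missing values needs to change, it changes in one place, and the comments record what was done and how.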

  • One tool
  • Transparent, reproducible, version-controlled
  • Many outputs…

Background

What is R Markdown?

Donald Knuth (Roberts, 2018)

Literate programming is a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained…

(Knuth, n.d.)

…and arguably more fun to write than programs that are written only in a high-level language. The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer.

(Knuth, n.d.)

Don’t worry

  • You don’t have to be a programmer, or a good programmer, to use these tools!

  • But using them will probably inspire you to become a programmer or a better one.

  • And you don’t have to use R. The tools support other languages.

  • The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer (Knuth, n.d.).

Markdown

  • Write human-readable text documents
    • Words, images, videos, web links
  • Convert to computer-readable documents (.html, .pdf, .docx, .pptx)
# My brilliant report

*Rick Gilmore*

## Introduction

A **really** important point^[I mean really]

## Methods

$$1+1=2$$

| Letters | Numbers |
|--------:|:-------:|
| A      | 1      |
| B      | 2      |

[Best Bootcamp Evah!](https://penn-state-open-science.github.io/bootcamp-2023/)

![https://imgs.xkcd.com/comics/free_fallin.png](https://imgs.xkcd.com/comics/free_fallin.png)

https://quarto.org/docs/authoring/markdown-basics.html

R Markdown

  • adds executable computer code in chunks
x <- 1 + 1
x
[1] 2
  • In multiple languages

Benefits of using R Markdown for Open Science

  • Seamless integration of data analysis and reporting
  • Dynamic documents for real-time updates
  • Wide variety of output formats (HTML, PDF, Word, etc.)
  • Version control and collaboration with Git/GitHub
  • Enhances both research transparency and efficiency

Reproducible Data Analysis with R Markdown

  • Project structure and file organization
  • Managing data and code files
  • Demonstrating reproducibility in R Markdown steps
  • Rerunning code chunks and regenerating outputs
  • Utilizing R package checkpoints for consistent environments
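
The steps above might be sketched in base R as follows. The folder names and the seed value are assumptions chosen for illustration, and everything is created in a temporary directory. Recording `sessionInfo()` only documents package versions; it does not pin them the way a package snapshot tool would.

```r
# Hypothetical project layout, created in a temporary directory so the
# sketch does not touch real files.
project_dir <- file.path(tempdir(), "my-project")
subdirs <- c("data-raw", "data", "R", "reports")
for (d in subdirs) {
  dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# Fix the random seed so code that draws random numbers gives the same
# result every time it is rerun.
set.seed(2023)
first_run <- rnorm(5)
set.seed(2023)
second_run <- rnorm(5)
stopifnot(identical(first_run, second_run))

# Record the R version and loaded packages next to the results.
session_file <- file.path(project_dir, "reports", "session-info.txt")
writeLines(utils::capture.output(utils::sessionInfo()), session_file)
```

Keeping raw data, derived data, code, and reports in separate folders makes it easier for a colleague to see what to rerun and in what order.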

Visualizations and Results Reporting

  • Create interactive and static visualizations
  • Utilize popular R packages for data visualization
  • Incorporate interactive plots using Shiny and other tools
  • Customize and style output documents
  • Themes and templates for consistent branding
  • Control layout and appearance of the final report
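
As a small illustration of a static figure produced by code rather than by hand, the sketch below uses base R graphics and the `mtcars` dataset that ships with R. The file name and axis labels are assumptions; the talk does not prescribe a particular plotting package.

```r
# Write the figure to a temporary file so the example is self-contained.
fig_file <- file.path(tempdir(), "mpg-by-weight.pdf")

# Open a PDF graphics device, draw the plot, then close the device.
grDevices::pdf(fig_file, width = 6, height = 4)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Fuel economy vs. weight (mtcars)")
invisible(grDevices::dev.off())
```

Because the figure is generated by a script, rerunning the script after the data change regenerates the figure with no manual steps.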

Collaboration and Sharing in Open Science

  • Public accessibility and transparency of scientific communication
  • Web-based tools facilitate scientific collaboration
  • Using Git/GitHub for version control
  • Creating and cloning repositories
  • Collaborating with colleagues in a shared repository
  • Sharing R Markdown reports with the community

Benefits of using coding languages like R

  • Reproducibility
    • Code-based workflows allow for easy replication and sharing of analyses.
    • Results can be reproduced precisely, ensuring transparency and accountability.
  • Flexibility and Customization
    • Coding languages offer more control over data manipulation, analysis, and visualization.
    • Users can create custom functions and packages tailored to their specific needs.

Benefits of using coding languages like R

  • Automation and Efficiency
    • Automate repetitive tasks and streamline data analysis pipelines.
    • Batch processing and scripting enable efficient handling of large datasets.
  • Integration with Open Science Practices
    • R facilitates reproducible research with tools like R Markdown and version control.
    • Supports data sharing, collaboration, and open-source development.
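
A minimal sketch of batch processing in base R. The file names, column names, and values are made up for illustration: the setup step writes three small CSV files to a temporary folder, and the batch step applies the same summary to each file.

```r
# Setup: write three small made-up CSV files to a temporary folder.
data_dir <- file.path(tempdir(), "batch-demo")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  df <- data.frame(participant = i, score = c(1, 2, 3) * i)
  write.csv(df, file.path(data_dir, sprintf("participant-%02d.csv", i)),
            row.names = FALSE)
}

# Batch step: find every CSV file and apply the same summary to each.
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
summaries <- lapply(csv_files, function(f) {
  d <- read.csv(f)
  data.frame(file = basename(f), n = nrow(d), mean_score = mean(d$score))
})

# Combine the per-file results into one table.
batch_summary <- do.call(rbind, summaries)
print(batch_summary)
```

Adding a fourth file to the folder requires no change to the code, which is the practical payoff of scripting over repeating the steps by hand.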

Benefits of using coding languages like R

  • Cost and Accessibility
    • R is open-source and free to use, making it accessible to researchers and students worldwide.
    • Reduces the need for expensive software licenses.
  • Continuous Development and Innovation
    • Coding languages continuously evolve with new features and improvements.
    • Users can leverage cutting-edge statistical methods and techniques.

Benefits of using coding languages like R

  • Scalability
    • Coding languages can handle larger datasets and more complex analyses.
    • Ideal for research with growing data requirements.
  • Data Privacy and Security
    • Local data processing in coding languages offers more control over data security.
    • Especially relevant for sensitive or confidential data.

Challenges in Reproducibility with Point and Click Programs

  • Point-and-click programs often hide data manipulation steps performed in the background.
  • Users may overlook or forget to document these steps, leading to non-reproducible results.
  • Sharing analyses with others can be challenging.
  • Recipients may not have access to the same version or settings, causing discrepancies.
  • Without version control, changes are poorly tracked, which reduces transparency and reproducibility.

Challenges in Reproducibility with Point and Click Programs

  • Manual data manipulation is prone to human error.
  • Limited scripting capabilities.
  • Vendor lock-in ($$$!) and dependency on the software’s availability and updates.
  • Outputs lack visibility into the underlying code and calculations.
  • Difficult to assess the validity of results without access to the analysis steps.
  • Large-scale projects present a challenge.
    • Difficult to maintain consistency and reproducibility over time.

Using R Markdown you can create:

  • Readable documents
  • Dashboards
  • Interactive documents
  • Presentations
  • Books
  • Websites and more!

Specific use cases (ROG)

How to start

  • Use R? Consider RStudio.
  • Use Python or other language? Consider VS Code or RStudio.
  • Use SPSS? You can still use the tools to document (but not execute) your workflow.
  • Quarto Qrew
  • University Libraries workshops this fall

What can you make with R Markdown/Quarto?

Resources

Online

Quarto

This talk was prepared using Quarto. Quarto enables you to weave together content and executable code into a finished presentation. To learn more about Quarto presentations see https://quarto.org/docs/presentations/.

RStudio

RStudio is an integrated development environment (IDE) for R and Python.
It is free, open-source software provided by Posit.

GitHub

The files for this talk are rendered into a website that is hosted on GitHub (https://github.com).

GitHub is a web service for sharing computer code. It has a “pages” feature that also allows the (free) hosting of simple websites.

GitHub supports git, a computer program used to put documents under version control.

References

Allaire, J., Xie, Y., Dervieux, C., McPherson, J., Luraschi, J., Ushey, K., … Iannone, R. (2023). rmarkdown: Dynamic documents for R. Retrieved from https://github.com/rstudio/rmarkdown
Goodman, S. N., Fanelli, D., & Ioannidis, J. P. A. (2016). What does research reproducibility mean? Science Translational Medicine, 8(341), 341ps12. https://doi.org/10.1126/scitranslmed.aaf5027
Knuth, D. E. (n.d.). Literate programming. Retrieved from https://www-cs-faculty.stanford.edu/~knuth/lp.html
Knuth, D. E. (1984). Literate programming. Computer Journal, 27(2), 97–111. https://doi.org/10.1093/comjnl/27.2.97
R Core Team. (2023). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
Roberts, S. (2018). The Yoda of Silicon Valley. The New York Times. Retrieved from https://www.nytimes.com/2018/12/17/science/donald-knuth-computers-algorithms-programming.html
Xie, Y., Allaire, J. J., & Grolemund, G. (2018). R Markdown: The definitive guide. Boca Raton, Florida: Chapman & Hall/CRC. Retrieved from https://bookdown.org/yihui/rmarkdown
Xie, Y., Dervieux, C., & Riederer, E. (2020). R Markdown cookbook. Boca Raton, Florida: Chapman & Hall/CRC. Retrieved from https://bookdown.org/yihui/rmarkdown-cookbook