Many of the ideals and methods in reproducible research can be, and often are, applied to data journalism.
At the New Zealand Herald we have been putting some of these principles to work. In particular, we focus on automation: we try to ensure that any interactive article can be completely rebuilt and deployed via a single command when the source data is updated.
All our interactive articles are based on a template that incorporates most of the logic we need to do this.
Workflowr is one of a number of tools designed to make it easier to share not only the final results of an analysis, but also the steps taken along the way. This is something I would like to be able to do with our data journalism.
Workflowr’s vignettes are fantastic and provide a good guide to getting started with the package. Here I just want to outline the steps I use to get a project up and running with workflowr, along with drake, renv, and git.
Workflowr provides a way to create project directories and set up git repositories, but I like to manage my own. This step assumes git is installed.
git init project-name
I use renv to manage a project’s dependencies. This needs R with the renv package installed.
cd project-name
Rscript -e 'renv::init()'
Install the R libraries most projects need.
Rscript -e "renv::install(c('drake', 'workflowr', 'tidyverse'))"
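The three commands above can be collected into one small script. The following is only a sketch under a few assumptions: `run` and `bootstrap_project` are hypothetical helper names that do not come from any of the packages, and the `DRY_RUN` variable is an invention for previewing the commands without executing them.

```shell
#!/bin/sh
# Sketch: bootstrap a project with git and renv, mirroring the steps above.
# run() is a hypothetical helper: with DRY_RUN=1 it prints the command
# instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# bootstrap_project <project-name> (hypothetical name)
bootstrap_project() {
  project="$1"
  if [ -z "$project" ]; then
    echo "usage: bootstrap_project <project-name>" >&2
    return 1
  fi
  # git init creates the directory if it does not exist yet.
  run git init "$project" || return 1
  # Run the R steps from inside the project; the subshell keeps the
  # caller's working directory unchanged.
  (
    if [ "${DRY_RUN:-0}" != "1" ]; then
      cd "$project" || exit 1
    fi
    run Rscript -e 'renv::init()' &&
      run Rscript -e "renv::install(c('drake', 'workflowr', 'tidyverse'))"
  )
}
```

Calling `DRY_RUN=1 bootstrap_project project-name` prints the three commands without touching the file system (the printed form does not preserve the original quoting); dropping `DRY_RUN=1` runs them for real and needs git and R with renv available.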
I usually create a new RStudio project pointed at the git repository at this stage.
Initialise the workflowr project
Assuming you have an R session running in your project directory, run the following commands.
library(workflowr)
wflow_start('.', name = 'Project Name', existing = TRUE)
Workflowr will build your analysis - assuming you follow workflowr’s patterns - into a
shareable website. By default, the file that controls the appearance of the website
looks like this:
name: "Project Name"
output_dir: ../docs
navbar:
  title: "Project Name"
  left:
    - text: Home
      href: index.html
    - text: About
      href: about.html
    - text: License
      href: license.html
output:
  workflowr::wflow_html:
    toc: yes
    toc_float: yes
    theme: cosmo
    highlight: textmate
I like to change it to:
name: "Project Name"
output_dir: ../docs
navbar:
  title: "Project Name"
  left:
    - text: Home
      href: index.html
    - text: About
      href: about.html
    - text: License
      href: license.html
  right:
    - icon: fa-github
      text: Source code
      href: https://github.com/nzherald/project-name
output:
  workflowr::wflow_html:
    toc: yes
    theme: readable
    highlight: textmate
    css: nzh-style.css
    dev: svg
    includes:
      in_header: header.html
      before_body: doc_prefix.html
      after_body: doc_suffix.html
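This has the effect of adding a "Source code" link to the GitHub repository on the right of the navbar, turning off the floating table of contents, switching the theme from cosmo to readable, applying a custom stylesheet (nzh-style.css), rendering figures as SVG, and including custom HTML in the page header and before and after the body of each page.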
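The modified configuration names four extra files (nzh-style.css, header.html, doc_prefix.html and doc_suffix.html), and the build is likely to fail if any of them cannot be found. A minimal sketch of a pre-build check follows; `check_site_files` is a hypothetical helper, and the assumption that the files sit next to the configuration file (in workflowr's default layout, the `analysis` directory) should be verified for your own project.

```shell
#!/bin/sh
# Sketch: report which of the files named in the site configuration are
# missing from a directory. check_site_files is a hypothetical helper;
# the file names are the ones used in the configuration above.
check_site_files() {
  dir="${1:-.}"
  missing=0
  for f in nzh-style.css header.html doc_prefix.html doc_suffix.html; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every file was found.
  [ "$missing" -eq 0 ]
}
```

For example, `check_site_files analysis` run from the project root lists any missing files and returns a non-zero status if there are any.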
I usually set the license file too:
All source code and software in this repository are made available under the terms of the [MIT license](https://opensource.org/licenses/mit-license.html). Note that the data is released under different agreements - these will be detailed here prior to publication.
Then commit all the files and push to your git repository.
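The commit-and-push step might look like the sketch below. It makes several assumptions: `run` and `publish_initial_commit` are hypothetical helpers, the default commit message is invented, and the remote URL in the usage example is simply the repository address from the navbar link above, so substitute your own.

```shell
#!/bin/sh
# Sketch: commit everything in the project and push it to a remote.
# run() is a hypothetical helper: with DRY_RUN=1 it prints the command
# instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# publish_initial_commit <remote-url> [commit message] (hypothetical name)
publish_initial_commit() {
  remote_url="$1"
  message="${2:-Initial workflowr project}"
  if [ -z "$remote_url" ]; then
    echo "usage: publish_initial_commit <remote-url> [message]" >&2
    return 1
  fi
  run git add --all &&
    run git commit -m "$message" &&
    # Fails if a remote called origin is already configured.
    run git remote add origin "$remote_url" &&
    # HEAD pushes whichever branch is currently checked out.
    run git push -u origin HEAD
}
```

Run from the project directory, `DRY_RUN=1 publish_initial_commit https://github.com/nzherald/project-name` previews the four git commands; without `DRY_RUN=1` they are executed.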
The next step is to use drake to grab some data to analyse - but that is the topic of the next post.