In this 3-hour session we introduce the concepts of workflow, openness and reproducibility. In the first part, we argue why they are important and what we as social scientists can learn from data scientists. Our main argument is that, even though complete reproducibility is often infeasible in the social sciences, we should strive to make research as reproducible as possible.
In the second part we lay out the road map for the rest of the workshop. Most importantly, we explain why we use a particular set of tools in this workshop, namely:
R and RStudio (with Yihui Xie’s knitr package)
Bibdesk/Mendeley
Git and Github
GNU make
We are aware that adopting a particular data analysis tool is costly in terms of time, and that the choice depends on idiosyncratic preferences and needs. However, in this workshop we decided to use the combination of R and RStudio for two main reasons: (i) it works best out of the box for our purposes and (ii) at the moment most researchers probably use this combination for reproducible research (at least it gets the biggest buzz…).
You will need several tools installed on your laptop to follow along with the workshop. Head over to the Requirements page to see how to install them if you haven’t yet.
After this session you should:
Reproducibility:
RStudio environment (though slightly more advanced and does not work immediately out of the box)