Open by Default talk

July 14, 2022

This text was originally posted as a Twitter thread

I’m super glad the lovely @RSSMerseyside/@hipyliv folks put together the “Using open data sources” event and made the recording available:

I contributed the talk “Open by default - Developing reproducible, computational research”. You can find more info about the talk here. In this post, I’ll provide a bit of background.

For the last 2+ years, Martin Fleischmann and I have been working on the Urban Grammar project, funded by ESRC and the Alan Turing Institute to develop new ways of understanding (urban) form and function through (satellite) data and machine learning (all project info at the project page). From day one, it was very clear to us that we wanted to take an open approach to the project, and that this shouldn’t only apply to its outputs, but also to the process of getting there. From that point onward, almost every decision we made in the project was guided by that principle. The talk covers, in a (roughly) structured way, the areas we paid particular attention to, namely aspects of process (a.k.a. the “kitchen”) and aspects of output (a.k.a. the “sausage”).

Our take on “Open by default” (by no means invented by us!) is based on three key principles:

The quote, like so much inspiration, has been blatantly stolen from a paper by Serge Rey (ironically, paywalled).

We’re not claiming this will work for every research project, but it has for a data- and computation-intensive one like ours. We have discovered many things we didn’t anticipate (mostly how useful it is, even just for us, to constantly work in the open), and this talk is an attempt to put a bit of structure into those thoughts. The key takeaways, though, are simple:

In academia, we talk a lot about outputs, but very little about process. How we do research matters, as does why different approaches yield different outcomes even when the results are seemingly the same. We should discuss it more, share best practices and learn from each other.