Open Data Products paper

October 22, 2021

This text was originally posted as a Twitter thread

Together with Mark Green, Francisco Rowe, and Alex Singleton, we have a new paper out in the Journal of Geographical Systems pushing forward the idea of Open Data Products as a vehicle to make (Geographic) Data Science more impactful.

We define an Open Data Product (ODP) as “the open result of transparent processes through which a variety of data (open and not) are turned into accessible information through a service, infrastructure, analytics or a combination of all of them, where each step of development follows open principles”. We argue there are at least four building blocks of an ODP: a need, a value proposition, Geographic Data Science, and outreach. Let me unpack each a bit.

First, the need. ODPs are born out of trying to scratch an itch. We think this process is most useful when co-produced with the stakeholders who will benefit from it most. The recent award to our own Mark Green and Jacob MacDonald for Local Data Spaces is a great example.

Second, value. It’s not enough to identify a gap; ODPs also need to fill it by going beyond what is already available. This means publishing existing datasets on their own is usually not enough. So, how do you add value?

Third, Geographic Data Science. We argue that, given the ongoing data revolution, where many datasets are not quite ready for use at the point of release, much of the contribution of ODPs resides in applying (Geographic) Data Science to turn them into analysis-ready data.

Fourth, outreach. Just because something is “out there” doesn’t mean people know about it or will use it. Paraphrasing the old saying: “if you build it, they’ll come… if you call them, signpost the way, and welcome them”. Another home-grown example of good outreach is the work led by our own Francisco Rowe and Nikos Patias to inform policies on inequalities in the UK.

In the paper we also identify challenges that will probably make the world see fewer ODPs than it should, mostly related to incentives and sustainability. We need to be mindful of them and work around them… And, finally, we close by drawing parallels with the open source software revolution from 20 years ago. The “new kid on the block” is not software anymore but data, and most research will be unlocked (or not) by what we do with them.

A last note: we started this four years ago and a lot has happened in the world of data since. If anything, my personal view is that we underestimated the effects of some of the developments we documented.

In any case, it is now out for everyone to read and, hopefully, comment/rant/(dis)agree and stimulate the conversation so we can make the most out of what we believe are the most significant changes for the (social) sciences this century.