This block is all about grouping: grouping of similar observations, areas, records… We start by discussing why grouping, or clustering in statistical parlance, is important and what it can do for us. Then we move on to different types of clustering. We focus on two: one is traditional non-spatial clustering, a form of unsupervised learning, for which we cover the most popular technique; the other is explicitly spatial clustering, or regionalisation, which imposes additional (geographic) constraints when grouping observations.
The need to group data
This video motivates the block: what do we mean by “grouping data” and why is it useful?
Non-spatial clustering is the most common form of data grouping. In this section, we cover the basics and mention a few approaches. We wrap it up with an example of clustering very dear to human geography: geodemographics.
In the clip above, we talk about K-Means, by far the most common clustering algorithm. Watch the video on the expandable to get the intuition behind the algorithm and better understand how it does its “magic”.
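The intuition behind K-Means can also be sketched in a few lines of plain Python. The snippet below is a minimal illustration rather than a production implementation: the function name `kmeans`, its `init` argument and the toy data are all assumptions made up for this example. It alternates two steps until nothing changes: assign every observation to its nearest centroid, then move each centroid to the mean of the observations assigned to it.

```python
import math
import random


def kmeans(points, k, n_iter=100, init=None, seed=0):
    """Plain K-Means on a list of coordinate tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Start from the given centroids or, by default, from k random observations
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the group of its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed group, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty group keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated blobs of 2D observations
rng = random.Random(42)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
blob_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(30)]
points = blob_a + blob_b

labels, centroids = kmeans(points, k=2)
print(labels[:5], labels[-5:])
print(centroids)
```

With blobs this far apart, the two groups will usually coincide with the two blobs. Note, however, that K-Means can settle on a poorer solution depending on where the centroids start, which is why implementations commonly rerun it from several starting points.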
For a striking visual comparison of K-Means with other clustering algorithms, check out this figure produced by the scikit-learn project, a Python package for machine learning (more on this later):
If you are interested in geodemographics, a very good reference to get a broader perspective on the idea, origins and history of the field is “The Predictive Postcode” [webber2018predictive], by Richard Webber and Roger Burrows. In particular, the first four chapters provide an excellent overview.
Furthermore, the clip mentions the Output Area Classification (OAC), which you can access, for example, through the CDRC Maps platform:
Regionalisation is explicitly spatial clustering. We cover the conceptual basics in the following clip:
If you are interested in the idea of regionalisation, a very good place to continue reading is Duque et al. (2007) [duque2007supervised], which was an important inspiration in structuring the clip.
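To make the geographic constraint behind regionalisation concrete, here is a minimal sketch that checks whether a grouping qualifies as a set of regions, in the sense that every group forms one spatially connected block. Several things here are assumptions made for illustration: the areas sit on a regular grid, two areas count as neighbours only if they share an edge (rook contiguity), and the function names `rook_neighbours` and `is_regionalisation` are hypothetical.

```python
from collections import deque


def rook_neighbours(cell, n_rows, n_cols):
    """Cells sharing an edge with `cell` on a regular grid (rook contiguity)."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < n_rows and 0 <= j < n_cols]


def is_regionalisation(labels):
    """True if every label forms a single spatially connected block."""
    n_rows, n_cols = len(labels), len(labels[0])
    # Collect the cells that belong to each group
    cells_by_label = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cells_by_label.setdefault(labels[r][c], set()).add((r, c))
    for members in cells_by_label.values():
        # Breadth-first search from one member, moving only through
        # neighbours that carry the same label
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for nb in rook_neighbours(cell, n_rows, n_cols):
                if nb in members and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if seen != members:
            return False  # some members are unreachable: the group is split
    return True


# Hypothetical 3x4 grid of areas; each number is the group an area belongs to
non_spatial = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
]
regions = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
]
print(is_regionalisation(non_spatial))  # False: group 0 is in two separate pieces
print(is_regionalisation(regions))      # True: every group is contiguous
```

A non-spatial algorithm such as K-Means can return either kind of grouping, since it only looks at attribute similarity. A regionalisation algorithm only considers solutions that pass a check of this kind.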
Similar coverage of clustering and regionalisation to that provided here, but with a bit more detail, is available in the corresponding chapter of the GDS book (in progress) [reyABwolf].