Geographic Data Science - Lecture VII

Grouping Data over Space

Dani Arribas-Bel

Today

  • The need to group data
  • Geodemographic analysis
  • Non-spatial clustering
  • Regionalization
  • Examples “in the wild”

The need to group data

Everything should be made as simple as possible, but not simpler

Albert Einstein

The need to group data

  • The world is complex and multidimensional
  • Univariate analysis focuses on only one dimension
  • Sometimes, real-world issues are best understood as multivariate, e.g. a single variable vs. the concept it only partly captures:
    • Percentage of foreign-born vs. what is a neighborhood?
    • Years of schooling vs. human development
    • Monthly income vs. deprivation

Grouping as simplifying

  • Define a given number of categories based on many characteristics (multi-dimensional)
  • Find the category where each observation fits best
  • Reduce complexity, keep all the relevant information
  • Produce easier-to-understand outputs

Geodemographic analysis

  • 1970s, Richard Webber
  • Identify similar neighborhoods to target urban deprivation funding
  • From the public sector (policy) to the private sector (marketing and business intelligence)

Predictive Postcode

Source

How do you segment/cluster observations over space?

  • Statistical clustering
  • Explicitly spatial clustering (regionalization)

Non-spatial clustering

Split a dataset into groups of observations that are similar within the group and dissimilar between groups, based on a series of attributes
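One common way to make "similar within the group" concrete is the within-group sum of squares: the smaller it is, the more alike the members of each group are. The sketch below is only an illustration; the function name `within_group_ss` and the toy data are hypothetical and not part of the original slides.

```python
def within_group_ss(rows, labels):
    """Sum of squared deviations of each row from the mean of its group."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    total = 0.0
    for members in groups.values():
        # Mean of every attribute across the group's members
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum((v - c) ** 2 for row in members for v, c in zip(row, center))
    return total

# Four observations described by two attributes (made-up values)
rows = [(1.0, 2.0), (2.0, 1.0), (9.0, 8.0), (8.0, 9.0)]

good = within_group_ss(rows, [0, 0, 1, 1])  # similar rows grouped together
bad = within_group_ss(rows, [0, 1, 0, 1])   # dissimilar rows grouped together
print(good, bad)  # 2.0 100.0
```

The grouping with the lower value keeps similar observations together, which is what a clustering algorithm tries to find automatically.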

Machine learning

The computer learns some of the properties of the dataset without the human specifying them

Unsupervised

There is no a priori structure imposed on the classification before the analysis: no observation is assigned to a category beforehand

Intuition

Clustering

K-means [Source]
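A minimal sketch of the K-means idea (Lloyd's algorithm), assuming Euclidean distance and starting centroids supplied by the caller. The function `kmeans` and the toy data are hypothetical; in practice a library implementation with a smarter initialization would normally be used.

```python
import math

def kmeans(points, centroids, n_iter=100):
    """Alternate between assigning points and updating centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids

# Six areas described by two made-up attributes
areas = [(5.0, 20.0), (6.0, 22.0), (7.0, 21.0), (30.0, 45.0), (32.0, 47.0), (31.0, 46.0)]
labels, centroids = kmeans(areas, centroids=[areas[0], areas[3]])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)  # [(6.0, 21.0), (31.0, 46.0)]
```

Note that nothing in this procedure looks at where the areas are located: only the attribute values matter.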

More clustering…

  • Hierarchical clustering
  • Agglomerative clustering
  • Spectral clustering
  • Neural networks (e.g. Self-Organizing Maps)
  • DBSCAN

Different properties, different best use cases

See interesting comparison table
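To give a flavor of how one of these alternatives differs from K-means, here is a minimal sketch of agglomerative (bottom-up hierarchical) clustering. It assumes single linkage (distance between the two closest members) and Euclidean distance; both are choices made for this illustration, and the function name and data are hypothetical.

```python
import math

def agglomerative(points, n_clusters):
    """Start with one cluster per point and repeatedly merge the closest pair."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest members of two clusters
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
groups = agglomerative(points, n_clusters=3)
print(groups)  # [[0, 1], [2, 3], [4]]
```

Unlike K-means, this approach needs no starting centroids, and stopping the merging at different points yields a whole hierarchy of groupings.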

Regionalization

Unsupervised Spatial Machine Learning

Aggregating basic spatial units (areas) into larger units (regions)

Regionalization

Split a dataset into groups of observations that are similar within the group and dissimilar between groups, based on a series of attributes

…with the additional constraint that observations in the same group need to be spatial neighbors
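A minimal sketch of how a contiguity constraint changes the result. It uses a simple bottom-up merge that only joins regions sharing a border; this is an illustrative stand-in, not one of the specific regionalization algorithms from the literature. The function `regionalize`, the single attribute and the neighbor lists are all hypothetical.

```python
def regionalize(values, neighbors, n_regions):
    """Merge the most similar pair of adjacent regions until n_regions remain."""
    regions = {area: {area} for area in values}  # region id -> set of areas

    def mean(region):
        return sum(values[a] for a in region) / len(region)

    def touching(r, s):
        # Two regions are adjacent if any of their areas are neighbors
        return any(b in neighbors[a] for a in r for b in s)

    while len(regions) > n_regions:
        candidates = [
            (abs(mean(regions[r]) - mean(regions[s])), r, s)
            for r in regions
            for s in regions
            if r < s and touching(regions[r], regions[s])
        ]
        if not candidates:
            break  # no adjacent regions left to merge
        _, r, s = min(candidates)
        regions[r] |= regions.pop(s)
    return sorted(sorted(region) for region in regions.values())

# Six areas along a street: A-B-C-D-E-F (made-up attribute values)
values = {"A": 1.0, "B": 2.0, "C": 9.0, "D": 10.0, "E": 1.5, "F": 2.0}
neighbors = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D", "F"], "F": ["E"],
}
regions = regionalize(values, neighbors, n_regions=3)
print(regions)  # [['A', 'B'], ['C', 'D'], ['E', 'F']]
```

Areas A, B, E and F have very similar values, so a non-spatial method could place them in one group; here they end up in two separate regions because C and D lie between them.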

Regionalization

  • All the methods aggregate geographical areas into a predefined number of regions, while optimizing a particular aggregation criterion;
  • The areas within a region must be geographically connected (the spatial contiguity constraint);
  • The number of regions must be smaller than or equal to the number of areas;
  • Each area must be assigned to one and only one region;
  • Each region must contain at least one area.

Duque et al. (2007)
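The conditions above can be checked mechanically for any proposed solution. The sketch below assumes the solution is stored as a dictionary from area to region, which already guarantees that each area has exactly one region, that no region is empty, and that there are no more regions than areas; the remaining work is confirming that every area is assigned and every region is connected. The function name and the toy neighbor lists are hypothetical.

```python
from collections import deque

def is_valid_regionalization(labels, neighbors):
    """Check that every area is assigned and every region is contiguous."""
    if set(labels) != set(neighbors):
        return False  # some area is missing or unknown
    regions = {}
    for area, region in labels.items():
        regions.setdefault(region, set()).add(area)
    for members in regions.values():
        # Breadth-first search restricted to the region's own areas
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            area = queue.popleft()
            for other in neighbors[area]:
                if other in members and other not in seen:
                    seen.add(other)
                    queue.append(other)
        if seen != members:
            return False  # part of the region cannot be reached
    return True

# Four areas in a row: A-B-C-D
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
ok = is_valid_regionalization({"A": 0, "B": 0, "C": 1, "D": 1}, neighbors)
broken = is_valid_regionalization({"A": 0, "B": 1, "C": 0, "D": 1}, neighbors)
print(ok, broken)  # True False
```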

Algorithms

  • Automated Zoning Procedure (AZP)
  • AriSeL
  • Max-P

See Duque et al. (2007) for an excellent, though advanced, overview

Examples

Non-spatial clustering

Regionalization

Census geographies

Choropleth

Livehoods

Recapitulation

  • Some problems are truly high-dimensional, and univariate representations are not appropriate
  • Clustering can help reduce complexity by creating categories that retain statistical information but are easier to understand
  • Two main types of clustering in this context:
    • Geodemographic analysis
    • Regionalization

Geographic Data Science'19 by Dani Arribas-Bel (http://darribas.org) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).