import geopandas
import contextily

Task I: NYC Geodemographics

We are going to try to get at the (geographic) essence of New York City. For that, we will rely on the same set up Census tracts for New York City we used a few blocks ago. Once you have the nyc object loaded, create a geodemographic classification using the following variables:

  • european: Total Population White

  • asian: Total Population Asian American

  • american: Total Population American Indian

  • african: Total Population African American

  • hispanic: Total Population Hispanic

  • mixed: Total Population Mixed race

  • pacific: Total Population Pacific Islander

For this, make sure you standardise the table by the size of each tract. That is, compute a column with the total population as the sum of all the ethnic groups and divide each of them by that column. This way, the values will range between 0 (no population of a given ethnic group) and 1 (all the population in the tract is of that group).

Once this is ready, get to work with the following tasks:

  1. Pick a number of clusters (e.g. 10)

  2. Run K-Means for that number of clusters

  3. Plot the different clusters on a map

  4. Analyse the results:

    • What do you find?

    • What are the main characteristics of each cluster?

    • How are clusters distributed geographically?

    • Can you identify some groups concentrated on particular areas (e.g. China Town, Little Italy)?

Task II: Regionalisation of Dar Es Salaam

For this task we will travel to Tanzania’s Dar Es Salaam. We are using a dataset assembled to describe the built environment of the city centre. Let’s load up the dataset before anything:

# Read the file in
db = geopandas.read_file(


Instead of reading the file directly off the web, it is possible to download it manually, store it on your computer, and read it locally. To do that, you can follow these steps:

  1. Download the file by right-clicking on this link and saving the file

  2. Place the file on the same folder as the notebook where you intend to read it

  3. Replace the code in the cell above by:

br = geopandas.read_file("dar_es_salaam.geojson")

Geographically, this is what we are looking at:

ax = db.plot(
    figsize=(9, 9)

We can inspect the table:
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1291 entries, 0 to 1290
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   index              1291 non-null   object  
 1   id                 1291 non-null   object  
 2   street_length      1291 non-null   float64 
 3   street_linearity   1291 non-null   float64 
 4   building_density   1291 non-null   float64 
 5   building_coverage  1291 non-null   float64 
 6   geometry           1291 non-null   geometry
dtypes: float64(4), geometry(1), object(2)
memory usage: 70.7+ KB

Two main aspects of the built environment are considered: the street network and buildings. To capture those, the following variables are calculated at for the H3 hexagonal grid system, zoom level 8:

  • Building density: number of buildings per hexagon

  • Building coverage: proportion of the hexagon covered by buildings

  • Street length: total length of streets within the hexagon

  • Street linearity: a measure of how regular the street network is

With these at hand, your task is the following:

Develop a regionalisation that partitions Dar Es Salaam based on its built environment

For that, you can follow these suggestions:

  • Create a spatial weights matrix to capture spatial relationships between hexagons

  • Set up a regionalisation algorithm with a given number of clusters (e.g. seven)

  • Generate a geography that contains only the boundaries of each region and visualise it (ideally with a satellite image as basemap for context)

  • Rinse and repeat with several combinations of variables and number of clusters

  • Pick your best. Why have you selected it? What does it show? What are the main groups of areas based on the built environment?