Prepare LSOA/MSOA table for Liverpool

We need the following two datasets:

  • LSOAs originally downloaded from the CDRC data store (original link).

  • LSOA to MSOA crosswalk from ONS.

The LSOAs come from the CDRC's Index of Multiple Deprivation (IMD) package. The dataset is most easily downloaded from the CDRC data store (link). Because it already comes in both tabular and spatial (shapefile) formats, it does not need to be merged or joined with additional geometries.

In addition, we will use the lookup between LSOAs and Middle Layer Super Output Areas (MSOAs), which can be downloaded from this link. It connects each LSOA polygon to the MSOA it belongs to. MSOAs are a coarser geographic delineation from the Office for National Statistics (ONS), within which LSOAs are nested. That is, no LSOA boundary crosses an MSOA boundary: every LSOA falls entirely within a single MSOA.
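The nesting property can be checked directly on a crosswalk: if LSOAs nest within MSOAs, each LSOA code maps to exactly one MSOA code. The sketch below assumes a table with the `LSOA11CD` and `MSOA11CD` columns used later in this notebook; the helper `lsoas_nest_in_msoas` and the toy codes are hypothetical, made up for illustration.

```python
import pandas as pd


def lsoas_nest_in_msoas(cw: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every LSOA maps to exactly one MSOA."""
    # Count the distinct MSOA codes attached to each LSOA code
    msoas_per_lsoa = cw.groupby("LSOA11CD")["MSOA11CD"].nunique()
    return bool((msoas_per_lsoa == 1).all())


# Toy OA-level crosswalk with made-up codes that mimic the ONS column names
cw_toy = pd.DataFrame({
    "OA11CD":   ["OA1", "OA2", "OA3", "OA4", "OA5"],
    "LSOA11CD": ["L1",  "L1",  "L2",  "L2",  "L3"],
    "MSOA11CD": ["M1",  "M1",  "M1",  "M1",  "M2"],
})

print(lsoas_nest_in_msoas(cw_toy))  # True
```

On the real data, calling `lsoas_nest_in_msoas(cw)` once the crosswalk is loaded would confirm the assumption that the later join relies on.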

import pandas
import geopandas
  • We read the LSOAs

lsoas = geopandas.read_file("../../E08000012_IMD/shapefiles/E08000012.shp")
lsoas.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 298 entries, 0 to 297
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   LSOA11CD    298 non-null    object  
 1   imd_rank    298 non-null    int64   
 2   imd_score   298 non-null    float64 
 3   income      298 non-null    float64 
 4   employment  298 non-null    float64 
 5   education   298 non-null    float64 
 6   health      298 non-null    float64 
 7   crime       298 non-null    float64 
 8   housing     298 non-null    float64 
 9   living_env  298 non-null    float64 
 10  idaci       298 non-null    float64 
 11  idaopi      298 non-null    float64 
 12  geometry    298 non-null    geometry
dtypes: float64(10), geometry(1), int64(1), object(1)
memory usage: 30.4+ KB
  • We also need the crosswalk between LSOAs and MSOAs

cw = pandas.read_csv("../../E08000012_IMD/OA11_LSOA11_MSOA11_LAD11_EW_LUv2.csv",
                     encoding="iso-8859-1")
cw.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181408 entries, 0 to 181407
Data columns (total 8 columns):
 #   Column    Non-Null Count   Dtype 
---  ------    --------------   ----- 
 0   OA11CD    181408 non-null  object
 1   LSOA11CD  181408 non-null  object
 2   LSOA11NM  181408 non-null  object
 3   MSOA11CD  181408 non-null  object
 4   MSOA11NM  181408 non-null  object
 5   LAD11CD   181408 non-null  object
 6   LAD11NM   181408 non-null  object
 7   LAD11NMW  10036 non-null   object
dtypes: object(8)
memory usage: 11.1+ MB
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3072: DtypeWarning: Columns (7) have mixed types.Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
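The `DtypeWarning` above points at column 7, `LAD11NMW`, which is mostly empty (10036 non-null out of 181408 rows). When pandas infers types chunk by chunk on a large file, a column like that plausibly ends up with mixed types. A minimal sketch of the two fixes the warning suggests, run on a hypothetical miniature CSV that only mimics the crosswalk layout: declare the column's type up front, or skip it by loading only the columns this notebook uses.

```python
import io

import pandas as pd

# Hypothetical miniature crosswalk: the Welsh-name column (LAD11NMW) is
# empty for most rows, as in the real file
csv_text = """OA11CD,LSOA11CD,MSOA11CD,LAD11NMW
OA1,L1,M1,
OA2,L1,M1,
OA3,L2,M2,Caerdydd
"""

# Option 1: tell pandas the problematic column holds text
cw_a = pd.read_csv(io.StringIO(csv_text), dtype={"LAD11NMW": str})

# Option 2: only read the two columns needed for the LSOA -> MSOA lookup
cw_b = pd.read_csv(io.StringIO(csv_text), usecols=["LSOA11CD", "MSOA11CD"])

print(cw_a["LAD11NMW"].isna().sum())  # 2
print(list(cw_b.columns))             # ['LSOA11CD', 'MSOA11CD']
```

Option 2 also cuts memory use, since the real file has 181408 rows but only two of its eight columns are used below.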
  • Grab the MSOA code of each LSOA (the crosswalk covers all of England and Wales; the join below keeps only Liverpool's LSOAs)

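# The crosswalk has one row per Output Area, so each LSOA appears several
# times with the same MSOA; dropping duplicates leaves one row per LSOA
msoas = (
    cw[["LSOA11CD", "MSOA11CD"]]
    .drop_duplicates(keep="last")
    .set_index("LSOA11CD")
)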
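Why `drop_duplicates` is enough here: because LSOAs nest within MSOAs, the repeated OA-level rows for a given LSOA are identical once only the two code columns are kept, so which copy is kept does not matter. The sketch below replays the same chain of steps on hypothetical toy codes and checks that the resulting index is unique, which is what makes the later join many-to-one.

```python
import pandas as pd

# Hypothetical OA-level rows: each LSOA repeats once per Output Area it
# contains, always with the same MSOA code
cw_toy = pd.DataFrame({
    "OA11CD":   ["OA1", "OA2", "OA3", "OA4", "OA5"],
    "LSOA11CD": ["L1",  "L1",  "L2",  "L2",  "L3"],
    "MSOA11CD": ["M1",  "M1",  "M1",  "M1",  "M2"],
})

msoas_toy = (
    cw_toy[["LSOA11CD", "MSOA11CD"]]
    .drop_duplicates(keep="last")
    .set_index("LSOA11CD")
)

# One row per LSOA: a unique index means each LSOA gets exactly one MSOA
assert msoas_toy.index.is_unique
print(msoas_toy)
```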
  • Build the table

msoas.head()
            MSOA11CD
LSOA11CD
E01000002  E02000001
E01032740  E02000001
E01000005  E02000001
E01000009  E02000017
E01000008  E02000016
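# Left-join the MSOA code onto each Liverpool LSOA, then keep only the
# identifier columns and the geometry
db = lsoas.join(msoas, on="LSOA11CD")[["LSOA11CD", "MSOA11CD", "geometry"]]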
db.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 298 entries, 0 to 297
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   LSOA11CD  298 non-null    object  
 1   MSOA11CD  298 non-null    object  
 2   geometry  298 non-null    geometry
dtypes: geometry(1), object(2)
memory usage: 7.1+ KB
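The `db.info()` output shows 298 non-null `MSOA11CD` values, so every Liverpool LSOA found a match. To make that check explicit, and to fail loudly if the lookup ever contained duplicate LSOA codes, one option is `DataFrame.merge` with `validate="m:1"` and `indicator=True` in place of `join`. The sketch below uses hypothetical plain-pandas stand-ins for `lsoas` and `msoas`; applied to the real `lsoas` GeoDataFrame, the result should still carry the geometry column, but that is worth confirming on your geopandas version.

```python
import pandas as pd

# Hypothetical stand-ins: three study-area LSOAs and a lookup that covers
# more LSOAs than needed, as the national crosswalk does
lsoas_toy = pd.DataFrame({"LSOA11CD": ["L1", "L2", "L3"]})
msoas_toy = pd.DataFrame({
    "LSOA11CD": ["L1", "L2", "L3", "L4"],
    "MSOA11CD": ["M1", "M1", "M2", "M3"],
})

# validate="m:1" raises MergeError if an LSOA code repeats in the lookup;
# indicator=True adds a _merge column saying where each row was found
db_toy = lsoas_toy.merge(
    msoas_toy, on="LSOA11CD", how="left", validate="m:1", indicator=True
)

# Every study-area LSOA should have found its MSOA
unmatched = db_toy.loc[db_toy["_merge"] != "both", "LSOA11CD"]
print(len(unmatched))  # 0

# A lookup with a duplicated LSOA code is rejected by the validation
dup_lookup = pd.concat([msoas_toy, msoas_toy.iloc[[0]]])
try:
    lsoas_toy.merge(dup_lookup, on="LSOA11CD", how="left", validate="m:1")
    raised = False
except pd.errors.MergeError:
    raised = True
```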
  • Write as GeoPackage

# Remove any file left over from a previous run, then export
! rm -f liv_lsoas.gpkg
db.to_file("liv_lsoas.gpkg", driver="GPKG")
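The `! rm -f liv_lsoas.gpkg` line relies on IPython's shell escape and on an `rm` command being available, so it will not run as a plain Python script or on Windows. A minimal pure-Python equivalent is sketched below; `remove_if_exists` is a hypothetical helper name. It uses `os.remove` rather than `Path.unlink(missing_ok=True)` because the latter needs Python 3.8, while the warning above shows this notebook ran on Python 3.7.

```python
import contextlib
import os


def remove_if_exists(path):
    """Hypothetical helper: delete `path` if present, like `rm -f`."""
    # A missing file is not an error, mirroring the -f flag
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


remove_if_exists("liv_lsoas.gpkg")
# ...then write as before:
# db.to_file("liv_lsoas.gpkg", driver="GPKG")
```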