Prepare Brexit dataset

Prepare Brexit dataset#

This notebook shows how the table with results from the EU referendum has been assembled for this course.

The dataset is the same as the Brexit data used in the GDS Book reyABwolf, but here we have just assembled it into a single file that can be read remotely. For more details on sources, please refer to:

https://geographicdata.science/book/data/brexit/README.html

import pandas, geopandas

Referendum results#

# Updated to work on Sep. 15th 2020
url = ("https://www.electoralcommission.org.uk/sites/default/"
       "files/2019-07/EU-referendum-result-data.csv")
url

'https://www.electoralcommission.org.uk/sites/default/files/2019-07/EU-referendum-result-data.csv'

ref_res = pandas.read_csv(url)
ref_res.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 382 entries, 0 to 381
Data columns (total 21 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 id                       382 non-null    int64  
 Region_Code              382 non-null    object 
 Region                   382 non-null    object 
 Area_Code                382 non-null    object 
 Area                     382 non-null    object 
 Electorate               382 non-null    int64  
 ExpectedBallots          382 non-null    int64  
 VerifiedBallotPapers     382 non-null    int64  
 Pct_Turnout              382 non-null    float64
 Votes_Cast               382 non-null    int64  
Valid_Votes              382 non-null    int64  
Remain                   382 non-null    int64  
Leave                    382 non-null    int64  
Rejected_Ballots         382 non-null    int64  
No_official_mark         382 non-null    int64  
Voting_for_both_answers  382 non-null    int64  
Writing_or_mark          382 non-null    int64  
Unmarked_or_void         382 non-null    int64  
Pct_Remain               382 non-null    float64
Pct_Leave                382 non-null    float64
Pct_Rejected             382 non-null    float64
dtypes: float64(4), int64(13), object(4)
memory usage: 62.8+ KB

Local authorities#

As of Sep. 15th 2020, the local authority boundaries could be accessed from the following site:

https://data.gov.uk/dataset/65f48bab-e65f-491c-90f5-729eef098196/local-authority-districts-december-2016-generalised-clipped-boundaries-in-the-uk-wgs84

We pull the GeoJSON:

url = ("http://geoportal1-ons.opendata.arcgis.com/datasets/"\
       "e3634984fe1143da9fb31671627f5443_2.geojson"\
       "?outSR={%22latestWkid%22:4326,%22wkid%22:4326}")
url

'http://geoportal1-ons.opendata.arcgis.com/datasets/e3634984fe1143da9fb31671627f5443_2.geojson?outSR={%22latestWkid%22:4326,%22wkid%22:4326}'

lads = geopandas.read_file(url)
lads.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype   
---  ------          --------------  -----   
 0   objectid        391 non-null    int64   
 1   lad16cd         391 non-null    object  
 2   lad16nm         391 non-null    object  
 3   lad16nmw        391 non-null    object  
 4   bng_e           391 non-null    int64   
 5   bng_n           391 non-null    int64   
 6   long            391 non-null    float64 
 7   lat             391 non-null    float64 
 8   st_areashape    391 non-null    float64 
 9   st_lengthshape  391 non-null    float64 
 10  geometry        391 non-null    geometry
dtypes: float64(4), geometry(1), int64(3), object(3)
memory usage: 33.7+ KB

Join#

Link up both tables, keep only required columns for a simpler and slimmer table, and drop areas without values:

db = lads.join(ref_res.set_index("Area_Code"), on="lad16cd")\
         [['objectid', 'lad16cd', 'lad16nm', 'Pct_Leave', 'geometry']]\
         .dropna()
db.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 380 entries, 0 to 390
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   objectid   380 non-null    int64   
 1   lad16cd    380 non-null    object  
 2   lad16nm    380 non-null    object  
 3   Pct_Leave  380 non-null    float64 
 4   geometry   380 non-null    geometry
dtypes: float64(1), geometry(1), int64(1), object(2)
memory usage: 17.8+ KB

Write out#

We save as a Geopackage:

! rm -f brexit.gpkg
db.to_file("brexit.gpkg", driver="GPKG")