Prepare Brexit dataset

This notebook shows how the table with results from the EU referendum has been assembled for this course.

The dataset is the same as the Brexit data used in the GDS Book :cite:`reyABwolf`, but here we have just assembled it into a single file that can be read remotely. For more details on sources, please refer to:

import pandas, geopandas

Referendum results

# Updated to work on Sep. 15th 2020
url = ("https://www.electoralcommission.org.uk/sites/default/"
       "files/2019-07/EU-referendum-result-data.csv")
url
'https://www.electoralcommission.org.uk/sites/default/files/2019-07/EU-referendum-result-data.csv'
ref_res = pandas.read_csv(url)
ref_res.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 382 entries, 0 to 381
Data columns (total 21 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   id                       382 non-null    int64  
 1   Region_Code              382 non-null    object 
 2   Region                   382 non-null    object 
 3   Area_Code                382 non-null    object 
 4   Area                     382 non-null    object 
 5   Electorate               382 non-null    int64  
 6   ExpectedBallots          382 non-null    int64  
 7   VerifiedBallotPapers     382 non-null    int64  
 8   Pct_Turnout              382 non-null    float64
 9   Votes_Cast               382 non-null    int64  
 10  Valid_Votes              382 non-null    int64  
 11  Remain                   382 non-null    int64  
 12  Leave                    382 non-null    int64  
 13  Rejected_Ballots         382 non-null    int64  
 14  No_official_mark         382 non-null    int64  
 15  Voting_for_both_answers  382 non-null    int64  
 16  Writing_or_mark          382 non-null    int64  
 17  Unmarked_or_void         382 non-null    int64  
 18  Pct_Remain               382 non-null    float64
 19  Pct_Leave                382 non-null    float64
 20  Pct_Rejected             382 non-null    float64
dtypes: float64(4), int64(13), object(4)
memory usage: 62.8+ KB

Local authorities

As of Sep. 15th 2020, the local authority boundaries could be accessed from the following site:

We pull the GeoJSON:

url = ("http://geoportal1-ons.opendata.arcgis.com/datasets/"\
       "e3634984fe1143da9fb31671627f5443_2.geojson"\
       "?outSR={%22latestWkid%22:4326,%22wkid%22:4326}")
url
'http://geoportal1-ons.opendata.arcgis.com/datasets/e3634984fe1143da9fb31671627f5443_2.geojson?outSR={%22latestWkid%22:4326,%22wkid%22:4326}'
lads = geopandas.read_file(url)
lads.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype   
---  ------          --------------  -----   
 0   objectid        391 non-null    int64   
 1   lad16cd         391 non-null    object  
 2   lad16nm         391 non-null    object  
 3   lad16nmw        391 non-null    object  
 4   bng_e           391 non-null    int64   
 5   bng_n           391 non-null    int64   
 6   long            391 non-null    float64 
 7   lat             391 non-null    float64 
 8   st_areashape    391 non-null    float64 
 9   st_lengthshape  391 non-null    float64 
 10  geometry        391 non-null    geometry
dtypes: float64(4), geometry(1), int64(3), object(3)
memory usage: 33.7+ KB

Join

Link up both tables, keep only required columns for a simpler and slimmer table, and drop areas without values:

db = lads.join(ref_res.set_index("Area_Code"), on="lad16cd")\
         [['objectid', 'lad16cd', 'lad16nm', 'Pct_Leave', 'geometry']]\
         .dropna()
db.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 380 entries, 0 to 390
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   objectid   380 non-null    int64   
 1   lad16cd    380 non-null    object  
 2   lad16nm    380 non-null    object  
 3   Pct_Leave  380 non-null    float64 
 4   geometry   380 non-null    geometry
dtypes: float64(1), geometry(1), int64(1), object(2)
memory usage: 17.8+ KB

Write out

We save as a Geopackage:

! rm -f brexit.gpkg
db.to_file("brexit.gpkg", driver="GPKG")