London AirBnb dataset prep#
In this notebook we put together the dataset of London properties on AirBnb.
import pandas, geopandas
Source data#
AirBnb listings#
We use the listings recorded in January 2020:
url = ("http://data.insideairbnb.com/united-kingdom/"
       "england/london/2020-01-09/data/listings.csv.gz")
url
'http://data.insideairbnb.com/united-kingdom/england/london/2020-01-09/data/listings.csv.gz'
# Accessed on Sep. 16th 2020
listings = pandas.read_csv(url)
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3072: DtypeWarning: Columns (61,62,94,95) have mixed types.Specify dtype option on import or set low_memory=False.
interactivity=interactivity, compiler=compiler, result=result)
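The `DtypeWarning` above means pandas found mixed types in some columns while reading the file in chunks. As the message itself suggests, it can be avoided either with `low_memory=False` or by stating the dtype of the affected columns. The sketch below shows both options on a tiny in-memory CSV; the column names and values are made up for illustration and the toy file is too small to trigger the warning itself.

```python
import io

import pandas

# Toy CSV standing in for listings.csv.gz (hypothetical columns and values)
csv_text = "id,zipcode,price\n1,SW1,$80.00\n2,12345,$120.00\n"

# Option 1: read the whole file before inferring column types
toy = pandas.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: declare the dtype of the ambiguous column explicitly
toy_typed = pandas.read_csv(io.StringIO(csv_text), dtype={"zipcode": str})

print(toy_typed["zipcode"].tolist())
```

None of the columns flagged in the warning are necessarily among those we keep below, so for this notebook the warning may well be harmless.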
Let’s list the variables we want to keep:
x = ['id', 'longitude', 'latitude', 'property_type', 'room_type', 'accommodates',
     'bathrooms', 'bedrooms', 'beds', 'price', 'security_deposit',
     'number_of_reviews', 'reviews_per_month',
     'review_scores_rating', 'review_scores_accuracy', 'review_scores_cleanliness',
     'review_scores_checkin', 'review_scores_communication', 'review_scores_location',
     'review_scores_value']
Now let’s turn listings into a GeoDataFrame:
pts = geopandas.points_from_xy(listings["longitude"], listings["latitude"])
geo_listings = geopandas.GeoDataFrame(listings[x].assign(geometry=pts),
crs="EPSG:4326")
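To make the conversion above easier to follow, here is a minimal sketch on two made-up points. Note that `points_from_xy` takes the x coordinate (longitude) first and the y coordinate (latitude) second; the toy table and its values are assumptions for illustration only.

```python
import geopandas
import pandas

# Two made-up listings in central London (illustrative coordinates)
toy = pandas.DataFrame(
    {
        "id": [1, 2],
        "longitude": [-0.1276, -0.0754],
        "latitude": [51.5072, 51.5055],
    }
)

# x is longitude, y is latitude
toy_pts = geopandas.points_from_xy(toy["longitude"], toy["latitude"])

# Coordinates are in degrees, hence EPSG:4326 (WGS84)
toy_geo = geopandas.GeoDataFrame(toy.assign(geometry=toy_pts), crs="EPSG:4326")

print(toy_geo.geometry.x.tolist())
```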
London geographies#
From the London Datastore, we can download a .zip file with statistical boundaries for London (as of Sep. 16th 2020):
! wget https://data.london.gov.uk/download/statistical-gis-boundary-files-london/9ba8c833-6370-4b11-abdc-314aa020d5e0/statistical-gis-boundaries-london.zip
--2020-09-16 19:06:02-- https://data.london.gov.uk/download/statistical-gis-boundary-files-london/9ba8c833-6370-4b11-abdc-314aa020d5e0/statistical-gis-boundaries-london.zip
Resolving data.london.gov.uk (data.london.gov.uk)... 99.84.10.83, 99.84.10.96, 99.84.10.34, ...
Connecting to data.london.gov.uk (data.london.gov.uk)|99.84.10.83|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://airdrive-secure.s3-eu-west-1.amazonaws.com/london/dataset/statistical-gis-boundary-files-london/2016-10-03T13%3A52%3A28/statistical-gis-boundaries-london.zip?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJJDIMAIVZJDICKHA%2F20200916%2Feu-west-1%2Fs3%2Faws4_request&X-Amz-Date=20200916T190556Z&X-Amz-Expires=300&X-Amz-Signature=5bf7051cbe47f79bbdcd6a3b094f3d55d5f0f263bfe5426cff4bcdf4b723fe4d&X-Amz-SignedHeaders=host [following]
--2020-09-16 19:06:02-- https://airdrive-secure.s3-eu-west-1.amazonaws.com/london/dataset/statistical-gis-boundary-files-london/2016-10-03T13%3A52%3A28/statistical-gis-boundaries-london.zip?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJJDIMAIVZJDICKHA%2F20200916%2Feu-west-1%2Fs3%2Faws4_request&X-Amz-Date=20200916T190556Z&X-Amz-Expires=300&X-Amz-Signature=5bf7051cbe47f79bbdcd6a3b094f3d55d5f0f263bfe5426cff4bcdf4b723fe4d&X-Amz-SignedHeaders=host
Resolving airdrive-secure.s3-eu-west-1.amazonaws.com (airdrive-secure.s3-eu-west-1.amazonaws.com)... 52.218.109.32, 52.218.88.56
Connecting to airdrive-secure.s3-eu-west-1.amazonaws.com (airdrive-secure.s3-eu-west-1.amazonaws.com)|52.218.109.32|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28666674 (27M) [application/zip]
Saving to: ‘statistical-gis-boundaries-london.zip’
statistical-gis-bou 100%[===================>] 27.34M 2.48MB/s in 12s
2020-09-16 19:06:15 (2.29 MB/s) - ‘statistical-gis-boundaries-london.zip’ saved [28666674/28666674]
Unpack the compressed files:
! unzip statistical-gis-boundaries-london.zip
Archive: statistical-gis-boundaries-london.zip
creating: statistical-gis-boundaries-london/ESRI/
inflating: statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.dbf
inflating: statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.GSS_CODE.atx
inflating: statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.NAME.atx
inflating: statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.prj
inflating: statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.sbn
inflating: statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.sbx
inflating: statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp
inflating: statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp.xml
inflating: statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shx
inflating: statistical-gis-boundaries-london/ESRI/London_Ward.BOROUGH.atx
extracting: statistical-gis-boundaries-london/ESRI/London_Ward.cpg
inflating: statistical-gis-boundaries-london/ESRI/London_Ward.dbf
inflating: statistical-gis-boundaries-london/ESRI/London_Ward.GSS_CODE.atx
inflating: statistical-gis-boundaries-london/ESRI/London_Ward.LB_GSS_CD.atx
inflating: statistical-gis-boundaries-london/ESRI/London_Ward.prj
inflating: statistical-gis-boundaries-london/ESRI/London_Ward.sbn
inflating: statistical-gis-boundaries-london/ESRI/London_Ward.sbx
inflating: statistical-gis-boundaries-london/ESRI/London_Ward.shp
inflating: statistical-gis-boundaries-london/ESRI/London_Ward.shp.xml
inflating: statistical-gis-boundaries-london/ESRI/London_Ward.shx
inflating: statistical-gis-boundaries-london/ESRI/London_Ward_CityMerged.BOROUGH.atx
extracting: statistical-gis-boundaries-london/ESRI/London_Ward_CityMerged.cpg
inflating: statistical-gis-boundaries-london/ESRI/London_Ward_CityMerged.dbf
inflating: statistical-gis-boundaries-london/ESRI/London_Ward_CityMerged.GSS_CODE.atx
inflating: statistical-gis-boundaries-london/ESRI/London_Ward_CityMerged.LB_GSS_CD.atx
inflating: statistical-gis-boundaries-london/ESRI/London_Ward_CityMerged.prj
inflating: statistical-gis-boundaries-london/ESRI/London_Ward_CityMerged.sbn
inflating: statistical-gis-boundaries-london/ESRI/London_Ward_CityMerged.sbx
inflating: statistical-gis-boundaries-london/ESRI/London_Ward_CityMerged.shp
inflating: statistical-gis-boundaries-london/ESRI/London_Ward_CityMerged.shp.xml
inflating: statistical-gis-boundaries-london/ESRI/London_Ward_CityMerged.shx
inflating: statistical-gis-boundaries-london/ESRI/LSOA_2004_London_Low_Resolution.dbf
inflating: statistical-gis-boundaries-london/ESRI/LSOA_2004_London_Low_Resolution.prj
inflating: statistical-gis-boundaries-london/ESRI/LSOA_2004_London_Low_Resolution.shp
inflating: statistical-gis-boundaries-london/ESRI/LSOA_2004_London_Low_Resolution.shx
inflating: statistical-gis-boundaries-london/ESRI/LSOA_2011_London_gen_MHW.dbf
inflating: statistical-gis-boundaries-london/ESRI/LSOA_2011_London_gen_MHW.prj
inflating: statistical-gis-boundaries-london/ESRI/LSOA_2011_London_gen_MHW.sbn
inflating: statistical-gis-boundaries-london/ESRI/LSOA_2011_London_gen_MHW.sbx
inflating: statistical-gis-boundaries-london/ESRI/LSOA_2011_London_gen_MHW.shp
inflating: statistical-gis-boundaries-london/ESRI/LSOA_2011_London_gen_MHW.shp.xml
inflating: statistical-gis-boundaries-london/ESRI/LSOA_2011_London_gen_MHW.shx
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2004_London_High_Resolution.dbf
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2004_London_High_Resolution.prj
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2004_London_High_Resolution.sbn
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2004_London_High_Resolution.sbx
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2004_London_High_Resolution.shp
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2004_London_High_Resolution.shp.xml
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2004_London_High_Resolution.shx
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2011_London_gen_MHW.dbf
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2011_London_gen_MHW.prj
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2011_London_gen_MHW.sbn
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2011_London_gen_MHW.sbx
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2011_London_gen_MHW.shp
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2011_London_gen_MHW.shp.xml
inflating: statistical-gis-boundaries-london/ESRI/MSOA_2011_London_gen_MHW.shx
inflating: statistical-gis-boundaries-london/ESRI/OA_2011_London_gen_MHW.dbf
inflating: statistical-gis-boundaries-london/ESRI/OA_2011_London_gen_MHW.prj
inflating: statistical-gis-boundaries-london/ESRI/OA_2011_London_gen_MHW.sbn
inflating: statistical-gis-boundaries-london/ESRI/OA_2011_London_gen_MHW.sbx
inflating: statistical-gis-boundaries-london/ESRI/OA_2011_London_gen_MHW.shp
inflating: statistical-gis-boundaries-london/ESRI/OA_2011_London_gen_MHW.shp.xml
inflating: statistical-gis-boundaries-london/ESRI/OA_2011_London_gen_MHW.shx
inflating: statistical-gis-boundaries-london/Geography-licensing.pdf
creating: statistical-gis-boundaries-london/MapInfo/
inflating: statistical-gis-boundaries-london/MapInfo/London_Borough_Excluding_MHW.dat
inflating: statistical-gis-boundaries-london/MapInfo/London_Borough_Excluding_MHW.id
inflating: statistical-gis-boundaries-london/MapInfo/London_Borough_Excluding_MHW.ind
inflating: statistical-gis-boundaries-london/MapInfo/London_Borough_Excluding_MHW.map
inflating: statistical-gis-boundaries-london/MapInfo/London_Borough_Excluding_MHW.tab
inflating: statistical-gis-boundaries-london/MapInfo/London_Ward_CityMerged.dat
inflating: statistical-gis-boundaries-london/MapInfo/London_Ward_CityMerged.id
inflating: statistical-gis-boundaries-london/MapInfo/London_Ward_CityMerged.ind
inflating: statistical-gis-boundaries-london/MapInfo/London_Ward_CityMerged.map
inflating: statistical-gis-boundaries-london/MapInfo/London_Ward_CityMerged.tab
inflating: statistical-gis-boundaries-london/MapInfo/LSOA_2004_London_Low_Resolution.DAT
inflating: statistical-gis-boundaries-london/MapInfo/LSOA_2004_London_Low_Resolution.ID
inflating: statistical-gis-boundaries-london/MapInfo/LSOA_2004_London_Low_Resolution.IND
inflating: statistical-gis-boundaries-london/MapInfo/LSOA_2004_London_Low_Resolution.MAP
inflating: statistical-gis-boundaries-london/MapInfo/LSOA_2004_London_Low_Resolution.TAB
inflating: statistical-gis-boundaries-london/MapInfo/LSOA_2011_London_gen_MHW.DAT
inflating: statistical-gis-boundaries-london/MapInfo/LSOA_2011_London_gen_MHW.ID
inflating: statistical-gis-boundaries-london/MapInfo/LSOA_2011_London_gen_MHW.MAP
inflating: statistical-gis-boundaries-london/MapInfo/LSOA_2011_London_gen_MHW.tab
inflating: statistical-gis-boundaries-london/MapInfo/MSOA_2004_London_Low_Resolution.DAT
inflating: statistical-gis-boundaries-london/MapInfo/MSOA_2004_London_Low_Resolution.ID
inflating: statistical-gis-boundaries-london/MapInfo/MSOA_2004_London_Low_Resolution.MAP
inflating: statistical-gis-boundaries-london/MapInfo/MSOA_2004_London_Low_Resolution.TAB
inflating: statistical-gis-boundaries-london/MapInfo/MSOA_2011_London_gen_MHW.DAT
inflating: statistical-gis-boundaries-london/MapInfo/MSOA_2011_London_gen_MHW.ID
inflating: statistical-gis-boundaries-london/MapInfo/MSOA_2011_London_gen_MHW.MAP
inflating: statistical-gis-boundaries-london/MapInfo/MSOA_2011_London_gen_MHW.tab
inflating: statistical-gis-boundaries-london/MapInfo/OA_2011_London_gen_MHW.DAT
inflating: statistical-gis-boundaries-london/MapInfo/OA_2011_London_gen_MHW.ID
inflating: statistical-gis-boundaries-london/MapInfo/OA_2011_London_gen_MHW.MAP
inflating: statistical-gis-boundaries-london/MapInfo/OA_2011_London_gen_MHW.tab
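Where the `unzip` command is not available, the same unpacking step can be done with Python's standard library. The following is a sketch only: it builds a tiny archive on the fly (with a hypothetical name and placeholder content) so that it runs offline, rather than touching the real download.

```python
import tempfile
import zipfile
from pathlib import Path

# Work in a throwaway folder so nothing in the notebook directory is changed
workdir = Path(tempfile.mkdtemp())
archive = workdir / "toy-boundaries.zip"  # hypothetical archive name

# Build a tiny archive standing in for the downloaded .zip file
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("toy-boundaries/ESRI/example.prj", "placeholder")

# Equivalent of `unzip`: extract every member next to the archive
with zipfile.ZipFile(archive) as zf:
    zf.extractall(workdir)

# List the files now present, relative to the working folder
extracted = sorted(
    p.relative_to(workdir).as_posix() for p in workdir.rglob("*") if p.is_file()
)
print(extracted)
```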
Read in MSOAs:
msoas = geopandas.read_file("statistical-gis-boundaries-london/ESRI/MSOA_2004_London_High_Resolution.shp")
msoas.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 983 entries, 0 to 982
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MSOA_CODE 983 non-null object
1 MSOA_NAME 983 non-null object
2 LA_CODE 983 non-null object
3 LA_NAME 983 non-null object
4 GEOEAST 983 non-null int64
5 GEONORTH 983 non-null int64
6 POPEAST 983 non-null int64
7 POPNORTH 983 non-null int64
8 AREA_KM2 983 non-null float64
9 geometry 983 non-null geometry
dtypes: float64(1), geometry(1), int64(4), object(4)
memory usage: 76.9+ KB
Read in boroughs:
boroughs = geopandas.read_file("statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp")
boroughs.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 NAME 33 non-null object
1 GSS_CODE 33 non-null object
2 HECTARES 33 non-null float64
3 NONLD_AREA 33 non-null float64
4 ONS_INNER 33 non-null object
5 SUB_2009 0 non-null object
6 SUB_2006 0 non-null object
7 geometry 33 non-null geometry
dtypes: float64(2), geometry(1), object(5)
memory usage: 2.2+ KB
Aggregate variables to MSOA#
Attach the MSOA code to each point:
db = geopandas.sjoin(geo_listings,
msoas[["geometry", "MSOA_CODE", "MSOA_NAME"]].to_crs(geo_listings.crs),
how="left"
)
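As an illustration of what the spatial join does, the sketch below joins three made-up points to two made-up square zones. With `how="left"` every point is kept, and a point that falls in no zone gets missing values for the zone attributes. All names and geometries here are hypothetical.

```python
import geopandas
from shapely.geometry import Point, box

# Two adjacent unit squares standing in for MSOA polygons
zones = geopandas.GeoDataFrame(
    {"zone": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:27700",
)

# Three points: one in each zone and one outside both
points = geopandas.GeoDataFrame(
    {"pid": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:27700",
)

# Left join: keep every point, attach the zone it intersects (if any)
joined = geopandas.sjoin(points, zones, how="left")

print(joined[["pid", "zone"]])
```

Both tables need to share a CRS, which is why the notebook reprojects the MSOAs with `to_crs(geo_listings.crs)` before joining.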
Aggregate stats to MSOA level and reattach geometries:
g = db.groupby("MSOA_CODE")
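# numeric_only=True skips text and geometry columns, which recent pandas versions refuse to average
msoas_abb = g.mean(numeric_only=True).drop("index_right", axis=1)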
msoas_abb["property_count"] = g.size()
msoas_abb = geopandas.GeoDataFrame(msoas_abb.join(msoas.set_index("MSOA_CODE")[["geometry"]]),
crs=msoas.crs
)
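The aggregation pattern above (group means plus a count per group) can be sketched with plain pandas on a made-up table. Non-numeric columns are left out of the mean, which is consistent with string columns such as `price` not appearing in the final table of this notebook. The values below are invented for illustration.

```python
import pandas

# Made-up listings already tagged with an area code
toy = pandas.DataFrame(
    {
        "MSOA_CODE": ["a", "a", "b"],
        "beds": [1.0, 3.0, 2.0],
        "room_type": ["Private room", "Entire home/apt", "Private room"],
    }
)

g = toy.groupby("MSOA_CODE")

# Average every numeric column within each area; text columns are skipped
stats = g.mean(numeric_only=True)

# Number of rows (properties) per area, aligned on the group index
stats["property_count"] = g.size()

print(stats)
```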
Attach boroughs#
We assign each MSOA to the borough that contains its centroid.
msoa_cents = geopandas.GeoDataFrame({"MSOA11CD": msoas_abb.index,
"geometry": msoas_abb.centroid
}, crs=msoas_abb.crs
)
msoa2borough = geopandas.sjoin(msoa_cents,
boroughs[["NAME", "GSS_CODE", "geometry"]]\
.to_crs(msoas_abb.crs),
how="left"
)
msoa2borough.head()
MSOA_CODE | MSOA11CD | geometry | index_right | NAME | GSS_CODE
---|---|---|---|---|---
E02000001 | E02000001 | POINT (532464.075 181219.688) | 32 | City of London | E09000001
E02000002 | E02000002 | POINT (548312.704 189878.010) | 31 | Barking and Dagenham | E09000002
E02000003 | E02000003 | POINT (548456.427 188399.878) | 31 | Barking and Dagenham | E09000002
E02000004 | E02000004 | POINT (551009.985 186307.533) | 31 | Barking and Dagenham | E09000002
E02000005 | E02000005 | POINT (548666.108 186902.593) | 31 | Barking and Dagenham | E09000002
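Using centroids guarantees that each small area matches at most one larger area, even when its polygon straddles a boundary. A minimal sketch with made-up rectangles (all names and geometries are hypothetical):

```python
import geopandas
from shapely.geometry import box

# Two small areas; "m1" straddles the boundary at x = 2
small = geopandas.GeoDataFrame(
    {"code": ["m1", "m2"]},
    geometry=[box(0, 0, 3, 1), box(3, 0, 5, 1)],
    crs="EPSG:27700",
)

# Two larger areas meeting at x = 2
big = geopandas.GeoDataFrame(
    {"NAME": ["West", "East"]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 6, 2)],
    crs="EPSG:27700",
)

# Replace each small polygon by its centroid, then join the points
cents = geopandas.GeoDataFrame(
    {"code": small["code"], "geometry": small.centroid}, crs=small.crs
)
lookup = geopandas.sjoin(cents, big, how="left")

print(lookup[["code", "NAME"]])
```

Here `m1` overlaps both larger areas, but its centroid at x = 1.5 places it in "West" only. Centroids are best computed in a projected CRS, as is the case for the boundary files used in this notebook, whose coordinates are in metres.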
And add the borough name and code to the main table:
abb = msoas_abb.reset_index()\
.join(msoa2borough[["NAME", "GSS_CODE"]], on="MSOA_CODE")\
.rename({"NAME": "BOROUGH"}, axis=1)\
.drop(["id", "longitude", "latitude"], axis=1)
abb.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 981 entries, 0 to 980
Data columns (total 18 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MSOA_CODE 981 non-null object
1 accommodates 981 non-null float64
2 bathrooms 981 non-null float64
3 bedrooms 981 non-null float64
4 beds 981 non-null float64
5 number_of_reviews 981 non-null float64
6 reviews_per_month 977 non-null float64
7 review_scores_rating 977 non-null float64
8 review_scores_accuracy 977 non-null float64
9 review_scores_cleanliness 977 non-null float64
10 review_scores_checkin 976 non-null float64
11 review_scores_communication 977 non-null float64
12 review_scores_location 976 non-null float64
13 review_scores_value 976 non-null float64
14 property_count 981 non-null int64
15 geometry 981 non-null geometry
16 BOROUGH 981 non-null object
17 GSS_CODE 981 non-null object
dtypes: float64(13), geometry(1), int64(1), object(3)
memory usage: 138.1+ KB
Keep Inner London#
To keep only MSOAs with a higher density of properties, we restrict the dataset to Inner London.
il = [
"City of London",
"Camden",
"Greenwich",
"Hackney",
"Hammersmith and Fulham",
"Islington",
"Kensington and Chelsea",
"Lambeth",
"Lewisham",
"Southwark",
"Tower Hamlets",
"Wandsworth",
"Westminster",
]
fltr = abb["BOROUGH"].isin(il)
abb_il = abb[fltr]
The following map shows why we keep only Inner London; the red outline marks the MSOAs we retain:
ax = abb.plot(column="property_count", scheme="quantiles")
geopandas.GeoSeries(abb_il.unary_union).plot(ax=ax, edgecolor="red", facecolor="none")
<AxesSubplot:>
Let’s also use the Inner London list to write those boroughs to a separate file:
! rm -f london_inner_boroughs.geojson
boroughs[boroughs["NAME"].isin(il)]\
.to_crs(epsg=4326)\
.to_file("london_inner_boroughs.geojson",
driver="GeoJSON"
)
Write out and clean up#
! rm -f london_abb.gpkg
abb_il.to_crs(epsg=4326).to_file("london_abb.gpkg", driver="GPKG")
! du -h london_abb.gpkg
4.0M london_abb.gpkg
! rm -rf statistical-gis-boundaries-london.zip statistical-gis-boundaries-london/
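The clean-up can also be written with the standard library, which helps where shell commands are unavailable. This sketch runs on throwaway files in a temporary folder (hypothetical names) and assumes Python 3.8 or later for `missing_ok`.

```python
import shutil
import tempfile
from pathlib import Path

# Create a throwaway file and folder to delete
workdir = Path(tempfile.mkdtemp())
zip_path = workdir / "toy.zip"
folder = workdir / "toy"
zip_path.write_bytes(b"")
folder.mkdir()
(folder / "a.shp").write_bytes(b"")

# Like `rm -f`: remove the file, no error if it is already gone
zip_path.unlink(missing_ok=True)

# Like `rm -rf`: remove the folder and everything inside it
shutil.rmtree(folder, ignore_errors=True)

# Running both again is harmless
zip_path.unlink(missing_ok=True)
shutil.rmtree(folder, ignore_errors=True)

print(zip_path.exists(), folder.exists())
```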