Tokyo districts administrative boundaries
Here we prepare a GeoJSON file with the administrative areas covered by the Flickr photo dataset for Tokyo.
import pandas, geopandas
Get extent of photos
Read the table of photos:
db = pandas.read_csv("https://geographicdata.science/book/_downloads/7fb86b605af15b3c9cbd9bfcbead23e9/tokyo_clean.csv")
db.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 user_id 10000 non-null object
1 longitude 10000 non-null float64
2 latitude 10000 non-null float64
3 date_taken 10000 non-null object
4 photo/video_page_url 10000 non-null object
5 x 10000 non-null float64
6 y 10000 non-null float64
dtypes: float64(4), object(3)
memory usage: 547.0+ KB
Turn it into a GeoDataFrame:
pts = geopandas.points_from_xy(db["longitude"], db["latitude"])
photos = geopandas.GeoDataFrame(
    {"geometry": pts},
    crs="EPSG:4326"
).join(db)
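The section title refers to the extent of the photos, although the cells above do not compute it explicitly. As a minimal sketch, the bounding box of a point layer can be read from its `total_bounds` attribute. The coordinates below are made up for illustration and are not taken from the Flickr table:

```python
import pandas, geopandas

# Hypothetical coordinates standing in for the Flickr table
db = pandas.DataFrame(
    {
        "longitude": [139.70, 139.76, 139.65],
        "latitude": [35.66, 35.68, 35.71],
    }
)

# Same construction as above: points from lon/lat, then join the attributes
pts = geopandas.points_from_xy(db["longitude"], db["latitude"])
photos = geopandas.GeoDataFrame({"geometry": pts}, crs="EPSG:4326").join(db)

# total_bounds returns (minx, miny, maxx, maxy) in the layer's CRS
minx, miny, maxx, maxy = photos.total_bounds
print(minx, miny, maxx, maxy)
```

With the real table, the same attribute gives the longitude/latitude box that the boundaries need to cover.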
Access boundaries for Japan
We rely on the excellent GADM project (version 3.6, GeoPackage format) for the Japan boundaries file:
! wget https://biogeo.ucdavis.edu/data/gadm3.6/gpkg/gadm36_JPN_gpkg.zip
--2020-09-18 11:38:33-- https://biogeo.ucdavis.edu/data/gadm3.6/gpkg/gadm36_JPN_gpkg.zip
Resolving biogeo.ucdavis.edu (biogeo.ucdavis.edu)... 128.120.228.172
Connecting to biogeo.ucdavis.edu (biogeo.ucdavis.edu)|128.120.228.172|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://data.biogeo.ucdavis.edu/data/gadm3.6/gpkg/gadm36_JPN_gpkg.zip [following]
--2020-09-18 11:38:34-- https://data.biogeo.ucdavis.edu/data/gadm3.6/gpkg/gadm36_JPN_gpkg.zip
Resolving data.biogeo.ucdavis.edu (data.biogeo.ucdavis.edu)... 128.120.228.172
Connecting to data.biogeo.ucdavis.edu (data.biogeo.ucdavis.edu)|128.120.228.172|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12035160 (11M) [application/zip]
Saving to: ‘gadm36_JPN_gpkg.zip’
gadm36_JPN_gpkg.zip 100%[===================>] 11.48M 1.54MB/s in 8.5s
2020-09-18 11:38:43 (1.35 MB/s) - ‘gadm36_JPN_gpkg.zip’ saved [12035160/12035160]
Unzip it:
! unzip gadm36_JPN_gpkg.zip
Archive: gadm36_JPN_gpkg.zip
inflating: gadm36_JPN.gpkg
inflating: license.txt
Read in the layer with the smallest areas (the one carrying the GID_2 and NAME_2 columns used below):
areas = geopandas.read_file("gadm36_JPN.gpkg", layer=0)
Remove unnecessary files:
! rm gadm36_JPN_gpkg.zip license.txt gadm36_JPN.gpkg
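The download, unzip and clean-up steps above use shell commands, which may not be available on every platform. A rough equivalent with only the Python standard library could look like the sketch below. The helper names `extract_member` and `fetch_gpkg` are hypothetical and introduced here for illustration only:

```python
import urllib.request
import zipfile
from pathlib import Path


def extract_member(zip_path, member, dest_dir):
    """Extract a single file from a zip archive and return its path."""
    with zipfile.ZipFile(zip_path) as archive:
        return Path(archive.extract(member, path=dest_dir))


def fetch_gpkg(url, member, dest_dir):
    """Download the zip at `url`, extract `member` into `dest_dir`,
    delete the archive, and return the path of the extracted file."""
    zip_path = Path(dest_dir) / "download.zip"
    urllib.request.urlretrieve(url, zip_path)
    gpkg_path = extract_member(zip_path, member, dest_dir)
    zip_path.unlink()  # the archive is no longer needed
    return gpkg_path


# Usage (needs network access, so it is left commented out):
# import tempfile, geopandas
# url = "https://biogeo.ucdavis.edu/data/gadm3.6/gpkg/gadm36_JPN_gpkg.zip"
# with tempfile.TemporaryDirectory() as tmp:
#     gpkg = fetch_gpkg(url, "gadm36_JPN.gpkg", tmp)
#     areas = geopandas.read_file(gpkg, layer=0)
```

Working inside a temporary directory means the extracted file disappears once the table is loaded in memory, which mirrors the `rm` step above.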
Clip areas
Identify areas with at least one photo:
j = geopandas.sjoin(
    photos,
    areas,
    how="inner"
)
ids_to_keep = j["GID_2"].unique()
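To make the spatial join concrete, here is a small sketch on toy data: three square "areas" and three "photos", two of which fall in the same square. The geometries and IDs are invented for illustration. The example also checks that both layers share a CRS, which the join assumes:

```python
import geopandas
from shapely.geometry import Point, box

# Three adjacent unit squares standing in for administrative areas
areas = geopandas.GeoDataFrame(
    {"GID_2": ["A", "B", "C"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

# Three points standing in for photos; none falls in area "B"
photos = geopandas.GeoDataFrame(
    {"user_id": ["u1", "u2", "u3"]},
    geometry=[Point(0.5, 0.5), Point(0.6, 0.4), Point(2.5, 0.5)],
    crs="EPSG:4326",
)

# The join compares raw coordinates, so both layers must share a CRS
assert photos.crs == areas.crs

# Each photo picks up the attributes of the area it falls in
j = geopandas.sjoin(photos, areas, how="inner")
ids_to_keep = j["GID_2"].unique()

# Number of photos per area, a by-product of the same join
photos_per_area = j.groupby("GID_2").size()
```

Area "B" contains no photo, so it does not appear in `ids_to_keep` and would be dropped in the filtering step.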
Filter out irrelevant areas and columns:
vars_to_keep = [
    "GID_1",
    "NAME_1",
    "GID_2",
    "NAME_2",
    "ENGTYPE_2",
    "geometry"
]
areas_to_keep = areas.loc[areas["GID_2"].isin(ids_to_keep), vars_to_keep]
areas_to_keep.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 39 entries, 138 to 1665
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 GID_1 39 non-null object
1 NAME_1 39 non-null object
2 GID_2 39 non-null object
3 NAME_2 39 non-null object
4 ENGTYPE_2 39 non-null object
5 geometry 39 non-null geometry
dtypes: geometry(1), object(5)
memory usage: 2.1+ KB
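The filter above selects rows and columns in a single `.loc` call. A minimal sketch of the same pattern on toy data follows. The table, the `unused` column and the IDs are made up for illustration:

```python
import geopandas
from shapely.geometry import box

# Toy areas table with one column we do not want to keep
areas = geopandas.GeoDataFrame(
    {
        "GID_2": ["A", "B", "C"],
        "NAME_2": ["Alpha", "Beta", "Gamma"],
        "unused": [1, 2, 3],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

ids_to_keep = ["A", "C"]
vars_to_keep = ["GID_2", "NAME_2", "geometry"]

# Boolean mask picks the rows; the list of names picks the columns
areas_to_keep = areas.loc[areas["GID_2"].isin(ids_to_keep), vars_to_keep]

# Sanity check: every requested ID was found in the table
all_found = set(ids_to_keep) <= set(areas["GID_2"])
```

Because `geometry` is among the kept columns, the result remains a GeoDataFrame and keeps its CRS.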
Write out
! rm -f tokyo_admin_boundaries.geojson
areas_to_keep.to_file("tokyo_admin_boundaries.geojson", driver="GeoJSON")