Tokyo districts administrative boundaries

Here we prepare a GeoJSON file with the administrative areas covered by the Flickr photo dataset for Tokyo.

import pandas
import geopandas

Get extent of photos

Read in the table of photos:

db = pandas.read_csv("https://geographicdata.science/book/_downloads/7fb86b605af15b3c9cbd9bfcbead23e9/tokyo_clean.csv")
db.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 7 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   user_id               10000 non-null  object 
 1   longitude             10000 non-null  float64
 2   latitude              10000 non-null  float64
 3   date_taken            10000 non-null  object 
 4   photo/video_page_url  10000 non-null  object 
 5   x                     10000 non-null  float64
 6   y                     10000 non-null  float64
dtypes: float64(4), object(3)
memory usage: 547.0+ KB

Turn it into a GeoDataFrame:

pts = geopandas.points_from_xy(db["longitude"], db["latitude"])
photos = geopandas.GeoDataFrame({"geometry": pts},
                                crs="EPSG:4326"
                               ).join(db)
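
The extent of the photos is the bounding box of their coordinates. As a minimal sketch of the idea, using only the standard library and a few made-up points (the `sample` coordinates and the `bounding_box` helper are illustrative and not part of the dataset or of geopandas):

```python
from typing import Iterable, Tuple


def bounding_box(
    coords: Iterable[Tuple[float, float]]
) -> Tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for (lon, lat) pairs."""
    lons, lats = zip(*coords)
    return min(lons), min(lats), max(lons), max(lats)


# Hypothetical sample points as (longitude, latitude), not taken from the table
sample = [(139.70, 35.66), (139.77, 35.68), (139.75, 35.71)]
bbox = bounding_box(sample)
print(bbox)
```

On the actual table, `photos.total_bounds` should return the same four numbers (in the order min x, min y, max x, max y) for the full set of points.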

Access boundaries for Japan

We rely on the excellent GADM project for the boundaries of Japan, which it distributes as a zipped GeoPackage:

! wget https://biogeo.ucdavis.edu/data/gadm3.6/gpkg/gadm36_JPN_gpkg.zip
--2020-09-18 11:38:33--  https://biogeo.ucdavis.edu/data/gadm3.6/gpkg/gadm36_JPN_gpkg.zip
Resolving biogeo.ucdavis.edu (biogeo.ucdavis.edu)... 128.120.228.172
Connecting to biogeo.ucdavis.edu (biogeo.ucdavis.edu)|128.120.228.172|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://data.biogeo.ucdavis.edu/data/gadm3.6/gpkg/gadm36_JPN_gpkg.zip [following]
--2020-09-18 11:38:34--  https://data.biogeo.ucdavis.edu/data/gadm3.6/gpkg/gadm36_JPN_gpkg.zip
Resolving data.biogeo.ucdavis.edu (data.biogeo.ucdavis.edu)... 128.120.228.172
Connecting to data.biogeo.ucdavis.edu (data.biogeo.ucdavis.edu)|128.120.228.172|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12035160 (11M) [application/zip]
Saving to: ‘gadm36_JPN_gpkg.zip’

gadm36_JPN_gpkg.zip 100%[===================>]  11.48M  1.54MB/s    in 8.5s    

2020-09-18 11:38:43 (1.35 MB/s) - ‘gadm36_JPN_gpkg.zip’ saved [12035160/12035160]

Unzip it:

! unzip gadm36_JPN_gpkg.zip
Archive:  gadm36_JPN_gpkg.zip
  inflating: gadm36_JPN.gpkg         
  inflating: license.txt             
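
The `wget` and `unzip` steps above depend on shell tools that may not be available on every system. A possible pure-Python equivalent, sketched with the standard library only (the `download_and_extract` helper is a name made up for this illustration, and the call on the GADM URL is left commented out so the block does not touch the network):

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url: str, dest_dir: str = ".") -> list:
    """Download the zip archive at `url` into `dest_dir` and extract it there.

    Returns the names of the archive members.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local copy after the last component of the URL
    zip_path = dest / Path(url).name
    urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Assumed usage, mirroring the shell commands above:
# download_and_extract(
#     "https://biogeo.ucdavis.edu/data/gadm3.6/gpkg/gadm36_JPN_gpkg.zip"
# )
```

Unlike the shell version, this keeps the whole workflow in Python; the downloaded archive still needs to be removed afterwards.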

Read in the layer with the smallest administrative areas:

areas = geopandas.read_file("gadm36_JPN.gpkg", layer=0)

Remove unnecessary files:

! rm gadm36_JPN_gpkg.zip license.txt gadm36_JPN.gpkg

Clip areas

Identify areas with at least one photo:

j = geopandas.sjoin(photos, 
                    areas,
                    how="inner"
                   )
ids_to_keep = j["GID_2"].unique()
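
Conceptually, the spatial join pairs each photo with the area that contains it, and the unique IDs of the matched areas are the ones to keep. The toy sketch below illustrates that logic with a plain ray-casting test on made-up squares. It is not how geopandas implements `sjoin` (which relies on a spatial index and, by default, an intersects predicate); `toy_areas`, `toy_photos` and `point_in_polygon` are hypothetical names used only here:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Ray-casting test: True if `pt` falls inside the polygon `ring`.

    Points lying exactly on the boundary are not handled specially.
    """
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical toy data: three unit squares standing in for areas
toy_areas = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(2, 0), (3, 0), (3, 1), (2, 1)],
}
toy_photos = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5)]

# Keep the IDs of areas that contain at least one photo
ids_with_photos = sorted(
    {
        gid
        for gid, ring in toy_areas.items()
        for pt in toy_photos
        if point_in_polygon(pt, ring)
    }
)
print(ids_with_photos)  # ['A', 'B']
```

Area "C" contains no photo and drops out, which is what `ids_to_keep` achieves for the real boundaries.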

Filter out irrelevant areas and columns:

vars_to_keep = ["GID_1", 
                "NAME_1", 
                "GID_2", 
                "NAME_2", 
                "ENGTYPE_2", 
                "geometry"
               ]
areas_to_keep = areas.loc[areas["GID_2"].isin(ids_to_keep), vars_to_keep]
areas_to_keep.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 39 entries, 138 to 1665
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   GID_1      39 non-null     object  
 1   NAME_1     39 non-null     object  
 2   GID_2      39 non-null     object  
 3   NAME_2     39 non-null     object  
 4   ENGTYPE_2  39 non-null     object  
 5   geometry   39 non-null     geometry
dtypes: geometry(1), object(5)
memory usage: 2.1+ KB

Write out

! rm -f tokyo_admin_boundaries.geojson
areas_to_keep.to_file("tokyo_admin_boundaries.geojson", driver="GeoJSON")
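
A quick sanity check on the written file can be done without geopandas, since GeoJSON is plain JSON. The sketch below runs on a small made-up `FeatureCollection` written to a temporary file; `summarize_geojson` and the toy property values are hypothetical, and pointing the helper at `tokyo_admin_boundaries.geojson` is an assumed use rather than part of the original workflow:

```python
import json
import os
import tempfile


def summarize_geojson(path: str) -> dict:
    """Return the collection type, feature count and property names of a GeoJSON file."""
    with open(path, encoding="utf-8") as f:
        collection = json.load(f)
    features = collection.get("features", [])
    return {
        "type": collection.get("type"),
        "n_features": len(features),
        # Property names are taken from the first feature only
        "properties": sorted(features[0]["properties"]) if features else [],
    }


# Hypothetical toy collection with two point features
toy = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"GID_2": "TOY.1_1", "NAME_2": "First"},
            "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
        },
        {
            "type": "Feature",
            "properties": {"GID_2": "TOY.2_1", "NAME_2": "Second"},
            "geometry": {"type": "Point", "coordinates": [1.0, 1.0]},
        },
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.geojson")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(toy, f)
    summary = summarize_geojson(path)

print(summary)
```

For the file written above, the feature count would be expected to match the 39 entries reported by `areas_to_keep.info()`.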