Hands-on#

Mapping in Python#

import geopandas
import osmnx
import contextily as cx
import matplotlib.pyplot as plt
/tmp/ipykernel_7827/2961306976.py:1: UserWarning: Shapely 2.0 is installed, but because PyGEOS is also installed, GeoPandas will still use PyGEOS by default for now. To force to use and test Shapely 2.0, you have to set the environment variable USE_PYGEOS=0. You can do this before starting the Python process, or in your code before importing geopandas:

import os
os.environ['USE_PYGEOS'] = '0'
import geopandas

In a future release, GeoPandas will switch to using Shapely by default. If you are using PyGEOS directly (calling PyGEOS functions on geometries from GeoPandas), this will then stop working and you are encouraged to migrate from PyGEOS to Shapely 2.0 (https://shapely.readthedocs.io/en/latest/migration_pygeos.html).
  import geopandas

In this lab, we will learn how to load, manipulate and visualize spatial data. In some senses, spatial data are usually included simply as “one more column” in a table. However, spatial is special sometimes and there are few aspects in which geographic data differ from standard numerical tables. In this session, we will extend the skills developed in the previous one about non-spatial data, and combine them. In the process, we will discover that, although with some particularities, dealing with spatial data in Python largely resembles dealing with non-spatial data.

Datasets#

To learn these concepts, we will be playing with three main datasets. Same as in the previous block, these datasets can be loaded dynamically from the web, or you can download them manually, keep a copy on your computer, and load them from there.

Important

Make sure you are connected to the internet when you run these cells as they need to access data hosted online

Cities#

First we will use a polygon geography. We will use an open dataset that contains the boundaries of Spanish cities. We can read it into an object named cities by:

cities = geopandas.read_file("https://ndownloader.figshare.com/files/20232174")

Note the code cell above requires internet connectivity. If you are not online but have a full copy of the GDS course in your computer (downloaded as suggested in the infrastructure page), you can read the data with the following line of code:

cities = geopandas.read_file("../data/web_cache/cities.gpkg")

Streets#

In addition to polygons, we will play with a line layer. For that, we are going to use a subset of street network from the Spanish city of Madrid.

The data is available on the following web address:

url = (
    "https://github.com/geochicasosm/lascallesdelasmujeres"
    "/raw/master/data/madrid/final_tile.geojson"
)
url
'https://github.com/geochicasosm/lascallesdelasmujeres/raw/master/data/madrid/final_tile.geojson'

And you can read it into an object called streets with:

streets = geopandas.read_file(url)

Note the code cell above requires internet connectivity. If you are not online but have a full copy of the GDS course in your computer (downloaded as suggested in the infrastructure page), you can read the data with the following line of code:

streets = geopandas.read_file("../data/web_cache/streets.gpkg")

Bars#

The final dataset we will rely on is a set of points demarcating the location of bars in Madrid. To obtain it, we will use osmnx, a Python library that allows us to query OpenStreetMap. Note that we use the method pois_from_place, which queries for points of interest (POIs, or pois) in a particular place (Madrid in this case). In addition, we can specify a set of tags to delimit the query. We use this to ask only for amenities of the type “bar”:

pois = osmnx.geometries_from_place(
    "Madrid, Spain", tags={"amenity": "bar"}
)

You do not need to know at this point what happens behind the scenes when we run geometries_from_place but, if you are curious, we are making a query to OpenStreetMap (almost as if you typed “bars in Madrid, Spain” within Google Maps) and getting the response as a table of data, instead of as a website with an interactive map. Pretty cool, huh?


Note the code cell above requires internet connectivity. If you are not online but have a full copy of the GDS course in your computer (downloaded as suggested in the infrastructure page), you can read the data with the following line of code:

pois = geopandas.read_parquet("../data/web_cache/pois_bars_madrid.parquet")

Inspecting spatial data#

The most direct way to get from a file to a quick visualization of the data is by loading it as a GeoDataFrame and calling the plot command. The main library employed for all of this is geopandas which is a geospatial extension of the pandas library, already introduced before. geopandas supports the same functionality that pandas does, plus a wide range of spatial extensions that make manipulation and general “munging” of spatial data similar to non-spatial tables.

In two lines of code, we will obtain a graphical representation of the spatial data contained in a file that can be in many formats; actually, since it uses the same drivers under the hood, you can load pretty much the same kind of vector files that Desktop GIS packages like QGIS permit. Let us start by plotting single layers in a crude but quick form, and we will build style and sophistication into our plots later on.

Polygons#

Now lsoas is a GeoDataFrame. Very similar to a traditional, non-spatial DataFrame, but with an additional column called geometry:

cities.head()
city_id n_building geometry
0 ci000 2348 POLYGON ((385390.071 4202949.446, 384488.697 4...
1 ci001 2741 POLYGON ((214893.033 4579137.558, 215258.185 4...
2 ci002 5472 POLYGON ((690674.281 4182188.538, 691047.526 4...
3 ci003 14608 POLYGON ((513378.282 4072327.639, 513408.853 4...
4 ci004 2324 POLYGON ((206989.081 4129478.031, 207275.702 4...

This allows us to quickly produce a plot by executing the following line:

cities.plot()
<Axes: >
../../_images/d8365e49e0bf402709a3c0df4c0c0aeef37beaf502dc4e58789bb43f32752d55.png

This might not be the most aesthetically pleasant visual representation of the LSOAs geography, but it is hard to argue it is not quick to produce. We will work on styling and customizing spatial plots later on.

Pro-tip: if you call a single row of the geometry column, it’ll return a small plot ith the shape:

cities.loc[0, 'geometry']
../../_images/af5f847b462dc190bcc39f00abe764fbc49a76221d1551f9f42755709cd3ebe0.svg

Lines#

Similarly to the polygon case, if we pick the "geometry" column of a table with lines, a single row will display the geometry as well:

streets.loc[0, 'geometry']
../../_images/4b07b3cb9d2083c277f5b2122d869148b199b65280772a41191ce615c3065ecc.svg

A quick plot is similarly generated by:

streets.plot()
<Axes: >
../../_images/ba9664ce49684b3965b3449aec1cd86bfe793e9c5af1248b2de1cbd00c635376.png

Again, this is not the prettiest way to display the streets maybe, and you might want to change a few parameters such as colors, etc. All of this is possible, as we will see below, but this gives us a quick check of what lines look like.

Points#

Points take a similar approach for quick plotting:

pois.plot()
<Axes: >
../../_images/5cc4dfcde131e94b4fee40b9f414fedce888ad041f37d01c8c8a7e248d20f97d.png

Styling plots#

It is possible to tweak several aspects of a plot to customize if to particular needs. In this section, we will explore some of the basic elements that will allow us to obtain more compelling maps.

NOTE: some of these variations are very straightforward while others are more intricate and require tinkering with the internal parts of a plot. They are not necessarily organized by increasing level of complexity.

Changing transparency#

The intensity of color of a polygon can be easily changed through the alpha attribute in plot. This is specified as a value betwee zero and one, where the former is entirely transparent while the latter is the fully opaque (maximum intensity):

pois.plot(alpha=0.1)
<Axes: >
../../_images/5bd9086757bdabe6fd774806e9d44115e3faa8a560af755ef73b1e77c9d4df02.png

Removing axes#

Although in some cases, the axes can be useful to obtain context, most of the times maps look and feel better without them. Removing the axes involves wrapping the plot into a figure, which takes a few more lines of aparently useless code but that, in time, it will allow you to tweak the map further and to create much more flexible designs:

# Setup figure and axis
f, ax = plt.subplots(1)
# Plot layer of polygons on the axis
cities.plot(ax=ax)
# Remove axis frames
ax.set_axis_off()
# Display
plt.show()
../../_images/42f261a02fb4d6bd7df6ae9cbc93b93f399d9342aad6083ce5f690fa001c2c30.png

Let us stop for a second a study each of the previous lines:

  1. We have first created a figure named f with one axis named ax by using the command plt.subplots (part of the library matplotlib, which we have imported at the top of the notebook). Note how the method is returning two elements and we can assign each of them to objects with different name (f and ax) by simply listing them at the front of the line, separated by commas.

  2. Second, we plot the geographies as before, but this time we tell the function that we want it to draw the polygons on the axis we are passing, ax. This method returns the axis with the geographies in them, so we make sure to store it on an object with the same name, ax.

  3. On the third line, we effectively remove the box with coordinates.

  4. Finally, we draw the entire plot by calling plt.show().

Adding a title#

Adding a title is an extra line, if we are creating the plot within a figure, as we just did. To include text on top of the figure:

# Setup figure and axis
f, ax = plt.subplots(1)
# Add layer of polygons on the axis
streets.plot(ax=ax)
# Add figure title
f.suptitle("Streets in Madrid")
# Display
plt.show()
../../_images/c9d10a2208039646c566a973f628b000675bc330b21af61624fe3aa6d8606e2c.png

Changing the size of the map#

The size of the plot is changed equally easily in this context. The only difference is that it is specified when we create the figure with the argument figsize. The first number represents the width, the X axis, and the second corresponds with the height, the Y axis.

# Setup figure and axis with different size
f, ax = plt.subplots(1, figsize=(12, 12))
# Add layer of polygons on the axis
cities.plot(ax=ax)
# Display
plt.show()
../../_images/313c207cb310830534908c9b8f22f5e1c8ccc2e3c3190046b50174c29dd9e3b0.png

Modifying borders#

Border lines sometimes can distort or impede proper interpretation of a map. In those cases, it is useful to know how they can be modified. Although not too complicated, the way to access borders in geopandas is not as straightforward as it is the case for other aspects of the map, such as size or frame. Let us first see the code to make the lines thicker and black, and then we will work our way through the different steps:

# Setup figure and axis
f, ax = plt.subplots(1, figsize=(12, 12))
# Add layer of polygons on the axis, set fill color (`facecolor`) and boundary
# color (`edgecolor`)
cities.plot(
    linewidth=1, 
    facecolor='red', 
    edgecolor='black', 
    ax=ax
);
../../_images/dee9ceea4659c21b36760010a521af02820278766f710226656017a1a1fcf617.png

Note how the lines are thicker. In addition, all the polygons are colored in the same (default) color, light red. However, because the lines are thicker, we can only see the polygon filling for those cities with an area large enough.

Let us examine line by line what we are doing in the code snippet:

  • We begin by creating the figure (f) object and one axis inside it (ax) where we will plot the map.

  • Then, we call plot as usual, but pass in two new arguments: linewidth for the width of the line; facecolor, to control the color each polygon is filled with; and edgecolor, to control the color of the boundary.

This approach works very similarly with other geometries, such as lines. For example, if we wanted to plot the streets in red, we would simply:

# Setup figure and axis
f, ax = plt.subplots(1)
# Add layer with lines, set them red and with different line width
# and append it to the axis `ax`
streets.plot(linewidth=2, color='red', ax=ax)
<Axes: >
../../_images/40f14609d0375925696c3fe05666a80d060134c8d826e12b91c45f668d70666b.png

Important, note that in the case of lines the parameter to control the color is simply color. This is because lines do not have an area, so there is no need to distinguish between the main area (facecolor) and the border lines (edgecolor).

Transforming CRS#

The coordindate reference system (CRS) is the way geographers and cartographers have to represent a three-dimentional object, such as the round earth, on a two-dimensional plane, such as a piece of paper or a computer screen. If the source data contain information on the CRS of the data, we can modify this in a GeoDataFrame. First let us check if we have the information stored properly:

cities.crs
<Projected CRS: EPSG:25830>
Name: ETRS89 / UTM zone 30N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Europe between 6°W and 0°W: Faroe Islands offshore; Ireland - offshore; Jan Mayen - offshore; Norway including Svalbard - offshore; Spain - onshore and offshore.
- bounds: (-6.0, 35.26, 0.01, 80.49)
Coordinate Operation:
- name: UTM zone 30N
- method: Transverse Mercator
Datum: European Terrestrial Reference System 1989 ensemble
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

As we can see, there is information stored about the reference system: it is using the standard Spanish projection, which is expressed in meters. There are also other less decipherable parameters but we do not need to worry about them right now.

If we want to modify this and “reproject” the polygons into a different CRS, the quickest way is to find the EPSG code online (epsg.io is a good one, although there are others too). For example, if we wanted to transform the dataset into lat/lon coordinates, we would use its EPSG code, 4326:

# Reproject (`to_crs`) and plot (`plot`) polygons
cities.to_crs(epsg=4326).plot()
# Set equal axis
lims = plt.axis('equal')
../../_images/59017339b095933766dba0ef75745f30cd189831294dd88f550d40b0505f7fa6.png

The shape of the polygons is slightly different. Furthermore, note how the scale in which they are plotted differs.

Composing multi-layer maps#

So far we have considered many aspects of plotting a single layer of data. However, in many cases, an effective map will require more than one: for example we might want to display streets on top of the polygons of neighborhoods, and add a few points for specific locations we want to highlight. At the very heart of GIS is the possibility to combine spatial information from different sources by overlaying it on top of each other, and this is fully supported in Python.

For this section, let’s select only Madrid from the streets table and convert it to lat/lon so it’s aligned with the streets and POIs layers:

mad = cities.loc[[12], :].to_crs(epsg=4326)
mad
city_id n_building geometry
12 ci012 193714 POLYGON ((-3.90016 40.30421, -3.90019 40.30457...

Combining different layers on a single map boils down to adding each of them to the same axis in a sequential way, as if we were literally overlaying one on top of the previous one. For example, let’s plot the boundary of Madrid and its bars:

# Setup figure and axis
f, ax = plt.subplots(1)
# Add a layer with polygon on to axis `ax`
mad.plot(ax=ax, color="yellow")
# Add a layer with lines on top in axis `ax`
pois.plot(ax=ax, color="green")
<Axes: >
../../_images/1d9c20992e59cb5a67015f25e1ae7765e3110015df798ad454b9a7c5d9616ae4.png

Saving maps to figures#

Once we have produced a map we are content with, we might want to save it to a file so we can include it into a report, article, website, etc. Exporting maps in Python involves replacing plt.show by plt.savefig at the end of the code block to specify where and how to save it. For example to save the previous map into a png file in the same folder where the notebook is hosted:

# Setup figure and axis
f, ax = plt.subplots(1)
# Add a layer with polygon on to axis `ax`
mad.plot(ax=ax, color="yellow")
# Add a layer with lines on top in axis `ax`
pois.plot(ax=ax, color="green")
# Save figure to a PNG file
plt.savefig('madrid_bars.png')
../../_images/1d9c20992e59cb5a67015f25e1ae7765e3110015df798ad454b9a7c5d9616ae4.png

If you now check on the folder, you’ll find a png (image) file with the map.

The command plt.savefig contains a large number of options and additional parameters to tweak. Given the size of the figure created is not very large, we can increase this with the argument dpi, which stands for “dots per inch” and it’s a standard measure of resolution in images. For example, for a high quality image, we could use 500:

# Setup figure and axis
f, ax = plt.subplots(1)
# Add a layer with polygon on to axis `ax`
mad.plot(ax=ax, color="yellow")
# Add a layer with lines on top in axis `ax`
pois.plot(ax=ax, color="green")
# Save figure to a PNG file
plt.savefig('madrid_bars.png', dpi=500)
../../_images/1d9c20992e59cb5a67015f25e1ae7765e3110015df798ad454b9a7c5d9616ae4.png

Manipulating spatial tables (GeoDataFrames)#

Once we have an understanding of how to visually display spatial information contained, let us see how it can be combined with the operations learnt in the previous session about manipulating non-spatial tabular data. Essentially, the key is to realize that a GeoDataFrame contains most of its spatial information in a single column named geometry, but the rest of it looks and behaves exactly like a non-spatial DataFrame (in fact, it is). This concedes them all the flexibility and convenience that we saw in manipulating, slicing, and transforming tabular data, with the bonus that spatial data is carried away in all those steps. In addition, GeoDataFrames also incorporate a set of explicitly spatial operations to combine and transform data. In this section, we will consider both.

GeoDataFrames come with a whole range of traditional GIS operations built-in. Here we will run through a small subset of them that contains some of the most commonly used ones.

Area calculation#

One of the spatial aspects we often need from polygons is their area. “How big is it?” is a question that always haunts us when we think of countries, regions, or cities. To obtain area measurements, first make sure you GeoDataFrame is projected. If that is the case, you can calculate areas as follows:

city_areas = cities.area
city_areas.head()
0    8.449666e+06
1    9.121270e+06
2    1.322653e+07
3    6.808121e+07
4    1.072284e+07
dtype: float64

This indicates that the area of the first city in our table takes up 8,450,000 squared metres. If we wanted to convert into squared kilometres, we can divide by 1,000,000:

areas_in_sqkm = city_areas / 1000000
areas_in_sqkm.head()
0     8.449666
1     9.121270
2    13.226528
3    68.081212
4    10.722843
dtype: float64

Length#

Similarly, an equally common question with lines is their length. Also similarly, their computation is relatively straightforward in Python, provided that our data are projected. Here we will perform the projection (to_crs) and the calculation of the length at the same time:

street_length = streets.to_crs(epsg=25830).length
street_length.head()
0    120.776840
1    120.902920
2    396.494357
3    152.442895
4    101.392357
dtype: float64

Since the CRS we use (EPSG:25830) is expressed in metres, we can tell the first street segment is about 37m.

Centroid calculation#

Sometimes it is useful to summarize a polygon into a single point and, for that, a good candidate is its centroid (almost like a spatial analogue of the average). The following command will return a GeoSeries (a single column with spatial data) with the centroids of a polygon GeoDataFrame:

cents = cities.centroid
cents.head()
0    POINT (386147.759 4204605.994)
1    POINT (216296.159 4579397.331)
2    POINT (688901.588 4180201.774)
3    POINT (518262.028 4069898.674)
4    POINT (206940.936 4127361.966)
dtype: geometry

Note how cents is not an entire table but a single column, or a GeoSeries object. This means you can plot it directly, just like a table:

cents.plot()
<Axes: >
../../_images/f9064001788598118df31dd0708d926a9c08693d1f0e07e7919f12ac6ac29b00.png

But you don’t need to call a geometry column to inspect the spatial objects. In fact, if you do it will return an error because there is not any geometry column, the object cents itself is the geometry.

Point in polygon (PiP)#

Knowing whether a point is inside a polygon is conceptually a straightforward exercise but computationally a tricky task to perform. The way to perform this operation in GeoPandas is through the contains method, available for each polygon object.

poly = cities.loc[12, "geometry"]
pt1 = cents[0]
pt2 = cents[112]

And we can perform the checks as follows:

poly.contains(pt1)
False
poly.contains(pt2)
False

Performing point-in-polygon in this way is instructive and useful for pedagogical reasons, but for cases with many points and polygons, it is not particularly efficient. In these situations, it is much more advisable to perform then as a “spatial join”. If you are interested in these, see the link provided below to learn more about them.

Buffers#

Buffers are one of the classical GIS operations in which an area is drawn around a particular geometry, given a specific radious. These are very useful, for instance, in combination with point-in-polygon operations to calculate accessibility, catchment areas, etc.

For this example, we will use the bars table, but will project it to the same CRS as cities, so it is expressed in metres:

pois_projected = pois.to_crs(cities.crs)
pois_projected.crs
<Projected CRS: EPSG:25830>
Name: ETRS89 / UTM zone 30N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Europe between 6°W and 0°W: Faroe Islands offshore; Ireland - offshore; Jan Mayen - offshore; Norway including Svalbard - offshore; Spain - onshore and offshore.
- bounds: (-6.0, 35.26, 0.01, 80.49)
Coordinate Operation:
- name: UTM zone 30N
- method: Transverse Mercator
Datum: European Terrestrial Reference System 1989 ensemble
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

To create a buffer using geopandas, simply call the buffer method, passing in the radious. For example, to draw a 500m. buffer around every bar in Madrid:

buf = pois_projected.buffer(500)
buf.head()
0    POLYGON ((440085.759 4475244.528, 440083.352 4...
1    POLYGON ((441199.443 4482099.370, 441197.035 4...
2    POLYGON ((440012.154 4473848.877, 440009.747 4...
3    POLYGON ((441631.862 4473439.094, 441629.454 4...
4    POLYGON ((441283.067 4473680.493, 441280.659 4...
dtype: geometry

And plotting it is equally straighforward:

f, ax = plt.subplots(1)
# Plot buffer
buf.plot(ax=ax, linewidth=0)
# Plot named places on top for reference
# [NOTE how we modify the dot size (`markersize`)
# and the color (`color`)]
pois_projected.plot(ax=ax, markersize=1, color='yellow')
<Axes: >
../../_images/612fc03e9379595a84aa3f7a715529631227f0d1aaeca1297aed1fe2d6ef4300.png

Adding base layers from web sources#

Many single datasets lack context when displayed on their own. A common approach to alleviate this is to use web tiles, which are a way of quickly obtaining geographical context to present spatial data. In Python, we can use contextily to pull down tiles and display them along with our own geographic data.

We can begin by creating a map in the same way we would do normally, and then use the add_basemap command to, er, add a basemap:

ax = cities.plot(color="black")
cx.add_basemap(ax, crs=cities.crs);
../../_images/31552869e0d1cf1f9208ab813682e5eb86798382cad24b9630b5eb2731bbcc1c.png

Note that we need to be explicit when adding the basemap to state the coordinate reference system (crs) our data is expressed in, contextily will not be able to pick it up otherwise. Conversely, we could change our data’s CRS into Pseudo-Mercator, the native reference system for most web tiles:

cities_wm = cities.to_crs(epsg=3857)
ax = cities_wm.plot(color="black")
cx.add_basemap(ax);
../../_images/be287defeb152abae8644ba2c75b3e83fcee184a1cd1ba095843fa851df70da5.png

Note how the coordinates are different but, if we set it right, either approach aligns tiles and data nicely.

Web tiles can be integrated with other features of maps in a similar way as we have seen above. So, for example, we can change the size of the map, and remove the axis. Let’s use Madrid for this example:

f, ax = plt.subplots(1, figsize=(9, 9))
mad.plot(alpha=0.25, ax=ax)
cx.add_basemap(ax, crs=mad.crs)
ax.set_axis_off()
../../_images/b21e8b38f02ff31229742bbbe2e7cb3cf013aff7d5bb1f7aa5d50f39d0f191ca.png

Now, contextily offers a lot of options in terms of the sources and providers you can use to create your basemaps. For example, we can use satellite imagery instead:

f, ax = plt.subplots(1, figsize=(9, 9))
mad.plot(alpha=0.25, ax=ax)
cx.add_basemap(
    ax, 
    crs=mad.crs,
    source=cx.providers.Esri.WorldImagery
)
ax.set_axis_off()
../../_images/0c87082a101b6352240f8df2c923b07a456188dd350a9b5e208d1dc0e22d3183.png

Have a look at this Twitter thread to get some further ideas on providers:

And consider checking out the documentation website for the package:

Interactive maps#

Everything we have seen so far relates to static maps. These are useful for publication, to include in reports or to print. However, modern web technologies afford much more flexibility to explore spatial data interactively.

We will use the state-of-the-art Leaflet integration into geopandas. This integration connects GeoDataFrame objects with the popular web mapping library Leaflet.js. In this context, we will only show how you can take a GeoDataFrame into an interactive map in one line of code:

# Display interactive map
streets.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook

Further resources#

More advanced GIS operations are possible in geopandas and, in most cases, they are extensions of the same logic we have used in this document. If you are thinking about taking the next step from here, the following two operations (and the documentation provided) will give you the biggest “bang for the buck”:

  • Spatial joins

  • Spatial overlays