This session covers spatial interaction flows. Using open data from the city of San Francisco about trips on its bikeshare system, we will estimate spatial interaction models that try to capture and explain the variation in the number of trips on each given route. After visualizing the dataset, we begin with a very simple model and then build complexity progressively by augmenting it with more information, refined measurements, and better modeling approaches. Throughout the note, we explore different ways to grasp the predictive performance of each model. We finish with a prediction example that illustrates how these models can be deployed in a real-world application.
Note: This note is part of Spatial Analysis Notes. "Flows – Exploring flows visually and through spatial interaction" by Dani Arribas-Bel is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Content is based on the following references, which are great follow-ups on the topic:
This tutorial is part of Spatial Analysis Notes, a compilation hosted as a GitHub repository that you can access in a few ways:
As a .zip file that contains all the materials.
This tutorial relies on the following libraries, which you will need to have installed on your machine to follow along interactively. (You can install a package, say mypackage, by running the command install.packages("mypackage") on the R prompt or through the Tools --> Install Packages... menu in RStudio.) Once installed, load them up with the following commands:
# Layout
library(tufte)
# Spatial Data management
library(rgdal)
# Pretty graphics
library(ggplot2)
# Thematic maps
library(tmap)
# Pretty maps
library(ggmap)
# Simulation methods
library(arm)
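If any of the library() calls above fails, the most likely cause is that the package is not installed. The following is a minimal sketch, using only base R, that reports which of the packages listed above are missing. The variable names needed, is_installed and missing_pkgs are illustrative and not part of the original materials.

```r
# Packages loaded in this note
needed <- c("tufte", "rgdal", "ggplot2", "tmap", "ggmap", "arm")

# requireNamespace() returns FALSE, instead of raising an error,
# when a package cannot be loaded
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are available.")
}
```

The sketch only reports; it deliberately does not install anything, so you stay in control of what gets added to your library.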
Before we start any analysis, let us set the path to the directory where we are working. We can easily do that with setwd(). In the following line, replace the path with that of the folder where you have placed this file and where the sf_bikes folder with the data lives.
setwd('.')
In this note, we will use data from the city of San Francisco representing bike trips on its public bike share system. The original source is the SF Open Data portal, and the dataset comprises both the location of each station in the Bay Area and information on trips (station of origin to station of destination) undertaken in the system from September 2014 to August 2015 and the following year. Since this note is about modeling and not data preparation, a cleanly reshaped version of the data, together with some additional information, has been created and placed in the sf_bikes folder. The data file is named flows.geojson and, in case you are interested, the (Python) code required to create it from the original files in the SF Data Portal is available in the flows_prep.ipynb notebook, also in the same folder.
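Before loading, it can be worth checking that the working directory is set as intended. A minimal sketch, assuming the folder layout described above (data_path is an illustrative variable name):

```r
# Path to the data file, relative to the working directory
data_path <- file.path("sf_bikes", "flows.geojson")

if (file.exists(data_path)) {
  message("Found ", data_path)
} else {
  # Not an error: just a hint about where R is currently looking
  message("Cannot find ", data_path, " from ", getwd())
}
```

If the file is not found, adjust the path passed to setwd() until it is.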
Let us then directly load the file with all the information necessary:
db <- readOGR(dsn='sf_bikes/flows.geojson', layer='OGRGeoJSON')
## OGR data source with driver: GeoJSON
## Source: "sf_bikes/flows.geojson", layer: "OGRGeoJSON"
## with 1722 features
## It has 9 fields
rownames(db@data) <- db$flow_id
db@data$flow_id <- NULL
Note how the interface is slightly different since we are reading a GeoJSON file instead of a shapefile.
The data contains the geometries of the flows, as calculated from the Google Maps API, as well as a series of columns with characteristics of each flow:
head(db@data)
##       dest orig straight_dist street_dist total_down total_up trips15 trips16
## 39-41   41   39      1452.201   1804.1150  11.205753 4.698162      68      68
## 39-42   42   39      1734.861   2069.1557  10.290236 2.897886      23      29
## 39-45   45   39      1255.349   1747.9928  11.015596 4.593927      83      50
## 39-46   46   39      1323.303   1490.8361   3.511543 5.038044     258     163
## 39-47   47   39       715.689    769.9189   0.000000 3.282495     127      73
## 39-48   48   39      1996.778   2740.1290  11.375186 3.841296      81      56
where orig and dest are the station IDs of the origin and destination, street_dist and straight_dist are the distances in metres between stations measured along the street network and as the crow flies, respectively, total_down and total_up are the total descent and climb in the trip, and tripsXX contains the number of trips undertaken in each of the years of study.
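To get a feel for how these columns relate to each other, here is a small sketch that retypes the six rows printed above into a plain data frame, so it runs without the data file. The derived columns detour and trip_change are illustrative names introduced here; they are not part of the dataset.

```r
# The six rows shown by head(db@data), typed in by hand
toy <- data.frame(
  orig          = rep(39, 6),
  dest          = c(41, 42, 45, 46, 47, 48),
  straight_dist = c(1452.201, 1734.861, 1255.349, 1323.303, 715.689, 1996.778),
  street_dist   = c(1804.1150, 2069.1557, 1747.9928, 1490.8361, 769.9189, 2740.1290),
  trips15       = c(68, 23, 83, 258, 127, 81),
  trips16       = c(68, 29, 50, 163, 73, 56),
  row.names     = c("39-41", "39-42", "39-45", "39-46", "39-47", "39-48")
)

# Detour ratio (illustrative): how much longer the street route is
# than the straight line between the two stations
toy$detour <- toy$street_dist / toy$straight_dist

# Change in trips between the two periods (illustrative)
toy$trip_change <- toy$trips16 - toy$trips15

print(round(toy[, c("detour", "trip_change")], 2))
```

In these six rows the street distance is always at least as long as the straight-line distance, which is what we would expect if both are measured between the same pair of stations.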
The easiest way to get a quick preview of what the data looks like spatially is to make a simple plot:
Potential routes
plot(db)
Equally, if we want to visualize a single route, we can simply subset the table. For example, to get the shape of the trip from station 39 to station 48, we can:
Trip from station 39 to 48
one39to48 <- db[ which(
    db@data$orig == 39 & db@data$dest == 48
) , ]
plot(one39to48)
or, for the most popular route, we can:
Most popular trip
most_pop <- db[ which(
    db@data$trips15 == max(db@data$trips15)
) , ]
plot(most_pop)
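As an aside, the row selection used in the two snippets above can be tried on a plain data frame. The sketch below uses a hypothetical stand-in for db@data, called flows, built from three of the rows printed earlier. When the same indexing is applied to db itself, the geometries should be carried along with the selected rows.

```r
# Hypothetical stand-in for db@data, using three rows from the output of head()
flows <- data.frame(
  orig      = c(39, 39, 39),
  dest      = c(46, 47, 48),
  trips15   = c(258, 127, 81),
  row.names = c("39-46", "39-47", "39-48")
)

# Logical condition -> row positions -> subset, as in the route from 39 to 48
one_route <- flows[which(flows$orig == 39 & flows$dest == 48), ]

# which(x == max(x)) keeps every row tied for the maximum ...
busiest_all <- flows[which(flows$trips15 == max(flows$trips15)), ]
# ... whereas which.max(x) keeps only the first of them
busiest_first <- flows[which.max(flows$trips15), ]

print(one_route)
print(busiest_all)
```

The distinction between the two approaches only matters when several routes share the maximum number of trips, in which case the first form returns all of them.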
These plots, however, do not reveal much: there is no geographical context (why are there so many routes in the NE?) and no sense of how volumes of bikers are allocated across different routes. Let us fix both.
The easiest way to bring in geographical context is by overlaying the routes on top of a background map of tiles downloaded from the internet. Let us download this using ggmap:
sf_bb <- c(left=db@bbox['x', 'min'],
           right=db@bbox['x', 'max'],
           bottom=db@bbox['y', 'min'],
           top=db@bbox['y', 'max'])
SanFran <- get_stamenmap(sf_bb,
                         zoom = 14,
                         maptype = "toner-lite")
## Source : http://tile.stamen.com/toner-lite/14/2620/6330.png
## Source : http://tile.stamen.com/toner-lite/14/2621/6330.png
## Source : http://tile.stamen.com/toner-lite/14/2622/6330.png
## Source : http://tile.stamen.com/toner-lite/14/2620/6331.png
## Source : http://tile.stamen.com/toner-lite/14/2621/6331.png
## Source : http://tile.stamen.com/toner-lite/14/2622/6331.png
## Source : http://tile.stamen.com/toner-lite/14/2620/6332.png
## Source : http://tile.stamen.com/toner-lite/14/2621/6332.png
## Source : http://tile.stamen.com/toner-lite/14/2622/6332.png
## Source : http://tile.stamen.com/toner-lite/14/2620/6333.png
## Source : http://tile.stamen.com/toner-lite/14/2621/6333.png
## Source : http://tile.stamen.com/toner-lite/14/2622/6333.png
and make sure it looks like we intend it to look: