Flows

Exploring flows visually and through spatial interaction

Dani Arribas-Bel

2017-03-15

This session covers spatial interaction flows. Using open data from the city of San Francisco about trips on its bikeshare system, we will estimate spatial interaction models that try to capture and explain the variation in the number of trips on each given route. After visualizing the dataset, we begin with a very simple model and then build complexity progressively by augmenting it with more information, refined measurements, and better modeling approaches. Throughout the note, we explore different ways to grasp the predictive performance of each model. We finish with a prediction example that illustrates how these models can be deployed in a real-world application.

This note is part of Spatial Analysis Notes. Flows – Exploring flows visually and through spatial interaction by Dani Arribas-Bel is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Content is based on the following references, which are great follow-ups on the topic:

This tutorial is part of Spatial Analysis Notes, a compilation hosted as a GitHub repository that you can access in a few ways:

Dependencies

This tutorial relies on the following libraries, which you will need to have installed on your machine to follow along interactively. You can install a package, say mypackage, by running the command install.packages("mypackage") at the R prompt or through the Tools --> Install Packages... menu in RStudio. Once installed, load them up with the following commands:

# Layout
library(tufte)
# Spatial Data management
library(rgdal)
# Pretty graphics
library(ggplot2)
# Thematic maps
library(tmap)
# Pretty maps
library(ggmap)
# Simulation methods
library(arm)
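
If you are unsure which of these packages are already on your machine, a quick check along the following lines can help. This is only a minimal sketch in base R: the names pkgs, installed and missing_pkgs are illustrative, and the install.packages() call is left commented out so that nothing gets installed unless you choose to.

```r
# Packages loaded in the block above
pkgs <- c("tufte", "rgdal", "ggplot2", "tmap", "ggmap", "arm")
# TRUE for each package whose namespace can be loaded on this machine
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
# Names of the packages that still need installing
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)
# Uncomment to install whatever is missing
# install.packages(missing_pkgs)
```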

Before we start any analysis, let us set the path to the directory where we are working. We can easily do that with setwd(). In the following line, replace the path with that of the folder where you have placed this file, which is also where the sf_bikes folder with the data lives.

setwd('.')

Data

In this note, we will use data from the city of San Francisco representing bike trips on its public bike share system. The original source is the SF Open Data portal (link) and the dataset comprises both the location of each station in the Bay Area as well as information on trips (station of origin to station of destination) undertaken in the system from September 2014 to August 2015 and the following year. Since this note is about modeling and not data preparation, a cleanly reshaped version of the data, together with some additional information, has been created and placed in the sf_bikes folder. The data file is named flows.geojson and, in case you are interested, the (Python) code required to create it from the original files in the SF Data Portal is also available in the flows_prep.ipynb notebook [url], in the same folder.

Let us then directly load the file with all the information necessary:

db <- readOGR(dsn='sf_bikes/flows.geojson', layer='OGRGeoJSON')
## OGR data source with driver: GeoJSON 
## Source: "sf_bikes/flows.geojson", layer: "OGRGeoJSON"
## with 1722 features
## It has 9 fields
rownames(db@data) <- db$flow_id
db@data$flow_id <- NULL

Note how the interface is slightly different since we are reading a GeoJSON file instead of a shapefile.
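
The rgdal package has since been retired from CRAN, so readers on a recent R installation may not be able to run readOGR(). As a hedged alternative, and not the approach this note takes, the sf package can read GeoJSON with st_read(). The sketch below is self-contained: it writes a toy two-feature file whose attribute values mirror the first two flows of the dataset, while the coordinates are hypothetical placeholders.

```r
library(sf)

# Toy GeoJSON: attribute values follow the first two flows in the dataset,
# coordinates are hypothetical placeholders (not the real routes)
toy_geojson <- '{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"flow_id": "39-41", "orig": 39, "dest": 41, "trips15": 68},
     "geometry": {"type": "LineString",
                  "coordinates": [[-122.40, 37.79], [-122.41, 37.78]]}},
    {"type": "Feature",
     "properties": {"flow_id": "39-42", "orig": 39, "dest": 42, "trips15": 23},
     "geometry": {"type": "LineString",
                  "coordinates": [[-122.40, 37.79], [-122.42, 37.78]]}}
  ]
}'
tmp <- tempfile(fileext = ".geojson")
writeLines(toy_geojson, tmp)

# st_read() infers the driver from the file, so no layer name is needed
toy <- st_read(tmp, quiet = TRUE)
# Same housekeeping as with readOGR(): flow IDs become the row names
rownames(toy) <- toy$flow_id
toy$flow_id <- NULL
print(toy)
```

With the real file, the equivalent call would be st_read('sf_bikes/flows.geojson'). Bear in mind that an sf object is a data frame with a geometry column, so columns are accessed directly (toy$trips15) rather than through the @data slot used in the rest of this note.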

The data contains the geometries of the flows, as calculated from the Google Maps API, as well as a series of columns with characteristics of each flow:

head(db@data)
##       dest orig straight_dist street_dist total_down total_up trips15
## 39-41   41   39      1452.201   1804.1150  11.205753 4.698162      68
## 39-42   42   39      1734.861   2069.1557  10.290236 2.897886      23
## 39-45   45   39      1255.349   1747.9928  11.015596 4.593927      83
## 39-46   46   39      1323.303   1490.8361   3.511543 5.038044     258
## 39-47   47   39       715.689    769.9189   0.000000 3.282495     127
## 39-48   48   39      1996.778   2740.1290  11.375186 3.841296      81
##       trips16
## 39-41      68
## 39-42      29
## 39-45      50
## 39-46     163
## 39-47      73
## 39-48      56

where orig and dest are the station IDs of the origin and destination, street/straight_dist is the distance in metres between stations measured along the street network or as-the-crow-flies, total_down/up is the total descent and climb over the trip, and tripsXX contains the number of trips undertaken in each year of study.
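
To get a feel for what these columns encode, it can help to derive a few simple quantities from them. The sketch below rebuilds the six rows printed above as a plain data frame (called flows here, a name chosen for illustration) and computes three derived columns that are not part of the original dataset: detour, net_climb and change.

```r
# The six rows shown by head(db@data), typed in as a plain data frame
flows <- data.frame(
  orig = 39,
  dest = c(41, 42, 45, 46, 47, 48),
  straight_dist = c(1452.201, 1734.861, 1255.349, 1323.303, 715.689, 1996.778),
  street_dist = c(1804.1150, 2069.1557, 1747.9928, 1490.8361, 769.9189, 2740.1290),
  total_down = c(11.205753, 10.290236, 11.015596, 3.511543, 0.000000, 11.375186),
  total_up = c(4.698162, 2.897886, 4.593927, 5.038044, 3.282495, 3.841296),
  trips15 = c(68, 23, 83, 258, 127, 81),
  trips16 = c(68, 29, 50, 163, 73, 56),
  row.names = c("39-41", "39-42", "39-45", "39-46", "39-47", "39-48")
)
# How much longer the street route is than the straight line (1 = no detour)
flows$detour <- flows$street_dist / flows$straight_dist
# Climb minus descent: positive values mean the trip is uphill on balance
flows$net_climb <- flows$total_up - flows$total_down
# Change in the number of trips between the two years
flows$change <- flows$trips16 - flows$trips15
print(round(flows[, c("detour", "net_climb", "change")], 2))
```

On these six rows every detour ratio is above 1, as expected, since a route along the street network cannot be shorter than the straight line between the same two stations.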

“Seeing” flows

The easiest way to get a quick preview of what the data looks like spatially is to make a simple plot:

Potential routes

plot(db)

Equally, if we want to visualize a single route, we can simply subset the table. For example, to get the shape of the trip from station 39 to station 48, we can:

Trip from station 39 to 48

one39to48 <- db[ which(
          db@data$orig == 39 & db@data$dest == 48
          ) , ]
plot(one39to48)

or, for the most popular route, we can:

Most popular trip

most_pop <- db[ which(
          db@data$trips15 == max(db@data$trips15)
          ) , ]
plot(most_pop)
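
The same selection logic works on any data frame, which makes it easy to try out without the spatial object. The following is a small sketch on the six flows printed earlier (the data frame name flows is illustrative): it repeats the two selections above and adds a ranking with order(), which the note itself does not use.

```r
# Six flows from the head() output, keeping only the columns needed here
flows <- data.frame(
  orig = 39,
  dest = c(41, 42, 45, 46, 47, 48),
  trips15 = c(68, 23, 83, 258, 127, 81),
  row.names = c("39-41", "39-42", "39-45", "39-46", "39-47", "39-48")
)
# A single route: which() turns the logical test into row positions
# and silently drops any NA, which plain logical indexing would not
one <- flows[which(flows$orig == 39 & flows$dest == 48), ]
# The busiest route: which.max() returns the first maximum only,
# whereas the == max() test above keeps every tied row
top <- flows[which.max(flows$trips15), ]
# All routes ranked from most to least popular
ranked <- flows[order(flows$trips15, decreasing = TRUE), ]
print(head(ranked, 3))
```

Among these six rows, the busiest route in 2015 is 39-46, with 258 trips.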

These plots, however, do not reveal much: there is no geographical context (why are there so many routes in the NE?) and no sense of how volumes of bikers are allocated along different routes. Let us fix those two.

The easiest way to bring in geographical context is by overlaying the routes on top of a background map of tiles downloaded from the internet. Let us download this using ggmap:

sf_bb <- c(left=db@bbox['x', 'min'],
           right=db@bbox['x', 'max'],
           bottom=db@bbox['y', 'min'],
           top=db@bbox['y', 'max'])
SanFran <- get_stamenmap(sf_bb, 
                         zoom = 14, 
                         maptype = "toner-lite")
## Source : http://tile.stamen.com/toner-lite/14/2620/6330.png
## Source : http://tile.stamen.com/toner-lite/14/2621/6330.png
## Source : http://tile.stamen.com/toner-lite/14/2622/6330.png
## Source : http://tile.stamen.com/toner-lite/14/2620/6331.png
## Source : http://tile.stamen.com/toner-lite/14/2621/6331.png
## Source : http://tile.stamen.com/toner-lite/14/2622/6331.png
## Source : http://tile.stamen.com/toner-lite/14/2620/6332.png
## Source : http://tile.stamen.com/toner-lite/14/2621/6332.png
## Source : http://tile.stamen.com/toner-lite/14/2622/6332.png
## Source : http://tile.stamen.com/toner-lite/14/2620/6333.png
## Source : http://tile.stamen.com/toner-lite/14/2621/6333.png
## Source : http://tile.stamen.com/toner-lite/14/2622/6333.png

and make sure it looks like we intend it to look: