Assessment

This course is assessed through four components, each with a different weight.

Teams contribution (5%)

  • Type: Coursework

  • Continuous assessment

  • 5% of the final mark

  • Electronic submission only.

Students are encouraged to contribute to the online discussion forum set up for the module. Contribution is assessed as an all-or-nothing 5% of the mark, obtained by contributing meaningfully to the forum before the end of the first month of the course. Meaningful contributions include both questions and answers that demonstrate the student is committed to making the forum a more useful resource for the rest of the group.

Test I (20%)

Information provided in labs.

Test II (25%)

Information provided in labs.

Computational essay (50%)

Here’s the premise. You will take the role of a real-world data scientist tasked with exploring a dataset on the city of Toronto (Canada) and finding useful insights for a variety of decision-makers. It does not matter if you have never been to Toronto. In fact, this will help you focus on what you can learn about the city through the data, without the influence of prior knowledge. Furthermore, the assessment will not be marked on how much you know about Toronto but on how much you can show you have learned through analysing data.

A computational essay is an essay whose narrative is supported by code and computational results that are included in the essay itself. This piece of assessment is equivalent to 2,500 words. However, this is the overall weight. Since you will need to create not only English narrative but also code and figures, here are the requirements:

  • Maximum of 750 words (bibliography, if included, does not contribute to the word count)

  • Up to three maps or figures (a figure may include more than one map and still count as one, provided all its maps are integrated in the same matplotlib figure)

  • Up to one table
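As an illustration of what "integrated in the same matplotlib figure" can look like, here is a minimal sketch that places two panels side by side in one figure. The scatter points and titles are placeholders, not course data; in the essay, each panel would more likely be drawn by passing the axis to a GeoDataFrame plot call (e.g. `neis.plot(column=..., ax=axs[0])`).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# One figure holding two axes: under the rules above this counts as one figure
fig, axs = plt.subplots(1, 2, figsize=(12, 6))

# Placeholder content standing in for two maps
axs[0].scatter([0, 1, 2], [0, 1, 0])
axs[0].set_title("Map A (placeholder)")
axs[1].scatter([0, 1, 2], [1, 0, 1])
axs[1].set_title("Map B (placeholder)")

# Maps rarely need axis ticks, so hide them on both panels
for ax in axs:
    ax.set_axis_off()
```

Two separate calls to `plt.subplots`, by contrast, would create two figures and count twice towards the limit.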

The assignment relies on two datasets provided below, and has two parts. Each of these is explained in more detail below.

Data

To complete the assignment, the following two datasets are provided. Below we show how you can download them and what they contain.

import geopandas
import pandas

  1. Socio-economic characteristics of Toronto neighbourhoods

This dataset contains a set of polygons representing the official neighbourhoods, as well as socio-economic information attached to each neighbourhood.

You can read the main file by running:

neis = geopandas.read_file("https://darribas.org/gds_course/_downloads/a2bdb4c2a088e602c3bd6490ab1d26fa/toronto_socio-economic.gpkg")
neis.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 140 entries, 0 to 139
Data columns (total 24 columns):
 #   Column                Non-Null Count  Dtype   
---  ------                --------------  -----   
 0   _id                   140 non-null    int64   
 1   AREA_NAME             140 non-null    object  
 2   Shape__Area           140 non-null    float64 
 3   neighbourhood_name    140 non-null    object  
 4   population2016        140 non-null    float64 
 5   population_sqkm       140 non-null    float64 
 6   pop_0-14_yearsold     140 non-null    float64 
 7   pop_15-24_yearsold    140 non-null    float64 
 8   pop_25-54_yearsold    140 non-null    float64 
 9   pop_55-64_yearsold    140 non-null    float64 
 10  pop_65+_yearsold      140 non-null    float64 
 11  pop_85+_yearsold      140 non-null    float64 
 12  hh_median_income2015  140 non-null    float64 
 13  canadian_citizens     140 non-null    float64 
 14  deg_bachelor          140 non-null    float64 
 15  deg_medics            140 non-null    float64 
 16  deg_phd               140 non-null    float64 
 17  employed              140 non-null    float64 
 18  bedrooms_0            140 non-null    float64 
 19  bedrooms_1            140 non-null    float64 
 20  bedrooms_2            140 non-null    float64 
 21  bedrooms_3            140 non-null    float64 
 22  bedrooms_4+           140 non-null    float64 
 23  geometry              140 non-null    geometry
dtypes: float64(20), geometry(1), int64(1), object(2)
memory usage: 26.4+ KB

You can find more information on each of the socio-economic variables in the variable list file:

pandas.read_csv("https://darribas.org/gds_course/_downloads/8944151f1b7df7b1f38b79b7a73eb2d0/toronto_socio-economic_vars.csv")
     _id  name                  Category                     Topic                                   Data Source                     Characteristic
 0     3  population2016        Population                   Population and dwellings                Census Profile 98-316-X2016001  Population, 2016
 1     8  population_sqkm       Population                   Population and dwellings                Census Profile 98-316-X2016001  Population density per square kilometre
 2    10  pop_0-14_yearsold     Population                   Age characteristics                     Census Profile 98-316-X2016001  Children (0-14 years)
 3    11  pop_15-24_yearsold    Population                   Age characteristics                     Census Profile 98-316-X2016001  Youth (15-24 years)
 4    12  pop_25-54_yearsold    Population                   Age characteristics                     Census Profile 98-316-X2016001  Working Age (25-54 years)
 5    13  pop_55-64_yearsold    Population                   Age characteristics                     Census Profile 98-316-X2016001  Pre-retirement (55-64 years)
 6    14  pop_65+_yearsold      Population                   Age characteristics                     Census Profile 98-316-X2016001  Seniors (65+ years)
 7    15  pop_85+_yearsold      Population                   Age characteristics                     Census Profile 98-316-X2016001  Older Seniors (85+ years)
 8  1018  hh_median_income2015  Income                       Income of households in 2015            Census Profile 98-316-X2016001  Total - Income statistics in 2015 for private ...
 9  1149  canadian_citizens     Immigration and citizenship  Citizenship                             Census Profile 98-316-X2016001  Canadian citizens aged 18 and over
10  1711  deg_bachelor          Education                    Highest certificate, diploma or degree  Census Profile 98-316-X2016001  Bachelor's degree
11  1713  deg_medics            Education                    Highest certificate, diploma or degree  Census Profile 98-316-X2016001  Degree in medicine, dentistry, veterinar...
12  1714  deg_phd               Education                    Highest certificate, diploma or degree  Census Profile 98-316-X2016001  Earned doctorate
13  1887  employed              Labour                       Labour force status                     Census Profile 98-316-X2016001  Employed
14  1636  bedrooms_0            Housing                      Household characteristics               Census Profile 98-316-X2016001  No bedrooms
15  1637  bedrooms_1            Housing                      Household characteristics               Census Profile 98-316-X2016001  1 bedroom
16  1638  bedrooms_2            Housing                      Household characteristics               Census Profile 98-316-X2016001  2 bedrooms
17  1639  bedrooms_3            Housing                      Household characteristics               Census Profile 98-316-X2016001  3 bedrooms
18  1641  bedrooms_4+           Housing                      Household characteristics               Census Profile 98-316-X2016001  4 or more bedrooms

  2. Flickr photographs sample

This dataset is similar to the Tokyo photographs we use in Block H, but for the city of Toronto. It is a subsample of the 100 million Yahoo dataset that contains the location of photographs contributed to the Flickr service by its users. You can read it with:

photos = pandas.read_csv("https://darribas.org/gds_course/_downloads/fc771c3b1b9e0ee00e875bb2d293adcd/toronto_flickr_subset.csv")
photos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 11 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    2000 non-null   int64  
 1   user_id               2000 non-null   object 
 2   user_nickname         2000 non-null   object 
 3   date_taken            2000 non-null   object 
 4   date_uploaded         2000 non-null   int64  
 5   title                 1932 non-null   object 
 6   longitude             2000 non-null   float64
 7   latitude              2000 non-null   float64
 8   accuracy_coordinates  2000 non-null   float64
 9   page_url              2000 non-null   object 
 10  video_url             2000 non-null   object 
dtypes: float64(3), int64(2), object(6)
memory usage: 172.0+ KB

IMPORTANT - Students of ENVS563 will need to source at least two additional datasets relating to Toronto. You can use any dataset that will help you complete the tasks below but, if you need some inspiration, have a look at the Toronto Open Data Portal.

Part I - Common

This is the part everyone has to do in the same way. Complete the following tasks:

  1. Select two variables from the socio-economic dataset

  2. Explore the spatial distribution of the data using choropleths. Comment on the details of your maps and interpret the results

  3. Explore the degree of spatial autocorrelation. Describe the concepts behind your approach and interpret your results
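To make the concept behind task 3 concrete, the sketch below computes global Moran's I by hand on a toy example. Everything in it is an illustrative assumption: the function name `morans_i`, the four made-up areas arranged in a row, and the use of binary (unstandardised) contiguity weights. For the essay itself, a library implementation such as `esda.Moran` from PySAL would be the more usual route.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.

    `neighbours` maps each observation index to the indices of its neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    # Sum of cross-products between each observation and its neighbours
    cross = sum(z[i] * z[j] for i, js in neighbours.items() for j in js)
    s0 = sum(len(js) for js in neighbours.values())  # total weight
    return (n / s0) * cross / sum(zi ** 2 for zi in z)


# Toy example: four areas in a row, each touching the next one
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1, 2, 3, 4], chain)    # similar values sit together
alternating = morans_i([1, 4, 1, 4], chain)  # neighbours are dissimilar
print(round(clustered, 3), round(alternating, 3))  # 0.333 -1.0
```

A positive value indicates that similar values tend to be neighbours, and a negative value that neighbours tend to differ. The statistic on its own says nothing about significance, which is normally assessed separately (for example through random permutations).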

Part II - Choose your own adventure

For this one, you need to pick one of the following three options. Only one, and make the most of it.

  1. Create a geodemographic classification and interpret the results. In the process, answer the following questions:

    • What are the main types of neighbourhoods you identify?

    • Which characteristics help you delineate this typology?

    • If you had to use this classification to target areas in most need, how would you use it? Why?

  2. Create a regionalisation and interpret the results. In the process, answer at least the following questions:

    • How is the city partitioned by your data?

    • What do you learn about the geography of the city from the regionalisation?

    • What would be one useful application of this regionalisation in the context of urban policy?

  3. Using the photographs, complete the following tasks:

    • Visualise the dataset appropriately and discuss why you have taken your specific approach

    • Use DBSCAN to identify areas of the city with high density of photographs, which we will call areas of interest (AOI). In completing this, answer the following questions:

      • What parameters have you used to run DBSCAN? Why?

      • What do the clusters help you learn about areas of interest in the city?

      • Name one example of how these AOIs can be of use for the city. You can take the perspective of an urban planner, a policy maker, an operational practitioner (e.g. police, trash collection), an urban entrepreneur, or any other role you envision.
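To show what the two DBSCAN parameters control, here is a minimal pure-Python sketch of the algorithm run on made-up points. The function name `dbscan`, the toy coordinates and the parameter values are illustrative assumptions, not a recommended configuration; for the assignment, a library implementation such as `sklearn.cluster.DBSCAN` would be the usual choice.

```python
from math import hypot


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Neighbourhood of each point: every point within eps, itself included
    neighbours = [
        [j for j in range(n)
         if hypot(points[i][0] - points[j][0], points[i][1] - points[j][1]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_samples points in its neighbourhood
    is_core = [len(nb) >= min_samples for nb in neighbours]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outwards from this unlabelled core point
        labels[i] = cluster
        queue = [i]
        while queue:
            p = queue.pop()
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if is_core[q]:  # only core points keep expanding
                        queue.append(q)
        cluster += 1
    return labels


# Two tight groups of four points plus one isolated point
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]

labels = dbscan(points, eps=1.5, min_samples=3)
too_strict = dbscan(points, eps=0.5, min_samples=3)
print(labels)      # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(too_strict)  # [-1, -1, -1, -1, -1, -1, -1, -1, -1]
```

With the larger radius, both groups become clusters and the isolated point is left as noise; with the smaller radius, no point has enough neighbours and everything is noise. Note that the radius is measured in the units of the coordinates, so longitude and latitude in degrees would generally need projecting to a metric coordinate system before a distance in metres is meaningful.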

Marking criteria

This course follows the standard marking criteria (the general ones and those relating to GIS assignments in particular) set by the School of Environmental Sciences. In addition to these generic criteria, the following specific criteria relating to the code provided will be used:

  • 0-15: the code does not run and there is no documentation to follow it.

  • 16-39: the code does not run, or runs but does not produce the expected outcome. There is some documentation explaining its logic.

  • 40-49: the code runs and produces the expected output. There is some documentation explaining its logic.

  • 50-59: the code runs and produces the expected output. There is extensive documentation explaining its logic.

  • 60-69: the code runs and produces the expected output. There is extensive documentation, properly formatted, explaining its logic.

  • 70-79: all as above, plus the code design includes clear evidence of skills presented in advanced sections of the course (e.g. custom methods, list comprehensions, etc.).

  • 80-100: all as above, plus the code contains novel contributions that extend/improve the functionality the student was provided with (e.g. algorithm optimisations, novel methods to perform the task, etc.).