Assessment#
This course is assessed through four components, each with different weight.
Teams contribution (5%)#
Type:
Coursework
Continuous assessment
5% of the final mark
Electronic submission only.
Students are encouraged to contribute to the online discussion forum set up for the module. This contribution is assessed as an all-or-nothing 5% of the mark, obtained by contributing meaningfully to the forum before the end of the first month of the course. Meaningful contributions include both questions and answers that demonstrate the student is committed to making the forum a more useful resource for the rest of the group.
Test I (20%)#
Information provided in the labs.
Test II (25%)#
Information provided in the labs.
Computational essay (50%)#
Here’s the premise. You will take the role of a real-world data scientist tasked with exploring a dataset on the city of Toronto (Canada) and finding useful insights for a variety of decision-makers. It does not matter if you have never been to Toronto. In fact, this will help you focus on what you can learn about the city through the data, without the influence of prior knowledge. Furthermore, the assessment will not be marked on how much you know about Toronto but on how much you can show you have learned through analysing data.
A computational essay is an essay whose narrative is supported by code and computational results that are included in the essay itself. This piece of assessment is equivalent to 2,500 words. However, this is the overall weight. Since you will need to create not only English narrative but also code and figures, here are the requirements:
Maximum of 750 words (bibliography, if included, does not contribute to the word count)
Up to three maps or figures (a figure may include more than one map and will only count as one, but the maps need to be integrated in the same matplotlib figure)
Up to one table
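As an illustration of what "integrated in the same matplotlib figure" can look like, here is a minimal sketch that places two maps side by side in a single figure. The data are synthetic stand-ins (random points and values invented for this example), not the assignment data. With a GeoDataFrame such as `neis`, the `ax.scatter(...)` call would typically be replaced by something like `neis.plot(column="...", ax=ax)`.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (an assumption for illustration only):
# 50 random locations with two made-up variables attached
rng = np.random.default_rng(0)
x, y = rng.uniform(size=(2, 50))
var_a = rng.normal(size=50)
var_b = rng.normal(size=50)

# One figure, two axes: both maps live in the same matplotlib figure
fig, axs = plt.subplots(1, 2, figsize=(12, 6))
for ax, values, title in zip(axs, (var_a, var_b), ("Variable A", "Variable B")):
    # With a GeoDataFrame, a call like gdf.plot(column=..., ax=ax) would go here
    ax.scatter(x, y, c=values, cmap="viridis")
    ax.set_title(title)
    ax.set_axis_off()
fig.suptitle("Two maps, one figure")
```

Because both panels share one `fig` object, saving or displaying `fig` produces a single figure that contains both maps.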
The assignment relies on the two datasets provided below and has two parts. Each of these pieces is explained in more detail below.
Data#
To complete the assignment, the following two datasets are provided. Below we show how you can download them and what they contain.
import geopandas, pandas
Socio-economic characteristics of Toronto neighbourhoods
This dataset contains a set of polygons representing the official neighbourhoods, as well as socio-economic information attached to each neighbourhood.
You can read the main file by running:
neis = geopandas.read_file("https://darribas.org/gds_course/_downloads/a2bdb4c2a088e602c3bd6490ab1d26fa/toronto_socio-economic.gpkg")
neis.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 140 entries, 0 to 139
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 _id 140 non-null int64
1 AREA_NAME 140 non-null object
2 Shape__Area 140 non-null float64
3 neighbourhood_name 140 non-null object
4 population2016 140 non-null float64
5 population_sqkm 140 non-null float64
6 pop_0-14_yearsold 140 non-null float64
7 pop_15-24_yearsold 140 non-null float64
8 pop_25-54_yearsold 140 non-null float64
9 pop_55-64_yearsold 140 non-null float64
10 pop_65+_yearsold 140 non-null float64
11 pop_85+_yearsold 140 non-null float64
12 hh_median_income2015 140 non-null float64
13 canadian_citizens 140 non-null float64
14 deg_bachelor 140 non-null float64
15 deg_medics 140 non-null float64
16 deg_phd 140 non-null float64
17 employed 140 non-null float64
18 bedrooms_0 140 non-null float64
19 bedrooms_1 140 non-null float64
20 bedrooms_2 140 non-null float64
21 bedrooms_3 140 non-null float64
22 bedrooms_4+ 140 non-null float64
23 geometry 140 non-null geometry
dtypes: float64(20), geometry(1), int64(1), object(2)
memory usage: 26.4+ KB
You can find more information on each of the socio-economic variables in the variable list file:
pandas.read_csv("https://darribas.org/gds_course/_downloads/8944151f1b7df7b1f38b79b7a73eb2d0/toronto_socio-economic_vars.csv")
| | _id | name | Category | Topic | Data Source | Characteristic |
---|---|---|---|---|---|---|
0 | 3 | population2016 | Population | Population and dwellings | Census Profile 98-316-X2016001 | Population, 2016 |
1 | 8 | population_sqkm | Population | Population and dwellings | Census Profile 98-316-X2016001 | Population density per square kilometre |
2 | 10 | pop_0-14_yearsold | Population | Age characteristics | Census Profile 98-316-X2016001 | Children (0-14 years) |
3 | 11 | pop_15-24_yearsold | Population | Age characteristics | Census Profile 98-316-X2016001 | Youth (15-24 years) |
4 | 12 | pop_25-54_yearsold | Population | Age characteristics | Census Profile 98-316-X2016001 | Working Age (25-54 years) |
5 | 13 | pop_55-64_yearsold | Population | Age characteristics | Census Profile 98-316-X2016001 | Pre-retirement (55-64 years) |
6 | 14 | pop_65+_yearsold | Population | Age characteristics | Census Profile 98-316-X2016001 | Seniors (65+ years) |
7 | 15 | pop_85+_yearsold | Population | Age characteristics | Census Profile 98-316-X2016001 | Older Seniors (85+ years) |
8 | 1018 | hh_median_income2015 | Income | Income of households in 2015 | Census Profile 98-316-X2016001 | Total - Income statistics in 2015 for private ... |
9 | 1149 | canadian_citizens | Immigration and citizenship | Citizenship | Census Profile 98-316-X2016001 | Canadian citizens aged 18 and over |
10 | 1711 | deg_bachelor | Education | Highest certificate, diploma or degree | Census Profile 98-316-X2016001 | Bachelor's degree |
11 | 1713 | deg_medics | Education | Highest certificate, diploma or degree | Census Profile 98-316-X2016001 | Degree in medicine, dentistry, veterinar... |
12 | 1714 | deg_phd | Education | Highest certificate, diploma or degree | Census Profile 98-316-X2016001 | Earned doctorate |
13 | 1887 | employed | Labour | Labour force status | Census Profile 98-316-X2016001 | Employed |
14 | 1636 | bedrooms_0 | Housing | Household characteristics | Census Profile 98-316-X2016001 | No bedrooms |
15 | 1637 | bedrooms_1 | Housing | Household characteristics | Census Profile 98-316-X2016001 | 1 bedroom |
16 | 1638 | bedrooms_2 | Housing | Household characteristics | Census Profile 98-316-X2016001 | 2 bedrooms |
17 | 1639 | bedrooms_3 | Housing | Household characteristics | Census Profile 98-316-X2016001 | 3 bedrooms |
18 | 1641 | bedrooms_4+ | Housing | Household characteristics | Census Profile 98-316-X2016001 | 4 or more bedrooms |
Flickr photographs sample
This is a similar dataset to the Tokyo photographs we use in Block H but for the city of Toronto. It is a subsample of the 100 million Yahoo dataset that contains the location of photographs contributed to the Flickr service by its users. You can read it with:
photos = pandas.read_csv("https://darribas.org/gds_course/_downloads/fc771c3b1b9e0ee00e875bb2d293adcd/toronto_flickr_subset.csv")
photos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 2000 non-null int64
1 user_id 2000 non-null object
2 user_nickname 2000 non-null object
3 date_taken 2000 non-null object
4 date_uploaded 2000 non-null int64
5 title 1932 non-null object
6 longitude 2000 non-null float64
7 latitude 2000 non-null float64
8 accuracy_coordinates 2000 non-null float64
9 page_url 2000 non-null object
10 video_url 2000 non-null object
dtypes: float64(3), int64(2), object(6)
memory usage: 172.0+ KB
IMPORTANT - Students of ENVS563 will need to source at least two additional datasets relating to Toronto. You can use any dataset that helps you complete the tasks below but, if you need some inspiration, have a look at the Toronto Open Data Portal.
Part I - Common#
This is the one everyone has to do in the same way. Complete the following tasks:
Select two variables from the socio-economic dataset
Explore the spatial distribution of the data using choropleths. Comment on the details of your maps and interpret the results
Explore the degree of spatial autocorrelation. Describe the concepts behind your approach and interpret your results
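To make the idea of spatial autocorrelation concrete, the sketch below computes global Moran's I by hand with NumPy on a toy 4x4 grid. Everything here is an assumption for illustration: the grid, the rook-contiguity weights, and the helper names `rook_weights` and `morans_i` are hypothetical and not part of the course materials. In practice a dedicated spatial analysis library would normally build the weights from the neighbourhood polygons and also provide inference.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Row-standardised rook-contiguity weights for a regular grid (hypothetical helper)."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Neighbours share an edge: up, down, left, right
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i, rr * n_cols + cc] = 1.0
    # Row-standardise so each row of weights sums to one
    return w / w.sum(axis=1, keepdims=True)

def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the mean-centred values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)

w = rook_weights(4, 4)
gradient = np.tile(np.arange(4), 4)                          # values rise from west to east
checkerboard = np.indices((4, 4)).sum(axis=0).ravel() % 2    # alternating 0/1 pattern

i_gradient = morans_i(gradient, w)          # similar values cluster: positive I
i_checkerboard = morans_i(checkerboard, w)  # neighbours always differ: negative I
```

The smooth gradient yields a positive statistic (similar values sit next to each other), while the checkerboard yields a strongly negative one (every neighbour is dissimilar). Interpreting the sign and magnitude in this way is the core of the task above.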
Part II - Choose your own adventure#
For this one, you need to pick one of the following three options. Only one, and make the most of it.
Create a geodemographic classification and interpret the results. In the process, answer the following questions:
What are the main types of neighbourhoods you identify?
Which characteristics help you delineate this typology?
If you had to use this classification to target areas in most need, how would you use it? Why?
Create a regionalisation and interpret the results. In the process, answer at least the following questions:
How is the city partitioned by your data?
What do you learn about the geography of the city from the regionalisation?
What would be one useful application of this regionalisation in the context of urban policy?
Using the photographs, complete the following tasks:
Visualise the dataset appropriately and discuss why you have taken your specific approach
Use DBSCAN to identify areas of the city with high density of photographs, which we will call areas of interest (AOI). In completing this, answer the following questions:
What parameters have you used to run DBSCAN? Why?
What do the clusters help you learn about areas of interest in the city?
Name one example of how these AOIs can be of use for the city. You can take the perspective of an urban planner, a policy maker, an operational practitioner (e.g. police, trash collection), an urban entrepreneur, or any other role you envision.
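For the photographs option, the role of the two main DBSCAN parameters can be sketched on synthetic points. This is a minimal example under stated assumptions: the coordinates are made up and treated as projected (in metres), and the values of `eps` and `min_samples` are arbitrary choices for the toy data, not recommendations. The photo table stores longitude and latitude in degrees, so a distance-based `eps` in metres would require projecting the points first.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic coordinates in metres (an assumption for illustration only)
rng = np.random.default_rng(42)
hotspot_a = rng.normal(loc=(0.0, 0.0), scale=10.0, size=(40, 2))
hotspot_b = rng.normal(loc=(2000.0, 2000.0), scale=10.0, size=(40, 2))
isolated = np.array(
    [[-3000.0, 500.0], [800.0, -2500.0], [4000.0, 100.0], [1000.0, 5000.0]]
)
xy = np.vstack([hotspot_a, hotspot_b, isolated])

# eps: search radius around each point (same units as the coordinates)
# min_samples: points required within eps (including the point itself)
#              for a point to count as a core point
model = DBSCAN(eps=50, min_samples=5).fit(xy)
labels = model.labels_

# Label -1 marks noise; every other label identifies one cluster
n_clusters = len(set(labels) - {-1})
n_noise = int((labels == -1).sum())
```

Here the two dense groups come out as clusters and the four isolated points are labelled as noise. Changing `eps` or `min_samples` changes what counts as "dense", which is why the choice of parameters needs to be justified.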
Marking criteria#
This course follows the standard marking criteria (the general ones and those relating to GIS assignments in particular) set by the School of Environmental Sciences. In addition to these generic criteria, the following specific criteria relating to the code provided will be used:
0-15: the code does not run and there is no documentation to follow it.
16-39: the code does not run, or it runs but does not produce the expected outcome. There is some documentation explaining its logic.
40-49: the code runs and produces the expected output. There is some documentation explaining its logic.
50-59: the code runs and produces the expected output. There is extensive documentation explaining its logic.
60-69: the code runs and produces the expected output. There is extensive documentation, properly formatted, explaining its logic.
70-79: all as above, plus the code design includes clear evidence of skills presented in advanced sections of the course (e.g. custom methods, list comprehensions, etc.).
80-100: all as above, plus the code contains novel contributions that extend/improve the functionality the student was provided with (e.g. algorithm optimizations, novel methods to perform the task, etc.).