Assessment
Key information:
Type: Coursework
[Equivalent to 5,000 words] Up to five figures and three tables + code + comments + up to 2,000
Chance to be reassessed
Due at 14:00 on March 9th, 2021
Electronic submission through Turnitin/CANVAS. Static HTML with NO interactive cells
This module is assessed through a computational essay. To complete it successfully, you will need to demonstrate aptitude in at least three areas:
Data audacity
Python data skills
Machine learning and inference literacy
These translate into the following components of the computational essay:
1 - Find, prepare & explore a dataset
Find a dataset you are excited about and that meets the following characteristics:
It contains several characteristics (features) for a number of observations (samples)
At least two characteristics are continuous and at least two are categorical
You can think of ways in which clustering the observations based on their characteristics could tell an interesting story
You can imagine a situation in which one of the continuous characteristics can be explained in a supervised model as a function of some of the other characteristics
NOTE: please discuss your choice of dataset with Dani before the course finishes
With the dataset at hand:
Prepare it for analysis
Explore the dataset visually, identifying interesting patterns
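As a rough illustration of the checks and preparation steps above, the sketch below builds a small toy table and tests it against the brief. Everything in it is an assumption made for the example: the column names (`price`, `area`, `neighbourhood`, `type`), the helper `feature_kinds`, and the choice to drop rows with missing values rather than impute them. Your own dataset will need its own decisions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset standing in for your own; all column names are invented.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "price": rng.normal(100, 20, n),
    "area": rng.normal(60, 15, n),
    "neighbourhood": rng.choice(["north", "south", "east"], n),
    "type": rng.choice(["flat", "house"], n),
})
# Simulate ten missing values so there is something to prepare
df.loc[rng.choice(n, 10, replace=False), "area"] = np.nan


def feature_kinds(table):
    """Split column names into numeric and non-numeric ones.

    This is only a first pass: integer-coded categories would be
    counted as numeric, so check the result by eye.
    """
    continuous = table.select_dtypes(include="number").columns.tolist()
    categorical = table.select_dtypes(exclude="number").columns.tolist()
    return continuous, categorical


continuous, categorical = feature_kinds(df)
# The brief asks for at least two continuous and two categorical features
meets_brief = len(continuous) >= 2 and len(categorical) >= 2

# Preparation: dropping incomplete rows is one simple option among several
clean = df.dropna().reset_index(drop=True)

# Exploration: summaries that can guide which plots are worth drawing
summary = clean[continuous].describe()
group_means = clean.groupby("neighbourhood")[continuous].mean()
```

Tables such as `summary` and `group_means` are not a substitute for the visual exploration the brief asks for, but they can point to the variables and groups worth plotting.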
2 - Unsupervised learning
Perform a clustering exercise & analyse the results. You are expected to try several clustering models, choose a preferred one, and present a critical argument about why that is your choice. To build your argument, you may rely on graphics, performance scores, and substantive reasoning. Demonstrate not only that you understand how the mechanics of the algorithms work, but also that you can translate them into an applied context to make sense of data.
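One possible way to compare several clustering models is sketched below, using scikit-learn on synthetic data. The data from `make_blobs`, the candidate models, and the use of the silhouette score as the only performance score are all assumptions made for illustration. A score like this is one input to your argument, not the argument itself.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the continuous features of your dataset
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

# Standardise so that no feature dominates the distance calculations
X_scaled = StandardScaler().fit_transform(X)

# Candidate models: an illustrative selection, not a recommended set
candidates = {
    "kmeans_k2": KMeans(n_clusters=2, n_init=10, random_state=0),
    "kmeans_k3": KMeans(n_clusters=3, n_init=10, random_state=0),
    "kmeans_k4": KMeans(n_clusters=4, n_init=10, random_state=0),
    "ward_k3": AgglomerativeClustering(n_clusters=3, linkage="ward"),
}

scores = {}
for name, model in candidates.items():
    labels = model.fit_predict(X_scaled)
    # Silhouette ranges from -1 to 1; higher means tighter, better separated clusters
    scores[name] = silhouette_score(X_scaled, labels)

# Highest-scoring candidate; still to be weighed against substantive reasoning
best = max(scores, key=scores.get)
```

The model stored in `best` is only the top scorer on one metric. A lower-scoring solution may still be preferable if its clusters tell a clearer story about the data.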
3 - Supervised learning
Finally, build a predictive model based on linear regression and:
Interpret the coefficients
Evaluate its predictive performance both with and without cross-validation
Reflect on the differences between assessing the performance of a model with and without cross-validation.
As in the previous point, demonstrate both that you understand the workings of the algorithms and techniques and that you can make the most of them to learn about your data. Critical thinking is critical.
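The difference between the two ways of evaluating a linear regression can be sketched as follows. The synthetic data, the "true" coefficients used to generate it, the choice of five folds, and R² as the performance score are assumptions made for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: three features, the third has no real effect on y
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
true_coefs = np.array([2.0, -1.0, 0.0])
y = 1.5 + X @ true_coefs + rng.normal(scale=0.5, size=n)

# Without cross-validation: fit and score on the same observations
model = LinearRegression().fit(X, y)
in_sample_r2 = model.score(X, y)

# Each coefficient is the expected change in y for a one-unit increase
# in that feature, holding the other features constant
coefficients = dict(zip(["x1", "x2", "x3"], model.coef_))

# With cross-validation: each fold is scored on data left out of the fit
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
cv_r2_mean = cv_r2.mean()
```

Comparing `in_sample_r2` with `cv_r2_mean`, and looking at the spread across the five values in `cv_r2`, gives a starting point for the reflection: the in-sample score is computed on the data used to fit the model, while the cross-validated scores come from held-out observations.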