Assessment

Key information:

  • Type: Coursework

  • [Equivalent to 5,000 words] Up to five figures and three tables + code + comments + up to 2,000 words

  • Chance to be reassessed

  • Due at 14:00 on March 9th, 2021

  • Electronic submission through Turnitin/CANVAS. Static HTML with NO interactive cells

This module is assessed through a computational essay. To complete it successfully, you will need to demonstrate aptitude in at least three areas:

  1. Data audacity

  2. Python data skills

  3. Machine learning and inference literacy

These translate into the following components of the computational essay:

1 - Find, prepare & explore a dataset

Find a dataset you are excited about and that meets the following characteristics:

  • It contains several characteristics (features) for a number of observations (samples)

  • At least two characteristics are continuous and at least two are categorical

  • You can think of ways in which clustering the observations based on their characteristics could tell an interesting story

  • You can imagine a situation in which one of the continuous characteristics can be explained in a supervised model as a function of some of the other characteristics
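
A quick way to check the first two requirements is to count column types with pandas. The sketch below is only illustrative: the toy table, its column names, and the helper `count_feature_types` are hypothetical, and it assumes every numeric column is genuinely continuous (integer codes that really represent categories should be converted first).

```python
import pandas as pd

# Hypothetical toy table standing in for a real dataset
df = pd.DataFrame({
    "price": [120.0, 85.5, 230.0, 99.9],
    "area_m2": [54.0, 38.5, 110.0, 47.2],
    "neighbourhood": pd.Categorical(["north", "south", "north", "east"]),
    "property_type": pd.Categorical(["flat", "flat", "house", "studio"]),
})


def count_feature_types(table):
    """Return the names of numeric and categorical columns in `table`."""
    # Assumption: numeric columns are treated as continuous
    continuous = table.select_dtypes(include="number").columns.tolist()
    # Assumption: categorical columns have already been cast to "category"
    categorical = table.select_dtypes(include="category").columns.tolist()
    return continuous, categorical


continuous, categorical = count_feature_types(df)
meets_requirements = len(continuous) >= 2 and len(categorical) >= 2
print(continuous, categorical, meets_requirements)
```

The remaining two requirements (an interesting clustering story and a plausible supervised model) are judgement calls that code cannot check for you.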

NOTE: please discuss your choice of dataset with Dani before the course finishes

With the dataset at hand:

  1. Prepare it for analysis

  2. Explore the dataset visually, identifying interesting patterns
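
As a rough illustration of these two steps, the sketch below prepares a small table and computes a couple of exploratory summaries. Everything in it is an assumption made for the example: the data are invented, dropping incomplete rows is only one of several ways to handle missing values, and z-score standardisation may or may not suit your dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing value
raw = pd.DataFrame({
    "price": [120.0, 85.5, np.nan, 99.9, 150.0],
    "area_m2": [54.0, 38.5, 110.0, 47.2, 70.0],
    "property_type": ["flat", "flat", "house", "studio", "house"],
})

# 1. Prepare: drop incomplete rows and mark the categorical column
clean = raw.dropna().copy()
clean["property_type"] = clean["property_type"].astype("category")

# Standardise continuous columns (z-scores) so they share a common scale
numeric_cols = ["price", "area_m2"]
scaled = (clean[numeric_cols] - clean[numeric_cols].mean()) / clean[numeric_cols].std()

# 2. Explore: group averages and a pairwise correlation
summary = clean.groupby("property_type", observed=True)["price"].mean()
correlation = clean["price"].corr(clean["area_m2"])
print(summary)
print(round(correlation, 3))
```

For the visual part, a natural next step (assuming matplotlib is installed) would be something like `clean.plot.scatter(x="area_m2", y="price")`, followed by a sentence or two on what the pattern suggests.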

2 - Unsupervised learning

Perform a clustering exercise & analyse the results. You are expected to try several clustering models, choose a preferred one, and present a critical argument about why that is your choice. To build your argument, you may rely on graphics, performance scores, and substantive reasoning. Demonstrate that you understand not only how the mechanics of the algorithms work but that you are able to translate those into an applied context to make sense of data.
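
One possible way to structure the comparison is sketched below with scikit-learn. It is a minimal example under stated assumptions: the data are synthetic (`make_blobs`), the three candidate models and the silhouette score are choices made for illustration, and none of them is prescribed by this brief.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the continuous features of a real dataset
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Hypothetical shortlist of candidate clustering models
candidates = {
    "kmeans_k2": KMeans(n_clusters=2, n_init=10, random_state=0),
    "kmeans_k3": KMeans(n_clusters=3, n_init=10, random_state=0),
    "ward_k3": AgglomerativeClustering(n_clusters=3, linkage="ward"),
}

# Fit each model and record its silhouette score (higher is better)
scores = {}
for name, model in candidates.items():
    labels = model.fit_predict(X_scaled)
    scores[name] = silhouette_score(X_scaled, labels)

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("Highest silhouette:", best)
```

A score alone does not settle the choice. The argument still needs to explain what the clusters mean in the context of the data and why that solution tells the more useful story.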

3 - Supervised learning

Finally, build a predictive model based on linear regression and:

  • Interpret the coefficients

  • Evaluate its predictive performance both with and without cross-validation

  • Reflect on the differences between assessing the performance of a model with and without cross-validation.

As in the previous point, demonstrate that you understand both the workings of the algorithms and techniques and how you can make the most of them to learn about your data. Critical thinking is critical.
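
The three tasks in this component could be approached along the lines of the sketch below, which uses scikit-learn. The data are simulated with known coefficients (3.0 and -1.5) purely so the output is easy to read, and the choice of R² and five folds is an assumption for the example, not a requirement.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Simulated data: y depends linearly on two features plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Fit on all observations and read off the coefficients
model = LinearRegression().fit(X, y)
coefficients = model.coef_

# Performance without cross-validation: R^2 on the data used for fitting
in_sample_r2 = model.score(X, y)

# Performance with 5-fold cross-validation: R^2 on held-out folds
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")

print("Coefficients:", coefficients.round(2))
print("In-sample R^2:", round(in_sample_r2, 3))
print("Cross-validated R^2 (mean):", round(cv_r2.mean(), 3))
```

Each coefficient is the expected change in the outcome for a one-unit change in that feature, holding the others fixed. The in-sample score is computed on the same observations the model was fitted to, so it tends to be optimistic, whereas the cross-validated score estimates performance on data the model has not seen.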