This course uses a wide range of datasets. Many are sourced from different projects (such as the GDS Book), but some have been pre-processed and are stored as part of the materials for this course. For the latter group, the code used for processing, from the original state and source of the files to the final product used in this course, is made available.


These pages are NOT part of the course syllabus. They are provided for transparency and to facilitate reproducibility of the project. Consequently, the notebooks do not have the depth, detail and pedagogy of the core materials in the course.

The degree of sophistication of the code in these notebooks is at times above what is expected of a student of this course. Consequently, if you peek into these notebooks, some parts might not make much sense or might seem too complicated. Do not worry: they are not part of the course content.

Below are links to the processing of each of those datasets: