Lab 1 - Tidy Data
Notebook
TIP: A step-by-step tutorial on how to start up the notebook, download files, and access them from the notebook is available here:
[HTML]
Part I - Tidy Data
Part II - Advanced Tricks
Data
This session uses the “Census socio-demographics” dataset. Go to the Datasets tab for more information about it and for instructions on how to download it.
Additional materials
- A good extension of this session is (Wickham, 2014). The paper is published under an Open Access license, so it is freely available on the journal’s site, and the author has also made available a public repository with the data and code used in the paper. Keep in mind that the paper and its accompanying code are based on R, not Python.
- [Visualization] Python library seaborn tutorial.
- [Recommended] (McKinney, 2012): excellent introduction to Python for data analysis, with plenty of examples and code snippets (Publisher’s page link).
- NY Times article about the importance of cleaning data.
References
- McKinney, W. (2012). Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. O’Reilly Media, Inc.
- Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10). Retrieved from http://www.jstatsoft.org/v59/i10