Week 2 - Tidy Data
Notebook
Data
This session uses the “Census socio-demographics” dataset. Go to the
Datasets tab for more information about it and for instructions on how to
download it.
Additional materials
- A good extension of this session is (Wickham, 2014). The
paper is published under an Open Access license, so it is freely available on
the journal’s site, and the author has also made available a public
repository with the data and code
used in the paper. Keep in mind that the paper and its accompanying code
are based on R, not Python.
- [Visualization] Tutorial for the Python library seaborn.
- [Recommended] (McKinney, 2012): excellent introduction to
Python for data analysis, with plenty of examples and code snippets
(Publisher’s page link).
- NY Times article about the importance of cleaning data.
References
- McKinney, W. (2012). Python for data analysis: Data wrangling with Pandas, NumPy, and
IPython. O’Reilly Media, Inc.
- Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10). Retrieved from http://www.jstatsoft.org/v59/i10