{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Do-It-Yourself"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This section is all about you taking charge of the steering wheel and choosing your own adventure. For this block, we are going to use what we've learnt [before](lab_B) to take a look at a dataset of casualties in the war in Afghanistan. The data was originally released by Wikileaks, and the version we will use is published by The Guardian.\n",
    "\n",
    "```{margin}\n",
    "You can read a bit more about the data at The Guardian's [data blog](http://www.theguardian.com/news/datablog/2010/jul/27/wikileaks-afghanistan-data-datajournalism)\n",
    "``` "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data preparation\n",
    "\n",
    "Before you can set off on your data journey, the dataset needs to be read, and there's a couple of details we will get out of the way so it is then easier for you to start working.\n",
    "\n",
    "The data are published on a Google Sheet you can check out at:\n",
    "\n",
    "> [https://docs.google.com/spreadsheets/d/1EAx8_ksSCmoWW_SlhFyq2QrRn0FNNhcg1TtDFJzZRgc/edit?hl=en#gid=1](https://docs.google.com/spreadsheets/d/1EAx8_ksSCmoWW_SlhFyq2QrRn0FNNhcg1TtDFJzZRgc/edit?hl=en#gid=1)\n",
    "\n",
    "As you will see, each row includes casualties recorded month by month, split by Taliban, Civilians, Afghan forces, and NATO.\n",
    "\n",
    "To read it into a Python session, we need to slightly modify the URL to access it into:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'https://docs.google.com/spreadsheets/d/1EAx8_ksSCmoWW_SlhFyq2QrRn0FNNhcg1TtDFJzZRgc/export?format=csv&gid=1'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "url = (\"https://docs.google.com/spreadsheets/d/\"\\\n",
    "       \"1EAx8_ksSCmoWW_SlhFyq2QrRn0FNNhcg1TtDFJzZRgc/\"\\\n",
    "       \"export?format=csv&gid=1\")\n",
    "url"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note how we split the url into three lines so it is more readable in narrow screens. The result however, stored in `url`, is the same as one long string.\n",
    "\n",
    "This allows us to read the data straight into a DataFrame, as we have done in the [previous session](lab_B):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "db = pandas.read_csv(url, skiprows=[0, -1], thousands=\",\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note also we use the `skiprows=[0, -1]` to avoid reading the top (`0`) and bottom (`-1`) rows which, if you check on the Google Sheet, involves the title of the table.\n",
    "\n",
    "Now we are good to go!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Year</th>\n",
       "      <th>Month</th>\n",
       "      <th>Taliban</th>\n",
       "      <th>Civilians</th>\n",
       "      <th>Afghan forces</th>\n",
       "      <th>Nato (detailed in spreadsheet)</th>\n",
       "      <th>Nato - official figures</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2004.0</td>\n",
       "      <td>January</td>\n",
       "      <td>15</td>\n",
       "      <td>51</td>\n",
       "      <td>23</td>\n",
       "      <td>NaN</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2004.0</td>\n",
       "      <td>February</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2004.0</td>\n",
       "      <td>March</td>\n",
       "      <td>19</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2004.0</td>\n",
       "      <td>April</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>19</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2004.0</td>\n",
       "      <td>May</td>\n",
       "      <td>18</td>\n",
       "      <td>29</td>\n",
       "      <td>56</td>\n",
       "      <td>6</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Year     Month Taliban Civilians Afghan forces  \\\n",
       "0  2004.0   January      15        51            23   \n",
       "1  2004.0  February     NaN         7             4   \n",
       "2  2004.0     March      19         2           NaN   \n",
       "3  2004.0     April       5         3            19   \n",
       "4  2004.0       May      18        29            56   \n",
       "\n",
       "  Nato (detailed in spreadsheet)  Nato - official figures  \n",
       "0                            NaN                     11.0  \n",
       "1                              5                      2.0  \n",
       "2                              2                      3.0  \n",
       "3                            NaN                      3.0  \n",
       "4                              6                      9.0  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tasks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, the challenge is to put to work what we have learnt in this block. For that, the suggestion is that you carry out an analysis of the Afghan Logs in a similar way as how we looked at population composition in Liverpool. These are of course very different datasets reflecting immensely different realities. Their structure, however, is relatively parallel: both capture counts aggregated by a spatial (neighbourhood) or temporal unit (month), and each count is split by a few categories.\n",
    "\n",
    "Try to answer the following questions:\n",
    "\n",
    "- Obtain the minimum number of civilian casualties (in what month was that?)\n",
    "- How many NATO casualties were registered in August 2008?\n",
    "````{margin}\n",
    "```{tip}\n",
    "You will need to first create a column with total counts\n",
    "```\n",
    "````\n",
    "- What is the month with the most total number of casualties?\n",
    "- Can you make a plot of the distribution of casualties over time?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}