"Twitterhoods"
May 23, 2015
[NOTE: reposted from the University of Birmingham's official Tumblr. This text was prepared in the lead-up to the Pint of Science event in Birmingham, where I also presented.]
We all know what a neighbourhood is, right? It’s a part of a city that shares a common character and feel, defined by the people who live in, work in or visit it. The problem comes when you try to put it on a map: suddenly it becomes much harder.
In fact, for the most part, the study of cities has been constrained by “given” boundaries such as postcodes, which actually have very little meaning in themselves. This is about to change, thanks to the data and computational revolutions currently underway. Together, they open up many doors to explore different methods to quantify, visualize and imagine the city in radically new ways.
Take Twitter, for example. The language in which a tweet is posted reveals a host of information about its “poster”.
Tweets in uncommon languages likely come from members of ethnic minorities, while tweets in English posted in non-English-speaking countries can signal a more cosmopolitan author. If we combine this simple piece of metadata with the location where the tweet was sent, it is possible to relate different languages to different parts of a city.
Areas where only the local language is spoken are likely to be very different from those where a variety of languages coexist. Touristy areas, for example, will have a smaller proportion of tweets in the local language and a larger share of posts in foreign ones.
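To make the idea concrete, here is a minimal sketch of how language and location could be combined. It is only an illustration under stated assumptions: the tweets are made-up (latitude, longitude, language) tuples, and the square grid, its 0.01-degree cell size and the names `grid_cell` and `language_shares` are hypothetical choices of mine, not part of any Twitter tool.

```python
import math
from collections import Counter, defaultdict

def grid_cell(lat, lon, size=0.01):
    """Snap a coordinate to a square grid cell (cell size in degrees is an assumption)."""
    return (math.floor(lat / size), math.floor(lon / size))

def language_shares(tweets, size=0.01):
    """Return {cell: {language: share of that cell's tweets}}."""
    counts = defaultdict(Counter)
    for lat, lon, lang in tweets:
        counts[grid_cell(lat, lon, size)][lang] += 1
    shares = {}
    for cell, langs in counts.items():
        total = sum(langs.values())
        shares[cell] = {lang: n / total for lang, n in langs.items()}
    return shares

# Invented example data: (latitude, longitude, language code)
sample_tweets = [
    (52.4851, -1.8952, "en"),
    (52.4853, -1.8955, "en"),
    (52.4856, -1.8951, "es"),
    (52.4858, -1.8957, "fr"),
    (52.4551, -1.9352, "en"),
    (52.4553, -1.9355, "en"),
]

for cell, langs in language_shares(sample_tweets).items():
    print(cell, langs)
```

In this toy data, one cell mixes three languages while the other is entirely English, which is the kind of contrast between a touristy area and a residential one described above.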
Although a person can start to imagine how these different maps overlay and reveal the character of each area, it is difficult for the human brain to process all the information at once.
Luckily, this is one of the tasks that a properly taught (programmed) computer can do very well. Using a family of techniques called machine learning, a computer can process all the tweets in a city and return a map that summarizes them into neighbourhoods. These neighbourhoods contain much more substantive meaning than the ones we are accustomed to using (e.g. postcodes or administrative boundaries), and provide a representation of the city that, although we’ve always known it was there, we could not picture. How cool is that?
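For the curious, the grouping step can be sketched in a few lines. The text above does not say which machine learning technique is used, so this example swaps in k-means clustering purely as a stand-in. The area names, the (local share, foreign share) numbers and the `kmeans` helper are all invented for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: return one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Invented areas described by (share of local-language tweets, share of foreign ones)
areas = {
    "A": [0.95, 0.05],
    "B": [0.90, 0.10],
    "C": [0.40, 0.60],
    "D": [0.35, 0.65],
}

labels = dict(zip(areas, kmeans(list(areas.values()), k=2)))
print(labels)
```

Areas that end up with the same label have similar language mixes; in a real application, the labels would be drawn back onto the map to outline the resulting neighbourhoods.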