Language lessons told through Twitter
Posted on: 27 Oct 2012
The social media channel is a valuable tool for linguists, allowing them to explore how language changes over time through the spread of slang expressions.
This piece contains strong language from the beginning, as they say on the BBC. But only in the name of science – for a new study of how slang expressions spread on Twitter could offer insights into a more general question in linguistics: how language changes and evolves.
You might, like me, have been entirely innocent of what 'af' denotes in the Twittersphere, in which case the phrase 'I'm bored af' would simply baffle you. It doesn't, of course, take much thought to realise that it's simply an abbreviation for a vulgarity – a tamer version of which is 'as hell'. What's less obvious is why this pithy abbreviation should have jumped from its origin in southern California to a cluster of cities around Atlanta before spreading more widely across the east and west US coasts. That is the path reported in an as-yet-unpublished study by computer scientist Jacob Eisenstein of the Georgia Institute of Technology in Atlanta and his co-workers Brendan O'Connor, Noah Smith and Eric Xing of Carnegie Mellon University in Pittsburgh.
Other neologisms have different life stories. Spelling 'bro', slang for brother (male friend or peer), as 'bruh' began in the southeastern US (where it reflects the local pronunciation) before finally jumping to southern California. The emoticon '-__-' (denoting mild annoyance) began in New York and Florida before colonising both coasts and gradually reaching Arizona and Texas.
Who cares? Well, the question of how language changes and evolves has occupied linguistic anthropologists for several decades. What determines whether an innovation will propagate throughout a culture, remain just a local variant, or be stillborn? Such questions decide the grain and texture of all our languages – why we might tweet 'I'm bored af' rather than 'I'm bored, forsooth'.
There are plenty of ideas about how this happens. One suggestion is that innovations spread by simple diffusion from person to person, like a spreading ink blot. Another idea is that bigger population centres exert a stronger attraction on neologisms, so that they go first to large cities by a kind of gravitational pull. Or maybe culture and demography matter more than geographical proximity: words might spread initially within some minority groups while being invisible to the majority.
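To make the difference between the first two ideas concrete, here is a toy sketch in Python. It is not the researchers' model: the three cities, their populations and their positions are invented for illustration, and the inverse-distance and population-product formulas are simple stand-ins that I am assuming for 'diffusion' and 'gravitational pull'.

```python
import math

# Hypothetical cities: name -> (population in millions, x, y) on an arbitrary grid.
CITIES = {
    "Origin": (4.0, 0.0, 0.0),
    "SmallNear": (0.2, 1.0, 0.0),
    "BigFar": (5.0, 3.0, 0.0),
}


def distance(a, b):
    """Straight-line distance between two cities on the toy grid."""
    _, ax, ay = CITIES[a]
    _, bx, by = CITIES[b]
    return math.hypot(ax - bx, ay - by)


def diffusion_score(src, dst):
    """Ink-blot picture: the pull depends only on how close the cities are."""
    return 1.0 / distance(src, dst)


def gravity_score(src, dst):
    """Gravity picture: the pull grows with both populations and
    falls off with the square of the distance."""
    return CITIES[src][0] * CITIES[dst][0] / distance(src, dst) ** 2


def next_adopter(src, score):
    """Return the city that the given scoring rule says a new word reaches first."""
    candidates = [city for city in CITIES if city != src]
    return max(candidates, key=lambda city: score(src, city))


print(next_adopter("Origin", diffusion_score))  # SmallNear
print(next_adopter("Origin", gravity_score))    # BigFar
```

With these made-up numbers the two pictures disagree: diffusion sends the word to the small neighbouring town, while gravity sends it to the larger, more distant city. Real data are needed to say which is closer to the truth.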
Sophisticated computer models of interacting 'agents' (that is, virtual people having virtual conversations) can be used to examine these processes, but they tell us little unless there are real data to compare them against. Such data have been extremely difficult to obtain, but now social media channels provide an embarrassment of riches – a precise and searchable record of our exchanges, which is being used to explore everything from the cycle of emotions experienced in everyday life to the changing social sentiment during the course of the Arab Spring.
In this case, Eisenstein and colleagues scoured messages from the public feed on Twitter. They collected around 40 million messages from around 400,000 individuals between June 2009 and May 2011 that could be tied to a particular geographical location in the USA because of the smartphone metadata optionally included with the message.
The researchers then assigned these to their respective 'Metropolitan Statistical Areas' (MSAs): urban centres that typically represent a single city. For each MSA, demographic data on ethnicity are available which, with some effort to correct for the fact that Twitter users are not necessarily representative of the area's overall population, allow a rough estimate of the ethnic makeup of the message source.
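The binning step described above can be sketched roughly as follows. This is an illustration under loud assumptions, not the study's pipeline: real MSAs are defined by official boundaries rather than a single point, so assigning each message to the nearest city centre is a simplification of mine, and the four sample messages are made up. Only the approximate city coordinates are real.

```python
from collections import Counter

# Approximate (latitude, longitude) of three city centres. Treating an MSA
# as a single point is an assumption made for this sketch.
MSA_CENTRES = {
    "Atlanta": (33.75, -84.39),
    "Los Angeles": (34.05, -118.24),
    "New York": (40.71, -74.01),
}


def nearest_msa(lat, lon):
    """Assign a geotagged message to the closest centre.
    Squared difference in degrees is crude, but adequate for a toy example."""
    return min(
        MSA_CENTRES,
        key=lambda m: (MSA_CENTRES[m][0] - lat) ** 2 + (MSA_CENTRES[m][1] - lon) ** 2,
    )


def term_counts(messages, term):
    """Count messages containing `term` as a whole word, per (MSA, week)."""
    counts = Counter()
    for week, lat, lon, text in messages:
        if term in text.lower().split():
            counts[(nearest_msa(lat, lon), week)] += 1
    return counts


# Invented sample messages: (week number, latitude, longitude, text).
MESSAGES = [
    (1, 34.0, -118.3, "I'm bored af"),
    (1, 40.7, -74.0, "so bored today"),
    (2, 33.8, -84.4, "tired af already"),
    (2, 34.1, -118.2, "hungry AF"),
]

counts = term_counts(MESSAGES, "af")
print(sorted(counts.items()))
```

A table like this, giving how often a term appears in each city in each week, is the kind of raw material from which one could then ask where a word turned up first and where it went next.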
Eisenstein and colleagues used statistical analysis techniques to work out how these urban centres influence each other – to tease out the network across which linguistic innovations like 'af', 'bruh' or '-__-' catch on. The result is a map of the USA showing not just how these phrases spread between population centres, but what the direction of that influence is.
What, then, are the characteristics that make an MSA likely to spawn successful neologisms? It's well established that Twitter has a higher rate of adoption among African Americans than among other ethnic groups, and so it perhaps isn't surprising that the researchers now find that innovation centres, as well as being highly populated, have a higher proportion of African Americans, and that similarity of racial demographic can make two urban centres more likely to be linked in the influence network. There is a long history of adoption of African American slang (cool, dig, rip off) in mainstream US culture, so these findings agree with what we might expect.
But these are still early days, and the researchers – who hope to present their preliminary findings at a workshop on Social Network and Social Media Analysis in December organised by the Neural Information Processing Systems Foundation – anticipate that they will eventually be able to identify more nuances of influence in the data. The real point at this stage, however, is the method. Twitter and other social media offer records of language mutating in real time and space: an immense and novel resource that, while no doubt subject to its own unique quirks, can offer linguists the opportunity to explore how our words and phrases arise from acts of tacit cultural negotiation.
BBC