The dataset includes Tweets tagged #mla15 that were posted during the convention itself. The set starts with a Tweet from Thursday 8 January 2015 at 00:02:53 Pacific Time and ends with a Tweet from Sunday 11 January 2015 at 23:59:58 Pacific Time.
The dataset contains 23,609 Tweets in total. Only Tweets from users with at least two followers were collected.
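The scope described above, a Pacific Time window plus a minimum follower count, can be reapplied when re-filtering a copy of the data. What follows is a minimal sketch, not the procedure used to build the file. It assumes each tweet has a timezone-aware timestamp and a follower count. The `Tweet` record, its field names and the sample tweets are hypothetical and may not match the file's actual columns.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# January falls entirely within Pacific Standard Time (UTC-8), so a fixed offset is enough here.
PST = timezone(timedelta(hours=-8))

# Convention window: from 8 January 2015 00:00 to the end of 11 January 2015, Pacific Time.
WINDOW_START = datetime(2015, 1, 8, tzinfo=PST)
WINDOW_END = datetime(2015, 1, 12, tzinfo=PST)  # exclusive upper bound


@dataclass
class Tweet:
    """Hypothetical record type; these field names are illustrative only."""
    id_str: str
    created_at: datetime  # must be timezone-aware
    user_followers_count: int


def in_scope(tweet: Tweet, min_followers: int = 2) -> bool:
    """True if the tweet was posted inside the window by a user with enough followers."""
    return (WINDOW_START <= tweet.created_at < WINDOW_END
            and tweet.user_followers_count >= min_followers)


tweets = [
    Tweet("1", datetime(2015, 1, 8, 0, 2, 53, tzinfo=PST), 150),
    Tweet("2", datetime(2015, 1, 7, 23, 59, 0, tzinfo=PST), 150),   # before the window
    Tweet("3", datetime(2015, 1, 11, 23, 59, 58, tzinfo=PST), 1),   # too few followers
    Tweet("4", datetime(2015, 1, 12, 7, 59, 58, tzinfo=timezone.utc), 40),  # 23:59:58 PST on 11 Jan
]
kept = [t.id_str for t in tweets if in_scope(t)]
print(kept)  # ['1', '4']
```

Because the timestamps are timezone-aware, a tweet recorded in UTC (tweet 4) compares correctly against the Pacific Time bounds without any manual conversion.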
Please note that the data in the file is likely to require further cleaning and possibly deduplication. The data is shared as is, to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way, please cite it and link back using the citation information above.
The MLA has been a pioneering academic organization in embracing Twitter. Since 2007 the so-called “conference back channel” has been growing considerably. Adoption of Twitter amongst scholars and students seems on the rise as well, and reporting live from the conference is no longer an underground, parallel activity but pretty much a recognized, encouraged aspect of the event.
Microblogging, with special emphasis on Twitter.com, the best-known service, is increasingly used as a means of undertaking digital “backchannel” communication (non-verbal, real-time communication which does not interrupt a presenter or event; Yngve 1970, Kellogg et al. 2006). Digital backchannels are becoming more prevalent at academic conferences, in educational use, and in organizational settings. Frameworks are therefore required for understanding the role and use of digital backchannel communication, such as that provided by Twitter, in enabling participatory cultures.
Ross et al. studied Twitter activity around three digital humanities conferences (#dh09, #thatcamp and #drha09/#drha2009), collecting and analysing a corpus of 4,574 tweets, over 90% of them original tweets (4,259) and only 313 retweets.
Although this activity took place in 2009, at events considerably smaller than the MLA convention, the study by Ross et al. remains an important reference for research on humanities scholars’ use of Twitter in general. It also informs the data collection I’ve been conducting (not only of the MLA backchannel) and the research I’ve been meaning to publish eventually.
As a comparison from another discipline, Desai et al. (2012) collected and analysed 993 tweets posted over the five days of the American Society of Nephrology (ASN) annual scientific conference in 2011 (#kidneywk11).
There is still a paucity of reliable, timely research on how scholars use Twitter around (before, during and after) academic conferences in different disciplines. Part of the problem is that studies of social media are often not disseminated through social media channels (either as fragmentary outputs on Twitter or as blog posts). The “publishing delay” involved in formal peer-reviewed publication means that the findings reach us, as in the two cases cited above, up to two years later.
I have been following and participating remotely in the MLA convention through Twitter since 2010, trying different ways of both engaging with and analysing the scholarly activity taking place under and with the hashtag(s) associated with the event. This year, #MLA14 (or #mla14; hashtags are not case sensitive) seemed to far surpass all expectations of adoption.
I have been using Martin Hawksey’s Twitter Archiving Google Spreadsheet (TAGS, now in its fifth version) for a few years, and it is what I used to start collecting tweets tagged #MLA14 from 1 September 2013. In Hawksey’s words, TAGS is “a quick way to collect tweets, make publicly available and collaborate exploring the data.”
The archives I set up updated automatically every minute, but Google Sheets imposed a limit of 400,000 cells per sheet, and TAGS populates 18 columns for each tweet and its associated metadata.
This meant that the spreadsheets could fill very quickly and the scripts could become unresponsive. I knew that if I wanted to collect as much as possible from what I expected to be a very busy feed, I would need more than one archive, and I would have to hope I’d be able to deduplicate and collate the data into more manageable chunks later. In practical terms, this meant monitoring both the feed and the Google spreadsheets very attentively, following the event on Twitter almost as if I were literally there, and starting a new archive before the previous one collapsed.
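A rough back-of-the-envelope sketch shows why one archive was never going to be enough. It uses the 400,000-cell limit and the 18 columns per tweet mentioned above. It assumes the limit applies to the archive sheet alone and that the sheet has a single header row. This ignores the other sheets TAGS creates. The 80% `safety_margin` is a hypothetical threshold for opening a new archive before the current one fills. It is not a TAGS setting.

```python
import math

CELL_LIMIT = 400_000    # Google Sheets cell limit cited above
COLUMNS_PER_TWEET = 18  # TAGS fills 18 columns per tweet
HEADER_ROWS = 1         # assumption: one header row on the archive sheet


def max_tweets_per_archive() -> int:
    """Whole rows that fit under the cell limit, minus the header row."""
    return CELL_LIMIT // COLUMNS_PER_TWEET - HEADER_ROWS


def archives_needed(expected_tweets: int) -> int:
    """Minimum number of separate archives required to hold expected_tweets."""
    return math.ceil(expected_tweets / max_tweets_per_archive())


def should_start_new_archive(rows_used: int, safety_margin: float = 0.8) -> bool:
    """Hypothetical rule of thumb: open the next archive once the current one is 80% full."""
    return rows_used >= safety_margin * max_tweets_per_archive()


print(max_tweets_per_archive())  # 22221
print(archives_needed(27_491))   # 2
```

Under these assumptions a single archive tops out at roughly 22,000 tweets. Even the 27,491 unique tweets reported below would therefore need at least two archives, before counting the duplicates that overlapping archives inevitably collect.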
After the conference I was contacted by Chris Zarate from the MLA, who had also been archiving the #MLA14 feed with TAGS. He had some gaps in his data, as did I, and only by working together have we been able to glimpse a more or less complete dataset of #MLA14 tweets.
A First Finding: How Many
Chris and I had more than 75,000 tweets in our combined sets; after deduplicating them with OpenRefine, we were down to 27,491 unique tweets.
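The deduplication above was done in OpenRefine. The following is not that workflow but a swapped-in Python sketch of the same idea: merge several CSV exports and keep one row per tweet ID. It assumes each export has an `id_str` column holding the tweet ID. That column name is an assumption about the export format, and the two sample archives are invented.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def merge_and_dedupe(sources: Iterable[TextIO], key: str = "id_str") -> List[Dict[str, str]]:
    """Combine several CSV exports, keeping the first row seen for each tweet ID."""
    seen = set()
    merged = []
    for source in sources:
        for row in csv.DictReader(source):
            tweet_id = row[key]  # kept as a string: tweet IDs are too large to survive float conversion intact
            if tweet_id in seen:
                continue  # already captured by an earlier, overlapping archive
            seen.add(tweet_id)
            merged.append(row)
    return merged


# Two tiny overlapping archives standing in for real exports (hypothetical data).
archive_a = io.StringIO("id_str,from_user,text\n1,alice,Hello #MLA14\n2,bob,Panel 101\n")
archive_b = io.StringIO("id_str,from_user,text\n2,bob,Panel 101\n3,carol,Great session\n")

rows = merge_and_dedupe([archive_a, archive_b])
print(len(rows))  # 3
```

Matching on the tweet ID rather than on the text keeps genuinely distinct tweets that happen to share the same wording.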
The MLA annual convention might be a mega-conference (around 7,500 paid attendees this year, according to Rosemary Feal), but 27,491 tweets is still an amazingly healthy figure, reflecting undeniable adoption of Twitter by humanities scholars.
Chris did a quick plot covering 9 to 12 January 2014 (the days of the conference itself). We may have missed some tweets here and there due to Twitter API rate limiting, but there are no glaring gaps.
Not surprisingly, overall Twitter activity peaked on the afternoon of Saturday 11 January (remember the conference took place from 9 to 12 January 2014). That morning, Central Time, I tweeted that the #MLA14 feed was receiving 21.1 tweets per minute.
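Figures such as the 21.1 tweets per minute quoted above can be derived from tweet timestamps in more than one way, and the post does not say which was used. The following minimal sketch shows two common options. One counts tweets per clock minute to find the peak; the other averages over a chosen window. It assumes naive timestamps already converted to a single time zone, and the sample timestamps are invented.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def tweets_per_minute(timestamps: Iterable[datetime]) -> Counter:
    """Count tweets in each clock minute by truncating seconds and microseconds."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def peak_minute(timestamps: Iterable[datetime]) -> Tuple[datetime, int]:
    """Return the busiest minute and its tweet count."""
    counts = tweets_per_minute(timestamps)
    return max(counts.items(), key=lambda item: item[1])


def mean_rate(timestamps: Iterable[datetime], start: datetime, end: datetime) -> float:
    """Average tweets per minute over the half-open interval [start, end)."""
    n = sum(1 for ts in timestamps if start <= ts < end)
    return n / ((end - start).total_seconds() / 60)


# Hypothetical timestamps (naive, all assumed to be Central Time).
stamps = [
    datetime(2014, 1, 11, 10, 0, 5),
    datetime(2014, 1, 11, 10, 0, 40),
    datetime(2014, 1, 11, 10, 1, 2),
]
peak = peak_minute(stamps)
rate = mean_rate(stamps, datetime(2014, 1, 11, 10, 0), datetime(2014, 1, 11, 10, 2))
print(peak, rate)
```

Here the busiest minute is 10:00, with two tweets, while the average over the two-minute window is 1.5 tweets per minute. A peak-minute count and a windowed average can therefore differ noticeably for the same feed.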
Naturally, many research questions arise.
What’s Next: More Soon
Chris and I are still working on the dataset, putting it into different, manageable forms that allow for easier qualitative and quantitative analysis.
We are also looking forward to eventually sharing a CSV file containing the data and metadata of tweets posted between Sunday 1 September 2013 at 20:35:07 and Wednesday 15 January 2014 at 16:16:41 (Central Time).
If you have a dataset including #MLA14 tweets posted before Sunday 1 September 2013 at 20:35:07 (Central Time), we would love to hear from you.
I will keep sharing some insights from the dataset here. Hopefully I’ll have another post on this blog tomorrow with some interesting findings.
N.B. Sadly, in spite of constant efforts by me and many other colleagues to encourage the recognition of blog posts as academic outputs, research of this type that is not presented in traditional academic venues (read: a peer-reviewed journal article or monograph) rarely gets cited, which is frankly disappointing. I therefore regret that I will be unable to blog the complete analysis or share the whole dataset until I have secured at least one formal output for this ongoing research. Were I at a different stage of my career I could probably afford to, but that is not the case at the moment.
Again, many thanks to Chris Zarate for collaborating on this project.
Desai, T., Shariff, A., Shariff, A., Kats, M., Fang, X., Christiano, C., & Ferris, M. (2012). Tweeting the meeting: An in-depth analysis of Twitter activity at Kidney Week 2011. (V. Gupta, Ed.) PLoS ONE, 7(7), e40253. doi:10.1371/journal.pone.0040253. Accessed 16 January 2013.