Having just shared an (albeit incomplete) #dh2014 archive, I could no longer stand having lots of #dh2013 spreadsheets on my hard drive without attempting to share a more or less ordered set once and for all.
Taking advantage of the post-#dh2014 momentum, I have finally shared an XLS file containing a dataset of Tweets tagged with #dh2013 (case-insensitive).
Priego, Ernesto (2014): An Incomplete #dh2013 Twitter Archive (Conference Days Only; Times in GMT and BST). figshare.
The Digital Humanities 2013 conference took place at the University of Nebraska–Lincoln, USA, 16-19 July 2013.
The archive I have shared contains approximately 6,661 Tweets published publicly and tagged with #dh2013 between Mon Jul 15 07:12:10 +0000 and Sat Jul 20 23:20:04 +0000.
Please note that the data in the set is incomplete. If you have the missing Tweets, please let us know so that the set can be completed.
Although incomplete, the available data can hopefully be used for some comparative analysis between 2013 and 2014, and ideally the coming years.
The Tweets contained in the file were originally collected in July 2013 using Martin Hawksey’s TAGS 5.1.
Due to the volume of Tweets, several Google Spreadsheets were created before and during the event, and these were subsequently refined into individual sheets. The chronology was reconstructed manually.
With thanks to Lisa Rhody who contributed some Tweets I had failed to collect. The file contains 7 sheets:
Sheet 0. A ‘Cite Me’ sheet, including the provenance of this file, citation information, information about its contents, the methods employed and some context.
Sheet 1. Monday 15 July 2013 (371 Tweets; noticeably incomplete)
Sheet 2. Tuesday 16 July 2013 (1,187 Tweets)
Sheet 3. Wednesday 17 July 2013 (2,227 Tweets)
Sheet 4. Thursday 18 July 2013 (2,826 Tweets)
Sheet 5. Friday 19 July 2013 (approx. 1,500 Tweets; various Tweets with line breaks; noticeably incomplete due to high volumes and collection times; set starts from Fri Jul 19 13:41:01 +0000)
Sheet 6. Saturday 20 July 2013 (122 Tweets; incomplete; set starts from Sat Jul 20 17:42:30)
Times are, unfortunately, in GMT (the "created" column) and BST (the "time" column). They should have been in Nebraska time, though of course not all Tweets were tweeted from the conference location. The time zone needed to be set in the first collection and I failed to do that, so I continued collecting with the same time settings. (Sorry). This means that the dates and times in the sheets do not correspond to local conference times: Nebraska was on Central Daylight Time (CDT), five hours behind GMT and six hours behind BST.
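For anyone who wants local conference times, the shift is simple arithmetic. Below is a minimal Python sketch, not part of the shared file, that converts a GMT timestamp to CDT. It assumes the "created" values follow Twitter's usual pattern with the year at the end (for example "Mon Jul 15 07:12:10 +0000 2013"); the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "created" values look like "Mon Jul 15 07:12:10 +0000 2013",
# i.e. Twitter-style timestamps with the year after the UTC offset.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Lincoln, Nebraska was on Central Daylight Time (UTC-5) throughout July 2013,
# so a fixed offset is enough for this archive.
CDT = timezone(timedelta(hours=-5), name="CDT")


def created_to_cdt(created: str) -> datetime:
    """Parse a GMT/UTC 'created' timestamp and return it in CDT (hypothetical helper)."""
    return datetime.strptime(created, TWITTER_FORMAT).astimezone(CDT)


# The first timestamp in the archive, with the year appended.
first = created_to_cdt("Mon Jul 15 07:12:10 +0000 2013")
print(first.strftime("%Y-%m-%d %H:%M:%S %Z"))
```

Note that Tweets sent shortly after midnight GMT fall on the previous calendar day in Nebraska, so regrouping by local date would move some rows between sheets.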
Only users with at least 2 followers were included in the archive. Retweets have been included. The data might require deduplication.
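Because the sheets were assembled from several collection runs, some Tweets may appear more than once. A minimal Python sketch of one way to drop repeated rows follows; it assumes each row carries a unique Tweet identifier in a column called "id_str" (the name TAGS normally uses, but please check your copy), and the sample rows are invented purely for illustration.

```python
def drop_repeated_tweets(rows, key="id_str"):
    """Keep the first occurrence of each Tweet ID, preserving row order.

    `rows` is a list of dicts, one per spreadsheet row. The `key` column
    name is an assumption; change it to match the actual header.
    """
    seen = set()
    unique = []
    for row in rows:
        tweet_id = row[key]
        if tweet_id not in seen:
            seen.add(tweet_id)
            unique.append(row)
    return unique


# Invented sample rows, not taken from the archive.
sample = [
    {"id_str": "1001", "text": "First talk starting #dh2013"},
    {"id_str": "1002", "text": "Great keynote #DH2013"},
    {"id_str": "1001", "text": "First talk starting #dh2013"},
]
unique_rows = drop_repeated_tweets(sample)
print(len(unique_rows))
```

Retweets have their own IDs, so this keeps them; it only removes rows where the very same Tweet was captured twice.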
Due to the different methods employed in attempting to catch a high volume of Tweets, unfortunately the metadata in the set is not complete (the lack of ISO language metadata in most of these sheets is particularly disappointing, as it would have provided interesting insights).
Some work was done to make the chronology as complete as possible; I have highlighted gaps in the Tweets in yellow on the sheets and noted them in the listing above.
The usual conditions and limitations are included in the file and on the figshare upload cited above.