This is post 1 in a series.
Post 2: Infographic
Post 3: Some charts
The Digital Humanities Summer Institute (DHSI) took place 2-6 June 2014. DHSI takes place on the University of Victoria campus, and is offered by UVic’s Electronic Textual Cultures Lab.
I have uploaded to figshare an .XLS file containing an archive of tweets tagged with #dhsi2014 (case-insensitive).
The citation information is:
Priego, Ernesto (2014): Digital Humanities Summer Institute 2014: A #dhsi2014 Archive. figshare.
I created and shared the file under a Creative Commons Attribution license (CC-BY) for academic research and educational use. I am aware other colleagues were collecting #dhsi2014 tweets as well (Jon Martin; Alyssa Arbuckle); hopefully this either complements their archives or offers a way to do comparisons, analysis, etc.
The complete archive contains 10,686 tweets, the first one dated 26/05/2014 12:32:00 and the last one dated 07/06/2014 00:46:08 (Vancouver Pacific Time).
The tweets contained in this file were collected using Martin Hawksey’s TAGS 5.1. Due to the volume of tweets, nine Google Spreadsheets were created during the period of the event, which were later reduced to four. The data was then manually refined into various sheets, which have been included here.
- Sheet 0. A ‘Cite Me’ sheet, including the provenance of the file, citation information, information about its contents, the methods employed and some context.
- Sheet 1. ‘All’ includes all 10,686 tweets archived between 26/05/2014 12:32:00 and 07/06/2014 00:46:08 (Vancouver Pacific Time). (Note that this sheet includes some tweets with line breaks, so the number of rows is higher than the number of actual tweets. This should be cleaned.)
- Sheet 2. ‘All DHSI dates’ includes 10,056 archived tweets posted throughout the duration of the event, between 02/06/2014 09:16:00 and 07/06/2014 00:46:08. (The event ended on 06/06/2014, but a few tweets published after midnight that night are included.)
- Sheet 3. Covers the period between 26/05/2014 and 31/05/2014, with 290 archived tweets. This is prior to the actual event.
- Sheet 4. Covers 01/06/2014, with 335 archived tweets. This is also prior to the actual event.
- Sheet 5. Covers 02/06/2014, with 2,829 archived tweets. This was the first day of the event.
- Sheet 6. Covers 03/06/2014, with 1,726 archived tweets.
- Sheet 7. Covers 04/06/2014, with 1,882 archived tweets.
- Sheet 8. Covers 05/06/2014, with 1,970 archived tweets.
- Sheet 9. Covers 06/06/2014, with 1,649 archived tweets. This was the last day of the event.
- Sheet 10. Covers the early hours of 07/06/2014, with 5 archived tweets (until 07/06/2014 00:46:08 only).
- Sheet 11. Includes an archive of tweets tagged with #dhsi14 (the ‘official’ hashtag was #dhsi2014). It contains 58 tweets archived between 30/05/2014 18:05:28 and 06/06/2014 14:23:17.
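The per-sheet counts listed above can be checked against the stated totals with a few lines of Python. This is only a sketch: the dictionary keys are informal labels rather than the actual sheet names in the file, and the numbers are copied by hand from the list.

```python
# Tweet counts copied from the sheet descriptions above.
# The keys are informal labels, not the actual sheet names in the file.
pre_event = {"Sheet 3": 290, "Sheet 4": 335}
event_days = {
    "Sheet 5": 2829,
    "Sheet 6": 1726,
    "Sheet 7": 1882,
    "Sheet 8": 1970,
    "Sheet 9": 1649,
}
after_midnight = {"Sheet 10": 5}

# Sum of the five event-day sheets.
event_total = sum(event_days.values())

# Sum of every dated #dhsi2014 sheet (Sheets 3 to 10).
archive_total = (
    sum(pre_event.values()) + event_total + sum(after_midnight.values())
)

print(event_total)    # 10056
print(archive_total)  # 10686
```

On these figures, the 10,056 tweets reported for Sheet 2 equal the sum of Sheets 5 to 9, and adding Sheets 3, 4 and 10 gives the 10,686 tweets reported for Sheet 1.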
To avoid spam, only users with at least 2 followers were included in the archive. Retweets have been included. Column D refers to the date and time at which the tweet was archived (in GMT); Column E refers to the date of publication (in the event’s local time, Vancouver Pacific Time).
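Because the two timestamp columns use different time zones, it can help to convert one into the other before comparing them. A minimal sketch, assuming the local timestamps are formatted as dd/mm/yyyy hh:mm:ss (as in this post) and that Pacific Daylight Time (UTC-7) applied throughout the collection period; the function name `local_to_gmt` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the event's local time in late May / early June is
# Pacific Daylight Time, i.e. UTC-7.
PDT = timezone(timedelta(hours=-7), "PDT")


def local_to_gmt(stamp: str) -> datetime:
    """Parse a dd/mm/yyyy hh:mm:ss local timestamp and return it in GMT/UTC."""
    local = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S").replace(tzinfo=PDT)
    return local.astimezone(timezone.utc)


# Timestamp of the last tweet in the archive, given in local time.
last_tweet_gmt = local_to_gmt("07/06/2014 00:46:08")
print(last_tweet_gmt.isoformat())  # 2014-06-07T07:46:08+00:00
```

A fixed offset keeps the example dependency-free; for data spanning a daylight-saving change, a named zone from the standard `zoneinfo` module would be the safer choice.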
Please note that both research and experience show that the Twitter search API isn’t 100% reliable. Large tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón, Sandra, et al. 2012). Therefore, it cannot be guaranteed this file contains each and every tweet tagged with #dhsi2014 during the indicated period.
Please take into account that other hashtags for specific sessions might have been used, and that participants might have tweeted from/about the conference (remotely or locally) without the hashtag collected here, so this archive does not represent an authoritative, complete view of all the Twitter activity associated with the event.
Some deduplication and refining has been performed to avoid spam tweets and duplication. Some work was done to ensure the chronology was complete; I have highlighted an apparent gap in the tweets between 05/06/2014 23:41 and 06/06/2014 10:19. This could be due to a later start on the morning of the last day of activities, though one could have expected tweets from other time zones coming in at that time, so it’s possible I have missed them. Some characters in some of the tweets’ text might not have been decoded correctly.
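A gap like this one can be located by sorting the timestamps and measuring the interval between consecutive tweets. The sketch below is illustrative only: apart from the two timestamps that bound the gap described above, the sample values are made up, and the function name `largest_gap` is hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%d/%m/%Y %H:%M"


def largest_gap(stamps):
    """Return (gap, start, end) for the longest interval between
    consecutive timestamps. Expects at least two timestamps."""
    times = sorted(datetime.strptime(s, FMT) for s in stamps)
    pairs = zip(times, times[1:])
    start, end = max(pairs, key=lambda pair: pair[1] - pair[0])
    return end - start, start, end


# Made-up sample, except for the two timestamps that bound the reported gap.
sample = [
    "05/06/2014 23:10",
    "05/06/2014 23:41",
    "06/06/2014 10:19",
    "06/06/2014 10:25",
]

gap, start, end = largest_gap(sample)
print(gap)  # 10:38:00
```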
[The bar chart of #dhsi2014 tweets per day in the archive had a typo, so I have removed it for correction; I will replace it as soon as I can.]
Please note the data in the file is likely to require further refining and even deduplication. The data is shared as is. If you use or refer to this data in any way please cite and link back using the citation information above.
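One straightforward way to deduplicate is to keep only the first row seen for each tweet ID. A minimal sketch, assuming each row has been read into a dictionary with an `id_str` field holding the tweet ID; the field name, the function name `dedupe_by_id` and the sample rows are assumptions for illustration, not taken from the file.

```python
def dedupe_by_id(rows, key="id_str"):
    """Keep the first row seen for each tweet ID, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        tweet_id = row[key]
        if tweet_id not in seen:
            seen.add(tweet_id)
            unique.append(row)
    return unique


# Hypothetical rows for illustration only.
rows = [
    {"id_str": "1", "text": "first tweet #dhsi2014"},
    {"id_str": "2", "text": "second tweet #dhsi2014"},
    {"id_str": "1", "text": "first tweet #dhsi2014"},
]

unique_rows = dedupe_by_id(rows)
print(len(unique_rows))  # 2
```

Matching on the ID rather than on the text keeps retweets, which share text with the original tweet but have their own IDs.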
I’ll be following up this post with some findings from the data.