#DH2018 and #DH2019 Twitter Archive Counts. A Comparison

Background

My interest in documenting scholarly activity on Twitter through conference hashtags is not new; for the digital humanities I have been looking into it since 2010. Searching this blog or googling related keywords should turn up some results for those interested in the background. I have been collecting conference hashtag archives for a while now, often depositing them as part of the scholarly record, and blogging and giving workshops about my objectives and methodologies.

I like sharing results in real time while conferences are taking place or shortly after. Any results shared are therefore always-already provisional, perfectible, and unfinished. I have always believed that a signal is better than no signal, or than having to wait three years for one, so I insist on sharing any quick insights I can get rather than not sharing them at all or waiting until I miraculously find the time to do it differently (which I am not likely to, so I’d rather take any opportunity I have to share something). Hopefully someone finds it helpful in some way.

I have also, once again, been critical of the metrication of scholarly activity, so the fact that I share quantitative data from the archives collected does not mean I think this metrication is always-already something to aspire to, or that it means anything in particular. I see it as an ethnographic means to document the existence of scholarly activity on Twitter around academic conferences in specific fields, and perhaps as an entry point to assess academic and public engagement on Twitter with academic hashtags and the events they represent, and/or any increase, decrease or transformation in this type of activity on Twitter. For example, it is possible to gain insights into Twitter users’ settings preferences, such as the language users have set up, as I looked into in this post on user_lang in #DH2018 tweets.

The Methods

The metrics compared here are the result of a double method of collection, used as a means to check the validity of the collected data. I used a Python script to collect both archives, with the parameters set to match those of the archives I collected using TAGS (see Priego 2018). Even if the collected data still needs to be refined, when the counts are the same or very similar I get a degree of certainty that the data collected via TAGS from the Twitter Search API is close to being as reliable as it could be.
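A minimal sketch of that kind of cross-check, for illustration only; it is not the script used for these archives. It assumes each collection has been reduced to a list of tweet ID strings (for example the id_str column of a TAGS sheet); the function name and the sample IDs are hypothetical.

```python
def compare_collections(ids_a, ids_b):
    """Compare two collections of tweet IDs gathered by different methods."""
    set_a, set_b = set(ids_a), set(ids_b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "only_in_a": sorted(set_a - set_b),
        "only_in_b": sorted(set_b - set_a),
        "shared": len(shared),
        # Jaccard overlap: 1.0 means both methods captured exactly the same tweets
        "overlap": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical sample IDs, for illustration only
script_ids = ["101", "102", "103", "104"]
tags_ids = ["102", "103", "104", "105"]

report = compare_collections(script_ids, tags_ids)
print(report)
```

An overlap close to 1.0 would support treating the two collections as equivalent; the IDs listed as present in only one collection are the ones worth inspecting by hand.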

For 2018 and 2019 I managed to get the settings and timings right to achieve what looks like a complete set of #DH2018 and #DH2019 tweets. Below I share a table where the main metrics can be compared. As indicated in the table, it must be noted that there are important differences, mainly in a) the number of days before and after the conference days included in each archive and b) the number of days each conference was held on according to their respective web pages / programmes (I seem to remember the Mexico City conference had activities at least one day prior to the date indicated on the main web site, but I may be misremembering; I need to check).

The Basic Counts

Needless to say, the most interesting or useful insights from these archives would come from qualitative data, not necessarily from quantitative data such as that presented here. The RT and @ reply stats can give an indication of the level of interaction between accounts, and the number of accounts tweeting with each hashtag each year could be seen as an indication of the interest in the conference or hashtag (this indication may be misleading due to spamming or confusion caused by hashtag overlap, and of course one would need to know which accounts are and are not included in each archive).

There is a series of analyses that can be run with the full data collected, and I hope that now that I have a more solid longitudinal dataset of yearly archives I may be able to do that with more robustness soon. In the meantime, for what they are worth, here are the main archive stats compared for last year and this year.

 

| Metric | #DH2018 | #DH2019 | Notes |
| --- | --- | --- | --- |
| First conference day according to programme | 26/06/2018 | 08/07/2019 | |
| Last conference day according to programme | 29/06/2018 | 12/07/2019 | |
| First Tweet Collected in Archive | 24/06/2018 06:19 | 29/06/2019 02:13 | Local conference time zone |
| Last Tweet Collected in Archive | 30/06/2018 06:17 | 14/07/2019 22:56 | Local conference time zone |
| Days collected | 6 days | 16 days | |
| Number of collected tweets (includes RTs) | 13858 | 14101 | Data might require refining and deduplication |
| In Reply Ids | 564 | 1091 | |
| In Reply @s | 747 | 812 | |
| Number of links | 4312 | 9061 | |
| Number of RTs | 8656 | 8650 | Estimate on occurrence of RTs |
| Number of unique accounts | 2329 | 2157 | |
| Conference location | Mexico City, Mexico | Utrecht, the Netherlands | |
Priego, E. (2019): #DH2018 and #DH2019 Twitter Archive Counts. Summary Comparative Data Table. figshare. Dataset. https://doi.org/10.6084/m9.figshare.8918810
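Counts like those in the table can be derived from an exported archive in a few lines. The following is only a sketch under stated assumptions, not the procedure behind the published figures: it assumes each tweet is a dictionary with from_user and text fields (column names as in a TAGS sheet), estimates RTs by the "RT @" prefix, and uses made-up sample rows.

```python
import re


def basic_counts(rows):
    """Return simple archive counts from a list of tweet dictionaries."""
    texts = [row["text"] for row in rows]
    return {
        "tweets": len(rows),
        # Estimate only: counts tweets whose text starts with the "RT @" convention
        "retweets": sum(text.startswith("RT @") for text in texts),
        # Counts every http(s) link occurrence, including those repeated in RTs
        "links": sum(len(re.findall(r"https?://\S+", text)) for text in texts),
        # Usernames are compared case-insensitively
        "unique_accounts": len({row["from_user"].lower() for row in rows}),
    }


# Hypothetical sample rows, for illustration only
rows = [
    {"from_user": "alice", "text": "Slides from my #DH2019 talk https://example.org/slides"},
    {"from_user": "bob", "text": "RT @alice: Slides from my #DH2019 talk https://example.org/slides"},
    {"from_user": "Alice", "text": "@bob thanks for sharing! #DH2019"},
]

counts = basic_counts(rows)
print(counts)
```

Because the retweet figure relies on a text convention, it will miss quote tweets and manually edited RTs, which is why it is best read as an estimate.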

 

Insights

Even though I collected #DH2019 over a longer period (ten days more than the #DH2018 archive), fewer unique user accounts used #DH2019 than #DH2018. The #DH2019 archive did show more replies, mentions and links than the #DH2018 one, which is to be expected given that it included more collection days and therefore more opportunity for interactions. The number of tweets and RTs in both archives (again, bearing in mind the differences in collection days) remained very close. It could be argued that the Twitter activity indicates neither an increase nor a reduction in engagement with the conference hashtag (as manifested through tweets or RTs), while showing that fewer accounts participated this year. What follows is refining and deduplicating the data if required, limiting the archives to the same data collection timings, revising these initial insights, and then performing qualitative text and account analysis in order to determine, amongst other things, whether any differences in the unique accounts using the hashtag were relevant to the field or simply reflect spam bots or other unrelated accounts. That qualitative refining could give us greater certainty about any changes in the demographic engaging with the conference hashtags over the years. This needs to be done carefully and following ethical standards.
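To put the raw counts from the table on a slightly more comparable footing, one can look at the percentage change between years and at a crude per-day rate. This is only a sketch: the helper names are hypothetical, and dividing by collected days assumes activity is spread evenly, which it is not, since most tweets cluster on the conference days themselves.

```python
# Counts taken from the comparative table in this post
archives = {
    "#DH2018": {"days": 6, "tweets": 13858, "retweets": 8656, "links": 4312, "accounts": 2329},
    "#DH2019": {"days": 16, "tweets": 14101, "retweets": 8650, "links": 9061, "accounts": 2157},
}


def per_day(archive, metric):
    """Average count per collected day (a crude normalisation)."""
    return archive[metric] / archive["days"]


def percent_change(old, new):
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100


dh2018, dh2019 = archives["#DH2018"], archives["#DH2019"]

for metric in ("tweets", "retweets", "links", "accounts"):
    change = percent_change(dh2018[metric], dh2019[metric])
    print(f"{metric}: {change:+.1f}% from #DH2018 to #DH2019")

# Unique accounts are not additive across days, so only event-like counts are averaged
for metric in ("tweets", "retweets", "links"):
    print(f"{metric} per collected day: "
          f"{per_day(dh2018, metric):.0f} (#DH2018) vs {per_day(dh2019, metric):.0f} (#DH2019)")
```

A fairer comparison would first trim both archives to equivalent windows around the conference days, as described above, and only then recompute the counts.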

A Polite Request

If you are interested in this same topic and you read this, please do not disregard this output only because it has not been published in a peer-reviewed journal. If you get any type of inspiration, value or motivation from this post, my tweets about it or any other blog posts about Twitter archiving, please do cite these outputs: not only is it good academic practice, it is also a way for us to know about other responses to the same issues and to continue building knowledge together.

References

Priego, E. (2018): Archiving Small Twitter Datasets for Text Analysis: A Workshop Tutorial for Beginners. figshare. https://doi.org/10.6084/m9.figshare.6686798
Priego, E. (2019): #DH2018 and #DH2019 Twitter Archive Counts. Summary Comparative Data Table. figshare. Dataset. https://doi.org/10.6084/m9.figshare.8918810

 

Finally, My Incomplete #dh2013 Twitter Archive (Conference Days Only; Times in GMT and BST)


Having just shared an (albeit incomplete) #dh2014 archive, I could no longer stand having lots of #dh2013 spreadsheets on my hard drive without attempting to share a more or less ordered set once and for all.

Taking advantage of the post-#dh2014 inspired drive, I have finally shared an XLS file containing a dataset of Tweets tagged with #dh2013 (case-insensitive).

Priego, Ernesto (2014): An Incomplete #dh2013 Twitter Archive (Conference Days Only; Times in GMT and BST). figshare.
http://dx.doi.org/10.6084/m9.figshare.1103247

The Digital Humanities 2013 conference took place at the University of Nebraska–Lincoln, USA, 16-19 July 2013.

The archive I have shared contains approximately 6,661 Tweets published publicly and tagged with #dh2013 between Mon Jul 15 07:12:10 +0000 and Sat Jul 20 23:20:04 +0000.

Please note the data in the set is incomplete. If you have the missing Tweets, please let me know so that we can complete the set.

Hopefully, albeit incomplete, the available data might be used for some comparative analysis between 2013 and 2014 and ideally coming years.
The Tweets contained in the file were originally collected in July 2013 using Martin Hawksey’s TAGS 5.1.

Due to the volume of Tweets, several Google Spreadsheets were created before and during the event, which were subsequently refined into individual sheets. The chronology was then reconstructed manually, as far as possible.

With thanks to Lisa Rhody who contributed some Tweets I had failed to collect. The file contains 7 sheets:

Sheet 0. A ‘Cite Me’ sheet, including the provenance of this file, citation information, information about its contents, the methods employed and some context.

Sheet 1. Monday 15 July 2013 (371 Tweets; noticeably incomplete)

Sheet 2. Tuesday 16 July 2013 (1,187 Tweets)

Sheet 3. Wednesday 17 July 2013 (2,227 Tweets)

Sheet 4. Thursday 18 July 2013 (2,826 Tweets)

Sheet 5. Friday 19 July 2013 (approx. 1,500 Tweets; various Tweets with line breaks; noticeably incomplete due to high volumes and collection times; set starts from Fri Jul 19 13:41:01 +0000)

Sheet 6. Saturday 20 July 2013 (122 Tweets; incomplete, set starts from Sat Jul 20 17:42:30)

Times are, unfortunately, in GMT (created) and BST (time). They should have been in Nebraska time, though of course not all Tweets were tweeted from the conference location. This needed to be set in the first collection and I failed to do that, so I continued collecting with the same time settings. (Sorry). This means that, because of the time difference, the dates and times in the file do not correspond with conference local time. Nebraska was on CDT during the conference.
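For anyone wanting to realign the timestamps, a conversion along these lines should work. It is a sketch that assumes the created values follow Twitter's usual created_at format including the year (for example "Mon Jul 15 07:12:10 +0000 2013"), and it uses a fixed UTC-5 offset for CDT, which holds for the July conference days but not for the whole year.

```python
from datetime import datetime, timedelta, timezone

# Twitter's created_at layout, e.g. "Mon Jul 15 07:12:10 +0000 2013"
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Central Daylight Time is UTC-5; a fixed offset is enough for July dates.
# For year-round data, a named zone such as "America/Chicago" would be safer.
CDT = timezone(timedelta(hours=-5), "CDT")


def to_conference_time(created_at):
    """Parse a created_at string and express it in conference local time."""
    utc_time = datetime.strptime(created_at, TWITTER_FORMAT)
    return utc_time.astimezone(CDT)


local = to_conference_time("Mon Jul 15 07:12:10 +0000 2013")
print(local.isoformat())
```

Note that Tweets posted in the early hours GMT fall on the previous calendar day in Nebraska, so the per-day sheets would shift slightly after conversion.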

Only users with at least 2 followers were included in the archive. Retweets have been included. Data might require deduplication.

Due to the different methods employed in attempting to catch a high volume of Tweets, unfortunately the metadata in the set is not complete (the lack of ISO language metadata in most of these sheets is particularly disappointing, as it would have provided interesting insights).

Some work was done to ensure the chronology was complete; I have highlighted gaps in the Tweets in yellow in the sheets and noted them in the listing above.

The usual conditions and limitations are included in the file and on the figshare upload cited above.
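Gaps of this kind can also be located programmatically rather than by eye. The sketch below is not how the sheets were checked; it assumes the timestamps have already been parsed into datetime objects, and the 30-minute threshold and sample times are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(minutes=30)):
    """Return (start, end) pairs where consecutive tweets are further apart than threshold."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > threshold:
            gaps.append((earlier, later))
    return gaps


# Hypothetical timestamps, for illustration only
sample = [
    datetime(2013, 7, 19, 13, 41),
    datetime(2013, 7, 19, 13, 45),
    datetime(2013, 7, 19, 15, 10),
    datetime(2013, 7, 19, 15, 12),
]

gaps = find_gaps(sample)
for start, end in gaps:
    print(f"Gap from {start} to {end} ({end - start})")
```

A sensible threshold depends on the time of day: a half-hour silence during a plenary is suspicious, while one in the middle of the night is not.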

 

An Incomplete #dh2014 Twitter Archive (Conference Days Only)

View of UNIL Sorge campus, hosting DH2014. Photo CC-BY Ernesto Priego

The Digital Humanities 2014 conference took place Monday 7 July 2014 – Saturday 12 July 2014 in Lausanne, Switzerland. (I was lucky to attend and present a poster there). Though there were other hashtags used to tweet about the conference, the main hashtag was #dh2014.

I started collecting #dh2014 Tweets on the 7th of September 2013. Having attempted to collect and reconstruct #dh2012 and #dh2013 archives (and having seen the growth in conference live-tweeting in DH since 2009) I knew the volume would exceed expectations. I broke several Google spreadsheets along the way. In the end I resigned myself to trying to reconstruct an archive for the duration of the conference proceedings, 7-12 July 2014.

I have now shared on figshare an .XLS file containing a dataset of Tweets tagged with #dh2014 (case-insensitive).

Priego, Ernesto (2014): An Incomplete #dh2014 Twitter Archive (Conference Days Only).   figshare.

http://dx.doi.org/10.6084/m9.figshare.1102950  

The complete archive contains 16,154 Tweets published publicly and tagged with #dh2014 between Monday 07/07/2014 00:03:00 (CEST) and Saturday 12/07/2014 23:48:00 (CEST).

The tweets contained in this file were collected using Martin Hawksey’s TAGS 5.1. Due to the volume of Tweets nine Google Spreadsheets were created during the period of the event, which were subsequently refined to four. The data was subsequently organised manually into various sheets, which have been included here.

Sheet 0. A ‘Cite Me’ sheet, including the provenance of this file, citation information, information about its contents, the methods employed and some context.

Sheet 1. Monday 7 July 2014 (1,052 Tweets; gap between 07/07/2014 10:19 and 07/07/2014 11:20)

Sheet 2. Tuesday 8 July 2014 (3,605 Tweets)

Sheet 3. Wednesday 9 July 2014 (4,372 Tweets)

Sheet 4. Thursday 10 July 2014 (2,879 Tweets; significant gap between 10/07/2014 01:51 and 10/07/2014 10:10)

Sheet 5. Friday 11 July 2014 (3,843 Tweets)

Sheet 6. Saturday 12 July 2014  (403 Tweets)

Collected under local Lausanne, Switzerland times. Times in GMT also included.

Only users with at least 2 followers were included in the archive. Retweets have been included. Data might require deduplication.

Unfortunately the metadata in the sheets for Monday – Thursday is incomplete (the lack of ISO language metadata in these sheets is particularly disappointing, as it would have provided interesting insights); Friday and Saturday do contain the standard metadata available from TAGS.

Some work was done to ensure the chronology was complete; I have highlighted a gap in the Tweets on Monday 7 July 2014 between 07/07/2014 10:19 and 07/07/2014 11:20 and on Thursday 10 July 2014 between 10/07/2014 01:51 and 10/07/2014 10:10.

I was not able to recover these Tweets. Yannick Rochat and Martin Grandjean’s archive has what seems to be the complete set (available at http://goo.gl/6W3dol; last accessed Tuesday 15 July 2014 11:55 BST).

Please note Rochat and Grandjean’s dataset has 16,903 Tweets, whereas my collection only harvested 16,154 Tweets (749 fewer Tweets).
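The arithmetic behind these totals is easy to verify from the per-sheet counts listed above. This is a simple check rather than part of the original workflow; the variable names are hypothetical.

```python
# Per-sheet Tweet counts as listed for the #dh2014 archive
sheet_counts = {
    "Monday 7 July 2014": 1052,
    "Tuesday 8 July 2014": 3605,
    "Wednesday 9 July 2014": 4372,
    "Thursday 10 July 2014": 2879,
    "Friday 11 July 2014": 3843,
    "Saturday 12 July 2014": 403,
}

my_total = sum(sheet_counts.values())
rochat_grandjean_total = 16903  # total reported for Rochat and Grandjean's dataset

difference = rochat_grandjean_total - my_total
share_missing = difference / rochat_grandjean_total * 100

print(f"My archive: {my_total} Tweets")
print(f"Difference: {difference} Tweets ({share_missing:.1f}% of the larger set)")
```

The sheets add up to the 16,154 Tweets stated, and the 749 missing Tweets amount to roughly 4.4% of the larger dataset.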

Please note that both research and experience show that the Twitter search API isn’t 100% reliable. Large tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón, Sandra, et al. 2012).

The Tweet volume was higher than what the available collecting methods allowed so data is likely to be incomplete. It is not guaranteed this file contains each and every Tweet tagged with #dh2014 during the indicated period, and is shared for comparative and indicative educational and research purposes only.

Please note the data in this file is likely to require further refining and even deduplication. The data is shared as is. This dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.
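As a starting point for that deduplication, rows can be filtered on the tweet ID. The sketch below assumes each row is a dictionary with an id_str field (the column name used in TAGS sheets); the function name and sample rows are hypothetical.

```python
def deduplicate(rows, key="id_str"):
    """Keep the first occurrence of each tweet ID, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


# Hypothetical sample rows, for illustration only
rows = [
    {"id_str": "1", "from_user": "alice", "text": "Hello #dh2014"},
    {"id_str": "2", "from_user": "bob", "text": "RT @alice: Hello #dh2014"},
    {"id_str": "1", "from_user": "alice", "text": "Hello #dh2014"},
]

clean = deduplicate(rows)
print(f"{len(rows)} rows in, {len(clean)} rows out")
```

Retweets have their own IDs, so they survive this step; only rows that were collected more than once (for example across overlapping spreadsheets) are dropped.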

Online Attention to Digital Humanities Publications (#DH2014 poster)

 Priego, Ernesto; Havemann, Leo; Atenas, Javiera (2014): Online Attention to Digital Humanities Publications (#DH2014 poster). figshare. http://dx.doi.org/10.6084/m9.figshare.1094345

“Online Attention to Digital Humanities Publications” is the final title of the poster I created and worked on with Leo Havemann (Birkbeck College, University of London) and Javiera Atenas (University College London).

This poster is being presented at the Digital Humanities 2014 conference, Lausanne, Switzerland, on the 10th of July 2014, 2:00pm – 3:30pm. I am blogging this after having set it up on the exhibition board. The paper was heavy and it wouldn’t stay up!

Photo of the Online Attention to Digital Humanities Publications (#DH2014) poster as set up in the exhibition space
DH2014, Lausanne, Switzerland, 11.58.40 am.

As you can see, the poster is already heavy with text (its real size is A0, 118.89 x 84.11 cm, so it is legible), but we were unable to include all the findings we obtained and had ambitiously promised in the abstract. We chose to focus on mapping the principal authors of the outputs in the dataset and on looking at the role of open access, licensing, etc. We are working on a long paper where we visualise and discuss the international distribution of the Tweets mentioning the top articles in the dataset and engage with the relationships between online mentions, access type and citations.

The poster is available for download and citation as:

Priego, Ernesto; Havemann, Leo; Atenas, Javiera (2014): Online Attention to Digital Humanities Publications (#DH2014 poster). figshare.
http://dx.doi.org/10.6084/m9.figshare.1094345

We suggest altmetrics services like the Altmetric Explorer can be an efficient method to obtain bibliographic datasets and track scholarly outputs being mentioned online in the sources curated by these services.  Our dataset reflects that outputs with “digital humanities” in their metadata were not published in fully-fledged Open Access journals. The role of SSRN and arXiv as open repositories was found to be relatively significant, but the licensing of the outputs available through them was not always immediately clearly displayed.

Our working definition of “Open Access” requires outputs to be open for human and machine access through  CC-BY or at least CC-BY-SA. The absence of clear licensing information at output level is perceived to be problematic, as is the lack of any outputs clearly and visibly licensed with CC-BY.

The fact that the 3 most-mentioned outputs in the dataset were available without a paywall might point towards the potential of Open Access for greater public impact. ‘Free access’ outputs in paywalled journals did not show more mentions or citations than their paywalled or non-paywalled counterparts.

Though the dataset reflects a predictable dominance of authors based in the USA, the dataset points towards a growing presence of international digital humanities researchers.

Source data:

The information on the poster derives from a dataset based on an original report obtained with the Altmetric Explorer on April 23 2014. More recent reports are likely to vary.

Priego, Ernesto; Havemann, Leo; Atenas, Javiera (2014): Source Dataset for Online Attention to Digital Humanities Publications (#DH2014 poster). figshare.
http://dx.doi.org/10.6084/m9.figshare.1094359

The original Altmetric Explorer data export was refined, modified and edited by Ernesto Priego, Leo Havemann and Javiera Atenas.

If you use, share or refer to this poster and data please use the citation information above.

This poster and its source data are shared under a CC-BY license.

At DH2014: Online Attention to Digital Humanities Publications; Attendance Survey

I am very excited I will be presenting a poster I co-authored with my colleagues Leo Havemann (Birkbeck College) and Javiera Atenas (University College London) at the Digital Humanities 2014 conference in Lausanne, Switzerland, this Thursday 10 July 2014.

The original abstract we submitted is here.

Our poster’s actual size is A0. It will be shared openly online as well on Thursday itself.

My colleague Élika Ortega and I have also been conducting a short, quick survey on attendance at the conference. Whether you attended or not, we need your help!