#DH2018 and #DH2019 Twitter Archive Counts: A Comparison


My interest in documenting scholarly activity on Twitter through conference hashtags is not new; for the digital humanities I have been looking into it since 2010. Searching this blog or googling related keywords should turn up some background for those interested. I have been archiving conference hashtags for a while now, often depositing the archives as part of the scholarly record, and blogging and giving workshops about my objectives and methodologies.

I like sharing results in real time while conferences are taking place or shortly afterwards. Any results shared are therefore always-already provisional, perfectible and unfinished. I have always believed that a signal is better than no signal, or than having to wait three years for one, so I insist on sharing any quick insights I can get rather than not sharing them at all or waiting until I miraculously find the time to do it differently (which I am not likely to, so I’d rather take any opportunity I have to share something). Hopefully someone finds it helpful in some way.

Once again, I have also been critical of the metrication of scholarly activity, so the fact that I share quantitative data from the collected archives does not mean I think this metrication is always-already something to aspire to, or that it means anything in particular. I see it as an ethnographic means to document the existence of scholarly activity on Twitter around academic conferences in specific fields, perhaps as an entry point to assess academic and public engagement on Twitter with academic hashtags and the events they represent, and/or to detect any increase, decrease or transformation in this type of activity. For example, it is possible to gain insights into Twitter users' settings preferences, such as the language users have set up, as I looked into in this post on user_lang in #DH2018 tweets.
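To make the user_lang example concrete, here is a minimal Python sketch, not the code used for that post. It assumes the archive has been exported as a list of dicts with TAGS-style `from_user` and `user_lang` columns; the function name `count_user_langs` and the sample rows are hypothetical. It counts each account once, which is a design choice: counting per tweet would instead give more weight to prolific accounts.

```python
from collections import Counter


def count_user_langs(rows):
    """Tally user_lang values, counting each account (from_user) once.

    Assumes TAGS-style dict rows; an account's first row sets its language.
    Missing or empty values are grouped under "unknown".
    """
    lang_by_user = {}
    for row in rows:
        user = row["from_user"]
        if user not in lang_by_user:
            lang_by_user[user] = row.get("user_lang") or "unknown"
    return Counter(lang_by_user.values())


# Hypothetical sample rows, for illustration only.
sample = [
    {"from_user": "a", "user_lang": "en"},
    {"from_user": "a", "user_lang": "en"},
    {"from_user": "b", "user_lang": "es"},
    {"from_user": "c", "user_lang": "en"},
    {"from_user": "d", "user_lang": ""},
]
print(count_user_langs(sample).most_common())  # [('en', 2), ('es', 1), ('unknown', 1)]
```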

The Methods

The metrics compared here are the result of a double method of collection, used as a means to check the validity of the collected data. I used a Python script to collect both archives, setting its parameters to match those of the archives I collected using TAGS (see Priego 2018). Even if the collected data still needs to be refined, when the counts are the same or very similar I gain a degree of certainty that the data collected via TAGS from the Twitter Search API is close to being as reliable as it could be.
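One way to go beyond matching counts is to compare the two collections tweet by tweet. The sketch below is a hypothetical illustration, not the script used for these archives: it assumes each collection has been reduced to a list of tweet IDs, kept as strings because tweet IDs are too large to be represented exactly as floating-point numbers, and it reports how many IDs each method found that the other did not.

```python
def compare_collections(ids_a, ids_b):
    """Compare two collections of tweet IDs, e.g. a script's and a TAGS archive's."""
    a, b = set(ids_a), set(ids_b)
    shared = a & b
    union = a | b
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "shared": len(shared),
        # Jaccard index: shared IDs as a share of all distinct IDs found.
        "agreement": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical IDs, kept as strings to avoid precision loss.
script_ids = ["1001", "1002", "1003", "1004"]
tags_ids = ["1002", "1003", "1004", "1005"]
print(compare_collections(script_ids, tags_ids))
# {'only_a': 1, 'only_b': 1, 'shared': 3, 'agreement': 0.6}
```

Matching totals can hide offsetting differences (one method missing tweets the other caught), which an ID-level comparison like this would surface.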

For 2018 and 2019 I managed to get the settings and timings right to achieve what looks like a complete set of #DH2018 and #DH2019 tweets. Below I share a table in which the main metrics can be compared. As indicated in the table, there are important differences, mainly in a) the number of days before and after the conference included in each archive, and b) the number of days each conference ran according to its respective web page/programme. (I seem to remember the Mexico City conference had activities at least one day before the date indicated on the main website, but I may be misremembering; I need to check.)
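The "Days collected" figures depend on how a collection window is measured. As a hedged illustration, the sketch below measures the elapsed time between the first and last collected tweets, using the timestamps given in the summary table (both already in local conference time, so no time zone conversion is applied). Rounded to whole days this gives 6 and 16, matching the table, although it is an assumption, not a confirmed fact, that this is how those figures were derived. The helper name `collection_days` is hypothetical.

```python
from datetime import datetime


def collection_days(first, last, fmt="%d/%m/%Y %H:%M"):
    """Elapsed time between the first and last collected tweet, in days."""
    delta = datetime.strptime(last, fmt) - datetime.strptime(first, fmt)
    return delta.total_seconds() / 86400


dh2018 = collection_days("24/06/2018 06:19", "30/06/2018 06:17")
dh2019 = collection_days("29/06/2019 02:13", "14/07/2019 22:56")
print(round(dh2018, 2), round(dh2019, 2))  # 6.0 15.86
```

Counting calendar dates instead would give 7 days for #DH2018 (24 to 30 June inclusive), so the choice of measure matters when comparing archives.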

The Basic Counts

Needless to say, the most interesting or useful insights from these archives would come from qualitative rather than quantitative data such as that presented here. The RT and @ reply stats can give an indication of the level of interaction between accounts, and the number of accounts tweeting with each hashtag each year could be seen as an indication of interest in the conference or hashtag (this indication may be misleading because of spam or confusion caused by hashtag overlap, and of course one would need to know which accounts are and are not included in each archive).

There is a series of analyses that can be run on the full collected data, and now that I have a more solid longitudinal dataset of yearly archives I hope to be able to do that more robustly soon. In the meantime, for what they are worth, here are the main archive stats for last year and this year, compared.


| Metric                                      | #DH2018             | #DH2019                  | Notes                                          |
|---------------------------------------------|---------------------|--------------------------|------------------------------------------------|
| First conference day according to programme | 26/06/2018          | 08/07/2019               |                                                |
| Last conference day according to programme  | 29/06/2018          | 12/07/2019               |                                                |
| First tweet collected in archive            | 24/06/2018 06:19    | 29/06/2019 02:13         | Local conference time zone                     |
| Last tweet collected in archive             | 30/06/2018 06:17    | 14/07/2019 22:56         | Local conference time zone                     |
| Days collected                              | 6 days              | 16 days                  |                                                |
| Number of collected tweets (includes RTs)   | 13858               | 14101                    | Data might require refining and deduplication  |
| In-reply IDs                                | 564                 | 1091                     |                                                |
| In-reply @s                                 | 747                 | 812                      |                                                |
| Number of links                             | 4312                | 9061                     |                                                |
| Number of RTs                               | 8656                | 8650                     | Estimate based on occurrence of RTs            |
| Number of unique accounts                   | 2329                | 2157                     |                                                |
| Conference location                         | Mexico City, Mexico | Utrecht, the Netherlands |                                                |
Source: Priego, E. (2019). #DH2018 and #DH2019 Twitter Archive Counts. Summary Comparative Data Table. figshare. Dataset. https://doi.org/10.6084/m9.figshare.8918810
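The counts in the table come from the archive summaries. As a rough, hypothetical sketch of how comparable figures could be recomputed from the raw rows, the function below assumes TAGS-style column names (`text`, `from_user`, `in_reply_to_status_id_str`, `in_reply_to_screen_name`). It estimates RTs from the conventional `RT @` prefix and counts links with a simple URL regex, so its results will not necessarily match the TAGS summary formulas exactly; `basic_counts` and the sample rows are illustrative only.

```python
import re

# Simple URL pattern; a rough proxy for the links a tweet contains.
URL_RE = re.compile(r"https?://\S+")


def basic_counts(rows):
    """Approximate archive summary counts from TAGS-style dict rows."""
    return {
        "tweets": len(rows),
        # RTs estimated from the conventional "RT @" prefix.
        "rts": sum(1 for r in rows if r["text"].startswith("RT @")),
        "in_reply_ids": sum(1 for r in rows if r.get("in_reply_to_status_id_str")),
        "in_reply_ats": sum(1 for r in rows if r.get("in_reply_to_screen_name")),
        "links": sum(len(URL_RE.findall(r["text"])) for r in rows),
        "unique_accounts": len({r["from_user"] for r in rows}),
    }


# Hypothetical sample rows, for illustration only.
rows = [
    {"from_user": "ana", "text": "Great keynote #DH2019 https://t.co/abc"},
    {"from_user": "ben", "text": "RT @ana: Great keynote #DH2019 https://t.co/abc"},
    {"from_user": "ana", "text": "@ben thanks! #DH2019",
     "in_reply_to_status_id_str": "1003", "in_reply_to_screen_name": "ben"},
]
print(basic_counts(rows))
```

Note that a retweeted link is counted again in the retweet, which is one reason link counts can grow much faster than original content.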



Even though I collected #DH2019 over a longer period (ten days more than the #DH2018 archive), fewer unique accounts used #DH2019 than #DH2018. The #DH2019 archive did show more replies, mentions and links than the #DH2018 one, although its longer collection window also gave more opportunity for interaction. The number of tweets and RTs in both archives (again, bearing in mind the difference in collection days) remained very close. It could be argued that the Twitter activity indicates neither an increase nor a reduction in engagement with the conference hashtag (as manifested through tweets or RTs), while showing that fewer accounts participated this year. The next steps are to refine and deduplicate the data where required, limit both archives to the same collection timings, revise these initial insights, and then perform qualitative text and account analysis to determine, amongst other things, whether any differences in the unique accounts using the hashtag were relevant to the field or were simply bots or other unrelated accounts such as spammers. That qualitative refining could give us greater certainty about any changes in the demographic engaging with the conference hashtags over the years. This needs to be done carefully and following ethical standards.
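Limiting both archives to the same collection timings could look something like the following sketch. It is an assumption-laden illustration rather than a settled method: rows are deduplicated by `id_str`, `created_at` is assumed to be a `datetime` already converted to local conference time, and the window of two days before and one day after the programme dates is a hypothetical choice, as is the function name `refine`.

```python
from datetime import date, datetime, timedelta


def refine(rows, first_day, last_day, days_before=2, days_after=1):
    """Drop duplicate tweet IDs and keep tweets inside a common window.

    The window runs from midnight `days_before` days before first_day to the
    end of the day `days_after` days after last_day. Offsets are hypothetical.
    """
    start = datetime.combine(first_day - timedelta(days=days_before), datetime.min.time())
    end = datetime.combine(last_day + timedelta(days=days_after + 1), datetime.min.time())
    seen, kept = set(), []
    for row in rows:
        if row["id_str"] in seen:
            continue  # duplicate of a tweet already processed
        seen.add(row["id_str"])
        if start <= row["created_at"] < end:
            kept.append(row)
    return kept


# Hypothetical rows; #DH2019 programme dates were 8-12 July 2019.
rows = [
    {"id_str": "1", "created_at": datetime(2019, 7, 5, 10, 0)},   # too early
    {"id_str": "2", "created_at": datetime(2019, 7, 8, 9, 30)},
    {"id_str": "2", "created_at": datetime(2019, 7, 8, 9, 30)},   # duplicate
    {"id_str": "3", "created_at": datetime(2019, 7, 13, 23, 59)},
    {"id_str": "4", "created_at": datetime(2019, 7, 14, 0, 1)},   # too late
]
kept = refine(rows, date(2019, 7, 8), date(2019, 7, 12))
print([r["id_str"] for r in kept])  # ['2', '3']
```

Applying the same offsets to both archives, relative to each conference's own programme dates, is what would make the resulting counts comparable year on year.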

A Polite Request

If you are interested in this topic and you read this, please do not disregard this output only because it has not been published in a peer-reviewed journal. If you get any inspiration, value or motivation from this post, my tweets about it or any other blog posts about Twitter archiving, please do cite these outputs: not only is it good academic practice, but it is also a way for us to know about other responses to the same issues and to continue building knowledge together.


Priego, E. (2018). Archiving Small Twitter Datasets for Text Analysis: A Workshop Tutorial for Beginners. figshare. https://doi.org/10.6084/m9.figshare.6686798
Priego, E. (2019). #DH2018 and #DH2019 Twitter Archive Counts. Summary Comparative Data Table. figshare. Dataset. https://doi.org/10.6084/m9.figshare.8918810


“Access/Accès”: #DH2017, Montreal, 8-11 August 2017 Tweetage Volume Charts

[Screenshot, 8 August 2017]

#DH2017 starts today in Montreal.  The theme is “Access/Accès”. Details in the hyperlink. I wish I were there!

I am sure the tweetage will exceed the limits of my poor Google spreadsheet, but as it’s become kind of customary I am attempting to collect as many tweets with the conference hashtag as possible.

Using Martin Hawksey’s TAGS, here’s what the archive looks like as of 6:35:05 AM Montreal time of the first official day (8 August 2017):

Archive for #DH2017, Top Tweeters and 3 day activity, 6:35:05 of day one Montreal time

As of 9 August 2017, 6:11:33 AM Montreal time

[Screenshot: #DH2017 archive summary, 9 August 2017]

As of 10 August 2017, 6:07:45 AM Montreal time

[Screenshot: #DH2017 archive summary, 10 August 2017]

As of 11 August 2017, 7:12:46 AM Montreal time

[Screenshot: #DH2017 archive summary, 11 August 2017]

As of 12 August 2017, 03:11:57 AM Montreal time. (I would have liked to take this screenshot later, but I would not be online at that time. Considering the conference had finished by then, it will do.)

[Screenshot: #DH2017 archive summary, 12 August 2017]

As of 13 August 2017, 05:50:54 AM Montreal time

[Screenshot: #DH2017 archive summary, 13 August 2017]

Do note that on 9 August the hashtag went nuclear and was spammed, particularly with annoying ‘trending topics’ tweets, so the data could do with some refining. At a quick glance, however, the spamming does not look serious. Once I have closed the collection and have more time, I could take a closer look and give an indication of its extent. In any case, please note as always that the counts I am presenting are merely indicative: the numbers are not meant to be taken at face value, and no inherent quality or value judgements should be inferred from the volumes reported.
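Giving an indication of the extent of the spamming could start with a crude text filter. In this sketch the `trending` pattern is a hypothetical stand-in (the exact wording of the spam tweets isn't recorded here), and `spam_share` is an illustrative name. A regular expression like this will produce both false positives and false negatives, so any share it reports should be checked by reading the flagged tweets.

```python
import re

# Hypothetical pattern standing in for 'trending topics' spam wording.
SPAM_RE = re.compile(r"trending", re.IGNORECASE)


def spam_share(rows, pattern=SPAM_RE):
    """Return (flagged_count, share) for tweets whose text matches pattern."""
    if not rows:
        return 0, 0.0
    flagged = sum(1 for r in rows if pattern.search(r["text"]))
    return flagged, flagged / len(rows)


# Hypothetical sample tweets, for illustration only.
rows = [
    {"text": "Slides from our #DH2017 panel"},
    {"text": "#DH2017 is Trending in Canada right now!"},
    {"text": "RT @someone: great session #DH2017"},
    {"text": "Access/Accès #DH2017"},
]
print(spam_share(rows))  # (1, 0.25)
```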

As I often state, the data presented is the result of the collection methods employed; different methods are likely to produce different results.

Note that this time only tweets from users with at least 10 followers are being collected. For the purpose of the archive, retweets count as tweets (this means not every tweet contains ‘original’ content).
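The 10-follower threshold was applied at collection time. For an archive collected without it, a roughly equivalent filter could be applied afterwards, as in this hypothetical sketch. It assumes a TAGS-style `user_followers_count` column, which presumably reflects each account's follower count when the tweet was collected rather than today. Retweets are not treated specially, consistent with counting them as tweets.

```python
def apply_follower_threshold(rows, min_followers=10):
    """Keep tweets (retweets included) from accounts with >= min_followers.

    Assumes a TAGS-style user_followers_count column, which may be a string
    or empty in an exported sheet; empty values are treated as 0.
    """
    return [r for r in rows if int(r.get("user_followers_count") or 0) >= min_followers]


# Hypothetical sample rows, for illustration only.
rows = [
    {"from_user": "new_account", "user_followers_count": "9"},
    {"from_user": "scholar", "user_followers_count": "10"},
    {"from_user": "lab", "user_followers_count": "250"},
    {"from_user": "unknown", "user_followers_count": ""},
]
print([r["from_user"] for r in apply_follower_threshold(rows)])  # ['scholar', 'lab']
```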

It has been assumed that scholars or scholarly organisations tweeting publicly, from public accounts, at very high volumes from an international conference expect to be noticed by the international community for their tweetage with the hashtag, and are therefore giving implicit consent to be noted by that community for scholarly purposes. If anyone objects to their username appearing in one of the ‘Top Tweeters’ bar charts above, please let me know and I can anonymise their username retrospectively if that helps.

This is the first year I have managed to archive a more or less complete set. It helps that TAGS has improved, that I was able to collect and monitor the collection in real time, and that I set a minimum of 10 followers for accounts to be collected. It also helped that I did not start collecting too far in advance, as I sometimes have done.

Next week I will be depositing a dataset of tweet IDs and timestamps, which is the source data for the charts embedded here.
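A dataset limited to tweet IDs and timestamps can be produced from a fuller archive by writing out just those two columns. This is a hypothetical sketch using Python's standard `csv` module; the field names follow TAGS-style `id_str` and `created_at` columns, and the sample row and the function name `write_id_timestamp_csv` are invented for illustration. Keeping `id_str` as a string preserves the full ID.

```python
import csv
import io


def write_id_timestamp_csv(rows, out):
    """Write only the id_str and created_at fields of each row as CSV."""
    writer = csv.writer(out)
    writer.writerow(["id_str", "created_at"])
    for r in rows:
        writer.writerow([r["id_str"], r["created_at"]])


# Hypothetical row; other fields (text, from_user, ...) are left out of the output.
rows = [{"id_str": "1234567890", "created_at": "Tue Aug 08 10:35:05 +0000 2017",
         "from_user": "someone", "text": "#DH2017"}]
buf = io.StringIO()
write_id_timestamp_csv(rows, buf)
print(buf.getvalue())
```

To write to a file instead of a buffer, open it with `open(path, "w", newline="", encoding="utf-8")`, as the `csv` module documentation recommends.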

Speaking of “Access/Accès”, here’s a recent post I wrote about access and license types in a set of articles from the journal Digital Scholarship in the Humanities. In case you missed it (you probably did), it might be of interest given this year’s theme.