The Lockdown Chronicles 38: The Department of Health and Social Care


Click on the image below to read the comic strip in full size. Sources and references on this post under the comic strip below.

The Department of Health and Social Care tweets.

Source: Department of Health and Social Care. “As of 9am 6 June, there have been 5,438,712 tests, with 218,187 tests on 5 June. 284,868 people have tested positive. As of 5pm on 5 June, of those tested positive for coronavirus, across all settings, 40,465 have sadly died. More info at” Tweet. @DHSCgovuk 2:11 PM BST, June 6, 2020. Twitter Web App. Available from [Accessed 6 June 2020].

This comic strip CC-BY-NC-SA.


References and additional resources

Department of Health and Social Care. “As of 9am 6 June, there have been 5,438,712 tests, with 218,187 tests on 5 June. 284,868 people have tested positive. As of 5pm on 5 June, of those tested positive for coronavirus, across all settings, 40,465 have sadly died. More info at” Tweet. @DHSCgovuk 2:11 PM BST, June 6, 2020. Twitter Web App. Available from [Accessed 6 June 2020].

Coronavirus (COVID-19) in the UK, available at [Accessed 6 June 2020]

Office for National Statistics, “Latest data and analysis on coronavirus (COVID-19) in the UK and its effect on the economy and society.” Available at [Accessed 6 June 2020]

Johns Hopkins University Coronavirus Resource Center, available at [Accessed 6 June 2020]

Sandle, P., and Faulconbridge, G. Reuters. (28 March 2020) World News. “UK coronavirus death toll under 20,000 would be ‘good result’, says health chief”. Available from . [Accessed 6 June 2020]

The Lockdown Chronicles is a series of periodical comic strips made at night (in candlelight!) adapting and reusing openly-licensed or public domain items from online digital collections. Publication and tweetage are scheduled in advance. Historical sources are adapted and updated for the current pandemic; please refer to each strip’s references on each post for further context.  Catch up with the series at

Tweeting in an Age of Overwhelming Information Overload and Increased Workloads


Twitter is no longer niche as it once was. How has my thinking changed in relation to Twitter use by academics? In this post I bullet-point some ideas that can be taken if desired as tips or strategies by those academic colleagues who are new to Twitter. You can scroll down and skim if you want.

[PhD Comics, August 21 2014]
Motivation for this post

I’ve been asked to become a “social media champion” for my school. I think it’s cool there’s an interest in embracing social media more widely, organically and effectively.

The past

Things have changed significantly since Sarah and I started touring the UK in 2011 giving social media workshops for academics with Networked Researcher (RIP), and indeed my own personal and professional views on Twitter have evolved along the way. What we call “social media” is no longer a niche, defined region of the Internet and the Web, but as mainstream as it can possibly get, reaching a relevance and centrality in today’s information and technological sphere that is yet to be surpassed.

I wrote dozens of blog posts for a variety of international platforms (some long extinct) in the distant past (2011-2013) on academic Twitter use, including the following pieces that got published by the Guardian Higher Education Network.

If you click on the links and read the articles, please do take them with a grain of salt and historical perspective, as things have evolved significantly since. I would write them differently today (also, the headlines were the Guardian’s, not mine).

This tour down memory lane has also reminded me of this blog post that I wrote for Altmetric in 2013 on “Strategies to Get your Research Mentioned Online”. It needs rewriting now.

(By the way, remember this LSE Impact Blog November 2013 post by Alan Cann on academic blogging going mainstream?)

Sharing these links here again as context and in case it’s of historical interest.

Those were the days. We were young. We thought everything was possible. (It still is, albeit in a completely different way!).

The present moment

How to think of academic tweeting in an age of overwhelming information overload and increased workloads? How has my thinking changed in relation to Twitter use by academics?

I cannot go in great detail here, but I thought I’d try to bullet-point some ideas that can be taken if desired as tips or strategies by those academic colleagues who are new to Twitter.

  • Twitter needs to be taken seriously. In spite of its ill-repute, it is an influential public platform for the dissemination of information. Precisely what information we disseminate on it is each user’s responsibility.
  • No one uses Twitter in the exact same way. Twitter is always-already experienced differently by each and every user. There are therefore no straightforward rules. Most users learn along the way. An experienced Twitter user is more likely to use Twitter better than an inexperienced Twitter user who has read all the social media policies, terms and conditions and ethical guidelines available. An experienced Twitter user who has read all those documents will be an even better user, but that’s a personal view.
  • The default Twitter web client and the Twitter mobile app are not the right tools for busy people who are expected to author “content”. If you are busy, already doubt that Twitter can deliver quality information, and feel that being asked to tweet is an annoying imposition or a waste of time, there are no worse tools to start with than those.
  • For new users it may look daunting, but I totally recommend using TweetDeck to those academics being asked to manage a work account and/or wishing to be more effective at locating and monitoring relevant accounts and content. TweetDeck is a free web-based application owned by Twitter. There is no mobile version. To use TweetDeck you will need a Twitter account. How to use TweetDeck guidance here.
  • In general, I think tweeting from your mobile phone for work is a bad idea, unless there’s no other choice (for example, you are at a conference without space to place or plug in your laptop).
  • Before you start tweeting for work it helps to have clarity of purpose. Do not think of Twitter as an instant messaging service; think of it as a public publishing platform. What is it you need to communicate? To whom? Why? When? How?
  • Everyone and their dog is on Twitter. (And yet… so many aren’t so far). How will you become visible? Before joining Twitter, make a list of people and organisations you want to be visible to. Think of it as your Twitter contact list or address book.
  • Search for your stakeholders on Twitter via TweetDeck and create a list with a descriptive name. The more specific the list the better. You can have different lists. On TweetDeck, you can get a column per list, where you will only see, if desired, tweets by those accounts you have added to your list. Think of it as an email folder for which you have created rules.
  • You don’t have to have a column for your timeline, where you would see everyone you follow. These days, using your main Twitter timeline as your main way of monitoring Twitter is frankly inefficient, not least because, regardless of your settings, the algorithm will prioritise some content over other content and tweets will not appear in the order they were posted. We need to try to beat the relevance algorithm and curate our own dedicated timelines.
  • If your goal is to use Twitter to communicate the work you or your organisation does, you can schedule tweets in advance on TweetDeck. This means you don’t need to be on Twitter all the time. You don’t have to tweet in real time.
  • If you blog, make sure you add a social media sharing widget so that your posts get tweeted automatically when you publish. Make sure your site’s readers can share your posts on social media easily: customise the sharing widgets so the share text generated includes a mention of your username (e.g. “[Post title] [URL] via @ernestopriego”).
  • Systematically share what you publish or deposit in your open access institutional or data repository. If you don’t share your own work, who will?
  • Twitter is social, so it won’t work well if you only broadcast your own content. Even if your intention is to mainly broadcast what you or your organisation does, having columns of your stakeholders will allow you to check those columns at an appropriate time and see fewer tweets (more manageable) but potentially they will be more relevant because you have more carefully/strictly curated the sources in that timeline in advance.
  • Have a column for your notifications, and acknowledge positive feedback whenever you can. Often there’s no need to reply; ‘liking’ a reply suffices these days as an acknowledgement and it can go a long way. You are busy and others know it because they are busy too, but they still appreciate a nudge of appreciation.
  • No user is an island. Create continents and archipelagos, build bridges.
  • Retweet what you find interesting or useful, support causes or themes you advocate, but avoid amplifying discord or bad vibes (those are, I’m aware, relative).
  • Include the disclaimer “RTs and likes are not endorsements” in your bio, to be safe. Do not RT tweets you wouldn’t have tweeted originally yourself (ask yourself: would I have published this for the world to see? By retweeting it, you are doing just that), including those tweets with links to content you have not checked before. Check and read links before retweeting/tweeting them.

In a way these same strategies have already been in practice for a while. They are not new. If anything, the pressing realities of employment in a digital age mean we need to be more drastically pragmatic and strategic.

I realise there’s way more I have to say about this, but I have surpassed the 1000 word count so I will have to leave it there. Thanks for reading, if you did.

Tweets per user_lang in a #DH2018 archive

I collected an archive of #DH2018 tweets from accounts with at least 10 followers. The main quant summary is in the table below, which I also tweeted earlier:

Twitter Activity for #DH2018, archive by Ernesto Priego

I wanted to take a quick look at number of tweets per user_lang. “user_lang” filters the language that appears in the user twitter profile. (Please note “user_lang” is different from “lang”, which, when present, indicates a BCP 47 language identifier corresponding to the machine-detected language of the tweeted text).

Filtering the #DH2018 tweets archive by user_lang and then counting the number of tweets per user_lang gives us the following table:

tweet count per user_lang

The archive only collected tweets from accounts with at least 10 followers. The table above can be, just for fun, visualised as a simple bar chart, as a means to quickly show the difference in volume:

user_lang #dh2018 archive bar chart

Please note the archive collects unique tweets including RTs, therefore it can be a unique tweet by a unique user who has been retweeted several times (or none) that contributes to the count for a given user_lang.

In other words, the counts above do not indicate there were x number of users whose Twitter profiles had x language code, but merely the number of tweets in this specific archive organised according to the user_lang code from the tweeter’s Twitter profile.
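A minimal Python sketch of the difference between the two counts described above. It is an illustration only, not the workflow used for this post: it assumes the archive has been exported as rows with `from_user` and `user_lang` columns (the column names are an assumption about the export), and the sample rows and usernames are invented.

```python
from collections import Counter, defaultdict


def tweets_per_user_lang(rows):
    """Count archived tweets (RTs included) per profile language code."""
    return Counter(row["user_lang"] for row in rows)


def users_per_user_lang(rows):
    """Count distinct usernames per profile language code."""
    users = defaultdict(set)
    for row in rows:
        users[row["user_lang"]].add(row["from_user"])
    return {lang: len(names) for lang, names in users.items()}


# Invented sample rows, for illustration only (not data from the archive).
sample_rows = [
    {"from_user": "user_a", "user_lang": "en"},
    {"from_user": "user_a", "user_lang": "en"},
    {"from_user": "user_b", "user_lang": "en"},
    {"from_user": "user_c", "user_lang": "es"},
    {"from_user": "user_c", "user_lang": "es"},
    {"from_user": "user_c", "user_lang": "es"},
]

tweet_counts = tweets_per_user_lang(sample_rows)
user_counts = users_per_user_lang(sample_rows)
print(tweet_counts.most_common())
print(user_counts)
```

In the invented sample both codes have the same number of tweets, but one of them comes from a single account; the table in this post reports only the first kind of count.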

Therefore what this can provide is an indication of the over- or under-representation of tweets from accounts whose Twitter profiles have specific language codes. It’s not that x number of tweets in the archive were in this or that language, nor that x number of tweeters using the hashtag speak this or that language.

What becomes apparent is that an overwhelming majority of accounts with tweets in the archive have ‘en’ as the language code in their Twitter profiles; it is interesting that, in the archive, only one tweet was collected by an account with ‘es-MX’ as the language code in its Twitter profile.

One must also take into account that often ‘en’ is or might be the default user_lang code in Twitter profiles.

I still need to go back to my archives from previous years, but it does look as if, in spite of the usual over-representation of the ‘en’ user_lang code, there is at least some diversity of user_lang in the archived tweets, with ‘es’ in second place.

Once I refine and anonymise the data I will be depositing the source data for this post.

*This blog post was typed quickly, typos and wonky syntax might have remained.


Metricating #respbib18 and #ResponsibleMetrics: A Comparison

I’m sharing summaries of Twitter numerical data from collecting the following bibliometrics event hashtags:

  • #respbib18 (Responsible use of Bibliometrics in Practice, London, 30 January 2018) and
  • #ResponsibleMetrics (The turning tide: A new culture of responsible metrics for research, London, 8 February 2018).


#respbib18 Summary

Event title Responsible use of Bibliometrics in Practice
Date 30-Jan-18
Times 9:00 am – 4:30 pm  GMT
Sheet ID RB
Hashtag #respbib18
Number of links 128
Number of RTs 100
Number of Tweets 360
Unique tweets 343
First Tweet in Archive 23/01/2018 11:44 GMT
Last Tweet in Archive 01/02/2018 16:17 GMT
In Reply Ids 15
In Reply @s 49
Unique usernames 54
Unique users who used tag only once 26 <–for context of engagement

Twitter Activity

#respbib18 twitter activity last three days
CC-BY. Originally published as


#ResponsibleMetrics Summary

Event title The turning tide: A new culture of responsible metrics for research
Date 08-Feb-18
Times 09:30 – 16:00 GMT
Sheet ID RM
Hashtag #ResponsibleMetrics
Number of links 210
Number of RTs 318
Number of Tweets 796
Unique tweets 795
First Tweet in Archive 05/02/2018 09:31 GMT
Last Tweet in Archive 08/02/2018 16:25 GMT
In Reply Ids 43
In Reply @s 76
Unique usernames 163
Unique usernames who used tag only once 109 <–for context of engagement

Twitter Activity

#responsiblemetrics Twitter activity last three days
CC-BY. Originally published as
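For a rough side-by-side reading of the two summaries, the short Python sketch below derives a few ratios from the figures in the tables above. The ratios themselves (share of RTs, share of usernames that used the tag only once, tweets per username) are an illustration added here, not measures reported in the original summaries, and they inherit all the caveats of the source counts.

```python
# Figures copied from the two summary tables above.
summaries = {
    "#respbib18": {
        "tweets": 360, "retweets": 100, "users": 54, "single_use_users": 26,
    },
    "#ResponsibleMetrics": {
        "tweets": 796, "retweets": 318, "users": 163, "single_use_users": 109,
    },
}


def derived_ratios(summary):
    """Return simple indicative ratios for one hashtag summary."""
    return {
        "rt_share": summary["retweets"] / summary["tweets"],
        "single_use_share": summary["single_use_users"] / summary["users"],
        "tweets_per_user": summary["tweets"] / summary["users"],
    }


ratios = {tag: derived_ratios(s) for tag, s in summaries.items()}
for tag, values in ratios.items():
    print(tag, {name: round(value, 2) for name, value in values.items()})
```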

#respbib18: 30 Most Frequent Terms


Term RawFrequency
metrics 141
responsible 89
bibliometrics 32
event 32
data 29
snowball 25
need 24
use 21
policy 18
today 18
looking 17
people 16
rankings 16
research 16
providers 15
forum 14
forward 14
just 14
practice 14
used 14
community 13
different 12
metric 12
point 12
using 12
available 11
know 11
says 11
talks 11
bibliometric 10

#ResponsibleMetrics: 30 Most Frequent Terms

Term RawFrequency
metrics 51
need 36
research 29
indicators 25
panel 16
responsible 15
best 13
different 13
good 13
use 13
index 12
lots 12
people 12
value 12
like 11
practice 11
context 10
linear 10
rankings 10
saying 10
used 10
way 10
bonkers 9
just 9
open 9
today 9
universities 9
coins 8
currency 8
data 8


Twitter data mined with Tweepy. For robustness and quick charts a parallel collection was done with TAGS. Data was checked and deduplicated with OpenRefine. Text analysis performed with Voyant Tools. Text was anonymised through stoplists; two stoplists were applied (one to each dataset), including usernames and Twitter-specific terms (such as RT, HTTPS, etc.), including terms in hashtags. Event title keywords were not included in stoplists.

No sensitive, personal nor personally-identifiable data is contained in this data. Any usernames and names of individuals were removed at data refining stage and again from text analysis results if any remained.

Please note that both datasets span different number of days of activity, as indicated in the summary tables. Source data was refined but duplications might have remained, which would logically affect the resulting term raw frequencies, therefore numbers should be interpreted as indicative only and not as exact measurements.  RTs count as Tweets and raw frequencies reflect the repetition of terms implicit in retweeting.
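As a rough stand-in for the kind of count described above (the analysis itself was done with Voyant Tools, not with this code), here is a minimal Python sketch of raw term frequencies with a stoplist. The regular expressions, the stoplist and the sample tweets are all assumptions made for illustration. Note how a retweet repeats every term of the tweet it copies.

```python
import re
from collections import Counter

# Very rough tokeniser: lowercase letters, digits and apostrophes.
TOKEN = re.compile(r"[a-z0-9']+")
# Strip links, @usernames and #hashtags before counting.
NOISE = re.compile(r"https?://\S+|[@#]\w+")


def term_frequencies(texts, stoplist):
    """Count raw term frequencies across tweet texts, skipping stoplist terms."""
    counts = Counter()
    for text in texts:
        cleaned = NOISE.sub(" ", text.lower())
        counts.update(t for t in TOKEN.findall(cleaned) if t not in stoplist)
    return counts


# Invented sample tweets; the second is a repeat of the first, as an RT would be.
sample_texts = [
    "RT @someone: Responsible metrics need context https://example.org #respbib18",
    "RT @someone: Responsible metrics need context https://example.org #respbib18",
    "Metrics are bonkers",
]
sample_stoplist = {"rt", "are"}

frequencies = term_frequencies(sample_texts, sample_stoplist)
print(frequencies.most_common(5))
```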


As usual I share this hoping others might find it interesting and draw their own conclusions.

A very general insight for me is that we need a wider group engaging with these discussions. At most we are talking about a group of approximately 50 individuals who actively engaged on Twitter with both events.

From the Activity charts it is noticeable that tweeting recedes at breakout times, possibly indicating that most tweeting activity is coming from within the room. When hashtags create wide engagement, activity is more constant and does not exactly reflect the timings of actual real-time activity in the room.

It seems to me that the production, requirement, use and interpretation of metrics for research assessment directly affects everyone in higher education, regardless of their position or role. The topic should not be obscure or limited to bibliometricians and RDM, Research and Enterprise or REF panel people.

Needless to say I do not think everyone ‘engaged’ with these events or topics is or should be actively using the hashtag on Twitter (i.e. we don’t know how many people followed on Twitter). An assumption here is that we cannot detect nor measure anything if there is not a signal: more folks elsewhere might be interested in these events but if they did not use the hashtag they were logically not detected here. That there is no signal measurable with the selected tools does not mean there is not a signal elsewhere, and I’d like this to be a comment on metrics for assessment as well.

In terms of frequent terms it remains apparent (as in other text analyses I have performed on academic Twitter hashtag archives) that frequently tweeted terms remain ‘neutral’ nouns, or adjectives if they are a keyword in the event’s title, subtitle or panel sessions (e.g. ‘responsible’). When a term like ‘snowball’ or ‘bonkers’ appears, it stands out. Due to the lack of more frequent modifiers, it remains hard to distant-read sentiment or critical stances, or even positions. Most frequent terms do come from RTs, not because of consensus in ‘original’ Tweets.

It seems that if we wanted to demonstrate the value added by live-tweeting or using an event’s hashtag remotely, quantifying (metricating?) the active users, tweets over time, days of activity and frequent words would not be the way to go for all events, particularly not for events with relatively low Twitter activity.

As we have seen, automated text analysis is more likely to reveal mostly-neutral keywords, rather than any divergence of opinion on or additions to the official discourse. We would have to look at those words less repeated, and perhaps at replies that did not use the hashtag, but this is not recommended as it would complicate things ethically: though it is generally accepted that RTs do not imply endorsement, less frequent terms in Tweets with the hashtag could single out individuals, and if a hashtag was not included in a Tweet it should be interpreted that the Tweet is not meant to be part of that public discussion/corpus.





Great! News! People! Fake! Donald’s Tweets: 18 January 2017 to 18 January 2018

Trump Simplest Words image. Image via The Telegraph

In two days it will be a year since the inauguration of Twitter user ID 25073877.  Time flies when things are beyond ridiculous, right?

Some of you may remember I’ve published before other posts looking into various aspects of this user’s tweetage. I have already detailed the methodology I have followed (as well as its acknowledged limitations) on some of those previous posts. This has been a work in progress. See for example this, or this, or even this. There’s more if you follow the links.

Anyway, as the anniversary of the inauguration approaches I wanted to share with you, for what it’s worth, some quick numbers from a whole year’s worth of Twitter data.

The dataset I worked with for the purpose of this post is based on a larger Twitter archive I’ve been collecting and studying.

The dataset that I looked into on this occasion is composed of 2,587 tweets posted between 18/01/2017 06:53 AM EST (GMT-5) and 18/01/2018 08:49 AM EST (GMT-5).

As usual I did some basic text analysis, and some quick comparative quant stuff.

20 Most Tweeted Terms

Term Count
great 473
news 190
people 182
fake 166
thank 162
just 160
today 158
president 151
big 145
tax 140
trump 137
america 134
country 128
u.s 125
jobs 116
american 115
time 110
foxandfriends 98
media 98
new 97


Other Twitter Data Numeralia

Twitter Text Counts

Number of ! 1,261
Number of Characters (no spaces, including URLs and usernames) 275,964
Number of Pages (single space, 12pt) 109
Number of Words 50,176

Follower Growth

User followers as of  18/01/2018 08:49 46,815,170
User followers as of 18/01/2017 06:53 20,227,768
Gained followers in the period 26,587,402
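The counts above lend themselves to a few quick per-tweet averages. The Python sketch below is an illustration added here using only the figures already reported in this post (2,587 tweets, plus the text counts and follower numbers above); the averages are indicative only and inherit the limitations of the archive. Note that the number of exclamation marks counts characters, not tweets containing one.

```python
# Figures copied from this post.
tweets = 2_587
exclamation_marks = 1_261
words = 50_176
characters = 275_964
followers_start = 20_227_768  # as of 18/01/2017 06:53
followers_end = 46_815_170    # as of 18/01/2018 08:49

followers_gained = followers_end - followers_start
exclamations_per_tweet = exclamation_marks / tweets
words_per_tweet = words / tweets
characters_per_tweet = characters / tweets
growth_factor = followers_end / followers_start

print(f"Followers gained: {followers_gained:,}")
print(f"Exclamation marks per tweet: {exclamations_per_tweet:.2f}")
print(f"Words per tweet: {words_per_tweet:.1f}")
print(f"Characters per tweet: {characters_per_tweet:.1f}")
print(f"Follower growth factor: {growth_factor:.2f}")
```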

Tweets About the Mexico Border Wall

id_str time (EST)
9.53979E+17 18/01/2018 08:16
9.53264E+17 16/01/2018 08:54
9.51229E+17 10/01/2018 18:07
9.50884E+17 09/01/2018 19:16
9.49066E+17 04/01/2018 18:53
9.46732E+17 29/12/2017 08:16
9.38391E+17 06/12/2017 07:53
9.20425E+17 17/10/2017 19:03
9.18063E+17 11/10/2017 06:36
9.08274E+17 14/09/2017 06:20
9.01803E+17 27/08/2017 09:44
8.97833E+17 16/08/2017 10:51
8.97045E+17 14/08/2017 06:38
8.85279E+17 12/07/2017 19:24
8.78014E+17 22/06/2017 18:15
8.56849E+17 25/04/2017 08:36
8.56485E+17 24/04/2017 08:28
8.56172E+17 23/04/2017 11:44
8.56171E+17 23/04/2017 11:42
8.30406E+17 11/02/2017 08:18
8.24617E+17 26/01/2017 08:55
8.24084E+17 24/01/2017 21:37
8.23147E+17 22/01/2017 07:35

[hydrate tweets using twarc]
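A note on the id_str column above: the values appear in spreadsheet scientific notation, which is not enough to hydrate Tweets, since hydration needs the complete IDs. The Python sketch below shows why IDs are best kept as text; the 18-digit ID in it is invented for illustration and is not one of the Tweets listed.

```python
# Hypothetical 18-digit Tweet ID, invented for illustration.
original_id = "953979000000000123"

# What happens when a spreadsheet treats the ID as a number.
as_float = float(original_id)
displayed = f"{as_float:.5E}"          # how the cell ends up being shown
round_tripped = str(int(as_float))     # what we get back from the number

print("Original ID:   ", original_id)
print("Displayed as:  ", displayed)
print("Round-tripped: ", round_tripped)
print("Still the same?", round_tripped == original_id)
```

A double-precision number cannot hold every 18-digit integer exactly, so the last digits are lost even before the value is shortened for display. Keeping id_str formatted as plain text in the spreadsheet avoids both problems.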

The usual caveats apply. Numbers must be taken with a pinch of salt: the Twitter Search API is not a complete index of all Tweets, but instead an index of recent Tweets. My archive has collected Tweets every hour, which means, for instance, that Tweets that are promptly deleted in between collections do not get archived.

I have attempted refining the dataset, but duplicated Tweets might have stubbornly survived, which in turn logically would have affected the counts. However, in spite of these limitations, the data is indicative and potentially useful and/or interesting as documentation of current and recent historical events. For what it’s worth.

We’ve lived with this user’s tweets daily, and we are very much aware of the kind of discourse developed through the constant, reliably exasperating tweetage. So these basic numbers are most likely not going to tell you anything you weren’t aware of already. A simile occurs to me: we are all aware of the daily, accumulative effects of stress, or, say, ageing, but sometimes it is only when we compare snapshots that we realise the true extent of its effects.

Marked for Deprecation: Push More, Read Less, and the New Twitter

[This post contains embedded Tweets. Some browsers might display them as blank spaces].

This post is composed of 996 words.


Even if you are not on Twitter you will know by now that the character limit for Tweets has gone up from 140 characters to 280 characters. You can read a post about it from Twitter Product Manager Aliza Rosen here.

The post is enlightening. I found this paragraph both funny and sad:

Historically, 9% of Tweets in English hit the character limit. This reflects the challenge of fitting a thought into a Tweet, often resulting in lots of time spent editing and even at times abandoning Tweets before sending. With the expanded character count, this problem was massively reduced – that number dropped to only 1% of Tweets running up against the limit. Since we saw Tweets hit the character limit less often, we believe people spent less time editing their Tweets in the composer. This shows that more space makes it easier for people to fit thoughts in a Tweet, so they could say what they want to say, and send Tweets faster than before (Rosen, 7 November 2017).

The logic is, to me, astounding: for Twitter, expanding the character limit was a way of making tweeting ‘easier’, as they considered that if people could write more it would make writing tweets faster and easier. Why? Because less editing would be involved.

In practical terms I disagree with Rosen and I don’t think this change will make ‘tweeting easier’. Not if by ‘tweeting’ we also understand the experience of reading Tweets. Under the heading “Keeping Twitter’s Brevity” in her post, Rosen writes:

We – and many of you – were concerned that timelines may fill up with 280 character Tweets, and people with the new limit would always use up the whole space. But that didn’t happen. Only 5% of Tweets sent were longer than 140 characters and only 2% were over 190 characters. As a result, your timeline reading experience should not substantially change, you’ll still see about the same amount of Tweets in your timeline. For reference, in the timeline, Tweets with an image or poll usually take up more space than a 190 character Tweet (Rosen, 7 November 2017).

I am willing to believe that at the volume of their Tweet sample during the testing period they only saw 2% of Tweets over 190 characters. However, each user’s timeline will be different, and since not everyone is on Twitter all the time, the experience will also vary depending on the time one is on Twitter. Perhaps it is because it was the first day of general release, but my Timeline, in my perception, was noticeably transformed.

On the Web Client, it really looked like Tumblr. The issue goes beyond what it looks like as it involves as well what is being said: if ‘editing’ is considered too much effort and being able to type more is considered ‘easier’, do we really think the quality of the content (and content is experience too) will improve? What about the time a user is expected to ‘parse’ their timeline? Because wider lengths, more text, more space take more time to scan, to skim, to parse, to read, to engage with.

Interestingly, Twitter has not only extended a Tweet’s length to 280 characters, it has also changed the way it calculates it and displays it to the user. Where the user had a useful character count, now we see a circle visualising progress as we write. It does not give us an absolute count.

Until fairly recently, the length of a Tweet was measured by the number of codepoints in the NFC normalised version of the text. This was interesting to those of us interested in the multilingual Web for many reasons (read this). As Twitter explains in their twitter-text Parser documentation, ‘”max length” is no longer defined, and instead twitter-text uses a weighted scale specified by the Unicode code point ranges.’

This means that what we used to call in everyday parlance ‘the length of a Tweet’, meaning its character count, is now not an absolute measure but a weighting estimated by an algorithm.

Twitter is nearly obsessive-compulsive in the detail they provide on their Display Requirements. I am too busy and I haven’t had the time to look further in their Developer documentation to see if there’s a mention anywhere if any third-party apps could play with

  • weightedLength
  • permillage
  • isValid
  • displayTextRange
  • validDisplayTextRange

in order to display only Tweets with a weighted length of less than 140 characters, as suggested by Janet Gunter yesterday:

My suspicion is that Twitter would not be too happy considering how retentive they are about how their content should be displayed. However, such an app, as suggested by Gunter, would definitely respond to what is many a keen Tweeter’s user experience.
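To give an idea of what the weighted scale and the kind of filter Gunter suggests could look like, here is a simplified Python sketch. It is not the twitter-text library: the weights and code point ranges are assumptions recalled from twitter-text's version 2 configuration and should be checked against Twitter's documentation, and the sketch ignores the special handling of URLs (and, in later versions, emoji). The function names are invented for illustration.

```python
import unicodedata

# Assumed values, recalled from twitter-text's v2 configuration.
MAX_WEIGHTED_LENGTH = 280
SCALE = 100
DEFAULT_WEIGHT = 200
# (start, end, weight) for code point ranges that count as one unit.
RANGES = [
    (0, 4351, 100),
    (8192, 8205, 100),
    (8208, 8223, 100),
    (8242, 8247, 100),
]


def weighted_length(text):
    """Simplified weighted length of a Tweet text (no URL or emoji handling)."""
    total = 0
    for char in unicodedata.normalize("NFC", text):
        code_point = ord(char)
        weight = DEFAULT_WEIGHT
        for start, end, range_weight in RANGES:
            if start <= code_point <= end:
                weight = range_weight
                break
        total += weight
    return total // SCALE


def permillage(text):
    """How much of the allowed length is used, in parts per thousand."""
    return weighted_length(text) * 1000 // MAX_WEIGHTED_LENGTH


def fits_old_limit(text, limit=140):
    """True if the text would fit in the former 140-unit limit."""
    return weighted_length(text) <= limit


timeline = ["a" * 140, "a" * 141, "日本語" * 10]
short_only = [tweet for tweet in timeline if fits_old_limit(tweet)]
print(len(short_only), "of", len(timeline), "Tweets kept")
```

Under these assumed weights a CJK character counts as two units while a Latin character counts as one, which is why a ‘length’ is now a weighting rather than a simple count.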

Ultimately what interests me and frustrates me in equal measure is what this particular development (amongst others!) does tell us about the mutual influence of technology on culture/human behaviour/politics and of technology as politics. The ‘Tweeting made easier’ rationale pushed by Twitter’s product developers indicates their understanding of ‘tweeting’ is that of posting content, i.e. pushing, broadcasting. According to their data, most users will find it easier to type more. What about those reading Tweets? We shouldn’t worry because not that many will tweet beyond 190 characters, they say. It does not make real sense.

What makes sense is how this change fits within a culture of no accountability, where the old-guard of media broadcasting and multinational corporations are the loudest voices (i.e. DJT). My guess is that this will force even more veteran Twitter users to behave completely differently on Twitter, if not leave the service altogether. We will be pushed out as we won’t have the time nor the patience for cluttered timelines full of unnecessary extra detail. Instead of engagement, it is likely to promote more disengagement. Will we still call it microblogging?








[Who cares anyway? Everything is tl;dr now. My voice is one amongst millions- who has the time, the ‘attention’ to read?]

“Access/Accès”: #DH2017, Montreal, 8-11 August 2017 Tweetage Volume Charts

Screen Shot 2017-08-08 at 12.03.36

#DH2017 starts today in Montreal.  The theme is “Access/Accès”. Details in the hyperlink. I wish I were there!

I am sure the tweetage will exceed the limits of my poor Google spreadsheet, but as it’s become kind of customary I am attempting to collect as many tweets with the conference hashtag as possible.

Using Martin Hawksey’s TAGS, here’s what the archive looks like as of 6:35:05 AM Montreal time of the first official day (8 August 2017):

Archive for #DH2017, Top Tweeters and 3 day activity, 6:35:05 of day one Montreal time

As of 9 August 2017, 6:11:33 AM Montreal time

Screen Shot 2017-08-09 at 11.19.25

As of 10 August 2017, 6:07:45 AM Montreal time

Screen Shot 2017-08-10 at 11.13.54

As of 11 August 2017, 7:12:46 AM Montreal time

Screen Shot 2017-08-11 at 12.30.08

As of 12 August 2017, 03:11:57 AM Montreal time. (I would have liked to take this screenshot later but I would not be online at that time. Considering the conference had finished by then it will do).

Screen Shot 2017-08-12 at 08.44.15

As of 13 August 2017, 05:50:54 AM Montreal time

Screen Shot 2017-08-13 at 11.16.34

On 9 August do note the hashtag went nuclear being spammed, particularly with  annoying ‘trending topics’ tweets, so data could do with some refining. However it does not look, at a quick glance, that spamming was serious. With more time further on and once I have closed the collection I could take a closer look and give an indication of the extent of the spamming. In any case please note as always the counts I am presenting are merely indicative, numbers are not meant to be taken at face value and no inherent quality or value judgements should be inferred from the volumes reported.

As I often state, the data presented is the result of the collection methods employed; different methods are likely to present different results.

Note that this time only tweets from users with at least 10 followers are being collected. For the purpose of the archive, retweets count as tweets (this means not every tweet contains ‘original’ content).
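For illustration only, the Python sketch below shows how a ‘Top Tweeters’ count with a follower threshold could be computed from an exported archive. The charts in this post come from TAGS itself, not from this code; the column names `from_user` and `user_followers_count` are assumptions about the export, and the sample rows are invented.

```python
from collections import Counter

MIN_FOLLOWERS = 10  # threshold mentioned in this post


def top_tweeters(rows, n=10, min_followers=MIN_FOLLOWERS):
    """Count tweets (RTs included) per username, above a follower threshold."""
    kept = (
        row for row in rows
        if int(row["user_followers_count"]) >= min_followers
    )
    return Counter(row["from_user"] for row in kept).most_common(n)


# Invented sample rows; retweets are ordinary rows and count as tweets.
sample_rows = [
    {"from_user": "user_a", "user_followers_count": "50", "text": "Opening keynote"},
    {"from_user": "user_a", "user_followers_count": "50", "text": "RT @user_c: Slides"},
    {"from_user": "user_a", "user_followers_count": "50", "text": "Panel two"},
    {"from_user": "user_b", "user_followers_count": "5", "text": "Trending now"},
    {"from_user": "user_b", "user_followers_count": "5", "text": "Trending now"},
    {"from_user": "user_c", "user_followers_count": "10", "text": "Slides"},
]

ranking = top_tweeters(sample_rows, n=2)
print(ranking)
```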

It has been assumed that those scholars or scholarly organisations tweeting publicly from public accounts at very high volumes from an international conference do expect to get noticed by the international community for their tweetage with the hashtag and therefore are giving implicit consent to get noted by said community for scholarly purposes; if anyone objects to their username appearing in one of the ‘Top Tweeters’ bar charts above please let me know and I can anonymise their username retrospectively if that helps.

This is the first year I have managed to archive a more or less complete set. It helps that TAGS has improved, that I was able to collect and monitor the collection in real time, and that I set a limit of a minimum of 10 followers for accounts to be collected. It also helped that I did not start collecting too far back in advance, as I sometimes have done.

I will be depositing a dataset of Tweet ID’s and timestamps, which is the source data for the charts embedded here, next week.

Speaking of “Access/Accès”, here’s a recent post I wrote about access and license types in a set of articles from the Journal of Digital Scholarship in the Humanities. In case you missed it (you probably did), it might be of interest given this year’s theme.



#rfringe17: Top 230 Terms in Tweetage





tl; dr

Repository Fringe is a gathering for repository managers and others interested in research data repositories and publication repositories.

I collected an archive of #rfringe17, containing 1118 Tweet IDs. I then analysed the text in the tweets with Voyant Tools to identify most frequent terms and manually refined the results to 230 terms.

I collected an archive of #rfringe17 tweets using TAGS. The key stats from the archive:

Number of Tweets in Archive 1,118
Number of usernames in Archive 215
First Tweet Collected 26/07/2017 14:58:12
Last Tweet Collected 05/08/2017 08:00:06


Repository Fringe is a gathering for repository managers and others interested in research data repositories and publication repositories. Participation is a key element – the event is designed to encourage all attendees to share their repository experiences and expertise.

2017 marks the 10th Repo Fringe where we will be celebrating progress we have made over the last 10 years to share content beyond borders and debating future trends and challenges.

It took place in Edinburgh,  3 – 4 August 2017.

If you are not new to this blog you will then guess that I could not resist running the text of the tweets collected through Voyant Tools to obtain the term counts in the corpus with their Terms tool. As usual I applied the English stop words filter which I customised to include Twitter-specific terms (such as https,, etc.) and the list of usernames.

I then manually refined the resulting data to remove smileys and any remaining usernames (some might have survived, as it is sometimes hard to tell ordinary terms from usernames). I limited the results to the top 230 terms.

Do take the counts with a pinch of salt as I did not clean the export from TAGS so Tweet duplicates and perhaps even some spam (who knows) might have remained.
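For readers without Voyant Tools to hand, the same idea (count terms after removing stop words, URLs and usernames) can be sketched in a few lines of Python. This is only an approximation under stated assumptions, not Voyant's implementation: the stop list, the tokenising rule and the sample tweets below are all hypothetical, and the counts it produces would not necessarily match the table that follows.

```python
import re
from collections import Counter

# Hypothetical stop list: a handful of English stop words plus
# Twitter-specific tokens. A real run would use a much longer list.
STOP_WORDS = {"the", "a", "of", "to", "and", "at", "is", "rt", "https", "http", "amp"}

def top_terms(tweets, stop_words=STOP_WORDS, limit=230):
    """Return the `limit` most frequent terms across tweets, skipping stop words."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop URLs
        text = re.sub(r"@\w+", " ", text)                   # drop usernames
        tokens = re.findall(r"[a-z0-9]+(?:[’'][a-z]+)?", text)
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(limit)

# Made-up sample tweets, for illustration only.
sample = [
    "RT @someone: Open research data at #rfringe17 https://example.org/x",
    "Great talk on open data and repositories #rfringe17",
]
for term, count in top_terms(sample, limit=5):
    print(term, count)
```

Note that this sketch does nothing about duplicate tweets or spam, so the same caveat about taking counts with a pinch of salt applies.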

Term Count
research 109
open 106
data 104
wikidata 75
oa 72
openscience 66
repository 63
repofringe 56
repositories 53
libraries 51
openresleeds 49
copyright 46
just 43
science 42
good 41
impact 41
thanks 41
day 39
access 38
poster 36
work 35
openaccess 34
talk 34
edinburgh 30
today 30
great 29
ucl 29
sherpa 28
read 27
want 27
event 26
project 26
really 26
time 26
cool 25
fringe 25
policy 24
metadata 23
publishers 23
publishing 23
says 23
colleague 22
policies 22
wikipedia 22
workflow 22
guide 21
millar 21
useful 21
comprehensive 20
content 20
fascinating 20
interesting 20
liveblogs 20
rdm 20
institutional 19
issue 19
it’s 19
liveblog 19
look 19
new 19
think 19
workshop 19
check 18
citizen 18
events 18
group 18
ip 18
management 18
need 18
outputs 18
presentation 18
rescue 18
session 18
trump 18
casrai 17
cycle 17
excellent 17
journal 17
lots 17
promotion 17
query 17
resource 17
uk 17
best 16
future 16
press 16
stuff 16
gallery 15
i’m 15
key 15
ref 15
showing 15
successful 15
support 15
thank 15
working 15
art 14
come 14
core 14
fun 14
miss 14
nice 14
process 14
provide 14
reminding 14
university 14
using 14
way 14
add 13
beautiful 13
demo 13
deposit 13
eprints 13
forward 13
funders 13
importance 13
keynote 13
looking 13
paper 13
phd 13
researchers 13
vote 13
e.g 12
era 12
especially 12
feedback 12
generation 12
got 12
let 12
needed 12
observation 12
recent 12
report 12
review 12
showcase 12
site2cite 12
star 12
theses 12
try 12
we’re 12
weirdness 12
advises 11
attendees 11
boat 11
broken 11
coar 11
control 11
criteria 11
exposure 11
global 11
institutions 11
like 11
model 11
prof 11
scholarly 11
survey 11
trek 11
use 11
years 11
articles 10
award 10
case 10
excited 10
exposing 10
figshare 10
gifts 10
hear 10
highlighted 10
important 10
initiative 10
integrating 10
introducing 10
live 10
opening 10
platform 10
ref2021 10
spend 10
vision 10
week 10
won 10
workshops 10
altmetric 9
colleagues 9
current 9
discussion 9
evidence 9
field 9
getting 9
i’ll 9
infrastructure 9
inspiring 9
library 9
link 9
list 9
local 9
long 9
make 9
meeting 9
peer 9
post 9
practice 9
preservation 9
problem 9
role 9
service 9
shoutout 9
shows 9
slides 9
sure 9
team 9
thought 9
touch 9
tweets 9
works 9
added 8
based 8
believe 8
better 8
change 8
conference 8
contributing 8
days 8
european 8
example 8
far 8
favourite 8
fully 8
here’s 8
image 8
included 8

Logically, sharing this data as an HTML table is not the best way of doing it, but hey. I have the source data if anyone is interested; Twitter developer guidelines allow the sharing of tweet IDs. In this case the source data is composed of the dataset of 1118 tweet ID strings (id_str).

Maybe I missed it but in the list above I could not find ‘bepress’ or ‘elsevier’, by the way…

On UK Labour and Conservatives Tweet Sources


I’ve been tracking the Twitter accounts of the UK Labour, Conservative, Green, and LibDem parties as we approach June the 8th (General Election). I am interested in what they are saying on Twitter through their official Twitter accounts, how they are saying it, how often, and which apps they use to do so.

Unfortunately there are still some duplicates in my Twitter data collection, but at this point I can at least share the sources used to tweet from the UK Labour and Conservatives Twitter accounts. I can also share some indicative numbers for tweets per source, in a sample of 500 Tweets per account from 12/05/2017 to 01/06/2017 so far, bearing in mind they may vary slightly.


UK Labour

Source Count
MediaStudio 279
SproutSocial 106
TweetDeck 55
Twibbon 1
Twitter for Android 3
Twitter for iPhone 8
Twitter Web Client 48


Conservatives

Source Count
MediaStudio 25
TweetDeck 222
Twitter for iPhone 73
Twitter Web Client 180

Even bearing in mind that the sample of 500 tweets from each account may still contain some duplicates, the list of sources alone provides an objective indication of each account’s social media management tool preferences. Something that stands out is that in comparison to, say, the realdonaldtrump account, none of these tweets were posted from Twitter Ads.

The source list indicates to me that UK Labour has attempted a more professional social media management strategy, with a reduced number of tweets from Android, iPhone and the Web Client, whereas the Conservatives have a majority of tweets coming from free & anyone-can-use apps, with no shortage of tweets coming from an iPhone (but no Android at all).
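To put the two tables above on a comparable footing, the counts can be turned into percentage shares. The sketch below simply reuses the numbers reported in this post; it assumes that the first table belongs to UK Labour and the second to the Conservatives, which is what the commentary above suggests (the second table has no Android row).

```python
# Counts copied from the two tables in this post (500 tweets per account).
# Assumption: first table is UK Labour, second is the Conservatives.
labour = {
    "MediaStudio": 279, "SproutSocial": 106, "TweetDeck": 55, "Twibbon": 1,
    "Twitter for Android": 3, "Twitter for iPhone": 8, "Twitter Web Client": 48,
}
conservatives = {
    "MediaStudio": 25, "TweetDeck": 222,
    "Twitter for iPhone": 73, "Twitter Web Client": 180,
}

def shares(counts):
    """Return each source's share of the sample as a percentage (one decimal)."""
    total = sum(counts.values())
    return {source: round(100 * n / total, 1) for source, n in counts.items()}

for name, counts in (("UK Labour", labour), ("Conservatives", conservatives)):
    print(name, sum(counts.values()), shares(counts))
```

As with the raw counts, these shares are merely indicative, since the samples may still contain duplicates.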

This short update is part of an ongoing lunchtime pet project for which I wish I had more time, but hey.  I also have data from the other political parties, but no time right now. Anyway, for what it’s worth, I thought I’d share.

N.B. Dear Guardian Data, in case you like what you see here and you ‘borrow’ the idea or any data… please kindly attribute and link back. It’s only polite to do so. Thank you!

Exeunt Android; Enter Ads: An Update on the Sources of Presidential Tweetage


A quick update as something I consider interesting has emerged from the ongoing archiving of the, er, current ‘Trumpian’ tweetage (see a previous post here). In case you do follow this blog you may be aware I’ve been keeping an eye on the ‘source’ of the Tweets, which is information (a metadata field) pertaining to each published Tweet which is made publicly visible by Twitter to anyone through certain applications like TweetDeck and directly through Twitter’s API (for Twitter’s ‘Field Guide’, see this).

Given the diversity of sources detected on the Tweets from the account under scrutiny in the past, hypotheses have been proposed suggesting correlations between type of content and source (application used to post Tweets); others have suggested that it is also indication of different people behind the account (though as we have said previously it is also possible that the same person tweets from different devices and applications).

Anyway, here are some recent insights emerging from the data since the last post:

  • Since Inauguration Day (20 January 2017), the last Tweet coming from Twitter for Android so far was timestamped 25/03/2017 10:41 (AM; DC time). No Tweets from Android have been posted between then and the time of writing.
  • The last Tweet coming from the Twitter Web Client so far was timestamped 25/01/2017  19:03:33. No more Tweets with the Web Client as source have been posted (or collected by my archive) since then.
  • Since Inauguration Day, the Tweet timestamped 31/03/2017  14:30:38 was the first one to come from Twitter Ads. Since then 21 Tweets have been posted from Twitter Ads, the last one so far timestamped 17/05/2017  16:36:02.
  • During April and May 2017 Tweets have only come from Twitter for iPhone or Twitter Ads. The account in question has tweeted every single day throughout May until today, 18 May 2017: a total of 90 Tweets so far (including a duplicated one in which a typo was corrected). Below is a breakdown per source:


from_user Month Source Count
realDonaldTrump May Twitter for iPhone 81
realDonaldTrump May Twitter Ads 9
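Observations like the ones in the list above (first and last Tweet per source) can be derived mechanically from an archive. The following is a minimal sketch, assuming rows of (timestamp, source) pairs with timestamps in the day/month/year format used in this post; the function name is hypothetical, and the three sample rows are just the timestamps quoted above rather than the full archive.

```python
from datetime import datetime

def first_and_last_by_source(rows, fmt="%d/%m/%Y %H:%M:%S"):
    """Map each source to a (first, last) pair of datetimes seen for it."""
    spans = {}
    for stamp, source in rows:
        when = datetime.strptime(stamp, fmt)
        first, last = spans.get(source, (when, when))
        spans[source] = (min(first, when), max(last, when))
    return spans

# Only the timestamps quoted in this post; not the full dataset.
rows = [
    ("25/01/2017 19:03:33", "Twitter Web Client"),
    ("17/05/2017 16:36:02", "Twitter Ads"),
    ("31/03/2017 14:30:38", "Twitter Ads"),
]
for source, (first, last) in first_and_last_by_source(rows).items():
    print(source, first.isoformat(), last.isoformat())
```

The rows do not need to be in chronological order, which is handy when an export has not been sorted yet.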


As a keen Twitter user I personally find it interesting that Twitter for Android has stopped being used by the account in question and that Twitter Ads has been used recently (instead?) in alternation with the Tweets from iPhone. A quick eyeballing of the dataset suggests there might be a correlation between Twitter Ads and Tweets with links and official announcements (rather than statements/opinions), but that requires looking into more closely and I will have to leave that for another time.

*Public note to self: I need to get rid of this habit of capitalising ‘Tweets’ as a noun… it becomes annoying.

Android vs iPhone: Source Counts and Trends in a Bit More than a Year’s Worth of Trumpian Tweetage

Last month I took a quick look at a month’s worth of Trumpian tweetage (user ID 25073877) using text analysis. Using a similar methodology I have now prepared and shared a CSV file containing Tweet IDs and other metadata of 3,805 Tweets from user ID 25073877 posted publicly between Thursday February 25 2016 16:35:12 +0000 and Monday April 03 2017 12:51:01 +0000. I deposited the file on figshare, including notes on motivation and methodology, here:

3805 Tweet IDs from User 25073877 [Thu Feb 25 16:35:12 +0000 2016 to Mon Apr 03 12:51:01 +0000 2017].


The dataset allows us to count the sources for each Tweet (i.e. the application used to publish each Tweet according to the data provided by the Twitter Search API). The resulting counts are:

Source Tweet Count
Twitter for iPhone 1816
Twitter for Android 1672
Twitter Web Client 287
Twitter for iPad 22
Twitter Ads 3
Instagram 2
Media Studio 2
Periscope 1
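A count like the one above can be reproduced from a CSV export with a few lines of Python. This is a minimal sketch, not the exact procedure used for this post: the column names `id_str` and `source` are assumptions about how such a file might be laid out, and the three demo rows (with example.org links) are made up. Since the API tends to return the source as an HTML link wrapped around the application name, the sketch strips any tags before counting.

```python
import csv
import io
import re
from collections import Counter

def clean_source(raw):
    """Strip HTML tags so that '<a href=...>Twitter for iPhone</a>' becomes the app name."""
    return re.sub(r"<[^>]+>", "", raw).strip()

def count_sources(csv_file, column="source"):
    """Count Tweets per source in an open CSV file with a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(clean_source(row[column]) for row in reader)

# Build a tiny, made-up CSV in memory (hypothetical columns and rows).
demo = io.StringIO()
writer = csv.writer(demo)
writer.writerow(["id_str", "source"])
writer.writerow(["1", '<a href="https://example.org/iphone" rel="nofollow">Twitter for iPhone</a>'])
writer.writerow(["2", '<a href="https://example.org/android" rel="nofollow">Twitter for Android</a>'])
writer.writerow(["3", "Twitter for iPhone"])
demo.seek(0)

for source, n in count_sources(demo).most_common():
    print(source, n)
```

With a real file, `count_sources(open("tweets.csv", newline="", encoding="utf-8"))` would be the equivalent call, where `tweets.csv` is a placeholder filename.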

As we have seen in previous posts, the account has alternated between iPhone and Android since the Inauguration. I wanted to look at relative trends throughout the dataset. Having prepared the main dataset, I performed a text analysis of a document comprising the source listing arranged in chronological order by date and time of Tweet publication; the listing corresponds to Tweets published between 25 February 2016 and Monday 3 April 2017. Using the Trends tool in Voyant, I divided the document into 25 segments, with the intention of roughly representing each monthly period covered in the listing and highlighting relative frequency trends per source in each segment.

The Trends tool shows a line graph depicting the distribution of a word’s occurrence across a corpus or document; in this case each word represents the source of a Tweet in the document. Each line in the graph is coloured according to the word it represents, at the top of the graph a legend displays which words are associated with which colours. I only included the most-used sources, leaving iPad there as reference.
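The segment-based view can be approximated outside Voyant. The sketch below splits a chronological list of sources into contiguous segments of near-equal size and computes each source's relative frequency per segment. It is an assumption-laden illustration, not Voyant's code: Voyant may segment and normalise differently, and the short list of sources is made up. With 3,805 Tweets and 25 segments, each segment would hold roughly 152 Tweets.

```python
from collections import Counter

def segment_relative_frequencies(items, n_segments):
    """Split items into n_segments contiguous chunks of near-equal size and
    return, for each chunk, the relative frequency of every distinct item."""
    size, extra = divmod(len(items), n_segments)
    result, start = [], 0
    for i in range(n_segments):
        # The first `extra` chunks get one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunk = items[start:end]
        counts = Counter(chunk)
        result.append({k: v / len(chunk) for k, v in counts.items()} if chunk else {})
        start = end
    return result

# Made-up chronological listing of sources, for illustration only.
sources = ["Android"] * 6 + ["iPhone"] * 2 + ["Android"] * 2 + ["iPhone"] * 6
for number, freqs in enumerate(segment_relative_frequencies(sources, 4), start=1):
    print(number, freqs)
```

One thing worth bearing in mind is that segments built this way are equal in number of Tweets, not in calendar time, so a busy fortnight and a quiet couple of months can each fill one segment.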

The resulting graph looks like this:

Line graph of the relative frequencies of the four most used sources visualised in 25 segments of a document including 3,805 Tweets from user ID 25073877 dated between Thursday February 25 2016 16:35:12 +0000 and Monday April 03 2017 12:51:01 +0000. Data collected and analysed by Ernesto Priego. CC-BY. Chart made with Trends, Voyant Tools by Stéfan Sinclair & Geoffrey Rockwell (CC-BY 2017).

I enjoyed this article by Christopher Ingraham (Washington Post Weblog, 3 April 2017), and I envy the access to the whole Trumpian tweetage dataset, which would be essential to attempt to reproduce the analysis presented. The piece focuses on the use of exclamation marks (something I took an initial look at in my 6 February 2017 post), but it would be useful to take a closer look at any potentially significant correlations between use of language in specific Tweets and the sources used to post those Tweets.

The article also has an embedded video titled ‘When it’s actually Trump tweeting, it’s way angrier’, repeating claims that there is a clear difference between those Tweets the account in question published from an iPhone and those published from an Android. I briefly referred to this issue in my 15 March 2017 post already, and I have not yet seen evidence that it is a staffer who actually posts from Twitter for iPhone from the account. I may be completely wrong, but I am still not convinced there is data-backed evidence to say for certain that Tweets from different sources are always tweeted by two or more different people, or that the differences in language per source are predictable and reliably attributable to a single specific person (the same people can after all tweet from the same account using different devices and applications, and indeed potentially use different language/discourse/tone). Anecdotal, I know, but I have noticed that sometimes my tweetage from the Android mobile app is different from my tweetage from TweetDeck on my Mac, but no regular patterns can be inferred there.

I do not necessarily doubt there is more than one person using the account, nor that the language used may vary significantly depending on the Tweets’ source. What I’d like to see, however, are more robust studies demonstrating and highlighting correlations between language use in Tweets’ texts and Tweets’ sources from the account in question, taking into consideration that the same users can own different devices and use different language strategies depending on a series of contextual variables. Access to the source data of said studies should be considered essential for any assessment of any results or conclusions provided. Limitations on, and opposition to, more open sharing of Twitter data for research reproducibility are just one hurdle on the way to more scholarship in this area.