Tweeting in an Age of Overwhelming Information Overload and Increased Workloads


Twitter is no longer niche as it once was. How has my thinking changed in relation to Twitter use by academics? In this post I bullet-point some ideas that can be taken if desired as tips or strategies by those academic colleagues who are new to Twitter. You can scroll down and skim if you want.

 [PhD Comics, August 21 2014]
Motivation for this post

I’ve been asked to become a “social media champion” for my school. I think it’s cool there’s an interest in embracing social media more widely, organically and effectively.

The past

Things have changed significantly since Sarah and I started touring the UK in 2011, giving social media workshops for academics with Networked Researcher (RIP). My own personal and professional views on Twitter have evolved along the way. What we call “social media” is no longer a niche, defined region of the Internet and the Web; it is as mainstream as it can possibly get, reaching a relevance and centrality in today’s information and technological sphere that is yet to be surpassed.

I wrote dozens of blog posts for a variety of international platforms (some long extinct) in the distant past (2011-2013) on academic Twitter use, including the following pieces that got published by the Guardian Higher Education Network.

If you click on the links and read the articles, please do take them with a grain of salt and historical perspective as things have evolved significantly since. I would write them differently today (also; headlines were the Guardian’s, not mine).

This tour down memory lane has also reminded me of this blog post that I wrote for Altmetric in 2013 on “Strategies to Get your Research Mentioned Online”. It needs rewriting now.

(By the way, remember this LSE Impact Blog November 2013 post by Alan Cann on academic blogging going mainstream?)

Sharing these links here again as context and in case it’s of historical interest.

Those were the days. We were young. We thought everything was possible. (It still is, albeit in a completely different way!).

The present moment

How to think of academic tweeting in an age of overwhelming information overload and increased workloads? How has my thinking changed in relation to Twitter use by academics?

I cannot go into great detail here, but I thought I’d try to bullet-point some ideas that can be taken, if desired, as tips or strategies by those academic colleagues who are new to Twitter.

  • Twitter needs to be taken seriously. In spite of its ill-repute, it is an influential public platform for the dissemination of information. Precisely what information we disseminate on it is each user’s responsibility.
  • No one uses Twitter in the exact same way. Twitter is always-already experienced differently by each and every user. There are therefore no straight-forward rules. Most users learn along the way. An experienced Twitter user is more likely to use Twitter better than an inexperienced Twitter user who has read all the social media policies, terms and conditions and ethical guidelines available. An experienced Twitter user who has read all those documents will be an even better user, but that’s a personal view.
  • The default Twitter web client and the Twitter mobile app are not the right tools for busy people who are expected to author “content”. If you are busy, are already doubtful Twitter can deliver quality information, and feel that being asked to tweet is an annoying imposition or a waste of time, there are no worse tools to start with than those.
  • For new users it may look daunting, but I totally recommend using TweetDeck to those academics being asked to manage a work account and/or wishing to be more effective at locating and monitoring relevant accounts and content. TweetDeck is a free web-based application owned by Twitter. There is no mobile version. To use TweetDeck you will need a Twitter account. How to use TweetDeck guidance here.
  • In general, I think tweeting from your mobile phone for work is a bad idea, unless there’s no other choice (you are at a conference without space to place or plug in your laptop, etc.).
  • Before you start tweeting for work it helps to have clarity of purpose. Do not think of Twitter as an instant messaging service; think of it as a public publishing platform. What is it you need to communicate? To whom? Why? When? How?
  • Everyone and their dog is on Twitter. (And yet… so many aren’t so far). How will you become visible? Before joining Twitter, make a list of people and organisations you want to be visible to. Think of it as your Twitter contact list or address book.
  • Search for your stakeholders on Twitter via TweetDeck and create a list with a descriptive name. The more specific the list the better. You can have different lists. On TweetDeck, you can get a column per list, where you will only see, if desired, tweets by those accounts you have added to your list. Think of it as an email folder for which you have created rules.
  • You don’t have to have a column for your timeline, where you would see everyone you follow. These days, using your main Twitter timeline as your main way of monitoring Twitter is frankly inefficient, not least because, regardless of what your settings are, the algorithm will prioritise some content over others and tweets will not be shown in the order they were posted. We need to try to beat the relevance algorithm and curate our own dedicated timelines.
  • If your goal is to use Twitter to communicate the work you or your organisation does, you can schedule tweets in advance on TweetDeck. This means you don’t need to be on Twitter all the time. You don’t have to tweet in real time.
  • If you blog, make sure you add a social media sharing widget so that your posts get tweeted automatically when you publish. Make sure your site’s readers can share your posts on social media easily: customise the sharing widgets so the share text generated includes a mention of your username (e.g. “[Post title] [URL] via @ernestopriego”).
  • Systematically share what you publish or deposit in your open access institutional or data repository. If you don’t share your own work, who will?
  • Twitter is social, so it won’t work well if you only broadcast your own content. Even if your intention is to mainly broadcast what you or your organisation does, having columns of your stakeholders will allow you to check those columns at an appropriate time and see fewer tweets (more manageable) but potentially they will be more relevant because you have more carefully/strictly curated the sources in that timeline in advance.
  • Have a column for your notifications, and acknowledge positive feedback whenever you can. Often there’s no need to reply; ‘liking’ a reply suffices these days as an acknowledgement and it can go a long way. You are busy and others know it because they are busy too, but they still appreciate a nudge of appreciation.
  • No user is an island. Create continents and archipelagos, build bridges.
  • Retweet what you find interesting or useful, support causes or themes you advocate, but avoid amplifying discord or bad vibes (those are, I’m aware, relative).
  • Include the disclaimer “RTs and likes are not endorsements” in your bio, to be safe. Do not RT tweets you wouldn’t have tweeted originally yourself (ask yourself: would I have published this for the world to see? By retweeting it, you are doing just that), including those tweets with links to content you have not checked before. Check and read links before retweeting/tweeting them.

In a way these same strategies have already been in practice for a while. They are not new. If anything, the pressing realities of employment in a digital age mean we need to be more drastically pragmatic and strategic.

I realise there’s way more I have to say about this, but I have surpassed the 1000 word count so I will have to leave it there. Thanks for reading, if you did.

“Access/Accès”: #DH2017, Montreal, 8-11 August 2017 Tweetage Volume Charts


#DH2017 starts today in Montreal.  The theme is “Access/Accès”. Details in the hyperlink. I wish I were there!

I am sure the tweetage will exceed the limits of my poor Google spreadsheet, but as it’s become kind of customary I am attempting to collect as many tweets with the conference hashtag as possible.

Using Martin Hawksey’s TAGS, here’s what the archive looks like as of 6:35:05 AM Montreal time of the first official day (8 August 2017):

Archive for #DH2017, Top Tweeters and 3 day activity, 6:35:05 of day one Montreal time

As of 9 August 2017, 6:11:33 AM Montreal time

Screen Shot 2017-08-09 at 11.19.25

As of 10 August 2017, 6:07:45 AM Montreal time

Screen Shot 2017-08-10 at 11.13.54

As of 11 August 2017, 7:12:46 AM Montreal time

Screen Shot 2017-08-11 at 12.30.08

As of 12 August 2017, 03:11:57 AM Montreal time. (I would have liked to take this screenshot later but I would not be online at that time. Considering the conference had finished by then, it will do.)

Screen Shot 2017-08-12 at 08.44.15

As of 13 August 2017, 05:50:54 AM Montreal time

Screen Shot 2017-08-13 at 11.16.34

Do note that on 9 August the hashtag was heavily spammed, particularly with annoying ‘trending topics’ tweets, so the data could do with some refining. However, at a quick glance the spamming does not look serious. With more time further on, and once I have closed the collection, I could take a closer look and give an indication of the extent of the spamming. In any case please note, as always, that the counts I am presenting are merely indicative: numbers are not meant to be taken at face value and no inherent quality or value judgements should be inferred from the volumes reported.

As I often state, the data presented is the result of the collection methods employed; different methods are likely to present different results.

Note that this time only tweets from users with at least 10 followers are being collected. For the purpose of the archive, retweets count as tweets (this means not every tweet contains ‘original’ content).
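
The two collection rules just described (a minimum of 10 followers, and retweets counted as tweets) can be sketched as a filter followed by a count. This is only an illustration in Python and not what TAGS does internally: the column names from_user, text and user_followers_count are assumed to match a spreadsheet export, and the rows are invented placeholders, not data from the archive.

```python
from collections import Counter

MIN_FOLLOWERS = 10  # threshold stated in the post

# Invented placeholder rows; column names are assumed, not taken from the archive.
rows = [
    {"from_user": "alice", "text": "Keynote starting #DH2017", "user_followers_count": 250},
    {"from_user": "bob", "text": "RT @alice: Keynote starting #DH2017", "user_followers_count": 40},
    {"from_user": "spam1", "text": "Trending now #DH2017", "user_followers_count": 3},
    {"from_user": "alice", "text": "Slides are online #DH2017", "user_followers_count": 250},
]


def collect(rows, min_followers=MIN_FOLLOWERS):
    """Keep rows from accounts with at least `min_followers` followers.

    Retweets are deliberately kept, so they count as tweets.
    """
    return [r for r in rows if int(r["user_followers_count"]) >= min_followers]


def top_tweeters(rows, n=10):
    """Return the n most active usernames with their tweet counts."""
    return Counter(r["from_user"] for r in rows).most_common(n)


kept = collect(rows)
print(len(kept))
print(top_tweeters(kept))
```

Because retweets stay in, a high count in a ‘Top Tweeters’ chart says nothing about how much of that activity is original content.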

It has been assumed that those scholars or scholarly organisations tweeting publicly from public accounts at very high volumes from an international conference do expect to get noticed by the international community for their tweetage with the hashtag, and therefore are giving implicit consent to get noted by said community for scholarly purposes. If anyone objects to their username appearing in one of the ‘Top Tweeters’ bar charts above, please let me know and I can anonymise their username retrospectively if that helps.

This is the first year I have managed to archive a more or less complete set. It helps that TAGS has improved, that I was able to collect and monitor the collection in real time, and that I set a minimum of 10 followers for accounts to be collected. It also helped that I did not start collecting too far back in advance, as I sometimes have done.

I will be depositing a dataset of Tweet IDs and timestamps, which is the source data for the charts embedded here, next week.

Speaking of “Access/Accès”, here’s a recent post I wrote about access and license types in a set of articles from the Journal of Digital Scholarship in the Humanities. In case you missed it (you probably did), it might be of interest given this year’s theme.



On UK Labour and Conservatives Tweet Sources


I’ve been tracking the Twitter accounts of the UK Labour, Conservative, Green, and LibDem parties as we approach June the 8th (General Election). I am interested in what they are saying on Twitter through their official Twitter accounts, how they are saying it, how often, and what apps they choose to do so.

Unfortunately there are still some duplicates in my Twitter data collection, but I can at least share at this point the sources used to tweet from the UK Labour and Conservatives Twitter accounts, as well as some indicative numbers, bearing in mind they may vary slightly, for tweets per source in a sample of 500 Tweets per account from 12/05/2017 to 01/06/2017 so far.


UK Labour
Source Count
MediaStudio 279
SproutSocial 106
TweetDeck 55
Twibbon 1
Twitter for Android 3
Twitter for iPhone 8
Twitter Web Client 48


Conservatives
Source Count
MediaStudio 25
TweetDeck 222
Twitter for iPhone 73
Twitter Web Client 180

Even bearing in mind that the sample of 500 tweets from each account may still contain some duplicates, the list of sources alone provides an objective indication of each account’s social media management tool preferences. Something that stands out is that, in comparison to, say, the realdonaldtrump account, none of these tweets were posted from Twitter Ads.

The source list indicates to me that UK Labour has attempted a more professional social media management strategy, with a reduced number of tweets from Android, iPhone and the Web Client, whereas the Conservatives have a majority of tweets coming from free & anyone-can-use apps, with no shortage of tweets coming from an iPhone (but no Android at all).
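
To make the comparison a little more concrete, here is a minimal Python sketch that turns the counts above into percentages. It assumes the first table is Labour’s and the second the Conservatives’, which is how the paragraph above reads. The grouping of MediaStudio and SproutSocial as ‘managed’ tools is my own assumption for illustration, not a classification established anywhere in this post.

```python
# Counts copied from the two tables in this post (sample of 500 tweets per account).
labour = {
    "MediaStudio": 279,
    "SproutSocial": 106,
    "TweetDeck": 55,
    "Twibbon": 1,
    "Twitter for Android": 3,
    "Twitter for iPhone": 8,
    "Twitter Web Client": 48,
}
conservatives = {
    "MediaStudio": 25,
    "TweetDeck": 222,
    "Twitter for iPhone": 73,
    "Twitter Web Client": 180,
}

# Assumption for illustration only: which sources count as 'managed' tools.
MANAGED = {"MediaStudio", "SproutSocial"}


def shares(counts):
    """Return each source's percentage of the account's sampled tweets."""
    total = sum(counts.values())
    return {source: round(100 * n / total, 1) for source, n in counts.items()}


def managed_share(counts, managed=MANAGED):
    """Return the percentage of sampled tweets posted from 'managed' tools."""
    total = sum(counts.values())
    hits = sum(n for source, n in counts.items() if source in managed)
    return round(100 * hits / total, 1)


print(shares(labour))
print(managed_share(labour), managed_share(conservatives))
```

Changing the contents of MANAGED changes the result, so the percentages are only as meaningful as that grouping.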

This short update is part of an ongoing lunchtime pet project for which I wish I had more time, but hey.  I also have data from the other political parties, but no time right now. Anyway, for what it’s worth, I thought I’d share.

N.B. Dear Guardian Data, in case you like what you see here and you ‘borrow’ the idea or any data… please kindly attribute and link back. It’s only polite to do so. Thank you!

Don’t Walk Away: The Aporetics of Information in the Age of Twitter Overload

“The Oxford English Dictionary includes two forms of the word: the adjective “aporetic”, which it defines as “to be at a loss”, “impassable”, and “inclined to doubt, or to raise objections”; and the noun form “aporia”, which it defines as the “state of the aporetic” and “a perplexity or difficulty”.”

– from the Wikipedia entry for ‘Aporia’

Like you, I’ve read the news today.

One immediately wants to write something. One also feels lost for words. We used to be, as humanity, ‘lost for words’ when facing something unspeakable, because it had not been said before. There were no words for it because it exceeded the limits of our understanding, of current and previous systems of belief. That for which we had no words was unknown and unknowable. And now, words flow. Please bear with me.

Tragically, incidents like the Manchester Arena attack are no longer ‘new’. Steadily, mostly thanks to the almost immediate global mass dissemination of information, we already have a discourse and therefore a vocabulary of reaction. Online and in print, everyone feeds from incidents ‘like this’ (language is a minefield). Organizations, communities and individuals struggle to make sense of our own being in the world by becoming present through utterances. We say/write/post, therefore we exist. There should be no doubt that many of the reactions are in good faith, as an expression of humanity. Extending one’s hand for a handshake or an embrace.

There is also, however, a negative side. It is the ongoing feeding of fear, the promotion of the terror that through loopy repetition gets ingrained in our minds. The effects are double: the terror is widely known, in detail, and impossible to ignore, changing society at its core, but the terror also gets normalised, and therefore muted. Multiplicity of sources, angles, opinions create confusion. So better to look away, focus on what keeps our lives ‘normal’. Just another day on Planet Earth. Carry on, nothing to see here. This is the effect we should try more actively to avoid, but how? As usual when I write, I am aware that this very post is contributing to the problematic phenomenon I am trying to make sense of by writing. This is why I think we have in front of us an aporia, a perplexing problem which is or seems impossible for us to crack.

The world today avoids problematic situations. The term ‘problematic’ is indeed now every sociologist’s and academic’s cliché. In the English-speaking tradition, practical solutions through practicable methods and measurable solutions are preferred to the Romance languages’ preference for the essay that by definition attempts or rehearses an approach around a problem. Essaying is ‘problematising’, but this is incredibly frustrating when there is a pressing need to just get on with things and face what cannot be avoided and requires a ‘solution‘. As soon as we use that word, however, echoes of the unspeakable come back to haunt us, sometimes consciously, sometimes not.

In times of alarm and pain, there is a responsibility in saying as much as there is a responsibility in not saying. Knowing when and how to participate online is a skill to be developed, individually, as communities, societies and cultures. I am motivated to write by the following questions/writing this has made me think of these questions:

  • When everyone with a social media account contributes to the infosphere in which we are immersed, how do we balance the need to say, to participate in society, while being aware of how each of us may be contributing to the steady deterioration and erosion of public discourse?
  • What are the effects that our postings have on others, and can we ever fully have control over these possible effects?
  • How do we build ‘healthy’ networks of support, online and offline, without alienating others who are also at the producing-and-receiving end of the information flow?

Obviously I have no answers to these questions.

Many respectable folk have written about the ethics of storytelling and the need to actively resist the horror through art and documentation.* This documentation will one day be the testament of our era, an immense archive of humanity’s consciousness, spoken out loud. Social media today replicates many of the bad practices of the mainstream media (in the UK, the tabloid press has a lot to answer for), and we must look into the role that the pervasive broadcasting of information has on Post Traumatic Stress Disorder. Victims and affected communities are vulnerable and in pain, and constant semi-immersive and excessive broadcasting can contribute to and exacerbate the pain, as well as the social divisions that make extremism thrive.

At present, however, the way we live rarely allows us to stop and reflect, and more importantly, to listen to each other. Issues on international mainstream news that affect us all are constantly considered outside the limits of professional practice, regardless of what we may do for a living, and the pragmatism of everyday survival trumps more considered attempts to prioritise the building of relationships, a commons of solidarity and understanding (and also respectful disagreement) seeking to build and maintain the public good. We mute accounts tweeting and retweeting the hashtag or event du jour. We lament that not more young people even register to vote, but we have embraced politics (and the social consequences of politics) as a form of entertainment. At most, we have allowed most political ‘engagement’ to become a version of Gogglebox. In our everyday lives, we walk away from all the chatter to remain sane and to focus. We cannot deal with so much and get back to our work, and the clamour ‘outside’ overshadows the individual tragedies and issues, becoming pure noise and fury. In the age in which methods of production of information have been made widely available to the masses, actual resistance, we know well, has been almost completely deactivated.

And so we ‘carry on’, we tell ourselves, but the problems remain, and the need to share, to make sense of it all still somehow remains as well. Whether it is murdered journalists in Mexico or teenagers at a pop concert in Manchester, the terror is real. People are suffering right now. Attacks and victims are not mere metrics, nor ‘content’, nor objects of study. Incidents like the Manchester Arena attack are no longer ‘new’, we said, but each death and the pain of each parent, relative, friend, fellow citizen, human being is absolutely unique. The tragedy is never repeatable, it is absolute uniqueness, and this is what makes it so utterly painful, shocking, and perplexing.

As the crowds pour their thoughts and pain online, this is paradoxically a crucial moment to reconsider our understanding of the meaning of ‘engagement’. As algorithmic relevance defines concrete realities and the attention economy becomes so fierce that most people are seen but not heard, the temptation is to back off and walk away in silence. This seems to me to be exactly what those seeking to terrorise want. For us to hide, to close up, to not go out, to not be together. For us to forget who we are and what makes us human.

As I worked on this interview, and once it was published as I shared it, I was visited by fears that it did not matter, that it made no difference. Friends ironically, jokingly, said they would share it with friends who couldn’t care less. Friends and family directly affected by the situation documented in the article reacted to it with distance. I could literally touch the fear. I was aware that in my ability to translate it into English I was already exercising a privilege not altogether disconnected from the inequality that is one of the causes of the horrors I was trying to document. I was also aware of my distance from the events, even if I feel very close to them. The alternative, not to do anything, not to at least try to contribute to avoiding the complicitous silence denounced by the interviewee did not seem to me like an option. I had to face the contradictions.

There is the feeling that there is already enough information out there, and that therefore we don’t need anyone else’s contribution. So much information is perceived as an ‘excess’, and its effect is to alienate us and disempower us. The point is precisely to make us feel like nothing we can do really matters, and that if it matters it does so for reasons other than the message conveyed: because it brings some kind of capital to the author, or because it provides authors with a sense of identity, of singularity or importance in a world where it is harder and harder to stand out. Black Mirror stuff.

This is an important part of this aporetic nature of being online and being a citizen: how to balance the rights of individual expression with the need to consider the effects it has on others given the current infrastructures for communication and the discourse they enable, encourage and actively produce. Terrorism and mass social media have something in common: one of their side effects is to make individuals and communities feel like there’s nothing they can do to make a difference, that no resistance is likely to make a difference, that no awareness or documentation of the terror will stop the pain.

I said I felt lost for words, and now I’ve written more than 1500 words. The irony is painful and awareness has its limitations.

To be honest I don’t know how to end this post. I just want to resist repressing the grief and the concern. I want to think there are still ways we can share our feelings, report on what we believe deserves to be known, and be an active part of our communities.

The logic of Terrorism and the commodification of all human communication, of human pain, packaged as ‘content’,  cannot triumph, even if our humble means to resist it are always-already the same tools used to advance it. It’s perhaps a question of remembering the precious singularity, the absolute uniqueness of each human being in this world.


*Not just people like Paul Ricoeur and Dominick LaCapra, just look at this 2015 conference programme for more recent work.

Exeunt Android; Enter Ads: An Update on the Sources of Presidential Tweetage


A quick update as something I consider interesting has emerged from the ongoing archiving of the, er, current ‘Trumpian’ tweetage (see a previous post here). In case you do follow this blog you may be aware I’ve been keeping an eye on the ‘source’ of the Tweets, which is information (a metadata field) pertaining to each published Tweet which is made publicly visible by Twitter to anyone through certain applications like TweetDeck and directly through Twitter’s API (for Twitter’s ‘Field Guide’, see this).

Given the diversity of sources detected on the Tweets from the account under scrutiny in the past, hypotheses have been proposed suggesting correlations between type of content and source (application used to post Tweets); others have suggested that it is also indication of different people behind the account (though as we have said previously it is also possible that the same person tweets from different devices and applications).

Anyway, here are some recent insights emerging from the data since the last post:

  • Since Inauguration Day (20 January 2017), the last Tweet coming from Twitter for Android so far was timestamped 25/03/2017 10:41 (AM; DC time).  No Tweets from Android have been posted since that Tweet until the time of writing of this.
  • The last Tweet coming from the Twitter Web Client so far was timestamped 25/01/2017  19:03:33. No more Tweets with the Web Client as source have been posted (or collected by my archive) since then.
  • Since Inauguration Day, the Tweet timestamped 31/03/2017  14:30:38 was the first one to come from Twitter Ads. Since then 21 Tweets have been posted from Twitter Ads, the last one so far timestamped 17/05/2017  16:36:02.
  • During April and May 2017 Tweets have only come from Twitter for iPhone or Twitter Ads. The account in question has tweeted every single day throughout May until today, 18 May 2017: a total of 90 Tweets so far (including a duplicated one in which a typo was corrected). Below is a breakdown per source:


from_user Month Source Count
realDonaldTrump May Twitter for iPhone 81
realDonaldTrump May Twitter Ads 9
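
For anyone wanting to reproduce observations like ‘the last Tweet from Android’ or ‘the first Tweet from Twitter Ads’, here is a minimal Python sketch of the kind of lookup involved. It is not the method I used on the archive. The timestamps quoted in the bullets above are reused, two rows are invented filler and marked as such, and the day/month/year format is assumed to match the way dates are written in this post.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M:%S"  # day/month/year, as dates are written in this post

# (timestamp, source) pairs. Rows marked 'invented' are filler for illustration.
records = [
    ("25/01/2017 19:03:33", "Twitter Web Client"),
    ("20/03/2017 08:15:00", "Twitter for Android"),  # invented filler row
    ("25/03/2017 10:41:00", "Twitter for Android"),  # seconds not given in the post; :00 assumed
    ("31/03/2017 14:30:38", "Twitter Ads"),
    ("02/05/2017 07:00:00", "Twitter for iPhone"),  # invented filler row
    ("17/05/2017 16:36:02", "Twitter Ads"),
]


def first_and_last(records):
    """Map each source to its (earliest, latest) timestamp in the records."""
    seen = {}
    for stamp, source in records:
        when = datetime.strptime(stamp, FMT)
        first, last = seen.get(source, (when, when))
        seen[source] = (min(first, when), max(last, when))
    return seen


for source, (first, last) in sorted(first_and_last(records).items()):
    print(source, first.strftime(FMT), last.strftime(FMT))
```

A result like this only describes what was collected; a Tweet missing from the archive would silently shift the ‘last seen’ date.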


As a keen Twitter user I personally find it interesting Twitter for Android has stopped being used by the account in question and that Twitter Ads has been used recently (instead?) in alternation with the Tweets from iPhone. Eyeballing the dataset quickly appears to indicate there might be a potential correlation between Tweets with links and official announcements (rather than statements/opinions) and Twitter Ads, but that requires looking into more closely and I will have to leave that for another time.

*Public note to self: I need to get rid of this habit of capitalising ‘Tweets’ as a noun… it becomes annoying.

Android vs iPhone: Source Counts and Trends in a Bit More than a Year’s Worth of Trumpian Tweetage

Last month I took a quick look at a month’s worth of Trumpian tweetage (user ID 25073877) using text analysis. Using a similar methodology I have now prepared and shared a CSV file containing Tweet IDs and other metadata of 3,805 Tweets from user ID 25073877 posted publicly between Thursday February 25 2016 16:35:12 +0000 and Monday April 03 2017 12:51:01 +0000. I deposited the file on figshare, including notes on motivation and methodology, here:

3805 Tweet IDs from User 25073877 [Thu Feb 25 16:35:12 +0000 2016 to Mon Apr 03 12:51:01 +0000 2017].


The dataset allows us to count the sources for each Tweet (i.e. the application used to publish each Tweet according to the data provided by the Twitter Search API). The resulting counts are:

Source Tweet Count
Twitter for iPhone 1816
Twitter for Android 1672
Twitter Web Client 287
Twitter for iPad 22
Twitter Ads 3
Instagram 2
Media Studio 2
Periscope 1
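
A count like the one above can be produced in a few lines of Python. The sketch below is illustrative only: the header names id_str, created_at and source are assumptions about how such a CSV might be laid out, and the four rows are invented stand-ins, not rows from the deposited file.

```python
import csv
import io
from collections import Counter

# Tiny invented CSV standing in for a real file; header names are assumed.
SAMPLE = """id_str,created_at,source
1,Thu Feb 25 16:35:12 +0000 2016,Twitter for Android
2,Thu Feb 25 17:02:40 +0000 2016,Twitter for iPhone
3,Fri Feb 26 01:10:05 +0000 2016,Twitter for Android
4,Fri Feb 26 13:45:59 +0000 2016,Twitter Web Client
"""


def count_sources(handle, column="source"):
    """Count how many rows of a CSV file-like object carry each source value."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


counts = count_sources(io.StringIO(SAMPLE))
for source, n in counts.most_common():
    print(source, n)
```

With a real file, `open("tweets.csv", newline="", encoding="utf-8")` (a hypothetical filename) would take the place of the `io.StringIO` object.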

As we have seen in previous posts, the account has alternated between iPhone and Android since the Inauguration. I wanted to look at relative trends throughout the dataset. Having prepared the main dataset I performed the text analysis of a document comprising the source listing arranged in chronological order according to the date and time of Tweet publication, and the listing corresponds to Tweets published between 25 February 2016 and Monday 3 April 2017. Using the Trends tool in Voyant, I divided the document in 25 segments, with the intention to roughly represent each monthly period covered in the listing and highlight source relative frequency trends in the period covered per segment.

The Trends tool shows a line graph depicting the distribution of a word’s occurrence across a corpus or document; in this case each word represents the source of a Tweet in the document. Each line in the graph is coloured according to the word it represents, at the top of the graph a legend displays which words are associated with which colours. I only included the most-used sources, leaving iPad there as reference.
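
The idea of dividing a chronological listing into segments and looking at relative frequencies per segment can be approximated in Python as follows. This is a rough sketch of the concept, not Voyant’s implementation: how Voyant sizes its segments and normalises frequencies may differ, and the listing below is invented.

```python
from collections import Counter


def segment(items, n_segments):
    """Split items into n_segments consecutive chunks of near-equal size."""
    size, extra = divmod(len(items), n_segments)
    chunks, start = [], 0
    for i in range(n_segments):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def relative_frequencies(items, n_segments):
    """For each segment, return each term's share of that segment."""
    out = []
    for chunk in segment(items, n_segments):
        total = len(chunk)
        counts = Counter(chunk)
        out.append({term: n / total for term, n in counts.items()} if total else {})
    return out


# Invented chronological source listing, one term per tweet.
listing = ["Android"] * 6 + ["iPhone"] * 2 + ["Android"] * 2 + ["iPhone"] * 6
trend = relative_frequencies(listing, 4)
for i, freqs in enumerate(trend, start=1):
    print(i, freqs)
```

Note that equal-sized segments only roughly correspond to calendar months, since the number of tweets per month varies.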

The resulting graph looks like this:

Line graph of the relative frequencies of the four most used sources visualised in 25 segments of a document including 3,805 Tweets from user ID 25073877 dated between Thursday February 25 2016 16:35:12 +0000 and Monday April 03 2017 12:51:01 +0000. Data collected and analysed by Ernesto Priego. CC-BY. Chart made with Trends, Voyant Tools by Stéfan Sinclair & Geoffrey Rockwell (CC-BY 2017).

I enjoyed this article by Christopher Ingraham (Washington Post Weblog, 3 April 2017), and I envy the access to the whole Trumpian tweetage dataset, which would be essential to attempt to reproduce the analysis presented. The piece focuses on the use of exclamation marks (something I took an initial look at in my 6 February 2017 post), but it would be useful to take a closer look at any potential significant correlations between use of language in specific Tweets and the sources used to post those Tweets.

The article also has an embedded video titled ‘When it’s actually Trump tweeting, it’s way angrier’, repeating claims that there is a clear difference between those Tweets the account in question published from an iPhone and those published from an Android. I briefly referred to this issue in my 15 March 2017 post already, and I have not seen evidence yet that it is a staffer who actually posts from Twitter for iPhone from the account. I may be completely wrong, but I am still not convinced there is data-backed evidence to say for certain that Tweets from different sources are always tweeted by two or more different people, or that the differences in language per source are predictable and reliably attributable to a single specific person (the same people can after all tweet from the same account using different devices and applications, and indeed potentially use different language/discourse/tone). Anecdotal, I know, but I have noticed that sometimes my tweetage from the Android mobile app is different from my tweetage from TweetDeck on my Mac, but no regular patterns can be inferred there.

I do not necessarily doubt there is more than one person using the account, nor that the language used may vary significantly depending on the Tweets’ source. What I’d like to see, however, is more robust studies demonstrating and highlighting correlations between language use in Tweets’ texts and Tweets’ sources from the account in question, taking into consideration that the same users can own different devices and use different language strategies depending on a series of contextual variables. Access to the source data of said studies should be considered essential for any assessment of any results or conclusions provided. Limitations on and opposition to more open sharing of Twitter data for research reproducibility are just one hurdle on the way to more scholarship in this area.

Android vs iPhone: Trends in a Month’s Worth of Trumpian Tweetage

What’s in a month’s worth of presidential tweetage?

I prepared a dataset containing a total of 123 public Tweets and corresponding metadata from user_id_str 25073877 between 15 February 2017 06:40:32 and 15 March 2017  08:14:20 Eastern Time (this figure does not factor in any tweets the user may have deleted shortly after publication). Of the 123 Tweets 68 were published from Android; 55 from iPhone. The whole text of the Tweets in the dataset accounts for 2,288 words, or 12,364 characters (no spaces; including URLs).

Using the Trends tools from Voyant Tools by Stéfan Sinclair & Geoffrey Rockwell I visualised the raw frequencies of the terms ‘Android’ and ‘iPhone’ in this dataset over 30 segments (more or less corresponding to the length of the month covered in the dataset) where each timestamped Tweet, sorted in chronological order, had its corresponding source indicated.

The result looked like this:

Raw frequency of Tweets per source in 30 segments by realdonaldtrump between 15 February 2017 06:40:32 and 15 March 2017 08:14:20 Eastern Time. Total: 123 Tweets: 68 from Android; 55 from iPhone. Data collected and analysed by Ernesto Priego. CC-BY. Chart made with Trends, Voyant Tools by Stéfan Sinclair & Geoffrey Rockwell (CC 2017).

The chart does indeed reflect the higher number of Tweets from Android, and it also shows how over the whole document both sources are, in spite of more frequent absences of Tweets from iPhone, present throughout. The question as usual is what this tells us. Back on 9 August 2016 David Robinson published an insightful analysis where he concludes that “he [Trump] writes only the (angrier) Android half”. With the source data I have gathered so far it would be possible (given the time and right circumstances) to perform a content analysis of Tweets per source, in order to confirm or reject any potential correlations between types of Tweets (re: tone, function, sentiment, time of day) and source used to post them.

Eyeballing the data, specifically since Inauguration Day until the present, does not seem to provide unambiguous evidence that the Tweets are undoubtedly written by two different persons (or more). What is factual is that the Tweets do come from different sources (see my previous post), but at the moment, like with everything else this administration has been doing, my cursory analysis has only found conflicting insights, where for example a Tweet one would perhaps have expected to have been posted from iPhone (attributable hypothetically to a potentially less inflammable aide) was in fact posted from Android, and vice versa.

I may be wrong, but at the moment I cannot see any evidence there is any kind of predictable pattern, let alone strategy, behind the alternation between Android and iPhone (the only two types of source used to publish Tweets from the account in question in the last month). Most of the time Tweets by source type will come in sequences of four or more Tweets, but sometimes a random lone Tweet from a different source will be sandwiched in between.
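One way to move from eyeballing to counting would be to collapse the chronologically sorted list of sources into runs and then look at their lengths. The following is only a minimal sketch in Python: the function name and the toy sequence are invented for illustration and are not taken from the dataset.

```python
from itertools import groupby


def source_runs(sources):
    """Collapse a chronological list of source labels into (source, run_length) pairs."""
    return [(source, len(list(group))) for source, group in groupby(sources)]


# Invented toy sequence for illustration only; NOT the real Tweets.
toy_sources = ["Android"] * 4 + ["iPhone"] + ["Android"] * 5 + ["iPhone"] * 6

runs = source_runs(toy_sources)
# Runs of length 1 are the "lone" Tweets sandwiched between longer sequences.
lone = [source for source, length in runs if length == 1]

print(runs)
print(lone)
```

A table of run lengths per source would at least show how common the ‘lone Tweet’ cases are before attempting any interpretation of them.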

More confusingly, all of the Tweets published between 08/03/2017 18:50 and 15/03/2017 08:14:20 have only had iPhone as source, without exception. Attention to detail is required to run robust statistical and content analyses that consider complete timestamps and further code the Tweet text and time data into more discrete categories, attempting a high level of granularity at both the temporal (time of publishing; ongoing documented events) and textual (content; discourse) levels. (If you are reading this and would like to take a look at the dataset, DM me via Twitter).

Anyway. In case you are curious, here are the top 20 most frequent words in the text of the tweets, per source, in this dataset (15 February 2017 06:40:32 to 15 March 2017 08:14:20 Eastern Time). Analysis courtesy of Voyant Tools, applying a customised English stop words list (excluding Twitter-specific terms like rt, https, etc., but leaving terms in hashtags).

Android iPhone
Term Count Trend Term Count Trend
fake 11 0.007795889 great 16 0.016129032
great 11 0.007795889 jobs 14 0.014112903
media 10 0.007087172 america 6 0.006048387
obama 10 0.007087172 trump 6 0.006048387
election 9 0.006378455 american 5 0.005040322
just 9 0.006378455 join 5 0.005040322
news 9 0.006378455 big 4 0.004032258
big 8 0.005669738 healthcare 4 0.004032258
failing 6 0.004252303 meeting 4 0.004032258
foxandfriends 6 0.004252303 obamacare 4 0.004032258
president 6 0.004252303 thank 4 0.004032258
russia 6 0.004252303 u.s 4 0.004032258
democrats 5 0.003543586 whitehouse 4 0.004032258
fbi 5 0.003543586 address 3 0.003024194
house 5 0.003543586 better 3 0.003024194
new 5 0.003543586 day 3 0.003024194
nytimes 5 0.003543586 exxonmobil 3 0.003024194
people 5 0.003543586 investment 3 0.003024194
white 5 0.003543586 just 3 0.003024194
american 4 0.002834869 make 3 0.003024194
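For anyone wanting to try something similar without Voyant, a per-source term count can be roughed out with Python’s standard library. This is a sketch under stated assumptions, not a reproduction of Voyant’s tokenisation or of my stop word list: the stop words, the tokenising regular expression and the toy Tweets are all invented here, and the third value (count divided by the number of remaining tokens) is just one possible definition of a relative frequency, not necessarily how the Trend column above is computed.

```python
import re
from collections import Counter

# Hypothetical, very short stop word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "is", "to", "and", "of", "in", "rt", "https"}


def top_terms(texts, stop_words=STOP_WORDS, n=20):
    """Return (term, count, relative_frequency) for the n most frequent terms."""
    tokens = []
    for text in texts:
        # Lowercase and drop URLs; hashtags survive as plain words.
        text = re.sub(r"https?://\S+", " ", text.lower())
        tokens.extend(t for t in re.findall(r"[a-z0-9']+", text) if t not in stop_words)
    total = len(tokens)
    return [(term, count, count / total) for term, count in Counter(tokens).most_common(n)]


# Invented toy texts for illustration only; NOT Tweets from the dataset.
toy_tweets = {
    "Android": ["The media is fake news", "Fake news again and again and again"],
    "iPhone": ["Great jobs, great meeting, great day #jobs https://example.org/x"],
}

for source, texts in toy_tweets.items():
    print(source, top_terms(texts, n=3))
```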

Android vs iPhone: Most Frequent Words from_user_id_str 25073877 Per Source

I have archived 3,603 public Tweets from_user_id_str 25073877 published between 27/02/2016 00:06 and 27/02/2017 12:06 (GMT -5, Washington DC Time). This is almost exactly a year’s worth of Tweets from the account in question.

Eight source types were detected in the dataset. Most of the Tweets were published either from iPhone (46%) or from Android (45%).

The Tweet counts per source are as follows:


Instagram 2
MediaStudio 1
Periscope 1
Twitter Ads 1
Twitter for Android 1629
Twitter for iPad 22
Twitter for iPhone 1660
Twitter Web Client 287
Total 3603
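The percentages quoted above can be re-derived from the counts in the table. A minimal sketch in Python, using only the figures listed in this post (the variable names are mine):

```python
# Tweet counts per source, copied from the table in this post.
source_counts = {
    "Instagram": 2,
    "MediaStudio": 1,
    "Periscope": 1,
    "Twitter Ads": 1,
    "Twitter for Android": 1629,
    "Twitter for iPad": 22,
    "Twitter for iPhone": 1660,
    "Twitter Web Client": 287,
}

total = sum(source_counts.values())
# Share of each source as a percentage of all archived Tweets.
shares = {source: 100 * count / total for source, count in source_counts.items()}

for source, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{source}: {share:.1f}%")
```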


The table above visualised as a bar chart, just because:


Source of 3603 Tweets from_user_id_str 25073877 (27/02/2016 00:06 to 27/02/2017 12:06) Bar chart.


As a follow-up to a previous post, I share in the table below the top 50 most frequent word forms per source (iPhone and Android) in this set of 3,603 Tweets from_user_id_str 25073877, courtesy of a quick text analysis (applying a customised English stop word list globally) made with Voyant Tools:


Android iPhone
Term Count Trend Term Count Trend
great 276 0.008124816 thank 417 0.015241785
hillary 252 0.00741831 trump2016 215 0.007858475
trump 184 0.005416544 great 190 0.006944698
crooked 162 0.004768914 makeamericagreatagain 165 0.006030922
people 160 0.004710038 join 160 0.005848167
just 151 0.004445099 rt 144 0.00526335
clinton 120 0.003532529 hillary 119 0.004349574
big 107 0.003149838 clinton 118 0.004313023
media 106 0.0031204 america 111 0.004057166
thank 94 0.002767148 trump 104 0.003801309
bad 89 0.002619959 make 89 0.003253043
president 88 0.002590521 new 88 0.003216492
make 86 0.002531646 tomorrow 82 0.002997186
america 85 0.002502208 people 75 0.002741328
cnn 85 0.002502208 maga 73 0.002668226
country 72 0.002119517 today 73 0.002668226
like 72 0.002119517 americafirst 69 0.002522022
u.s 72 0.002119517 draintheswamp 68 0.002485471
time 71 0.00209008 tonight 67 0.00244892
said 67 0.001972329 ohio 66 0.002412369
jobs 66 0.001942891 vote 63 0.002302716
vote 63 0.001854578 just 61 0.002229614
win 63 0.001854578 florida 59 0.002156512
new 62 0.00182514 crooked 52 0.001900654
going 59 0.001736827 going 49 0.001791001
news 58 0.001707389 imwithyou 49 0.001791001
bernie 56 0.001648513 president 49 0.001791001
foxnews 55 0.001619076 votetrump 49 0.001791001
good 54 0.001589638 tickets 46 0.001681348
wow 53 0.0015602 american 43 0.001571695
job 50 0.001471887 time 43 0.001571695
nytimes 50 0.001471887 pennsylvania 42 0.001535144
republican 50 0.001471887 poll 41 0.001498593
0 49 0.001442449 soon 41 0.001498593
today 49 0.001442449 support 41 0.001498593
totally 49 0.001442449 enjoy 38 0.00138894
enjoy 48 0.001413012 campaign 37 0.001352389
cruz 46 0.001354136 rally 37 0.001352389
election 46 0.001354136 carolina 35 0.001279287
look 46 0.001354136 north 35 0.001279287
want 46 0.001354136 live 34 0.001242735
obama 44 0.001295261 speech 33 0.001206184
dishonest 41 0.001206947 california 18 0.000657919
can’t 39 0.001148072 hillaryclinton 18 0.000657919
night 39 0.001148072 honor 18 0.000657919
really 39 0.001148072 job 18 0.000657919
show 39 0.001148072 nevada 18 0.000657919
way 39 0.001148072 right 18 0.000657919
ted 38 0.001118634 supertuesday 18 0.000657919


I thought you’d like to know.

Donald’s Followers Going Up and Up…

In the context of popular calls to unfollow it (there’s a hashtag too), I  thought it would be interesting to look at how the number of followers of said Twitter account has been changing recently.

I looked at a dataset of all the Tweets from the account linked above timestamped between 04/11/2016 14:56 and 13/02/2017 22:30 (Washington DC time).

The change in followers (user_followers_count) in that period of time looks like this:

user_follower_count growth from:realdonaldtrump in tweets timestamped between 04/11/2016 14:56 and 13/02/2017 22:30 (Washington DC time)


The world appears to be collapsing, but his follower count keeps going up… I thought you’d like to know.

We’ll keep an eye on this.

If you want to be able to read the account’s tweets without following it directly, there are many options. In case it’s useful, here’s a live searchable archive of recent tweets. (It’s bandwidth and Tweet volume dependent, so the resource may not always load).


Insights from the Altmetric Top 100 2016

Altmetric Top 100 2016 Affiliations. via Altmetric

The Altmetric Top 100 2016 was published yesterday. If you click on the green ‘read more about this list’ button, you’ll see useful analysis of the data.

[I also wrote about the Altmetric Top 100 2014, here and here.]

It’s very welcome that this year Altmetric has deposited the article and affiliation data as two datasets as a collection on figshare:

Engineering, Altmetric (2016): Altmetric Top 100 2016. figshare. Retrieved: 11 24, Dec 14, 2016 (GMT)

This time the source data provides greater insights, particularly the article’s access type  (Open Access, ‘Free’ or paywalled), type of content (article, letter, etc.) and subject.

Altmetric has already provided an analysis of this data (percentage of OA outputs in the list; countries of affiliations, institutions etc.) but having access to the source data means their analysis, visualisations and findings are actually reproducible (reproducibility was identified as a topic gaining interest; see Cat Williams’ post here). By providing access to the source data openly, other types of analysis are not only possible but encouraged (for example text and content analysis of the top 100 output titles).

One insight for me is that this list again demonstrates the dominance of the usual countries of affiliation, and to a certain extent of the same journals (considering that Altmetric tracks a selection of publications, not all publications that exist).

I was interested in finding out whether the Top 100 would include any articles authored or coauthored by researchers with a Mexican institution as affiliation. There are two:

  1. A genome-wide association scan in admixed Latin Americans identifies loci influencing facial and scalp hair features. Nature Communications 7, Article number: 10815 (2016) doi:10.1038/ncomms10815 (Published online:01 March 2016)
  2. Beverage purchases from stores in Mexico under the excise tax on sugar sweetened beverages: observational study. BMJ 2016;352:h6704 doi: (Published 06 January 2016)

It is notable that both articles are the result of international coauthorship; the Nature Communications article including authors from other Latin American countries (Argentina, Chile, Colombia); the BMJ one from Mexico and the United States. Importantly, both articles are open access.

I was also interested in seeing whether any Information Science or Computer Science research had made it into the list. There is only one article whose subject was categorised as “Information and Computer Sciences”:

Mastering the game of Go with deep neural networks and tree search. Nature 529,484–489 (28 January 2016) doi:10.1038/nature16961

This is a paywalled article authored by a team of 21 authors with Google DeepMind (London, UK) as affiliation.

I believe access to this data is useful to understand the evolving landscape of scholarly communications. It can also help us authors to gain insights into what kind of research is receiving attention online.

For example, the data seems to contribute to a body of anecdotal and bibliometric evidence indicating that, for researchers with affiliations in ‘developing’ nations, open access and international collaboration remain key to greater visibility.

This year’s data also shows, again, that some countries (in the case of Africa, a whole continent), fields, and journals, remain under-represented or not present at all. It should also be noted that the only Computer Science article in the list is not by researchers affiliated to universities but to Google.

Yesterday I tweeted some quick thoughts after checking out the datasets, and compiled them using the new-ish ‘Moments’ feature on Twitter, which, for what it’s worth, I have embedded below.


#TheDataDebates: A Quick Twitter Data Summary

Screenshot of an interactive visualisation of a #TheDataDebates archive created with Martin Hawksey's TAGSExplorer

1 October 2016 Update: I have now deposited on figshare a CSV file with timestamps, source and user_lang metadata of the archived tweets.

Priego, Ernesto (2016): #TheDataDebates Tweet Timestamps, Source, User Language. figshare. Retrieved: 10 03, Oct 01, 2016 (GMT)

‘Social Media Data: What’s the use’ was the title of a panel discussion held at The British Library, London, on Wednesday 21 September 2016, 18:00 – 20:00. The official hashtag of the event was #TheDataDebates.

I made a collection of Tweets tagged with #TheDataDebates published publicly between 12/09/2016 09:06:52 and 22/09/2016 09:55:03 (BST).

Again I used Tweepy 3.5.0, a Python wrapper for the Twitter API, for the collection. Learning to mine with Python has been fun and empowering. To compare results I also used, as usual, Martin Hawksey’s TAGS, with results being equal (I only collected Tweets from accounts with at least 1 follower). Having the collected data already in a spreadsheet saved me time.
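The follower filter and the comparison between the two collections can be sketched in plain Python once both archives are loaded as lists of records. This is only an illustration under assumptions: the field names `id_str` and `user_followers_count`, the helper names and the toy records are hypothetical stand-ins, and no call to Tweepy or TAGS is shown.

```python
def with_followers(tweets, minimum=1):
    """Keep only records whose account has at least `minimum` followers."""
    return [t for t in tweets if int(t.get("user_followers_count", 0)) >= minimum]


def compare_collections(first, second, key="id_str"):
    """Compare two collections by Tweet ID and report overlap and differences."""
    ids_first = {t[key] for t in first}
    ids_second = {t[key] for t in second}
    return {
        "both": ids_first & ids_second,
        "only_first": ids_first - ids_second,
        "only_second": ids_second - ids_first,
    }


# Invented toy records for illustration only; NOT the real archive.
tweepy_rows = [
    {"id_str": "1", "user_followers_count": 120},
    {"id_str": "2", "user_followers_count": 0},
    {"id_str": "3", "user_followers_count": 15},
]
tags_rows = [
    {"id_str": "1", "user_followers_count": "120"},
    {"id_str": "3", "user_followers_count": "15"},
]

result = compare_collections(with_followers(tweepy_rows), with_followers(tags_rows))
print(result)
```

Comparing on the Tweet ID rather than on the text avoids false mismatches caused by differences in how each tool encodes or truncates the text.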

Here’s a summary of the collection:

First Tweet in Archive 12/09/2016 09:06:52
Last Tweet in Archive 22/09/2016 09:55:03
Number of Tweets 


Number of links


Number of RTs


Number of accounts


From the main archive I was able to focus on number of Tweets per source and user language setting.


source Count
Twitter for iPhone


Twitter Web Client


Twitter for Android


Twitter for iPad




UK Trends


Mobile Web (M5)




Twitter for Windows Phone


Big Data news flow










Lt RTEngine






User Language Setting (user_lang)

user_lang Count Notes






6 of it are spam






both spam


The summary above is of the raw collection, so not all the activity it reflects is either ‘human’ or relevant, as some accounts tweeting have been identified as bots tweeting spam (a less human-readable hashtag could potentially have avoided such spamming given the relatively low activity). Except where I identified spam Tweets, in this post I have not looked at the Tweets’ text data (i.e. I haven’t shared here any text or content analysis). Maybe if I have time in the near future. As Retweets were counted as Tweets in this archive, a more specific and precise analysis would have to filter them from the dataset.
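Filtering Retweets out could be approximated with a simple text heuristic. The sketch below assumes that Retweets in the archive start with the conventional "RT @" prefix, which is an assumption about the text field rather than something I have verified for every row; Tweet objects from the Twitter API v1.1 also carry a `retweeted_status` attribute for Retweets, which would be a more reliable signal where that metadata has been kept. The function names and toy texts are invented.

```python
def is_retweet(text):
    """Heuristic: treat texts starting with 'RT @' as Retweets."""
    return text.lstrip().startswith("RT @")


def split_retweets(texts):
    """Return (originals, retweets) as two lists, preserving order."""
    originals = [t for t in texts if not is_retweet(t)]
    retweets = [t for t in texts if is_retweet(t)]
    return originals, retweets


# Invented toy texts for illustration only.
toy_texts = [
    "Looking forward to the panel tonight #TheDataDebates",
    "RT @someone: Looking forward to the panel tonight #TheDataDebates",
    "Great question about consent #TheDataDebates",
]

originals, retweets = split_retweets(toy_texts)
print(len(originals), "original Tweets;", len(retweets), "Retweets")
```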

I am fully aware this would be more interesting and useful if there were opportunities for others to replicate the analysis through access to the source dataset I used. There are lots of interesting types of analysis that could be run and data to focus on in such a dataset as this. As in previous posts about other events, I am simply sharing this post right now as a quick indicative update published only a few hours after the event concluded.

It was pointed out last night that “social media data mining is starting but still has a way to go to catch up with hard analytical methodologies.” A post like this does not claim to employ such methodologies; it simply seeks to contribute to the debate with evidence that may hopefully inspire other studies. Perhaps it’s a two-way process, and “hard analytical methodologies” (and researchers’ and users’ attitudes regarding cultural paradigms around ethics, privacy, consent, statistical significance) also have a way to go to catch up with new/recent pervasive forms of data creation and dissemination that perhaps require different, media-, community- and content-specific approaches to doing research.

Other Considerations [I am reusing my own text from previous posts here]

Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailon, Sandra, et al, 2012). Apart from the filters and limitations already declared, it cannot be guaranteed that each and every Tweet tagged with #TheDataDebates during the indicated period was analysed. The dataset was shared for archival, comparative and indicative educational research purposes only.

Only content from public accounts, obtained from the Twitter Search API, was analysed. The source data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account. These posts and the resulting dataset contain the results of analyses of Tweets that were published openly on the Web with the queried hashtag; the content of the Tweets is the responsibility of the original authors. Original Tweets are likely to be copyright their individual authors but please check individually. This work is shared to archive, document and encourage open educational research into scholarly activity on Twitter.

No private personal information was shared. The collection, analysis and sharing of the data has been enabled and allowed by Twitter’s Privacy Policy. The sharing of the results complies with Twitter’s Developer Rules of the Road. A hashtag is metadata users choose freely to use so their content is associated, directly linked to and categorised with the chosen hashtag.

The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case). Tweets published publicly by scholars or other professionals during academic conferences or events are often publicly tagged (labeled) with a hashtag dedicated to the event in question. This practice used to be confined to a few ‘niche’ fields; it is increasingly becoming the norm rather than the exception. Though every reason for Tweeters’ use of hashtags cannot be generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour. In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences.

As Twitter users, conference Twitter hashtag contributors have agreed to Twitter’s Privacy and data sharing policies. Professional associations like the Modern Language Association and the American Psychological Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter’s search API has well-known temporal limitations for retrospective historical search and collection. Beyond individual Tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. Though this work has limitations and might not be thoroughly systematic, it is hoped it can contribute to developing new insights into a discipline’s public concerns as expressed on Twitter over time.


González-Bailon, Sandra and Wang, Ning and Rivero, Alejandro and Borge-Holthoefer, Javier and Moreno, Yamir, Assessing the Bias in Samples of Large Online Networks (December 4, 2012).  Available at SSRN:

Priego, Ernesto (2016) #WLIC2016 Most Frequent Terms Roundup. figshare.

[ahrcpress]. (2016, Sep 21). Social media data mining is starting but still has a way to go to catch up with hard analytical methodologies #TheDataDebates [Tweet]. Retrieved from

Priego, Ernesto (2016): #TheDataDebates Tweet Timestamps, Source, User Language. figshare. Retrieved: 10 03, Oct 01, 2016 (GMT)

Sheffield Digital Humanities Congress 2016: #dhcshef 100 Most Frequent Terms

A view of the #dhcshef 2016 dataset created with Martin Hawksey’s TAGS Explorer

The Sheffield Digital Humanities Congress 2016 was held from the 8th to the 10th of September 2016 at the University of Sheffield. The full conference programme is available here:

The event’s official hashtag was the same as in previous editions, #dhcshef.

I made a collection of Tweets tagged with #dhcshef published publicly between Monday September 05 2016 at 17:54:58 +0000 and Saturday September 10 2016 at 23:37:06 +0000. This time I used Tweepy 3.5.0, a Python wrapper for the Twitter API, for the collection. To compare results I also used, as usual, Martin Hawksey’s TAGS, with results being similar (I only collected Tweets from accounts with at least 1 follower).

As on previous occasions I extracted the text and usernames from this dataset and used Voyant Tools for a basic text analysis. The dataset contained 1479 Tweets posted by 256 different accounts. 841 of those were RTs. The text of the Tweets composed a corpus with 26,094 total words and 3,057 unique word forms.

I used Voyant’s Terms tool to get the most frequent terms, applying an edited English stop words list that included Twitter and congress-specific terms (this means that words expected to be frequent like ‘digital’, ‘humanities’, ‘congress’, ‘sheffield’, as well as usernames, projects’ names and people’s names were filtered out). I exported a list of the 500 most frequent terms and then I manually refined the data so that any remaining names of people or projects were removed. (This is not case sensitive so I may have made mistakes and further disambiguation and refining would be required). If you are interested I previously detailed a similar methodology here.

Here’s my resulting list of the 100 most frequent terms.

[Table of the 100 most frequent terms (columns: Term, Count); the rows are missing from this copy.]

Please bear in mind that RTs count as Tweets and therefore the repetition implicit in RTs directly affects the frequent term counts. Which terms made it into the top 100 reflects my own bias (I personally didn’t want to see how many times ‘digital’ or ‘humanities’ was repeated), but individual term counts remain the same regardless.

I appreciate the stop words selection is indeed subjective (deictics like ‘tomorrow’ or ‘today’ may very well mean very little). It’s up to the reader to judge if such a listing offers any insights at all; as Twitter moves relentlessly and as such data remains a moving target, I’d like to believe that collecting and looking into frequent terms offers at least another point of view, if not a gateway, into how a particular academic event is represented/discussed/reported on Twitter. Perhaps it’s my enjoyment of poetry that makes me think that seeing words out of context (or recontextualised) like this can offer some kind of food for thought or creativity.

Interestingly the dataset showed user_lang metadata other than en or en-GB: de, es, fr, it, nl and ru were also present, even if in a minority. The dataset also showed that some sources are clearly identified as bots.

I am fully aware this would be more interesting and useful if there were opportunities for others to replicate the text analysis through access to the source dataset I used. There are lots of interesting types of analysis that could be run and data to focus on in such a dataset as this. I am simply sharing this post right now as a quick indicative update after the event concluded.


