Tweets per user_lang in a #DH2018 archive

I collected an archive of #DH2018 tweets from accounts with at least 10 followers. The main quant summary is in the table below, which I also tweeted earlier:

Twitter Activity for #DH2018, archive by Ernesto Priego

I wanted to take a quick look at the number of tweets per user_lang. “user_lang” is the language code that appears in the user’s Twitter profile. (Please note “user_lang” is different from “lang”, which, when present, indicates a BCP 47 language identifier corresponding to the machine-detected language of the tweeted text.)

Filtering the #DH2018 tweets archive by user_lang and then counting the number of tweets per user_lang gives us the following table:

tweet count per user_lang
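For anyone wanting to reproduce this kind of count, here is a minimal sketch in Python. It assumes each archived tweet is a row with a "user_lang" column; the rows and user names below are made up for illustration and are not taken from the #DH2018 archive.

```python
from collections import Counter

# Hypothetical rows standing in for the archive: one dict per tweet.
# The column names ("id_str", "from_user", "user_lang") are assumptions.
tweets = [
    {"id_str": "1", "from_user": "alice", "user_lang": "en"},
    {"id_str": "2", "from_user": "alice", "user_lang": "en"},
    {"id_str": "3", "from_user": "bob", "user_lang": "es"},
    {"id_str": "4", "from_user": "carol", "user_lang": "en"},
    {"id_str": "5", "from_user": "dani", "user_lang": "es-MX"},
]


def tweets_per_user_lang(rows):
    """Count tweets (not users) for each user_lang code."""
    return Counter(row["user_lang"] for row in rows)


counts = tweets_per_user_lang(tweets)
for lang, n in counts.most_common():
    print(lang, n)
```

Counter.most_common() lists the codes from the largest count to the smallest, which is the order a bar chart of the table would normally use.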

The archive only collected tweets from accounts with at least 10 followers. The table above can be, just for fun, visualised as a simple bar chart, as a means to quickly show the difference in volume:

user_lang #dh2018 archive bar chart

Please note the archive collects unique tweets including RTs; therefore it can be a unique tweet by a unique user, retweeted several times or not at all, that contributes to the count for a given user_lang.

In other words, the counts above do not indicate there were x number of users whose Twitter profiles had x language code, but merely the number of tweets in this specific archive organised according to the user_lang code from the tweeter’s Twitter profile.
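The difference between counting tweets and counting accounts can be sketched as follows. This is only an illustration with made-up rows; the "from_user" and "user_lang" column names are assumptions about how an archive might be laid out.

```python
from collections import Counter, defaultdict

# Hypothetical rows: "alice" tweets twice, so 'en' has 3 tweets but 2 accounts.
tweets = [
    {"from_user": "alice", "user_lang": "en"},
    {"from_user": "alice", "user_lang": "en"},
    {"from_user": "carol", "user_lang": "en"},
    {"from_user": "bob", "user_lang": "es"},
]


def tweets_and_accounts_per_lang(rows):
    """Return (tweet counts, distinct account counts) keyed by user_lang."""
    tweet_counts = Counter(row["user_lang"] for row in rows)
    accounts = defaultdict(set)
    for row in rows:
        accounts[row["user_lang"]].add(row["from_user"])
    account_counts = {lang: len(users) for lang, users in accounts.items()}
    return tweet_counts, account_counts


tweet_counts, account_counts = tweets_and_accounts_per_lang(tweets)
print(tweet_counts["en"], account_counts["en"])
```

The table in this post corresponds to the first of the two results only.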

What this can possibly indicate, therefore, is the over- or under-representation of tweets from accounts whose Twitter profiles have specific language codes. It’s not that x number of tweets in the archive were in this or that language, nor that x number of tweeters using the hashtag speak this or that language.

What becomes apparent is that an overwhelming majority of accounts with tweets in the archive have ‘en’ as the language code in their Twitter profiles; it is interesting that, in the archive, only one tweet was collected from an account with ‘es-MX’ as the language code in its Twitter profile.

One must also take into account that often ‘en’ is or might be the default user_lang code in Twitter profiles.

I still need to go back to my archives from previous years, but it does look like, in spite of the usual over-representation of the ‘en’ user_lang code, there is at least a diversity of user_lang in the archived tweets, with ‘es’ in second place.

Once I refine and anonymise the data I will be depositing the source data for this post.


*This blog post was typed quickly, typos and wonky syntax might have remained.

 

Great! News! People! Fake! Donald’s Tweets: 18 January 2017 to 18 January 2018

Trump Simplest Words image Image via The Telegraph

In two days it will be a year since the inauguration of Twitter user ID 25073877.  Time flies when things are beyond ridiculous, right?

Some of you may remember I’ve previously published other posts looking into various aspects of this user’s tweetage. I have already detailed the methodology I have followed (as well as its acknowledged limitations) in some of those previous posts. This has been a work in progress. See for example this, or this, or even this. There’s more if you follow the links.

Anyway, as the anniversary of the inauguration approaches I wanted to share with you, for what it’s worth, some quick numbers from a whole year’s worth of Twitter data.

The dataset I worked with for the purpose of this post is based on a larger Twitter archive I’ve been collecting and studying.

The dataset that I looked into on this occasion is composed of 2,587 tweets posted between 18/01/2017 06:53 AM EST (GMT-5) and 18/01/2018 08:49 AM EST (GMT-5).

As usual I did some basic text analysis, and some quick comparative quant stuff.

20 Most Tweeted Terms

Term Count
great 473
news 190
people 182
fake 166
thank 162
just 160
today 158
president 151
big 145
tax 140
trump 137
america 134
country 128
u.s 125
jobs 116
american 115
time 110
foxandfriends 98
media 98
new 97
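A term count of this kind can be sketched in a few lines of Python. This is not necessarily how the table above was produced: the tokenisation rule, the tiny stop-word list and the sample texts are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "rt"}


def term_counts(texts):
    """Count lower-cased terms across texts, ignoring URLs and stop words."""
    counts = Counter()
    for text in texts:
        without_urls = re.sub(r"https?://\S+", " ", text.lower())
        tokens = re.findall(r"[a-z0-9']+", without_urls)
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


# Made-up sample texts, not real tweets.
sample = [
    "Great news for the people!",
    "Great jobs, great people.",
    "News is news http://example.com/x",
]
counts = term_counts(sample)
for term, n in counts.most_common(20):
    print(term, n)
```

Different tokenisers and stop-word lists will give somewhat different counts, which is one more reason to treat the numbers as indicative.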

 

Other Twitter Data Numeralia

Twitter Text Counts

Number of ! 1,261
Number of Characters (no spaces, including URLs and usernames) 275,964
Number of Pages (single space, 12pt) 109
Number of Words 50,176
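Counts like these can be approximated with plain string operations. The sketch below assumes words are whitespace-separated and that characters are counted after removing spaces; the sample texts are made up, and the page count is left out because it depends on word-processor settings.

```python
def text_counts(texts):
    """Return simple counts over a list of tweet texts."""
    joined = " ".join(texts)
    words = joined.split()  # whitespace-separated tokens
    return {
        "exclamation_marks": joined.count("!"),
        "characters_no_spaces": sum(len(word) for word in words),
        "words": len(words),
    }


# Made-up sample texts, not real tweets.
sample = ["Great news!", "Thank you! Big day today!"]
result = text_counts(sample)
print(result)
```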

Follower Growth

User followers as of 18/01/2018 08:49 46,815,170
User followers as of 18/01/2017 06:53 20,227,768
Gained followers in the period 26,587,402

Tweets About the Mexico Border Wall

id_str time (EST)
9.53979E+17 18/01/2018 08:16
9.53264E+17 16/01/2018 08:54
9.51229E+17 10/01/2018 18:07
9.50884E+17 09/01/2018 19:16
9.49066E+17 04/01/2018 18:53
9.46732E+17 29/12/2017 08:16
9.38391E+17 06/12/2017 07:53
9.20425E+17 17/10/2017 19:03
9.18063E+17 11/10/2017 06:36
9.08274E+17 14/09/2017 06:20
9.01803E+17 27/08/2017 09:44
8.97833E+17 16/08/2017 10:51
8.97045E+17 14/08/2017 06:38
8.85279E+17 12/07/2017 19:24
8.78014E+17 22/06/2017 18:15
8.56849E+17 25/04/2017 08:36
8.56485E+17 24/04/2017 08:28
8.56172E+17 23/04/2017 11:44
8.56171E+17 23/04/2017 11:42
8.30406E+17 11/02/2017 08:18
8.24617E+17 26/01/2017 08:55
8.24084E+17 24/01/2017 21:37
8.23147E+17 22/01/2017 07:35

[hydrate tweets using twarc]
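A note on the id_str column above: hydrating needs the full ID of each tweet, and the values shown here have been shortened by the spreadsheet's scientific notation. The sketch below shows why the full number cannot be recovered from that display, and how reading the column as text avoids the problem. The 18-digit ID is made up for illustration and is not a real tweet.

```python
import csv
import io

# Hypothetical 18-digit ID, invented for this example.
full_id = "953979000123456789"

# What a spreadsheet shows once the ID has been treated as a number.
displayed = f"{float(full_id):.5E}"

# Converting the displayed value back does not restore the lost digits.
recovered = str(int(float(displayed)))

# Reading the column as text (the csv module never converts) keeps every digit.
sample_csv = io.StringIO("id_str,time\n953979000123456789,18/01/2018 08:16\n")
ids = [row["id_str"] for row in csv.DictReader(sample_csv)]

print(displayed, recovered, ids[0])
```

In a spreadsheet the equivalent precaution is to format the id_str column as plain text before exporting.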

The usual caveats apply. Numbers must be taken with a pinch of salt: the Twitter Search API is not a complete index of all Tweets, but instead an index of recent Tweets. My archive has collected Tweets every hour, which means, for instance, that Tweets that are promptly deleted in between collections do not get archived.

I have attempted refining the dataset, but duplicated Tweets might have stubbornly survived, which in turn logically would have affected the counts. However, in spite of these limitations, the data is indicative and potentially useful and/or interesting as documentation of current and recent historical events. For what it’s worth.
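One simple way of catching duplicates that an hourly collection can introduce is to keep only the first row seen for each tweet ID. This is a minimal sketch, assuming every row has an "id_str" field; the rows are made up, and this is not a description of the refining I actually did.

```python
def deduplicate(rows, key="id_str"):
    """Keep the first row for each distinct value of `key`, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


# Made-up rows: the tweet with id "2" was collected twice.
rows = [
    {"id_str": "1", "text": "first"},
    {"id_str": "2", "text": "second"},
    {"id_str": "2", "text": "second"},
    {"id_str": "3", "text": "third"},
]
unique_rows = deduplicate(rows)
print(len(rows), len(unique_rows))
```

This only removes exact repeats of the same tweet ID; near-identical texts posted as separate tweets would still be counted separately.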

We’ve lived with this user’s tweets daily, and we are very much aware of the kind of discourse developed through the constant, reliably exasperating tweetage. So these basic numbers are most likely not going to tell you anything you weren’t aware of already. A simile occurs to me: we are all aware of the daily, accumulative effects of stress, or, say, ageing, but sometimes it is only when we compare snapshots that we realise the true extent of their effects.

A #comicsunconf15 Twitter Activity Summary and Archive

I am really happy to say the Scottish Comics Unconference Meet-Up last Saturday was a success. I am hoping to be able to write up some of my notes reflecting on the practice of co-organising and participating in this unconference soon.

In the meanwhile, this is what the day (Saturday 28 February 2015) looked like in terms of #comicsunconf15 Tweets:

#comicsunconf15 Twitter Activity chart 2015-03-02 at 08.20.54

There’s still a live interactive archive of the hashtag here.

Find out more about how many #comicsunconf15 Tweets there were and how many of us tweeted using the hashtag in my post over at the Comics Grid blog.

Source Data

Priego, Ernesto (2015): A #comicsunconf15 Twitter Archive. figshare. http://dx.doi.org/10.6084/m9.figshare.1321222 Retrieved 10:26, Mar 03, 2015 (GMT)

#MLA15 Twitter Archive, 8-11 January 2015

130th MLA Annual Convention Vancouver, 8–11 January 2015

#MLA15 is the hashtag which corresponded to the 2015 Modern Language Association Annual Convention. The Convention was held in Vancouver from Thursday 8 to Sunday 11 January 2015.

We have uploaded a dataset as a .xlsx file including data from Tweets publicly published with #mla15:

Priego, Ernesto; Zarate, Chris (2015): #MLA15 Twitter Archive, 8-11 January 2015. figshare.
http://dx.doi.org/10.6084/m9.figshare.1293600

The dataset includes Tweets posted during the actual convention with #mla15: the set starts with a Tweet from Thursday 08/01/2015 00:02:53 Pacific Time and ends with a Tweet from Sunday 11/01/2015 23:59:58 Pacific Time.

The dataset contains a total of 23,609 Tweets. Only Tweets from users with at least two followers were collected.

A combination of Twitter Archiving Google Spreadsheets (Martin Hawksey’s TAGS 6.0; available at https://tags.hawksey.info/ ) was used to harvest this collection. OpenRefine (http://openrefine.org/) was used for deduplicating the data.

Please note the data in the file is likely to require further refining and even deduplication. The data is shared as is. The dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.

For the #MLA14 datasets, please go to
Priego, Ernesto; Zarate, Chris (2014): #MLA14 Twitter Archive, 9-12 January 2014. figshare.
http://dx.doi.org/10.6084/m9.figshare.924801

A #HEFCEmetrics Twitter Archive

#hefcemetrics top tweeters

I have uploaded a new dataset to figshare:
Priego, Ernesto (2014): A #HEFCEmetrics Twitter Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1196029

“In metrics we trust? Prospects & pitfalls of new research metrics” was a one-day workshop hosted by the University of Sussex, as part of the Independent Review of the Role of Metrics in Research Assessment. It took place on Tuesday 7 October 2014 at the Terrace Room, Conference Centre, Bramber House, University of Sussex, UK.

The file contains a dataset of 1178 Tweets tagged with #HEFCEmetrics (case not sensitive). These Tweets were published publicly and tagged with #HEFCEmetrics between 02/10/2014 10:18 and 08/10/2014 00:27 GMT.

The Tweets contained in the file were collected using Martin Hawksey’s TAGS 6.0. The file contains 3 sheets.

Please note the data in this file is likely to require further refining and even deduplication. The data is shared as is. The contents of each Tweet are the responsibility of the original authors. This dataset is shared to encourage open research into scholarly activity on Twitter.

For more information refer to the upload itself.

If you use or refer to this data in any way please cite and link back using the citation information above.

1:AM London Altmetrics Conference: A #1AMconf Twitter Archive

1:AM  London 2014 logo

I have uploaded a new dataset to figshare:

Priego, Ernesto (2014): 1:AM London Altmetrics Conference: A #1AMconf Twitter Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1185443

1:AM London, “the 1st Altmetrics Conference: London”, took place 25th–26th September 2014 at the Wellcome Collection, London, UK.

The file contains a dataset of 4267 Tweets tagged with #1AMconf (case not sensitive). These Tweets were published publicly and tagged with #1AMconf between Thursday September 18 17:29:56 +0000 2014 and Sunday September 28 16:07:49 +0000 2014.

Only users with at least 2 followers were included in the archive. Retweets have been included. An initial automatic deduplication was performed but data might require further deduplication. The Time column (D) has times in British Summer Time (BST).
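For anyone comparing the +0000 timestamps with the BST times in the Time column, the conversion can be sketched as below. It assumes the timestamps follow Twitter's usual created_at layout with abbreviated day and month names, and it uses a fixed UTC+1 offset, which holds for September 2014 but not for dates outside British Summer Time.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for British Summer Time (UTC+1); valid only while BST applies.
BST = timezone(timedelta(hours=1), "BST")


def to_bst(created_at):
    """Parse a created_at string such as 'Thu Sep 18 17:29:56 +0000 2014'."""
    parsed = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return parsed.astimezone(BST)


first = to_bst("Thu Sep 18 17:29:56 +0000 2014")
print(first.strftime("%d/%m/%Y %H:%M:%S %Z"))  # 18/09/2014 18:29:56 BST
```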

Please go to the file cited above for more information.

 

A #IGNCC14 Twitter Archive (Conference Days Only)

The Fifth International Comics and Graphic Novels Conference took place in London 18–20 July 2014. The official hashtag was #IGNCC14.

I have uploaded to figshare an .XLS file containing a dataset of Tweets tagged with #IGNCC14 (case not sensitive).

Priego, Ernesto (2014): A #IGNCC14 Twitter Archive (Conference Days Only). figshare.

http://dx.doi.org/10.6084/m9.figshare.1112639

The complete archive contains 1294 Tweets published publicly and tagged with #IGNCC14 between 18/07/2014 07:25:47 BST and 21/07/2014 10:17:15 BST.

The conference’s Twitter activity at a glance:

 

#igncc14 TAGS Archive dashboard
#igncc14 Tweet Volume Over Time

The Tweets contained in the archive were collected using Martin Hawksey’s TAGS 5.1. The file contains five sheets:

  • Sheet 0. A ‘Cite Me’ sheet, including provenance of this file, citation information, information about its contents, the methods employed and some context.
  • Sheet 1. Complete #IGNCC14 Archive (Conference days only). 1294 Tweets, from 18/07/2014 07:25:47 BST to 21/07/2014 10:17:15 BST.
  • Sheet 2. Friday 18 July 2014. 469 Tweets, from 18/07/2014 07:25:47 BST to 18/07/2014 21:27:23 BST.
  • Sheet 3. Saturday 19 July 2014. 390 Tweets, from 19/07/2014 06:54:24 BST to 19/07/2014 18:01:05 BST.
  • Sheet 4. Sunday 20 July 2014. 433 Tweets, from 20/07/2014 01:41:11 BST to 21/07/2014 10:17:15 BST.

Tweet times are recorded in local London, UK time (BST). Times in GMT are also included.

Only users with at least 2 followers were included in the archive. Retweets have been included. An initial automatic deduplication was performed. I manually organised and quantified the Tweets in the archive into conference days.
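The day-by-day organisation was done by hand, but a strict calendar-day grouping could be sketched as follows. The timestamps below are the sheet boundaries quoted above, used only as sample input; the dd/mm/yyyy hh:mm:ss layout is assumed to match the archive's time column.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(timestamps):
    """Count timestamps per calendar day (ISO date strings as keys)."""
    days = Counter()
    for stamp in timestamps:
        day = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S").date()
        days[day.isoformat()] += 1
    return days


sample = [
    "18/07/2014 07:25:47",
    "18/07/2014 21:27:23",
    "19/07/2014 06:54:24",
    "19/07/2014 18:01:05",
    "20/07/2014 01:41:11",
    "21/07/2014 10:17:15",
]
per_day = tweets_per_day(sample)
for day, n in sorted(per_day.items()):
    print(day, n)
```

Note that a strict grouping like this puts the Monday morning tweets under 21 July, whereas the Sunday sheet in the file runs on until 21/07/2014 10:17:15.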

Please note that both research and experience show that the Twitter search API isn’t 100% reliable. Large tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón, Sandra, et al. 2012). It is not guaranteed that the file contains each and every Tweet tagged with #IGNCC14 during the indicated period; it is shared for comparative and indicative educational and research purposes only.

Please note the data in this file is likely to require further refining and even deduplication. The data is shared as is.  This dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.

Open Repositories 2014, Helsinki. An #or2014 Archive

Open Repositories Conference in Helsinki in June 2014

The 9th International Conference on Open Repositories was held from 9 to 13 June, 2014 in Helsinki, Finland. It is the leading international conference in its field, with an attendance of around 400, with participants from all around the world.

The official hashtag was #or2014.

As part of my ongoing open research on academic conference Twitter backchannel data I have uploaded to figshare an XLS file containing an archive of Tweets tagged with #or2014 (case not sensitive).

Priego, Ernesto (2014): Open Repositories 2014, Helsinki. An #or2014 Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1058927

The archive contains 5174 Tweets dated between 02/06/2014 22:32:41 and 13/06/2014 12:44:37.

Times (column E) are Helsinki, Finland times.

This file contains two sheets:

Sheet 0. A ‘Cite Me’ sheet, including provenance of the file, citation information, information about its contents, the methods employed and some context.

Sheet 1. The Archive containing 5174 Tweets dated between 02/06/2014 22:32:41 and 13/06/2014 12:44:37.

The tweets contained in the file were collected using Martin Hawksey’s TAGS 5.1. To avoid spam only users with at least 2 followers were included in the archive. Retweets have been included.

The usual information about methods and limitations is in the file and on the figshare page for the upload.

If you use or refer to the data in the file in any way please make sure you are using the latest version and please cite and link back using the citation information above.

#HASTAC2013 Interactive Archive

http://hastac2013.org/

Version 2.0 The figshare and HASTAC versions of this post have been updated accordingly.

UPDATE Thursday 2 May 2013, 08:48am BST. 

Unfortunately I did not have time to do a new collection increasing the number of tweets to collect. The initial collection used the default 1500, and even though I did it on the Monday morning (BST) after the conference the archive did not go back far enough (it can only go back 7 days). In retrospect I should have aimed to collect more tweets than the default 1500 the first time around, but I was concerned the script would time out.

I only found some time this morning to try again (the script having timed out when I tried 18,000 tweets, which is the maximum output), and using 17,500 worked this time, taking me as far back as 26/04/13, 08:22:43, which is more than 24 hours before my previous collection. The Conference information says activities started on 25/04/13 (Thursday) but, as the programme and now both #hastac2013 archives confirm, the day with the most activity was 27/04/13 (Saturday). Therefore, though this new collection does not go as far back as the 25th, at least it covers the day before activity peaked. Where the previous archive had 1500 tweets, this new one gathered 3,898.

Here are two screenshots of the second archive’s summary charts right after I ran the collection:

3898 tweets collected, archive started 26/04/13 8:22:43. Archive set up by Ernesto Priego using TAGS.
#HASTAC2013 Tweet Volume Over Time, second collection, with peak on 28/04/13, reaching ∼1200 tweets. Archive set up by Ernesto Priego

You can see a published interactive version of this new archive here.

I link to the published spreadsheets from the PDF version of this post that can be downloaded from figshare.

HASTAC2013 Interactive Archive. Ernesto Priego. figshare.
http://dx.doi.org/10.6084/m9.figshare.693045

[Version 1.0 below]

As they describe it themselves, the Humanities, Arts, Science, Technology Advanced Collaboratory (HASTAC – “haystack” hastac.org), is “an organisation at the international forefront of knowledge mobilization for our digital present and innovation in the academy.” I have had the honour to be a HASTAC Scholar blogging at their site since 2010.

2013 marks the 10th anniversary of HASTAC’s founding, and on 25th-28th April they celebrated their decennial conference, titled “The Storm of Progress: New Horizons, New Narratives, New Codes”, in Toronto, Canada.

I was able to participate in the conference via pre-recorded video thanks to Fiona Barnett’s kind invitation on Saturday the 27th. While I was presenting in real life at the Forms of Innovation workshop at Durham, UK, my video was being shown in Toronto! This also means that while colleagues were live-tweeting about my session at #formsinn, they were also live-tweeting from #hastac2013…

Anyway many of us were able to follow the proceedings of the HASTAC conference through a lively Twitter backchannel. I believe the backchannel is a useful research resource on its own, and of course it allows us to perform some ‘meta’ analysis of the network itself. I set up a Google spreadsheet to collect #HASTAC2013 tweets and created an interactive archive that visualises the interactions in real time. (This will make demands from your browser…)

[I have done an initial archive covering only the latest (at the time of publishing) 1500 tweets, as high values may not work due to script timeouts, but I am currently experimenting to try to get the majority of the #hastac2013 output. Will update accordingly. Times from my archiving are GMT].

Screen Shot of a moment in #HASTAC2013 interactive archive, 2013-04-29 at 09.03.13
1500 tweets, 599 RTs, 421 links. Archive started by Ernesto Priego 27/04/2013 18:42:19 GMT.
#HASTAC2013 Tweet Volume Over Time, with peak on 28/04/13, reaching ∼500 tweets. Archive set up by Ernesto Priego

I have archived and shared a version of this blog post as a PDF on Figshare, so it gets a digital object identifier. Citation is:

HASTAC2013 Interactive Archive. Ernesto Priego. figshare.
http://dx.doi.org/10.6084/m9.figshare.693045

Retrieved 09:12, Apr 29, 2013 (GMT)

As usual, with many thanks to Martin Hawksey.