Great! News! People! Fake! Donald’s Tweets: 18 January 2017 to 18 January 2018

Image via The Telegraph

In two days it will be a year since the inauguration of Twitter user ID 25073877.  Time flies when things are beyond ridiculous, right?

Some of you may remember that I've previously published other posts looking into various aspects of this user's tweetage. I have already detailed the methodology I followed (as well as its acknowledged limitations) in some of those previous posts. This has been a work in progress. See for example this, or this, or even this. There's more if you follow the links.

Anyway, as the anniversary of the inauguration approaches I wanted to share with you, for what it’s worth, some quick numbers from a whole year’s worth of Twitter data.

The dataset I worked with for the purpose of this post is based on a larger Twitter archive I’ve been collecting and studying.

The dataset I looked into on this occasion comprises 2,587 tweets posted between 18/01/2017 06:53 AM EST (GMT-5) and 18/01/2018 08:49 AM EST (GMT-5).

As usual I did some basic text analysis, and some quick comparative quant stuff.
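If you are curious what the term-counting step of that "basic text analysis" amounts to, here is a minimal Python sketch. It is illustrative only: the tweets.csv filename, its text column and the tiny stop word list are assumptions, not my actual pipeline (the posts below describe the Voyant Tools workflow I actually used).

```python
import csv
import re
from collections import Counter

# Hypothetical stop word list: a few common English words plus
# Twitter-specific noise (the real analysis used a customised
# English stop word list in Voyant Tools).
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "on",
              "for", "it", "rt", "amp", "https", "t.co"}

counts = Counter()
with open("tweets.csv", newline="", encoding="utf-8") as f:  # assumed export
    for row in csv.DictReader(f):  # assumes a 'text' column
        for token in re.findall(r"[@#\w'.]+", row["text"].lower()):
            if token in STOP_WORDS or token.startswith("@"):
                continue  # drop stop words and @mentions
            counts[token] += 1

for term, count in counts.most_common(20):
    print(term, count)
```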

20 Most Tweeted Terms

Term Count
great 473
news 190
people 182
fake 166
thank 162
just 160
today 158
president 151
big 145
tax 140
trump 137
america 134
country 128
u.s 125
jobs 116
american 115
time 110
foxandfriends 98
media 98
new 97


Other Twitter Data Numeralia

Twitter Text Counts

Number of ! 1,261
Number of Characters (no spaces, including URLs and usernames) 275,964
Number of Pages (single space, 12pt) 109
Number of Words 50,176

Follower Growth

User followers as of  18/01/2018 08:49 46,815,170
User followers as of 18/01/2017 06:53 20,227,768
Gained followers in the period 26,587,402

Tweets About the Mexico Border Wall

id_str time (EST)
9.53979E+17 18/01/2018 08:16
9.53264E+17 16/01/2018 08:54
9.51229E+17 10/01/2018 18:07
9.50884E+17 09/01/2018 19:16
9.49066E+17 04/01/2018 18:53
9.46732E+17 29/12/2017 08:16
9.38391E+17 06/12/2017 07:53
9.20425E+17 17/10/2017 19:03
9.18063E+17 11/10/2017 06:36
9.08274E+17 14/09/2017 06:20
9.01803E+17 27/08/2017 09:44
8.97833E+17 16/08/2017 10:51
8.97045E+17 14/08/2017 06:38
8.85279E+17 12/07/2017 19:24
8.78014E+17 22/06/2017 18:15
8.56849E+17 25/04/2017 08:36
8.56485E+17 24/04/2017 08:28
8.56172E+17 23/04/2017 11:44
8.56171E+17 23/04/2017 11:42
8.30406E+17 11/02/2017 08:18
8.24617E+17 26/01/2017 08:55
8.24084E+17 24/01/2017 21:37
8.23147E+17 22/01/2017 07:35

[hydrate tweets using twarc]
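The note above refers to 'hydration': Twitter's developer guidelines allow sharing Tweet IDs, which anyone can turn back into full Tweets via the API. One caveat: the id_str values in the table above were mangled into scientific notation by the spreadsheet, so the full 18-digit ID strings are needed. A minimal sketch, assuming twarc 1.x and a hypothetical ids.txt file with one full ID per line:

```python
from twarc import Twarc

# Placeholder credentials; twarc can also read them from ~/.twarc
# after running `twarc configure` on the command line.
t = Twarc("CONSUMER_KEY", "CONSUMER_SECRET",
          "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# Deleted or protected Tweets are simply not returned by the API.
with open("ids.txt") as ids:
    for tweet in t.hydrate(ids):
        print(tweet["id_str"], tweet["full_text"])
```

The command-line equivalent is `twarc hydrate ids.txt > tweets.jsonl`.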

The usual caveats apply. Numbers must be taken with a pinch of salt: the Twitter Search API is not a complete index of all Tweets, but rather an index of recent Tweets. My archive has collected Tweets every hour, which means, for instance, that Tweets deleted promptly in between collections do not get archived.

I have attempted to refine the dataset, but duplicated Tweets might have stubbornly survived, which in turn would logically have affected the counts. However, in spite of these limitations, the data is indicative and potentially useful and/or interesting as documentation of current and recent historical events. For what it's worth.

We've lived with this user's tweets daily, and we are very much aware of the kind of discourse developed through the constant, reliably exasperating tweetage. So these basic numbers will most likely not tell you anything you weren't already aware of. A simile occurs to me: we are all aware of the daily, cumulative effects of stress, or, say, ageing, but sometimes it is only when we compare snapshots that we realise the true extent of their effects.

#rfringe17: Top 230 Terms in Tweetage

Repository Fringe 2017 logo

tl;dr

Repository Fringe is a gathering for repository managers and others interested in research data repositories and publication repositories.

I collected an archive of #rfringe17 containing 1,118 Tweet IDs. I then analysed the text of the tweets with Voyant Tools to identify the most frequent terms, and manually refined the results to 230 terms.

I collected an archive of #rfringe17 tweets using TAGS. The key stats from the archive:

Number of Tweets in Archive 1,118
Number of usernames in Archive 215
First Tweet Collected 26/07/2017 14:58:12
Last Tweet Collected 05/08/2017 08:00:06

From http://www.repositoryfringe.org/:

Repository Fringe is a gathering for repository managers and others interested in research data repositories and publication repositories. Participation is a key element – the event is designed to encourage all attendees to share their repository experiences and expertise.

2017 marks the 10th Repo Fringe where we will be celebrating progress we have made over the last 10 years to share content beyond borders and debating future trends and challenges.

It took place in Edinburgh, 3–4 August 2017.

If you are not new to this blog you will guess that I could not resist running the text of the collected tweets through Voyant Tools to obtain the term counts in the corpus with its Terms tool. As usual I applied the English stop words filter, which I customised to include Twitter-specific terms (such as https, t.co, etc.) and the list of usernames.

I then manually refined the resulting data to remove smileys and any remaining usernames (some might have survived, as it is sometimes hard to disambiguate ordinary terms from usernames). I limited the results to the top 230 terms.
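That manual refinement could be partly automated. A rough sketch, assuming a two-column term,count CSV exported from Voyant (no header row) and a one-per-line list of the archive's usernames; both filenames are hypothetical:

```python
import csv

# Usernames present in the archive, one per line (hypothetical file).
with open("usernames.txt") as f:
    usernames = {line.strip().lstrip("@").lower() for line in f}

refined = []
with open("voyant_terms.csv", newline="") as f:  # hypothetical Voyant export
    for term, count in csv.reader(f):
        # Crude heuristics: drop known usernames, and drop non-ASCII
        # tokens (smileys/emoji); note this also loses accented words.
        if term.lower() in usernames or not term.isascii():
            continue
        refined.append((term, int(count)))

for term, count in refined[:230]:  # keep the top 230 terms
    print(term, count)
```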

Do take the counts with a pinch of salt as I did not clean the export from TAGS so Tweet duplicates and perhaps even some spam (who knows) might have remained.

Term Count
research 109
open 106
data 104
wikidata 75
oa 72
openscience 66
repository 63
repofringe 56
repositories 53
libraries 51
openresleeds 49
copyright 46
just 43
science 42
good 41
impact 41
thanks 41
day 39
access 38
poster 36
work 35
openaccess 34
talk 34
edinburgh 30
today 30
great 29
ucl 29
sherpa 28
read 27
want 27
event 26
project 26
really 26
time 26
cool 25
fringe 25
policy 24
metadata 23
publishers 23
publishing 23
says 23
colleague 22
policies 22
wikipedia 22
workflow 22
guide 21
millar 21
useful 21
comprehensive 20
content 20
fascinating 20
interesting 20
liveblogs 20
rdm 20
institutional 19
issue 19
it’s 19
liveblog 19
look 19
new 19
think 19
workshop 19
check 18
citizen 18
events 18
group 18
ip 18
management 18
need 18
outputs 18
presentation 18
rescue 18
session 18
trump 18
casrai 17
cycle 17
excellent 17
journal 17
lots 17
promotion 17
query 17
resource 17
uk 17
best 16
future 16
press 16
stuff 16
gallery 15
i’m 15
key 15
ref 15
showing 15
successful 15
support 15
thank 15
working 15
art 14
come 14
core 14
fun 14
miss 14
nice 14
process 14
provide 14
reminding 14
university 14
using 14
way 14
add 13
beautiful 13
demo 13
deposit 13
eprints 13
forward 13
funders 13
importance 13
keynote 13
looking 13
paper 13
phd 13
researchers 13
vote 13
e.g 12
era 12
especially 12
feedback 12
generation 12
got 12
let 12
needed 12
observation 12
recent 12
report 12
review 12
showcase 12
site2cite 12
star 12
theses 12
try 12
we’re 12
weirdness 12
advises 11
attendees 11
boat 11
broken 11
coar 11
control 11
criteria 11
exposure 11
global 11
institutions 11
like 11
model 11
prof 11
scholarly 11
survey 11
trek 11
use 11
years 11
articles 10
award 10
case 10
excited 10
exposing 10
figshare 10
gifts 10
hear 10
highlighted 10
important 10
initiative 10
integrating 10
introducing 10
live 10
opening 10
platform 10
ref2021 10
spend 10
vision 10
week 10
won 10
workshops 10
altmetric 9
colleagues 9
current 9
discussion 9
evidence 9
field 9
getting 9
i’ll 9
infrastructure 9
inspiring 9
library 9
link 9
list 9
local 9
long 9
make 9
meeting 9
peer 9
post 9
practice 9
preservation 9
problem 9
role 9
service 9
shoutout 9
shows 9
slides 9
sure 9
team 9
thought 9
touch 9
tweets 9
works 9
added 8
based 8
believe 8
better 8
change 8
conference 8
contributing 8
days 8
european 8
example 8
far 8
favourite 8
fully 8
here’s 8
image 8
included 8

Admittedly, sharing this data as an HTML table is not the best way of doing it, but hey. I have the source data if anyone is interested; Twitter developer guidelines allow the sharing of Tweet IDs. In this case the source data is composed of a dataset of 1,118 Tweet ID strings (id_str).

Maybe I missed it, but in the list above I could not find 'bepress' or 'elsevier', by the way…

People, Government: Top 300 Terms in the Conservative and Labour Manifestos 2017 (Counts and Trends)

A word cloud of the most frequent 500 terms in the Conservative Manifesto 2017. Word cloud created with Voyant Tools.

The Labour and Conservative Manifestos 2017 are arguably two of the most important public documents in the UK these days. I have just deposited the following data on figshare:

Priego, Ernesto (2017): Top 300 Terms in the Conservative and Labour Manifestos 2017 (Counts and Trends). figshare. https://doi.org/10.6084/m9.figshare.5016983.v1

I thought some of you may be interested in practising some distant reading, or may have some fun composing your own Manifesto…

Word Counts and Trends in the Letter Triggering Article 50

PM Theresa May. Photo: Crown Copyright; published under the Open Government Licence.


As you know I have an interest in political discourse and communications; I find it interesting to see which terms are used when and how frequently in which contexts.

I’m not sure counting words and calculating a term’s trend in a corpus or document (particularly a brief, contemporary document) tells us anything ‘new’, but it is, at least, a different or alternative way to look into a text, to, let’s say, get into it. It perhaps undresses a text, leaving words naked as quantified signals (or perhaps bricks… that could be used to build something different using the exact same components).

I'm aware I still need to do an update on my Trump Tweets data collection, but in the meanwhile, closer to home perhaps, I have deposited on figshare a CSV file listing counts and trends of 459 terms or word forms in the full text of Prime Minister Theresa May's letter to Donald Tusk triggering Article 50 (29 March 2017).

Counts and Trends of 459 Terms in ‘Prime Minister’s letter to Donald Tusk triggering Article 50’ (29 March 2017). figshare. https://doi.org/10.6084/m9.figshare.4801591.v1

English stop words were applied. Text analysis performed with Voyant Tools 2.2, CC BY Stéfan Sinclair & Geoffrey Rockwell (2017).

The data shared is the result of text analysis of a document published on the www.planforbritain.gov.uk website, which is published under the Open Government Licence. The data shared here obeys the terms of that licence.

www.planforbritain.gov.uk is subject to Crown copyright protection unless otherwise indicated. Read the Crown Copyright page on the National Archives website for more information.

Android vs iPhone: Trends in a Month’s Worth of Trumpian Tweetage

What’s in a month’s worth of presidential tweetage?

I prepared a dataset containing a total of 123 public Tweets and corresponding metadata from user_id_str 25073877, posted between 15 February 2017 06:40:32 and 15 March 2017 08:14:20 Eastern Time (this figure does not factor in any tweets the user may have deleted shortly after publication). Of the 123 Tweets, 68 were published from Android and 55 from iPhone. The whole text of the Tweets in the dataset accounts for 2,288 words, or 12,364 characters (no spaces; including URLs).

Using the Trends tools from Voyant Tools by Stéfan Sinclair & Geoffrey Rockwell I visualised the raw frequencies of the terms ‘Android’ and ‘iPhone’ in this dataset over 30 segments (more or less corresponding to the length of the month covered in the dataset) where each timestamped Tweet, sorted in chronological order, had its corresponding source indicated.

The result looked like this:

Raw frequency of Tweets per source in 30 segments by realdonaldtrump between 15 February 2017 06:40:32 and 15 March 2017 08:14:20 Eastern Time. Total: 123 Tweets: 68 from Android; 55 from iPhone. Data collected and analysed by Ernesto Priego. CC-BY. Chart made with Trends, Voyant Tools by Stéfan Sinclair & Geoffrey Rockwell (CC 2017).

The chart does indeed reflect the higher number of Tweets from Android, and it also shows that over the whole document both sources are present throughout, in spite of more frequent absences of Tweets from iPhone. The question, as usual, is what this tells us. Back on 9 August 2016 David Robinson published an insightful analysis in which he concluded that "he [Trump] writes only the (angrier) Android half". With the source data I have gathered so far it would be possible (given the time and right circumstances) to perform a content analysis of Tweets per source, in order to confirm or reject any potential correlations between types of Tweets (re: tone, function, sentiment, time of day) and the source used to post them.
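A chart like this can also be sketched outside Voyant. Below is a toy Python/matplotlib version of the same idea; the tweets list is a small stand-in, not the actual 123-Tweet dataset:

```python
import matplotlib.pyplot as plt

# Stand-in data: (timestamp, source) pairs sorted chronologically.
tweets = [("2017-02-15 06:40", "Android"), ("2017-02-16 08:02", "Android"),
          ("2017-02-18 19:15", "iPhone"), ("2017-02-20 07:33", "Android"),
          ("2017-03-01 12:10", "iPhone"), ("2017-03-15 08:14", "iPhone")]

def source_trend(tweets, source, segments=30):
    """Count Tweets matching a source in each chronological segment."""
    size = max(1, len(tweets) // segments)
    return [sum(1 for _, s in tweets[i * size:(i + 1) * size] if s == source)
            for i in range(segments)]

plt.plot(source_trend(tweets, "Android"), label="Android")
plt.plot(source_trend(tweets, "iPhone"), label="iPhone")
plt.xlabel("Segment (chronological)")
plt.ylabel("Raw frequency")
plt.legend()
plt.show()
```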

Eyeballing the data, specifically from Inauguration Day until the present, does not seem to provide unambiguous evidence that the Tweets are undoubtedly written by two different persons (or more). What is factual is that the Tweets do come from different sources (see my previous post), but at the moment, as with everything else this administration has been doing, my cursory analysis has only found conflicting insights: for example, a Tweet one would perhaps have expected to be posted from iPhone (attributable hypothetically to a potentially less inflammatory aide) was in fact posted from Android, and vice versa.

I may be wrong, but at the moment I cannot see any evidence of a predictable pattern, let alone strategy, behind the alternation between Android and iPhone (the only two types of source used to publish Tweets from the account in question in the last month). Most of the time Tweets by source type come in sequences of four or more, but sometimes a lone Tweet from a different source is sandwiched in between.

More confusingly, all of the Tweets published between 08/03/2017 18:50 and 15/03/2017 08:14:20 had iPhone as their source, without exception. Attention to detail is required to run robust statistical and content analyses that consider complete timestamps and further code the Tweet text and time data into more discrete categories, attempting a high level of granularity at both the temporal (time of publishing; ongoing documented events) and textual (content; discourse) levels. (If you are reading this and would like to take a look at the dataset, DM me via Twitter.)

Anyway. In case you are curious, here are the top 20 most frequent words in the text of the Tweets, per source, in this dataset (15 February 2017 06:40:32 to 15 March 2017 08:14:20 Eastern Time). Analysis courtesy of Voyant Tools, applying a customised English stop words list (excluding Twitter-specific terms like rt, t.co and https, but leaving terms in hashtags).

Android iPhone
Term Count Trend Term Count Trend
fake 11 0.007795889 great 16 0.016129032
great 11 0.007795889 jobs 14 0.014112903
media 10 0.007087172 america 6 0.006048387
obama 10 0.007087172 trump 6 0.006048387
election 9 0.006378455 american 5 0.005040322
just 9 0.006378455 join 5 0.005040322
news 9 0.006378455 big 4 0.004032258
big 8 0.005669738 healthcare 4 0.004032258
failing 6 0.004252303 meeting 4 0.004032258
foxandfriends 6 0.004252303 obamacare 4 0.004032258
president 6 0.004252303 thank 4 0.004032258
russia 6 0.004252303 u.s 4 0.004032258
democrats 5 0.003543586 whitehouse 4 0.004032258
fbi 5 0.003543586 address 3 0.003024194
house 5 0.003543586 better 3 0.003024194
new 5 0.003543586 day 3 0.003024194
nytimes 5 0.003543586 exxonmobil 3 0.003024194
people 5 0.003543586 investment 3 0.003024194
white 5 0.003543586 just 3 0.003024194
american 4 0.002834869 make 3 0.003024194

Android vs iPhone: Most Frequent Words from_user_id_str 25073877 Per Source

I have archived 3,603 public Tweets from_user_id_str 25073877 published between 27/02/2016 00:06 and 27/02/2017 12:06 (GMT -5, Washington DC Time). This is almost exactly a year’s worth of Tweets from the account in question.

Eight source types were detected in the dataset. Most of the Tweets were published either from iPhone (46%) or Android (45%).

The Tweet counts per source are as follows:


Instagram 2
MediaStudio 1
Periscope 1
Twitter Ads 1
Twitter for Android 1629
Twitter for iPad 22
Twitter for iPhone 1660
Twitter Web Client 287
Total 3603


The table above visualised as a bar chart, just because:


Source of 3603 Tweets from_user_id_str 25073877 (27/02/2016 00:06 to 27/02/2017 12:06) Bar chart.
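For anyone who wants to reproduce the chart, a minimal matplotlib sketch using the counts from the table above:

```python
import matplotlib.pyplot as plt

# Counts taken from the table above.
sources = {
    "Instagram": 2, "MediaStudio": 1, "Periscope": 1, "Twitter Ads": 1,
    "Twitter for Android": 1629, "Twitter for iPad": 22,
    "Twitter for iPhone": 1660, "Twitter Web Client": 287,
}

plt.barh(list(sources), list(sources.values()))
plt.xlabel("Tweet count")
plt.title("Source of 3,603 Tweets (27/02/2016 to 27/02/2017)")
plt.tight_layout()
plt.show()
```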


As a follow-up to a previous post, I share in the table below the top 50 most frequent word forms per source (iPhone and Android) in this set of 3,603 Tweets from_user_id_str 25073877, courtesy of a quick text analysis (applying a customised English stop word list globally) made with Voyant Tools:


Android iPhone
Term Count Trend Term Count Trend
great 276 0.008124816 thank 417 0.015241785
hillary 252 0.00741831 trump2016 215 0.007858475
trump 184 0.005416544 great 190 0.006944698
crooked 162 0.004768914 makeamericagreatagain 165 0.006030922
people 160 0.004710038 join 160 0.005848167
just 151 0.004445099 rt 144 0.00526335
clinton 120 0.003532529 hillary 119 0.004349574
big 107 0.003149838 clinton 118 0.004313023
media 106 0.0031204 america 111 0.004057166
thank 94 0.002767148 trump 104 0.003801309
bad 89 0.002619959 make 89 0.003253043
president 88 0.002590521 new 88 0.003216492
make 86 0.002531646 tomorrow 82 0.002997186
america 85 0.002502208 people 75 0.002741328
cnn 85 0.002502208 maga 73 0.002668226
country 72 0.002119517 today 73 0.002668226
like 72 0.002119517 americafirst 69 0.002522022
u.s 72 0.002119517 draintheswamp 68 0.002485471
time 71 0.00209008 tonight 67 0.00244892
said 67 0.001972329 ohio 66 0.002412369
jobs 66 0.001942891 vote 63 0.002302716
vote 63 0.001854578 just 61 0.002229614
win 63 0.001854578 florida 59 0.002156512
new 62 0.00182514 crooked 52 0.001900654
going 59 0.001736827 going 49 0.001791001
news 58 0.001707389 imwithyou 49 0.001791001
bernie 56 0.001648513 president 49 0.001791001
foxnews 55 0.001619076 votetrump 49 0.001791001
good 54 0.001589638 tickets 46 0.001681348
wow 53 0.0015602 american 43 0.001571695
job 50 0.001471887 time 43 0.001571695
nytimes 50 0.001471887 pennsylvania 42 0.001535144
republican 50 0.001471887 poll 41 0.001498593
0 49 0.001442449 soon 41 0.001498593
today 49 0.001442449 support 41 0.001498593
totally 49 0.001442449 enjoy 38 0.00138894
enjoy 48 0.001413012 campaign 37 0.001352389
cruz 46 0.001354136 rally 37 0.001352389
election 46 0.001354136 carolina 35 0.001279287
look 46 0.001354136 north 35 0.001279287
want 46 0.001354136 live 34 0.001242735
obama 44 0.001295261 speech 33 0.001206184
dishonest 41 0.001206947 california 18 0.000657919
can’t 39 0.001148072 hillaryclinton 18 0.000657919
night 39 0.001148072 honor 18 0.000657919
really 39 0.001148072 job 18 0.000657919
show 39 0.001148072 nevada 18 0.000657919
way 39 0.001148072 right 18 0.000657919
ted 38 0.001118634 supertuesday 18 0.000657919


I thought you’d like to know.

Words Donald Likes So Far!

Image via The Telegraph, 20 September 2016

Between 20/01/2017 07:31:53 AM Eastern Time and 06/02/2017 07:07:55 AM Eastern Time, Donald has…

  • …published 106 Tweets with his realDonaldTrump Twitter account. (He has published at least two more in the time I've been drafting this.)
  • In this collection only one was published from the Twitter Web Client (the first one in this set).
  • 34 Tweets were published from Twitter for iPhone (mostly Tweets between 20/01/2017 23:56 and 02/02/2017 12:29:16).
  • The remaining 71 Tweets were published from Twitter for Android.
  • All his latest Tweets, between 03/02/2017 06:24:51 and 06/02/2017 07:07:55, were published from Twitter for Android.
  • In this corpus he typed about 2,096 words or word forms (including URLs, Twitter account mentions and hashtags). This is about 5 pages.
  • 67 of his 106 Tweets include exclamation marks (!).
  • 31 of these 106 Tweets have included at least one word in all caps.

Sorry for all the bold type above.
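Counts like the exclamation-mark and all-caps tallies above are easy to reproduce. A rough Python sketch, with toy example texts rather than the actual Tweets:

```python
import re

def tweet_stats(texts):
    exclamations = sum(t.count("!") for t in texts)
    # Tweets with at least one all-caps word of two or more letters;
    # crude, since it also matches acronyms such as "USA" or "FBI".
    all_caps = sum(1 for t in texts if re.search(r"\b[A-Z]{2,}\b", t))
    return exclamations, all_caps

texts = ["SO UNFAIR!", "Big day today.", "FAKE NEWS!"]  # toy examples
print(tweet_stats(texts))  # -> (2, 2)
```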

Finally, these are the top 50 most frequent words (and emojis) in this set of 106 realDonaldTrump Tweets, courtesy of a quick text analysis (applying a customised English stop word list globally) made with Voyant Tools:

Term Count Trend
people 19 0.008796296
country 13 0.006018519
great 12 0.005555556
u.s 10 0.00462963
america 9 0.004166667
news 9 0.004166667
bad 8 0.003703704
fake 8 0.003703704
security 7 0.003240741
american 6 0.002777778
court 6 0.002777778
decision 6 0.002777778
enjoy 6 0.002777778
jobs 6 0.002777778
judge 6 0.002777778
just 6 0.002777778
meeting 6 0.002777778
today 6 0.002777778
ban 5 0.002314815
going 5 0.002314815
iran 5 0.002314815
make 5 0.002314815
states 5 0.002314815
thank 5 0.002314815
tonight 5 0.002314815
beginning 4 0.001851852
big 4 0.001851852
bring 4 0.001851852
coming 4 0.001851852
day 4 0.001851852
deal 4 0.001851852
election 4 0.001851852
illegal 4 0.001851852
interview 4 0.001851852
interviewed 4 0.001851852
like 4 0.001851852
long 4 0.001851852
nytimes 4 0.001851852
obama 4 0.001851852
p.m 4 0.001851852
party 4 0.001851852
president 4 0.001851852
supreme 4 0.001851852
united 4 0.001851852
whitehouse 4 0.001851852
yesterday 4 0.001851852
ºðÿ 4 0.001851852
abc 3 0.001388889
administration 3 0.001388889

So these are the most frequent presidential words so far.

I thought you would like to know.

Sheffield Digital Humanities Congress 2016: #dhcshef 100 Most Frequent Terms

A view of the #dhcshef 2016 dataset created with Martin Hawksey's TAGS Explorer

The Sheffield Digital Humanities Congress 2016 was held from the 8th to the 10th of September 2016 at the University of Sheffield. The full conference programme is available here: http://www.hrionline.ac.uk/dhc.

The event’s official hashtag was the same as in previous editions, #dhcshef.

I made a collection of Tweets tagged with #dhcshef published publicly between Monday 5 September 2016 at 17:54:58 +0000 and Saturday 10 September 2016 at 23:37:06 +0000. This time I used Tweepy 3.5.0, a Python wrapper for the Twitter API, for the collection. To compare results I also used, as usual, Martin Hawksey's TAGS, with similar results (I only collected Tweets from accounts with at least 1 follower).
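For context, collecting a hashtag with Tweepy 3.x looks roughly like the sketch below. The credentials are placeholders, and the follower filter is applied after collection here for simplicity:

```python
import tweepy

# Placeholder credentials from a registered Twitter app.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# In Tweepy 3.x the Search API is exposed as api.search; Cursor pages
# through results until the (roughly one-week) search window runs out.
for status in tweepy.Cursor(api.search, q="#dhcshef", count=100).items():
    if status.user.followers_count >= 1:  # mirror the 1-follower filter
        print(status.id_str, status.user.screen_name, status.text)
```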

As on previous occasions, I extracted the text and usernames from this dataset and used Voyant Tools for a basic text analysis. The dataset contained 1,479 Tweets posted by 256 different accounts; 841 of those were RTs. The text of the Tweets composed a corpus with 26,094 total words and 3,057 unique word forms.

I used Voyant's Terms tool to get the most frequent terms, applying an edited English stop words list that included Twitter and congress-specific terms (this means that words expected to be frequent, like 'digital', 'humanities', 'congress' and 'sheffield', as well as usernames, project names and people's names, were filtered out). I exported a list of the 500 most frequent terms and then manually refined the data so that remaining people's or project names were removed. (This process is not case sensitive, so I may have made mistakes and further disambiguation and refining would be required.) If you are interested, I previously detailed a similar methodology here.

Here’s my resulting list of the 100 most frequent terms.

Term Count
great 106
project 98
data 76
research 64
students 63
word 58
funding 55
work 55
just 53
spread 53
use 51
opportunity 48
text 47
historical 46
oa 46
looking 45
open 45
editions 44
pedagogy 40
academic 38
access 36
keynote 36
like 36
analysis 35
follow 35
using 34
book 33
new 33
projects 33
university 33
important 32
innovation 32
today 32
tomorrow 32
early 31
minimal 31
paper 31
south 31
content 30
excellent 30
love 30
social 30
look 29
talking 29
tools 29
discussing 28
global 28
grants 28
london 28
network 28
review 28
forward 27
libraries 27
resources 27
sudan 27
history 26
talk 26
books 25
online 25
programme 25
really 25
teach 25
teaching 25
digitisation 24
issues 24
tactical 24
archive 23
critique 23
make 23
different 22
need 22
peer 22
session 22
cultural 21
heritage 21
starts 21
studies 21
value 21
art 20
cool 20
don’t 20
good 20
live 20
press 20
start 20
arts 19
available 19
colleagues 19
delegates 19
going 19
metadata 19
presenting 19
day 18
digitised 18
let’s 18
networks 18
notes 18
person 18
started 18
begins 17

Please bear in mind that RTs count as Tweets, and therefore the repetition implicit in RTs directly affects the frequent term counts. Which terms made it into the top 100 reflects my own bias (I personally didn't want to see how many times 'digital' or 'humanities' was repeated), but individual trend counts remain the same regardless.

I appreciate the stop words selection is indeed subjective (deictics like 'tomorrow' or 'today' may very well mean very little). It's up to the reader to judge if such a listing offers any insights at all; as Twitter moves relentlessly and such data remains a moving target, I'd like to believe that collecting and looking into frequent terms offers at least another point of view, if not a gateway, into how a particular academic event is represented/discussed/reported on Twitter. Perhaps it's my enjoyment of poetry that makes me think that seeing words out of context (or recontextualised) like this can offer some kind of food for thought or creativity.

Interestingly, the dataset showed user_lang metadata other than en or en-GB: de, es, fr, it, nl and ru were also present, even if in a minority. The dataset also showed that some sources are clearly identifiable as bots.

I am fully aware this would be more interesting and useful if there were opportunities for others to replicate the text analysis through access to the source dataset I used. There are lots of interesting types of analysis that could be run and data to focus on in such a dataset as this. I am simply sharing this post right now as a quick indicative update after the event concluded.


Libraries! Most Frequent Terms in #WLIC2016 Tweets (part IV)

IFLA World Library and Information Congress
82nd IFLA General Conference and Assembly
13–19 August 2016, Columbus, Ohio, USA. Copyright by IFLA, CC BY 4.0.


This is part IV. For necessary context, methodology, limitations, please see here (part 1),  here (part 2), and here (part 3).

Since this was published and shared for the first time I may have done new edits. I often come back to posts once they have been published to revise them.

Throughout the process of performing the day-by-day text analysis I became aware of other limitations to take into account, and I have revised part 3 accordingly.

Summary

Here’s a summary of the counts of the source (unrefined) #WLIC2016 archive I collected:

Number of Links 12435
Number of RTs estimate based on occurrence of RT 14570
Number of Tweets 23552
Unique Tweets (used to monitor quality of archive) 23421
First Tweet in Archive 14/08/2016 11:29:03 EDT
Last Tweet in Archive 22/08/2016 04:20:53 EDT
In Reply Ids 270
In Reply @s 429
Number of Tweeters 3035

As previously indicated, the Tweet count includes RTs. This count might require further deduplication, and it might include bots' Tweets and possibly some unrelated Tweets.

Here’s a summary of the Tweet count of the #WLIC2016  dataset I refined from the complete archive. As I explained in part 3 I organised the Tweets into conference days, from Sunday 14 to Thursday 18 August. Each day was a different corpus to analyse. I also analysed the whole set as a single corpus to ensure the totals replicated.

Day Tweet count
Sunday 14 August 2016 2543
Monday 15 August 2016 6654
Tuesday 16 August 2016 4861
Wednesday 17 August 2016 4468
Thursday 18 August 2016 3801
Sunday – Thursday (total) 22327


The Most Frequent Terms

The text analysis involved analysing each corpus, first obtaining a 'raw' output of the 300 most frequent terms and their counts. As described in previous posts, I then applied an edited English stop words list, followed by manual editing of the top 100 most frequent terms (for the shared dataset) and of the top 50 for this post. Unlike before, in this case I removed 'barack' and 'obama' from Thursday's and Monday's corpora, and tried to remove usernames and hashtags, though it's possible that further disambiguation and refining might be needed in those top 100 and top 50.

The text analysis of the Sun-Thu Tweets as a single corpus gave us the following Top 50:

#WLIC2016 Sun-Thu Top 50 Most Frequent Terms (stop-words applied; edited)

Rank Term Count
1 libraries 2895
2 library 2779
3 librarians 1713
4 session 1467
5 access 872
6 world 832
7 public 774
8 copyright 766
9 people 757
10 need 750
11 data 746
12 make 733
13 privacy 674
14 digital 629
15 new 615
16 wikipedia 602
17 indigenous 593
18 use 574
19 information 555
20 great 539
21 knowledge 512
22 literacy 502
23 internet 481
24 work 428
25 thanks 419
26 message 416
27 future 412
28 change 379
29 social 378
30 open 369
31 just 354
32 research 353
33 know 330
34 community 323
35 important 319
36 oclc 317
37 collections 312
38 books 300
39 learn 300
40 opening 291
41 read 289
42 impact 287
43 place 282
44 good 280
45 services 277
46 national 276
47 best 272
48 latest 269
49 report 267
50 users 266

As mentioned above I also analysed each day as a single corpus. I refined the ‘raw’ 300 most frequent terms per day to a top 100 after stop words and manual editing. I then laid them all out as a single table for comparison.

#WLIC2016 Top 50 Most Frequent Terms per Day Comparison (stop-words applied; edited)

Rank Sun 14 Aug Mon 15 Aug Tue 16 Aug Wed 17 Aug Thu 18 Aug
1 libraries library library libraries libraries
2 library libraries privacy library library
3 librarians librarians libraries librarians librarians
4 session session librarians indigenous public
5 access copyright session session session
6 world wikipedia people knowledge need
7 public digital data access data
8 copyright make indigenous data impact
9 people world make literacy new
10 need internet access need digital
11 data access wikipedia great world
12 make new use people thanks
13 privacy need information research access
14 digital use world public value
15 new public public new national
16 wikipedia future knowledge marketing change
17 indigenous people copyright general privacy
18 use message homeless open great
19 information collections literacy world work
20 great information oclc archives research
21 knowledge content great just use
22 literacy open homelessness national people
23 internet report need assembly knowledge
24 work space freedom place social
25 thanks trend like make using
26 message great thanks read know
27 future net internet community make
28 change work info social services
29 social neutrality latest reading skills
30 open making experiencing work award
31 just update theft information information
32 research books important use learning
33 know collection just learn users
34 community social subject share book
35 important design change matters user
36 oclc data guidelines key best
37 collections thanks digital know collections
38 books librarian students global academic
39 learn know know government measure
40 opening shaping online life poland
41 read google protect thanks community
42 impact change working important learn
43 place literacy statement development outcomes
44 good just work love share
45 services technology future impact time
46 national online read archivist media
47 best poster award good section
48 latest info create books important
49 report working services cultural service
50 users law good help closing

I have shared on figshare a dataset containing the summaries above as well as the raw top 300 most frequent terms, both for the whole set and divided per day. The dataset also includes the top 100 most frequent terms lists per day, which I manually edited after applying the edited English stop word filter.

You can download the spreadsheet from figshare:

Priego, Ernesto (2016): #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2

Please bear in mind that, as refining was done manually and the Terms tool does not always seem to apply stop words evenly, there might be errors. This is why the raw output was shared as well. This data should be taken as indicative only.

As is increasingly recommended for data sharing, the CC-0 license has been applied to the resulting output in the repository. It is important, however, to bear in mind that some terms appearing in the dataset might be individually licensed differently; copyright of the source Tweets (and sometimes of individual terms) belongs to their authors. Authorial/curatorial/collection work has been performed on the shared file as a curated dataset resulting from analysis, in order to make it available as part of the scholarly record. If this dataset is consulted, attribution is always welcome.

Ideally for proper reproducibility and to encourage other studies the whole archive dataset should be available.  Those wishing to obtain the whole Tweets should still be able to get them themselves via text and data mining methods.

Conclusions

Indeed, for us today there is absolutely nothing surprising about the term 'libraries' being the most frequent word in Tweets coming from IFLA's World Library and Information Congress. Looking at the whole dataset, however, provides an insight into other frequent terms used by Library and Information professionals in the context of libraries. These terms might not remain frequent for long, and might not have been frequent words in the past (I can only hypothesise; having evidence would be nice).

A key hypothesis for me guiding this exercise has been that perhaps by looking at the words appearing in social media outputs discussing and reporting from a professional association’s major congress, we can get a vague idea of where a sector’s concerns are/were.

I guess it can be safely said that words become meaningful in context. In an age in which repetition and frequency are key to public constructions of cultural relevance ('trending topics' increasingly define the news agenda, and what people talk about and how they talk about things), the repetition and frequency of key terms might provide a type of meaningful evidence in itself. Evidence, however, is just the beginning; further interpretation and analysis must indeed follow.

One cannot obtain the whole picture from decomposing a collectively, socially, publicly created textual corpus (or perhaps any corpus, unless it is a list of words from the start) into its constituent parts. It could also be said that many tools and methods often tell us more about themselves (and those using them) than about the objects of study.

So far, text analysis (Rockwell 2003) and 'distant reading' through automated methods have focused on working with books (Ramsay 2014). However, I'd like to suggest that this kind of text analysis can be another way of reading social media texts and can offer another way to contribute to the assessment of their cultural relevance as living documents of a particular setting and moment in time. Who knows, they might also be telling us something about the present perception and activity of a professional field, and might help us to compare it with those in the future.

Other Considerations

Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailon, Sandra, et al, 2012).

Apart from the filters and limitations already declared, it cannot be guaranteed that each and every Tweet tagged with #WLIC2016 during the indicated period was analysed. The dataset was shared for archival, comparative and indicative educational research purposes only.

Only content from public accounts, obtained from the Twitter Search API, was analysed.  The source data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account.

These posts and the resulting dataset contain the results of analyses of Tweets that were published openly on the Web with the queried hashtag; the content of the Tweets is the responsibility of the original authors. Original Tweets are likely to be copyright of their individual authors, but please check individually.

This work is shared to archive, document and encourage open educational research into scholarly activity on Twitter. The resulting dataset does not contain complete Tweets nor Twitter metadata. No private personal information was shared. The collection, analysis and sharing of the data has been enabled and allowed by Twitter’s Privacy Policy. The sharing of the results complies with Twitter’s Developer Rules of the Road.

A hashtag is metadata users choose freely to use so their content is associated with, directly linked to and categorised under the chosen hashtag. The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case). Tweets published publicly by scholars or other professionals during academic conferences are often publicly tagged (labeled) with a hashtag dedicated to the conference in question. This practice used to be confined to a few 'niche' fields; it is increasingly becoming the norm rather than the exception.

Though every reason for Tweeters’ use of hashtags cannot be generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour.

In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter’s Privacy and data sharing policies.

Professional associations like the Modern Language Association and the American Psychological Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter's search API has well-known temporal limitations for retrospective historical search and collection.

Beyond individual Tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. Though this work has limitations and might not be thoroughly systematic, it is hoped it can contribute to developing new insights into a discipline’s public concerns as expressed on Twitter over time.

References

González-Bailon, Sandra and Wang, Ning and Rivero, Alejandro and Borge-Holthoefer, Javier and Moreno, Yamir, Assessing the Bias in Samples of Large Online Networks (December 4, 2012).  Available at SSRN: http://dx.doi.org/10.2139/ssrn.2185134

Priego, Ernesto (2016) #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2

Ramsay, Stephen (2014) “The Hermeneutics of Screwing Around; or What You Do with a Million Books.” In Pastplay: Teaching and Learning History with Technology, edited by Kevin Kee, 111-20. Ann Arbor: University of Michigan Press, 2014. Also available at http://quod.lib.umich.edu/d/dh/12544152.0001.001/1:5/–pastplay-teaching-and-learning-history-with-technology?g=dculture;rgn=div1;view=fulltext;xc=1

Rockwell, Geoffrey (2003) “What is Text Analysis, Really? [PDF]” preprint, Literary and Linguistic Computing, vol. 18, no. 2, 2003, p. 209-219.

What’s in a Word? Most Frequent Terms in #WLIC2016 Tweets (part III)

IFLA World Library and Information Congress
82nd IFLA General Conference and Assembly
13–19 August 2016, Columbus, Ohio, USA. Copyright by IFLA, CC BY 4.0.

This is part three. For necessary context please start here (part 1) and here (part 2). The final, fourth part is here.

It’s Friday already and the sessions from IFLA’s WLIC 2016 have finished. I’d like to finish what I started and complete a roundup of my quick (but in practice not-so-quick) collection and text analysis of a sample of #WLIC2016 Tweets. My intention is to finish this with a fourth and final blog post following this one and to share a dataset on figshare as soon as possible.

As previously, I customised the spreadsheet settings to collect only Tweets from accounts with at least one follower and to reflect the Congress's location and time zone. Before exporting as CSV I did a basic automated deduplication, but I did not do any further data refining (which means that non-relevant or spam Tweets may be included in the dataset).
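The basic automated deduplication mentioned above can be done with pandas by dropping rows that share an id_str. A sketch; the filenames are assumptions:

```python
import pandas as pd

# Read id_str as a string: treated as a number, an 18-digit Tweet ID
# loses precision and ends up in scientific notation.
df = pd.read_csv("wlic2016_tags_export.csv", dtype={"id_str": str})
df = df.drop_duplicates(subset="id_str")
df.to_csv("wlic2016_deduplicated.csv", index=False)
```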

What follows is a basic quantitative summary of the initial complete sample dataset:

  • Total Tweets: 22,540 Tweets (includes RTs)
  • First Tweet in complete sample dataset: Sunday 14/08/2016 11:29:03 EDT
  • Last Tweet in complete sample dataset: Friday 19/08/2016 04:20:43 EDT
  • Number of links:  11,676
  • Number of RTs:    13,859
  • Number of usernames: 2,811

The Congress had activities between Friday 12 August and Friday 19 August, but sessions between Sunday 14 August and Thursday 18 August. Ideally I would have liked to collect Tweets from the early hours of Sunday 14 August, but I started collecting late, so the earliest I got was 11:29:03 EDT. I suppose at least it was before the first panel sessions started. For more context re: timings, see the Congress outline.

I refined the complete dataset to include only the days that featured panel sessions, and I have organised the data in a different sheet per day for individual analysis. I have also created a table detailing the Tweet counts per Congress session day. [Later I realised that though I had the metadata for the Columbus, Ohio time zone, I ended up organising the data into GMT/BST days. There is a five-hour difference, but the collected Tweets per day still roughly correspond to the timings of the conference. Of course many will have participated in the hashtag remotely, not present at the event, and many present will have tweeted asynchronously rather than 'live'. I don't think this makes much of a difference (no pun intended) to the analysis, but it's something I was aware of and that others may or may not want to consider as a limitation.]
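The per-day split described above can likewise be sketched with pandas; the column names follow TAGS conventions but are assumptions here:

```python
import pandas as pd

df = pd.read_csv("wlic2016_deduplicated.csv", dtype={"id_str": str})
# TAGS stores created_at as text; parse it and bucket Tweets by date.
df["created_at"] = pd.to_datetime(df["created_at"], dayfirst=True)

for day, tweets in df.groupby(df["created_at"].dt.date):
    print(day, len(tweets))
    # One plain-text corpus per day, ready to paste into Voyant.
    tweets["text"].to_csv(f"wlic2016_{day}.txt", index=False, header=False)
```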

Tweets collected per day

Day Tweet count
Sunday 14 August 2016 2543
Monday 15 August 2016 6654
Tuesday 16 August 2016 4861
Wednesday 17 August 2016 4468
Thursday 18 August 2016 3801

Total Tweets in refined dataset: 22,327 Tweets.

(Always bear in mind that these figures reflect the Tweets in the collected dataset; it does not follow that this was the total number of Tweets published with the hashtag during that period. Not only do the settings of my querying affect the results; Twitter's search API also has limitations and cannot be assumed to always return the same type or number of results.)

I am still in the process of analysing the dataset. There are of course multiple types of analyses that one could do with this data but bear in mind that in this case I have only focused on using text analysis to obtain the most frequent terms in the text from the Tweets tagged with #WLIC2016 that I collected.

As before, in this case I am using the Terms tool from Voyant Tools to perform a basic text analysis in order to identify the number of total words, unique word forms and most frequent terms per day; in other words, the data from each day became an individual corpus. (The complete refined dataset including all collected days could be analysed as a single corpus as well, for comparison.) I am gradually exporting and collecting the 'raw' output from the Terms tool per day, so that once I have finished applying the stop words to each corpus this output can be compared, and so that it could be reproduced with other stop word lists if desired.

As before, I am using the English stop word list which I previously edited to include Twitter-specific terms (e.g. t.co, amp, https) as well as dataset-specific terms (e.g. the Congress' Twitter account, related hashtags, etc.), but this time what I did differently is that I included all 2,811 account usernames in the complete dataset so they would be excluded from the most frequent terms. These are the usernames from accounts with Tweets in the dataset; other usernames (mentioned in the text of Tweets but which did not themselves Tweet with the hashtag) were logically not filtered, so whenever they are easily identifiable I am painstakingly removing them (manually!) from the remaining list. I am sure there must be a more effective way of doing this, but I find the combination of 'distant' (automated) editing and 'close' (manual) editing interesting and fun.

I am using the same edited stop word list for each analysis. In this case I have also manually removed non-English terms (mostly pronouns and articles). Needless to say, I did this not because I didn't think they were relevant (quite the opposite) but because, even though they had a presence, they were not fairly comparable to the overwhelming majority of English terms (a ranking of most frequent non-English terms would be needed). As I will also have shared the unedited, 'raw' top most frequent terms in the dataset, anyone wishing to look into the non-English terms could ideally do so and run their own analyses without my own subjective stop word list and editing getting in the way. I tried to be as systematic as possible, but disambiguation would be needed (the Terms tool is case and context insensitive, so a term could have been a proper name or a username, and to be consistent I should have removed those too; again, having the raw list would allow others to correct any filtering/curation/stop word mistakes).

I am aware there are far more sophisticated methods of dealing with this data. Personally, doing this type of simple data collection and text analysis is an exercise in, and an interrogation of, data collection and analysis methods and tools as reflective practices. A hypothesis behind it is that the terms a community or discipline uses (and retweets) do say something about those communities or disciplines, at least for a particular moment in time and a particular place in particular settings. Perhaps it also says things about the medium used to express those terms. When 'screwing around' with texts it may be unavoidable to wonder what there is to it beyond 'bean-counting' (what's in a word? what's in a frequent term?), and what there is to social media and academic/professional live-tweeting that can or cannot be quantified. Doing this type of work makes me reflect as well on my own limitations, the limits of text analysis tools, the appropriateness of tools, the importance of replication and reproducibility, and the need to document and to share what has been documented.

I'm also thinking about documentation and the open sharing of data outputs as messages in bottles, or, as has been said of metadata, as 'letters to the future'. I'm aware that this may also seem like navel-gazing of little interest outside those associated with the event in question. I would say that the role of libraries in society at large is more crucial and central than many outside the library and information sector may think (but that's a subject for another time). Perhaps one day in the future it might be useful to look back at what we were talking about in 2016 and what words we used to talk about it. (Look, we were worried about that!) Or maybe no one cares and no one will care, or by then it will be possible to retrieve anything anywhere with great degrees of relevance and precision (including critical interpretation). In the meanwhile, I will keep refining these lists and will share the output as soon as I can.

Next… the results!

The final, fourth part is here.

Most Frequent Terms in #WLIC2016 Tweets (part II)

IFLA World Library and Information Congress
82nd IFLA General Conference and Assembly
13–19 August 2016, Columbus, Ohio, USA. Copyright by IFLA, CC BY 4.0.


The first part of this series provides necessary context.

I now have an edited list of the top 50 most frequent terms, extracted from a cleaned dataset comprising 10,721 #WLIC2016 Tweets published by 1,760 unique users between Monday 15/08/2016 10:11:08 EDT and Wednesday 17/08/2016 07:16:35 EDT.

The analysed corpus contained the raw text of the Tweets (includes RTs), comprising 185,006 total words and 12,418 unique word forms.

Stop words were applied as detailed in the first part of this series, and the resulting list (a raw list of 300 most frequent terms) was further edited to remove personal names, personal Twitter user names, common hashtags, etc.  Some organisational Twitter user names were not removed from the list, as an indication of their ‘centrality’ in the network based on the frequency with which they appeared in the corpus.

So here’s an edited list of the top 50 most frequent terms from the dataset described above:

Term Count
library 1379
libraries 1102
librarians 811
session 715
privacy 555
wikipedia 523
make 484
copyright 465
people 428
digital 378
access 375
use 362
public 340
data 322
need 319
iflabuild2016 308
world 308
information 298
internet 289
new 272
great 259
indigenous 255
iflatrends 240
report 202
knowledge 200
future 187
work 187
libraryfreedom 184
literacy 184
space 180
change 178
thanks 172
oclc 171
open 170
just 169
books 168
trend 165
important 162
info 162
know 162
social 161
net 159
neutrality 159
wikilibrary 158
collections 157
working 157
librarian 154
online 154
making 149
guidelines 148

Is this interesting? Is it useful? I don't know, but I've enjoyed documenting it. Reflecting on different criteria for applying stop words and cleaning and refining terms has also been interesting.

I guess that deep down I believe it's better to document than not to, even if we may think there should be other ways of doing it (otherwise I wouldn't even try). Value judgements about the utility or insightfulness of specific data shared in specific ways are an a posteriori process.

I hope to be able to continue collecting data, and once the congress/conference ends I hope to share a dataset with the raw (unedited, unfiltered) most frequent terms in the text of Tweets published with the event's hashtag. If anyone else is interested they could clean, curate and analyse the data in different ways (wishful thinking, but hey; it's hope that guides us).

What Library Folk Live Tweet About: Most Frequent Terms in #WLIC2016 Tweets

IFLA World Library and Information Congress. Logo copyright by IFLA, CC BY 4.0.

Part 2 is  here, part 3  here and the final, fourth part is here.

IFLA stands for The International Federation of Library Associations and Institutions.

The IFLA World Library and Information Congress 2016 and 82nd IFLA General Conference and Assembly, 'Connections. Collaboration. Community', is currently taking place (13–19 August 2016) at the Greater Columbus Convention Center (GCCC) in Columbus, Ohio, United States.

The official hashtag of the conference is #WLIC2016. Earlier, I shared a searchable, live archive of the hashtag here. (Page may be slow to load depending on bandwidth).

I have looked at the text from 4,945 Tweets published with #WLIC2016 from 14/08/2016 to 15/08/2016 11:16:06 (EDT, Columbus Ohio time). Only accounts with at least 1 follower were included. I collected them with Martin Hawksey’s TAGS.

According to Voyant Tools this corpus had 82,809 total words and 7,506 unique word forms.

I applied an English stop word list which I edited to include Twitter-specific terms (https, t.co, amp (&) etc.), proper names (Barack Obama, other personal usernames) and some French stop words (mainly personal pronouns). I also edited the stop word list to include some dataset-specific terms such as the conference hashtag and other common hashtags, ‘ifla’, etc. (I left others that could also be considered dataset-specific terms, such as ‘session’ though).

The result was a listing of 800 frequent terms (the least frequent terms in the list had been repeated 5 times). I then cleaned the data of any dataset-specific stop words that the stop word list did not filter and created an edited, ordered listing of the most frequent 50 terms. I left in organisations' Twitter user names (including @potus), as well as other terms that may not seem that meaningful on their own (but who knows, they may be).

It must be taken into account that the corpus included Retweets: each RT counted as a single Tweet, which means that the term counts in the list reflect the repetition of text that RTs imply.

If for some reason you are curious about what the most frequent words in #WLIC2016 Tweets were during this initial period (see above), here’s the top 50:

Term Count
libraries 543
copyright 517
librarians 484
library 406
session 374
world 326
message 271
opening 249
access 226
make 204
digital 195
internet 162
future 161
information 157
new 146
use 141
people 138
president 131
potus 125
literacy 118
need 117
oclc 114
ceremony 113
dpla 109
poster 105
thanks 103
collections 102
public 100
delegates 99
cilipinfo 98
countries 95
iflatrends 95
google 93
shaping 91
work 89
drag 83
report 83
create 81
open 81
data 79
content 78
learn 78
latest 77
making 77
fight 76
ifla_arl 75
read 74
info 73
exceptions 69
great 68

So, for what it's worth, those were the 50 most frequent terms in the corpus.

I, for one, not being present in the Congress, found it interesting that ‘copyright’ is the second most frequent term, following ‘libraries’. One notices also that, unsurprisingly, the listing of top most frequent terms includes some key terms (such as ‘access’, ‘internet’, ‘digital’, ‘open’, ‘data’) concerning Library and Information professionals of late.

Were these the terms you’d have expected to make a ‘top 50’ in almost 5,000 Tweets from this initial phase of this particular conference?

The conference hasn’t finished yet of course. But so far, for a libraries and information world congress, which terms would you say are noticeable by their absence in this list? ;-)

Part 2 is  here, part 3  here and the final, fourth part is here.