Libraries! Most Frequent Terms in #WLIC2016 Tweets (part IV)

IFLA World Library and Information Congress
82nd IFLA General Conference and Assembly
13–19 August 2016, Columbus, Ohio, USA. Copyright by IFLA, CC BY 4.0.

 


 

This is part IV. For necessary context, methodology and limitations, please see here (part 1), here (part 2) and here (part 3).

I may have made further edits since this was first published and shared. I often come back to posts after publication to revise them.

Throughout the process of performing the day by day text analysis I became aware of other limitations to take into account and I have revised part 3 accordingly.

Summary

Here’s a summary of the counts of the source (unrefined) #WLIC2016 archive I collected:

  • Number of links: 12435
  • Number of RTs (estimate based on occurrences of “RT”): 14570
  • Number of Tweets: 23552
  • Unique Tweets (used to monitor the quality of the archive): 23421
  • First Tweet in archive: 14/08/2016 11:29:03 EDT
  • Last Tweet in archive: 22/08/2016 04:20:53 EDT
  • In-reply IDs: 270
  • In-reply @s: 429
  • Number of Tweeters: 3035

As previously indicated the Tweet count includes RTs. This count might require further deduplication and it might include bots’ Tweets and possibly some unrelated Tweets.
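Counts like the ones in this summary can be approximated with a few lines of code. What follows is only a minimal sketch, not the way the archive itself produced its figures: the `(username, text)` input shape, the function name `summarise_archive` and the "starts with RT @" rule for estimating Retweets are all assumptions made for illustration.

```python
import re

def summarise_archive(tweets):
    """Return simple counts for a list of (username, text) pairs."""
    texts = [text for _, text in tweets]
    return {
        "tweets": len(texts),
        # Exact duplicates (same user, same text) collapse into one entry.
        "unique_tweets": len(set(tweets)),
        # Assumption: a Retweet is any text starting with the "RT @" prefix.
        "rt_estimate": sum(1 for t in texts if t.startswith("RT @")),
        "links": sum(len(re.findall(r"https?://\S+", t)) for t in texts),
        "tweeters": len({user.lower() for user, _ in tweets}),
    }

# Invented sample data, for illustration only.
sample = [
    ("alice", "Great session on privacy #WLIC2016 https://t.co/abc"),
    ("bob", "RT @alice: Great session on privacy #WLIC2016 https://t.co/abc"),
    ("bob", "RT @alice: Great session on privacy #WLIC2016 https://t.co/abc"),
    ("Carol", "Libraries matter #WLIC2016"),
]
summary = summarise_archive(sample)
```

A gap between the Tweet count and the unique Tweet count, as in the sample, is the kind of signal that suggests further deduplication may be needed.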

Here’s a summary of the Tweet count of the #WLIC2016 dataset I refined from the complete archive. As I explained in part 3, I organised the Tweets into conference days, from Sunday 14 to Thursday 18 August. Each day was a different corpus to analyse. I also analysed the whole set as a single corpus to check that the totals matched.

Day                          Tweet count
Sunday 14 August 2016        2543
Monday 15 August 2016        6654
Tuesday 16 August 2016       4861
Wednesday 17 August 2016     4468
Thursday 18 August 2016      3801
Sunday – Thursday (total)    22327
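The check that the per-day counts add up to the total for the whole refined set is simple arithmetic, and can be written down so that it is repeatable. This is a small sketch using the figures reported in the table; the variable names are my own.

```python
# Per-day Tweet counts as reported in the table.
day_counts = {
    "Sunday 14 August 2016": 2543,
    "Monday 15 August 2016": 6654,
    "Tuesday 16 August 2016": 4861,
    "Wednesday 17 August 2016": 4468,
    "Thursday 18 August 2016": 3801,
}

total = sum(day_counts.values())

# Share of the refined dataset contributed by each day, as a percentage.
shares = {day: round(100 * n / total, 1) for day, n in day_counts.items()}

# The busiest day in the refined dataset.
busiest_day = max(day_counts, key=day_counts.get)
```

The sum is 22327, which matches the total reported for the Sunday to Thursday set.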

 

The Most Frequent Terms

The text analysis involved analysing each corpus, first obtaining a ‘raw’ output of the 300 most frequent terms and their counts. As described in previous posts, I then applied an edited English stop word list, followed by manual editing of the top 100 most frequent terms (for the shared dataset) and of the top 50 for this post. Unlike before, this time I removed ‘barack’ and ‘obama’ from the Thursday and Monday corpora, and tried to remove usernames and hashtags, though it’s possible that further disambiguation and refining might be needed in those top 100 and top 50 lists.
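For readers who want to try something similar without Voyant Tools, the general idea of a stop-word-filtered frequency list can be sketched as follows. This is not Voyant's implementation, and its tokenisation will differ from the Terms tool; the function name `top_terms`, the tokenising rule and the tiny stop word set are assumptions for illustration only.

```python
import re
from collections import Counter

def top_terms(texts, stop_words, n=50):
    """Count lower-cased word forms, skipping stop words, and return the top n."""
    counts = Counter()
    for text in texts:
        # Drop links first so fragments such as 't.co' are not counted.
        text = re.sub(r"https?://\S+", " ", text.lower())
        tokens = re.findall(r"[a-z0-9_]+", text)
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(n)

# Invented sample data and a deliberately tiny stop word set.
texts = [
    "Libraries need open access #WLIC2016",
    "RT @ifla: Libraries and privacy https://t.co/x",
    "Open access for libraries",
]
stop_words = {"rt", "and", "for", "the", "wlic2016", "ifla"}
result = top_terms(texts, stop_words, n=3)
```

Because the function lower-cases everything, it is case insensitive in the same way the post describes the Terms tool to be, so proper names and usernames still need manual checking.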

The text analysis of the Sun-Thu Tweets as a single corpus gave us the following Top 50:

#WLIC2016 Sun-Thu Top 50 Most Frequent Terms (stop-words applied; edited)

Rank  Term           Count
1     libraries      2895
2     library        2779
3     librarians     1713
4     session        1467
5     access         872
6     world          832
7     public         774
8     copyright      766
9     people         757
10    need           750
11    data           746
12    make           733
13    privacy        674
14    digital        629
15    new            615
16    wikipedia      602
17    indigenous     593
18    use            574
19    information    555
20    great          539
21    knowledge      512
22    literacy       502
23    internet       481
24    work           428
25    thanks         419
26    message        416
27    future         412
28    change         379
29    social         378
30    open           369
31    just           354
32    research       353
33    know           330
34    community      323
35    important      319
36    oclc           317
37    collections    312
38    books          300
39    learn          300
40    opening        291
41    read           289
42    impact         287
43    place          282
44    good           280
45    services       277
46    national       276
47    best           272
48    latest         269
49    report         267
50    users          266

As mentioned above I also analysed each day as a single corpus. I refined the ‘raw’ 300 most frequent terms per day to a top 100 after stop words and manual editing. I then laid them all out as a single table for comparison.

#WLIC2016 Top 50 Most Frequent Terms per Day Comparison (stop-words applied; edited)

Rank  Sun 14 Aug    Mon 15 Aug    Tue 16 Aug    Wed 17 Aug    Thu 18 Aug
1     libraries     library       library       libraries     libraries
2     library       libraries     privacy       library       library
3     librarians    librarians    libraries     librarians    librarians
4     session       session       librarians    indigenous    public
5     access        copyright     session       session       session
6     world         wikipedia     people        knowledge     need
7     public        digital       data          access        data
8     copyright     make          indigenous    data          impact
9     people        world         make          literacy      new
10    need          internet      access        need          digital
11    data          access        wikipedia     great         world
12    make          new           use           people        thanks
13    privacy       need          information   research      access
14    digital       use           world         public        value
15    new           public        public        new           national
16    wikipedia     future        knowledge     marketing     change
17    indigenous    people        copyright     general       privacy
18    use           message       homeless      open          great
19    information   collections   literacy      world         work
20    great         information   oclc          archives      research
21    knowledge     content       great         just          use
22    literacy      open          homelessness  national      people
23    internet      report        need          assembly      knowledge
24    work          space         freedom       place         social
25    thanks        trend         like          make          using
26    message       great         thanks        read          know
27    future        net           internet      community     make
28    change        work          info          social        services
29    social        neutrality    latest        reading       skills
30    open          making        experiencing  work          award
31    just          update        theft         information   information
32    research      books         important     use           learning
33    know          collection    just          learn         users
34    community     social        subject       share         book
35    important     design        change        matters       user
36    oclc          data          guidelines    key           best
37    collections   thanks        digital       know          collections
38    books         librarian     students      global        academic
39    learn         know          know          government    measure
40    opening       shaping       online        life          poland
41    read          google        protect       thanks        community
42    impact        change        working       important     learn
43    place         literacy      statement     development   outcomes
44    good          just          work          love          share
45    services      technology    future        impact        time
46    national      online        read          archivist     media
47    best          poster        award         good          section
48    latest        info          create        books         important
49    report        working       services      cultural      service
50    users         law           good          help          closing
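A side-by-side table like this one can be assembled from the per-day ranked lists, and the same lists can be used to see which terms appear on every day. The sketch below uses only the first five ranks of each day as they appear in the table; the function names `comparison_rows` and `shared_terms` are hypothetical.

```python
def comparison_rows(per_day, n):
    """Build rows of [rank, term for day 1, term for day 2, ...]."""
    days = list(per_day)
    rows = []
    for rank in range(n):
        terms = [per_day[d][rank] if rank < len(per_day[d]) else "" for d in days]
        rows.append([rank + 1] + terms)
    return rows

def shared_terms(per_day):
    """Terms that appear in every day's list."""
    term_sets = [set(terms) for terms in per_day.values()]
    return set.intersection(*term_sets)

# First five ranks per day, copied from the comparison table.
per_day = {
    "Sun 14 Aug": ["libraries", "library", "librarians", "session", "access"],
    "Mon 15 Aug": ["library", "libraries", "librarians", "session", "copyright"],
    "Tue 16 Aug": ["library", "privacy", "libraries", "librarians", "session"],
    "Wed 17 Aug": ["libraries", "library", "librarians", "indigenous", "session"],
    "Thu 18 Aug": ["libraries", "library", "librarians", "public", "session"],
}
rows = comparison_rows(per_day, n=5)
common = shared_terms(per_day)
```

With these five ranks, the terms shared by every day are ‘libraries’, ‘library’, ‘librarians’ and ‘session’; the terms that differ (‘copyright’, ‘privacy’, ‘indigenous’, ‘public’, ‘access’) are where each day's particular emphasis starts to show.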

I have shared on figshare a dataset containing the summaries above, as well as the raw top 300 most frequent terms for the whole set and for each day. The dataset also includes the per-day lists of the top 100 most frequent terms, which I edited manually after applying the edited English stop word filter.

You can download the spreadsheet from figshare:

Priego, Ernesto (2016): #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2

Please bear in mind that as refining was done manually and the Terms tool does not always seem to apply stop words evenly there might be errors. This is why the raw output was shared as well. This data should be taken to be indicative only.

As is increasingly recommended for data sharing, the CC-0 license has been applied to the resulting output in the repository. It is important, however, to bear in mind that some terms appearing in the dataset might be individually licensed under different terms; copyright of the source Tweets (and sometimes of individual terms) belongs to their authors. Authorial/curatorial/collection work has been performed on the shared file as a curated dataset resulting from analysis, in order to make it available as part of the scholarly record. If this dataset is consulted, attribution is always welcome.

Ideally, for proper reproducibility and to encourage other studies, the whole archive dataset should be available. Those wishing to obtain the complete Tweets should still be able to collect them themselves via text and data mining methods.

Conclusions

Indeed, for us today there is absolutely nothing surprising about the term ‘libraries’ being the most frequent word in Tweets coming from IFLA’s World Library and Information Congress. Looking at the whole dataset, however, provides an insight into other frequent terms used by Library and Information professionals in the context of libraries. These terms might not remain frequent for long, and might not have been frequent words in the past (I can only hypothesise; having evidence would be nice).

A key hypothesis for me guiding this exercise has been that perhaps by looking at the words appearing in social media outputs discussing and reporting from a professional association’s major congress, we can get a vague idea of where a sector’s concerns are/were.

I guess it can be safely said that words become meaningful in context. In an age in which repetition and frequency are key to public constructions of cultural relevance (‘trending topics’ increasingly define the news agenda, what people talk about and how they talk about it), the repetition and frequency of key terms might provide a type of meaningful evidence in itself. Evidence, however, is just the beginning; further interpretation and analysis must indeed follow.

One cannot obtain the whole picture from decomposing a collectively, socially, publicly created textual corpus (or perhaps any corpus, unless it is a list of words from the start) into its constituent parts. It could also be said that many tools and methods often tell us more about themselves (and those using them) than about the objects of study.

So far text analysis (Rockwell 2003) and ‘distant reading’ through automated methods have focused on working with books (Ramsay 2014). However, I’d like to suggest that this kind of text analysis can be another way of reading social media texts, and can offer another way to contribute to the assessment of their cultural relevance as living documents of a particular setting and moment in time. Who knows, they might also be telling us something about the present perception and activity of a professional field, and might help us to compare it with those of the future.

Other Considerations

Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailon et al. 2012).

Apart from the filters and limitations already declared, it cannot be guaranteed that each and every Tweet tagged with #WLIC2016 during the indicated period was analysed. The dataset was shared for archival, comparative and indicative educational research purposes only.

Only content from public accounts, obtained from the Twitter Search API, was analysed.  The source data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account.

These posts and the resulting dataset contain the results of analyses of Tweets that were published openly on the Web with the queried hashtag; the content of the Tweets is the responsibility of the original authors. Original Tweets are likely to be the copyright of their individual authors, but please check individually.

This work is shared to archive, document and encourage open educational research into scholarly activity on Twitter. The resulting dataset does not contain complete Tweets nor Twitter metadata. No private personal information was shared. The collection, analysis and sharing of the data has been enabled and allowed by Twitter’s Privacy Policy. The sharing of the results complies with Twitter’s Developer Rules of the Road.

A hashtag is metadata users choose freely to use so that their content is associated, directly linked to and categorised with the chosen hashtag. The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case). Tweets published publicly by scholars or other professionals during academic conferences are often publicly tagged (labeled) with a hashtag dedicated to the conference in question. This practice used to be confined to a few ‘niche’ fields; it is increasingly becoming the norm rather than the exception.

Though every reason for Tweeters’ use of hashtags cannot be generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour.

In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter’s Privacy and data sharing policies.

Professional associations like the Modern Language Association and the American Psychological Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter’s search API has well-known temporal limitations for retrospective historical search and collection.

Beyond individual Tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. Though this work has limitations and might not be thoroughly systematic, it is hoped it can contribute to developing new insights into a discipline’s public concerns as expressed on Twitter over time.

References

González-Bailon, Sandra, Wang, Ning, Rivero, Alejandro, Borge-Holthoefer, Javier and Moreno, Yamir (2012) “Assessing the Bias in Samples of Large Online Networks.” December 4, 2012. Available at SSRN: http://dx.doi.org/10.2139/ssrn.2185134

Priego, Ernesto (2016) #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2

Ramsay, Stephen (2014) “The Hermeneutics of Screwing Around; or What You Do with a Million Books.” In Pastplay: Teaching and Learning History with Technology, edited by Kevin Kee, 111-20. Ann Arbor: University of Michigan Press, 2014. Also available at http://quod.lib.umich.edu/d/dh/12544152.0001.001/1:5/–pastplay-teaching-and-learning-history-with-technology?g=dculture;rgn=div1;view=fulltext;xc=1

Rockwell, Geoffrey (2003) “What is Text Analysis, Really? [PDF]” preprint, Literary and Linguistic Computing, vol. 18, no. 2, 2003, p. 209-219.

What’s in a Word? Most Frequent Terms in #WLIC2016 Tweets (part III)


This is part three. For necessary context please start here (part 1) and here (part 2). The final, fourth part is here.

It’s Friday already and the sessions from IFLA’s WLIC 2016 have finished. I’d like to finish what I started and complete a roundup of my quick (but in practice not-so-quick) collection and text analysis of a sample of #WLIC2016 Tweets. My intention is to finish this with a fourth and final blog post following this one and to share a dataset on figshare as soon as possible.

As previously I customised the spreadsheet settings to collect only Tweets from accounts with at least one follower and to reflect the Congress’ location and time zone. Before exporting as CSV I did a basic automated deduplication, but I did not do any further data refining (which means that non-relevant or spam Tweets may be included in the dataset).
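The post does not say how the automated deduplication was done. One common approach, sketched here purely as an assumption, is to keep the first row seen for each Tweet ID. The `id_str` column name follows the convention of Twitter exports, but it is used here only as an illustrative key; the sample rows are invented.

```python
def deduplicate(rows, key="id_str"):
    """Keep the first row seen for each Tweet ID, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

# Invented sample rows; the third repeats the ID of the first.
rows = [
    {"id_str": "1", "text": "Libraries! #WLIC2016"},
    {"id_str": "2", "text": "RT @a: Libraries! #WLIC2016"},
    {"id_str": "1", "text": "Libraries! #WLIC2016"},
]
unique_rows = deduplicate(rows)
```

Note that deduplicating by ID keeps Retweets, since each Retweet has its own ID; it only removes rows that were collected more than once.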

What follows is a basic quantitative summary of the initial complete sample dataset:

  • Total Tweets: 22,540 Tweets (includes RTs)
  • First Tweet in complete sample dataset: Sunday 14/08/2016 11:29:03 EDT
  • Last Tweet in complete sample dataset: Friday 19/08/2016 04:20:43 EDT
  • Number of links:  11,676
  • Number of RTs:    13,859
  • Number of usernames: 2,811

The Congress had activities between Friday 12 August and Friday 19 August, but sessions between Sunday 14 August and Thursday 18 August. Ideally I would have liked to collect Tweets from the early hours of Sunday 14 August but I started collecting late so the earliest I got to was 11:29:03 EDT. I suppose at least it was before the first panel sessions started. For more context re: timings: see the Congress outline.

I refined the complete dataset to include only the days that featured panel sessions, and I have organised the data in a different sheet per day for individual analysis. I have also created a table detailing the Tweet counts per Congress session day. [Later I realised that though I had the metadata for the Columbus, Ohio time zone I ended up organising the data into GMT/BST days. There is a five-hour difference, but the collected Tweets per day still roughly correspond to the timings of the conference. Of course many will have participated in the hashtag remotely (not present at the event) and many of those present will not have tweeted synchronously (‘live’). I don’t think this makes much of a difference (no pun intended) to the analysis, but it’s something I was aware of and that others may or may not want to consider as a limitation.]
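To see what the time zone point means in practice, here is a small sketch showing how the same Tweet can fall on different calendar days depending on whether it is bucketed by Columbus time or by UK time. It uses fixed offsets (EDT as UTC-4 and BST as UTC+1, both correct for August) rather than a time zone database, and the timestamps are invented examples.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets valid for August 2016: Columbus, Ohio (EDT) and the UK (BST).
EDT = timezone(timedelta(hours=-4), "EDT")
BST = timezone(timedelta(hours=1), "BST")

def day_in(timestamp, tz):
    """Calendar day (ISO format) on which a timestamp falls in a given zone."""
    return timestamp.astimezone(tz).date().isoformat()

# Invented example timestamps, expressed in Columbus time.
afternoon = datetime(2016, 8, 15, 14, 0, tzinfo=EDT)
late_evening = datetime(2016, 8, 15, 21, 30, tzinfo=EDT)

afternoon_days = (day_in(afternoon, EDT), day_in(afternoon, BST))
evening_days = (day_in(late_evening, EDT), day_in(late_evening, BST))
```

An afternoon Tweet lands on the same day in both zones, but anything posted in Columbus from 19:00 onwards is counted on the following day when days are cut in BST. That affects evening activity rather than the daytime sessions, which is consistent with the per-day counts still roughly matching the conference timings.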

Tweets collected per day

Day                          Tweet count
Sunday 14 August 2016        2543
Monday 15 August 2016        6654
Tuesday 16 August 2016       4861
Wednesday 17 August 2016     4468
Thursday 18 August 2016      3801

Total Tweets in refined dataset: 22,327 Tweets.

(Always bear in mind that these figures reflect the Tweets in the collected dataset; they do not mean that this was in fact the total number of Tweets published with the hashtag during that period. Not only do the settings of my query affect the results; Twitter’s search API also has limitations and cannot be assumed to always return the same type or number of results.)

I am still in the process of analysing the dataset. There are of course multiple types of analyses that one could do with this data but bear in mind that in this case I have only focused on using text analysis to obtain the most frequent terms in the text from the Tweets tagged with #WLIC2016 that I collected.

As before, in this case I am using the Terms tool from Voyant Tools to perform a basic text analysis in order to identify the number of total words and unique word forms and the most frequent terms per day; in other words, the data from each day became an individual corpus. (The complete refined dataset including all collected days could be analysed as a single corpus as well for comparison.) I am gradually exporting and collecting the ‘raw’ output from the Terms tool per day, so that once I have finished applying the stop words to each corpus this output can be compared, and so that it could be reproduced with other stop word lists if desired.

As before, I am using the English stop word list which I edited previously to include Twitter-specific terms (e.g. t.co, amp, https), as well as dataset-specific terms (e.g. the Congress’ Twitter account, related hashtags etc.). What I did differently this time is that I included all 2,811 account usernames in the complete dataset so they would be excluded from the most frequent terms. These are the usernames of accounts with Tweets in the dataset; other usernames (those mentioned in the text of Tweets but that did not themselves Tweet with the hashtag) were logically not filtered, so whenever they are easily identifiable I am painstakingly removing them (manually!) from the remaining list. I am sure there must be a more effective way of doing this, but I find the combination of ‘distant’ (automated) editing and ‘close’ (manual) editing interesting and fun.
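The two steps described here, extending the stop word list with the usernames of accounts in the dataset and then hunting for usernames that were only mentioned, can be partly automated. The sketch below is one possible approach and not what was actually done for these posts; the function names, the sample usernames and the sample Tweets are all invented for illustration.

```python
import re

def build_stop_words(base, twitter_terms, usernames):
    """Merge a base stop word list with Twitter-specific terms and usernames."""
    return (
        {w.lower() for w in base}
        | {w.lower() for w in twitter_terms}
        | {u.lower().lstrip("@") for u in usernames}
    )

def residual_mentions(texts, usernames):
    """Usernames mentioned in the text that did not themselves tweet."""
    known = {u.lower().lstrip("@") for u in usernames}
    mentioned = set()
    for text in texts:
        mentioned.update(m.lower() for m in re.findall(r"@(\w+)", text))
    return sorted(mentioned - known)

# Invented sample data.
tweeting_accounts = ["alice", "Bob"]
texts = [
    "RT @alice: thanks @oclc for the session",
    "@bob great poster",
]
stop_words = build_stop_words(["the", "for"], ["t.co", "amp", "https"], tweeting_accounts)
to_review = residual_mentions(texts, tweeting_accounts)
```

The output of `residual_mentions` is a candidate list for manual review rather than an automatic removal list: as the earlier posts note, some organisational usernames may be worth keeping as an indication of their ‘centrality’.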

I am using the same edited stop word list for each analysis. In this case I have also manually removed non-English terms (mostly pronouns, articles). Needless to say I did this not because I didn’t think they were relevant (quite the opposite) but because even though they had a presence they were not fairly comparable to the overwhelming majority of English terms (a ranking of most frequent non-English terms would be needed). As I will also have shared the unedited, ‘raw’ top most frequent terms in the dataset, anyone wishing to look into the non-English terms could ideally do so and run their own analyses without my own subjective stop word list and editing getting in the way. I tried to be as systematic as possible but disambiguation would be needed (the Terms tool is case and context insensitive, so a term could have been a proper name, or a username, and to be consistent I should have removed those too. Again, having the raw list would allow others to correct any filtering/curation/stop word mistakes).

I am aware there are far more sophisticated methods of dealing with this data. Personally, doing this type of simple data collection and text analysis is an exercise and an interrogation of data collection and analysis methods and tools as reflective practices. A hypothesis behind it is that the terms a community or discipline uses (and retweets) do say something about those communities or disciplines, at least for a particular moment in time and a particular place in particular settings. Perhaps it also says things about the medium used to express those terms. When ‘screwing around’ with texts it may be unavoidable to wonder what there is to it beyond ‘bean-counting’ (what’s in a word? what’s in a frequent term?), and what there is to social media and academic/professional live-tweeting that can or cannot be quantified. Doing this type of work makes me reflect as well about my own limitations, the limits of text analysis tools, the appropriateness of tools, the importance of replication and reproducibility and the need to document and to share what has been documented.

I’m also thinking about documentation and the open sharing of data outputs as messages in bottles, or, as it has been said of metadata, as ‘letters to the future’. I’m aware that this may also seem like navel-gazing of little interest outside those associated with the event in question. I would say that the role of libraries in society at large is more crucial and central than many outside the library and information sector may think (but that’s a subject for another time). Perhaps one day in the future it might be useful to look back at what we were talking about in 2016 and what words we used to talk about it. (Look, we were worried about that!) Or maybe no one cares and no one will care, or by then it will be possible to retrieve anything anywhere with great degrees of relevance and precision (including critical interpretation). In the meantime, I will keep refining these lists and will share the output as soon as I can.

Next… the results!

The final, fourth part is here.

Most Frequent Terms in #WLIC2016 Tweets (part II)


 

The first part of this series provides necessary context.

I have now an edited list of the top 50 most frequent terms extracted from a cleaned dataset comprised of 10,721 #WLIC2016 Tweets published by 1,760 unique users between Monday 15/08/2016 10:11:08 EDT and Wednesday 17/08/2016 07:16:35 EDT.

The analysed corpus contained the raw text of the Tweets (includes RTs), comprising 185,006 total words and 12,418 unique word forms.
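‘Total words’ and ‘unique word forms’ are what corpus linguists call tokens and types. A rough way to compute both is sketched below; the tokenising rule is an assumption of mine and will not necessarily give the same figures as Voyant Tools for the same text. The sample texts are invented.

```python
import re

def corpus_size(texts):
    """Return (total words, unique word forms) for a list of texts."""
    tokens = [t for text in texts for t in re.findall(r"\w+", text.lower())]
    return len(tokens), len(set(tokens))

# Invented sample corpus: five words in total, four distinct forms.
total_words, unique_forms = corpus_size(["Libraries need libraries", "Open access"])
```

The ratio between the two numbers gives a quick sense of how repetitive a corpus is, which matters for a Tweet archive where Retweets repeat the same text many times.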

Stop words were applied as detailed in the first part of this series, and the resulting list (a raw list of 300 most frequent terms) was further edited to remove personal names, personal Twitter user names, common hashtags, etc.  Some organisational Twitter user names were not removed from the list, as an indication of their ‘centrality’ in the network based on the frequency with which they appeared in the corpus.

So here’s an edited list of the top 50 most frequent terms from the dataset described above:

Term              Count
library           1379
libraries         1102
librarians        811
session           715
privacy           555
wikipedia         523
make              484
copyright         465
people            428
digital           378
access            375
use               362
public            340
data              322
need              319
iflabuild2016     308
world             308
information       298
internet          289
new               272
great             259
indigenous        255
iflatrends        240
report            202
knowledge         200
future            187
work              187
libraryfreedom    184
literacy          184
space             180
change            178
thanks            172
oclc              171
open              170
just              169
books             168
trend             165
important         162
info              162
know              162
social            161
net               159
neutrality        159
wikilibrary       158
collections       157
working           157
librarian         154
online            154
making            149
guidelines        148

Is this interesting? Is it useful? I don’t know, but I’ve enjoyed documenting it. Reflecting about different criteria to apply stop words and clean, refine terms has also been interesting.

I guess that deep down I believe it’s better to document than not to, even if we may think there should be other ways of doing it (otherwise I wouldn’t even try to do it). Making value judgements about the utility or insightfulness of specific data in specific ways is an a posteriori process.

I hope to be able to continue collecting data, and once the congress/conference ends I hope to be able to share a dataset with the raw (unedited, unfiltered) most frequent terms in the text from Tweets published with the event’s hashtag. If there’s anyone else interested they could clean, curate and analyse the data in different ways (wishful thinking but hey, it’s hope that guides us).

What Library Folk Live Tweet About: Most Frequent Terms in #WLIC2016 Tweets


Part 2 is  here, part 3  here and the final, fourth part is here.

IFLA stands for The International Federation of Library Associations and Institutions.

The IFLA World Library and Information Congress 2016 and 82nd IFLA General Conference and Assembly, ‘Connections. Collaboration. Community’ is currently taking place (13–19 August 2016) at the Greater Columbus Convention Center (GCCC) in Columbus, Ohio, United States.

The official hashtag of the conference is #WLIC2016. Earlier, I shared a searchable, live archive of the hashtag here. (Page may be slow to load depending on bandwidth).

I have looked at the text from 4,945 Tweets published with #WLIC2016 from 14/08/2016 to 15/08/2016 11:16:06 (EDT, Columbus Ohio time). Only accounts with at least 1 follower were included. I collected them with Martin Hawksey’s TAGS.

According to Voyant Tools this corpus had 82,809 total words and 7,506 unique word forms.

I applied an English stop word list which I edited to include Twitter-specific terms (https, t.co, amp (&) etc.), proper names (Barack Obama, other personal usernames) and some French stop words (mainly personal pronouns). I also edited the stop word list to include some dataset-specific terms such as the conference hashtag and other common hashtags, ‘ifla’, etc. (I left others that could also be considered dataset-specific terms, such as ‘session’ though).

The result was a listing of 800 frequent terms (the least frequent terms in the list had been repeated 5 times). I then cleaned the data of any dataset-specific stop words that the stop word list did not filter and created an edited, ordered listing of the 50 most frequent terms. I left in organisations’ Twitter user names (including @potus), as well as other terms that may not seem that meaningful on their own (but who knows, they may be).

It must be taken into account that the corpus included Retweets. Each RT counted as a separate Tweet, so the text it repeats was counted again; the term counts in the list therefore reflect the repetition of text that Retweets imply.
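The effect of Retweets on term counts is easy to demonstrate. The sketch below counts terms in a tiny invented corpus with and without Retweets; treating any text that starts with "RT @" as a Retweet is an assumption, and the function names are hypothetical.

```python
import re
from collections import Counter

def term_counts(texts):
    """Count lower-cased alphabetic word forms across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

def is_retweet(text):
    # Assumption: Retweets carry the conventional "RT @" prefix.
    return text.startswith("RT @")

# Invented corpus: one original Tweet and three Retweets of it.
tweets = ["Copyright exceptions matter"] + ["RT @a: Copyright exceptions matter"] * 3

with_rts = term_counts(tweets)
without_rts = term_counts(t for t in tweets if not is_retweet(t))
```

Here ‘copyright’ is counted four times with Retweets included and once without. Neither figure is wrong; they answer different questions (what was amplified versus what was originally written), which is why it matters to state which one a list reports.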

If for some reason you are curious about what the most frequent words in #WLIC2016 Tweets were during this initial period (see above), here’s the top 50:

Term           Count
libraries      543
copyright      517
librarians     484
library        406
session        374
world          326
message        271
opening        249
access         226
make           204
digital        195
internet       162
future         161
information    157
new            146
use            141
people         138
president      131
potus          125
literacy       118
need           117
oclc           114
ceremony       113
dpla           109
poster         105
thanks         103
collections    102
public         100
delegates      99
cilipinfo      98
countries      95
iflatrends     95
google         93
shaping        91
work           89
drag           83
report         83
create         81
open           81
data           79
content        78
learn          78
latest         77
making         77
fight          76
ifla_arl       75
read           74
info           73
exceptions     69
great          68

So for what it’s worth those were the 50 most frequent terms in the corpus.

I, for one, not being present at the Congress, found it interesting that ‘copyright’ is the second most frequent term, following ‘libraries’. One notices also that, unsurprisingly, the listing of top most frequent terms includes some key terms (such as ‘access’, ‘internet’, ‘digital’, ‘open’, ‘data’) that have concerned Library and Information professionals of late.

Were these the terms you’d have expected to make a ‘top 50’ in almost 5,000 Tweets from this initial phase of this particular conference?

The conference hasn’t finished yet of course. But so far, for a libraries and information world congress, which terms would you say are noticeable by their absence in this list? ;-)


 

#citymash: Library and Information Science as Fluid Practice

Arts Emergency badge. Image tweeted by @philgibby

The venerable Oxford English Dictionary (online) tells me that part of the definition of the word “fluid” is “having the property of flowing; consisting of particles that move freely among themselves…”; a second definition also includes “flowing or moving readily; not solid or rigid; not fixed…”

These are the parts of the definition that we’d like to embrace when we say that #citymash, the libraries and technology unconference that #citylis has organised to take place tomorrow, Saturday 13 June 2015, will be a “fluid” event. Moreover, the fluidity of #citymash is an expression of a particular understanding of Library and Information Science (LIS) as a discipline, of librarianship as a practice and of information professionals as people.

As my colleagues Lyn Robinson and David Bawden have said on several occasions, LIS has evolved and it is in ongoing evolution. It flows; sometimes it seems to do so dizzyingly fast, at other times frustratingly slowly, but the fact remains that LIS does flow. This fluidity goes beyond the transformations that documents have undergone from the first cave paintings to the latest hybrid immersive experiences; it includes the way we as academics, practitioners and people interested in all aspects of information interact with each other socially, “in real life”.

The unconference model is part of this transformation. In theory, an unconference is a conference organised, structured and led by the people attending it. All attendees and organisers are encouraged to become participants, with discussion leaders providing moderation and structure for attendees. Indeed, unconferences have become popular as an alternative to the panel discussions and keynote speakers featured at traditional conferences.

When I was a PhD student I witnessed, not without some envy, the first wonderful appearance (in 2008) and the eventual skyrocketing international success (from 2009 onwards) of THATCamp (The Humanities and Technology Camp). “An open, inexpensive meeting where humanists and technologists of all skill levels learn and build together in sessions proposed on the spot”, it was the brainchild of colleagues at the Center for History and New Media at George Mason University in the United States. (The Center is also the birthplace of Zotero.) Wikipedia kindly reminds me that it was indeed in August 2009 that the first THATCamp outside the George Mason campus was held, at the University of Texas, in conjunction with the annual meeting of the Society of American Archivists.

Perhaps not coincidentally it was also in 2008 (remember we were in the midst of a serious financial crisis) that the idea of the “Mashed Library” started doing the rounds, thanks to the work of Owen Stephens. By 2010 there had been a series of Mashed Library unconference events and it had been proven that the concept went well beyond Owen sitting on his own in a room with his laptop.

Without pioneers like THATCamp and the Mashed Library events #citymash would not be taking place tomorrow. The inspiring arts and humanities advocacy organisation Arts Emergency has said it very well, “sometimes if you want something to exist you have to make it yourself.” Libraries and universities can be surprisingly conservative and risk-averse. At the same time, paraphrasing Arts Emergency, LIS is a discipline that focuses on experimental thought; libraries and universities can indeed “foster thought beyond the norms of the present. Without the capacity to think beyond repetition there is no beyond to crisis.”

This post is already longer than I intended. The list of initial session leaders for #citymash tomorrow is here. The initial programme is here. There will be practical and discussion sessions on open source implementation, systems librarianship, hands-on Twitter archiving, GoogleRefine, UX, Making in Libraries, Fan Networks, past predictions of the future of the library, 3D printing, storytelling, Markdown, and more. There are also free rooms available for other sessions to be decided on the spot, and a dedicated reflection space throughout the event.

As #citymash is an unconference, timings, topics and proceedings are expected to be fluid. Participants have been asked to bring lunch to share. It will be a social, fun and flexible space, and hopefully it will provide us with an opportunity to learn from each other and to make things ourselves: a space for thinking beyond repetition.

Here’s looking forward to tomorrow!

The #citymash website is at http://citymash.github.io/. Please note that registration has now closed. Follow the #citymash hashtag for live updates from the day.


#citymash has been supported by the Software Sustainability Institute. The Software Sustainability Institute cultivates world-class research with software. The Institute is based at the universities of Edinburgh, Manchester, Southampton and Oxford.

#citymash has been supported by figshare. figshare is a repository where users can make all of their research outputs available in a citable, shareable and discoverable manner.

This post was originally published at the #citylis blog.

#MLA15 Twitter Archive, 8-11 January 2015

130th MLA Annual Convention Vancouver, 8–11 January 2015

#MLA15 is the hashtag which corresponded to the 2015 Modern Language Association Annual Convention. The Convention was held in Vancouver from Thursday 8 to Sunday 11 January 2015.

We have uploaded a dataset as a .xlsx file including data from Tweets publicly published with #mla15:

Priego, Ernesto; Zarate, Chris (2015): #MLA15 Twitter Archive, 8-11 January 2015. figshare.
http://dx.doi.org/10.6084/m9.figshare.1293600

The dataset includes Tweets posted during the actual convention with #mla15: the set starts with a Tweet from Thursday 08/01/2015 00:02:53 Pacific Time and ends with a Tweet from Sunday 11/01/2015 23:59:58 Pacific Time.

The dataset contains a total of 23,609 Tweets. Only Tweets from users with at least two followers were collected.

A combination of Twitter Archiving Google Spreadsheets (Martin Hawksey’s TAGS 6.0; available at https://tags.hawksey.info/ ) was used to harvest this collection. OpenRefine (http://openrefine.org/) was used for deduplicating the data.

Please note the data in the file is likely to require further refining and even deduplication. The data is shared as is. The dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.

For the #MLA14 datasets, please go to
Priego, Ernesto; Zarate, Chris (2014): #MLA14 Twitter Archive, 9-12 January 2014. figshare.
http://dx.doi.org/10.6084/m9.figshare.924801

A #HASTAC2014 Conference Tweets Archive

HASTAC 2014, Lima, Perú

Like last year, I attempted to archive the tweets tagged with the HASTAC annual conference’s official hashtag (this year #HASTAC2014).

The resulting dataset is a CSV file containing 3748 tweets tagged with #HASTAC2014 (the hashtag is not case-sensitive).

The first tweet in the dataset is dated 19/04/2014 23:10:50 Lima, Perú time and the last one is dated 27/04/2014 15:00:54 also Lima, Perú time. The file also contains equivalent times in GMT.
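As a minimal sketch of how the Lima timestamps above map to GMT, the Python below assumes a fixed UTC-5 offset for Lima (Peru did not observe daylight saving time in 2014) and the day/month/year format used in this post. The function name is hypothetical and is not part of TAGS or of the shared file.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Lima time is a fixed UTC-5 offset (no daylight saving in 2014).
LIMA = timezone(timedelta(hours=-5), "PET")
STAMP_FORMAT = "%d/%m/%Y %H:%M:%S"  # day/month/year, as written in this post


def lima_to_gmt(stamp: str) -> str:
    """Convert a 'dd/mm/yyyy HH:MM:SS' Lima timestamp to the same format in GMT."""
    local = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=LIMA)
    return local.astimezone(timezone.utc).strftime(STAMP_FORMAT)


# The first and last timestamps quoted above.
first_tweet_gmt = lima_to_gmt("19/04/2014 23:10:50")
last_tweet_gmt = lima_to_gmt("27/04/2014 15:00:54")
print(first_tweet_gmt)
print(last_tweet_gmt)
```

Note that the first tweet falls on the following calendar day once it is expressed in GMT, which matters if the data is ever grouped by date.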

HASTAC is an alliance of humanists, artists, social scientists, scientists and technologists working together to transform the future of learning for the 21st century. Since 2002, HASTAC (“haystack”) has served as a community of connection where 11,500+ members share news, tools, research, insights, and projects to promote engaged learning for a global society.

HASTAC 2014: Hemispheric Pathways: Critical Makers in International Networks, the 6th international conference for the Humanities, Arts, Science, and Technology Alliance and Collaboratory, was hosted by the Ministerio de Cultura in Lima, Perú, from 6pm Wednesday 23 April to 1pm Sunday 27 April 2014 local time. To avoid including spam tweets, only tweets from users with at least two followers were included in the archive.

I harvested the tweets with (several!) Twitter Archiving Google Spreadsheets (TAGS version 5.1, by Martin Hawksey).

Please note that both research and experience show that the Twitter search API isn’t 100% reliable. Large tweet volumes affect the search collection process as well. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón, Sandra, et al. 2012). Therefore, it cannot be guaranteed this file contains each and every tweet tagged with #HASTAC2014 during the indicated period.

[It should go without saying but perhaps it must also be noted that some conference tweets might have used other variations of the hashtag. Logically those were not included in this collection. Therefore it cannot be said that even all tweets tagged #HASTAC2014 represent all the Twitter activity around the 2014 conference.]

The file includes raw data and it might require refining including deduplication. The data is shared as is.

The file is openly accessible via figshare:

Priego, Ernesto (2014): #HASTAC2014 Conference Tweets Archive from 19 April to 25 April 2014. figshare.
http://dx.doi.org/10.6084/m9.figshare.1008290

[I have just published this and the doi might take some time to become active].

The URL for the dataset is

http://figshare.com/articles/_HASTAC2014_Conference_Tweets_Archive_from_19_April_to_25_April_2014/1008290

The file is shared with a Creative Commons- Attribution license (CC-BY).

I have been archiving conference tweets and sharing backchannel datasets for some time now. I am keen on promoting the study of academic conference networks on Twitter. By openly sharing the resulting datasets and by blogging about the process over time, I have also been openly documenting my own learning curve as I try to archive tweets and work out how to do it better. If you use or refer to this data in any way please cite and link back using the citation information above.

I will hopefully have time to finish and publish another post with more detail about the HASTAC conference backchannels soon.

Thank you for reading and sharing. If you attended the conference, I hope you had a nice time. As usual, I am sorry I could not attend in person.

 

#MLA14: A First Look (I)

[Originally published on January 16 2014 on my Remote Participation blog at MLA Commons, here:

http://remoteparticipation.commons.mla.org/2014/01/16/mla14-a-first-look/]

Twitter Research and Academic Conferences

The 2014 MLA (Modern Language Association) Annual Convention was held in Chicago from 9 to 12 January 2014. You can still browse or search 2014 sessions in the online Program.

As I said in a previous post (Priego, 17/12/2013),

The MLA has been a pioneering academic organization in embracing Twitter. Since 2007 the so-called “conference back channel” has been growing considerably. Adoption of Twitter amongst scholars and students seems on the rise as well, and reporting live from the conference is no longer an underground, parallel activity but pretty much a recognized, encouraged aspect of the event.

As explained by Ross et al (2011) [PDF],

Microblogging, with special emphasis on Twitter.com, the most well known service, is increasingly used as a means of undertaking digital “backchannel” communication (non-verbal, real-time, communication which does not interrupt a presenter or event, (Ynge 1970, Kellogg et al 2006). Digital backchannels are becoming more prevalent at academic conferences, in educational use, and in organizational settings. Frameworks are therefore required for understanding the role and use of digital backchannel communication, such as that provided by Twitter, in enabling participatory cultures.

Ross et al studied the Twitter activity around three digital humanities conferences (#dh09, #thatcamp and #drha09, #drha2009), collecting and analysing a corpus of 4574 tweets (90%, 4259 original tweets and only 313 Retweets).

Though this activity took place in 2009, at events considerably smaller than the MLA, the study by Ross et al remains an important reference for studies of Humanities scholars’ use of Twitter in general, for the data collection that I’ve been conducting (not only of the MLA backchannel) and for the research I’ve been meaning to publish eventually.

As a comparison from another discipline, Desai et al (2012) collected and analysed 993 tweets over the 5 days of the American Society of Nephrology (ASN) annual scientific conference in 2011 (#kidneywk11).

There is still a paucity of reliable, timely research on how scholars use Twitter around (before, during, after) academic conferences of different disciplines. Part of the problem is that studies of social media are often not disseminated through social media channels (either as fragmentary outputs on Twitter or as blog posts), and the “publishing delay” involved in formal peer-reviewed publication means that the data reaches us, as in the two cases cited above, two years later.

The Methods

I have been following and participating remotely in the MLA convention through Twitter since 2010, attempting different ways of both engaging with and analysing the scholarly activity taking place under/with the hashtag(s) associated with the event. This year #MLA14 (or #mla14; it’s not case sensitive) seemed to surpass by far all expectations of adoption.

I have been using Martin Hawksey’s Twitter Archiving Google Spreadsheet TAGS (now in its fifth version) for a few years now, and it’s what I used to start collecting tweets tagged with #MLA14 from 1 September 2013. In Hawksey’s words, TAGS is “a quick way to collect tweets, make publicly available and collaborate exploring the data.”

The archives I set up updated automatically every minute, but the limit imposed by Google Sheets is 400,000 cells per sheet, and TAGS populates 18 columns with the tweets and associated metadata.

This means that the spreadsheets can fill very quickly and scripts can become unresponsive. I knew that if I wanted to collect as much as possible from what I knew would be a very busy feed, I would require more than one archive, and I would have to hope I’d be able to deduplicate and collate the data in more manageable chunks later. In practical terms it meant that I had to be very attentive, monitoring both the feed and the Google spreadsheets and following the event on Twitter almost as if I were literally there. It meant watching the live archives and starting a new one before the previous one had collapsed.
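To make the limits described above concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the two figures already stated (400,000 cells per sheet and 18 columns per tweet) and ignores the header row; the function names and the 50,000-tweet figure are hypothetical, chosen purely for illustration.

```python
import math

CELL_LIMIT = 400_000  # cells per sheet, as stated above
TAGS_COLUMNS = 18     # columns TAGS fills per tweet, as stated above


def rows_per_sheet(cell_limit: int = CELL_LIMIT, columns: int = TAGS_COLUMNS) -> int:
    """Whole rows (one per tweet) that fit before the cell limit is reached."""
    return cell_limit // columns


def sheets_needed(tweet_count: int) -> int:
    """Minimum number of archives needed to hold tweet_count tweets."""
    return math.ceil(tweet_count / rows_per_sheet())


capacity = rows_per_sheet()
print(capacity)               # roughly 22,000 tweets per sheet
print(sheets_needed(50_000))  # archives needed for a hypothetical 50,000 tweets
```

In practice a sheet may well become unresponsive before it reaches this theoretical ceiling, so the result is best read as an upper bound rather than a safe working limit.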

After the conference I was contacted by Chris Zarate from the MLA, who had also been archiving the #MLA14 feed with TAGS. He had some gaps in his data, and so did I, and only by working together have we managed to get some glimpses of a more or less complete dataset of #MLA14 tweets.

A First Finding: How Many

Chris and I had more than 75,000 tweets in our combined sets, and after deduplicating them with OpenRefine we were down to 27,491 tweets.
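We did the deduplication in OpenRefine, not in code, but the underlying idea can be sketched in a few lines of Python: merge the overlapping archives and keep only the first row seen for each tweet ID. The sketch assumes each row carries the tweet ID in a column called `id_str`; the function name and the toy rows are invented for illustration.

```python
from typing import Dict, Iterable, List


def deduplicate(
    archives: Iterable[List[Dict[str, str]]], key: str = "id_str"
) -> List[Dict[str, str]]:
    """Merge several archives, keeping the first row seen for each tweet ID."""
    seen = set()
    merged = []
    for archive in archives:
        for row in archive:
            tweet_id = row[key]
            if tweet_id not in seen:
                seen.add(tweet_id)
                merged.append(row)
    return merged


# Toy rows (invented for illustration, not real tweets).
archive_a = [{"id_str": "1", "text": "first"}, {"id_str": "2", "text": "second"}]
archive_b = [{"id_str": "2", "text": "second"}, {"id_str": "3", "text": "third"}]

combined = deduplicate([archive_a, archive_b])
print(len(combined))
```

Matching on the tweet ID rather than on the tweet text means that Retweets and near-identical tweets from different users are kept as separate rows.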

The MLA annual convention might be a mega conference (around 7,500 paid attendees this year, according to Rosemary Feal) but 27,491 tweets is still an amazingly healthy figure, reflecting an undoubted adoption of Twitter by humanities scholars.

Chris did a quick plot over 9-12 January 2014 (the days of actual conference). It is possible we may have missed some tweets here and there due to the Twitter API rate-limiting, but there are no glaring gaps:

    #mla14 conference days activity plot. Chart cc-by Chris Zarate and Ernesto Priego

Not surprisingly, the overall Twitter activity peaked in the afternoon of Saturday 11 January (remember the conference took place from 9 to 12 January 2014). It was that morning Central Time that I tweeted that the #MLA14 feed was receiving 21.1 tweets per minute.
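For readers who want to look for a peak like this in their own archive, the following Python is a minimal sketch of one way to do it: count tweets per clock hour and report the busiest hour with its average rate per minute. It is not how the plot above or the 21.1 figure were produced; the timestamp format, the function names and the sample timestamps are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

STAMP_FORMAT = "%d/%m/%Y %H:%M:%S"  # assumed day/month/year timestamps


def tweets_per_hour(stamps):
    """Count tweets in each clock hour."""
    counts = Counter()
    for stamp in stamps:
        moment = datetime.strptime(stamp, STAMP_FORMAT)
        counts[moment.replace(minute=0, second=0)] += 1
    return counts


def busiest_hour(stamps):
    """Return the busiest hour and its average tweets per minute."""
    hour, count = tweets_per_hour(stamps).most_common(1)[0]
    return hour, count / 60


# Invented timestamps, for illustration only.
sample = [
    "11/01/2014 14:05:00",
    "11/01/2014 14:20:30",
    "11/01/2014 14:59:59",
    "11/01/2014 15:01:00",
]
peak_hour, rate = busiest_hour(sample)
print(peak_hour, rate)
```

An hourly average smooths out short bursts, so a momentary rate taken from a live feed can be noticeably higher than the figure this produces for the same hour.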

Logically many research questions arise.

What’s Next: More Soon

Chris and I are still working on the dataset so as to have it in different and manageable forms that allow for easier qualitative and quantitative analysis.

We are also looking forward to eventually sharing a CSV file containing data and metadata of tweets posted between Sunday September 01 2013 at 20:35:07 and Wednesday January 15 2014 at 16:16:41 (Central Time).

If you have a dataset including #MLA14 tweets before Sunday September 01 2013 at 20:35:07, we would love to hear from you.

I will keep sharing some insights from the dataset here. Hopefully I’ll have another post on this blog tomorrow with some interesting findings.

N.B. Sadly, in spite of constant efforts by me and many other colleagues to encourage the recognition of blog posts as academic outputs, research of this type that is not presented in the traditional academic venues (read: peer-reviewed academic article or monograph) rarely gets cited (this is frankly disappointing). Therefore I regret I will be unable to blog the complete analysis or share the whole dataset until I have at least secured one formal output for this ongoing research. Were I in a different stage of my career I could probably afford to, but it’s not the case at the moment.

Again, with many thanks to Chris Zarate for collaborating in this project.

References

Desai, T., Shariff, A., Shariff, A., Kats, M., Fang, X., Christiano, C., & Ferris, M. (2012). Tweeting the meeting: an in-depth analysis of Twitter activity at Kidney Week 2011. (V. Gupta, Ed.) PLoS ONE, 7(7), e40253. doi:10.1371/journal.pone.0040253. Accessed 16 January 2014.

Priego, E. (2013, December 17). “Live-Tweeting the MLA: Suggested Practices”. MLA Convention blog guest post, MLA Commons. http://convention.commons.mla.org/2013/12/17/live-tweeting-the-mla-suggested-practices/ . Accessed 16 January 2014.

Priego, Ernesto (ernestopriego). “More than 14,000 tweets in my #mla14 archive (surely incomplete) since September. At the moment 21.1 tweets per minute. *Back*channel?!”. 11 Jan 2014, 16:40 UTC. Tweet, https://twitter.com/ernestopriego/status/422045270688288768. Accessed 16 January 2014.

Rosemary G. Feal (rgfeal). “@ernestopriego around 7,500”. 16 Jan 2014, 18:39 UTC. Tweet, https://twitter.com/rgfeal/status/423887347734687744. Accessed 16 January 2014.

Ross, C., Terras, M., Warwick, C., & Welsh, A. (2011, October 30). Enabled backchannel: conference Twitter use by digital humanists. Journal of Documentation. Emerald Group Publishing Limited. Retrieved from UCL Discovery (Open Access) http://discovery.ucl.ac.uk/155116/1/Terras_EnabledBackchannel.pdf . Accessed 16 January 2014.


At the Guardian: Live-tweeting at academic conferences: my 10 personal rules of thumb


At the Guardian Higher Education Network, I explore the ethics of live-tweeting academic events. In my article I provide 10 points to bear in mind when navigating this emerging social media minefield, here.

At HASTAC, Resources for Academic Live-Bloggers


I posted a selection of resources I’ve found useful for academic live-blogging on my HASTAC blog, hoping someone else finds them useful too.

Resources for Academic Live-Bloggers

Originally posted on my HASTAC blog on 10/3/2012 – 10:46am.

[4 October 08:40am BST Update: last night the Guardian published my article “Live-tweeting at academic conferences: 10 rules of thumb”. It can be read here.]

Workers unite clipart by boobaloo available at http://openclipart.org/detail/168060/-by-boobaloo

There are very interesting resources out there that can be useful for those interested in academic live-blogging or live-tweeting. My approach comes from an arts and humanities perspective, and other disciplines might have different concerns. Those in the medical sciences, for example, might have to check the research guidelines of their institutions and professional associations before engaging in the live sharing of third-party content. In the arts and humanities we are still catching up with the challenges posed by social media (these challenges are not necessarily exclusive to social media and therefore are not particularly new). Due to the very flexible nature of social media (nothing ever remains the same and things can change very quickly) it is important to be willing to adapt existing resources to specific requirements or circumstances.

Live-blogging is essentially a form of reporting and a way of engaging with real life events and with their reports. It is a form of broadcasting content. It is also a form of research. Guidelines from journalism and research ethics (particularly Internet-Mediated Research) can be very helpful for those working with social media to report academic events as they take place. Academia in the arts and humanities has been relatively slow in the adoption of social media for professional communications and in my view the discussion of issues arising from it shows this, particularly when contrasted with similar discussions in say media or journalism studies. It is in these fields where we can find very interesting resources that we as humanities scholars can adapt to our own settings.

What follows is a quick list of some of the resources I would like to recommend. Obviously there is much more out there. Please note many of the following resources do not specifically refer to arts and humanities academic conference live-tweeting; my suggestion is that there is useful information there we could learn from and adapt to our own needs and purposes.

Basics

Twitter Terms of Service <http://twitter.com/tos>

Twitter Guidelines, Best Practices, Policies <http://support.twitter.com/groups/33-report-abuse-or-policy-violations#topic_148>

Knight Digital Media Center Twitter for Journalists Engagement Tutorials <http://multimedia.journalism.berkeley.edu/tutorials/twitter/engagement/#>

Minocha, S. and Petre, M. (2012). Handbook of social media for researchers and supervisors <http://www.vitae.ac.uk/policy-practice/567271/Handbook-of-social-media-for-researchers-and-supervisors.html>

Priego, E. (2011). “How Twitter will revolutionise academic research and teaching”. Guardian Higher Education Network Learning and Teaching Hub. <http://www.guardian.co.uk/higher-education-network/blog/2011/sep/12/twitter-revolutionise-academia-research>

Stempeck, M. “How to Live Blog Events with a Team”. MIT Center for Civic Media. <http://civic.mit.edu/blog/mstem/how-to-liveblog-events-with-a-team>

Ethics

“Introduction to the Special Issue: Research Ethics in Online Communities”, by Aleks Krotoski. International Journal of Internet Research Ethics Issue 3.1, December 2010. Available from <http://ijire.net/issue_3.1.html>

The British Psychological Society. “Report of the Working Party on Conducting Research on the Internet. Guidelines for ethical practice in psychological research online”. [PDF] <http://www.bps.org.uk/sites/default/files/documents/conducting_research_on_the_internet-guidelines_for_ethical_practice_in_psychological_research_online.pdf>

International Review of Information Ethics, Issue No. 017, Vol. 17 – July 2012. Ethics of Secrecy, edited by Daniel Nagel, Matthias Rath, Michael Zimmer. <http://www.onlinecreation.info/?p=434>

“How do I cite a tweet?” Modern Language Association Style Guide FAQ, <http://www.mla.org/style/handbook_faq/cite_a_tweet>

Lee, C. “How to Cite Twitter and Facebook, Part I, General” American Psychological Association Style Blog, <http://blog.apastyle.org/apastyle/2009/10/how-to-cite-twitter-and-facebook-part-i.html>.

McKenzie, A. (2012). “Don’t let e-safety worries be a barrier to using social media in school”. Guardian Teacher Network Blog. <http://www.guardian.co.uk/teacher-network/2012/jul/26/social-networking-school-safety>

Blogging and Social Media Guidelines, Best Practices and/or Policies

Aufderheide, P., Clark, J. et al, (2009). Scan and Analysis of Best Practices in Digital Journalism In and Outside U.S. Public Broadcasting. Center for Social Media. <http://www.centerforsocialmedia.org/future-public-media/documents/articles/scan-and-analysis-best-practices-digital-journalism-and-outsi>.

Belam, M. (2012). “Forcing Hari to link only shows up how much the rest of the news industry doesn’t”. <http://www.currybet.net/cbet_blog/2012/06/hari-link-footnotes>.

Boudreaux, C. (2009-2012). Online Database of Social Media Policies. Social Media Governance. <http://socialmediagovernance.com/policies.php>

Center for Social Media. Code of Best Practices in Fair Use for Online Video. <http://www.centerforsocialmedia.org/fair-use/related-materials/codes/code-best-practices-fair-use-online-video>

British Broadcasting Corporation. Social Networking, Microblogs and other Third Party Websites: BBC Use. Guidance in Full. <http://www.bbc.co.uk/editorialguidelines/page/guidance-blogs-bbc-full>

Editorial Integrity for Public Media. Principles, Policies and Practices. <http://pmintegrity.org/>

Fitzpatrick, K. (2012). “Advice on Academic Blogging, Tweeting, Whatever”. Planned Obsolescence. <www.plannedobsolescence.net/blog/advice-on-academic-blogging-tweeting-whatever/>.

Giussani, B., and Zuckerman, E. (2007). Tips for Conference Bloggers.  <http://www.lunchoverip.com/conferencebloggers.html>

Global Voices Wiki Author Guidelines <http://wiki.globalvoicesonline.org/article/Author_Guidelines>

International Olympic Committee (2012). IOC Social Media, Blogging and Internet Guidelines for participants and other accredited persons at the London 2012 Olympic Games. [PDF] <http://www.olympic.org/Documents/Games_London_2012/IOC_Social_Media_Blogging_and_Internet_Guidelines-London.pdf>

Research on Academic Social Media

Golbeck, J., Grimes, J. M., and Rogers, A. (2010). “Twitter use by the U.S. Congress”. Journal of the American Society for Information Science and Technology, doi:10.1002/asi.21344

Holcomb, J., Gross, K. and Mitchell, A. (2010). “How Mainstream Media Outlets Use Twitter”. Journalism.org <http://www.journalism.org/analysis_report/how_mainstream_media_outlets_use_twitter?src=prc-headline>.

Journalist’s Resource. “Twitter, politics and the public: Research roundup” <http://journalistsresource.org/studies/society/media-society/us-government-twitter-research/>

Junco, R. (2011). “The need for student social media policies”. [PDF] Educause Review. <http://www.educause.edu/ero/article/need-student-social-media-policies>

Junco, R., Dahms, A. R. et al. (2010). “Media review: #sachat on Twitter”. Journal of Student Affairs Research and Practice, 47(2), 251-254. <http://blog.reyjunco.com/pdf/sachatreview.pdf>

Ross, C., Terras, M., Warwick, C. and Welsh, A. (2011). “Enabled Backchannel: Conference Twitter Use by Digital Humanists”. Journal of Documentation. Vol. 67 Iss: 2, pp.214 – 237. <http://www.emeraldinsight.com/journals.htm?articleid=1911710&show=abstract>

Zhao, D., & Rosson, M. B. (2009). “How and why people Twitter: The role that micro-blogging plays in informal communication at work”. In Proceedings of the ACM 2009 International Conference on Supporting Group Work (pp. 243–252). Sanibel Island, Florida, USA: ACM. doi:10.1145/1531674.1531

Disclosure: no one paid me to do this post.

Clip art by boobaloo