Libraries! Most Frequent Terms in #WLIC2016 Tweets (part IV)

IFLA World Library and Information Congress
82nd IFLA General Conference and Assembly
13–19 August 2016, Columbus, Ohio, USA. Copyright by IFLA, CC BY 4.0.

 


 

This is part IV. For necessary context, methodology and limitations, please see part 1, part 2 and part 3.

I may have made further edits since this post was first published and shared. I often come back to posts after publication to revise them.

Throughout the process of performing the day by day text analysis I became aware of other limitations to take into account and I have revised part 3 accordingly.

Summary

Here’s a summary of the counts of the source (unrefined) #WLIC2016 archive I collected:

  • Number of links: 12435
  • Number of RTs (estimate based on occurrence of RT): 14570
  • Number of Tweets: 23552
  • Unique Tweets (used to monitor quality of archive): 23421
  • First Tweet in archive: 14/08/2016 11:29:03 EDT
  • Last Tweet in archive: 22/08/2016 04:20:53 EDT
  • In Reply Ids: 270
  • In Reply @s: 429
  • Number of Tweeters: 3035

As previously indicated the Tweet count includes RTs. This count might require further deduplication and it might include bots’ Tweets and possibly some unrelated Tweets.

Here’s a summary of the Tweet count of the #WLIC2016  dataset I refined from the complete archive. As I explained in part 3 I organised the Tweets into conference days, from Sunday 14 to Thursday 18 August. Each day was a different corpus to analyse. I also analysed the whole set as a single corpus to ensure the totals replicated.

Day                          Tweet count
Sunday 14 August 2016        2543
Monday 15 August 2016        6654
Tuesday 16 August 2016       4861
Wednesday 17 August 2016     4468
Thursday 18 August 2016      3801
Sunday – Thursday (total)    22327
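The check that the per-day counts add up to the reported total can be sketched in a few lines of Python. This is only an illustration using the figures listed above; the variable names are my own and are not part of the original workflow, which was done in spreadsheets.

```python
# Per-day Tweet counts as listed above (refined #WLIC2016 dataset).
daily_counts = {
    "Sunday 14 August 2016": 2543,
    "Monday 15 August 2016": 6654,
    "Tuesday 16 August 2016": 4861,
    "Wednesday 17 August 2016": 4468,
    "Thursday 18 August 2016": 3801,
}

# Total reported for the whole Sunday to Thursday corpus.
reported_total = 22327

computed_total = sum(daily_counts.values())
totals_match = computed_total == reported_total
print(computed_total, totals_match)
```

If the per-day sheets and the single-corpus sheet had drifted apart during refining, `totals_match` would be False.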

 

The Most Frequent Terms

The text analysis involved analysing each corpus, first obtaining a ‘raw’ output of the 300 most frequent terms and their counts. As described in previous posts, I then applied an edited English stop word list, followed by manual editing of the top 100 most frequent terms (for the shared dataset) and of the top 50 for this post. Unlike before, in this case I removed ‘barack’ and ‘obama’ from Monday’s and Thursday’s corpora, and tried to remove usernames and hashtags, though it’s possible that further disambiguation and refining might be needed in those top 100 and top 50 lists.
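For readers who would rather script this step, here is a minimal sketch of a most-frequent-terms count with a stop word filter. It is not the Voyant Tools Terms tool used for this post: the tokenisation rule, the function name `most_frequent_terms`, the sample Tweets and the tiny stop word set are all assumptions made for illustration, so counts produced this way may differ from those reported here.

```python
import re
from collections import Counter


def most_frequent_terms(texts, stop_words, top_n=50):
    """Return the top_n (term, count) pairs across texts, ignoring stop words."""
    counts = Counter()
    for text in texts:
        # Lowercase, then treat runs of letters, digits and underscores as tokens.
        tokens = re.findall(r"[a-z0-9_]+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(top_n)


# Hypothetical sample Tweets and stop words, for illustration only.
sample_tweets = [
    "Libraries need open access #WLIC2016",
    "RT @example_user: Libraries and privacy https://t.co/abc",
]
sample_stop_words = {"rt", "and", "https", "t", "co", "abc", "wlic2016", "example_user"}

top_terms = most_frequent_terms(sample_tweets, sample_stop_words, top_n=3)
print(top_terms)
```

The manual editing described above would still be needed afterwards; a script like this only covers the automated pass.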

The text analysis of the Sun-Thu Tweets as a single corpus gave us the following Top 50:

#WLIC2016 Sun-Thu Top 50 Most Frequent Terms (stop-words applied; edited)

Rank  Term          Count
1     libraries     2895
2     library       2779
3     librarians    1713
4     session       1467
5     access        872
6     world         832
7     public        774
8     copyright     766
9     people        757
10    need          750
11    data          746
12    make          733
13    privacy       674
14    digital       629
15    new           615
16    wikipedia     602
17    indigenous    593
18    use           574
19    information   555
20    great         539
21    knowledge     512
22    literacy      502
23    internet      481
24    work          428
25    thanks        419
26    message       416
27    future        412
28    change        379
29    social        378
30    open          369
31    just          354
32    research      353
33    know          330
34    community     323
35    important     319
36    oclc          317
37    collections   312
38    books         300
39    learn         300
40    opening       291
41    read          289
42    impact        287
43    place         282
44    good          280
45    services      277
46    national      276
47    best          272
48    latest        269
49    report        267
50    users         266

As mentioned above I also analysed each day as a single corpus. I refined the ‘raw’ 300 most frequent terms per day to a top 100 after stop words and manual editing. I then laid them all out as a single table for comparison.

#WLIC2016 Top 50 Most Frequent Terms per Day Comparison (stop-words applied; edited)

Rank  Sun 14 Aug    Mon 15 Aug    Tue 16 Aug    Wed 17 Aug    Thu 18 Aug
1     libraries     library       library       libraries     libraries
2     library       libraries     privacy       library       library
3     librarians    librarians    libraries     librarians    librarians
4     session       session       librarians    indigenous    public
5     access        copyright     session       session       session
6     world         wikipedia     people        knowledge     need
7     public        digital       data          access        data
8     copyright     make          indigenous    data          impact
9     people        world         make          literacy      new
10    need          internet      access        need          digital
11    data          access        wikipedia     great         world
12    make          new           use           people        thanks
13    privacy       need          information   research      access
14    digital       use           world         public        value
15    new           public        public        new           national
16    wikipedia     future        knowledge     marketing     change
17    indigenous    people        copyright     general       privacy
18    use           message       homeless      open          great
19    information   collections   literacy      world         work
20    great         information   oclc          archives      research
21    knowledge     content       great         just          use
22    literacy      open          homelessness  national      people
23    internet      report        need          assembly      knowledge
24    work          space         freedom       place         social
25    thanks        trend         like          make          using
26    message       great         thanks        read          know
27    future        net           internet      community     make
28    change        work          info          social        services
29    social        neutrality    latest        reading       skills
30    open          making        experiencing  work          award
31    just          update        theft         information   information
32    research      books         important     use           learning
33    know          collection    just          learn         users
34    community     social        subject       share         book
35    important     design        change        matters       user
36    oclc          data          guidelines    key           best
37    collections   thanks        digital       know          collections
38    books         librarian     students      global        academic
39    learn         know          know          government    measure
40    opening       shaping       online        life          poland
41    read          google        protect       thanks        community
42    impact        change        working       important     learn
43    place         literacy      statement     development   outcomes
44    good          just          work          love          share
45    services      technology    future        impact        time
46    national      online        read          archivist     media
47    best          poster        award         good          section
48    latest        info          create        books         important
49    report        working       services      cultural      service
50    users         law           good          help          closing
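One simple way to read a comparison like this is to separate the terms that every day shares from those that are distinctive to a given day. The sketch below does this with Python sets, using only the first five ranks of each day as transcribed from the comparison above; the variable names are my own, and the same approach would apply to the full top 50 or top 100 lists in the shared dataset.

```python
# First five ranks per day, transcribed from the per-day comparison above.
top5_by_day = {
    "Sun 14 Aug": ["libraries", "library", "librarians", "session", "access"],
    "Mon 15 Aug": ["library", "libraries", "librarians", "session", "copyright"],
    "Tue 16 Aug": ["library", "privacy", "libraries", "librarians", "session"],
    "Wed 17 Aug": ["libraries", "library", "librarians", "indigenous", "session"],
    "Thu 18 Aug": ["libraries", "library", "librarians", "public", "session"],
}

# Terms present in every day's top five.
shared_terms = set.intersection(*(set(terms) for terms in top5_by_day.values()))

# Terms in a day's top five that are not shared by all days, in rank order.
distinctive_terms = {
    day: [term for term in terms if term not in shared_terms]
    for day, terms in top5_by_day.items()
}

print(sorted(shared_terms))
print(distinctive_terms)
```

With only five ranks this leaves one distinctive term per day (for example ‘privacy’ on Tuesday and ‘indigenous’ on Wednesday), which is the kind of day-to-day shift the table is meant to surface.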

I have shared on figshare a dataset containing the summaries above as well as the raw top 300 most frequent terms for the whole set and for each day. The dataset also includes the top 100 most frequent terms lists per day that I manually edited after having applied the edited English stop word filter.

You can download the spreadsheet from figshare:

Priego, Ernesto (2016): #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2

Please bear in mind that as refining was done manually and the Terms tool does not always seem to apply stop words evenly there might be errors. This is why the raw output was shared as well. This data should be taken to be indicative only.

As it is increasingly recommended for data sharing, the CC-0 license has been applied to the resulting output in the repository. It is important however to bear in mind that some terms appearing in the dataset might be licensed individually differently; copyright of the source Tweets -and sometimes of individual terms- belongs to their authors.  Authorial/curatorial/collection work has been performed on the shared file as a curated dataset resulting from analysis, in order to make it available as part of the scholarly record. If this dataset is consulted attribution is always welcome.

Ideally for proper reproducibility and to encourage other studies the whole archive dataset should be available.  Those wishing to obtain the whole Tweets should still be able to get them themselves via text and data mining methods.

Conclusions

Indeed, for us today there is absolutely nothing surprising about the term ‘libraries’ being the most frequent word in Tweets coming from IFLA’s World Library and Information Congress. Looking at the whole dataset, however, provides an insight into other frequent terms used by Library and Information professionals in the context of libraries. These terms might not remain frequent for long, and might not have been frequent words in the past (I can only hypothesise– having evidence would be nice).

A key hypothesis for me guiding this exercise has been that perhaps by looking at the words appearing in social media outputs discussing and reporting from a professional association’s major congress, we can get a vague idea of where a sector’s concerns are/were.

I guess it can be safely said that words become meaningful in context. In an age in which repetition and frequency are key to public constructions of cultural relevance (‘trending topics’ increasingly define the news agenda… and what people talk about and how they talk about things) the repetition and frequency of key terms might provide a type of meaningful evidence in itself.  Evidence, however, is just the beginning– further interpretation and analysis must indeed follow.

One cannot obtain the whole picture from decomposing a collectively, socially, publicly created textual corpus (or perhaps any corpus, unless it is a list of words from the start) into its constituent parts. It could also be said that many tools and methods often tell us more about themselves (and those using them) than about the objects of study.

So far text analysis (Rockwell 2003) and ‘distant reading’ through automated methods have focused on working with books (Ramsay 2014). However, I’d like to suggest that this kind of text analysis can be another way of reading social media texts and offer another way to contribute to the assessment of their cultural relevance as living documents of a particular setting and moment in time. Who knows, they might also be telling us something about the present perception and activity of a professional field, and might help us to compare it with those in the future.

Other Considerations

Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailon, Sandra, et al, 2012).

Apart from the filters and limitations already declared, it cannot be guaranteed that each and every Tweet tagged with #WLIC2016 during the indicated period was analysed. The dataset was shared for archival, comparative and indicative educational research purposes only.

Only content from public accounts, obtained from the Twitter Search API, was analysed.  The source data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account.

These posts and the resulting dataset contain the results of analyses of Tweets that were published openly on the Web with the queried hashtag; the content of the Tweets is the responsibility of the original authors. Original Tweets are likely to be copyright their individual authors but please check individually.

This work is shared to archive, document and encourage open educational research into scholarly activity on Twitter. The resulting dataset does not contain complete Tweets nor Twitter metadata. No private personal information was shared. The collection, analysis and sharing of the data has been enabled and allowed by Twitter’s Privacy Policy. The sharing of the results complies with Twitter’s Developer Rules of the Road.

A hashtag is metadata users choose freely to use so their content is associated, directly linked to and categorised with the chosen hashtag. The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case). Tweets published publicly by scholars or other professionals during academic conferences are often publicly tagged (labeled) with a hashtag dedicated to the conference in question. This practice used to be confined to a few ‘niche’ fields; it is increasingly becoming the norm rather than the exception.

Though every reason for Tweeters’ use of hashtags cannot be generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour.

In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter’s Privacy and data sharing policies.

Professional associations like the Modern Language Association and the American Psychological Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter’s search API has well-known temporal limitations for retrospective historical search and collection.

Beyond individual Tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. Though this work has limitations and might not be thoroughly systematic, it is hoped it can contribute to developing new insights into a discipline’s public concerns as expressed on Twitter over time.

References

González-Bailon, Sandra and Wang, Ning and Rivero, Alejandro and Borge-Holthoefer, Javier and Moreno, Yamir, Assessing the Bias in Samples of Large Online Networks (December 4, 2012).  Available at SSRN: http://dx.doi.org/10.2139/ssrn.2185134

Priego, Ernesto (2016) #WLIC2016 Most Frequent Terms Roundup. figshare.
https://dx.doi.org/10.6084/m9.figshare.3749367.v2

Ramsay, Stephen (2014) “The Hermeneutics of Screwing Around; or What You Do with a Million Books.” In Pastplay: Teaching and Learning History with Technology, edited by Kevin Kee, 111-20. Ann Arbor: University of Michigan Press, 2014. Also available at http://quod.lib.umich.edu/d/dh/12544152.0001.001/1:5/–pastplay-teaching-and-learning-history-with-technology?g=dculture;rgn=div1;view=fulltext;xc=1

Rockwell, Geoffrey (2003) “What is Text Analysis, Really?” Preprint, Literary and Linguistic Computing, vol. 18, no. 2, 2003, pp. 209-219.

What’s in a Word? Most Frequent Terms in #WLIC2016 Tweets (part III)


This is part three. For necessary context please start here (part 1) and here (part 2). The final, fourth part is here.

It’s Friday already and the sessions from IFLA’s WLIC 2016 have finished. I’d like to finish what I started and complete a roundup of my quick (but in practice not-so-quick) collection and text analysis of a sample of #WLIC2016 Tweets. My intention is to finish this with a fourth and final blog post following this one and to share a dataset on figshare as soon as possible.

As previously I customised the spreadsheet settings to collect only Tweets from accounts with at least one follower and to reflect the Congress’ location and time zone. Before exporting as CSV I did a basic automated deduplication, but I did not do any further data refining (which means that non-relevant or spam Tweets may be included in the dataset).
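A basic automated deduplication of this kind can be sketched as follows. I did mine in the spreadsheet itself; the Python below is only an illustration, and the `id_str` and `text` field names are assumptions about how the exported rows are keyed, so adjust them to whatever columns the CSV actually has.

```python
def deduplicate(rows, key="id_str"):
    """Keep the first row seen for each value of key, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


# Hypothetical rows standing in for an exported archive.
rows = [
    {"id_str": "1", "text": "Opening session #WLIC2016"},
    {"id_str": "2", "text": "RT @example_user: Opening session #WLIC2016"},
    {"id_str": "1", "text": "Opening session #WLIC2016"},
]

unique_rows = deduplicate(rows)
print(len(rows), len(unique_rows))
```

Note that this only removes rows collected twice; Retweets have their own ids and are kept, which matches the counts reported in these posts.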

What follows is a basic quantitative summary of the initial complete sample dataset:

  • Total Tweets: 22,540 Tweets (includes RTs)
  • First Tweet in complete sample dataset: Sunday 14/08/2016 11:29:03 EDT
  • Last Tweet in complete sample dataset: Friday 19/08/2016 04:20:43 EDT
  • Number of links:  11,676
  • Number of RTs:    13,859
  • Number of usernames: 2,811

The Congress had activities between Friday 12 August and Friday 19 August, but sessions between Sunday 14 August and Thursday 18 August. Ideally I would have liked to collect Tweets from the early hours of Sunday 14 August but I started collecting late so the earliest I got to was 11:29:03 EDT. I suppose at least it was before the first panel sessions started. For more context re: timings: see the Congress outline.

I refined the complete dataset to include only the days that featured panel sessions, and I have organised the data in a different sheet per day for individual analysis. I have also created a table detailing the Tweet counts per Congress session day. [Later I realised that though I had the metadata for the Columbus, Ohio time zone I ended up organising the data into GMT/BST days. There is a five-hour difference, but the collected Tweets per day still roughly correspond to the timings of the conference. Of course many will have participated in the hashtag remotely (not present at the event) and many of those present will not have tweeted synchronously (‘live’). I don’t think this makes much of a difference (no pun intended) to the analysis, but it’s something I was aware of and that others may or may not want to consider as a limitation.]
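To make the time zone limitation concrete, here is a small sketch of how the same Tweet can fall on different calendar days depending on whether it is bucketed in Columbus time (EDT) or UK time (BST). It assumes fixed offsets of UTC-4 and UTC+1, which hold for August 2016, and timestamps in the day/month/year format used in these posts; the second timestamp is hypothetical and the function name is my own.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed offsets valid for August 2016 (assumption: no daylight-saving change in range).
EDT = timezone(timedelta(hours=-4), "EDT")
BST = timezone(timedelta(hours=1), "BST")


def day_in_zone(timestamp, source_tz, target_tz):
    """Parse a dd/mm/yyyy hh:mm:ss timestamp in source_tz; return its date in target_tz."""
    local = datetime.strptime(timestamp, "%d/%m/%Y %H:%M:%S").replace(tzinfo=source_tz)
    return local.astimezone(target_tz).date().isoformat()


# First Tweet in the sample, plus a hypothetical late-evening Tweet (both EDT).
stamps = ["14/08/2016 11:29:03", "14/08/2016 21:30:00"]

edt_days = Counter(day_in_zone(s, EDT, EDT) for s in stamps)
bst_days = Counter(day_in_zone(s, EDT, BST) for s in stamps)

print(edt_days)
print(bst_days)
```

Anything tweeted after 19:00 in Columbus lands on the following day in a BST-organised sheet, which is why the per-day counts only roughly correspond to the conference days.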

Tweets collected per day

Day                          Tweet count
Sunday 14 August 2016        2543
Monday 15 August 2016        6654
Tuesday 16 August 2016       4861
Wednesday 17 August 2016     4468
Thursday 18 August 2016      3801

Total Tweets in refined dataset: 22,327 Tweets.

(Always bear in mind that these figures reflect the Tweets in the collected dataset; they do not mean that this was in fact the total number of Tweets published with the hashtag during that period. Not only do the settings of my query affect the results; Twitter’s search API also has limitations and cannot be assumed to always return the same type or number of results.)

I am still in the process of analysing the dataset. There are of course multiple types of analyses that one could do with this data but bear in mind that in this case I have only focused on using text analysis to obtain the most frequent terms in the text from the Tweets tagged with #WLIC2016 that I collected.

As before, in this case I am using the Terms tool from Voyant Tools to perform a basic text analysis in order to identify the number of total words and unique word forms and the most frequent terms per day; in other words, the data from each day became an individual corpus. (The complete refined dataset including all collected days could be analysed as a single corpus as well for comparison.) I am gradually exporting and collecting the ‘raw’ output from the Terms tool per day, so that once I have finished applying the stop words to each corpus this output can be compared, and so that it can be reproduced with other stop word lists if desired.

As before I am using the English stop word list which I edited previously to include Twitter-specific terms (e.g. t.co, amp, https), as well as dataset-specific terms (e.g. the Congress’ Twitter account, related hashtags, etc.). What I did differently this time is that I included all 2,811 account usernames in the complete dataset so they would be excluded from the most frequent terms. These are the usernames of accounts with Tweets in the dataset; other usernames (those mentioned in Tweets’ text but that did not themselves tweet with the hashtag) were logically not filtered, so whenever they are easily identifiable I am painstakingly removing them (manually!) from the remaining list. I am sure there must be a more effective way of doing this, but I find the combination of ‘distant’ (automated) editing and ‘close’ (manual) editing interesting and fun.
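One possible ‘more effective way’ is sketched below: extend the stop word list with the usernames that tweeted, then use a regular expression to list the usernames that were only mentioned, so they can be reviewed in one go rather than spotted by eye. This is an assumption-laden illustration, not what I actually did; the function names, the `@(\w+)` pattern and the sample usernames are all hypothetical.

```python
import re

# Matches @mentions; \w+ covers letters, digits and underscores.
MENTION = re.compile(r"@(\w+)")


def build_stop_words(base_stop_words, tweeting_usernames):
    """Combine a base stop word list with the usernames of accounts that tweeted."""
    return {w.lower() for w in base_stop_words} | {u.lower() for u in tweeting_usernames}


def mentioned_only(texts, tweeting_usernames):
    """Return usernames mentioned in the texts that did not tweet themselves."""
    tweeters = {u.lower() for u in tweeting_usernames}
    mentioned = {m.lower() for text in texts for m in MENTION.findall(text)}
    return sorted(mentioned - tweeters)


# Hypothetical sample data.
texts = [
    "RT @alice_lib: thanks @bob_archive for the session",
    "Great talk by @alice_lib",
]
tweeters = ["alice_lib"]

stop_words = build_stop_words(["t.co", "amp", "https"], tweeters)
extra_usernames = mentioned_only(texts, tweeters)
print(sorted(stop_words))
print(extra_usernames)
```

The list in `extra_usernames` could then be checked manually before being added to the stop words, which keeps some of the ‘close’ editing in the loop.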

I am using the same edited stop word list for each analysis. In this case I have also manually removed non-English terms (mostly pronouns, articles). Needless to say I did this not because I didn’t think they were relevant (quite the opposite) but because even though they had a presence they were not fairly comparable to the overwhelming majority of English terms (a ranking of most frequent non-English terms would be needed). As I will also have shared the unedited, ‘raw’ top most frequent terms in the dataset, anyone wishing to look into the non-English terms could ideally do so and run their own analyses without my own subjective stop word list and editing getting in the way. I tried to be as systematic as possible but disambiguation would be needed (the Terms tool is case and context insensitive, so a term could have been a proper name, or a username, and to be consistent I should have removed those too. Again, having the raw list would allow others to correct any filtering/curation/stop word mistakes).

I am aware there are far more sophisticated methods of dealing with this data. Personally, doing this type of simple data collection and text analysis is an exercise in, and an interrogation of, data collection and analysis methods and tools as reflective practices. A hypothesis behind it is that the terms a community or discipline uses (and retweets) do say something about those communities or disciplines, at least for a particular moment in time and a particular place in particular settings. Perhaps it also says things about the medium used to express those terms. When ‘screwing around’ with texts it may be unavoidable to wonder what there is to it beyond ‘bean-counting’ (what’s in a word? what’s in a frequent term?), and what there is to social media and academic/professional live-tweeting that can or cannot be quantified. Doing this type of work makes me reflect as well about my own limitations, the limits of text analysis tools, the appropriateness of tools, the importance of replication and reproducibility and the need to document and to share what has been documented.

I’m also thinking about documentation and the open sharing of data outputs as messages in bottles, or as it has been said of metadata as ‘letters to the future’. I’m aware that this may also seem like navel-gazing of little interest outside those associated to the event in question. I would say that the role of libraries in society at large is more crucial and central than many outside the library and information sector may think (but that’s a subject for another time). Perhaps one day in the future it might be useful to look back at what we were talking about in 2016 and what words we used to talk about it. (Look, we were worried about that!) Or maybe no one cares and no one will care, or by then it will be possible to retrieve anything anywhere with great degrees of relevance and precision (including critical interpretation). In the meanwhile,  I will keep refining these lists and will share the output as soon as I can.

Next… the results!

The final, fourth part is here.

Most Frequent Terms in #WLIC2016 Tweets (part II)


 

The first part of this series provides necessary context.

I have now an edited list of the top 50 most frequent terms extracted from a cleaned dataset comprised of 10,721 #WLIC2016 Tweets published by 1,760 unique users between Monday 15/08/2016 10:11:08 EDT and Wednesday 17/08/2016 07:16:35 EDT.

The analysed corpus contained the raw text of the Tweets (includes RTs), comprising 185,006 total words and 12,418 unique word forms.
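For reference, ‘total words’ and ‘unique word forms’ correspond to what are often called tokens and types. A rough way to compute both is sketched below; the tokenisation rule is an assumption and will not match Voyant Tools exactly, so figures obtained this way should not be expected to replicate the ones above.

```python
import re


def corpus_size(texts):
    """Return (total words, unique word forms) using a simple lowercase tokeniser."""
    tokens = [t for text in texts for t in re.findall(r"\w+", text.lower())]
    return len(tokens), len(set(tokens))


# Hypothetical two-Tweet corpus.
total_words, unique_forms = corpus_size(["Libraries need libraries", "Open access"])
print(total_words, unique_forms)
```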

Stop words were applied as detailed in the first part of this series, and the resulting list (a raw list of 300 most frequent terms) was further edited to remove personal names, personal Twitter user names, common hashtags, etc.  Some organisational Twitter user names were not removed from the list, as an indication of their ‘centrality’ in the network based on the frequency with which they appeared in the corpus.

So here’s an edited list of the top 50 most frequent terms from the dataset described above:

Term            Count
library         1379
libraries       1102
librarians      811
session         715
privacy         555
wikipedia       523
make            484
copyright       465
people          428
digital         378
access          375
use             362
public          340
data            322
need            319
iflabuild2016   308
world           308
information     298
internet        289
new             272
great           259
indigenous      255
iflatrends      240
report          202
knowledge       200
future          187
work            187
libraryfreedom  184
literacy        184
space           180
change          178
thanks          172
oclc            171
open            170
just            169
books           168
trend           165
important       162
info            162
know            162
social          161
net             159
neutrality      159
wikilibrary     158
collections     157
working         157
librarian       154
online          154
making          149
guidelines      148

Is this interesting? Is it useful? I don’t know, but I’ve enjoyed documenting it. Reflecting about different criteria to apply stop words and clean, refine terms has also been interesting.

I guess that deep down I believe it’s better to document than not to, even if we may think there should be other ways of doing it (otherwise I wouldn’t even try to do it). Making value judgements about the utility or insightfulness of specific data in specific ways is an a posteriori process.

I hope to be able to continue collecting data and once the congress/conference ends I hope to be able to share a dataset with the raw (unedited, unfiltered) most frequent terms in the text from Tweets published with the event’s hashtag. If there’s anyone else interested they could clean, curate and analyse the data in different ways (wishful thinking but hey; it’s hope that guides us).

What Library Folk Live Tweet About: Most Frequent Terms in #WLIC2016 Tweets

IFLA World Library and Information Congress. Logo copyright by IFLA, CC BY 4.0.

Part 2 is  here, part 3  here and the final, fourth part is here.

IFLA stands for The International Federation of Library Associations and Institutions.

The IFLA World Library and Information Congress 2016 and 82nd IFLA General Conference and Assembly, ‘Connections. Collaboration. Community’, is currently taking place (13–19 August 2016) at the Greater Columbus Convention Center (GCCC) in Columbus, Ohio, United States.

The official hashtag of the conference is #WLIC2016. Earlier, I shared a searchable, live archive of the hashtag here. (Page may be slow to load depending on bandwidth).

I have looked at the text from 4,945 Tweets published with #WLIC2016 from 14/08/2016 to 15/08/2016 11:16:06 (EDT, Columbus Ohio time). Only accounts with at least 1 follower were included. I collected them with Martin Hawksey’s TAGS.

According to Voyant Tools this corpus had 82,809 total words and 7,506 unique word forms.

I applied an English stop word list which I edited to include Twitter-specific terms (https, t.co, amp (&) etc.), proper names (Barack Obama, other personal usernames) and some French stop words (mainly personal pronouns). I also edited the stop word list to include some dataset-specific terms such as the conference hashtag and other common hashtags, ‘ifla’, etc. (I left others that could also be considered dataset-specific terms, such as ‘session’ though).

The result was a listing of 800 frequent terms (the least frequent terms in the list had been repeated 5 times). I then cleaned the data of any dataset-specific stop words that the stop word list did not filter and created an edited ordered listing of the 50 most frequent terms. I left in organisations’ Twitter user names (including @potus), as well as other terms that may not seem that meaningful on their own (but who knows, they may be).

It must be taken into account that the corpus included Retweets; each RT counted as a single Tweet, even if that meant terms were being logically repeated. This means that term counts in the list reflect the fact that the dataset contains Retweets (which obviously implies the repetition of text).
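The effect of Retweets on term counts can be illustrated with a short sketch that counts terms with and without RTs. It assumes a Retweet can be recognised by its text starting with ‘RT @’, which is a simplification, and the sample Tweets and function names are hypothetical; the counts in this post were not produced this way.

```python
import re
from collections import Counter


def is_retweet(text):
    """Rough heuristic (assumption): old-style Retweets start with 'RT @'."""
    return text.startswith("RT @")


def term_counts(texts):
    """Count lowercase alphabetic tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


# Hypothetical sample: one original Tweet and two Retweets of it.
tweets = [
    "Copyright matters for libraries",
    "RT @someone: Copyright matters for libraries",
    "RT @another: Copyright matters for libraries",
]

with_rts = term_counts(tweets)
without_rts = term_counts(t for t in tweets if not is_retweet(t))

print(with_rts["copyright"], without_rts["copyright"])
```

Here a single original Tweet retweeted twice triples the count of each of its terms, which is exactly the kind of amplification the listing below reflects.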

If for some reason you are curious about what the most frequent words in #WLIC2016 Tweets were during this initial period (see above), here’s the top 50:

Term            Count
libraries       543
copyright       517
librarians      484
library         406
session         374
world           326
message         271
opening         249
access          226
make            204
digital         195
internet        162
future          161
information     157
new             146
use             141
people          138
president       131
potus           125
literacy        118
need            117
oclc            114
ceremony        113
dpla            109
poster          105
thanks          103
collections     102
public          100
delegates       99
cilipinfo       98
countries       95
iflatrends      95
google          93
shaping         91
work            89
drag            83
report          83
create          81
open            81
data            79
content         78
learn           78
latest          77
making          77
fight           76
ifla_arl        75
read            74
info            73
exceptions      69
great           68

So for what it’s worth those were the 50 most frequent terms in the corpus.

I, for one, not being present in the Congress, found it interesting that ‘copyright’ is the second most frequent term, following ‘libraries’. One notices also that, unsurprisingly, the listing of top most frequent terms includes some key terms (such as ‘access’, ‘internet’, ‘digital’, ‘open’, ‘data’) concerning Library and Information professionals of late.

Were these the terms you’d have expected to make a ‘top 50’ in almost 5,000 Tweets from this initial phase of this particular conference?

The conference hasn’t finished yet of course. But so far, for a libraries and information world congress, which terms would you say are noticeable by their absence in this list? ;-)

Part 2 is  here, part 3  here and the final, fourth part is here.

 

Visualise This: Getting Attention to Journal Subscription Expenditure

[If you just got here please reload as you may be reading an older version. Thanks]

As part of ongoing research in collaboration with my colleague Domenico Fiormonte on academic publishing and ‘monopolies of knowledge’ (if curious, see this, and this, and this) we have been looking at the available data on the amount of money universities spend on journal subscriptions (both online and in print).

Perhaps the most important scholarly contribution to highlight the level of expenditure in the UK has been by Stuart Lawson et al (2015). “Journal subscription costs – FOIs to UK universities” is an essential reference to both understanding the difficulty of accessing consistent data and to gaining more detailed insights into how much UK universities spend on paywalled/subscription journals.

In my teaching I constantly address issues of quantitative and qualitative methods. It’s possible that my background in critical theory, philosophy and information studies still defines many of my approaches to numerical data and their interpretation. When it comes to visualising data, I assume that the map is not the territory (Korzybski 1931), and that no source data set, and no data visualisation is ever neutral. No matter how ‘clearly’ a visualisation may appear to represent data, different types of processes of interpretation have been required to render data in a particular shape and form. No matter how ‘correctly’ the data has been presented and how ‘appropriately’ the visualisation techniques have been applied, the result is never neutral nor objective. I appreciate this is still controversial, as science and the scientific method appear to still be concerned with revealing truths.

As I like doing research openly on the Web, I may sometimes share some bits of work in progress that perhaps no one else would otherwise see as they are part of the process and not definitive results. I like sharing them to produce reactions and as a means to obtain feedback while I’m still working on something. (Also because I like sharing, period.) It’s also a way of documenting publicly some of my own workflow. I like experimenting, for example, with very quick, basic visualisations of data I have curated and cleaned a bit. Often these visualisations do not necessarily employ the ‘best’ or ‘correct’ ways of visualising the source data. Sometimes this is the result of trial and error (I like experimenting and I seek to learn from my mistakes) but sometimes I am aware I am just playing or seeking to use a ‘wrong’ method (on purpose!).

For instance, in this case I wanted to once again draw attention online (i.e. on Twitter) to the issue of the amount of money spent by UK institutions on journal subscriptions according to Lawson et al (2015). It’s been a year since they published the last version of the dataset so I wanted to reuse and link back again to their output. I knew they had already created a very handy interactive visualisation tool so people could explore the data in different ways. I wanted to focus not on which institutions had spent how much, but on the total amount spent in 2014 on each of the ten main publishers. Of course this data could be visualised with a basic bar chart, and this already exists. My intention instead was to attract new attention, on Twitter, to Lawson et al’s data. I wanted to create a colourful image to make people wonder ‘what on earth is this?!’ and hopefully follow the references (if they made the effort to click on the image, expand it, or save it to their desktop to see it more clearly).

Alluvial: Amount of money paid by 145 UK higher education institutions for journal subscriptions to ten major publishers in 2014

So.

I appreciate how annoying inadequate visualisation can be. I am aware an alluvial diagram such as the above is not the ‘right’ form to visualise this data. I saw this exercise as a very simple, very quick, consciously tongue-in-cheek strategy to use colour and flows to call attention to the names of the publishers and the flow of cash, in very boring alphabetical order, and, if your disposition/sense of humour/tolerance and browser/screen/device allowed it, to invite you to follow the paths to the amounts spent. I was of course aware this was a labyrinthine way of doing it. I hoped to make those watching stop for a second and try to figure out stuff. My intention in sharing a chart like this is not to say ‘this is the clearest and best way to communicate this data’, but simply to attract some attention once again to the fact that these publishers exist and that those amounts appear as spent on them in the source dataset.

I have read and often recommend Edward Tufte’s The Visual Display of Quantitative Information (1983), The Functional Art by Alberto Cairo (2013), Visualize This by Nathan Yau (2011), and of course Tufte’s Beautiful Evidence (2006). Cairo is right when he writes that “graphics, charts and maps are not just tools to be seen, but to be read and scrutinised” (2013:xx), and Tufte’s dictum “graphics reveal data” (1983:13) is the foundation of a widely-accepted code of best practice for visualising data.

Most handbooks and textbooks will insist that choosing the ‘right type’ of chart or visualisation is essential. Strictly speaking, as professionals we can all agree this is correct, but we rarely interrogate what else could be done with different types of visualisations, or what the ‘wrong type’ of chart could help people make of the data. In my teaching, I often show ‘wrong types’ of visualisations as a means to explain why different types of data organisation and curation require, enable or reject different types of visualisation.

Moreover, thinking for longer about why a visualisation seems ‘wrong’ can help us think more carefully about our own assumptions and cultural paradigms. Hierarchical data, taxonomies and different methodologies of classification define and impose (and are themselves the result of) discursive practices that, to my constant amazement, can themselves offer the very keys to their own unlocking and revealing. In other words, ‘wrong’ visualisations can illuminate things both about data and about visualisation techniques, and in a way they always-already do: if we can detect that the visualisation is ‘wrong’, it is because we have in some way paid attention to the data the visualisation is supposed to represent. In my book, ‘wrong’ visualisations are useful because they can make us pay attention and realise that visualisation is not transparent and should not be taken for granted as a method of interpretation. I suppose the best visualisations can almost make us forget they are constructs and that they are not reality itself but mediated representations.

When thinking about monopolies of knowledge, I’d like to argue that we need to focus on the key commercial entities playing a leading role in scholarly publishing. Knowing how much our institutions spend on subscriptions to journals from these publishers is important, but this tells only a part of the story. Perhaps my mindset when creating and sharing a quick exercise in visualisation like the above is that we need to focus on different aspects of the data, and that the data is never ‘raw’, but displayed and organised in different ways according to what we may want to emphasise or achieve. Perhaps it’s just overthinking what to some will be downright bad practice. As online visibility was what I was aiming to obtain, I guess one of those objectives has been met (impressions are in the thousands now). I know data visualisation is a serious discipline and there are experts and scholars achieving absolute excellence out there. I am willing to accept the whole little exercise was a total failure. In any case I hope it does not hurt anyone for me to play a little bit with visualisation tools in order to see what happens if we do things in a quick and basic, non-orthodox way, simply to call a little bit more attention to what I consider a pressing issue in scholarly communications.

References

Cairo, Alberto (2013): The Functional Art. An Introduction to Information Graphics and Visualization. New Riders.

Fiormonte, Domenico and Priego, Ernesto (2016): “Knowledge Monopolies and Global Academic Publishing”. The Winnower. https://thewinnower.com/papers/4965-knowledge-monopolies-and-global-academic-publishing

Korzybski, Alfred (1931): “A Non-Aristotelian System and its Necessity for Rigour in Mathematics and Physics”, a paper presented before the American Mathematical Society at the New Orleans, Louisiana, meeting of the American Association for the Advancement of Science, December 28, 1931. Reprinted in Science and Sanity, 1933, p. 747–61.

Lawson, Stuart; Meghreblian, Ben; Brook, Michelle (2015): Journal subscription costs – FOIs to UK universities. figshare. https://dx.doi.org/10.6084/m9.figshare.1186832.v23

Tufte, Edward (1983): The Visual Display of Quantitative Information. Graphics Press.

Tufte, Edward (2006): Beautiful Evidence. Graphics Press.

Yau, Nathan (2011): Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. Wiley.

 

A Library is Not a Library is Not a Library


This morning many of us in the UK woke up to these headlines: ‘Libraries lose a quarter of staff as hundreds close’ (BBC); ‘Libraries: The decline of a profession?’ (BBC); ‘Libraries facing ‘greatest crisis’ in their history’ (Guardian). [Post-publication Update: at the same time I was publishing this post, the Telegraph published a piece titled ‘Don’t mourn the loss of libraries – the internet has made them obsolete’].

You will notice that the three headlines start with the term ‘Libraries’.  The second headline suggests ‘a profession’ (we are to understand ‘librarianship’) is or might be in decline. Like many people I saw the headline shared on Twitter. I suppose the headline is meant to promise the reader an answer in the linked piece; it is designed to make the reader click on the link and therefore read the piece: is the library profession as a whole in decline?

In this brief comment I will not be providing the reader with alternative statistics (those so inclined can look at the CILIP 2020 strategy slides by Nick Poole, CEO of the Chartered Institute of Library and Information Professionals). I was however moved to write this quick post as a means to briefly expand on some thoughts I have already shared this morning on Twitter.

The results of the BBC’s Freedom of Information requests have been made available as a Google spreadsheet. The BBC offered some insights:

Change across UK

4,290 Council-run libraries in 2010

3,765 Council-run libraries now

343 libraries closed, 207 of them buildings, 132 mobile and four “other”

232 transferred, 174 to community groups and 58 outsourced

50 new libraries started, 20 of them buildings, 8 mobile and 22 “other”

111 proposed for closure over the next year

Source: BBC FOI requests
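As a quick sanity check, the figures above can be reconciled with a few lines of Python. This is only a sketch: the variable and function names are mine, and it assumes that transferred libraries no longer count as ‘council-run’, which the BBC summary does not state explicitly.

```python
# Figures as quoted in the BBC summary above.
libraries_2010 = 4290
libraries_now = 3765

closed = {"buildings": 207, "mobile": 132, "other": 4}
transferred = {"community groups": 174, "outsourced": 58}
opened = {"buildings": 20, "mobile": 8, "other": 22}


def total(breakdown):
    """Add up the categories of one line of the summary."""
    return sum(breakdown.values())


# Assumption: closures and transfers leave the council-run count,
# while newly started libraries join it.
net_change = total(opened) - total(closed) - total(transferred)
expected_now = libraries_2010 + net_change

print(total(closed), total(transferred), total(opened))  # 343 232 50
print(net_change)                                         # -525
print(expected_now == libraries_now)                      # True
```

Under that assumption the breakdowns add up to the reported totals, and the net loss of 525 council-run libraries accounts exactly for the difference between the 2010 figure and the current one.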

These are, no doubt, distressing figures, and they provide evidence of the extent of public budget cuts to Council-run libraries. I don’t think there is anyone remotely related to the Library and Information sector who won’t think this is frankly terrible, but I don’t think there is any of us who was surprised by this news. It mainly confirms the extent of the damage done to the public library sector by government funding policy over the last six years.

What I would like to say here though is that the way the news has been disseminated, including the way it is being shared and discussed in the media and on social media, shows to me that there is now more than ever before a need for Library and Information Science skills. Take for example the obvious absence of the adjective ‘Public’ or the adjectival phrase ‘Council-run’ from the headlines and bodies of the BBC and Guardian news items linked to in my first paragraph. The result is the conflation of public or council-run libraries with the library sector as a whole.

A library is not a library is not a library because not all ‘libraries’ face the same challenges and not all librarians do the same jobs. Abbreviating ‘public libraries’ to merely ‘libraries’ creates misinformation as it feeds cultural anxieties regarding the role of information professionals in a digital age. Confusing ‘public libraries’ with all libraries, and even worse with a whole profession, confuses a specific situation (public library closures in the UK due to public funding cuts) with ‘the demise of a profession’. The library profession is practised well beyond the specific realm of public or council-run libraries, and often in places that at first sight do not look at all like what many people would identify as a ‘library’. Like GPs and other medical specialists, or lawyers, or most other professionals, those in the library profession are active in many sectors requiring advanced information and knowledge literacy and management skills, which in the 21st century amounts to most organisations in most if not all domains.

Like Gertrude Stein’s rose in her ‘Sacred Emily’ poem (1913), the word ‘library’ names a phenomenon which invokes the imagery and emotions that individuals in a particular context associate with it. All libraries, of course, have something essential in common. At the very least they share the professional, systematic selection, organisation, storage, management, preservation and dissemination of information, amongst other tasks requiring specialised skills. However, it is important to be able to make distinctions, and to state what may seem obvious: not all libraries are the same. Public or council-run libraries face a series of quite specific challenges, in the same way that academic libraries, or libraries in, say, legal or media organisations, face different challenges that public libraries do not.

Everyone interested in libraries as a whole should be concerned about the demise of public funding for council-run libraries, but this does not mean that the whole library profession is facing a ‘demise’. Everyone interested in the public good should be concerned about the demise of public funding for public services, and this includes council-run libraries. The vicious circle is clear, as media coverage and public discourse around the closure of public libraries often goes back to expressing cultural anxieties regarding the role of libraries in general in a digital age. Innovation is accepted as a pressing need, but without funding, technological innovation, including the hiring of specialised human resources, proves harder if not impossible. You need the funding to up your game but if you don’t up your game, the official narrative goes, you won’t get any funding because you haven’t upped your game.

Technological ‘solutionism’ is a great cover for politically motivated budget cuts to public services. This is where lack of context leads to even more misinformation, and where the debate expresses, at a meta level, the pressing need for specialised Library and Information Science skills as 21st century critical information literacy skills. Take as an example the public opinions of a news editor of a ‘free market think-tank’ this morning on Twitter: [screenshot anonymised]

 


These opinions are well-known by most information professionals, as they reflect a widespread misunderstanding about access to information today, namely, in the case of the example above, that 1) reading as an activity (particularly fiction) is a ‘hobby’ and therefore not important for a society’s welfare, that 2) owning a smart phone can replace billions spent on libraries, and that 3) Google Scholar provides access to millions of documents directly, making libraries unnecessary.

The example above is only a needle in a massive haystack of myths and misunderstandings about how the Internet and the Web operate, and more importantly about what it is that public libraries do. Let’s focus only on the third opinion above. For the sake of argument let’s suppose everyone in the UK has access to fast, robust, reliable Internet at home and owns and knows how to use a reliable, up-to-date device to access it (we know this is not the case). If you can access any full content of quality through Google Scholar it is because a library or network of libraries has been doing hard and expensive work in the background. Even if all academic content were Open Access, or at least publicly freely available to read online, it would also have been the result of concerted efforts with libraries and librarians, even if you accessed it from the comfort of your home or train carriage. The publishing and discovery of said hypothetical content online via Google Scholar would have always-already been the result of specific library and information skills and technologies, such as mark-up languages like XML, taxonomies and ontologies, schemas and search algorithms, all working for your enjoyment behind the scenes. And that is just a superficial, quick example.

The narrative we need to see more of is that Library and Information Science skills are today more needed than ever before. Precisely because of important technological and cultural developments such as widespread access to the mobile Internet and search engine indexing services such as Google Scholar, LIS skills must come to the public fore as an essential critical skillset to identify, filter, curate, disseminate and interpret data and information of all types.

The news today has revealed again that the political arena is a rapidly-changing information landscape. The crisis of UK public libraries is a political problem. It is a situation created by a political, ideological agenda that has chosen to privilege the free market as extreme individualism (the privileging of algorithmic access to information is free market ideology in full effect). The crisis of UK public libraries is not simple, but a main driver of the current crisis is not the lack of relevance of librarianship as a profession, but very clearly ideologically-motivated budget cuts to public services.

I suggest that at the very least we should avoid an apocalyptic tone in discussions about libraries in general. We must be able to contextualise and to focus on the specifics of each phenomenon. Phenomena can be related to each other, and solidarity and empathy are important, but this does not exclude the importance of distinguishing domains. We must frame the crisis of ‘UK libraries’ as presented in the news today as a crisis caused by particular public funding policies affecting the everyday functions of council-run libraries. The crisis of ‘UK libraries’ is part of a larger crisis caused by, essentially, funding cuts to public services.

The larger cultural context of digital transformations demands that all of us interested in libraries and information up our game in successfully demonstrating why the word ‘library’ means many different things to different people, and why ‘the profession’ should be more needed than ever in an age of overwhelming data deluge and information overload.

If the unfounded, misinformed opinion that smartphones or Google can replace all types of libraries and information professionals keeps gaining currency, the future will look increasingly grim. It will be grim because it will mean the triumph of an impoverished vision that privileges only the hyper-privileged, leaving the rest of the public doomed to accessing only the information they are given or the information they can personally afford.

Suggesting that librarianship as a whole is in crisis and that the ‘solution’ lies in giving people smartphones only benefits those who benefit from dismantling public services, including the public right to council-run libraries as professional, reliable, fair, safe spaces for creativity, education, research, entertainment, and in a nutshell good ol’ public good.

 

Return to the Infinite Library (Notes)

Where I attempt to extend some Tweets into a longer piece.

From Jorge Luis Borges, "On the Cult of Books". Photo CC-BY Ernesto Priego, translation copyright Eliot Weinberger, 1984

“A book, any book, is for us a sacred object”

-Jorge Luis Borges, “On the Cult of Books”, 1951

I grew up in a house full of books. The more books we had access to the more books we knew we did not have. We were very privileged to have that kind of scarcity. My parents’ material wealth was their personal library, and their inheritance was a love for reading and a love for books (and other published, printed stuff).

One book leads to other books; other books to even more books. It never ends. Libraries and the Web are gateways, maps, templates, enablers. Growing up with scarcity of information (because reading a book was a means of realising how many other books you had to read) made me seek libraries like shelters.

I suppose nowadays it is possible to love reading without having to love the ‘physical’ artifacts that used to be equated with books. Print does not equal reading now. A book is more than the material device in which one reads. I do most of my reading on screens these days, and as a student I could not afford to buy many books. Libraries and the Web kept my hunger satiated. At the same time, libraries made me perpetually hungry for more information and more books.

I have started buying printed publications again, and I have been thinking again about what it means to be a reader, what the function of libraries and book shops and printed matter is in a time in which digital information is semi-ubiquitous.

I don’t like not knowing. Recently I have been thinking it must have something to do with growing up knowing there was always something you *had* to know and that you could miss out on. Life becomes an endless research exercise, an ongoing process of discovery, compilation, organisation and sharing of resources for later reading.

I wonder what it would be like to take access to a wealth of information resources for granted.

I think of Borges, blind, and his love for writing and reading, in the middle of an infinite library.

Would the “total” library exist if all of it could be read?

Or is the very “essence” (excuse the term) of book loving and collecting defined by the limitations to have it all?

What is curation if not, also, a way of reading a collection (a universe) and organising it in a way that it is accessible to the human mind?

Is curation a kind of reading and writing based on preexisting materials, a creative act humbled by the overwhelming amount of work that has already been done by other people?

I am aware these questions go in different directions. These are notes on scraps. I send them out before they disappear.

Interviewed by Open Access Button


I was interviewed for the Open Access Button weekly series highlighting Open Access Button users from around the world, discussing their work, and sharing their stories. You can read the interview here.

 

Today! ACLAIIR Seminar: Open Access: the future of academic publication?

ACLAIIR word cloud

Below is some quick information about the event in which I am participating today at the Cambridge University Library.

Advisory Council on Latin American & Iberian Information Resources
ACLAIIR Seminar 2014
Tuesday 17 June, 2014
Milstein Seminar Rooms, Cambridge University Library
West Road, Cambridge CB3 9DR

Seminar: Open Access: the future of academic publication?

Panel I (13:45 – 14:50) OA : Perspectives from the world of publishing

Chair: Joanne Edwards, University of Oxford

  1. Ellen Collins (OAPEN UK)
  2. Daniel Pearce (CUP)
  3. Dr. Rupert Gatti (Open Book Publishers)

Tea (14:50 – 15:05)

Panel II (15:05 – 16:10) OA and its impact on research and teaching

Chair: Aquiles Alencar Brayner, Digital Curator, British Library

  1. Dr. Martin Eve (University of Lincoln)
  2. Dr. Ernesto Priego (City University, London)
  3. Dr. Jenny Bunn (University College, London)

Conclusion (16:10 – 16:20)

New Publication: Comics Unmasked: A Conversation with Adrian Edwards, The British Library

Design by Jamie Hewlett for Comics Unmasked at the British Library © Jamie Hewlett 2014.

I have published a new article on The Comics Grid: Journal of Comics Scholarship:

Comics Unmasked: A Conversation with Adrian Edwards, lead curator of Printed Historical Sources, The British Library

In this interview Adrian Edwards, lead curator of Printed Historical Sources, The British Library, talks to me about the Comics Unmasked: Art and Anarchy in the UK exhibition at The British Library, which opens on Friday and will stay open until 19th August 2014.

 

How to cite: Priego, E 2014. Comics Unmasked: A Conversation with Adrian Edwards, lead curator of Printed Historical Sources, The British Library. The Comics Grid: Journal of Comics Scholarship 4(1):2, DOI: http://dx.doi.org/10.5334/cg.an

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Copyright is retained by the author(s).

Published on 30 April 2014.

 

With many thanks to Ludi Price for her super speedy transcription help and to everyone at Ubiquity Press, who worked at breakneck speed to ensure this article was published before the opening of the exhibition.

#LibPub Session 10: Libraries, Publishing: The Future?

Image from ‘An Introduction to the Study of Metallurgy, etc’, 000144847 via the Mechanical Curator

Today we’ll have our last taught session of the term. Time flies when you are having fun…

For the past ten weeks we’ve been unveiling pieces of the complex, large jigsaw puzzle of the libraries and publishing landscape. “Libraries and publishing”, “library publishing” and “libraries as publishers” are three distinct, inter-connected terms that refer to distinct issues and different levels of granularity. It can be argued that each of them creates a different scenario, like neighbouring countries in a larger map, the borders often blurring yet still present. We must also remember that the “landscape” we can see is made possible by a series of layers we cannot always see (they might be below us… or above), and that the map is not the territory.

Through a series of lectures from different professional voices and points of view, the aim has been to facilitate an understanding of the ways in which publishing (and this means current understandings of what the term means) is changing, and to explore the impact that this will have on libraries, other information providers, and their users.

We have discussed the technical (this includes “technological”), economic, social and political factors defining the transformations in publishing, and consequently in librarianship. The module has had a strong emphasis on scholarly publishing, but we also covered trade publishing and the industry as a whole. As technologies diversify the forms in which information is recorded and disseminated, the quantity, quality, form and content of the recorded information that libraries acquire, collect, archive, preserve and make available have also changed, and this includes the methods for performing these functions. These discourses, technologies and methodologies have not evolved out of a vacuum, but as integral/integrated pieces of the social, cultural, economic and political landscape.

Today we’ll have a guest lecture by Alastair Horne (@pressfuturist), one of the best-known UK specialists spearheading online innovation and social media engagement in UK publishing. He will discuss with us his vision of the role that social media currently plays in the publishing landscape. Though we have covered and discussed social media throughout the module, Alastair’s presentation will give us a chance to zoom in and grasp the key issues.

The intention of this last session is also to discuss the key issues we covered throughout the course and to brainstorm all together as a rehearsal in preparation for the coursework submission.

As usual, this #LibPub #citylis post was originally published on my City University London blog.