Metricating #respbib18 and #ResponsibleMetrics: A Comparison

I’m sharing summaries of the numerical Twitter data collected from the following bibliometrics event hashtags:

  • #respbib18 (Responsible use of Bibliometrics in Practice, London, 30 January 2018) and
  • #ResponsibleMetrics (The turning tide: A new culture of responsible metrics for research, London, 8 February 2018).

#respbib18 Summary

Event title: Responsible use of Bibliometrics in Practice
Date: 30 January 2018
Times: 9:00 am – 4:30 pm GMT
Sheet ID: RB
Hashtag: #respbib18
Number of links: 128
Number of RTs: 100
Number of Tweets: 360
Unique tweets: 343
First Tweet in Archive: 23/01/2018 11:44 GMT
Last Tweet in Archive: 01/02/2018 16:17 GMT
In Reply IDs: 15
In Reply @s: 49
Unique usernames: 54
Unique usernames who used the tag only once: 26 (for context of engagement)

Twitter Activity

[Chart: #respbib18 Twitter activity, last three days]
CC-BY. Originally published as https://twitter.com/ernestopriego/status/958424112547983363

#ResponsibleMetrics Summary

Event title: The turning tide: A new culture of responsible metrics for research
Date: 8 February 2018
Times: 09:30 – 16:00 GMT
Sheet ID: RM
Hashtag: #ResponsibleMetrics
Number of links: 210
Number of RTs: 318
Number of Tweets: 796
Unique tweets: 795
First Tweet in Archive: 05/02/2018 09:31 GMT
Last Tweet in Archive: 08/02/2018 16:25 GMT
In Reply IDs: 43
In Reply @s: 76
Unique usernames: 163
Unique usernames who used the tag only once: 109 (for context of engagement)
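Summary figures like those in the two tables above can be derived from a TAGS-style CSV export with a short script. Below is a minimal sketch, assuming columns named from_user and text as in a standard TAGS sheet; the file name is hypothetical, and the RT and link tests are simple heuristics rather than the exact method used.

```python
# Sketch: derive summary figures from a TAGS-style CSV export.
import csv
from collections import Counter

def summarise(path):
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    users = Counter(row["from_user"].lower() for row in rows)
    texts = [row["text"] for row in rows]

    print("Number of Tweets:", len(rows))
    print("Unique tweets:", len(set(texts)))
    print("Number of RTs:", sum(1 for t in texts if t.startswith("RT @")))
    print("Number of links:", sum(1 for t in texts if "http" in t))
    print("Unique usernames:", len(users))
    print("Used tag only once:", sum(1 for c in users.values() if c == 1))

summarise("responsiblemetrics_tags_export.csv")  # hypothetical file name
```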

Twitter Activity

[Chart: #ResponsibleMetrics Twitter activity, last three days]
CC-BY. Originally published as https://twitter.com/ernestopriego/status/961639382150189058

#respbib18: 30 Most Frequent Terms

Term  Raw frequency
metrics 141
responsible 89
bibliometrics 32
event 32
data 29
snowball 25
need 24
use 21
policy 18
today 18
looking 17
people 16
rankings 16
research 16
providers 15
forum 14
forward 14
just 14
practice 14
used 14
community 13
different 12
metric 12
point 12
using 12
available 11
know 11
says 11
talks 11
bibliometric 10

#ResponsibleMetrics: 30 Most Frequent Terms

Term  Raw frequency
metrics 51
need 36
research 29
indicators 25
panel 16
responsible 15
best 13
different 13
good 13
use 13
index 12
lots 12
people 12
value 12
like 11
practice 11
context 10
linear 10
rankings 10
saying 10
used 10
way 10
bonkers 9
just 9
open 9
today 9
universities 9
coins 8
currency 8
data 8

Methods

Twitter data was mined with Tweepy. For robustness and for quick charts, a parallel collection was run with TAGS. The data was checked and deduplicated with OpenRefine, and text analysis was performed with Voyant Tools. The text was anonymised through stoplists: two stoplists were applied (one to each dataset), each including usernames and Twitter-specific terms (such as RT, t.co, https, etc.) as well as the terms in the hashtags themselves. Event title keywords were not included in the stoplists.
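For illustration, here is a minimal sketch of the kind of Tweepy collection described above. This is not the script actually used: the credentials are placeholders, the output file name is hypothetical, and it assumes Tweepy 3.x, where the standard search endpoint was api.search (renamed search_tweets in Tweepy 4).

```python
# A sketch only: collect Tweets matching a hashtag via the Twitter search API
# using Tweepy 3.x and write them to a CSV for later refining in OpenRefine.
import csv
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")  # placeholders
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

with open("respbib18_raw.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id_str", "created_at", "from_user", "text"])
    # tweet_mode="extended" returns the full, untruncated Tweet text
    for tweet in tweepy.Cursor(api.search, q="#respbib18", count=100,
                               tweet_mode="extended").items():
        writer.writerow([tweet.id_str, tweet.created_at,
                         tweet.user.screen_name, tweet.full_text])
```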

These datasets contain no sensitive, personal or personally identifiable data. Any usernames and names of individuals were removed at the data-refining stage, and again from the text analysis results if any remained.

Please note that the two datasets span different numbers of days of activity, as indicated in the summary tables. The source data was refined, but duplicates may have remained, which would affect the resulting raw term frequencies; the numbers should therefore be read as indicative rather than as exact measurements. RTs count as Tweets, and raw frequencies reflect the repetition of terms implicit in retweeting.

So?

As usual, I share this hoping others might find it interesting and draw their own conclusions.

A very general insight for me is that we need a wider group engaging with these discussions. At most we are talking about a group of approximately 50 individuals who actively engaged on Twitter across both events.

From the Activity charts it is noticeable that tweeting recedes at breakout times, possibly indicating that most tweeting activity came from within the room. When hashtags create wide engagement, activity is more constant and does not so closely mirror the timings of the real-time activity in the room.

It seems to me that the production, requirement, use and interpretation of metrics for research assessment directly affect everyone in higher education, regardless of position or role. The topic should not be obscure, nor limited to bibliometricians and RDM, Research and Enterprise, or REF panel people.

Needless to say, I do not think that everyone ‘engaged’ with these events or topics is, or should be, actively using the hashtag on Twitter (i.e. we don’t know how many people simply followed along). An assumption here is that we cannot detect or measure anything where there is no signal: more folks elsewhere might be interested in these events, but if they did not use the hashtag they were, logically, not detected here. That no signal is measurable with the selected tools does not mean there is no signal elsewhere, and I’d like this to stand as a comment on metrics for assessment as well.

In terms of frequent terms, it remains apparent (as in other text analyses I have performed on academic Twitter hashtag archives) that the most frequently tweeted terms are ‘neutral’ nouns, or adjectives when they are keywords in the event’s title, subtitle or panel sessions (e.g. ‘responsible’). When a term like ‘snowball’ or ‘bonkers’ appears, it stands out. Given the lack of more frequent modifiers, it remains hard to distant-read sentiment, critical stances or even positions. Most frequent terms owe their counts to RTs, not to any consensus across ‘original’ Tweets.
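One rough way to test how much retweeting drives these counts is to tally terms separately in RTs and in original Tweets and compare. A sketch follows, assuming a plain list of Tweet texts and an illustrative stoplist (not the stoplists actually used):

```python
# Sketch: compare raw term frequencies in RTs versus 'original' Tweets.
import re
from collections import Counter

STOPLIST = {"rt", "https", "t.co", "amp", "respbib18", "responsiblemetrics"}

def term_counts(texts):
    counts = Counter()
    for text in texts:
        # words of two or more letters, lowercased, minus the stoplist
        for word in re.findall(r"[a-z][a-z'-]+", text.lower()):
            if word not in STOPLIST:
                counts[word] += 1
    return counts

def rt_versus_original(texts):
    rts = [t for t in texts if t.startswith("RT @")]
    originals = [t for t in texts if not t.startswith("RT @")]
    return term_counts(rts), term_counts(originals)

rt_counts, original_counts = rt_versus_original([
    "RT @someone: Responsible metrics need context",  # hypothetical examples
    "Responsible metrics need context",
    "Lots of talk about indicators today",
])
print(rt_counts.most_common(5))
print(original_counts.most_common(5))
```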

It seems that if we wanted to demonstrate the value added by live-tweeting or by following an event’s hashtag remotely, quantifying (metricating?) the active users, tweets over time, days of activity and frequent words would not be the way to go for all events, particularly not for events with relatively low Twitter activity.

As we have seen, automated text analysis is more likely to reveal mostly-neutral keywords than any divergence of opinion on, or addition to, the official discourse. We would have to look at the less repeated words, and perhaps at replies that did not use the hashtag, but the latter is not recommended, as it would complicate things ethically: though it is generally accepted that RTs do not imply endorsement, less frequent terms in Tweets with the hashtag could single out individuals, and if a hashtag was not included in a Tweet, the Tweet should be interpreted as not meant to be part of that public discussion/corpus.

Assessing the Assessment Evaluation Reports: Are They Setting the Example?

[Image: REFlections logo]

As I write this, the Higher Education Funding Council for England (HEFCE) is hosting an invitation-only event titled “REFlections: Evaluation of the Research Excellence Framework (REF) 2014 and a look to the future”.

At the time of writing this line, my #REFlections archive has collected more than 1,100 Tweets published today. (I’ll share the archive after the event.)

This is a quick note to refer to two of the reports shared today:

These two reports are available online for free. However, it is no small detail, particularly given both the general context and the specific topic of these reports, that neither is available under an open license (I don’t mean CC-BY specifically, but any open license at all).

Both reports indicate they are © Copyright HEFCE 2015 and © HEFCE 2015, which, as we know, is not in itself in contradiction with open licensing (open licenses complement copyright). However, page 2 of the RAND Corporation report also clearly indicates:

Manville et al, 2015, page 2
“All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the sponsor.”

[I am aware I am most likely infringing copyright law by reproducing this copyright notice here. I invoke fair dealing for educational use].

Unless I totally missed it, the Digital Science and King’s report file does not contain any licensing information telling the reader/user under what conditions the report can be reproduced or re-used, or whether it can be adapted or enhanced in any way, under attribution or any other conditions, without having to request prior permission (which takes time, which means resources, which means money).

According to the Open Knowledge Foundation’s Open Definition, the first requirement for a work to qualify as “open” is that

“[t]he work must be available under an open license […]. Any additional terms accompanying the work (such as a terms of use, or patents held by the licensor) must not contradict the terms of the license.”

[See definition of “Open License” in 2.2. in http://opendefinition.org/od/].

Some authors consider this definition of open licensing too permissive, hence unappealing to academic authors or organisations that may have reasons to restrict the conditions under which they publish their work. However, the reports mentioned above do not provide any licensing indication, apart from the copyright notice, and in the case of the RAND report a very clear All Rights Reserved notice.

Open Access and Open Licensing are of course related. The relationship is the object of a long discussion that has taken place elsewhere. In this case, the reports in question refer to a research assessment exercise that had the Open Access requirement at its core. HEFCE’s “Policy for open access in the post-2014 Research Excellence Framework” indicates that the open access requirement

“applies only to journal articles and conference proceedings with an International Standard Serial Number. It will not apply to monographs, book chapters, other long-form publications, working papers, creative or practice-based research outputs, or data. The policy applies to research outputs accepted for publication after 1 April 2016, but we would strongly urge institutions to implement it now.”

It is clear that the HEFCE policy cannot be applied to the reports mentioned above. They are not journal articles or conference proceedings with an ISSN.

However, I’d like to suggest that in order to engage fully in a transition towards open access to research data and information all stakeholders would need to adopt good practices in open sharing themselves.

The lack of licensing potentially limits the reach and (ironically, in this case) the impact of these reports. Users with an awareness of open access, open data and open licensing will certainly notice the lack of open licensing in these reports, and find in it a limitation if not an obstacle.

Since both reports, as far as I understand, were partially or fully funded by a public body, and hence by the taxpayer, and since both are available online at no cost, it seems fair to note here that citizens should have been provided with clear open licensing indicating what they may or may not do with the reports.

How can we demand the adoption of open practices if outputs assessing assessment mechanisms (there’s meta for you) based on the mandate to share openly do not adopt open licensing?

The publication and availability of these reports is welcome and worthy of celebration. The fact that one of them explicitly forbids any reproduction without prior permission, and that neither contains licensing information, is a disappointment.

[Post published 13:35 GMT]

14:00 GMT Update

This just in:

https://twitter.com/HEFCE/status/580729154787766272

A #HEFCEmetrics Twitter Archive (Friday 16 January 2015, Warwick)

[Image: HEFCE logo]

The HEFCE metrics workshop, “Metrics and the assessment of research quality and impact in the arts and humanities”, took place on Friday 16 January 2015, 10:30 to 16:30 GMT, at the Scarman Conference Centre, University of Warwick, UK.

I have uploaded a dataset of 821 Tweets tagged with #HEFCEmetrics (collected case-insensitively):

Priego, Ernesto (2015): A #HEFCEmetrics Twitter Archive (Friday 16 January 2015, Warwick). figshare.
http://dx.doi.org/10.6084/m9.figshare.1293612

The Tweets in the dataset were publicly published and tagged with #HEFCEmetrics between 16/01/2015 00:35:08 GMT and 16/01/2015 23:19:33 GMT. The collection period corresponds to the day the workshop took place.

The Tweets contained in the file were collected using Martin Hawksey’s TAGS 6.0. The file contains 2 sheets.

Only users with at least two followers were included in the archive. Retweets have been included. An initial automatic deduplication was performed, but the data might require further deduplication.

Please note that the data in this file is likely to require further refining and even deduplication. The data is shared as is. The contents of each Tweet are the responsibility of their original authors. This dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way, please cite and link back using the citation information above.
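For those re-using the file, here is a minimal sketch of the further deduplication suggested above, assuming the TAGS export includes an id_str column (every Tweet has a unique ID); the file names are hypothetical.

```python
# Sketch: drop duplicate rows from a TAGS export by Tweet ID using pandas.
import pandas as pd

# read IDs as strings so large Tweet IDs are not mangled as floats
df = pd.read_csv("hefcemetrics_tags_export.csv", dtype={"id_str": str})
before = len(df)
df = df.drop_duplicates(subset="id_str")
print(f"Removed {before - len(df)} duplicate rows; {len(df)} Tweets remain.")
df.to_csv("hefcemetrics_deduplicated.csv", index=False)
```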

For the #HEFCEmetrics Twitter archive corresponding to the one-day workshop hosted by the University of Sussex on Tuesday 7 October 2014, please go to

Priego, Ernesto (2014): A #HEFCEmetrics Twitter Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1196029

You might also be interested in

Priego, Ernesto (2014): The Twelve Days of REF- A #REF2014 Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1275949

The Twelve Days of REF: A #REF2014 Archive

[Image: Cirrus word cloud visualisation of a corpus of 23,791 #REF2014 Tweets]

I have uploaded a new dataset to figshare:

Priego, Ernesto (2014): The Twelve Days of REF- A #REF2014 Archive. figshare.

http://dx.doi.org/10.6084/m9.figshare.1275949

The file contains approximately 31,855 unique Tweets published publicly and tagged with #REF2014 during a 12-day period, between 08/12/2014 11:18 GMT and 20/12/2014 10:13 GMT.

For some context and an initial partial analysis, please see my previous blog post from 18 December 2014.

As always, this dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.

Happy Christmas everybody.

The REF According to Twitter: A #REF2014 Update (18/12/14 16:28 GMT)

As everyone in any way aware of UK higher education knows, the results of REF 2014 were announced in the first minute of 18 December 2014. Two main hashtags have been used to refer to it on Twitter: #REF and the more popular (“official”?) #REF2014.

There have of course been other variations of these hashtags, including discussion about not ‘hashing’ the term REF at all. Here I share a quick first look at a sample corpus of texts from Tweets publicly tagged with #REF2014.

This is just a quick update on a work in progress. No qualitative conclusions are offered, and the quantitative data shared and analysed here is provisional. Complete datasets will be published openly once the collection has been completed and the data further refined.

The Numbers

I looked at a sample corpus of 23,791 #REF2014 Tweets published by 10,654 unique users between 08/12/2014 11:18 GMT and 18/12/2014 16:32 GMT.

  • The sample corpus only included Tweets from users with a minimum of two followers.
  • The sample corpus consists of 1 document with a total of 454,425 words and 16,968 unique words.
  • The number of Tweets per user ranged from 1 to 70, with an average of 2.3 Tweets per user.
  • Only 8 of the 10,654 unique users in the corpus published between 50 and 80 Tweets; 30 users published more than 30 Tweets, and 9,473 users published between 1 and 5 Tweets only.
  • 6,585 users in the corpus published one Tweet only.
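The per-user figures above can be reproduced from any archive of the corpus with a short script. A sketch follows, assuming a CSV with a from_user column as in a TAGS export; the file name is hypothetical.

```python
# Sketch: per-user Tweet distribution from a hashtag archive CSV.
import csv
from collections import Counter

with open("ref2014_sample.csv", newline="", encoding="utf-8") as f:
    per_user = Counter(row["from_user"].lower() for row in csv.DictReader(f))

counts = sorted(per_user.values())
print("Unique users:", len(per_user))
print("Tweets per user, range:", counts[0], "to", counts[-1])
print("Mean Tweets per user:", round(sum(counts) / len(counts), 1))
print("Users with 1-5 Tweets:", sum(1 for c in counts if c <= 5))
print("Users with one Tweet only:", sum(1 for c in counts if c == 1))
```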

A Quick Text Analysis

Voyant Tools was used to analyse the corpus of 23,791 Tweet texts. A customised English stop-word list was applied globally. The most frequent word was “research”, repeated 8,760 times in the corpus; it was then added to the stop-word list (as, logically, was #REF2014).
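For readers without Voyant to hand, roughly equivalent figures (total words, unique words, most frequent terms after stop-word removal) can be computed directly. A sketch follows, with an illustrative stop-word list rather than the customised one actually applied.

```python
# Sketch: corpus-level figures roughly equivalent to the Voyant output above.
import re
from collections import Counter

STOPWORDS = {"the", "and", "for", "with", "this", "that", "are", "was",
             "research", "ref2014"}  # 'research' and '#REF2014' were stoplisted

def corpus_stats(texts, n=50):
    # flatten all Tweet texts into one list of lowercased word tokens
    words = [w for t in texts
             for w in re.findall(r"[a-z][a-z'@-]*", t.lower())]
    print("Total words:", len(words))
    print("Unique words:", len(set(words)))
    kept = Counter(w for w in words if w not in STOPWORDS)
    return kept.most_common(n)  # top n terms after stop-word removal
```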

A word cloud of the whole corpus created with the Voyant Cirrus tool looked like this:

[Image: Cirrus word cloud visualisation of the corpus of 23,791 #REF2014 Tweets]

#REF2014: Top 50 Most Frequent Words So Far

Word Count
uk 4605
results 4558
top 2784
impact 2091
university 1940
@timeshighered 1790
ranked 1777
world-leading 1314
excellence 1302
universities 1067
world 1040
quality 1012
internationally 933
excellent 931
overall 910
great 827
staff 827
academics 811
proud 794
congratulations 690
rated 690
power 666
@cardiffuni 653
oxford 645
leading 641
best 629
news 616
education 567
5th 561
@gdnhighered 556
@phil_baty 548
ucl 546
number 545
law 544
today 536
table 513
analysis 486
work 482
higher 470
uni 460
result 453
time 447
day 446
cambridge 430
just 428
@ref2014official 427
group 422
science 421
big 420
delighted 410

Limitations

The map is not the territory. Please note that both research and experience show that the Twitter search API isn’t 100% reliable. Large Tweet volumes affect the search collection process, and the API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón et al., 2012). It is not guaranteed that this file contains every Tweet tagged with the archived hashtag during the indicated period. Further deduplication of the dataset will be required to validate this initial look at the data, which is shared merely as an update on a work in progress.

References

González-Bailón, Sandra, Wang, Ning, Rivero, Alejandro, Borge-Holthoefer, Javier and Moreno, Yamir (2012) “Assessing the Bias in Samples of Large Online Networks” (December 4, 2012). Forthcoming in Social Networks. Available at SSRN: http://ssrn.com/abstract=2185134 or http://dx.doi.org/10.2139/ssrn.2185134

At Altmetric: Kick-starting Fieldwork

[Image: Altmetric banner]

Today the Altmetric blog published my first post in the “Fieldwork” series, an entry on online attention in the humanities and the REF.

In the “Fieldwork” series of blog posts we will explore, through journal- and article-level stories, whether alt-metrics can provide a holistic image of impact on diverse audiences.