Thoughts on University Module Evaluation

Update: I have now deposited a slightly revised version of this text (which has already gone through various versions since its original publication) at figshare as

Priego, Ernesto (2019): Recommendations for University Module Evaluation Policies. figshare.

Also available at City Research Online:

[Frequent readers will know I have a long-standing interest in scholarly communications, metrics and research assessment. The post below fits within my academic research practice, this time focusing on teaching evaluation (“module evaluation” in UK parlance). For an older post on metrics and research assessment, for example, see this post from June 30 2014. As with all of my work here, this post is shared in a personal capacity and does not represent in any way the views of colleagues or employers. I share these ideas here as a means to contribute publicly to a larger scholarly dialogue which is not only inter-disciplinary but inter-institutional and international].


tl;dr

This post discusses the limitations of University Module Evaluation processes and shares a series of recommendations that could improve their design and implementation. It concludes that, regardless of staff gender, age, academic position or ethnic background, no metric or quantitative indicator should be used without thoughtful, qualitative awareness of social and organisational context and of unconscious bias, and that there is a need to eliminate the use of Module Evaluation metrics in appointment and promotion considerations.


Module Evaluation

“Module evaluation” refers to the process in which students give feedback on, assess and rate their academic studies and the quality of teaching on a module (in other countries “modules” may be known as courses or subjects). Below I discuss the limitations of Module Evaluation processes and share a series of recommendations that I hope could improve their design and implementation.

On “Potential Bias”

Research has shown that, internationally, “potential bias” against gender and ethnic minorities is real. Holland has described how

“different treatment of male and female staff is increasingly well evidenced: some studies have found that students may rate the same online educators significantly higher if perceived as male compared to female (MacNell, Driscoll, and Hunt 2015), while other studies have shown that students can make more requests of and expect a greater level of nurturing behaviour from females compared to males, penalising those who do not comply (El-Alayli, Hansen-Brown, and Ceynar 2018)” (Holland 2019).

Research has also suggested “that bias may decrease with better representation of minority groups in the university workforce” (Fan et al. 2019). However, even if an institution, school or department has good staff representation of (some) minority groups in some areas, a policy would need to go beyond mandating support for staff from minority groups to prepare for promotion. The way to tackle bias is not necessarily to give more guidance and support to minority staff, but to re-address the data collection tools, the assessment of the resulting indicators, and their practical, professional and psychological consequences for staff.

As discussed above, the cause of lower scores may lie in the bias implicit in the evaluation exercise itself. Arguably, lower scores can in many cases be explained not by the lecturer’s lack of skills or opportunities, but by highly influential circumstances beyond the lecturer’s control, such as cultural attitudes to specific minority groups, the demographic composition of specific student cohorts, class size, the state of the facilities where staff teach, etc.

In my view, Universities need policies that clearly state that ME scores should not be used as unequivocal indicators of a member of staff’s performance. The perception among staff (correct or incorrect) that these scores will be used as evidence of one’s performance in promotion processes can indeed be a deterrent to applying for promotion. It can also play a role in the demoralisation of staff.


On Student Staff Ratios (SSR), Increased Workloads and Context Awareness

University Module Evaluation policies could be improved by acknowledging that workload and Student Staff Ratios are perceived to have an effect on the student experience and therefore on ME scores.

Though there is a need for more recent and UK-based research regarding the impact of class size and SSR on ME, higher education scholars such as McDonald are clear that

“research testifies to the fact that student satisfaction is not entirely dependent on small class sizes, a view particularly popular in the 1970s and late twentieth century (Kokkelenberg et al.,2008). Having said that, recent literature (post-2000) on the issue is focused heavily on the detrimental impact raised SSRs has on students, teachers and teaching and learning in general. The Bradley Review of higher education in Australia was just one ‘voice’ amongst many in the international arena, arguing that raised SSRs are seriously damaging to students and teachers alike” (McDonald 2013).

Module Evaluation policies should take into account the current Higher Education setting and student attitudes to educational practices, including the expectations of today’s students and the communication expectations established by VLEs, mobile Internet, email and social media.

Raised SSRs do create higher workloads for lecturers and have required new workload models. Raised SSRs imply that lecturers may not be able to meet those expectations and demands, or may be forced to stretch their personal resources to the maximum, endangering their wellbeing beyond all reasonable sustainability. As I discussed in my previous post (Priego 2019), the recent HEPI Report on Mental Health in Higher Education shows “a big increase in the number of university staff accessing counselling and occupational health services”, with “new workload models” and “more directive approaches to performance management” as the two main factors behind this rise (Morrish 2019).

Module Evaluation policies would do well to recognise that time is a finite resource, and that raised SSRs mean a single lecturer will not be able to allocate the same amount of time to each student as they would if SSRs were lower. Raised SSRs also mean that institutions struggle to find enough appropriate rooms for lectures, which can also lead to lower scores as it impacts the student experience negatively.


Who is being evaluated in multi-lecturer modules?

As part of context awareness, it is essential that any interpretation of ME scores takes into account that many modules are delivered by a team of lecturers, often including TAs and visiting lecturers. In practice, however, ME questionnaires are standardised, often outsourced, and designed with individual session leaders and generic settings in mind that may not apply to the institution, school, department, module or session being evaluated.

Regardless of clarifications to the contrary, students often evaluate the lecturer in front of them on the day they complete the questionnaires, not necessarily the whole team; and even when they do consider the team, the questionnaire’s data collection design does not allow for distinguishing which member of staff students had in mind.

Hence leaders of large modules can arguably be doubly penalised: first by leading complex modules taught to many students, and second by being assessed on the performance of a group of peers, not on their own alone. Any truly effective ME policy would need to address the urgent need to periodically revise and update MEQ design in consultation with the academic staff who would be evaluated with those instruments. Given who mandates the evaluations and their role in other assessment exercises such as rankings or league tables, a user-centred approach to designing module evaluation questionnaires/surveys seems sadly unlikely, but who knows.


Module Evaluation scores are about more than staff performance

As we all know, teaching is never disconnected from its infrastructural context. Room design, location, temperature, state of the equipment, illumination, level of comfort of the seats and tables, and, importantly, the timing (stage in the teaching term, day of the week, time of day, how many MEQs students have completed before, whether examinations or coursework deadlines are imminent or not) all have a potential effect on the feedback given by students. ME policies would be more effective if they acknowledged that academic staff do not teach in a vacuum and that many factors that might negatively affect evaluation scores may in fact have very little to do with a member of staff’s actual professional performance.

Module Evaluation assessment done well

Members of staff potentially benefit from discussing their evaluation scores during appraisal sessions, where they can provide qualitative self-assessments of their own performance in relation to their academic practice teaching a module, get peer review and co-design strategies for professional development with their appraiser.

When done well, module evaluation scores and their discussion can help academics learn from what went well, what could go even better, what did not go as well (or went badly), interrogate the causes, and co-design strategies for improvement.

However, any assessment of module evaluation scores should take into consideration a whole set of contextual issues around the way the data is collected. How to address this? Better-designed data collection tools could help, but it would also be most welcome if module evaluation policies stated that scores should never be taken verbatim as unequivocal indicators of an academic’s performance.

In Conclusion…

University Module Evaluation policies should acknowledge that module evaluation scores can be potentially useful for staff personal professional development, particularly if the data collection mechanisms have been co-designed with staff with experience in the evaluated practice within the context of a specific institution, and if the discussion takes place within productive, respectful and sensitive appraisal sessions.

Policies should acknowledge that, as indicators, the evaluation scores never tell the whole story and, depending on the way the data is collected and quantified, the numbers can present an unreliable and potentially toxic picture. The objective of the evaluation should be to improve what can be improved within a specific context, not to serve as a measure of surveillance and repression that can potentially affect most those who are already more likely to be victims of both conscious and unconscious bias, or who are working within already-difficult circumstances.

Regardless of staff gender, age, academic position or ethnic background, no metric or quantitative indicator should be used without social and organisational context awareness and unconscious bias awareness.

To paraphrase the San Francisco Declaration on Research Assessment, I would argue there is a “need to eliminate the use of [Module Evaluation] metrics in funding, appointment, and promotion considerations” [DORA 2012-2018].



Fan Y, Shepherd LJ, Slavich E, Waters D, Stone M, et al. (2019) Gender and cultural bias in student evaluations: Why representation matters. PLOS ONE 14(2): e0209749.

Holland, E. P. (2019) Making sense of module feedback: accounting for individual behaviours in student evaluations of teaching, Assessment & Evaluation in Higher Education, 44:6, 961-972, DOI: 10.1080/02602938.2018.1556777

McDonald, G. (2013). “Does size matter? The impact of student-staff ratios”. Journal of higher education policy and management (1360-080X), 35 (6), p. 652.

Morrish, L. (23 May 2019). Pressure Vessels: The epidemic of poor mental health among higher education staff, HEPI Occasional Paper 20. Available from [Accessed 6 June 2019].

Priego, E. (30/05/2019) Awareness, Engagement and Overload – The Roles We All Play. Available at [Accessed 6 June 2019]

San Francisco Declaration on Research Assessment (2012-2018) [Accessed 6 June 2019]


[This post is shared in a personal capacity and does not represent in any way the views of colleagues or employers. I share these ideas here as a means to contribute publicly to a larger scholarly dialogue which is not only inter-disciplinary but inter-institutional and international].

[…and yes, if you noticed the typo in the URL, thank you, we noticed it belatedly too but cannot change it now as the link had already been publicly shared.]


Metricating #respbib18 and #ResponsibleMetrics: A Comparison

I’m sharing summaries of numerical Twitter data collected from the following bibliometrics event hashtags:

  • #respbib18 (Responsible use of Bibliometrics in Practice, London, 30 January 2018) and
  • #ResponsibleMetrics (The turning tide: A new culture of responsible metrics for research, London, 8 February 2018).


#respbib18 Summary

Event title Responsible use of Bibliometrics in Practice
Date 30-Jan-18
Times 9:00 am – 4:30 pm  GMT
Sheet ID RB
Hashtag #respbib18
Number of links 128
Number of RTs 100
Number of Tweets 360
Unique tweets 343
First Tweet in Archive 23/01/2018 11:44 GMT
Last Tweet in Archive 01/02/2018 16:17 GMT
In Reply Ids 15
In Reply @s 49
Unique usernames 54
Unique users who used tag only once 26 <- for context of engagement
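Headline counts like those in the summary table can be derived from a hashtag archive with a few lines of code. Below is a minimal sketch, assuming each tweet is a dict with `from_user` and `text` fields (field names follow a TAGS-style export and are assumptions here, as is the sample data):

```python
from collections import Counter

def summarise(tweets):
    """Compute headline counts for a hashtag archive.

    `tweets` is a list of dicts with at least 'from_user' and 'text'
    keys (TAGS-style field names; adjust for your own export).
    """
    users = Counter(t["from_user"] for t in tweets)
    # Count classic retweets by their "RT @" prefix in the tweet text.
    retweets = sum(1 for t in tweets if t["text"].startswith("RT @"))
    return {
        "tweets": len(tweets),
        "retweets": retweets,
        "unique_usernames": len(users),
        # Accounts that used the hashtag only once: context of engagement.
        "single_use_accounts": sum(1 for c in users.values() if c == 1),
    }

sample = [
    {"from_user": "a", "text": "Great panel on metrics #respbib18"},
    {"from_user": "b", "text": "RT @a: Great panel on metrics #respbib18"},
    {"from_user": "a", "text": "Slides now online #respbib18"},
]
print(summarise(sample))
# → {'tweets': 3, 'retweets': 1, 'unique_usernames': 2, 'single_use_accounts': 1}
```

The same function applied to the full archive yields the figures reported in the tables.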

Twitter Activity

[Chart: #respbib18 Twitter activity, last three days. CC-BY.]


#ResponsibleMetrics Summary

Event title The turning tide: A new culture of responsible metrics for research
Date 08-Feb-18
Times 09:30 – 16:00 GMT
Sheet ID RM
Hashtag #ResponsibleMetrics
Number of links 210
Number of RTs 318
Number of Tweets 796
Unique tweets 795
First Tweet in Archive 05/02/2018 09:31 GMT
Last Tweet in Archive 08/02/2018 16:25 GMT
In Reply Ids 43
In Reply @s 76
Unique usernames 163
Unique usernames who used tag only once 109 <- for context of engagement

Twitter Activity

[Chart: #ResponsibleMetrics Twitter activity, last three days. CC-BY.]

#respbib18: 30 Most Frequent Terms


Term RawFrequency
metrics 141
responsible 89
bibliometrics 32
event 32
data 29
snowball 25
need 24
use 21
policy 18
today 18
looking 17
people 16
rankings 16
research 16
providers 15
forum 14
forward 14
just 14
practice 14
used 14
community 13
different 12
metric 12
point 12
using 12
available 11
know 11
says 11
talks 11
bibliometric 10

#ResponsibleMetrics: 30 Most Frequent Terms

Term RawFrequency
metrics 51
need 36
research 29
indicators 25
panel 16
responsible 15
best 13
different 13
good 13
use 13
index 12
lots 12
people 12
value 12
like 11
practice 11
context 10
linear 10
rankings 10
saying 10
used 10
way 10
bonkers 9
just 9
open 9
today 9
universities 9
coins 8
currency 8
data 8


Twitter data was mined with Tweepy. For robustness and quick charts, a parallel collection was done with TAGS. Data was checked and deduplicated with OpenRefine. Text analysis was performed with Voyant Tools. The text was anonymised through stoplists: one stoplist was applied to each dataset, covering usernames, Twitter-specific terms (such as RT, HTTPS, etc.) and terms appearing in hashtags. Event title keywords were not included in the stoplists.

No sensitive, personal or personally identifiable data is contained in this data. Any usernames and names of individuals were removed at the data refining stage, and again from the text analysis results if any remained.

Please note that the two datasets span different numbers of days of activity, as indicated in the summary tables. The source data was refined but duplicates might remain, which would affect the resulting raw term frequencies; the numbers should therefore be interpreted as indicative only and not as exact measurements. RTs count as Tweets, and raw frequencies reflect the repetition of terms implicit in retweeting.
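The stoplist-and-counting step described above follows the usual bag-of-words recipe. Here is a minimal sketch of it, assuming the tweet texts are already deduplicated strings; the stoplist below is illustrative only, not the one actually applied to these datasets:

```python
import re
from collections import Counter

# Illustrative stoplist: usernames are stripped by regex below, and
# Twitter-specific terms plus common function words are dropped here.
STOPLIST = {"rt", "https", "http", "amp", "the", "a", "an", "of", "to",
            "and", "in", "on", "for", "at", "is", "are", "with", "this"}

def term_frequencies(texts, stoplist=STOPLIST, top_n=30):
    """Return the top_n most frequent terms across tweet texts,
    lowercased, with usernames, links and stoplisted terms removed."""
    counts = Counter()
    for text in texts:
        text = re.sub(r"@\w+", "", text)          # drop @usernames
        text = re.sub(r"https?://\S+", "", text)  # drop links
        for term in re.findall(r"[a-z']+", text.lower()):
            if term not in stoplist:
                counts[term] += 1
    return counts.most_common(top_n)
```

Because retweets are kept as tweets, a term retweeted fifty times contributes fifty counts, which is exactly the repetition effect noted above.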


As usual, I share this hoping others might find it interesting and draw their own conclusions.

A very general insight for me is that we need a wider group engaging with these discussions. At most we are talking about a group of approximately 50 individuals who actively engaged on Twitter across both events.

From the Activity charts it is noticeable that tweeting recedes at breakout times, possibly indicating that most tweeting activity comes from within the room. When hashtags create wide engagement, activity is more constant and does not exactly reflect the timings of real-time activity in the room.

It seems to me that the production, requirement, use and interpretation of metrics for research assessment directly affects everyone in higher education, regardless of their position or role. The topic should not be obscure or limited to bibliometricians and RDM, Research and Enterprise or REF panel people.

Needless to say, I do not think everyone ‘engaged’ with these events or topics is or should be actively using the hashtag on Twitter (i.e. we don’t know how many people followed on Twitter). An assumption here is that we cannot detect or measure anything if there is no signal: more folks elsewhere might be interested in these events, but if they did not use the hashtag they were logically not detected here. That there is no signal measurable with the selected tools does not mean there is no signal elsewhere, and I’d like this to stand as a comment on metrics for assessment as well.

In terms of frequent terms it remains apparent (as in other text analyses I have performed on academic Twitter hashtag archives) that frequently tweeted terms remain ‘neutral’ nouns, or adjectives when they are a keyword in the event’s title, subtitle or panel sessions (e.g. ‘responsible’). When a term like ‘snowball’ or ‘bonkers’ appears, it stands out. Due to the lack of more frequent modifiers, it remains hard to distant-read sentiment, critical stances, or even positions. Most frequent terms owe their counts to RTs, not to consensus across ‘original’ Tweets.

It seems that if we wanted to demonstrate the value added by live-tweeting or using an event’s hashtag remotely, quantifying (metricating?) the active users, tweets over time, days of activity and frequent words would not be the way to go for all events, particularly not for events with relatively low Twitter activity.

As we have seen, automated text analysis is more likely to reveal mostly-neutral keywords rather than any divergence of opinion on, or additions to, the official discourse. We would have to look at the less-repeated words, and perhaps at replies that did not use the hashtag, but the latter is not recommended as it would complicate things ethically: though it is generally accepted that RTs do not imply endorsement, less frequent terms in Tweets with the hashtag could single out individuals, and if a hashtag was not included in a Tweet it should be interpreted as meaning the Tweet was not meant to be part of that public discussion/corpus.





A #HEFCEmetrics Twitter Archive (Friday 16 January 2015, Warwick)

[Image: HEFCE logo]

The HEFCE metrics workshop: metrics and the assessment of research quality and impact in the arts and humanities took place on Friday 16 January 2015, 1030 to 1630 GMT at the Scarman Conference Centre, University of Warwick, UK.

I have uploaded a dataset of 821 Tweets tagged with #HEFCEmetrics (case not sensitive):

Priego, Ernesto (2015): A #HEFCEmetrics Twitter Archive (Friday 16 January 2015, Warwick). figshare.

The Tweets in the dataset were publicly published and tagged with #HEFCEmetrics between 16/01/2015 00:35:08 GMT and 16/01/2015 23:19:33 GMT. The collection period corresponds to the day the workshop took place in real time.

The Tweets contained in the file were collected using Martin Hawksey’s TAGS 6.0. The file contains 2 sheets.

Only users with at least 2 followers were included in the archive. Retweets have been included. An initial automatic deduplication was performed, but the data might require further deduplication.
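Further deduplication of an archive like this is straightforward once each row carries the tweet ID. A minimal sketch, assuming the rows are dicts with an `id_str` column (as in a TAGS sheet exported to CSV; the column name is an assumption for other exports):

```python
def dedupe_tweets(rows, id_field="id_str"):
    """Keep the first occurrence of each tweet ID and drop repeats.

    `rows` is an iterable of dicts, e.g. from csv.DictReader over a
    TAGS sheet exported to CSV, where 'id_str' holds the tweet ID.
    """
    seen = set()
    deduped = []
    for row in rows:
        if row[id_field] not in seen:
            seen.add(row[id_field])
            deduped.append(row)
    return deduped

rows = [
    {"id_str": "1", "text": "a"},
    {"id_str": "1", "text": "a"},  # duplicate collected twice
    {"id_str": "2", "text": "b"},
]
print(len(dedupe_tweets(rows)))  # → 2
```

Deduplicating by ID rather than by text keeps distinct tweets that happen to share identical wording (e.g. manual retweets).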

Please note the data in this file is likely to require further refining and even deduplication. The data is shared as is. The contents of each Tweet are responsibility of the original authors. This dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.

For the #HEFCEmetrics Twitter archive corresponding to the one-day workshop hosted by the University of Sussex on Tuesday 7 October 2014, please go to

Priego, Ernesto (2014): A #HEFCEmetrics Twitter Archive. figshare.

You might also be interested in

Priego, Ernesto (2014): The Twelve Days of REF- A #REF2014 Archive. figshare.

#HEFCEMetrics: More on Metrics for the Arts and Humanities

Today I’ll participate via Skype in the HEFCE Metrics and the assessment of research quality and impact in the Arts and Humanities workshop, commissioned by the independent review panel. I share below some notes. For previous thoughts on metrics for research assessment, see my 23 June 2014 post.

What metrics?

Traditionally, two main forms of metrics have been used to measure the “impact” of academic outputs: usage statistics and citations.

“Usage statistics” usually refers mainly to two things: downloads and page views (though in practice they often cover much more than that). These statistics are often sourced from individual platforms through their web logs and Google Analytics. Beyond downloads and page views, the data platform administrators collect from web logs and Google Analytics includes indicators such as the operating systems and devices used to access content, and the landing pages for the most popular content. This data is often presented in custom-made reports that collate the different data, and the methods of collection and collation vary from platform to platform and user to user. The methods of collection are not transparent and often not reproducible.

Citations, on the other hand, can be obtained from proprietary databases like Scopus and Web of Knowledge, or from platforms like PubMed (in the sciences), Google Scholar and CrossRef. These platforms have traditionally favoured content from the sciences rather than the arts and humanities. Part of the reason is that citations are more easily tracked when content is published with a Digital Object Identifier, a term that remains largely obscure and esoteric to many in the arts and humanities. Citations traditionally take longer to accrue, and therefore take longer to collect. Again, the methods for their collection are not always transparent, and the source data is more often than not closed rather than open. Citations privilege more ‘authoritative’ content from publishers that have the necessary infrastructure and whose content has been available for longer.


Altmetrics is “the creation and study of new metrics based on the Social Web for analyzing, and informing scholarship” (Priem et al 2010). Altmetrics services normally employ APIs and algorithms to track and create metrics from activity on the web (normally social media platforms such as Twitter and Facebook, but also online reference managers like Mendeley and tracked news sources) around the ‘mentioning’ (i.e. linking) of scholarly content. Scholarly content is recognised by its having an identifier such as a DOI, PubMed ID, arXiv ID or Handle. This means that outputs without these identifiers cannot be tracked and/or measured. Altmetrics are so far obtained through third-party commercial services such as Altmetric, PlumX and ImpactStory.

Unlike citations, altmetrics (also known as “alternative metrics”, or “article-level metrics” when usage statistics are included too) can be obtained almost immediately, and since in some cases online activity can be hectic, the numbers can grow quite quickly. Altmetrics providers do not claim to measure “research quality”, but “attention”; they agree that the metrics alone are not sufficient indicators and that context is therefore always required. Services like Altmetric, ImpactStory and PlumX have interfaces that collect the tracked activity in one single platform (which can also be linked to with widgets embeddable on other web pages). This means that these platforms also function as search and discovery tools where users can explore the “conversations” happening around an output online.

The rise of altmetrics, and the focus on their role as a form or even branch of bibliometrics, informetrics, webometrics or scientometrics (Cronin, 2014), has taken place in the historical and techno-sociocultural context of larger transformations in scholarly communications. The San Francisco Declaration on Research Assessment (DORA, 2012) [PDF], for example, was prompted by the participation of altmetrics tool developers, researchers and open access publishers; it makes the general recommendation not to use journal-based metrics, such as Journal Impact Factors, as a surrogate measure of the quality of individual research articles, to assess an individual researcher’s contributions, or in hiring, promotion or funding decisions.

The technical and cultural premise of altmetrics services is that if academics are using social media (web services such as Twitter and Facebook, made possible by APIs) to link to (“mention”) online academic outputs, then a service “tapping” into those APIs would allow users such as authors, publishers, libraries, researchers and the general public to conduct searches across information sources from a single platform (in the form of a graphical user interface) and obtain results from all of them. Through an algorithm, it is possible to quantify, summarise and visualise the results of those searches.
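That quantification step is, at bottom, grouping tracked mentions by the identifier they link to. The sketch below illustrates the idea with made-up mention records rather than any real provider's API; the field names and sample DOI are assumptions for illustration only:

```python
from collections import Counter, defaultdict

def score_mentions(mentions):
    """Group tracked mentions by the scholarly identifier they link to
    (DOI, PubMed ID, Handle, ...) and tally a per-source breakdown.

    Each mention is a dict with 'identifier' and 'source' keys.
    Outputs without a recognised identifier never appear here at all,
    which is precisely the tracking limitation noted above.
    """
    per_output = defaultdict(Counter)
    for m in mentions:
        per_output[m["identifier"]][m["source"]] += 1
    return {ident: dict(sources) for ident, sources in per_output.items()}

# Hypothetical mention records, as a tracking service might collect them.
mentions = [
    {"identifier": "10.1000/xyz123", "source": "twitter"},
    {"identifier": "10.1000/xyz123", "source": "twitter"},
    {"identifier": "10.1000/xyz123", "source": "mendeley"},
]
print(score_mentions(mentions))
```

The per-source breakdown is what makes such dashboards usable as discovery tools: the counts point back to the conversations, they do not evaluate them.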

The prerequisites for altmetrics comprise a complex set of cultural and technological factors. Three infrastructural factors are essential:

  1. Unlike traditional usage statistics, altmetrics can only be obtained if the scholarly outputs have been published online with Digital Object Identifiers or other permanent identifiers.
  2. The online platforms that might link to these outputs need to be known, predicted and located by the service providing the metrics.
  3. Communities of users must exist using the social media platforms tracked by altmetrics services linking to these outputs.

The scholarly, institutional, technological, economic and social variables are multiple and platform and culture-dependent, and will vary from discipline to discipline and country to country.

Standards and Best Practices

Michelle Dalmau, Dave Scherer, and Stacy Konkiel led The Digital Library Federation Forum 2013 working session titled Determining Assessment Strategies for Digital Libraries and Institutional Repositories Using Usage Statistics and Altmetrics, and produced a series of recommendations for “developing best practices for assessment of digital content published by libraries”. Dalmau et al emphasised the importance of making data and methods of collection transparent, as well as of including essential context with the metrics.

As open access mandates and the REF make “impact” case studies more of a priority for researchers, publishers and institutions, it is important to insist that any metrics and their analysis, provided by either authors, publishers, libraries or funding bodies, should be openly available “for reuse under as permissive a license as possible” (Dalmau, Scherer and Konkiel).

Arts and Humanities

If altmetrics are to be used in some way for research assessment, the stakeholders involved in arts and humanities scholarly publishing need to understand the technical and cultural prerequisites for altmetrics to work. There are a series of important limitations that justify scepticism towards altmetrics as an objective “impact” assessment method. A bias towards Anglo-American and European sources, as well as towards STEM disciplines, casts a shadow on the growth of altmetrics for non-STEM disciplines (Chimes, 2014). Many academic journals, particularly in the arts and humanities, have yet to establish a significant, sustainable online presence, and many still lack DOIs to enable their automated and transparent tracking.

At their best, altmetrics tools are meant to encourage scholarly activity around published papers online. It can seem, indeed, like a chicken-and-egg situation: without healthy, collegial, reciprocal cultures of scholarly interaction on the web, mentions of scholarly content will not be significant. Simultaneously, if publications do not provide identifiers like DOIs, and authors, publishers and/or institutions do not perceive any value in sharing their content, altmetrics will again be less significant. Altmetrics can work as search and discovery tools for the scholarly communities forming around academic outputs on the web, but they cannot and should not be thought of as unquestionable proxies of either “impact” or “quality”. The value of these metrics lies in providing us with indicators of activity; any value obtained from them can only be the result of asking the right questions, providing context and doing the legwork: assessing outputs in their own right and in their own context.

Libraries could do more to create awareness of the potential of altmetrics within the arts and humanities. The role of the library, through its Institutional Repository (IR), in encouraging online mentioning and the development of impact case studies should be readdressed, particularly if ‘Green’ open access is going to be the mandated form of access. Some open access repositories are already using altmetrics (City University London’s open access repository has had Altmetric widgets for its items since January 2013), but the institution-wide capabilities of some of the altmetrics services are fairly recent (Altmetric for Institutions was officially launched in June 2014). There is much work to be done, but the opportunity for cultural change that altmetrics can contribute to seems too good to waste.


A 2014 Numeralia

Here is an attempt to visualise what I was up to in 2014, publishing-, research- and teaching-engagement-wise. I have focused first on how many blog posts I published on this blog per month, how many blog posts I edited and/or authored for the Comics Grid blog, how many outputs I shared on figshare, and finally a general numeralia of some main categories of my 2014 activity.

This post is not meant to contribute to heightening already-pervasive anxieties about academic productivity (I’m fully aware most of this activity does not ‘count’ for many anyway), but is merely a humble, personal yet public exercise in reminding myself of the work I’ve done. You can click on the charts to enlarge them.

Happy new year everyone! See you in 2015!

Blog Posts per Month in 2014

[Chart: Blog Posts per Month in 2014, The Comics Grid]

Figshare Uploads per Month in 2014

[Chart: Ernesto Priego, Selected Numeralia from 2014]


HEFCE Metrics: A one-day workshop hosted by the University of Warwick

[Image: University of Warwick Faculty of Arts banner]

Metrics and the assessment of research quality and impact in the Arts and Humanities

A one-day workshop hosted by the University of Warwick, as part of the Independent Review of the Role of Metrics in Research Assessment.

Date: Friday 16th January 2015 (10:30 to 16:30)

Location: Scarman Conference Centre, University of Warwick

The workshop will have the following objectives:

1. Offering a clear overview of the progress to date in the development of metrics of relevance to the arts and humanities, and of persisting challenges.

2. Exploring the potential benefits and drawbacks of metrics use in research assessment and management from the perspective of disciplines within the arts and humanities.

3. Generating evidence, insights and concrete recommendations that can inform the final report of the independent metrics review.

The workshop will be attended by several members of the metrics review steering group, academics and stakeholders drawn from across the wider HE and research community.

Confirmed speakers include:

  • Prof. Jonathan Adams, King’s College London
  • Prof. Geoffrey Crossick, AHRC Cultural Value Project and Crafts Council
  • Prof. Maria Delgado, Queen Mary, University of London
  • Dr Clare Donovan, Brunel University
  • Dr Martin Eve, University of Lincoln and Open Library of Humanities
  • Prof. Mark Llewellyn, Director of Research, AHRC
  • Dr Alis Oancea, University of Oxford
  • Dr Ernesto Priego, City University London
  • Prof. Mike Thelwall, University of Wolverhampton (member of the HEFCE review steering group)
  • Prof. Evelyn Welch, King’s College London

Please register here.

A #HEFCEmetrics Twitter Archive

#hefcemetrics top tweeters

I have uploaded a new dataset to figshare:
Priego, Ernesto (2014): A #HEFCEmetrics Twitter Archive. figshare.

“In metrics we trust? Prospects & pitfalls of new research metrics” was a one-day workshop hosted by the University of Sussex, as part of the Independent Review of the Role of Metrics in Research Assessment. It took place on Tuesday 7 October 2014 at the Terrace Room, Conference Centre, Bramber House, University of Sussex, UK.

The file contains a dataset of 1178 Tweets tagged with #HEFCEmetrics (case insensitive). These Tweets were published publicly and tagged with #HEFCEmetrics between 02/10/2014 10:18 and 08/10/2014 00:27 GMT.

The Tweets contained in the file were collected using Martin Hawksey’s TAGS 6.0. The file contains 3 sheets.

Please note the data in this file is likely to require further refining and even deduplication. The data is shared as is. The contents of each Tweet are the responsibility of the original authors. This dataset is shared to encourage open research into scholarly activity on Twitter.

For more information refer to the upload itself.

If you use or refer to this data in any way please cite and link back using the citation information above.

An #altmetrics14 Twitter Archive

“Altmetrics14: expanding impacts and metrics” (#altmetrics14) was an ACM Web Science Conference 2014 Workshop that took place on June 23, 2014 in Bloomington, Indiana, United States, between 10:00 and 17:50 local time.

I have uploaded to figshare a dataset of 1758 Tweets tagged with #altmetrics14 (case insensitive).

The dataset contains an archive of 1758 Tweets published publicly and tagged with #altmetrics14 between Mon Jun 02 17:41:56 +0000 2014 and Wed Jul 16 00:48:38 +0000 2014.

During the day of the workshop, 1294 Tweets tagged with #altmetrics14 were collected.

If you use or refer to the shared file in any way please cite and link back using the following citation information:

Priego, Ernesto (2014): An #altmetrics14 Twitter Archive. figshare.

I have shared the file with a Creative Commons- Attribution license (CC-BY) for academic research and educational use.

The Tweets contained in the file were collected using Martin Hawksey’s TAGS 5.1. The file contains 3 sheets.

The third sheet in the file contains 1294 Tweets tagged with #altmetrics14 collected during the day of the workshop.

The usual fair warnings apply:

Only users with at least 2 followers were included in the archive. Retweets have been included. An initial automatic deduplication was performed but data might require further deduplication.

Please note that both research and experience show that the Twitter search API isn’t 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón, Sandra, et al. 2012). It is therefore not guaranteed this file contains each and every Tweet tagged with #altmetrics14 during the indicated period, and is shared for comparative and indicative educational and research purposes only.

Please note the data in this file is likely to require further refining and even deduplication. The data is shared as is. This dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.
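The deduplication step mentioned above can be scripted. A minimal sketch, assuming each row exported from the TAGS spreadsheet is a dict with an `id_str` field (the field name is an assumption about the export format):

```python
def deduplicate(tweets):
    """Keep the first occurrence of each Tweet id, preserving order."""
    seen = set()
    unique = []
    for t in tweets:
        if t["id_str"] not in seen:  # "id_str" assumed to hold the Tweet id
            seen.add(t["id_str"])
            unique.append(t)
    return unique

# Hypothetical rows: the third entry duplicates the first.
rows = [{"id_str": "1", "text": "a"}, {"id_str": "2", "text": "b"},
        {"id_str": "1", "text": "a"}]
# deduplicate(rows) keeps the first two rows only
```

Deduplicating on the Tweet id rather than the text avoids collapsing distinct Tweets that happen to share wording (e.g. retweets entered as separate rows with distinct ids).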

On Metrics and Research Assessment

Oh research where art thou?

Next Monday 30th June 2014 at noon is the deadline to reply to the ‘Call for Evidence’ for HEFCE’s Independent review of the role of metrics in research assessment. I share some quick notes on my position as an individual researcher. Needless to say, my personal position is based on my experience as a researcher and on reading research in the area. Please excuse the lack of hyperlinked references in the body of the text; I have included a bibliography at the end of the post where I link to each reference.

A combination of traditional citation metrics and ‘alternative’ article-level metrics can be used across different academic disciplines to assess the reach (in terms of academic and public ‘impact’) of excellent research undertaken in the higher education sector (Liu and Adie 2013).

If increased international public access and impact are to be key factors in 21st century research assessment, the adoption of metrics, and particularly article-level metrics, is essential. Scholarly outputs published with Digital Object Identifiers can be easily tracked and measured, and as scholars in different fields adopt online methods of dissemination more widely, the data we can obtain from tracking them should not be ignored by assessment panels.
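As an illustration of how a DOI enables automated tracking: Altmetric’s public v1 API accepts a DOI directly in the request path. A minimal sketch that only builds the request URL (the endpoint shown is Altmetric’s documented public v1 API, but details should be checked against their current documentation before use):

```python
from urllib.parse import quote

ALTMETRIC_API = "https://api.altmetric.com/v1/doi/"  # public v1 endpoint

def altmetric_url(doi):
    """Build the Altmetric API request URL for a given DOI."""
    # DOIs may contain "/" legitimately, so keep it unescaped.
    return ALTMETRIC_API + quote(doi, safe="/")

# altmetric_url("10.6084/m9.figshare.105851")
# -> "https://api.altmetric.com/v1/doi/10.6084/m9.figshare.105851"
```

Fetching that URL (e.g. with `urllib.request`) returns JSON attention data for the output when Altmetric has a record for it; the point is that a persistent identifier makes such lookups trivially scriptable.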

Article-level metrics on scholarly outputs are already being tested by institutional repositories and publishers across the board. The data is open and facilitates further research, as well as some evidence for qualitative impact storytelling. On their own, however, metrics of any kind (understood as mostly quantitative data) cannot and should not be used to assess either impact or ‘excellence’.

However, citation metrics and online mention metrics (“altmetrics”) can provide valuable data that can and should be subject to quantitative and qualitative analyses and review. Qualitative assessment in the form of “impact stories” can be informed by quantitative data provided by alternative metrics providers and methodologies (Neylon 2010; Priego 2012).

The San Francisco Declaration on Research Assessment (DORA) made the general recommendation of not using journal-based metrics, such as Journal Impact Factors, as a surrogate measure of the quality of individual research articles, to assess an individual researcher’s contributions, or in hiring, promotion, or funding decisions.

DORA provides recommendations for institutions, funding agencies, publishers and organisations that supply metrics. An analysis of available data on individual DORA signers as of June 24, 2013, showed that 10,963 individuals and 484 institutions had signed, and that 6% of individual signatories were in the humanities and 94% in scientific disciplines; this in itself reflects an important disparity across fields that should be taken into account.

The ‘gaming’ of any kind of metric is possible by definition. It is critical that previous efforts in developing good practice in the measurement and assessment of research are adopted or at least taken into account. DORA makes it explicit that the gaming of metrics will not be tolerated and altmetrics service providers are openly working towards good practice and transparent methodologies (Adie 2013).

Social media adoption by scholars for scholarly dissemination is an important aspect of academic communications. It is wide, varies across disciplines and is still fairly recent (Priem 2011; Adie and Roe 2013; Sud and Thelwall 2013). Therefore correlations between online mentions, downloads and traditional citations are expected to be low, since the citation window is still too small. Previous research, however, demonstrates there are positive yet still low correlations between downloads and citation counts.

Recent and ongoing research shows that Open Access publications can lead to a greater number of downloads and social media mentions. Though research looking for possible correlations between Open Access and citation counts exists, the findings vary; the citation window is still too small, and more time and research will be needed to determine whether positive correlations exist as a general rule (Alperin 2014; Bernal 2013; Costas 2014; Kousha & Thelwall 2007).
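Correlation studies of this kind typically use rank correlation, since both mention counts and citation counts are heavily skewed. A minimal sketch of Spearman’s rho in pure Python, with hypothetical counts and no correction for tied ranks:

```python
def rank(values):
    """Assign 1-based ranks by sorted position (ties get arbitrary order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks

def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = rank(xs), rank(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical downloads vs. citations for five papers.
downloads = [120, 45, 300, 80, 60]
citations = [10, 2, 25, 5, 4]
# spearman(downloads, citations) == 1.0 (the rank orders happen to coincide)
```

In practice one would use a statistics library with proper tie handling (e.g. `scipy.stats.spearmanr`); the sketch only illustrates what the reported coefficients measure.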

It is likely that there will be positive correlations in some cases but not all, as the scholarly, institutional, technological, economic and social variables are multiple and platform- and culture-dependent. Likewise, current business models from so-called hybrid publishers that enable Open Access via Article Processing Charges are likely to privilege the dissemination of outputs by those with existing funding schemes to cover them. Similarly, many academic journals, particularly in the arts and humanities, have yet to establish a significant, sustainable online presence, and many still lack DOIs to enable their automated and transparent tracking. However, institutional repositories are already embracing altmetrics as a means of both tracking and encouraging engagement with their resources, and the ability to track and measure engagement with grey literature can be a good source of evidence of the role these outputs play in the research and publication life-cycle.

Moreover, some fields privilege the publication of multi-author outputs whilst others prefer single-author publications. This clearly puts both those without Open Access funding and single-author papers at a quantitative disadvantage. As stated above, it is crucial that research assessment employing metrics is based on qualitative analyses and takes differences in disciplinary cultures into account. Research assessment employing metrics should be conducted on a case-by-case basis even if it is difficult, time-consuming and/or costly.

It is also critical that any assessment of article-level metrics understands how these metrics are possible in the first place and has an informed awareness of the disparities in social media adoption for scholarly purposes across different disciplinary boundaries in the Higher Education sector. Direct experience and ongoing research show evidence that at the moment some STEM fields are over-represented online (on blogs, social media and Open Access journals and monographs) while social sciences, arts and humanities outputs are lagging behind.

Traditional citation metrics unfairly benefit those publishing in standard channels, and particularly those in the Global North, leaving scholars in developing countries at a disadvantage (Alperin 2013; Priego 2014). Alternative metrics can more accurately measure the wider reach of scholarly outputs, and might better serve most scholars by fostering a research culture that supports national and international research impact objectives.

Even though there is still a bias towards North American and European publications, altmetrics can provide advantages to scholars interested in promoting their research online internationally by addressing public needs and enabling easier discovery and access to research outputs long underrepresented in the traditional literature and databases (Alperin 2014). Moreover, the geolocation data obtainable through altmetrics services offers evidence of both the disparities and international reach of both the production and consumption of research online.

In the international context, some recent and ongoing research suggests that Open Access publications tracked via article-level metrics have a wider international reach and impact; there is a growing body of evidence this is the case in both Latin America and some regions in Africa (see the OpenUCT/Scholarly Communication in Africa Programme (SCAP) reports as well as Alperin 2014; Priego 2013, 2014; Neylon, Willmers & King 2014).

The success of automated methods to obtain quantitative indicators of the reach, reception and use of scholarly outputs depends on our ability as scholarly communities to realise and develop the potential of the Web for scholarly communications. Developers, adopters and advocates of article-level metrics do not claim that quantitative indicators should be taken at face value. Online publishing offers a unique opportunity to track, measure and evaluate what happens to scholarly outputs once they have been published on the Web, and allows us to make comparisons between dissemination and access models across countries and disciplinary boundaries. More importantly, the data these metrics provide are not static, passive quantitative data but ‘interactive’: the underlying platforms enable social interactions between researchers (potentially worldwide, where conditions allow it), making the discovery, collection, exchange and discussion of outputs easier and faster.

Not embracing article-level metrics or alternative metrics/ altmetrics in research assessment when the 21st century is well underway would be a missed opportunity to push towards a scholarly culture of wider public engagement and adoption of innovative online platforms for scholarly dissemination.

Adopting purely quantitative methods, and, even more so, suggesting that any metric, however large, can equate to “excellence”, would be misguided and potentially catastrophic, particularly for those not in STEM areas or without the backing of elite institutions. Only the careful, professional qualitative assessment of live, transparent publishing data will be able to provide evidence of the public and scholarly, local and international reach and reception of excellent research.


Adie, E., & Roe, W. (2013). Enriching scholarly content with article-level discussion and metrics. Learned Publishing, 26(1), 11–17. doi:10.6084/m9.figshare.105851

Adie, E. (2013). Gaming Altmetrics. Altmetric. September 18 2013. Available from

Alperin, J. P. (2013). Ask not what altmetrics can do for you, but what altmetrics can do for developing countries. Bulletin of the American Society for Information Science and Technology, 39(4), 18–21. doi:10.1002/bult.2013.1720390407

Alperin, Juan Pablo (2014): Exploring altmetrics in an emerging country context. figshare.

Bernal, I. (2013). Open Access and the Changing Landscape of Research Impact Indicators: New Roles for Repositories. Publications, 1(2), 56–77. Retrieved from

Costas, R., Zahedi, Z., & Wouters, P. (2014). Do “altmetrics” correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective (p. 30). Leiden. Retrieved from

Konkiel, S. (2013, November 5). Altmetrics in Institutional Repositories. Retrieved from

Kousha, K., & Thelwall, M. (2007). The Web impact of open access social science research. Library & Information Science Research, 29(4), 495–507. Retrieved from

Liu, J., & Adie, E. (2013, May 30). Altmetric: Getting Started with Article-Level Metrics. figshare.

Mohammadi, E., & Thelwall, M. (2014). Mendeley readership altmetrics for the social sciences and humanities: Research evaluation and knowledge flows. Journal of the American Society for Information Science and Technology. Retrieved from

Neylon, C. (2011). Re-use as Impact: How re-assessing what we mean by “impact” can support improving the return on public investment, develop open research practice, and widen engagement. Altmetrics. Retrieved from

Neylon, C. (2010). Beyond the Impact Factor: Building a community for more diverse measurement of research. Science in the Open. Retrieved November 29, 2010, from

Neylon C, Willmers M and King T (2014). Rethinking Impact: Applying Altmetrics to Southern African Research. Working Paper 1, Scholarly Communication in Africa Programme.

OpenUCT Initiative Publications and SCAP reports. Available from

Priego, E. (2012). Altmetrics’: quality of engagement matters as much as retweets. Guardian Higher Education Network, Friday 24 August 2012. Retrieved from

Priego, E. (2013). Fieldwork: Apples and Oranges? Online Mentions of Papers About the Humanities. Altmetric, January 11 2013. Retrieved from

Priego, E. (2013). Alt-metrics, Digital Opportunity and Africa. Impact of Social Sciences, London School of Economics. February 6 2013. Retrieved from

Priego, E. (2014). The Triple A: Africa, Access, Altmetrics. 22 February 2014. Retrieved from

Priem, J., Hall, M., Hill, C., Piwowar, H., & Waagmeester, A. (2011). Uncovering impacts: CitedIn and total-impact, two new tools for gathering altmetrics. iConference 2012, 9–11. Retrieved from

Priem, J., Piwowar, H. A., & Hemminger, B. H. (n.d.). Altmetrics in the wild: An exploratory study of impact metrics based on social media. Metrics 2011: Symposium on Informetric and Scientometric Research. New Orleans, LA, USA. Retrieved from

Sud, P., & Thelwall, M. (2013). Evaluating altmetrics. Scientometrics. Retrieved from