Thoughts on University Module Evaluation

Update: I have now deposited a slightly revised version of this text (which has already gone through various versions since its original publication) at figshare as

Priego, Ernesto (2019): Recommendations for University Module Evaluation Policies. figshare. https://doi.org/10.6084/m9.figshare.8236607

Also available at City Research Online: http://openaccess.city.ac.uk/22318/

[Frequent readers will know I have a long-standing interest in scholarly communications, metrics and research assessment. The post below fits within my academic research practice, this time focusing on teaching evaluation (“module evaluation” in UK parlance). For an older post on metrics and research assessment, for example, see this post from June 30 2014. As with all of my work here, this post is shared in a personal capacity and does not represent in any way the views of colleagues or employers. I share these ideas here as a means to contribute publicly to a larger scholarly dialogue which is not only inter-disciplinary but inter-institutional and international].

 

tl;dr

This post discusses the limitations of University Module Evaluation processes and shares a series of recommendations that could improve their design and implementation. It concludes that, regardless of staff gender, age, academic position or ethnic background, no metric or quantitative indicator should be used without thoughtful, qualitative awareness of social and organisational context and of unconscious bias, and that there is a need to eliminate the use of Module Evaluation metrics in appointment and promotion considerations.

 

Module Evaluation

“Module evaluation” refers to the process by which students give feedback on, assess and rate their academic studies and the quality of teaching on a module (in other countries “modules” might be known as courses or subjects). Below I discuss the limitations of Module Evaluation processes and share a series of recommendations that I hope could improve their design and implementation.

On “Potential Bias”

Research has shown that, internationally, “potential bias” against gender and ethnic minorities is real. Holland has described how

“different treatment of male and female staff is increasingly well evidenced: some studies have found that students may rate the same online educators significantly higher if perceived as male compared to female (MacNell, Driscoll, and Hunt 2015), while other studies have shown that students can make more requests of and expect a greater level of nurturing behaviour from females compared to males, penalising those who do not comply (El-Alayli, Hansen-Brown, and Ceynar 2018)” (Holland 2019).

Research has also suggested “that bias may decrease with better representation of minority groups in the university workforce” (Fan et al. 2019). However, even if an institution, school or department has good staff representation of (some) minority groups in some areas, it would be important for a policy to go beyond mandating support for staff from minority groups to prepare for promotion. The way to tackle bias is not necessarily to give more guidance and support to minority staff, but to revisit the data collection tools, the way the resulting indicators are assessed, and their practical professional and psychological consequences for staff.

As discussed above, the cause of lower scores might be the bias implicit in the evaluation exercise itself. Arguably, lower scores can in many cases be explained not by the lecturer’s lack of skills or opportunities, but by other highly influential circumstances beyond the lecturer’s control, such as cultural attitudes to specific minority groups, the demographic composition of specific student cohorts, class size, the state of the facilities where staff teach, etc.

In my view Universities need policies that clearly state that ME scores should not be used as unequivocal indicators of a member of staff’s performance. The fact that the scores are often perceived by staff (correctly or incorrectly) as evidence of individual performance, and the belief that those indicators will be used as evidence in promotion processes, can indeed deter members of staff from applying for promotion. It can also play a role in the demoralisation of staff.

 

On Student Staff Ratios (SSR), Increased Workloads and Context Awareness

University Module Evaluation policies could be improved by acknowledging that workload and Student Staff Ratios are perceived to have an effect on the student experience and therefore on ME scores.

Though there is a need for more recent and UK-based research regarding the impact of class size and SSR on ME, higher education scholars such as McDonald are clear that

“research testifies to the fact that student satisfaction is not entirely dependent on small class sizes, a view particularly popular in the 1970s and late twentieth century (Kokkelenberg et al.,2008). Having said that, recent literature (post-2000) on the issue is focused heavily on the detrimental impact raised SSRs has on students, teachers and teaching and learning in general. The Bradley Review of higher education in Australia was just one ‘voice’ amongst many in the international arena, arguing that raised SSRs are seriously damaging to students and teachers alike” (McDonald 2013).

Module Evaluation policies should take into account the current Higher Education setting and student attitudes to educational practices, including the expectations of today’s students and the communication expectations created by VLEs, mobile Internet, email and social media.

Raised SSRs do create a higher workload for lecturers and have required new workload models. Raised SSRs imply that lecturers may not be able to meet those expectations and demands, or may be forced to stretch their personal resources to the maximum, endangering their wellbeing beyond what is reasonably sustainable. As I discussed in my previous post (Priego 2019), the recent HEPI Report on Mental Health in Higher Education shows “a big increase in the number of university staff accessing counselling and occupational health services”, with “new workload models” and “more directive approaches to performance management” as the two main factors behind this rise (Morrish 2019).

Module Evaluation policies would do well to recognise that time is a finite resource, and that raised SSRs mean a single lecturer will not be able to allocate the same amount of time to each student as they would if SSRs were lower. Raised SSRs also mean that institutions struggle to find enough appropriate rooms for lectures, which can also lead to lower scores as it negatively affects the student experience.

 

Who is being evaluated in multi-lecturer modules?

As part of context awareness, it is essential that any interpretation of ME scores takes into account that many modules are delivered by a team of lecturers, often including TAs and visiting lecturers. In practice, however, ME questionnaires are standardised, often outsourced, and designed with individual session leaders and generic settings in mind; these may not apply to the institution, school, department, module or session that is actually being evaluated.

Regardless of clarifications to the contrary, students often evaluate the lecturer they have in front of them on the specific day they complete the questionnaires, not necessarily the whole team; and if they do evaluate the team, the questionnaire’s data collection design does not allow one to distinguish which member of staff students had in mind.

Hence module leaders of large modules can arguably be penalised at least doubly: first by leading complex modules taught to many students, and second by being assessed on the performance of a group of peers, not themselves alone. Any truly effective ME policy would need to address the urgent need to periodically revise and update the design of MEQs in consultation with the academic staff who would be evaluated with those instruments. Given who mandates the evaluations and their role in other assessment exercises such as rankings or league tables, a user-centred approach to designing module evaluation questionnaires/surveys seems sadly unlikely, but who knows.

 

Module Evaluation scores are about more than just staff performance

As we all know, teaching is never disconnected from its infrastructural context. Room design, location, temperature, the state of the equipment, illumination, the comfort of the seats and tables and, importantly, the timing (stage in the teaching term, day of the week, time of day, how many MEQs students have completed before, whether examination or coursework deadlines are imminent or not) all have a potential effect on the feedback given by students. ME policies would be more effective if they acknowledged that academic staff do not teach in a vacuum and that many factors that might negatively affect the evaluation scores may in fact have very little to do with a member of staff’s actual professional performance.

Module Evaluation assessment done well

Members of staff potentially benefit from discussing their evaluation scores during appraisal sessions, where they can provide qualitative self-assessments of their own performance in relation to their academic practice teaching a module, get peer review and co-design strategies for professional development with their appraiser.

When done well, module evaluation scores and their discussion can help academics learn from what went well, what could go even better, what did not go as well (or went badly), interrogate the causes, and co-design strategies for improvement.

However, any assessment of module evaluation scores should be done in a way that takes into consideration the whole set of contextual issues around the way the data is collected. How to address this? Better designed data collection tools could address it, but it would also be very welcome if module evaluation policies stated that scores should never be taken at face value as unequivocal indicators of an academic’s performance.

In Conclusion…

University Module Evaluation policies should acknowledge that module evaluation scores can be potentially useful for staff personal professional development, particularly if the data collection mechanisms have been co-designed with staff with experience in the evaluated practice within the context of a specific institution, and the discussion takes place within productive, respectful and sensitive appraisal sessions.

Policies should acknowledge that, as indicators, the evaluation scores never tell the whole story and that, depending on the way the data is collected and quantified, the numbers can present an unreliable and potentially toxic picture. The objective of the evaluation should be to improve what can be improved within a specific context, not to act as a tool of surveillance and repression that disproportionately affects those who are already more likely to be subject to conscious and unconscious bias or to be working within already-difficult circumstances.

Regardless of staff gender, age, academic position or ethnic background, no metric or quantitative indicator should be used without social and organisational context awareness and unconscious bias awareness.

To paraphrase the San Francisco Declaration on Research Assessment, I would argue there is a “need to eliminate the use of [Module Evaluation] metrics in funding, appointment, and promotion considerations” [DORA 2012-2018].

 

References

Fan, Y., Shepherd, L. J., Slavich, E., Waters, D., Stone, M., et al. (2019). Gender and cultural bias in student evaluations: Why representation matters. PLOS ONE 14(2): e0209749. https://doi.org/10.1371/journal.pone.0209749

Holland, E. P. (2019) Making sense of module feedback: accounting for individual behaviours in student evaluations of teaching, Assessment & Evaluation in Higher Education, 44:6, 961-972, DOI: 10.1080/02602938.2018.1556777

McDonald, G. (2013). “Does size matter? The impact of student-staff ratios”. Journal of higher education policy and management (1360-080X), 35 (6), p. 652. http://0-www.tandfonline.com.wam.city.ac.uk/loi/cjhe20

Morrish, L. (23 May 2019). Pressure Vessels: The epidemic of poor mental health among higher education staff, HEPI Occasional Paper 20. Available from https://www.hepi.ac.uk/2019/05/23/new-report-shows-big-increase-in-demand-for-mental-health-support-among-higher-education-staff/ [Accessed 6 June 2019].

Priego, E. (30/05/2019) Awareness, Engagement and Overload – The Roles We All Play. Available at https://epriego.blog/2019/05/30/awareness-engagement-and-overload-the-roles-we-play/ [Accessed 6 June 2019]

San Francisco Declaration on Research Assessment (2012-2018) https://sfdora.org/ [Accessed 6 June 2019]

 



[…and yes, if you noticed the typo in the URL, thank you, we noticed it belatedly too but cannot change it now as the link had already been publicly shared.]

 

Metricating #respbib18 and #ResponsibleMetrics: A Comparison

I’m sharing summaries of numerical Twitter data collected from the following bibliometrics event hashtags:

  • #respbib18 (Responsible use of Bibliometrics in Practice, London, 30 January 2018) and
  • #ResponsibleMetrics (The turning tide: A new culture of responsible metrics for research, London, 8 February 2018).

 

#respbib18 Summary

Event title: Responsible use of Bibliometrics in Practice
Date: 30 January 2018
Times: 9:00 am – 4:30 pm GMT
Sheet ID: RB
Hashtag: #respbib18
Number of links: 128
Number of RTs: 100
Number of Tweets: 360
Unique tweets: 343
First Tweet in Archive: 23/01/2018 11:44 GMT
Last Tweet in Archive: 01/02/2018 16:17 GMT
In Reply Ids: 15
In Reply @s: 49
Unique usernames: 54
Unique users who used the tag only once: 26 (for context of engagement)

Twitter Activity

[Chart: #respbib18 Twitter activity over the last three days. CC-BY. Originally published as https://twitter.com/ernestopriego/status/958424112547983363]

 

#ResponsibleMetrics Summary

Event title: The turning tide: A new culture of responsible metrics for research
Date: 8 February 2018
Times: 09:30 – 16:00 GMT
Sheet ID: RM
Hashtag: #ResponsibleMetrics
Number of links: 210
Number of RTs: 318
Number of Tweets: 796
Unique tweets: 795
First Tweet in Archive: 05/02/2018 09:31 GMT
Last Tweet in Archive: 08/02/2018 16:25 GMT
In Reply Ids: 43
In Reply @s: 76
Unique usernames: 163
Unique usernames who used the tag only once: 109 (for context of engagement)

Twitter Activity

[Chart: #responsiblemetrics Twitter activity over the last three days. CC-BY. Originally published as https://twitter.com/ernestopriego/status/961639382150189058]

#respbib18: 30 Most Frequent Terms

 

Term RawFrequency
metrics 141
responsible 89
bibliometrics 32
event 32
data 29
snowball 25
need 24
use 21
policy 18
today 18
looking 17
people 16
rankings 16
research 16
providers 15
forum 14
forward 14
just 14
practice 14
used 14
community 13
different 12
metric 12
point 12
using 12
available 11
know 11
says 11
talks 11
bibliometric 10

#ResponsibleMetrics: 30 Most Frequent Terms

Term RawFrequency
metrics 51
need 36
research 29
indicators 25
panel 16
responsible 15
best 13
different 13
good 13
use 13
index 12
lots 12
people 12
value 12
like 11
practice 11
context 10
linear 10
rankings 10
saying 10
used 10
way 10
bonkers 9
just 9
open 9
today 9
universities 9
coins 8
currency 8
data 8

Methods

Twitter data was mined with Tweepy. For robustness and quick charts, a parallel collection was made with TAGS. Data was checked and deduplicated with OpenRefine. Text analysis was performed with Voyant Tools. Text was anonymised through stoplists: two stoplists were applied (one to each dataset), comprising usernames and Twitter-specific terms (such as RT, t.co, https, etc.) as well as the terms in the hashtags themselves. Event title keywords were not included in the stoplists.
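For readers curious about what this pipeline looks like in code, here is a minimal sketch of the stoplist-and-frequency step. It is illustrative only: the published counts were produced with OpenRefine and Voyant Tools, and the file name, column name and stoplist contents below are assumptions rather than the actual configuration used.

# Illustrative sketch only: the published counts come from OpenRefine + Voyant Tools.
# The CSV file name, the "text" column name and the stoplist contents are assumptions.
import csv
import re
from collections import Counter

STOPLIST = {
    "rt", "https", "t.co", "amp",        # Twitter-specific tokens
    "respbib18", "responsiblemetrics",   # hashtag terms, kept out of the counts
    # ...usernames collected from the archive would be added here before counting
}

def tokenise(text):
    """Lowercase, strip URLs and @mentions (anonymisation), return word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"@\w+", " ", text)
    return re.findall(r"[a-z][a-z'-]+", text)

counts = Counter()
with open("respbib18_deduplicated.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        counts.update(t for t in tokenise(row["text"]) if t not in STOPLIST)

for term, raw_frequency in counts.most_common(30):
    print(term, raw_frequency)

Because RTs are counted as Tweets, a script like this reproduces the same caveat noted below: raw frequencies reflect the repetition of terms implicit in retweeting.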

No sensitive, personal or personally identifiable data is contained in this dataset. Any usernames and names of individuals were removed at the data refining stage, and again from the text analysis results if any remained.

Please note that the two datasets span different numbers of days of activity, as indicated in the summary tables. The source data was refined but duplications might have remained, which would logically affect the resulting raw term frequencies; the numbers should therefore be interpreted as indicative only and not as exact measurements. RTs count as Tweets, so raw frequencies reflect the repetition of terms implicit in retweeting.

So?

As usual I share this hoping others might find it interesting and draw their own conclusions.

A very general insight for me is that we need a wider group engaging with these discussions. At most we are talking about a group of approximately 50 individuals who actively engaged on Twitter across both events.

From the Activity charts it is noticeable that tweeting recedes at breakout times, possibly indicating that most tweeting activity comes from within the room. When hashtags create wide engagement, activity is more constant and does not so closely reflect the timings of real-time activity in the room.

It seems to me that the production, requirement, use and interpretation of metrics for research assessment directly affects everyone in higher education, regardless of their position or role. The topic should not be obscure or limited to bibliometricians and RDM, Research and Enterprise or REF panel people.

Needless to say, I do not think everyone ‘engaged’ with these events or topics is or should be actively using the hashtag on Twitter (i.e. we don’t know how many people followed on Twitter). An assumption here is that we cannot detect or measure anything if there is no signal: more folks elsewhere might be interested in these events, but if they did not use the hashtag they were logically not detected here. That there is no signal measurable with the selected tools does not mean there is no signal elsewhere, and I’d like this to stand as a comment on metrics for assessment as well.

In terms of frequent terms, it remains apparent (as in other text analyses I have performed on academic Twitter hashtag archives) that frequently tweeted terms remain ‘neutral’ nouns, or adjectives if they are keywords in the event’s title, subtitle or panel sessions (e.g. ‘responsible’). When a term like ‘snowball’ or ‘bonkers’ appears, it stands out. Due to the lack of more frequent modifiers, it remains hard to distant-read sentiment, critical stances or even positions. Most frequent terms come from RTs rather than from any consensus across ‘original’ Tweets.

It seems that if we wanted to demonstrate the value added by live-tweeting or using an event’s hashtag remotely, quantifying (metricating?) the active users, tweets over time, days of activity and frequent words would not be the way to go for all events, particularly not for events with relatively low Twitter activity.

As we have seen, automated text analysis is more likely to reveal mostly-neutral keywords rather than any divergence of opinion on, or additions to, the official discourse. We would have to look at the less-repeated words, and perhaps at replies that did not use the hashtag, but the latter is not recommended as it would complicate things ethically: though it is generally accepted that RTs do not imply endorsement, less frequent terms in Tweets with the hashtag could single out individuals, and if a hashtag was not included in a Tweet it should be interpreted as not meant to be part of that public discussion/corpus.

 

 

 

 

A #HEFCEmetrics Twitter Archive (Friday 16 January 2015, Warwick)

[Image: HEFCE logo]

The HEFCE metrics workshop: metrics and the assessment of research quality and impact in the arts and humanities took place on Friday 16 January 2015, 10:30 to 16:30 GMT, at the Scarman Conference Centre, University of Warwick, UK.

I have uploaded a dataset of 821 Tweets tagged with #HEFCEmetrics (case not sensitive):

Priego, Ernesto (2015): A #HEFCEmetrics Twitter Archive (Friday 16 January 2015, Warwick). figshare.
http://dx.doi.org/10.6084/m9.figshare.1293612

The Tweets in the dataset were publicly published and tagged with #HEFCEmetrics between 16/01/2015 00:35:08 GMT and 16/01/2015 23:19:33 GMT. The collection period corresponds to the day the workshop took place in real time.

The Tweets contained in the file were collected using Martin Hawksey’s TAGS 6.0. The file contains 2 sheets.

Only users with at least 2 followers were included in the archive. Retweets have been included. An initial automatic deduplication was performed but data might require further deduplication.

Please note the data in this file is likely to require further refining and even deduplication. The data is shared as is. The contents of each Tweet are responsibility of the original authors. This dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.
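Anyone reusing the file could run a further deduplication pass along these lines. This is a sketch only; it assumes the TAGS export has been saved as CSV with columns named id_str, from_user and text, so verify the column names against the actual sheet.

# Sketch only: assumes a TAGS export saved as CSV with "id_str", "from_user" and "text" columns.
import pandas as pd

df = pd.read_csv("hefcemetrics_tags_export.csv", dtype={"id_str": str})

# Drop exact duplicates by Tweet ID first (the safest key)...
df = df.drop_duplicates(subset="id_str")
# ...then, more aggressively, rows where the same user posted identical text.
# This may also drop genuinely repeated Tweets, so inspect what is removed.
df = df.drop_duplicates(subset=["from_user", "text"])

df.to_csv("hefcemetrics_deduplicated.csv", index=False)
print(len(df), "unique Tweets retained")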

For the #HEFCEmetrics Twitter archive corresponding to the one-day workshop hosted by the University of Sussex on Tuesday 7 October 2014, please go to

Priego, Ernesto (2014): A #HEFCEmetrics Twitter Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1196029

You might also be interested in

Priego, Ernesto (2014): The Twelve Days of REF- A #REF2014 Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1275949

The Twelve Days of REF: A #REF2014 Archive

[Image: Cirrus word cloud visualisation of a corpus of 23,791 #REF2014 Tweets]

I have uploaded a new dataset to figshare:

Priego, Ernesto (2014): The Twelve Days of REF- A #REF2014 Archive. figshare.

http://dx.doi.org/10.6084/m9.figshare.1275949

The file contains approximately 31,855 unique Tweets published publicly and tagged with #REF2014 during a 12-day period between 08/12/2014 11:18 and 20/12/2014 10:13 GMT.

For some context and an initial partial analysis, please see my previous blog post from 18 December 2014.

As always, this dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.

Happy Christmas everybody.

The REF According to Twitter: A #REF2014 Update (18/12/14 16:28 GMT)

As everyone with any awareness of UK higher education knows, the results of REF 2014 were announced in the first minute of the 18th of December 2014. Two main hashtags have been used to refer to it on Twitter: #REF and the more popular (“official”?) #REF2014.

There have been, of course, other variations of these hashtags, including discussion about whether to ‘hash’ the term REF at all. Here I share a quick first look at a sample corpus of texts from Tweets publicly tagged with #REF2014.

This is just a quick update of a work in progress. No qualitative conclusions are offered, and the quantitative data shared and analysed is provisional. Complete data sets will be published openly once the collection has been completed and the data has been further refined.

The Numbers

I looked at a sample corpus of 23,791 #REF2014 Tweets published by 10,654 unique users between 08/12/2014 11:18 GMT and 18/12/2014 16:32 GMT.

  • The sample corpus only included Tweets from users with a minimum of two followers.
  • The sample corpus consists of 1 document with a total of 454,425 words and 16,968 unique words.
  • The range of Tweets per user varied between 70 and 1, with the average being 2.3 Tweets per user.
  • Only 8 of the total of 10,654 unique users in the corpus published between 50 and 80 Tweets; 30 users published more than 30 Tweets, with 9,473 users publishing between 1 and 5 Tweets only.
  • 6,585 users in the corpus published one Tweet only (a sketch for reproducing these per-user figures follows below).
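The per-user figures above could be reproduced from the archive with something like the following sketch; the file name and column names are assumptions, and the exact numbers depend on how the sample was filtered.

# Sketch for per-user Tweet counts from an archive CSV; file and column names are assumptions.
import pandas as pd

df = pd.read_csv("ref2014_sample.csv", dtype={"id_str": str})
per_user = df.groupby("from_user")["id_str"].nunique()

print("Unique users:", per_user.size)
print("Tweets per user, range:", int(per_user.min()), "to", int(per_user.max()))
print("Average Tweets per user:", round(per_user.mean(), 1))
print("Users who published 1-5 Tweets:", int(per_user.between(1, 5).sum()))
print("Users who published one Tweet only:", int((per_user == 1).sum()))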

A Quick Text Analysis

Voyant Tools was used to analyse the corpus of 23,791 Tweet texts. A customised English stop words list was applied globally. The most frequent word was “research”, repeated 8,760 times in the corpus; it was included in the stop-word list (as well as, logically, #REF2014).

A word cloud of the whole corpus using the Voyant Cirrus tool looked like this (you can click on the image to enlarge it):

[Image: Cirrus word cloud visualisation of a corpus of 23,791 #REF2014 Tweets]

#REF2014: Top 50 most frequent words so far

Word Count
uk 4605
results 4558
top 2784
impact 2091
university 1940
@timeshighered 1790
ranked 1777
world-leading 1314
excellence 1302
universities 1067
world 1040
quality 1012
internationally 933
excellent 931
overall 910
great 827
staff 827
academics 811
proud 794
congratulations 690
rated 690
power 666
@cardiffuni 653
oxford 645
leading 641
best 629
news 616
education 567
5th 561
@gdnhighered 556
@phil_baty 548
ucl 546
number 545
law 544
today 536
table 513
analysis 486
work 482
higher 470
uni 460
result 453
time 447
day 446
cambridge 430
just 428
@ref2014official 427
group 422
science 421
big 420
delighted 410

Limitations

The map is not the territory. Please note that both research and experience show that the Twitter search API isn’t 100% reliable. Large tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón et al. 2012). It is not guaranteed that this file contains each and every Tweet tagged with the archived hashtag during the indicated period. Further deduplication of the dataset will be required to validate this initial look at the data, and it is shared now merely as an update of a work in progress.

References

Gonzalez-Bailon, Sandra and Wang, Ning and Rivero, Alejandro and Borge-Holthoefer, Javier and Moreno, Yamir, “Assessing the Bias in Samples of Large Online Networks” (December 4, 2012). Forthcoming in Social Networks. Available at SSRN: http://ssrn.com/abstract=2185134 or http://dx.doi.org/10.2139/ssrn.2185134

On Metrics and Research Assessment

Oh research where art thou?

Next Monday, 30th June 2014 at noon, is the deadline to reply to the ‘Call for Evidence’ for HEFCE’s Independent review of the role of metrics in research assessment. I share some quick notes on my position as an individual researcher. Needless to say, my personal position is based on my experience as a researcher and on reading research in the area. Please excuse the lack of hyperlinked references in the body of the text; I have, however, included a bibliography at the end of the post where I link to each reference.

A combination of traditional citation metrics and ‘alternative’ article-level metrics can be used across different academic disciplines to assess the reach (in terms of academic and public ‘impact’) of excellent research undertaken in the higher education sector (Liu and Adie 2013).

If increased international public access and impact are to be key factors in 21st-century research assessment, the adoption of metrics, and particularly article-level metrics, is essential. Scholarly outputs published with Digital Object Identifiers can be easily tracked and measured, and as scholars in different fields adopt online methods of dissemination more widely, the data we can obtain from tracking them should not be ignored by assessment panels.
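As an illustration of how a DOI can be queried for this kind of data, here is a sketch against what I understand to be the public Altmetric API’s DOI lookup; treat the endpoint and the response field names as assumptions to be verified against the provider’s documentation.

# Sketch only: the endpoint and field names are assumptions to verify against
# the provider's documentation before relying on them.
import requests

def altmetric_summary(doi):
    resp = requests.get(f"https://api.altmetric.com/v1/doi/{doi}", timeout=10)
    if resp.status_code == 404:
        return None  # no online attention recorded for this DOI
    resp.raise_for_status()
    data = resp.json()
    return {
        "title": data.get("title"),
        "score": data.get("score"),
        "posts": data.get("cited_by_posts_count"),
    }

print(altmetric_summary("10.6084/m9.figshare.1293612"))

A missing record here simply means no online attention has been recorded for that DOI, not that the output has no value; as argued throughout this post, such numbers are a starting point for qualitative interpretation, not a verdict.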

Article-level metrics on scholarly outputs are already being tested by institutional repositories and publishers across the board. The data is open, facilitates further research, and provides some evidence for qualitative impact storytelling. On their own, metrics of any kind (understood as mostly quantitative data) cannot and should not be used to assess either impact or ‘excellence’.

However, citation metrics and online mention metrics (“altmetrics”) can provide valuable data that can and should be subject to quantitative and qualitative analyses and review. Qualitative assessment in the form of “impact stories” can be informed by quantitative data provided by alternative metrics providers and methodologies (Neylon 2010; Priego 2012).

The San Francisco Declaration on Research Assessment (DORA) made the general recommendation of not using journal-based metrics, such as Journal Impact Factors, as a surrogate measure of the quality of individual research articles, to assess an individual researcher’s contributions, or in hiring, promotion, or funding decisions.

DORA provides recommendations for institutions, funding agencies, publishers and organisations that supply metrics. An analysis of available data on individual DORA signers as of June 24, 2013 showed that 10,963 individuals and 484 institutions had signed, and that 6% were in the humanities and 94% in scientific disciplines; this in itself reflects an important disparity across fields that should be taken into account.

The ‘gaming’ of any kind of metric is possible by definition. It is critical that previous efforts in developing good practice in the measurement and assessment of research are adopted or at least taken into account. DORA makes it explicit that the gaming of metrics will not be tolerated and altmetrics service providers are openly working towards good practice and transparent methodologies (Adie 2013).

Social media adoption by scholars for scholarly dissemination is an important aspect of academic communications. It is wide, varies across disciplines and is still fairly recent (Priem 2011; Adie and Roe 2013; Sud and Thelwall 2013). Correlations between online mentions, downloads and traditional citations are therefore expected to be low, since the citation window is still too small. Previous research, however, demonstrates that there are positive yet still low correlations between downloads and citation counts.
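For what it is worth, the correlation tests reported in this literature are typically rank correlations. The sketch below shows the calculation only, using placeholder numbers rather than real data, and assumes SciPy is available.

# Toy sketch of a Spearman rank correlation between citation counts and online mentions.
# The numbers below are placeholders for illustration only, not real data.
from scipy.stats import spearmanr

citations = [0, 2, 5, 1, 12, 7, 3, 0, 9, 4]    # placeholder citation counts
mentions  = [1, 0, 8, 2, 20, 5, 4, 1, 15, 3]   # placeholder online mention counts

rho, p_value = spearmanr(citations, mentions)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")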

Recent and ongoing research shows that Open Access publications can lead to greater numbers of downloads and social media mentions. Though research looking for possible correlations between Open Access and citation counts exists, the findings vary, the citation window is still too small, and more time and research will be needed to determine whether positive correlations exist as a general rule (Alperin 2014; Bernal 2013; Costas 2014; Kousha & Thelwall 2007).

It is likely that there will be positive correlations in some cases but not all, as the scholarly, institutional, technological, economic and social variables are multiple and platform- and culture-dependent. Likewise, current business models from so-called hybrid publishers that enable Open Access via Article Processing Charges are likely to privilege the dissemination of outputs by those with existing funding schemes to cover them. Similarly, many academic journals, particularly in the arts and humanities, have yet to establish a significant, sustainable online presence, and many still lack DOIs to enable their automated and transparent tracking. However, institutional repositories are already embracing altmetrics as a means of both tracking and encouraging engagement with their resources, and the ability to track and measure engagement with grey literature can be a good source of evidence of the role these outputs play in the research and publication life-cycle.

Moreover, some fields privilege the publication of multi-author outputs whilst others prefer single-author publications. This clearly puts both those without Open Access funding and single-author papers at a quantitative disadvantage. As stated above, it is crucial that research assessment employing metrics is based on qualitative analyses and takes differences in disciplinary cultures into account. Research assessment employing metrics should be conducted on a case-by-case basis, even if this is difficult, time-consuming and/or costly.

It is also critical that any assessment of article-level metrics understands how these metrics are possible in the first place and has an informed awareness of the disparities in social media adoption for scholarly purposes across different disciplinary boundaries in the Higher Education sector. Direct experience and ongoing research show that at the moment some STEM fields are over-represented online (on blogs, social media, and Open Access journals and monographs) while social sciences, arts and humanities outputs lag behind.

Traditional citation metrics unfairly benefit those publishing in standard channels, particularly those in the Global North, leaving scholars in developing countries at a disadvantage (Alperin 2013; Priego 2014). Alternative metrics more accurately measure the wider reach of scholarly outputs, and might better serve most scholars by fostering a research culture that supports national and international research impact objectives.

Even though there is still a bias towards North American and European publications, altmetrics can provide advantages to scholars interested in promoting their research online internationally by addressing public needs and enabling easier discovery of and access to research outputs long under-represented in the traditional literature and databases (Alperin 2014). Moreover, the geolocation data obtainable through altmetrics services offers evidence of both the disparities in and the international reach of the production and consumption of research online.

In the international context, some recent and ongoing research suggests that Open Access publications tracked via article-level metrics have a wider international reach and impact; there is a growing body of evidence that this is the case in both Latin America and some regions of Africa (see the OpenUCT/Scholarly Communication in Africa Programme (SCAP) reports, as well as Alperin 2014; Priego 2013, 2014; Neylon, Willmers & King 2014).

The success of automated methods for obtaining quantitative indicators of the reach, reception and use of scholarly outputs depends on our ability as scholarly communities to realise and develop the potential of the Web for scholarly communications. Developers, adopters and advocates of article-level metrics do not claim that quantitative indicators should be taken at face value. Online publishing offers the unique opportunity to track, measure and evaluate what happens to scholarly outputs once they have been published on the Web. It allows us to make comparisons between dissemination and access models across countries and disciplinary boundaries. More importantly, the data these methods provide is not static, passive quantitative data but ‘interactive’: the services work as platforms for social interactions between researchers (potentially worldwide, where conditions allow), enabling the easier and faster discovery, collection, exchange and discussion of outputs.

Not embracing article-level metrics or alternative metrics/altmetrics in research assessment when the 21st century is well underway would be a missed opportunity to push towards a scholarly culture of wider public engagement and the adoption of innovative online platforms for scholarly dissemination.

Adopting purely quantitative methods, and even more so suggesting that any metric, however large, can equate to “excellence”, would be misguided and potentially catastrophic, particularly for those not in STEM areas or without the backing of elite institutions. Only the careful, professional, qualitative assessment of live, transparent publishing data will be able to provide evidence of the public and scholarly, local and international reach and reception of excellent research.

References

Adie, E., & Roe, W. (2013). Enriching scholarly content with article-level discussion and metrics. Learned Publishing, 26(1), 11–17. doi:10.6084/m9.figshare.105851

Adie, E. (2013). Gaming Altmetrics. Altmetric. September 18 2013. Available from http://www.altmetric.com/blog/gaming-altmetrics/

Alperin, J. P. (2013). Ask not what altmetrics can do for you, but what altmetrics can do for developing countries. Bulletin of the American Society for Information Science and Technology, 39(4), 18–21. doi:10.1002/bult.2013.1720390407

Alperin, Juan Pablo (2014): Exploring altmetrics in an emerging country context. figshare.
http://dx.doi.org/10.6084/m9.figshare.1041797

Bernal, I. (2013). Open Access and the Changing Landscape of Research Impact Indicators: New Roles for Repositories. Publications, 1(2), 56–77. Retrieved from http://www.mdpi.com/2304-6775/1/2/56

Costas, R., Zahedi, Z., & Wouters, P. (2014). Do “altmetrics” correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective (p. 30). Leiden. Retrieved from http://www.cwts.nl/pdf/CWTS-WP-2014-001.pdf

Konkiel, S. (2013, November 5). Altmetrics in Institutional Repositories. Retrieved from https://scholarworks.iu.edu/dspace/handle/2022/17122

Kousha, K., & Thelwall, M. (2007). The Web impact of open access social science research. Library & Information Science Research, 29(4), 495–507. Retrieved from http://www.sciencedirect.com/science/article/B6W5R-4PX16VS-1/2/6c778fe766bc07c98ef39dbdd8f2b450

Liu, J., & Adie, E. (2013, May 30). Altmetric: Getting Started with Article-Level Metrics. figshare. http://figshare.com/articles/Altmetric_Getting_Started_with_Article_Level_Metrics/709018

Mohammadi, E., & Thelwall, M. (2014). Mendeley readership altmetrics for the social sciences and humanities: Research evaluation and knowledge flows. Journal of the American Society for Information Science and Technology. Retrieved from http://www.scit.wlv.ac.uk/~cm1993/papers/EhsanMendeleyAltmetrics.pdf

Neylon, C. (2011). Re-use as Impact: How re-assessing what we mean by “impact” can support improving the return on public investment, develop open research practice, and widen engagement. Altmetrics. Retrieved from http://altmetrics.org/workshop2011/neylon-v0/

Neylon, C. (2010). Beyond the Impact Factor: Building a community for more diverse measurement of research. Science in the Open. Retrieved November 29, 2010, from http://cameronneylon.net/blog/beyond-the-impact-factor-building-a-community-for-more-diverse-measurement-of-research/

Neylon C, Willmers M and King T (2014). Rethinking Impact: Applying Altmetrics to Southern African Research. Working Paper 1, Scholarly Communication in Africa Programme. http://openuct.uct.ac.za/sites/default/files/media/SCAP_Paper_1_Neylon_et_al_Rethinking_Impact.pdf

OpenUCT Initiative Publications and SCAP reports. Available from http://openuct.uct.ac.za/publications

Priego, E. (2012). ‘Altmetrics’: quality of engagement matters as much as retweets. Guardian Higher Education Network, Friday 24 August 2012. Retrieved from http://www.theguardian.com/higher-education-network/blog/2012/aug/24/measuring-research-impact-altmetic

Priego, E. (2013). Fieldwork: Apples and Oranges? Online Mentions of Papers About the Humanities. Altmetric, January 11 2013. Retrieved from http://www.altmetric.com/blog/apples-oranges-online-mentions-papers-about-humanities/

Priego, E. (2013). Alt-metrics, Digital Opportunity and Africa. Impact of Social Sciences, London School of Economics. February 6 2013. Retrieved from http://blogs.lse.ac.uk/impactofsocialsciences/2013/02/06/alt-metrics-digital-opportunity-and-africa/

Priego, E. (2014). The Triple A: Africa, Access, Altmetrics. 22 February 2014. Retrieved from https://epriego.wordpress.com/2014/02/22/the-triple-a-africa-access-altmetrics/

Priem, J., Hall, M., Hill, C., Piwowar, H., & Waagmeester, A. (2011). Uncovering impacts: CitedIn and total-impact, two new tools for gathering altmetrics. iConference 2012, 9–11. Retrieved from http://jasonpriem.org/self-archived/two-altmetrics-tools.pdf

Priem, J., Piwowar, H. A., & Hemminger, B. H. (n.d.). Altmetrics in the wild: An exploratory study of impact metrics based on social media. Metrics 2011: Symposium on Informetric and Scientometric Research. New Orleans, LA, USA. Retrieved from http://jasonpriem.org/self-archived/PLoS-altmetrics-sigmetrics11-abstract.pdf

Sud, P., & Thelwall, M. (2013). Evaluating altmetrics. Scientometrics. Retrieved from http://link.springer.com/10.1007/s11192-013-1117-2