Questions of Access in the Digital Humanities: Data from JDSH

[On 8 August 2017, this post was selected as Editor’s Choice in Digital Humanities Now at]

[N.B. As usual, typos might still be present when you read this; this blog post is likely to be revised post-publication… thanks for understanding. This blog is a sandbox of sorts].

Para Domenico, siempre en deuda

tl;dr, scroll down to the charts

I used the Altmetric Explorer to locate articles from the Journal of Digital Scholarship in the Humanities that had received any ‘mentions’ online at any time. This produced an original dataset of 82 bibliographic entries. With the help of Joe McArthur, the Open Access Button API was then employed to detect whether any of the journal articles in the dataset had open access surrogates (for example, self-archived versions in institutional repositories) and, if so, what content those surrogates actually provided access to. The API located 24 URLs for the 82 DOIs corresponding to the articles in the dataset.

I then edited and refined the original dataset to include only the top 60 results. Each result was manually cross-checked to verify that the located links matched the correct outputs, to establish what kind of content they provided access to, and to identify the type of license and type of access of each article’s version of record.

A breakdown of the findings below:

Visualisation of numeralia from the JDSH 60 Articles Altmetric-OA Button Dataset

(Note that the OA Button result numbers will not add up, as there are overlaps and some results belong to categories not listed.)

It must be highlighted that only one of the links located via the Open Access Button API provided access to an article’s full version.

This disciplinarily circumscribed example from a leading journal in the field of the digital humanities provides evidence for further investigation into the effects of publishers’ embargoes on the ability of institutional open access repositories to fulfil their mission effectively.

The dataset was openly shared on figshare as:

Priego, Ernesto (2017): A Dataset Listing the Top 60 Articles Published in the Journal of Digital Scholarship in the Humanities According to the Altmetric Explorer (search from 11 April 2017), Annotated with Corresponding License and Access Type and Results, when Available, from the Open Access Button API (search from 15 May 2017). figshare.


The Wordy Thing

Back in 2014, we suggested that “altmetrics services like the Altmetric Explorer can be an efficient method to obtain bibliographic datasets and track scholarly outputs being mentioned online in the sources curated by these services” (Priego et al. 2014). On that occasion we used the Explorer to analyse a report obtained by searching for the term ‘digital humanities’ in the titles of outputs mentioned at any time up to the date of our query.

It has been three years since I personally presented that poster at DH2014 in Lausanne, but the topic of publishing practices within the digital humanities remains of great interest to me. This business of deciding to look into bibliometric indicators and the metadata of scholarly publications could be thought of as extreme academic navel-gazing. For the digital humanities, however, questions of scholarly communications are questions of methodology, as the technologies and practices required for conducting research and teaching are closely related to those required to make the ‘results’ of teaching and research available. For DH insiders, this is closely connected to the good ol’ less-yacking-more-hacking, or rather, no yacking without hacking. Today, scholarly publishing is all about technological infrastructure, or at least about an ever-growing awareness of the challenges and opportunities of ‘hacking’ the modes of scholarly production.

Moreover, the digital humanities have long been preoccupied with the challenges of getting digital scholarship recognised and rewarded and, just as importantly, with the difficulties of ensuring the human, technical and financial preconditions of sustainability. Scholarly publishing, or more precisely ‘scholarly communications’ as we prefer to say today, is very much focused on those same concerns. If form and content are unavoidably interlinked and codependent in digital humanities practice, surely issues regarding the so-called ‘dissemination’ of said practice through publications remain vital to its development.

Anyway, I have now finally been able to share a dataset based on a report from the Altmetric Explorer looking into the articles published in the Journal of Digital Scholarship in the Humanities (henceforth JDSH), one of the leading journals (if not the leading journal) in the field of digital humanities (it was previously titled Literary and Linguistic Computing). I first started looking into which JDSH articles were being tracked by Altmetric as mentioned online for the event organised by Domenico Fiormonte at the University Roma Tre in April this year (the slides from my participation are here).

My motivation was not only to identify which JDSH outputs (and therefore authors, affiliations, topics and methodologies) were receiving online attention according to Altmetric. I also wanted, as we had done in 2014, to use the initial report to look into what kind of licensing those articles had: whether they were ‘free to read’, paywalled, or labelled with the orange open lock that identifies Open Access outputs.

Back in 2014 we did not have the Open Access Button, nor its plugin and API. This time I could use the Button to check whether any of the articles in my dataset had openly or freely available versions. I contacted Joe McArthur from the Button to enquire whether it would be possible to run a list of DOIs through their API in bulk. It was, and we obtained some results.

Here are a couple of very quick charts visualising some insights from the data.

It should also be highlighted that of the 6 links to institutional repository deposits found via the Open Access Button API, only one gave open access to the full version of the article. The rest were either metadata-only deposits or the full versions were embargoed.

As indicated above, the 60 ‘total articles’ refers to the number of entries in the dataset we are sharing. There are many more articles published in JDSH. The numbers presented represent only the data in question, which is in turn the result of particular methods of collection and analysis.

In 2014 we detected that “the 3 most-mentioned outputs in the dataset were available without a paywall”, and we thought that could indicate “the potential of Open Access for greater public impact.” In this dataset, the three articles with the most mentions are also available without a paywall. The most mentioned article is the only one in the set that is licensed with a CC-BY license. The two that follow are ‘free’ articles that require permission for reuse.

The data presented is the result of the specific methods employed to obtain the data. In this sense this data represents as much a testing of the technologies employed as of the actual articles’ licensing and open availability. This means that data in columns L-P reflect the data available through the Open Access Button API at the moment of collection. It is perfectly possible that ‘open surrogates’ of the articles listed are available elsewhere through other methods. Likewise, it is perfectly possible that a different corpus of JDSH articles collected through other methods (for example, of articles without any mentions as tracked by Altmetric) have a different proportion of license and access types etc.

As indicated above, the licensing and access type of each article were identified and added manually and individually. Article DOIs were accessed one by one in a browser outside university library networks, as the intention was to verify whether any of the articles were available to the general public without university library network or subscription credentials.

This blog post and the deposit of the data are part of a work in progress, shared openly to document ongoing work and to encourage further discussion and analysis. It is hoped that quantitative data on the limited level of adoption of Creative Commons licenses and institutional repositories within a clearly circumscribed corpus can motivate reflection and debate.


I am indebted to Joe McArthur for his kind and essential help cross-checking the original dataset with the OA Button API, and to Euan Adie and all the Altmetric team for enabling me to use the Altmetric Explorer to conduct research at no cost.

Previous Work Mentioned

Priego, Ernesto; Havemann, Leo; Atenas, Javiera (2014): Online Attention to Digital Humanities Publications (#DH2014 poster). figshare. Retrieved: 18:46, Aug 04, 2017 (GMT).

Priego, Ernesto; Havemann, Leo; Atenas, Javiera (2014): Source Dataset for Online Attention to Digital Humanities Publications (#DH2014 poster). figshare. Retrieved: 17:52, Aug 04, 2017 (GMT).

Priego, Ernesto (2017): Aprire l’Informatica umanistica / Abriendo las humanidades digitales / Opening the Digital Humanities. figshare. Retrieved: 18:00, Aug 04, 2017 (GMT).

The Triple A: Africa, Access, Altmetrics

Time flies and I can’t contain my excitement that I will be participating in the Discoverability of African Scholarship Online: Practical Strategies and Collaborative Approaches workshop in Nairobi, Kenya, organised by the Open UCT Initiative. For me there are three very important A’s in scholarly communications: Africa, Access and Altmetrics.

I have been doing some digging, refining and visualising this week, and today I shared first rough drafts of two alluvial charts visualising a dataset of the 25 highest-scoring peer-reviewed articles with the term “Africa” in the title (within a timeframe of one year). To collect the article data I used the Altmetric Explorer. The data corresponds to a report I exported on 19 February 2014.

The Altmetric score is a quantitative measure of the quality and quantity of attention that a scholarly article has received. It takes into account three main factors:

  1. Volume. The score for an article rises as more people mention it. The Explorer only counts one mention from each person per source, so if someone tweets about the same paper more than once, Altmetric will ignore everything but the first mention.
  2. Sources. Each category of mention contributes a different base amount to the final score. For instance, a newspaper article contributes more than a blog post, which contributes more than a tweet. Altmetric looks at how often the author of each mention talks about scholarly articles, whether there is any bias towards a particular journal or publisher, and who their audience is.
  3. Authors. Who mentions an article matters: for example, a scholar sharing a link with other scholars counts for far more than a journal account pushing the same link out automatically.
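The interaction of the first two factors can be illustrated with a toy scoring function. The base weights below are purely hypothetical (Altmetric’s actual weighting is not public); the sketch only shows how per-person deduplication and per-source weights combine:

```python
# Hypothetical base weights per source category; Altmetric's
# real weights are not public and will differ.
BASE_WEIGHT = {"news": 8.0, "blog": 5.0, "tweet": 1.0}

def illustrative_score(mentions):
    """mentions: iterable of (person_id, source) pairs.
    Counts at most one mention per person per source (factor 1),
    then sums a base weight for each source category (factor 2).
    Factor 3 (who the author is) would further scale these weights."""
    seen = set()
    score = 0.0
    for person, source in mentions:
        if (person, source) in seen:
            continue  # repeat mentions by the same person/source are ignored
        seen.add((person, source))
        score += BASE_WEIGHT.get(source, 0.0)
    return score
```

With these made-up weights, two tweets by the same person plus one blog mention would score 1.0 + 5.0, the duplicate tweet being discarded.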

The focus of my study, however, is not necessarily the Altmetric score itself. One of my goals is to try to discover patterns or correlations between journal title, country of affiliation of the Principal Investigator, access type of the article, and the attention the article in question gets online. Logically, the dataset I obtained and refined, and its visualisations, are not representative of all scholarly outputs with “Africa” in the title, but only of the data Altmetric is able to track in the first place.

The original dataset contained 2,826 articles. I refined this set using OpenRefine to remove duplicates, text encoding errors and irrelevant entries (for example, articles not about Africa but by authors whose first name is Africa, or academic news items that are not peer-reviewed). I then manually edited a CSV file of the top 25 peer-reviewed articles, and created another containing only the categories I wanted to visualise, adding further columns such as PI country and Access Type.
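The deduplicate-then-rank step can be sketched in a few lines of Python. The column names (`DOI`, `Score`) are placeholders for illustration, not the Explorer’s actual export headers:

```python
def top_n_by_score(rows, n=25, score_field="Score", id_field="DOI"):
    """Keep the first occurrence of each DOI, then return the n
    highest-scoring rows. rows: a list of dicts, e.g. produced by
    csv.DictReader over an Altmetric Explorer CSV export."""
    seen, unique = set(), []
    for row in rows:
        if row[id_field] in seen:
            continue  # drop duplicate entries for the same DOI
        seen.add(row[id_field])
        unique.append(row)
    return sorted(unique, key=lambda r: float(r[score_field]), reverse=True)[:n]
```

This only automates the mechanical part; the filtering of irrelevant entries described above still required reading each record by hand.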

I used Raw to make the diagrams. Alluvial diagrams can be helpful for visualising flows and revealing correlations between categories, as the width of each band is visually linked to the number of elements sharing the same categories. I wanted to see if this kind of diagram could provide a quick and clear insight into any possible correlations between access type and a higher number of online mentions. I manually looked at all 25 articles to check the access type and country of affiliation of the Principal Investigators.

Though painstakingly time-consuming, doing this by hand led to some interesting discoveries (for example, many articles about Africa are co-authored by PIs based outside Africa with collaborators from African institutions, with an overwhelming South African majority). Another insight present in the data but not visualised in these two charts is the dominance of articles focused on South Africa only.

I will share the original dataset file later on, as I still want to make sure the file is presentable enough to share publicly. In the meantime I have deposited both diagrams as figures on figshare, and posted them here for your perusal. I will keep working on these diagrams, as they need to be edited to add different colours, etc., and I still need to write up a proper qualitative narrative of what we make of the data.

Priego, Ernesto (2014): Alluvial Diagram: 25 Highest Scoring Academic Articles with “Africa” in the Title, including Access Type. figshare.
Priego, Ernesto (2014): Alluvial Diagram: Country of Affiliation of Principal Investigator/Author of the 25 Highest Scoring Academic Articles with “Africa” in the Title, including Journal and Access Type. figshare.

A quick insight from both diagrams is that open access articles receive, according to Altmetric, more mentions online, on blogs, in the media and on online social networks. African Principal Investigators, however, are a minority in this top 25 set, with only South African researchers representing the whole continent.

There is only one article that is not within the STEM disciplinary boundaries proper (on mobile phone coverage and its relationship to political violence, published in the American Political Science Review). This might also be a reflection of the sources Altmetric tracks, where the social sciences, arts and humanities are a minority.

It is also noticeable that there are two different articles in the Journal of Infectious Diseases on antiretroviral therapy, with very similar titles, one open access and the other paywalled. The former has a slightly higher Altmetric score. I could not find out whether the authors of the paywalled article were the same, as the paywall did not link to author information.

It also appears that, at least for articles with the term “Africa” in the title from the journals Altmetric tracks, UK authors are divided in their adoption of open access.

24 February 2014. Correction: I had accidentally added the same caption to both diagrams; I have corrected this so the second diagram has the correct caption and doi.

A follow-up was published here.