People, Government: Top 300 Terms in the Conservative and Labour Manifestos 2017 (Counts and Trends)

A word cloud of the most frequent 500 terms in the Conservative Manifesto 2017. Word cloud created with Voyant Tools.

The Labour and Conservative Manifestos 2017 are arguably two of the most important public documents in the UK these days. I have just deposited the following data on figshare:

Priego, Ernesto (2017): Top 300 Terms in the Conservative and Labour Manifestos 2017 (Counts and Trends). figshare. https://doi.org/10.6084/m9.figshare.5016983.v1

I thought some of you might be interested in practising some distant reading, or in having some fun composing your own Manifesto…

The 2016 Altmetric Top 100 Outputs with ‘Comics’ as Keyword


Frequent readers of this blog will be aware that I am interested in article-level metrics, and particularly in the work done by Altmetric. Last week they published their annual Top 100 list. I wrote this post about it.

The Altmetric Explorer is a tool for measuring the attention that scholarly articles receive online. Its intuitive user interface works as a live searchable database that allows users to browse the journals and repositories Altmetric tracks and to obtain detailed reports.

On a weekly basis Altmetric captures hundreds of thousands of tweets, blog posts, news stories, Facebook walls and other content that mentions scholarly articles on the Web. The Explorer can browse, search and filter this data. Users can export the data as ‘reports’, in simple text or spreadsheet form, which can then be analysed in different ways. For example, the Explorer provides demographic data on the Twitter users found mentioning specific outputs, and thus works as a mechanism for the study of academic users of social media.

Over the past few years I have often suggested, online and in talks, workshops and lectures, that the Altmetric Explorer can be useful to researchers as well. Librarians with access to the tool can help students and researchers get new views of recent articles that are receiving attention online. People often focus on ‘altmetrics’ as indicators of online activity around published outputs, but I often insist that the Altmetric Explorer is also useful as a tool for searching, discovering, collecting, creating, archiving, sharing and analysing bibliographic reference collections as datasets. These datasets include not just bibliographic data (with identifiers and/or URLs) but also historical data on any metrics the service had tracked and quantified at the time of the data query or collection.

Inspired by Altmetric’s annual Top 100 list, I used the Altmetric Explorer to search for the top articles with the keyword ‘comics’ mentioned in the past 1 year. I did this particular search on the morning of Tuesday 20 December 2016. Dating the collection (and indicating the specific query) is always important, as social media metrics are hopefully dynamic and not static (i.e. we expect an output’s altmetrics to change over time).

After my query I saved my search, as usual, as a ‘workspace’ on the app and then exported the dataset as a CSV file. I then manually cleaned and refined the data to obtain a file listing the top 100 references specifically on comics, including their altmetrics. Data refining was needed to ensure the list included only articles about comics, eliminating any non-relevant outputs (i.e. those that were not about comics), and to correct text rendering errors, add missing data (like output titles when missing from the initial export) and limit the set to only 100 items by deleting the extra outputs.*
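The refining described above was done by hand. As a rough illustration only, the mechanical part of it (skipping outputs already judged non-relevant, flagging missing titles and trimming the list to 100 items) might be sketched in Python as follows. The column names (`id`, `title`, `score`) and the `excluded_ids` set are hypothetical and are not taken from the actual Altmetric export; judging relevance remains a manual step.

```python
import csv
import io

def refine_top_outputs(csv_text, excluded_ids, limit=100):
    """Return the top `limit` rows by score, skipping manually excluded ids.

    Assumes hypothetical columns named 'id', 'title' and 'score'.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Drop outputs that a human reader has already judged non-relevant.
    kept = [row for row in rows if row["id"] not in excluded_ids]
    # Flag rows with a missing title so they can be completed by hand.
    for row in kept:
        if not row["title"].strip():
            row["title"] = "[TITLE MISSING]"
    # Highest attention score first, then trim to the requested size.
    kept.sort(key=lambda row: float(row["score"]), reverse=True)
    return kept[:limit]

# Tiny made-up sample, not real Altmetric data.
sample = "id,title,score\n1,Comics and medicine,12\n2,Unrelated output,40\n3,,25\n"
top = refine_top_outputs(sample, excluded_ids={"2"}, limit=100)
```

Keeping the exclusions as an explicit set of ids means the manual relevance decisions are recorded and can be shared alongside the refined file.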

I have deposited and shared the dataset as

Priego, Ernesto (2016): The 2016 Altmetric Top 100 Outputs with ‘Comics’ as Keyword Mentioned in the Past 1 Year. figshare. https://dx.doi.org/10.6084/m9.figshare.4483631.v4 Retrieved: 17:06, Dec 21, 2016 (GMT)

Hopefully it will be of interest to some of you out there. For comparison, here are other datasets I have deposited on figshare in previous years:

Priego, Ernesto (2015): Almetrics of articles from the comics journals mentioned at least once in the past 1 year as tracked by Altmetric (20 August 2015). figshare. https://dx.doi.org/10.6084/m9.figshare.1514985.v3 Retrieved: 17:21, Dec 21, 2016 (GMT)

and

Priego, Ernesto (2014): Comics Journals Articles Tracked by Altmetric in the last year (Dec 2013-Dec 2014). figshare. https://dx.doi.org/10.6084/m9.figshare.1273850.v4 Retrieved: 17:23, Dec 21, 2016 (GMT)

 

Though the two datasets above are outputs from different search queries (focusing on specific comics journals tracked by Altmetric rather than on any articles with the keyword ‘comics’), we should be able to continue collecting data for future transversal studies.

Having yearly datasets obtained from the same queries, over a series of years, would provide evidence of comics scholarship’s presence online, and of the field’s (and Altmetric’s) evolving practices.

*It is possible that the degree of relevance varies. Some outputs do not have ‘comics’ in their title but do discuss comics, for example ‘A randomized study of multimedia informational aids for research on medical practices: Implications for informed consent’ (Kraft et al 2016). It is possible, however, that a non-comics article or two remained; if you spot one, please let me know or leave a comment on the figshare output and I will correct it and create a new version. It should also be noted that several of the outputs included are from The Conversation, which is not an academic journal but is tracked by Altmetric because it focuses on academic research news written by academics. For information and context about how Altmetric sources the data please read this.

Sheffield Digital Humanities Congress 2016: #dhcshef 100 Most Frequent Terms

A view of the #dhcshef 2016 dataset created with Martin Hawksey’s TAGS Explorer

The Sheffield Digital Humanities Congress 2016 was held from the 8th to the 10th of September 2016 at the University of Sheffield. The full conference programme is available here: http://www.hrionline.ac.uk/dhc.

The event’s official hashtag was the same as in previous editions, #dhcshef.

I made a collection of public Tweets tagged with #dhcshef and published between Monday September 05 2016 at 17:54:58 +0000 and Saturday September 10 2016 at 23:37:06 +0000. This time I used Tweepy 3.5.0, a Python wrapper for the Twitter API, for the collection. To compare results I also used, as usual, Martin Hawksey’s TAGS, and the results were similar (I only collected Tweets from accounts with at least 1 follower).
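The collection criteria above (a time window and a minimum of 1 follower) can be expressed as a small filter. This is a minimal sketch and not the script I actually used: it assumes each Tweet is available as a dictionary shaped like a Twitter API v1.1 status, with a `created_at` string and a nested `user` entry carrying `followers_count`. The function name `in_collection` and the sample Tweets are made up for illustration.

```python
from datetime import datetime, timezone

# Twitter API v1.1 timestamp layout, e.g. "Mon Sep 05 17:54:58 +0000 2016"
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"

# Collection window as stated in the post (both in UTC).
START = datetime(2016, 9, 5, 17, 54, 58, tzinfo=timezone.utc)
END = datetime(2016, 9, 10, 23, 37, 6, tzinfo=timezone.utc)

def in_collection(tweet, start=START, end=END, min_followers=1):
    """True if the tweet falls inside the window and its author has followers."""
    created = datetime.strptime(tweet["created_at"], TWITTER_TIME)
    followers = tweet["user"]["followers_count"]
    return start <= created <= end and followers >= min_followers

# Made-up examples, not real data from the archive.
tweets = [
    {"created_at": "Thu Sep 08 09:00:00 +0000 2016", "user": {"followers_count": 120}},
    {"created_at": "Thu Sep 08 09:05:00 +0000 2016", "user": {"followers_count": 0}},
    {"created_at": "Sun Sep 11 08:00:00 +0000 2016", "user": {"followers_count": 50}},
]
kept = [t for t in tweets if in_collection(t)]
```

Only the first sample Tweet passes: the second comes from an account with no followers and the third falls outside the window.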

As on previous occasions I extracted the text and usernames from this dataset and used Voyant Tools for a basic text analysis. The dataset contained 1479 Tweets posted by 256 different accounts; 841 of those Tweets were RTs. The text of the Tweets composed a corpus with 26,094 total words and 3,057 unique word forms.
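For anyone wanting to produce the same kind of summary figures from their own archive, here is a minimal sketch. It assumes each Tweet is a dictionary with hypothetical `user` and `text` keys, treats any text starting with `RT @` as a retweet, and uses a naive tokeniser, so its word counts will not match Voyant Tools exactly.

```python
import re

def summarise(tweets):
    """Basic counts for a list of tweets with hypothetical 'user' and 'text' keys."""
    # Usernames are compared case-insensitively.
    accounts = {t["user"].lower() for t in tweets}
    # Naive retweet detection: old-style RTs start with "RT @".
    retweets = sum(1 for t in tweets if t["text"].startswith("RT @"))
    # Naive tokenisation: runs of word characters, lowercased.
    words = [w for t in tweets for w in re.findall(r"\w+", t["text"].lower())]
    return {
        "tweets": len(tweets),
        "accounts": len(accounts),
        "retweets": retweets,
        "total_words": len(words),
        "unique_words": len(set(words)),
    }

# Made-up sample, not taken from the #dhcshef dataset.
tweets = [
    {"user": "alice", "text": "Great keynote on open data"},
    {"user": "bob", "text": "RT @alice: Great keynote on open data"},
    {"user": "Alice", "text": "Looking forward to tomorrow"},
]
stats = summarise(tweets)
```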

I used Voyant’s Terms tool to get the most frequent terms, applying an edited English stop words list that included Twitter and congress-specific terms (this means that words expected to be frequent, like ‘digital’, ‘humanities’, ‘congress’ and ‘sheffield’, as well as usernames, project names and people’s names, were filtered out). I exported a list of the 500 most frequent terms and then manually refined the data so that any remaining names of people or projects were removed. (The matching is not case sensitive, so I may have made mistakes, and further disambiguation and refining would be required.) If you are interested, I previously detailed a similar methodology here.
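The general idea of counting frequent terms after applying a stop words list can be sketched in a few lines of Python. This is not how Voyant Tools works internally; it is a simplified stand-in, and the stop words set and sample texts below are invented for illustration.

```python
import re
from collections import Counter

def top_terms(texts, stop_words, n=100):
    """Return the n most frequent terms across texts, ignoring stop words."""
    counts = Counter()
    for text in texts:
        # Remove URLs and @usernames before tokenising.
        cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        # Keep alphabetic tokens, allowing an internal apostrophe (e.g. don't).
        tokens = re.findall(r"[a-z]+(?:['’][a-z]+)?", cleaned)
        counts.update(tok for tok in tokens if tok not in stop_words)
    return counts.most_common(n)

# Hypothetical stop words: common English words plus event-specific terms.
stop_words = {"the", "a", "at", "of", "on", "rt", "digital", "humanities",
              "congress", "sheffield", "dhcshef"}
texts = [
    "Great project on open data at the Digital Humanities Congress #dhcshef",
    "RT @someone: Great keynote on data https://example.org/x",
    "A great data project",
]
top3 = dict(top_terms(texts, stop_words, n=3))
```

Because lowercasing happens before counting, ‘Great’ and ‘great’ are merged, which is the same loss of case information mentioned above.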

Here’s my resulting list of the 100 most frequent terms.

Term            Count
great           106
project         98
data            76
research        64
students        63
word            58
funding         55
work            55
just            53
spread          53
use             51
opportunity     48
text            47
historical      46
oa              46
looking         45
open            45
editions        44
pedagogy        40
academic        38
access          36
keynote         36
like            36
analysis        35
follow          35
using           34
book            33
new             33
projects        33
university      33
important       32
innovation      32
today           32
tomorrow        32
early           31
minimal         31
paper           31
south           31
content         30
excellent       30
love            30
social          30
look            29
talking         29
tools           29
discussing      28
global          28
grants          28
london          28
network         28
review          28
forward         27
libraries       27
resources       27
sudan           27
history         26
talk            26
books           25
online          25
programme       25
really          25
teach           25
teaching        25
digitisation    24
issues          24
tactical        24
archive         23
critique        23
make            23
different       22
need            22
peer            22
session         22
cultural        21
heritage        21
starts          21
studies         21
value           21
art             20
cool            20
don’t           20
good            20
live            20
press           20
start           20
arts            19
available       19
colleagues      19
delegates       19
going           19
metadata        19
presenting      19
day             18
digitised       18
let’s           18
networks        18
notes           18
person          18
started         18
begins          17

Please bear in mind that RTs count as Tweets, and therefore the repetition implicit in RTs directly affects the frequent term counts. Which terms made it into the top 100 reflects my own bias (I personally didn’t want to see how many times ‘digital’ or ‘humanities’ was repeated), but the individual counts for each term remain the same regardless.
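To make the effect of RTs concrete, here is a small sketch comparing the count of one term with and without retweets. The helper `term_count` and the sample texts are hypothetical, and retweets are detected naively by the `RT @` prefix.

```python
def term_count(texts, term, include_retweets=True):
    """Count whole-word occurrences of `term`, optionally skipping retweets."""
    total = 0
    for text in texts:
        if not include_retweets and text.startswith("RT @"):
            continue
        total += text.lower().split().count(term)
    return total

# One original tweet and two retweets of it (made-up example).
texts = [
    "great session on metadata",
    "RT @someone: great session on metadata",
    "RT @someone: great session on metadata",
]
with_rts = term_count(texts, "great")
without_rts = term_count(texts, "great", include_retweets=False)
```

In this toy example a single original Tweet retweeted twice triples the count for ‘great’, from 1 to 3.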

I appreciate that the stop words selection is indeed subjective (deictics like ‘tomorrow’ or ‘today’ may very well mean very little). It’s up to the reader to judge whether such a listing offers any insights at all. As Twitter moves relentlessly, and as such data remains a moving target, I’d like to believe that collecting and looking into frequent terms offers at least another point of view on, if not a gateway into, how a particular academic event is represented, discussed and reported on Twitter. Perhaps it’s my enjoyment of poetry that makes me think that seeing words out of context (or recontextualised) like this can offer some kind of food for thought or creativity.

Interestingly, the dataset showed user_lang metadata values other than en or en-GB: de, es, fr, it, nl and ru were also present, even if in the minority. The dataset also showed that some sources are clearly identified as bots.

I am fully aware this would be more interesting and useful if there were opportunities for others to replicate the text analysis through access to the source dataset I used. There are lots of interesting types of analysis that could be run and data to focus on in such a dataset as this. I am simply sharing this post right now as a quick indicative update after the event concluded.

 

 

A belated #Transitions4 Archive, and a post summarising some data about comics scholars on Twitter

 Comics Scholars on Twitter? Yeah, A Few…

A very long title to announce that I have finally published an archive of #transitions4 (2013), which I collected more than a year ago, and that I have published a post on The Comics Grid blog summarising some data from my archives of tweets from comics conferences this year. Links below.

A #transitions4 Archive. figshare.

http://dx.doi.org/10.6084/m9.figshare.1252098

“Comics Scholars on Twitter? Yeah, A Few…” The Comics Grid blog, 26 November 2014.

 

Second Day of Digital Humanities (Segundo Día de las Humanidades Digitales): An Archive of #díahd14

Visualisation of a #diahd14 archive made with TAGSExplorer

 

Here I reblog what I published on my blog for the second day of digital humanities in Spanish (Segundo día de las humanidades digitales).

I am late. Creating a blog takes time, and finding that time is increasingly difficult. Anyway, the morning after, I am stealing time from more urgent activities to leave this site and these lines.

As everyone on this site will know, the “Segundo día de las humanidades digitales” (second day of digital humanities) took place on 15 October 2014. Although there was some confusion about which hashtag to use, it seems #díahd14 was agreed upon (accents and capitalisation do not matter).

I have created and shared a document containing an archive of approximately 425 unique tweets tagged with #díahd14 between 10/10/2014 10:39 GMT and 16/10/2014 05:52 GMT.

It is on figshare, in open access:

Priego, Ernesto (2014): Un archivo de #díahd14. figshare.
http://dx.doi.org/10.6084/m9.figshare.1206315

Some important points:

  • Column E gives the publication times in Mexico City (D.F.) time.
  • An initial automatic data clean-up was carried out to avoid duplicate entries, but the archive may require further refining.
  • All RTs were included; they count as unique tweets.
  • A third sheet is also included with general quantitative data about the archive, and a fourth with a list of the 95 unique users who published tweets with the hashtag, along with other data about each user’s activity.
  • Bear in mind that the Twitter Search API tends to over-represent the most active users, and there is no guarantee that this archive contains each and every tweet tagged with #diahd14.
  • The data is shared as it was obtained using a Twitter Archiving Google Spreadsheet (TAGS; Martin Hawksey).
  • All the tweets included in this archive were published publicly on Twitter with the hashtag #diahd14. The content of each tweet is the responsibility of the user who published it.
  • The archive is shared for educational and academic research purposes under a Creative Commons Attribution licence.

Once the data is in Excel it is very easy to make a CSV of the archive and think about doing different analyses and visualisations with it. I am sharing it, like several of my other archives, hoping that someone will be interested in doing something with it.
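As one example of what could be done with such a CSV, the following sketch counts tweets per user. It assumes the file has a `from_user` column, as TAGS exports usually do, but the column names should be checked against the actual file; the sample rows are invented.

```python
import csv
import io
from collections import Counter

def tweets_per_user(csv_file):
    """Count rows per username in a CSV with an assumed 'from_user' column."""
    reader = csv.DictReader(csv_file)
    # Usernames are lowercased so that case variants are merged.
    return Counter(row["from_user"].lower() for row in reader)

# In-memory stand-in for an exported CSV file (made-up rows).
sample = io.StringIO(
    "id_str,from_user,text\n"
    "1,usuario_a,Feliz #díahd14\n"
    "2,usuario_b,RT @usuario_a: Feliz #díahd14\n"
    "3,Usuario_A,Un archivo de #díahd14\n"
)
counts = tweets_per_user(sample)
```

With a real file, `open("archive.csv", newline="", encoding="utf-8")` (a hypothetical filename) could be passed in place of the in-memory sample.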

You may also be interested in taking a look at

Priego, Ernesto (2014): #2EHD Archivo de Tweets del 2o Encuentro de Humanidades Digitales, México DF 19-23 de Mayo 2014. figshare. http://dx.doi.org/10.6084/m9.figshare.1037351 Retrieved 12:13, Oct 16, 2014 (GMT)

In case anyone is interested, this is a post of mine from 28 May this year on why I consider it important to create archives of academic hashtags on Twitter.

I hope everyone had a happy and not too busy day of digital humanities.

The Spanish text of this post was originally published at http://diahd2014.filos.unam.mx/ernestopriego/2014/10/16/un-archivo-de-diahd14/