Visualise This: Getting Attention to Journal Subscription Expenditure


As part of ongoing research in collaboration with my colleague Domenico Fiormonte on academic publishing and ‘monopolies of knowledge’ (if curious, see this, and this, and this) we have been looking at the available data on the amount of money universities spend on journal subscriptions (both online and in print).

Perhaps the most important scholarly contribution highlighting the level of expenditure in the UK has been by Stuart Lawson et al (2015). “Journal subscription costs – FOIs to UK universities” is an essential reference both for understanding the difficulty of accessing consistent data and for gaining more detailed insights into how much UK universities spend on paywalled/subscription journals.

In my teaching I constantly address issues of quantitative and qualitative methods. It’s possible that my background in critical theory, philosophy and information studies still defines many of my approaches to numerical data and their interpretation. When it comes to visualising data, I assume that the map is not the territory (Korzybski 1931), and that no source dataset and no data visualisation is ever neutral. No matter how ‘clearly’ a visualisation may appear to represent data, different processes of interpretation have been required to render the data in a particular shape and form. No matter how ‘correctly’ the data has been presented and how ‘appropriately’ the visualisation techniques have been applied, the result is never neutral or objective. I appreciate this is still controversial, as science and the scientific method appear to still be concerned with revealing truths.

As I like doing research openly on the Web, I may sometimes share bits of work in progress that perhaps no one else would otherwise see, as they are part of the process and not definitive results. I like sharing them to produce reactions and as a means to obtain feedback while I’m still working on something. (Also because I like sharing, period.) It’s also a way of documenting some of my own workflow publicly. I like experimenting, for example, with very quick, basic visualisations of data I have curated and cleaned a bit. Often these visualisations do not necessarily employ the ‘best’ or ‘correct’ ways of visualising the source data. Sometimes this is the result of trial and error (I like experimenting and I seek to learn from my mistakes), but sometimes I am aware I am just playing or seeking to use a ‘wrong’ method (on purpose!).

For instance, in this case I wanted to once again draw attention online (i.e. on Twitter) to the amount of money spent by UK institutions on journal subscriptions according to Lawson et al (2015). It’s been a year since they published the last version of the dataset, so I wanted to reuse it and link back to their output. I knew they had already created a very handy interactive visualisation tool so people could explore the data in different ways. I wanted to focus not on which institutions had spent how much, but on the total amount spent in 2014 on each of the ten main publishers. Of course this data could be visualised with a basic bar chart, and this already exists. My intention instead was to attract new attention, on Twitter, to Lawson et al’s data: to create a colourful image that would make people wonder ‘what on earth is this?!’ and hopefully follow the references (if they made the effort to click on the image, expand it, or save it to their desktop to see it more clearly).

Alluvial: Amount of money paid by 145 UK higher education institutions for journal subscriptions to ten major publishers in 2014

So.

I appreciate how annoying inadequate visualisation can be. I am aware an alluvial diagram such as the above is not the ‘right’ form to visualise this data. I saw this exercise as a very simple, very quick, consciously tongue-in-cheek strategy to use colour and flows to call attention to the names of the publishers and the flow of cash, in very boring alphabetical order, and, if your disposition/sense of humour/tolerance and browser/screen/device allowed it, to invite you to follow the paths to the amounts spent. I was of course aware this was a labyrinthine way of doing it. I hoped to make those watching stop for a second and try to figure things out. My intention in sharing a chart like this is not to say ‘this is the clearest and best way to communicate this data’, but simply to attract some attention once again to the fact that these publishers exist and that those amounts appear as spent on them in the source dataset.
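For what it’s worth, a flow diagram of this kind can be put together very quickly. The sketch below uses Python and plotly’s Sankey trace (an alluvial-style layout); it is not how the image above was produced, and the institutions, publishers and amounts in it are placeholders only, since the real figures are in Lawson et al’s dataset.

```python
# A quick, deliberately rough flow diagram of subscription spend,
# sketched with plotly's Sankey trace. All values are placeholders;
# the real figures are in Lawson et al (2015).
import plotly.graph_objects as go

institutions = ["Institution A", "Institution B"]   # hypothetical institutions
publishers = ["Elsevier", "Springer", "Wiley"]      # three of the ten publishers

labels = institutions + publishers
# Each link flows from an institution (source index) to a publisher (target index),
# weighted by the annual amount paid in GBP (placeholder numbers).
links = dict(
    source=[0, 0, 0, 1, 1, 1],
    target=[2, 3, 4, 2, 3, 4],
    value=[1_200_000, 400_000, 500_000, 900_000, 300_000, 450_000],
)

fig = go.Figure(go.Sankey(node=dict(label=labels, pad=15, thickness=12),
                          link=links))
fig.update_layout(title_text="Journal subscription spend by publisher (placeholder data)")
fig.show()
```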

I have read and often recommend Edward Tufte’s The Visual Display of Quantitative Information (1983), The Functional Art by Alberto Cairo (2013), Visualize This by Nathan Yau (2011), and of course Tufte’s Beautiful Evidence (2006). Cairo is right when he writes that “graphics, charts and maps are not just tools to be seen, but to be read and scrutinised” (2013:xx), and Tufte’s dictum “graphics reveal data” (1983:13) is the foundation of a widely accepted code of best practice for visualising data.

Most handbooks and textbooks will insist that choosing the ‘right type’ of chart or visualisation is essential. Strictly speaking, as professionals we can all agree this is correct, but we rarely interrogate what else could be done with different types of visualisations, or what the ‘wrong type’ of chart could help people make of the data. In my teaching, I often show ‘wrong types’ of visualisations as a means to explain why different types of data organisation and curation require, enable or reject different types of visualisation.

Moreover, thinking for longer about why a visualisation seems ‘wrong’ can help us think more carefully about our own assumptions and cultural paradigms. Hierarchical data, taxonomies and different methodologies of classification define and impose (and are themselves the result of) discursive practices that, to my constant amazement, can themselves offer the very keys to their own unlocking and revealing. In other words, ‘wrong’ visualisations can illuminate things both about data and about visualisation techniques, and in a way they always-already do, because if we can detect that a visualisation is ‘wrong’ it is because we have in some way paid attention to the data the visualisation is supposed to represent. In my book, ‘wrong’ visualisations are useful because they can make us pay attention and realise that visualisation is not transparent and should not be taken for granted as a method of interpretation. I suppose the best visualisations can almost make us forget they are constructs, not reality itself but mediated representations.

When thinking about monopolies of knowledge, I’d like to argue that we need to focus on the key commercial entities playing a leading role in scholarly publishing. Knowing how much our institutions spend on subscriptions to journals from these publishers is important, but this tells only part of the story. Perhaps my mindset when creating and sharing a quick exercise in visualisation like the above is that we need to focus on different aspects of the data, and that data is never ‘raw’, but displayed and organised in different ways according to what we may want to emphasise or achieve. Perhaps it’s just overthinking what to some will be downright bad practice. As online visibility was what I was aiming for, I guess one of those objectives has been met (impressions are in the thousands now). I know data visualisation is a serious discipline and there are experts and scholars achieving absolute excellence out there. I am willing to accept that the whole little exercise was a total failure. In any case, I hope it does not hurt anyone for me to play a little with visualisation tools in order to see what happens when we do things in a quick, basic, non-orthodox way, simply to call a little more attention to what I consider a pressing issue in scholarly communications.

References

Cairo, Alberto (2013): The Functional Art. An Introduction to Information Graphics and Visualization. New Riders.

Fiormonte, Domenico and Priego, Ernesto (2016): “Knowledge Monopolies and Global Academic Publishing”. The Winnower. https://thewinnower.com/papers/4965-knowledge-monopolies-and-global-academic-publishing

Korzybski, Alfred (1931): “A Non-Aristotelian System and its Necessity for Rigour in Mathematics and Physics”, a paper presented before the American Mathematical Society at the New Orleans, Louisiana, meeting of the American Association for the Advancement of Science, December 28, 1931. Reprinted in Science and Sanity, 1933, p. 747–61.

Lawson, Stuart; Meghreblian, Ben; Brook, Michelle (2015): Journal subscription costs – FOIs to UK universities. figshare. https://dx.doi.org/10.6084/m9.figshare.1186832.v23

Tufte, Edward (1983): The Visual Display of Quantitative Information. Graphics Press.

Tufte, Edward (2006): Beautiful Evidence. Graphics Press.

Yau, Nathan (2011): Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. Wiley.

 

‘BBCDebate’ on Twitter. A First Look into an Archive of #BBCDebate Tweets

[For the previous post in this series, click here].

The BBC Debate

The BBC’s “Great Debate” was broadcast live in the UK on Tuesday 21 June 2016 between 20:00 and 22:00 BST. It prompted a high volume of activity on Twitter under the #BBCDebate hashtag.

I collected some of the Tweets tagged with #BBCDebate using a Google Spreadsheet. (See the methodology section below). I have shared an anonymised dataset on figshare:

Priego, E. (2016) “The BBC’s Great Debate”: Anonymised Data from a #BBCDebate Archive. figshare. https://dx.doi.org/10.6084/m9.figshare.3457688.v1

[Note: figshare DOIs are not resolving or there are delays in resolving; it should be fixed soon…]

Archive Summary (#BBCDebate)

Number of links: 16,826
Number of RTs: 32,206 (estimate based on occurrence of “RT”)
Number of Tweets: 38,116
Unique Tweets: 38,066 (used to monitor quality of archive)
First Tweet in archive: 14/06/2016 22:03:18 BST
Last Tweet in archive: 22/06/2016 09:12:32 BST
In Reply Ids: 349
In Reply @s: 456
Tweet rate (tw/min): 62 Tweets/min (from last archive, 10 mins)
Unique users in archive: 20,243
Tweets from StrongerIn in archive: 16
Tweets from vote_leave in archive: 15
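Summary figures like the ones above can also be approximated from the exported spreadsheet with a few lines of Python. This is only a sketch: the file name is hypothetical, the text and id_str columns (standard TAGS columns not shown in the summary) are assumptions, and, as in the summary, the RT count is a rough estimate based on the occurrence of “RT”.

```python
# Sketch: recomputing archive summary figures from a TAGS export.
# Assumes an Excel export with the standard TAGS columns 'text' and 'id_str';
# the file name is hypothetical, and reading .xlsx requires openpyxl.
import pandas as pd

df = pd.read_excel("bbcdebate_tags_export.xlsx", dtype=str)

total_tweets = len(df)
unique_tweets = df["id_str"].nunique()                        # used to monitor archive quality
links = df["text"].str.contains("http", na=False).sum()
retweets = df["text"].str.startswith("RT ", na=False).sum()   # estimate based on occurrence of RT

print(f"Number of Tweets: {total_tweets}")
print(f"Unique Tweets:    {unique_tweets}")
print(f"Number of links:  {links}")
print(f"Number of RTs:    {retweets}")
```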

The raw data was downloaded as an Excel spreadsheet file containing 38,166 Tweets (38,066 Unique Tweets) publicly published with the queried hashtag (#BBCDebate) between 14/06/2016 22:03:18 and 22/06/2016 09:12:32 BST.

Due to the expected high volume of Tweets only users with at least 10 followers were included in the archive.

As indicated above, the BBC debate was broadcast live on UK national television on Tuesday 21 June 2016 between 20:00 and 22:00 BST, so the data collection period covered the real-time broadcast of the live debate (see the chart below).

#BBCDebate activity in the last three days. Key: blue: Tweet; red: Reply

The data collected indicated that only 12 Tweets in the whole archive contained geolocation data. A variety of user languages (user_lang) were identified.

Number of Different User Languages (user_lang)

Note this is not the language of the Tweets’ text, but the language setting in the application used to post the Tweet. In other words user_lang indicates the language the Twitter user selected from the drop-down list on their Twitter Settings page. This metadata is an indication of a user’s primary language but it might be misleading. For example, a user might select ‘es’ (Spanish) as their preferred language but compose their Tweets in English.

The following list ranks user_lang values by number of Tweets in the dataset, in descending order. Specific counts can be obtained from the shared dataset; a short sketch after the list shows how the ranking can be reproduced.

user_lang
en
en-gb
fr
de
nl
es
it
ja
ru
pt
ar
sv
pl
tr
da
ca
fi
id
ko
th
el
cs
no
en-IN
he
zh-cn
hi
uk
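As mentioned above, the ranking can be reproduced from the shared dataset with a couple of lines of Python. This is only a sketch: it assumes a CSV export of the anonymised dataset with a user_lang column, and the file name is hypothetical.

```python
# Sketch: ranking user_lang values by number of Tweets, in descending order.
# Assumes a CSV export of the anonymised dataset with a 'user_lang' column.
import pandas as pd

df = pd.read_csv("bbcdebate_anonymised.csv")   # hypothetical file name
ranking = df["user_lang"].value_counts()       # value_counts sorts by count, descending

print(ranking)                   # full ranking with counts per language code
print(ranking.index.tolist())    # just the ordered list of codes, as above
```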

If you are interested in user_lang, the Twitter API’s GET help/languages endpoint returns the list of languages supported by Twitter along with their language codes. At the time of writing a language code may be formatted as ISO 639-1 alpha-2 (en), ISO 639-3 alpha-3 (msa), or ISO 639-1 alpha-2 combined with an ISO 3166-1 alpha-2 localization (zh-tw).
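If you want to query that endpoint yourself, the sketch below shows one way to do so with Python’s requests library. Treat it as an assumption-laden outline: it presumes application-only (bearer token) authentication and the v1.1 URL pattern in use at the time of writing, and the response fields may differ; check Twitter’s developer documentation before relying on it.

```python
# Sketch: fetching the list of Twitter-supported languages and their codes.
# Assumes an application-only bearer token and the v1.1 endpoint URL pattern
# (https://api.twitter.com/1.1/help/languages.json) as documented at the time.
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder

resp = requests.get(
    "https://api.twitter.com/1.1/help/languages.json",
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    timeout=10,
)
resp.raise_for_status()

for lang in resp.json():
    # Each entry is expected to include (at least) a human-readable name
    # and a language code such as 'en', 'msa' or 'zh-tw'.
    print(lang.get("code"), "-", lang.get("name"))
```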

It is interesting to note the variety of European user_lang settings selected by those tweeting about #BBCDebate.

Notes on Methodology

The Tweets contained in the Archive sheet were collected using Martin Hawksey’s TAGS 6.0.

Given the relatively large volume of activity expected around #BBCDebate and the public and political nature of the hashtag, I have only shared indicative data. No full tweets nor any other associated metadata have been shared.

The dataset contains a metrics summary as well as a table with columns labelled created_at, time, geo_coordinates (anonymised: where geolocation data was present the cell reads YES; where no data was present the cell has been left blank), user_lang and user_followers_count for each Tweet.

Timestamps should suffice to prove the existence of the Tweets and could be useful for analysing activity on Twitter around a real-time media event.
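As an illustration of both steps (the geo_coordinates anonymisation described above and a per-minute activity count from the timestamps), here is a minimal pandas sketch. The column names follow the dataset description; the file name and the assumption that the export can be read as CSV are mine.

```python
# Sketch: anonymising geo_coordinates and computing Tweets per minute.
# Column names follow the dataset description above; the file name is hypothetical.
import pandas as pd

df = pd.read_csv("bbcdebate_tags_export.csv")

# Replace any geolocation data with 'YES'; leave cells blank where no data was present.
df["geo_coordinates"] = df["geo_coordinates"].notna().map({True: "YES", False: ""})

# Parse timestamps and count Tweets per minute (useful for charting activity
# around the live broadcast, 20:00-22:00 BST on 21 June 2016).
df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce", utc=True)
df = df.dropna(subset=["created_at"])
per_minute = df.set_index("created_at").resample("1min").size()

print(per_minute.sort_values(ascending=False).head())  # busiest minutes in the archive
```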

Text analysis of the raw dataset was performed using Stéfan Sinclair’s & Geoffrey Rockwell’s Voyant Tools. I may share results eventually if I find the time.

The collection and analysis of the dataset complies with Twitter’s Developer Rules of the Road.

Some basic deduplication and refining of the collected data was performed.

As with all the previous datasets I have created and shared, it must be taken into account that this is a sample dataset containing the Tweets published during the indicated period, not a large-scale collection of the whole output. The data is presented as is, as a research sample and as the result of an archival task. The sample’s significance is subject to interpretation.

Again as in all the previous cases please note that both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón, Sandra, et al. 2012). Google spreadsheet limits must also be taken into account. Therefore it cannot be guaranteed the dataset contains each and every Tweet actually published with the queried Twitter hashtag during the indicated period. [González-Bailón et al have done very interesting work regarding political discussions online and their work remains an inspiration].

Only data from public accounts was included and analysed. The data was obtained from the public Twitter Search API. The analysed data is also publicly available to all Twitter users via the Twitter Search API, and available to anyone with an Internet connection via the Twitter and Twitter Search web clients and mobile apps, without the need for a Twitter account.

Each Tweet and its contents were published openly on the Web; they were explicitly meant for public consumption and distribution and remain the responsibility of their original authors. Any copyright belongs to the original authors.

No personally identifiable information (PII) or sensitive personal information (SPI) was collected or is contained in the dataset.

I have shared the dataset including the extra tables as a sample and as an act of citizen scholarship in order to archive, document and encourage open educational and historical research and analysis. It is hoped that by sharing the data someone else might be able to run different analyses and ideally discover different or more significant insights.

For the previous post in this series, click here. If you got all the way here, thank you for reading.

References
[vote_leave]. (2016) [Twitter account]. Retrieved from https://twitter.com/vote_leave. [Accessed 21 June 2016].

González-Bailón, S., Banchs, R.E. and Kaltenbrunner, A. (2012) Emotions, Public Opinion and U.S. Presidential Approval Rates: A 5 Year Analysis of Online Political Discussions. Human Communication Research 38 (2) 121-143.

González-Bailón, S. et al (2012) Assessing the Bias in Communication Networks Sampled from Twitter (December 4, 2012). DOI: http://dx.doi.org/10.2139/ssrn.2185134

Hawksey, M. (2013) What the little birdy tells me: Twitter in education. Presentation given at the LSE NetworkED Seminar Series 2013 on the use of Twitter in education, 12 November 2013. Available from http://www.slideshare.net/mhawksey/what-the-little-birdy-tells-me-twitter-in-education [Accessed 21 June 2016].

Priego, E. (2016) “Vote Leave”. A Dataset of 1,100 Tweets by vote_leave with Archive Summary, Sources and Corpus Terms and Collocates Counts and Trends. figshare. https://dx.doi.org/10.6084/m9.figshare.3452834.v1

Priego, E. (2016) “Stronger In”. A Dataset of 1,005 Tweets by StrongerIn with Archive Summary, Sources and Corpus Terms and Collocates Counts and Trends. figshare.
https://dx.doi.org/10.6084/m9.figshare.3456617.v1

Priego, E. (2016) “Stronger In”: Looking Into a Sample Archive of 1,005 StrongerIn Tweets. 21 June 2016. Available from https://epriego.wordpress.com/2016/06/21/stronger-in-looking-into-a-sample-archive-of-1005-strongerin-tweets/. [Accessed 21 June 2016].

Priego, E. (2016) “The BBC’s Great Debate”: Anonymised Data from a #BBCDebate Archive. figshare. https://dx.doi.org/10.6084/m9.figshare.3457688.v1

A #HEFCEmetrics Twitter Archive (Friday 16 January 2015, Warwick)

HEFCE logo

The HEFCE metrics workshop, “Metrics and the assessment of research quality and impact in the arts and humanities”, took place on Friday 16 January 2015, 10:30 to 16:30 GMT, at the Scarman Conference Centre, University of Warwick, UK.

I have uploaded a dataset of 821 Tweets tagged with #HEFCEmetrics (case not sensitive):

Priego, Ernesto (2015): A #HEFCEmetrics Twitter Archive (Friday 16 January 2015, Warwick). figshare.
http://dx.doi.org/10.6084/m9.figshare.1293612

The Tweets in the dataset were publicly published and tagged with #HEFCEmetrics between 16/01/2015 00:35:08 GMT and 16/01/2015 23:19:33 GMT. The collection period corresponds to the day the workshop took place.

The Tweets contained in the file were collected using Martin Hawksey’s TAGS 6.0. The file contains 2 sheets.

Only users with at least 2 followers were included in the archive. Retweets have been included. An initial automatic deduplication was performed but data might require further deduplication.

Please note the data in this file is likely to require further refining and even deduplication. The data is shared as is. The contents of each Tweet are responsibility of the original authors. This dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.

For the #HEFCEmetrics Twitter archive corresponding to the one-day workshop hosted by the University of Sussex on Tuesday 7 October 2014, please go to

Priego, Ernesto (2014): A #HEFCEmetrics Twitter Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1196029

You might also be interested in

Priego, Ernesto (2014): The Twelve Days of REF- A #REF2014 Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1275949

#MLA15 Twitter Archive, 8-11 January 2015

130th MLA Annual Convention Vancouver, 8–11 January 2015

#MLA15 is the hashtag which corresponded to the 2015 Modern Language Association Annual Convention. The Convention was held in Vancouver from Thursday 8 to Sunday 11 January 2015.

We have uploaded a dataset as a .xlsx file including data from Tweets publicly published with #mla15:

Priego, Ernesto; Zarate, Chris (2015): #MLA15 Twitter Archive, 8-11 January 2015. figshare.
http://dx.doi.org/10.6084/m9.figshare.1293600

The dataset includes Tweets posted during the actual convention with #mla15: the set starts with a Tweet from Thursday 08/01/2015 00:02:53 Pacific Time and ends with a Tweet from Sunday 11/01/2015 23:59:58 Pacific Time.

The dataset contains a total of 23,609 Tweets. Only Tweets from users with at least two followers were collected.

A combination of Twitter Archiving Google Spreadsheets (Martin Hawksey’s TAGS 6.0; available at https://tags.hawksey.info/ ) was used to harvest this collection. OpenRefine (http://openrefine.org/) was used for deduplicating the data.
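For anyone without OpenRefine to hand, an equivalent deduplication step can be sketched with pandas. This is not the workflow used here, just an approximation; it assumes the TAGS export includes an id_str column uniquely identifying each Tweet, and the file names are placeholders.

```python
# Sketch: basic deduplication of a TAGS export, an approximation of the OpenRefine step.
# Assumes the export has an 'id_str' column uniquely identifying each Tweet;
# reading .xlsx requires openpyxl.
import pandas as pd

df = pd.read_excel("mla15_tags_export.xlsx", dtype=str)  # hypothetical file name
before = len(df)

deduped = df.drop_duplicates(subset="id_str")
deduped.to_csv("mla15_deduped.csv", index=False)

print(f"Removed {before - len(deduped)} duplicate rows; {len(deduped)} Tweets remain.")
```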

Please note the data in the file is likely to require further refining and even deduplication. The data is shared as is. The dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.

For the #MLA14 datasets, please go to
Priego, Ernesto; Zarate, Chris (2014): #MLA14 Twitter Archive, 9-12 January 2014. figshare.
http://dx.doi.org/10.6084/m9.figshare.924801

Internet Librarian International ’14. A #ili2014 Twitter Archive

Internet Librarian 2014 logo

I have uploaded a file containing a dataset of approximately 2,958 Tweets tagged with #ili2014 (case not sensitive). These Tweets were published publicly and tagged with #ili2014 between 13/10/2014 09:49 and 26/10/2014 17:36 GMT.
Priego, Ernesto (2014): Internet Librarian International ’14. A #ili2014 Twitter Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1217605

Internet Librarian International 2014 (#ili2014) took place between 20 and 22 October 2014 in the Olympia Centre, London, UK.

The Tweets contained in the file were collected using Martin Hawksey’s TAGS 6.0. This file contains 3 sheets.

Only users with at least 2 followers were included in the archive. Retweets have been included. An initial automatic deduplication was performed but data might require further deduplication.

Please note that both research and experience show that the Twitter search API isn’t 100% reliable. Large tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón, Sandra, et al. 2012). It is not guaranteed this file contains each and every Tweet tagged with #ili2014 during the indicated period, and is shared for comparative and indicative educational and research purposes only.

Please note the data in the file is likely to require further refining and even deduplication. The data is shared as is. The contents of each Tweet are responsibility of the original authors. This dataset is shared to encourage open research into scholarly activity on Twitter.

If you use or refer to this data in any way please cite and link back using the citation information above.

Altmetrics data for Nature Communications articles by access type, Jan – Oct ’14

Euan Adie from Altmetric has published a very interesting article with insights into a dataset of Nature Communications articles published between October 2013 and October 2014. I uploaded an edited version of his dataset as a spreadsheet on figshare.

Adie, Euan (2014): Attention! A study of open access vs non-open access articles. figshare. http://dx.doi.org/10.6084/m9.figshare.1213690

Adie, Euan (2014): Altmetrics data for Nature Communications articles, Oct ’13 – Oct ’14. figshare. http://dx.doi.org/10.6084/m9.figshare.1213687

 

Euan found that:

“Open access articles, at least those in Nature Communications, do seem to generate significantly more tweets – including tweets from people who tweet research semi-regularly – and attract more Mendeley readers than articles that are reader pays.”

I took his dataset from figshare and opened the .txt file using Excel. Using filters I deleted the 2013 articles and focused only on the ones published between January and October 2014. I sorted them by month of publication from January to October and separated them into two sheets, one for the open access articles and another for the paywalled ones. This way one can use the spreadsheet to access the open access articles without having to sort through the paywalled ones, or do an individual analysis per type of access. This is of course super rudimentary data refining (if it can be called that), but it helped me to focus on the differences between access types published this year.

The resulting edited file has two sheets organised by type of access and month of publication, including only articles published during 2014. (Note: seven records appearing incorrectly as published in 1900-1 under the month column were removed; they had no online mentions.)
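For what it is worth, the same refinement could be scripted instead of done by hand in Excel. The sketch below is an outline only: the file name, separator, and column names for publication date and access type are assumptions and will differ in the actual file from Altmetric.

```python
# Sketch of the refinement steps described above, done in pandas instead of Excel.
# Column names ('pub_date', 'access_type') and labels are assumptions; check the source file.
import pandas as pd

df = pd.read_csv("nature_communications_altmetrics.txt", sep="\t")  # hypothetical name/format

df["pub_date"] = pd.to_datetime(df["pub_date"], errors="coerce")

# Drop records with clearly wrong dates (e.g. rows showing 1900) and keep only
# articles published January-October 2014.
df = df[(df["pub_date"] >= "2014-01-01") & (df["pub_date"] <= "2014-10-31")]

# Sort by month of publication and split by access type into two sheets.
df = df.sort_values("pub_date")
open_access = df[df["access_type"] == "open access"]
paywalled = df[df["access_type"] != "open access"]

# Writing .xlsx requires openpyxl.
with pd.ExcelWriter("nature_comms_by_access_type.xlsx") as writer:
    open_access.to_excel(writer, sheet_name="open_access", index=False)
    paywalled.to_excel(writer, sheet_name="paywalled", index=False)
```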

Manual manipulation of the original data was performed so all data should be contrasted with the original source cited above.

The intention of sharing this edited file is to aid in focusing on the open access and paywalled outputs published between January to October 2014 as provided in the original dataset. Having a smaller dataset organized by date and type of access may make quick visualisations easier.

Priego, Ernesto (2014): Altmetrics data for Nature Communications articles by access type, Jan – Oct ’14. figshare http://dx.doi.org/10.6084/m9.figshare.1213719

Nature Communications Articles Published in September 2014 By Access Type

 September 2014 Nature Communications Articles by Altmetric Score

Second Day of the Digital Humanities: A #díahd14 Archive

Visualisation of a #diahd14 archive with TAGSExplorer

 

Here I reblog what I published on my blog for the second day of digital humanities in Spanish (Segundo día de las humanidades digitales).

I am late. Creating a blog takes time, and finding the time is ever harder. In any case, the morning after, I am stealing time from more urgent activities to leave this site and these lines.

As everyone on this site will know, the “Segundo día de las humanidades digitales” (the second Day of the Digital Humanities in Spanish) took place on 15 October 2014. Although there was some confusion about which hashtag to use, it seems #díahd14 was agreed upon (accents and capitalisation do not matter).

I have created and shared a document containing an archive of approximately 425 unique Tweets tagged with #díahd14 between 10/10/2014 10:39 GMT and 16/10/2014 05:52 GMT.

It is openly available on figshare:

Priego, Ernesto (2014): Un archivo de #díahd14. figshare.
http://dx.doi.org/10.6084/m9.figshare.1206315

Some important points:

  • Column E includes publication times in Mexico City time.
  • An initial automatic data clean-up was performed to avoid duplicate entries, but the archive may require further refining.
  • All RTs were included; they count as unique Tweets.
  • A third sheet with general quantitative data about the archive is also included, as well as a fourth sheet listing the 95 unique users who published Tweets with the hashtag and other data on each user’s activity.
  • Remember that the Twitter Search API tends to over-represent the most active users, so it cannot be guaranteed that this archive contains each and every Tweet tagged with #diahd14.
  • The data is shared exactly as it was obtained via a Twitter Archiving Google Spreadsheet (TAGS; Martin Hawksey).
  • All Tweets included in this archive were published publicly on Twitter with the #diahd14 hashtag. The contents of each Tweet are the responsibility of the user who published them.
  • The archive is shared for educational and academic research purposes under a Creative Commons Attribution licence.

Once the data is in Excel it is very easy to make a CSV of the archive and to start thinking about different analyses and visualisations of the data. I am sharing it, as with several of my other archives, in the hope that someone will be interested in doing something with it.
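A minimal sketch of that Excel-to-CSV conversion step in Python, assuming pandas with an Excel reader such as openpyxl installed (the file names are placeholders):

```python
# Sketch: converting the Excel archive to CSV for further analysis.
# File names are placeholders; requires pandas plus an Excel reader (e.g. openpyxl).
import pandas as pd

archive = pd.read_excel("diahd14_archivo.xlsx", sheet_name=0, dtype=str)
archive.to_csv("diahd14_archivo.csv", index=False)

print(f"Wrote {len(archive)} rows to diahd14_archivo.csv")
```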

You may also be interested in taking a look at

Priego, Ernesto (2014): #2EHD Archivo de Tweets del 2o Encuentro de Humanidades Digitales, México DF 19-23 de Mayo 2014. figshare. http://dx.doi.org/10.6084/m9.figshare.1037351 Retrieved 12:13, Oct 16, 2014 (GMT)

In case anyone is interested, this is a post of mine from 28 May this year on why I consider it important to keep creating archives of academic hashtags on Twitter.

I hope everyone had a happy and not-too-busy Day of the Digital Humanities.

The Spanish text of this post was originally published at http://diahd2014.filos.unam.mx/ernestopriego/2014/10/16/un-archivo-de-diahd14/

A #HEFCEmetrics Twitter Archive

#hefcemetrics top tweeters

I have uploaded a new dataset to figshare:
Priego, Ernesto (2014): A #HEFCEmetrics Twitter Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1196029

“In metrics we trust? Prospects & pitfalls of new research metrics” was a one-day workshop hosted by the University of Sussex, as part of the Independent Review of the Role of Metrics in Research Assessment. It took place on Tuesday 7 October 2014 at the Terrace Room, Conference Centre, Bramber House, University of Sussex, UK.

The file contains a dataset of 1178 Tweets tagged with #HEFCEmetrics (case not sensitive). These Tweets were published publicly and tagged with #HEFCEmetrics between 02/10/2014 10:18 and 08/10/2014 00:27 GMT.

The Tweets contained in the file were collected using Martin Hawksey’s TAGS 6.0. The file contains 3 sheets.

Please note the data in this file is likely to require further refining and even deduplication. The data is shared as is. The contents of each Tweet are responsibility of the original authors. This dataset is shared to encourage open research into scholarly activity on Twitter.

For more information refer to the upload itself.

If you use or refer to this data in any way please cite and link back using the citation information above.

1:AM London Altmetrics Conference: A #1AMconf Twitter Archive

1:AM  London 2014 logo

I have uploaded a new dataset to figshare:

Priego, Ernesto (2014): 1:AM London Altmetrics Conference: A #1AMconf Twitter Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1185443

1:AM London, “the 1st Altmetrics Conference: London”, took place on 25–26 September 2014 at the Wellcome Collection, London, UK.

The file contains a dataset of 4267 Tweets tagged with #1AMconf (case not sensitive). These Tweets were published publicly and tagged with #1AMconf between Thursday September 18 17:29:56 +0000 2014 and Sunday September 28 16:07:49 +0000 2014.

Only users with at least 2 followers were included in the archive. Retweets have been included. An initial automatic deduplication was performed but data might require further deduplication. The Time column (D) has times in British Summer Time (BST).

Please go to the file cited above for more information.

 

A Summer of #digitalhumanities – A Twitter Archive


I have uploaded a new dataset to figshare. This is a dataset titled A Summer of #digitalhumanities – A Twitter Archive.

Priego, Ernesto (2014): A Summer of #digitalhumanities – A Twitter Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1176099

The file contains a collection of 6549 Tweets tagged with #digitalhumanities (case not sensitive) posted publicly during the period between 1 June 2014 and 15 September 2014.

I have shared the file openly under a Creative Commons – Attribution License to encourage open, timely research and study of academic uses of social media.

The first sheet contains a text with this information and the second sheet contains the complete archive. The following sheets each contain the Tweets corresponding to June, July, August and 1–15 September 2014.

The Tweets contained in the file were collected using Martin Hawksey’s TAGS 5.1. I subsequently refined the data manually into various sheets, which have been included in the file.

The usual caveat I always note when I share a dataset: please note that both research and experience show that the Twitter search API isn’t 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón, Sandra, et al. 2012). Therefore, it cannot be guaranteed the file contains each and every Tweet tagged with #digitalhumanities during the indicated period.

The archive does not represent nor claims to represent the totality of the #digitalhumanities activity on Twitter or elsewhere.

The data shared here was originally shared willingly by their authors through public accounts via Twitter postings publicly available through the Twitter API. Please note the data in the file is likely to require further refining. The data is shared as is.

I am hoping to find some time this term to work on offering here some findings from this dataset. In the meanwhile, if you use or refer to this data in any way please cite and link back using the citation information above.

 

An #altmetrics14 Twitter Archive

"Altmetrics14: expanding impacts and metrics" (#altmetrics 14) was an ACM Web Science Conference 2014 Workshop that took place on June 23, 2014 in Bloomington, Indiana, United States, between 10:00AM and 17:50 local time.

Altmetrics14: expanding impacts and metrics” (#altmetrics 14) was an ACM Web Science Conference 2014 Workshop that took place on June 23, 2014 in Bloomington, Indiana, United States, between 10:00AM and 17:50 local time.

I have uploaded to figshare a dataset of 1758 Tweets tagged with #altmetrics14 (case not sensitive).

The dataset contains an archive of 1758 Tweets published publicly and tagged with #altmetrics14 between Mon Jun 02 17:41:56 +0000 2014 and Wed Jul 16 00:48:38 +0000 2014.

During the day of the workshop, 1294 Tweets tagged with #altmetrics14 were collected.

If you use or refer to the shared file in any way please cite and link back using the following citation information:

Priego, Ernesto (2014): An #altmetrics14 Twitter Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1151577

I have shared the file with a Creative Commons Attribution (CC-BY) license for academic research and educational use.

The Tweets contained in the file were collected using Martin Hawksey’s TAGS 5.1.  The file contains 3 sheets.

The third sheet in the file contains 1294 Tweets tagged with #altmetrics14 collected during the day of the workshop.

The usual fair warnings apply:

Only users with at least 2 followers were included in the archive. Retweets have been included. An initial automatic deduplication was performed but data might require further deduplication.

Please note that both research and experience show that the Twitter search API isn’t 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón, Sandra, et al. 2012). It is therefore not guaranteed this file contains each and every Tweet tagged with #altmetrics14 during the indicated period, and is shared for comparative and indicative educational and research purposes only.

Please note the data in this file is likely to require further refining and even deduplication. The data is shared as is.  This dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.

An #okfest14 Twitter Archive

#okfest14 logo

The Open Knowledge Festival 2014 (#okfest14) took place in Berlin, Germany, 15th to 17th July 2014.

You can catch up with what happened during the event in this post on the Open Knowledge Foundation blog.

I have shared a dataset with the Tweets I collected tagged with #okfest14 (case not sensitive).

If you use or refer to this data in any way please cite and link back using the following citation information:

Priego, Ernesto (2014): An #okfest14 Twitter Archive. figshare.
http://dx.doi.org/10.6084/m9.figshare.1148962

The complete archive contains Tweets published publicly and tagged with #okfest14 between Sat Jul 12 10:41:57 +0000 2014 and Thu Jul 17 20:16:24 +0000 2014.

The Tweets contained in this file were collected using Martin Hawksey’s TAGS 5.1.  The file contains 7 sheets.

Only users with at least 2 followers were included in the archive. Retweets have been included. An initial automatic deduplication was performed but data might require further deduplication.

The data in this file has been manually organised and quantified into sets organised by day. It is possible that sets are not complete; particularly Wednesday 16 July and Thursday 17 July might be incomplete due to high volumes.

Please note that both research and experience show that the Twitter search API isn’t 100% reliable. Large tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón, Sandra, et al. 2012). It is not guaranteed this file contains each and every Tweet tagged with #okfest14 during the indicated period, and is shared for comparative and indicative educational and research purposes only.

As usual, please note the data in this file is likely to require further refining and even deduplication. The data is shared as is.  This dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way please cite and link back using the citation information above.