Blog coverage of this session comes from James Hardcastle, Research Manager at Taylor & Francis. James does bibliometric, altmetric and usage analysis across 1,800 titles, mainly in the social sciences and humanities.
This session was introduced by Mark Hahnel (Figshare), looking at how we can measure the use of non-article artefacts.
The first speaker was Sarah Callaghan, Chair of the RDA working group.
The British Atmospheric Data Centre covers everything from Earth science to oceanography, with a vast range of data types and sizes. They hold physical data sources such as ice cores and samples, small datasets collected by individual researchers, and massive 4D climate-modelling datasets run on supercomputers.
Currently they collect data on how many views content gets and how long users spend looking at it. But this doesn't give any information on how the content is being used. So they sometimes send out surveys, which can provide case studies about use, but this is time-consuming and doesn't come close to providing the full picture of what is happening.
For years they have been asking people to cite their datasets, but uptake has been low. All data repositories can now issue DOIs, which encourages more people to cite data. However, even when users do cite data sources, getting this information out of Google Scholar is a slow and manual process.
Why do we want to measure this? There are several reasons, but a big one is telling authors how their data and work have been used.
The RDA working group on bibliometrics for data has surveyed data users and data managers. In answer to "What do you use to evaluate your data?" the answer is consistently nothing, and to "Does this work for you?", no.
The second speaker was from CERN.
As we know, CERN authors publish papers, which look much the same as other papers. But CERN also produces a vast amount of data. The LHC records 100 events a second, and although only a small fraction are stored for future use, this has still left CERN with more than 100 petabytes of data on tape; what researchers actually need is processed or summarised data.
However, High Energy Physics isn't like other fields: there is a discrete and limited population working in the area, so tools and systems such as the INSPIRE digital library can serve everyone. INSPIRE contains more than 1 million records, more than 500k OA articles, and handles 2 searches a second. INSPIRE also hosts large amounts of data and datasets, all of which are given a DOI. Because everyone in HEP uses it, you can track the citations and see use. The theorists love it as well; it's "like asking for a pony, and someone giving you a unicorn."
The final speaker was Daniel S. Katz, a Program Director at the National Science Foundation. This talk focused more on software and how we allocate credit back to software creators.
Like Sarah, he expressed disappointment that crediting data and software is currently infrequent and poorly done. Software is very important; most science uses it in some way. But software is not a one-time effort: it requires maintaining and upgrading, which is very people-intensive. For it to be sustainable it needs to be treated as part of the scientific infrastructure.
Measuring the impact of software is very difficult, and different types of tools will require different measures. An open-source physics simulation might be downloaded a large number of times and its outputs appear in research papers; a maths library might be downloaded once and then referred to only from within another piece of software. In effect, the things that are easy to measure tell us very little, while the high-value measures are very hard to capture.
His suggestion, therefore, was that funded research should also produce a credit map saying what tools and data it used. Eventually these maps could be chained together: for example, paper 1 gives credit to software X, which in turn gives credit to library Y, thereby linking paper 1 back to library Y.
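The chaining idea can be sketched in a few lines of code. This is a minimal illustration, not anything the speaker presented: the output names (`paper_1`, `software_X`, `library_Y`) come from the example above, while the fractional weights and the multiplicative propagation rule are assumptions made for the sake of the sketch.

```python
# Hypothetical credit map: each output lists the outputs it credits,
# with an assumed fractional weight (not from the talk).
credit_map = {
    "paper_1":    {"software_X": 0.2},  # paper 1 credits software X
    "software_X": {"library_Y": 0.5},   # software X credits library Y
}

def transitive_credit(source, target, cmap):
    """Total credit flowing from `source` to `target`, multiplying
    weights along every chain of credit links."""
    if source == target:
        return 1.0
    total = 0.0
    for dependency, weight in cmap.get(source, {}).items():
        total += weight * transitive_credit(dependency, target, cmap)
    return total

# Paper 1 is linked back to library Y via software X: 0.2 * 0.5
print(transitive_credit("paper_1", "library_Y", credit_map))
```

The point of the sketch is only that once each output declares what it used, credit for an indirect dependency like library Y can be computed automatically rather than relying on every paper citing every library.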
Overall, how we offer credit for data and other non-article outputs is still a work in progress. All the speakers looked to the DOI system to provide the link between the various research outputs, and all saw citations and references, rather than altmetrics, as the most natural measure.