2 Jul 2013

A First Look at Data Citation

Thomson Reuters' Journal Citation Reports, InCites and Science Citation Index have (rightly or wrongly) become landmark products in the bibliometrics landscape. With data management being one of the hottest topics in scholarly communications at the moment, it is unsurprising that TR were among the first to move into the data citation space as well.

The Data Citation Index was launched in November 2012 and is designed to quantify the impact and reuse of research data. This is still quite a new area for many of us, and a recent working paper by Torres-Salinas et al. looks at the coverage of the DCI in terms of disciplines, document types and repositories. The authors' analysis estimates that:

  • 80% of the records included in the index are classified as Science, 18% as Social Sciences and 2% as Humanities & Arts records. Engineering & Technology is almost non-existent, accounting for less than 0.1% of the total records.
  • The DCI uses three document types: data repositories, data sets and data studies. There are 96 repository records, and data sets are by far the predominant type, making up 94% of the entire database. Data studies comprise around 6% of the total records included in the index.
  • 64 of the 96 repositories included in the index contain at least 100 records. Even so, records are very highly concentrated: just four repositories (Gene Expression Omnibus, UniProt Knowledgebase, PANGAEA and U.S. Census Bureau TIGER/Line Shapefiles) together account for 75% of all records from repositories in the DCI.

As it is still early days for open data, data sets and data repositories, it will be interesting to see whether and how these trends change over time.
