17 Feb 2012

The value of metadata

In recent days I have come across two different examples illustrating the importance of getting metadata 'right' in order to make digital content easy to find and discover.

More and more authors are now archiving their research in institutional repositories (IRs). In theory this 'green route' to open access makes research considerably more visible, but in practice this depends largely upon how Google indexes the content from a given IR. The authors of a paper recently published in Library Hi Tech (1) suggest that IRs typically have low indexing ratios because they use the Dublin Core element set rather than the metadata schemas recommended by Google Scholar (GS). Arlitsch & O'Brien argue that Dublin Core cannot adequately express bibliographic citation information. They found that after the metadata for a subset of papers from the University of Utah's IR was transformed in line with GS recommendations, the indexing ratio for those papers increased to over 90%.
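As a rough illustration of the kind of transformation described above, the sketch below turns a simple Dublin Core record into the `citation_*` meta tags that Google Scholar's inclusion guidelines recommend. The field mapping, the function name `dc_to_scholar_meta`, the example record and the example.org URL are all assumptions made for illustration; this is not the method Arlitsch & O'Brien actually used.

```python
from html import escape

# Assumed, simplified mapping from Dublin Core elements to the
# citation_* tag names recommended by Google Scholar.
DC_TO_SCHOLAR = {
    "title": "citation_title",
    "creator": "citation_author",
    "date": "citation_publication_date",
    "identifier": "citation_pdf_url",
}


def dc_to_scholar_meta(record):
    """Return a list of <meta> tag strings for a Dublin Core record.

    `record` maps a DC element name to a list of values, because DC
    elements are repeatable. Elements with no mapping are skipped.
    """
    tags = []
    for element, tag_name in DC_TO_SCHOLAR.items():
        for value in record.get(element, []):
            tags.append(
                '<meta name="{}" content="{}">'.format(
                    tag_name, escape(value, quote=True)
                )
            )
    return tags


# Hypothetical record; the PDF URL is a placeholder.
example_record = {
    "title": ["Invisible Institutional Repositories"],
    "creator": ["Arlitsch, Kenning", "O'Brien, Patrick Shawn"],
    "date": ["2012"],
    "identifier": ["https://example.org/ir/paper.pdf"],
    # Journal, volume and issue lumped into one free-text string.
    "source": ["Library Hi Tech, Vol. 30 Iss: 1"],
}

if __name__ == "__main__":
    for tag in dc_to_scholar_meta(example_record):
        print(tag)
```

Note that the `source` value is left unmapped. Journal title, volume and issue sit together in a single free-text string, so they cannot be split reliably into separate citation fields, which is one way of seeing the limitation the authors point to.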

The value of metadata can be seen in a different context in a recent white paper from Nielsen (2). Nielsen is a commercial company selling a product - enhanced bookdata - so it obviously has vested interests, but the paper nonetheless offers some interesting data on the links between metadata and sales. Making sure content is easy to find is not just about research impact and citations; in the book industry it is also about sales. The evidence presented by Nielsen indicates that more is definitely more when it comes to the link between metadata and book sales. Books in the top 100,000 in 2011 which had complete BIC basic elements, including a cover image (one of the most crucial elements), typically experienced higher average sales than those without. Furthermore, adding enhanced descriptive elements such as long and short descriptions, author biographies and reviews (available via Nielsen's Bookdata Enhanced service) was also associated with increased sales: publishers who subscribed to the company's enhanced metadata services experienced 11% growth in volume sales year on year, with 44 out of the 65 publishers in this category seeing positive growth in volume sales. This compares with an overall market decline of 2.7% (obviously correlation doesn't necessarily equal causation!).

So it is not just librarians who are interested in metadata, but ultimately publishing CEOs and their accountants as well.

(1) Kenning Arlitsch, Patrick Shawn O'Brien, (2012) "Invisible Institutional Repositories: Addressing the Low Indexing Ratios of IRs in Google Scholar", Library Hi Tech, Vol. 30 Iss: 1

(2) White Paper: The Link Between Metadata and Sales by Andre Breedt, Head of Publisher Account Management & David Walter, Research and Development Analyst, Nielsen. 25th Jan 2012.


  1. It is interesting what you say about Dublin Core. I have recently done a bit of research on metadata standards and digitization, and most sources seem to agree that Dublin Core is the best schema. It is not great to find out that it has limitations, and significant ones like compromising indexing. Although it seems to be mainly an issue with citations and academic papers (and Google Scholar indexing), it is something to keep in mind. Thanks for this post, and for all the other posts as well, this blog is super :)

  2. Thanks Giada :) Dublin Core does have a lot of advantages as well: it is simple, the elements are optional and repeatable, and you can automate the process of using existing HTML metadata to generate DC (though I am far from an expert on this, unlike your good self :)). I agree that the way Google Scholar indexes content is probably the bigger issue, but I can't see that changing radically anytime soon.

    SEO is certainly a very interesting area as far as research impact and visibility goes.

  3. Just to update on this, via the well-informed folk (namely Tim Brody) on the JISC Repositories LISTSERV: it appears that most IR platforms are indexed fine by Google Scholar once content is fully open, and that it is only instances where the software is badly implemented that are proving difficult. That said, GS does have a policy of linking to the publisher version of an article first, rather than the archived version (where both exist).
    More info via https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1202&L=jisc-repositories&D=0&P=21886