21 May 2019

Beyond records storage… Institutional repository Digital CSIC as service for open science

Digital CSIC is the institutional repository (IR) of the Spanish National Research Council (CSIC). CSIC’s network of libraries and archives is in charge of the leadership and management of this IR.

A few preliminary notes: the institution and its libraries

CSIC is the largest public research institution in Spain and the third largest in Europe. Its researchers generate approximately 20% of all scientific production in the country. Its mission is to foster, coordinate, develop and promote scientific and technological research of a multidisciplinary nature in order to contribute to the advancement of knowledge and to economic, social and cultural development, as well as to train staff and advise public and private organizations on these matters.

CSIC’s research scope involves the following fields:
  • Biology and biomedicine.
  • Humanities and social science.
  • Natural resources.
  • Agricultural sciences.
  • Physical science and technologies.
  • Materials science and technology.
  • Food science and technology.
  • Chemical science and technology.
There are research centres all over the country that belong to CSIC. In a number of them there is a library and/or an archive.

Quite a few services are run through a well-conceived network of libraries and archives:
  • A discovery tool that provides access to all information resources (papers, books, digitised collections, databases, software licenses, etc.) held, subscribed to and managed by CSIC’s libraries.
  • Remote access to those resources for users who are not physically at the institution.
  • Traditional services, such as loan, interlibrary loan, user/library orientation, reading room and reproduction of documents.
  • A digital reference service.
  • An institutional repository in which research outcomes are archived: Digital CSIC. All members of the research community of CSIC can upload metadata-enriched files to it.
  • The Digital.CSIC Direct Archiving Service, through which the research community can delegate the archiving of its research outcomes to librarians so as to ensure higher-quality metadata and faster uploading.
  • The GRANADO service, aimed at improving the management of library space and ensuring the conservation of collections regardless of their format.
  • The 100% Digital plan, which CSIC’s network of libraries and archives offers to CSIC institutes without libraries. It includes a number of library services.

A new librarianship context: from open access to open science

According to the Open Access (OA) libguide of the University of Pittsburgh library system (2019), Open Access refers to:
  • “A family of copyright licensing policies under which authors and copyright owners make their works publicly available
  • A movement in higher education to increase access to scholarly research and communication, not limiting it solely to subscribers or purchasers of works
  • A response to the current crisis in scholarly communication”.
Free online access to journal articles began many years before the term "open access" was formally coined: computer scientists had been sharing their work through anonymous FTP archives since the 1970s and physicists had been self-archiving on arXiv since the 1990s (History of open access).

The concept of Open Access was formally established in the early 2000s through three statements: the Budapest Open Access Initiative (2002), the Bethesda Statement on Open Access Publishing (2003), and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003).

Two routes to Open Access emerged: the green road (research outcomes deposited in IRs) and the gold road (papers published as OA in their respective journals). However, the high cost of article processing charges (APCs) (Khoo, 2019) on the gold road means that IRs are sometimes the only feasible route to OA.

Not many years ago, the scope and sense of openness were widened by The Royal Society (2012) through its thought-provoking report Science as an open enterprise. The significance of Open Science (Anglada & Abadal, 2018) has since reached the European Union: the European Commission (2019) has taken it into account in its new research policies across Europe.

The FOSTER Plus (Fostering the practical implementation of Open Science in Horizon 2020 and beyond) project designed this taxonomy that organises all the related concepts:

Open Science Taxonomy. Source: Foster Open Science.
Open Science has raised several relevant issues about how research is carried out, its outcomes and benefits for society, and the agents involved:
  • The fourth scientific revolution concerns big data, data mining and software.
  • Speaking of outcomes, openness does not only refer to papers published in journals or proceedings, but also to research data regardless of its format, e.g. databases, photographs, presentations, web sites and pages, videos, didactic materials, datasets, software and code. Moreover, research outcomes do not only belong to publishers and/or researchers, but also to society.
  • Data must be FAIR: Findable, Accessible, Interoperable and Reusable (Wilkinson et al., 2016; GO FAIR, 2019).
  • Ethics counts: data ownership, intellectual property rights, research integrity (SPARC Europe, 2019), privacy, security and safety.
  • There is a greater need than ever to invest in scientific literacy, science communication and open education.
  • A wider variety of partnerships between research agents and society is now feasible.
  • The evaluation of science and its metrics must change, as the current citation-based system is not enough to foster open science among scientists.

Digital CSIC as a service for open science and the research community

Faced with this new data, information and knowledge environment, we librarians must ask how to adapt ourselves, our libraries and our profession to it.

Specifically, as for institutional repositories, the following are the actions taken by Digital CSIC to go beyond a mere digital library and play a service role for open science and the research community.

Digital preservation

It goes without saying that the archiving of research papers on IRs contributes to their digital preservation. If we keep in mind the Levels of Digital Preservation established by the Digital Library Federation (2018), an Open Access Repository in and of itself can be a “tool” to cope with the five functional areas: storage and geographic location, file fixity and data integrity, information security, metadata and file formats.

Digital preservation must be planned. Although IRs can be a great deal of help, they must be tools that are integrated into a well-conceived digital preservation plan.

Digital CSIC (2019) currently offers the following digital-preservation-oriented actions:
  • Backups.
  • Storage of magnetic tapes.
  • Conversion of formats to more secure ones.
  • Periodic checks of file integrity to guard against corruption.
  • Monitoring of the technology environment to foresee possible migrations of obsolete formats or software.
  • Metadata for digital preservation.
  • Recommendations on file formats.
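The periodic integrity checks in the list above can be illustrated with a minimal Python sketch. It is not Digital CSIC's actual procedure: the function names, the choice of SHA-256 and the manifest layout are all assumptions made for illustration. The idea is simply to store a checksum per file once and later compare the current files against that record.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map every file under folder (by relative path) to its checksum."""
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def check_fixity(folder: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files on disk with a previously stored manifest."""
    current = build_manifest(folder)
    return {
        # Same path, different checksum: possible corruption or silent edit.
        "changed": sorted(k for k in manifest if k in current and current[k] != manifest[k]),
        # Recorded in the manifest but no longer on disk.
        "missing": sorted(k for k in manifest if k not in current),
        # On disk but never recorded.
        "new": sorted(k for k in current if k not in manifest),
    }
```

Running check_fixity on a schedule and alerting on any non-empty "changed" or "missing" list is one simple way to turn the check into a routine.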

The archive of science

Digital CSIC pursues the archiving of all the research outcomes of its institution. As noted above, in the open science view outcomes include a wide range of resources: papers published in journals or proceedings, databases, photographs, presentations, web sites and pages, videos, didactic materials, datasets and software. That is precisely the mission of an archive: the archiving and preservation of all the records generated by the activity of the institution it belongs to. So, in a sense, an OA IR is the archive of the science produced by its institution. Where copyright or intellectual property issues prevent some resources from being published in Open Access, they can still be deposited under embargo or in closed access so that they are preserved.

Digital CSIC, which is built on the DSpace software, has one community per field of knowledge in which CSIC researchers work. Those fields are listed in the first section of this post, and all of them are accessible via https://digital.csic.es/community-list. Each community has one sub-community per research institute working in that field. In turn, each sub-community holds as many collections as there are types of information or data resources resulting from the research carried out by that institute. The principle of provenance is thus respected, which is what makes the repository an archive of science.
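To make the community, sub-community and collection hierarchy concrete, here is a small Python sketch. It is only a toy model of the structure described above, not DSpace's data model or API; the class names, the deposit function and the institute name used in the usage note are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """One type of resource (articles, datasets, ...) of one institute."""
    name: str
    items: list[str] = field(default_factory=list)


@dataclass
class SubCommunity:
    """One research institute within a field of knowledge."""
    name: str
    collections: dict[str, Collection] = field(default_factory=dict)


@dataclass
class Community:
    """One field of knowledge."""
    name: str
    sub_communities: dict[str, SubCommunity] = field(default_factory=dict)


def deposit(community: Community, institute: str, resource_type: str, title: str) -> str:
    """File an item under its institute and resource type; return its provenance path."""
    sub = community.sub_communities.setdefault(institute, SubCommunity(institute))
    collection = sub.collections.setdefault(resource_type, Collection(resource_type))
    collection.items.append(title)
    return " > ".join([community.name, sub.name, collection.name])
```

For example, deposit(Community("Natural resources"), "Example Institute", "Datasets", "Sample dataset") returns "Natural resources > Example Institute > Datasets", so every item carries the producer it came from, which is the principle of provenance in miniature.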

FAIR data

Taking the FAIR Principles (GO FAIR, 2019) into consideration, this is how Digital CSIC fulfils each of them:

Findable

F1. It uses the Handle System to assign a URI to each digital object.

F2. The IR publishes intrinsic metadata and librarians supply the contextual metadata. Librarians and researchers share the description work to ensure rich metadata, such as the measurement devices used, the units of the captured data, the species involved, the genes/proteins/other that are the focus of the study, the physical parameter space of observed or simulated astronomical data sets, questions and concepts linked to longitudinal data, calculation of the properties of materials, or any other details about the experiment.

F3. The metadata record includes the identifier of the object it describes through dc.identifier.uri.

F4. Records are indexed in searchable resources: Digital CSIC is harvested by the Spanish national aggregator RECOLECTA, OpenAIRE, share.osf.io, core.ac.uk, base-search.net and Google Scholar, and it is registered on re3data.org.

Accessible

A1.1 and A1.2. It uses OAI-PMH.
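As a rough illustration of what harvesting over OAI-PMH looks like, the Python sketch below builds a ListRecords request and reads Dublin Core fields from a response. The base URL and the sample record are invented for the example (the real Digital CSIC endpoint is not given in this post), while the ListRecords verb, the oai_dc metadata prefix and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, shortened to the elements used below.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:12345/678</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example dataset</dc:title>
          <dc:identifier>https://hdl.handle.net/12345/678</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of a ListRecords request against an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract the OAI identifier, titles and dc:identifier values of each record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "uris": [e.text for e in record.iterfind(".//dc:identifier", NS)],
        })
    return records
```

A harvester would fetch list_records_url("https://repository.example.org/oai/request") over HTTP and pass the body to parse_records; the network call is left out here so the sketch runs offline.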

Interoperable

I1. It supports MARC, Dublin Core, RDF, ORE, MODS, METS and DIDL.

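I2 and I3. Digital CSIC meets both: its metadata use FAIR-compliant vocabularies and include qualified references to other (meta)data.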

Reusable

R1.1. dc.rights and dc.rights.license are used.

R1.2. dc.date.accessioned, dc.date.available and dc.description.provenance are used.

R1.3. This is only partially fulfilled. Digital CSIC tends to use dc.description as a last resort.
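The metadata fields named in this section lend themselves to a simple completeness check. The Python sketch below is an assumption-laden illustration rather than a Digital CSIC tool: it treats a record as a flat dictionary of field names to strings (a hypothetical shape) and reports which of the fields mentioned above are absent or empty, grouped by FAIR principle.

```python
# Fields cited in this post for each FAIR principle.
FIELDS_BY_PRINCIPLE = {
    "F3": ["dc.identifier.uri"],
    "R1.1": ["dc.rights", "dc.rights.license"],
    "R1.2": ["dc.date.accessioned", "dc.date.available", "dc.description.provenance"],
}


def missing_fields(record: dict[str, str]) -> dict[str, list[str]]:
    """Return, per principle, the expected fields that are absent or blank."""
    report = {}
    for principle, fields in FIELDS_BY_PRINCIPLE.items():
        absent = [name for name in fields if not record.get(name, "").strip()]
        if absent:
            report[principle] = absent
    return report
```

A record that carries every listed field yields an empty report, so librarians could use a check of this kind to spot deposits that need their metadata enriched.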

Open Peer Review Module

Digital CSIC has integrated the first Open Peer Review Module (OPRM) for open access repositories, which allows users to review and comment on digital objects that are already archived.

Open Peer Review Module. Source: Digital CSIC.
This tool is especially useful for gathering feedback that can help improve research outcomes.

Impact, (alt)metrics and statistics of research

How can Digital CSIC measure the impact of its archived files?

First of all, the general statistics web page presents figures on:
  • Number of research institutes per community (field of knowledge).
  • Number of items per community.
  • Number of items per research institute (top 20).
  • Number of research institutes by geographical distribution.
  • Number of items by geographical distribution.
  • Types of items (articles, conference papers, etc.).
  • Types of archived items per research institute.
  • Open Access: the percentage of OA items by type, year of deposit and community.
These figures can be filtered by date (year and/or month).

The page also shows the number of archived objects by community (field of knowledge), sub-community (research centre), collection (type of document per research centre) and author. Breakdowns by research group and research project are being tested.
Source: Digital CSIC.
It is also possible to view statistics for any community in terms of community views, sub-community views, collection views, item views and item downloads. In addition, we can examine them by region, country or city on a map (thanks to the Google Maps API) and over time.

For individual archived objects, we can see their views and downloads by region and over time. Altmetrics information is also available.
Source: Digital CSIC.

Web pages for researchers

Digital CSIC makes it possible to generate web pages for researchers. Each page consists of:
  • A URI.
  • A personal statement with a nice picture.
  • Integration of profiles of other networks and IDs.
  • Statistics.
  • All their scientific output, gathered and organised in one place.

Automated archiving

Digital CSIC has reached agreements with some publishers under which they archive their journal papers in this IR whenever the author affiliation includes CSIC.

Consultancy

Digital CSIC and its librarians give advice to the research community regarding a number of topics:
  • Technical use of Digital CSIC and guidance on metadata description according to its policies.
  • Open Access mandates.
  • Profiles for researchers and research groups.
  • Intellectual property, copyright and licensing.
  • FAIR data.
  • Data management plans.

Final thoughts

The growing awareness of the importance of open access, which we can even measure (Dubinsky, 2019), is undoubtedly good news. However, promoting open access is not enough. Institutional repositories need to evolve beyond being mere digital libraries for storing items. Librarians at research institutions must change their mindset to a service-oriented one. Service, here, has to do with open science and the research community. I have presented the current developments at Digital CSIC in the hope that they will inspire other librarians.

References

Anglada, L.; Abadal, E. (2018). ¿Qué es la ciencia abierta? Anuario ThinkEPI, 12, 292-298. doi: 10.3145/thinkepi.2018.43

Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003). Retrieved from https://openaccess.mpg.de/67605/berlin_declaration_engl.pdf

Bethesda Statement on Open Access Publishing (2003). Retrieved from https://legacy.earlham.edu/~peters/fos/bethesda.htm

Budapest Open Access Initiative (2002). Retrieved from https://www.budapestopenaccessinitiative.org/read

Digital CSIC (2019). Digital preservation policy. Retrieved from http://digital.csic.es/dc/politicas/#politica8

Digital Library Federation (2018). Levels of Digital Preservation. Retrieved from https://ndsa.org/activities/levels-of-digital-preservation/

Dubinsky, E. (2019). Does open access make cents? Return on investment in the institutional repository. College & Research Libraries News, 80(5). doi: 10.5860/crln.80.5.281.

European Commission (2019). Open Science. Retrieved from https://ec.europa.eu/research/openscience/index.cfm

Foster Open Science. Open Science Taxonomy. Retrieved from https://www.fosteropenscience.eu/themes/fosterstrap/images/taxonomies/os_taxonomy.png

GO FAIR. FAIR Principles. Retrieved from https://www.go-fair.org/fair-principles/

History of open access. Retrieved from https://en.wikipedia.org/wiki/History_of_open_access

Khoo, S. Y.-S. (2019). Article Processing Charge Hyperinflation and Price Insensitivity: An Open Access Sequel to the Serials Crisis. LIBER Quarterly, 29(1), 1–18. doi: 10.18352/lq.10280

SPARC Europe (2019). Research Integrity through Open Science and FAIR Data. Retrieved from https://sparceurope.org/wp-content/uploads/dlm_uploads/2019/03/SPARCEurope_ResearchIntegrityBrief.pdf

The Royal Society (2012). Science as an open enterprise. Retrieved from https://royalsociety.org/~/media/royal_society_content/policy/projects/sape/2012-06-20-saoe.pdf

University of Pittsburgh library system (2019). Open Access @ Pitt: All About OA. Retrieved from https://pitt.libguides.com/openaccess

Wilkinson, M. D. et al (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific data, 3:160018. doi: 10.1038/sdata.2016.18.
