Speakers and organisers, Day 1. Picture courtesy of Richard Bradfield.
This two-day training event was held at University College Cork (UCC) in April, and was jointly hosted by UCC Library, UCC Research Support Services, Teagasc and the Repository Network of Ireland (RNI). The first event of its kind to be held in Ireland, it introduced attendees to the concepts of open research and research data management within the context of Horizon 2020. With speakers from the U.K. and Ireland sharing best practice, the event was an invaluable learning experience, and timely given Horizon 2020's Open Data Pilot.
To stage the event, the project team secured funding from the FP7-funded FOSTER project. FOSTER (Facilitate Open Science Training for European Research) is a two-year EU-funded project which aims to promote and 'foster' open science in research, to optimise research visibility and impact, and to support the adoption of EU open access policies.
Research data management (RDM) generally refers to the processes of organising, structuring, storing, and preserving the data used or generated during a research project. Numerous factors are now driving the move towards open data, but chief among them are funders seeking transparency and a demonstration of the wider impact of the research they finance. In Horizon 2020 a limited pilot on open access to research data is being implemented, with participating projects required to develop a Data Management Plan (DMP). This trend of research funding programmes requiring data management plans is expected to continue, as has already happened in the U.K.
Beyond compliance, RDM benefits researchers and institutions through the potential for re-use of data and the opportunity to demonstrate research excellence. Many institutions are taking a lead by establishing research data policies and seeking to coordinate cross-campus approaches to gathering and maintaining data. This often involves research support services, IT teams, libraries and researchers working together. However, Cox et al. (2014) have described RDM as a 'wicked problem': complex, difficult to define, and requiring solutions that are flexible and pragmatic. RDM is still at an early stage at many Irish institutions, and this event offered an opportunity to learn from others and to draw on the expertise of those who are further down the road. It was also a chance to make connections within and across institutions in Ireland and the U.K.
David O'Connell opening Day 1. Picture courtesy of Richard Bradfield.
Day 1: ‘Open research in H2020: how to increase your chances of success’
The first day was aimed at researchers and small and medium-sized enterprises (SMEs) interested in developing Horizon 2020 proposals. David O'Connell, Director of Research Support Services, UCC, gave the opening remarks, noting that, as former chief editor of 'Nature Reviews Microbiology', he has long been interested in open access publishing, and is now strongly interested in the application of open access to research data.
The project team was fortunate to have support and guidance from Martin Donnelly of the Digital Curation Centre (DCC). The DCC is a UK-based, world-leading centre of expertise in digital information curation, providing expert advice to higher education. Martin played an invaluable advisory role in the run-up to the event. Although he was unavoidably unable to attend, he provided four recorded presentations at short notice. Day 1 began with the first of these: an overview of Open Science and Open Data in Horizon 2020. He started by providing a background to open access and RDM, looking back at open access in FP7, before turning to open science in Horizon 2020 and the specifics of the Open Data Pilot.
Joe Doyle, Intellectual Property Manager, Enterprise Ireland, explained how intellectual property (IP) relates to both innovation and collaboration, describing IP as a bridge between the creative and the commercial. Open access can generate greater collaboration, but it is important to acknowledge that what is free to access is not necessarily free to use without limits. Open access and patents can work hand-in-hand, as patents are about disclosing data: while patented inventions cannot be copied, much can be learned from previous innovations.
Jonathan Tedds, Senior Research Fellow, Department of Health Sciences, University of Leicester, spoke about RDM from the researcher's perspective, giving examples of projects he has been involved in and the issues encountered. He first became convinced of the benefits of data sharing through his work as an astronomer, when he would 'stitch together' data he had generated for re-use. He cited the Royal Society (2012) report 'Science as an Open Enterprise', which suggests that publishing articles without making the underlying data available is a form of scientific malpractice, and he noted that the number of papers based on reuse of archived observations now exceeds the number based on the use described in the original proposal. However, researchers in many fields, especially those involved in smaller projects, need help to comply with funder requirements. He emphasised the iterative nature of research and data management planning, and the challenge of sustaining research software, not just the underlying data. The HALOGEN project was a good example of combining different kinds of data from different fields, achieved by creating a central, scalable database infrastructure to support the project. The BRISSkit project involved developing software to link applications to create a data warehouse of anonymised (consented) patient data, bringing bedside patient data to university researchers for use in new biomedical research.
Group shot. Picture courtesy of Richard Bradfield.
In the afternoon, Martin Donnelly’s second presentation focussed on data management plans (DMPs), providing an overview of these and their benefits. He went on to outline various data related policies and requirements in Europe and elsewhere, plus the supports and resources that are available to those writing DMPs, including those provided by the DCC. He demonstrated the DMPonline tool, which was created by the DCC, and can be customised by institutions. It can be used by researchers at the point of application and throughout the research project, and can be used for sharing and co-writing plans.
Brian Clayton, Research Cloud Service Manager, UCC, described RDM as a work in progress at UCC. He outlined UCC's current paid research cloud services, which include data storage and compute services. The service has expanded to offer elements of data management, and a draft RDM policy is currently awaiting University committee approval. The aspiration is that RDM services can be provided at zero cost to the researcher. Many outstanding issues still need to be explored, particularly data sharing, metadata, and who will carry out the various roles within the University.
Peter Mooney, Environmental Research Scientist, Environmental Protection Agency (EPA), looked back at over a decade of RDM at the EPA. As far back as 2004, the EPA committed to the researchers it funded that it would preserve their data free of charge and take responsibility for long-term management and infrastructure. The SAFER data archive was launched in 2006, linking data to papers and reports. Collaboration with researchers has been key to its success and development, and reporting data is now an essential part of the reporting process on EPA-funded projects. He outlined some of the lessons learned, suggesting that open data is often misunderstood by researchers, and that metadata is often a mystery or seen as a burden. Modelling data correctly at the start of a project increases usability, and researchers would benefit from understanding the basics of relational databases; as an example, he cited researchers' over-reliance on Excel rather than databases. He also cautioned against long embargo periods, which only cause data to lose relevance.
The final speaker of the day was Evelyn Flanagan, Data Manager at the UCC Clinical Research Facility, who spoke about her role as a data manager in clinical trials. She discussed how the core principles of data management are a fundamental element of good clinical practice (GCP), before giving a thorough description of the 'data sequence', from protocol design right through to report writing. She examined each stage of the process, including database design for case report forms (CRFs), the importance of good metadata, and data collection and data entry procedures. Like the previous speakers, she stressed the value of DMPs at the early stages of a project, and how they underpin good practice at each stage of the data sequence.
John FitzGerald opening Day 2. Picture courtesy of Richard Bradfield.
The second day of the event was aimed at institutional staff who can support researchers engaging with RDM. Many speakers came from the UK, where the policies of the Research Councils UK (RCUK) require researchers to engage with RDM. In Ireland, the Open Data Pilot in Horizon 2020 is the first signal that research-performing institutions will have to address RDM in the coming years.
John FitzGerald, University Librarian and Head of Information Services, UCC, gave the opening remarks, noting how RDM will 'challenge us as professionals with broadly curatorial problems' as we seek to 'manage the ecosystem in which data exists'. The first invited speaker of the day, Martin Donnelly of the DCC, provided a clear overview of RDM for support staff. Although unable to attend in person, Martin provided a recorded presentation which was very well received by those present.
Stuart McDonald, Research Data Management Service Co-ordinator, University of Edinburgh, spoke about Edinburgh's comprehensive approach to RDM services, which began in early 2008 with a JISC-funded pilot project. There were some audible gasps when he outlined the resourcing and staffing of the RDM programme at Edinburgh, where £1.2 million has been allocated from internal funding. Beyond the resourcing, it was illuminating to see how Edinburgh approaches data management before, during and after research. They are now investigating how to ensure that the systems used for data management do not duplicate the effort required of researchers, which researchers will undoubtedly be happy to hear.
David McElroy, Research Services Librarian at the University of East London, demonstrated how they have used EPrints, their existing institutional repository software, to create a new data repository, Data.uel. Publications archived in their open access publications repository are linked to the underlying data archived in the data repository, which supports the traceability and reproducibility of research. It was very useful to see the development path they took, from decision-making and planning, through functional and metadata specifications, right through to mock-ups and branding.
The third speaker, Jonathan Greer, highlighted how Queen's University Belfast is taking an 'incremental approach' to RDM services as it seeks to align the plans and policies of the institution with the practice of its researchers. He offered some consolation to those new to RDM services by relating how challenging it can be to roll out a service in such a complex area.
In the afternoon, Gareth Cole, Research Data Manager at Loughborough University and formerly of the University of Exeter, outlined how the libraries at both universities approached the delivery of training and support. This was very useful, as it became clear throughout the day that there is no one-size-fits-all approach to RDM services.
Julia Barrett, Research Services Manager, UCD Library, summarised how she has shaped the library's research services to facilitate effective data management and sharing in UCD. It was encouraging to see the range of services the library can potentially offer; Julia has grouped these into 'Discover', 'Create / Analyse', 'Manage' and 'Disseminate / Publish' services.
Louise Farragher, Information Specialist, Health Research Board, introduced the PASTEUR4OA project, which seeks to align open access policies across Europe. While an earlier question from the audience had queried the effectiveness of having many policies, Louise was quick to reinforce the message that policy is a good starting point for open access adoption.
Finally, Dermot Frost of Research IT Services at Trinity College Dublin gave an engaging account of his experience developing the technical infrastructure for the Digital Repository of Ireland (DRI). The DRI is a 'green-field repository' which is to be launched publicly at DPASSH in June 2015, and is Ireland's trusted repository for humanities and social sciences data. The DRI has had a large interdisciplinary project team, and Dermot stressed that while the language barrier between technical and non-technical members can be challenging, having people from different backgrounds on board was very useful for the exchange of ideas.
Q&A featuring Day 2 speakers. Picture courtesy of Richard Bradfield.
Overall take-home messages
1. Challenging: developing RDM services can be challenging due to the complexity and variety of research data. However, it is possible to learn from established services at other institutions. All speakers were very open to sharing their own experiences, tools and resources with late adopters of RDM services, and they highlighted the well-established services available in the UK, e.g. the Digital Curation Centre, as well as various online tools and resources which can be reused.
2. Planning: it is essential to plan a roadmap after first establishing an understanding of stakeholders' needs. Stuart McDonald discussed the Data Audit Framework used at Edinburgh to identify research data assets and how they were managed before developing an RDM policy and service. Dermot Frost noted that a 'repository needs data to justify its existence', and so the DRI has a stakeholder advisory group to ensure that depositors are involved from the planning stages.
3. Cross-campus collaboration required: due to the complexity of RDM, the different types of stakeholders involved and emerging funder requirements, coordination across the institution is essential for an effective approach to service development.
4. Planning at the research project level: all Day 1 speakers emphasised the importance of DMPs in the early stages of projects, as they help ensure good data management practice at each stage of the research process.
Cox, A.M., Pinfield, S., & Smith, J. (2014). Moving a brick building: UK libraries coping with research data management as a 'wicked' problem. Journal of Librarianship and Information Science, 46(4), 299-316. doi:10.1177/0961000614533717
Royal Society. (2012). Science as an Open Enterprise. Retrieved from https://royalsociety.org/policy/projects/science-public-enterprise/Report/