8 May 2015

Digital Preservation: Not Just Clouds & Unicorns (ANLTC – NLI 29th April – 1st May 2015 – Report)

Guest post by Elaine Harrington, Special Collections Librarian, UCC Library

I had previously attended a one-day seminar run by the DPC on Getting Started in Digital Preservation. This three-day course, run by the DPTP, is an intermediate course for practitioners of digital preservation. Over the course Ed Pinsent, a digital archivist, and Steph Taylor, a senior consultant, both with the University of London Computer Centre (ULCC), showed us tools, methods and strategies for engaging with digital preservation. We viewed practical examples, examined case studies and challenging and complex objects, and participated in group exercises to better understand what digital preservation is.

Over the three days there were moments when I thought I was in a different universe where acronyms ruled (AIP, SIP, DIP, JHOVE, PLATO, SCAPE, SCOUT, METS, MODS, TDR), or on a Star Wars set (constant references to DROID), or looking at antique cars (the parallel situation of finding parts to replace wear and tear in cars or older technologies). By the end of the third day I was beginning to return to Earth.

The course was broken into modules, each of which lasted approximately 45 minutes. Although the course was intensive, there was plenty of opportunity to ask questions, and Ed and Steph included plenty of examples. At certain points, for example when we were discussing ‘migration’ in methods of digital preservation, we noted that ‘migration’ would also feature in the file formats module and as part of a ‘Migration Strategy Exercise.’

Due to the sheer volume of concepts and information covered over the three days, it is impossible to write about all the modules.

What is Digital Preservation?
According to the National Archives at Kew, a digital preservation policy is “the mandate for an archive to support the preservation of digital records through a structured and managed digital preservation strategy.” In practical terms the following are needed for digital preservation:
a database to manage preservation and store metadata
tools to perform ingest functions
a place to store digital objects
an access or delivery platform
rules, workflows, policies
an IT infrastructure
people and skills

OAIS Model
Ed and Steph used the OAIS Model and its terminology throughout the course to illuminate the digital preservation process.

Courtesy of University of London Computer Centre

Day 1
On the first day we examined what the OAIS Model is and some of the implications of using it. This was useful as it gave us the appropriate terminology for the group exercises over the three days. This section was followed by modules on methods of digital preservation, with an exercise; significant properties and the Performance Model; file formats: their structure and treatment; and metadata for practitioners. Significant properties varied depending on the file type: 16 significant properties for moving images compared to 6 for audio.

It was clear from the exercise on digital preservation methods that while we understood what was being said to us it was another matter entirely to be given a method and to discuss the pros and cons of that method. Approaches included: migration, emulation and technology preservation. The group I was in was given the bit-level only approach which focuses on maintaining only the 1s and 0s of code.

Courtesy of Elaine Harrington
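
To make the bit-level approach from our group exercise a little more concrete, here is a minimal sketch of a fixity check in Python. It was not shown on the course: the function names are my own, and the choice of SHA-256 is an assumption (other checksum algorithms would work in the same way). The idea is to record a checksum when an object is ingested and to recompute it later; if the two match, the 1s and 0s have not changed.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large objects do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bits_unchanged(path, recorded_checksum):
    """Compare a file's current checksum with the one recorded at ingest."""
    return sha256_of(path) == recorded_checksum
```

A check like this only says that the bits are intact. It says nothing about whether any software can still open the file, which is why bit-level preservation on its own was one of the approaches we were asked to weigh the pros and cons of.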

It was a little bit worrying when Ed said that someone (not on the course!) thought a way to preserve technology was to dip a laptop in Perspex and then chip it off in 20 years! If the computer specs were known and 3D printers still exist in 20 years’ time perhaps it would be possible to 3D print any parts that would be needed to fix a physical piece of technology.

Real-world examples were used to explain each module. For example, DIOSCURI was used to show how emulation works. The National Library & Archives of the Netherlands use DIOSCURI to run old software such as DOS and WordPerfect 5.1.

Courtesy of University of London Computer Centre

Ed and Steph also mentioned Atari systems and Pac-Man. The Centre for Computing History in Cambridge was established to tell the story of the Information Age through exploring the historical, social and cultural impact of developments in personal computing.

Courtesy of Centre for Computing History, Cambridge.

Day 2
On the second day we covered XML for digital preservation; tools for ingest; how to do migration including an exercise; METS; PREMIS and an exercise; making a business case and an exercise; assessment, audit and TDR.

XML is Extensible Markup Language. Like HTML, XML uses tags, but whereas HTML describes presentation, XML describes content. There may be:
An XML Schema, which specifies the elements and tags you will use. In a digital preservation plan the schema being used must be declared.
An XML Stylesheet, which displays the underlying XML and renders the text in a useful way for readers.
An XML Document, which is the document you are authoring and which describes the object.

XML is a preservable text format in that it is open, documented and not tied to a vendor or platform. It is good for both storing and conveying metadata. There are different types of metadata: descriptive, technical, rights, structural and preservation, and XML can be used to describe them all. The Library of Congress uses XML to represent its metadata records in MARC, MODS and METS. XML can enclose a digital object and be used to build an AIP (see OAIS Model). XML allows for interoperability.

Courtesy of University of London Computer Centre
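
As a rough illustration of XML describing content rather than presentation, the sketch below builds a tiny metadata record with Python's standard library and reads it back. The element names (record, descriptive, technical, rights) are invented for this example and do not follow METS, MODS or any other real schema; a real record would declare and follow the schema chosen in the preservation plan.

```python
import xml.etree.ElementTree as ET


def build_record(title, file_format, rights_statement):
    """Build a small XML record with three kinds of metadata.

    The element names are hypothetical and not taken from a real schema.
    """
    record = ET.Element("record")
    descriptive = ET.SubElement(record, "descriptive")
    ET.SubElement(descriptive, "title").text = title
    technical = ET.SubElement(record, "technical")
    ET.SubElement(technical, "format").text = file_format
    rights = ET.SubElement(record, "rights")
    ET.SubElement(rights, "statement").text = rights_statement
    return record


record = build_record("Sample digitised letter", "image/tiff", "In copyright")

# Serialise to plain text: this string is what would be stored or exchanged.
xml_text = ET.tostring(record, encoding="unicode")

# Any XML-aware tool can read the record back without the software that wrote it.
parsed = ET.fromstring(xml_text)
stored_format = parsed.find("technical/format").text
```

Because the record is plain tagged text, it can be opened in any text editor as well as parsed by software, which is part of what makes XML attractive for preservation.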

XML & Migration
It is not enough to preserve the file alone: the metadata must be preserved too. The metadata may be stored separately from the object, held within a database or embedded in the files requiring preservation. Metadata can describe the source file format and, when migrating, the new target file format, for example Word moving to PDF. Migration exercises are like Fight Club: there will always be losses. When migrating we have to decide what could be lost, what is an acceptable loss, what should not be lost (such as significant properties) and what choices need to be made so that only acceptable losses happen. Ed and Steph suggested doing very detailed use cases before migration.
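
One way to picture the decision about acceptable and unacceptable losses is the small Python sketch below. It is my own illustration, not a tool from the course: the function name, the property names and the values are all hypothetical. It compares the properties recorded for a source file with those of the migrated file and separates changes to properties that must be kept from changes that can be tolerated.

```python
def find_losses(source_props, target_props, must_keep):
    """Return (unacceptable, acceptable) lists of properties changed by migration."""
    changed = [
        name
        for name, value in source_props.items()
        if target_props.get(name) != value
    ]
    unacceptable = sorted(name for name in changed if name in must_keep)
    acceptable = sorted(name for name in changed if name not in must_keep)
    return unacceptable, acceptable


# Hypothetical properties for a Word document and its PDF migration.
word_file = {"page_count": 3, "text_searchable": True, "tracked_changes": True}
pdf_file = {"page_count": 3, "text_searchable": True, "tracked_changes": False}

# Here only page count and searchable text are treated as significant properties.
unacceptable, acceptable = find_losses(
    word_file, pdf_file, must_keep={"page_count", "text_searchable"}
)
```

In this made-up case the tracked changes are lost but were not on the must-keep list, so the migration would pass. Deciding what belongs on that list is exactly what the detailed use cases are for.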

Day 3
On the third day we looked at a metadata exercise; email preservation; social media: communicating with the user community; social media: user community and engagement; understanding legal issues for preservation and access; preservation of databases; and managed storage.

Metadata Exercise
On paper we were shown a painting and its museum cataloguing record. The painting had been digitised and metadata was present. We had to identify the gaps in the metadata and work out what preservation metadata was also required. This exercise highlighted that, no matter the source of the metadata, some metadata will not be present.

Courtesy of Elaine Harrington

Cloud Storage
Cloud storage providers should meet ISO standards and should care about auditing standards. Discussion during an exercise showed that institutions that use cloud storage should limit the holdings to within the EU at the very least. If material is held in the cloud and moved to America then it is subject to different copyright laws and different data protection laws. Copyright law has not yet caught up with digital content. Indeed, if a project for digital preservation has EU funding then the storage, cloud or otherwise, may need to remain in the EU. Cloud companies don’t mention how long the objects will be stored for, and considering how fast technology changes (who remembers VHS or Betamax?), will the objects require a new digital preservation storage facility in a very short span of time? Of equal concern was cost: it may require little money to insert an object into cloud storage but it could take a long time and much more money to extract an object from cloud storage. If an object is requested, will it pass through multiple countries before reaching its destination? This may happen as cloud providers move data on a regular basis. We were advised to always read the fine print!

There is much more to digital preservation than placing objects in cloud storage, and all the processes and details are real, not imaginary like unicorns. A good deal of discussion is required no matter which method of digital preservation is chosen, no matter which method of storage for digital preservation is chosen and no matter which tools are used during the process. It was clear that we should all be engaging in digital preservation and that we should be engaging right now.

Thanks to Ed Pinsent and Steph Taylor, who shared their experiences and expertise so freely. The slides are available through Creative Commons and ULCC. Thanks also to ANLTC and NLI for organising and hosting the event.

