
How can we prove that digital preservation systems will deliver? (LIBER 5)

This blog post is about the LIBER2011 workshop with the poorest attendance: 14 of the 400 conference participants, who had a choice between three parallel sessions. Attendance may have been poor, but the subject matter was important, so I can only conclude that I and others who plead the cause of digital preservation still have a lot of work to do. (Or are the other 386 counting on me to blog about it in sufficient detail? ;-))

Why testing?

Over the past 15 years or so we have been building preservation systems and putting our digital collections (or, more precisely, ‘digitally encoded information’) into them. But how do we know that they will deliver? Last month in Tallinn, Michael Seadle called our present systems ‘a leap of faith’ and with Andreas Rauber he pleaded for more testing and more exchanges of testing data (see post).

But what do you test? And how?

That was what David Giaretta’s workshop was about, in the context of the APARSEN project, a major European project with 32 partners. (The slides in this post are courtesy of David Giaretta.)

Giaretta explaining the four phases of APARSEN: Trust, Sustainability, Usability and Access. Testing is part of the trust package.

‘We need more than migration and emulation.’

The best-known preservation techniques are migration and emulation. Their results are tested against ‘significant properties’: is the information an organization regards as essential still there after the object has been converted (migration) or, alternatively, when it is rendered in a new computer environment that purports to emulate the old one (emulation)?
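To make the idea of a significant-properties test concrete, here is a minimal Python sketch of the check after a migration. It assumes the properties have already been extracted from the original and the migrated object into plain dictionaries; the function `check_significant_properties` and the property names are hypothetical illustrations, not taken from APARSEN or any particular tool.

```python
from typing import Any, Dict, List


def check_significant_properties(
    original: Dict[str, Any],
    migrated: Dict[str, Any],
    significant: List[str],
) -> List[str]:
    """Return the significant properties that were lost or changed.

    An empty list means the migrated object passed the test.
    """
    failures = []
    for prop in significant:
        if prop not in migrated:
            failures.append(f"{prop}: missing after migration")
        elif migrated[prop] != original.get(prop):
            failures.append(f"{prop}: {original.get(prop)!r} -> {migrated[prop]!r}")
    return failures


# Hypothetical properties extracted from an image before and after migration.
original = {"width_px": 2480, "height_px": 3508, "colour_depth": 24, "compression": "none"}
migrated = {"width_px": 2480, "height_px": 3508, "colour_depth": 8, "compression": "lossless"}

# The organization decides which properties are essential; compression is not.
significant = ["width_px", "height_px", "colour_depth"]

print(check_significant_properties(original, migrated, significant))
# ['colour_depth: 24 -> 8']
```

For emulation the comparison is the same in spirit, except that the properties are captured from the object as rendered in the old environment and in the emulated one.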

Giaretta argues that these techniques are useful for some digital objects, and that they have a role to play in determining authenticity, but that they do not work for all objects. APARSEN has developed a three-dimensional model that characterizes objects technically, so that one can determine which tools can be applied:

[Slide: APARSEN’s three-dimensional model for characterizing digital objects]

Which leads to these conclusions:

[Slide: conclusions]

So we need other techniques in addition to migration and emulation, especially if we want our information to be part of the Global Brain of Linked Open Data that Herbert van de Sompel spoke about on Wednesday.

First question: who do we preserve for?

Giaretta has developed a very elegant way of describing what all this work is for: we have ‘unfamiliar’ stuff (rows of ones and zeros) which we must make ‘familiar’ so that people can use it. We must do that now, and we must keep doing it in the future. Over time, the job will become more difficult.

Second question: what do they need to use the object?

What ‘familiar’ means depends on the context, and in particular on the ‘designated community’, a central concept from the OAIS reference model: the user group an institution works for, together with that group’s knowledge base. If the target audience is a group of five-year-olds, our rendering techniques must be very sophisticated, so that a five-year-old only has to push a button. If the audience is a group of computer specialists, less help will be needed.

In this view, the representation information, which in OAIS is included in the AIP (the archival information package, which contains both the object itself and all the extra information needed to process and render it), becomes the focus of testing these systems (see the OAIS Information Model). Is everything there that the designated community needs to be able to use the information?
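That question can be sketched as a simple set calculation. The Python below is a minimal illustration under strong simplifying assumptions: what a user must understand to use an object is modelled as a set of labelled items, the designated community’s knowledge base as another set, and the AIP’s representation information must cover whatever the community does not already know. The class `ArchivalInformationPackage`, the function `missing_for` and all item labels are hypothetical; this is not the OAIS data model itself.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ArchivalInformationPackage:
    """Hypothetical, heavily simplified AIP."""
    content_id: str
    # What a user must understand in order to use the content.
    needed_to_use: Set[str]
    # What the AIP itself explains through its representation information.
    representation_info: Set[str] = field(default_factory=set)


def missing_for(aip: ArchivalInformationPackage, knowledge_base: Set[str]) -> Set[str]:
    """Return what the designated community would still lack to use the content."""
    return aip.needed_to_use - aip.representation_info - knowledge_base


aip = ArchivalInformationPackage(
    content_id="survey-2011",
    needed_to_use={"XML syntax", "survey schema", "units of measurement"},
    representation_info={"survey schema"},
)

# Two hypothetical designated communities with different knowledge bases.
specialists = {"XML syntax", "units of measurement"}
general_public: Set[str] = set()

print(missing_for(aip, specialists))             # set()
print(sorted(missing_for(aip, general_public)))  # ['XML syntax', 'units of measurement']
```

The same AIP passes for one designated community and fails for another, which is why the designated community has to be defined before the test means anything.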

[Slide: the OAIS Information Model]

The ‘representation information network’

As we saw above, the representation information varies between designated communities. But it will also change over time. A present-day computer will understand the information ‘this is XML’. But in 2080 XML may well be an archaic file format, and the representation information will have to be much more specific in telling the computer how to render XML so that a human (or machine) can use it. And if the manual for the programme happens to be in PDF, the same kind of information about PDF will have to be included. Discipline-specific information, such as vocabularies and ontologies, must also be included. And when the information package contains a series of dates, one must be able to determine the time zone, whether summer or winter time applied, and so on.
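The point about dates is easy to demonstrate. In this Python sketch (the timestamp and the two offsets are illustrative assumptions), the same wall-clock time recorded without a time zone maps to two different moments, depending on whether winter time (UTC+1) or summer time (UTC+2) applied. Only representation information can tell a future user which reading is correct.

```python
from datetime import datetime, timedelta, timezone

# A timestamp as it might appear in a preserved record: no time zone, no DST flag.
recorded = datetime(2011, 6, 29, 11, 6, 50)

# Two plausible readings for a Central European source (illustrative offsets).
winter_time = timezone(timedelta(hours=1))  # e.g. CET
summer_time = timezone(timedelta(hours=2))  # e.g. CEST

as_winter = recorded.replace(tzinfo=winter_time)
as_summer = recorded.replace(tzinfo=summer_time)

print(as_winter.astimezone(timezone.utc).isoformat())  # 2011-06-29T10:06:50+00:00
print(as_summer.astimezone(timezone.utc).isoformat())  # 2011-06-29T09:06:50+00:00
print(as_winter - as_summer)                           # 1:00:00
```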

The representation information is thus a network, to which new information must be added as time goes on:

[Slide: the representation information network]

This network can be tested. Is all the required information being preserved?
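One way to picture such a test is as a graph traversal: starting from the data, follow each ‘is described by’ link until you reach something the designated community already understands, and report every item that is neither understood nor described by further representation information. The Python below is a minimal sketch under that assumption; the network, its labels and the knowledge bases are hypothetical examples echoing the XML and PDF case above, and `find_gaps` is not an APARSEN or CASPAR function.

```python
from typing import Dict, List, Set

# Hypothetical network: each item maps to the representation information describing it.
RepInfoNetwork = Dict[str, List[str]]


def find_gaps(start: str, network: RepInfoNetwork, knowledge_base: Set[str]) -> Set[str]:
    """Return items reachable from `start` that are neither understood by the
    designated community nor described by further representation information."""
    gaps: Set[str] = set()
    seen: Set[str] = set()
    stack = [start]
    while stack:
        item = stack.pop()
        if item in seen or item in knowledge_base:
            continue  # already checked, or already familiar to the community
        seen.add(item)
        described_by = network.get(item, [])
        if not described_by:
            gaps.add(item)  # dead end: nothing preserved explains this item
        stack.extend(described_by)
    return gaps


network: RepInfoNetwork = {
    "dataset.xml": ["XML specification", "domain ontology"],
    "XML specification": ["PDF format description"],  # the spec is stored as a PDF
    "PDF format description": [],                      # not (yet) described itself
    "domain ontology": ["OWL specification"],
}

# Today the community understands XML; in 2080 perhaps it no longer does.
print(find_gaps("dataset.xml", network, {"XML specification", "OWL specification"}))  # set()
print(sorted(find_gaps("dataset.xml", network, {"OWL specification"})))  # ['PDF format description']
```

Because the community’s knowledge base changes over time, a test like this has to be repeated, and new representation information added wherever gaps appear.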

Over the past month, APARSEN has been carrying out a series of test audits in Europe in preparation for ISO 16363, a standard that is still in the making. The tests were also designed to test prospective auditors. The provisional conclusions are as follows:

  • most audited organizations do a good job at preserving the bits;
  • quite a few organizations lack succession plans (what happens to the data when my organization ceases to exist?);
  • quite a few have not defined their designated communities;
  • typically, the representation information networks are insufficient or non-existent.
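Preserving the bits, the area where most audited organizations did well, is commonly verified with fixity checks: a checksum is recorded when an object is ingested and recomputed later. The Python sketch below shows that general technique with SHA-256 from the standard library; the function names and the in-memory ‘storage’ are hypothetical, and this is not presented as part of the APARSEN audit method. A passing fixity check says nothing about succession plans, designated communities or representation information, which is where the audits found the weaknesses.

```python
import hashlib
from typing import Dict, List


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of an object's bytes."""
    return hashlib.sha256(data).hexdigest()


def fixity_check(stored: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Compare stored objects with the checksums recorded at ingest.

    Returns the identifiers of objects that are missing or whose bits have changed.
    """
    failures = []
    for object_id, expected in manifest.items():
        data = stored.get(object_id)
        if data is None or sha256_of(data) != expected:
            failures.append(object_id)
    return failures


# Hypothetical repository contents; the manifest is written at ingest time.
storage = {"report.pdf": b"%PDF-1.4 ...", "data.xml": b"<survey/>"}
manifest = {oid: sha256_of(data) for oid, data in storage.items()}

# Simulate silent corruption of one object.
storage["data.xml"] = b"<survey>"

print(fixity_check(storage, manifest))  # ['data.xml']
```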

Giaretta concluded:

[Slide: Giaretta’s conclusions]


Plenty of empty chairs … (Photo: Jordi Aguilar)

Here is David’s impressive list of references for those of you who want to know more:

1. CCSDS, 2002, Reference Model for an Open Archival Information System (OAIS). Retrieved from http://public.ccsds.org/publications/archive/650x0b1.pdf

2. CCSDS, OAIS update (under CCSDS review at the time of writing). Retrieved from http://public.ccsds.org/sites/cwe/rids/Lists/CCSDS%206500P11/Attachments/650x0p11.pdf

3. Knight, G., 2008, Framework for the definition of significant properties. Retrieved from http://www.significantproperties.org.uk/documents/wp33-propertiesreport-v1.pdf

4. Wilson, A., 2007, Significant Properties Report. Retrieved from http://www.significantproperties.org.uk/documents/wp22_significant_properties.pdf

5. Rothenberg, J. and Bikson, T., 1999, Carrying Authentic, Understandable and Usable Digital Records Through Time. Report to the Dutch National Archives and Ministry of the Interior. Retrieved from http://www.digitaleduurzaamheid.nl/bibliotheek/docs/final-report_4.pdf

6. Hedstrom, M. and Lee, C.A., 2002, Significant properties of digital objects: definitions, applications, implications. Proceedings of the DLM-Forum 2002. Retrieved from http://ec.europa.eu/transparency/archival_policy/dlm_forum/doc/dlm-proceed2002.pdf

7. Cedars project. Retrieved from http://www.leeds.ac.uk/cedars/

8. Investigating the Significant Properties of Electronic Content over Time (InSPECT). Retrieved from http://www.significantproperties.org.uk/

9. The InterPARES project. Retrieved from http://www.interpares.org/

10. Wilson, A., 2008, Significant Properties of Digital Objects. Presented at “What to preserve? Significant Properties of Digital Objects”. Retrieved from http://www.dpconline.org/docs/events/080407sigpropsWilson.pdf

11. DELOS Digital Preservation Testbed. Retrieved from http://www.ifs.tuwien.ac.at/dp/testbed.html

12. OCLC/RLG Working Group on Preservation Metadata, 2002, Preservation Metadata and the OAIS Information Model: A Metadata Framework to Support the Preservation of Digital Objects. Retrieved from http://www.oclc.org/research/projects/pmwg/pm_framework.pdf

13. Sergeant, D., 2002, Interpretation of the OAIS Model. Retrieved from http://www.erpanet.org/events/2002/copenhagen/presentations/dmserpanet.ppt

14. CASPAR Access Model (see especially section 2). Retrieved from http://www.casparpreserves.eu/Members/cclrc/Deliverables/report-on-oais-access-model/at_download/file

15. Factor, M., Henis, E., Naor, D., Rabinovici-Cohen, S., Reshef, P., Ronen, S. (IBM Research Lab in Haifa, Israel), Michetti, G. and Guercio, M. (University of Urbino), 2009, Authenticity and Provenance in Long Term Digital Preservation: Modelling and Implementation in Preservation Aware Storage. TaPP ’09, First Workshop on the Theory and Practice of Provenance, San Francisco, 23 February 2009. Retrieved from http://www.usenix.org/event/tapp09/tech/full_papers/factor/factor.pdf

16. CASPAR Conceptual Model. Retrieved from http://www.casparpreserves.eu/Members/cclrc/Deliverables/caspar-conceptual-model-phase-1-1/at_download/file

17. Giaretta, D., 2007, The CASPAR Approach to Digital Preservation. The International Journal of Digital Curation, Volume 2, Issue 1. Retrieved from http://www.ijdc.net/index.php/ijdc/article/viewFile/29/18

18. CASPAR – Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval. Retrieved from http://www.casparpreserves.eu

19. Coyne, M., Duce, D., Hopgood, B., Mallen, G. and Stapleton, M., 2007, The Significant Properties of Vector Images. JISC report, 27 November 2007. Retrieved from http://www.jisc.ac.uk/media/documents/programmes/preservation/vector_images.pdf

20. Coyne, M. and Stapleton, M., 2008, The Significant Properties of Moving Images. JISC report, 26 March 2008. Retrieved from http://www.jisc.ac.uk/media/documents/programmes/preservation/spmovimages_report.pdf

21. Matthews, B., McIlwrath, B., Giaretta, D. and Conway, E., 2008, The Significant Properties of Software: A Study. JISC report, March 2008. Retrieved from http://www.jisc.ac.uk/media/documents/programmes/preservation/spsoftware_report_redacted.pdf

22. Ashley, K., Davis, R. and Pinsent, E., 2008, Significant Properties of E-learning Objects. JISC report, March 2008. Retrieved from http://www.jisc.ac.uk/media/documents/programmes/preservation/spelos_report.pdf

23. PARADIGM project, Workbook on Digital Private Papers. Retrieved from http://www.paradigm.ac.uk/workbook/preservation-strategies/file-properties.html