On trust and risk and, more importantly: Can we measure any of it? – #iPRES2012 (3)

Published: Thu 04 Oct 2012

It is much to the credit of René van Horik of the Dutch Research Data Archive DANS that he did not just prepare a one-way presentation about the work the European project APARSEN is doing on trust, audit and certification (which would have been the easy way out), but that he brought together a jury of his peers to discuss the project from different points of view. Such an approach takes a lot more work, but it leads to lively and informative confrontations for the audience – and the readers of this blog. From iPRES2012 - by Inge Angevaare

The panel included, from the left, Seamus Ross, Andreas Rauber, Helen Tibbo, Silvio Salza, and Maurizio Lunghi.

For those new to the APARSEN work on Audit and Certification, you can find a summary (from Screening the Future 2012) here. The EU is building a three-tiered framework for audit and certification of trusted digital repositories in Europe and the APARSEN work on trust feeds into that. The first level is a self-assessment against very broad, general criteria (Data Seal of Approval), then comes a self-assessment against a real ISO standard, ISO 16363, and the highest level is a third-party audit against ISO 16363. The work of APARSEN carries on from the TRAC project by the US Center for Research Libraries.

After René’s introduction, the first speaker from the panel, our host in Toronto, Seamus Ross, wasted no time in getting his point across. “There is no such thing as trustworthiness,” he said. “There is so much involved in running a digital repository that a) we cannot measure trustworthiness, and b) we cannot maintain it over time.”

Ross: “There is no such thing as trustworthiness.”

So much for a friendly kick-off …

“Instead,” Ross continued, “I believe in a risk management approach. I can measure risks to the security of digital objects – the risk load. And I can measure the risk resilience of an organization. Typically, a small, young organization can adapt easily; an old, large organization, with many more actors in play, will be much slower to react.” Seamus did not mention it, but a good example of such an approach is the DRAMBORA toolkit.

Helen Tibbo, of the U. of North Carolina School of Information and Library Science, agreed with much that Seamus Ross said, but, she added, “that is not the whole story.”

Tibbo: “In a restaurant you can measure objectively whether food is being kept at the right temperature.”

Her main argument was with the metrics involved in ISO 16363. “For instance, ISO 16363 prescribes that you have to have a mission. But it says nothing about the quality of that mission” (again, see previous post). Helen Tibbo also questioned the role of the auditors. “If auditing becomes a career, what will happen to objectivity?” Tibbo’s comments come from a “trusted” source, because she was part of the team that carried out test audits for the APARSEN project.

The panel went on to discuss research data quality and authenticity. Silvio Salza presented the recommendations of the APARSEN project that, really, the entire process of data creation and all subsequent actions which could possibly harm the data should be monitored. Helen Tibbo was quick to react to that: “Yes, that would be an ideal world. But it is not going to happen. Nobody has the money and staff to do that.” From the audience Bjarne Anderson (Denmark) agreed with Helen. “Counting the rows and the columns is about the best we can do.”
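To make the “counting the rows and the columns” remark concrete, here is a minimal sketch of such a check in Python. It assumes the deposited dataset is a CSV file with a header row and that the depositor declared the expected numbers of rows and columns. The function names are illustrative and not taken from any APARSEN tool.

```python
import csv
import io

def table_shape(csv_text):
    """Return (data_rows, columns) for CSV text that starts with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return (0, 0)
    header, data = rows[0], rows[1:]
    return (len(data), len(header))

def shape_matches(csv_text, expected_rows, expected_cols):
    """Compare the observed shape with what the depositor declared (hypothetical check)."""
    return table_shape(csv_text) == (expected_rows, expected_cols)

# Made-up sample data, for illustration only.
deposited = "id,age,score\n1,34,0.7\n2,51,0.4\n"
print(table_shape(deposited))          # (2, 3)
print(shape_matches(deposited, 2, 3))  # True
```

A check like this only shows that the table has the declared shape; it says nothing about whether the values in it are genuine.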

Session Chair Peter Doorn (Director of DANS) harvesting the panel’s comments

Which prompted session chair Peter Doorn (Director of DANS) to conclude: “The world is messy, the digital world is messier, and the research world is the messiest,” – no doubt thinking of a recent scandal in the Netherlands whereby a psychologist faked all of his survey data and got away with that for years before being caught out.

At the end of the session, Seamus Ross concluded that it had been “fun”, with those different points of view. All agreed, however, that the goal of audit and certification must be to make digital archives perform better. And that, somehow, sometime, we must come up with more objective metrics to benchmark digital archives, whether those be trust-based or risk-based. But preferably automated, embedded in the systems, because if we have to do all of this manually, we will never find enough funding.



Identifying the many faces of digital preservation research – iPRES2012 (2)

Published: Wed 03 Oct 2012

While many are struggling to implement the results of digital preservation research so far (after all, we cannot just buy another system every time new approaches are discovered), others are looking to “step beyond the limitations of solutions that are applicable now, and develop concepts, models and solutions for upcoming challenges” – which is a quote from the announcement for the Open Research Challenges in Digital Preservation workshop at iPRES2012. – by Inge Angevaare

“Get away from day-to-day reality,” said co-organizer Christoph Becker of TU Wien at the beginning of the workshop, “and really look ahead.” Taking advantage of the moment to pitch the SCAPE project Becker is involved in, he went on to say, “As amazing as the SCAPE project is, there still are a few research challenges left.”

Christoph Becker: “There are many perspectives, but they all relate to each other.”

Quite a bit more than a few, as the workshop demonstrated. The themes that were tabled by co-organizers Becker, Andreas Rauber (both of TU Wien) and Cal Lee of the University of North Carolina were broad:

  1. Digital information models
  2. Value, utility, cost, risk and benefit
  3. Organizational aspects
  4. Experimentation, simulation, and prediction
  5. Changing paradigms, shift, evolution
  6. Future content and the long tail.

No fewer than 60 digital preservation experts from around the world took these questions on in six round-table discussions in varying formations. Which led to an incredibly intense 8-hour session, which I will not summarize for you in a mere blog post. I simply could not do it, especially not on the same night. Fortunately, I do not have to. There will be a full report from the group chairs and I will come back to that in due course. But I would like to share some first impressions with you.

To get us off to a good start, there were no fewer than ten (!) ten-minute presentations with suggestions for research topics. You can find all of them on the workshop website, with short papers to clarify the positions. I suggest you have a look at that page, because it will give you a sense of the incredible variety and depth of topics we got to deal with – before the first coffee break.

The inevitable beamer-breakdown-moment befell René van Horik of the Dutch Data Archive DANS who presented the challenge of moving from theory into practice

What ensued during the workshop also reflected the many different aspects of digital preservation (DP). For DP is a truly interdisciplinary subject:

  • There were talks about “information models”: what are we preserving? What is information? What is data? What type of unit should we consider?
  • There were talks about the economic aspects: what is the value we produce? What is the cost? How can we balance costs against risks?

Rainer Schmidt chairing the table on value, utility, cost, risk and benefit

  • There were talks about the social and psychological dimensions: how are we, as humans, dealing with this incredibly new digital world? Can we cope with the “chaos” of the internet, and how?
  • There were talks about the organizational cultures in which the systems and models must be embedded:

Fiorella Foscarini: “Technology is never neutral – the cultural element should be on the digital preservation agenda.”

  • There were talks about Google and Amazon and whether perhaps we in the digital preservation community are trying to re-invent the wheel …
Quote of the day: “Where is OAIS? It is in DropBox.” (Petar Petrov)


  • There were talks about roles and responsibilities in an international information space while we still have nation states and national legal deposit schemes and concepts such as grey literature: they do not seem to fit anymore, but how to devise a new “world information order”?

Cal Lee organizing his group’s notes on organizational issues

At the end of the day, I think all of our heads were spinning with the multitude of issues to be considered.

Changing tables and topics every 25 minutes

But it was a great experience to freely exchange all our ideas with committed colleagues from all over the world. And when the dust settles, I am sure much good will come of it.

(to be continued!)

Theory and practice of digital preservation education – #iPRES2012 (1)

Published: Mon 01 Oct 2012

The most, what shall I say, dedicated digital preservation conference, iPRES2012, has come to Toronto, Canada. Five days of workshops, tutorials, presentations and panels – we have our work cut out for us. Let me reiterate how much I hate parallel sessions, because I have to forgo the one about practical emulation tools to attend one about an equally important subject: digital preservation education (website includes all of the slides). Let’s just hope the OPF and/or KEEP colleagues will blog about the emulation workshop. I’ll concentrate on education - by Inge Angevaare

Some of the workshop experts, from the left Helen Tibbo (U. of North Carolina), Joy Davidson (UK Digital Curation Centre), Cal Lee (U. North Carolina), Neil Grindley (UK JISC)

Unsurprisingly, there is a big need for digital preservation/digital curation training …

I need only remind you of one of Liz Lyon’s slides from the May LIBER curation workshop:

George Coulbourne of the Library of Congress (LoC) presented equally distressing data from a US survey:

… but your needs are not mine, and finding the right course if you want to learn something is not easy …

Joy Davidson (UK Digital Curation Centre or DCC): “There is a huge body of courses and trainings out there, but there is little guidance for prospective participants for making informed choices on where to go.” [As for the "huge body", I am not sure every country has the same level of offerings as the US and the UK, but that is another story.]

so there is a lot of work being done on structuring the marketplace, both the needs for training and the supply

A first, very useful and simple structure came from George Coulbourne of the US Digital Preservation Outreach and Education program (DPOE, pronounced “depot”, or Dutch: diepoo):

The DPOE pyramid

This one is pretty obvious. At the managerial level you need different bodies of knowledge/competences than at the practical level.

Other modelling efforts go a lot deeper. The intention is, of course, to define the needs as precisely as possible in order to design training that meets those needs. The UK Digital Curation Centre’s target audience is the sciences and the humanities. In the UK, “Vitae” has designed a “Researcher Development Framework” (or RDF – not the handiest of acronyms for those involved in data management …) listing all the competences and skills a researcher must have. That is a good starting point for defining researchers’ learning needs.

From another angle, the DCC’s Andrew McHugh highlighted “PORRO”, a Risk Relationship Ontology for digital preservation. That sounds (and is) a lot more complicated. PORRO is derived from DRAMBORA, a better-known tool for risk management. From what I understand, PORRO is about defining your goals, defining the threats to those goals, and defining the measures which you can take to mitigate those risks. Joy Davidson explained why this tool was presented at the workshop: because it is important for the education community to be aware of the work going on in the audit community. And the tool can also help to define training needs. (I haven’t found a URL for this work, so for more information, send an email to

Among the workshop attendees were Erik Moore (U. of New Brunswick), Barbara Sierman (Dutch National Library), Annemieke de Jong (Dutch Sound & Vision archive) and Jill Sexton (U. of North Carolina Library).

Joy Davidson presented the DCC’s work on criteria for Researcher Information Literacy Skills. She explained: “These are things to think about when you start developing a course”. And, of course, to evaluate courses. And the DCC’s lifecycle model was included in the workshop as a teaching tool.

The European DigCurV project is also working on a structure to define educational needs, this time at the vocational rather than the academic level (see website for more details):

Before you get tired of all this theory, I might mention that Joy Davidson stressed that all of these models are, in fact, feeding into each other, which no doubt is the result of lots of international networking, e.g., at the 2011 ANADP conference (see my blog post from the educational panel).

What about hands-on experience? 

“There is far too little of it,” was George Coulbourne’s simple summary. Coulbourne leads the US Library of Congress’s DPOE Program which also includes a “National Digital Stewardship Residency” initiative which is now in its beta phase.

Coulbourne: “Graduating students typically lack hands-on experience and the ability to work with others.”

We all know how often trainees are simply dropped into organizations to fend for themselves, with little learned by the end of their internship, so it was refreshing to hear of the NDSR approach: a two-week-long “immersion workshop” for post-graduates followed by “nine months of residency with well-defined, comprehensive hands-on projects with concrete deliverables.” In order to motivate the host institutions, they are involved in the very serious selection process. Good stuff, ambitious too.

The Library of Congress DPOE program also has a calendar of mostly practical courses which is very informative. In North Carolina, another platform for sharing curriculum materials and information has been developed, the Digital Curation Exchange (DCE). Someone suggested that all of this information should be brought together, but perhaps that is too much to ask. Maintaining a structured registry would probably be too large a task for any single institution to take on. Cal Lee of the University of North Carolina advocated a more spontaneous approach to such tools instead.

So, where is the rest of the world?

All of the above came from the US and the UK. Both are very active in digital curation education, but in a next workshop I would also like to hear what is happening in other countries, like Germany (nestor?), France, etc. As for the Netherlands, I was very happy with this workshop because it offered inspiration for things we might do in the Netherlands.

Shigeo Sugimoto: “Our culture is very different.”

When questioned about the situation in Japan, Shigeo Sugimoto reported that there are not many specialists in Japan. “My first job”, he said, “is to create awareness”. Sugimoto also suggested that digital curation for the sciences and humanities is quite different from digital preservation for (cultural) heritage institutions. Perhaps the curricula should be different also.

At the end of the day …

All of the above is very encouraging, as was emphasized by Neil Grindley (JISC) in his summing up. However, I did learn that there are two time bombs ticking underneath all of this wonderful work. One is budget cuts; everybody is feeling them. The other is the pace at which the world of digital preservation is changing. Curriculum development can hardly keep up. Obsolescence of the teaching tools themselves is just around the corner.

Other blog reports from iPRES 2012:

Sarah Jones on the same education workshop at

David Anderson of the KEEP emulation project chastising yours truly during lunch for not attending the emulation workshop …

Toronto from the CN Tower 450 meters above sea level – going up the tower seemed like the most efficient way to see the “whole city” in a couple of hours’ leisure time on Sunday

Comparing digital preservation approaches, part 3, conclusion – #ICA_2012 (7C)

Published: Mon 03 Sep 2012

Following up on my first and second posts on different digital preservation approaches among Australasian and the UK archives, let’s see if some general conclusions can be drawn with regard to archives and digital preservation. - by Inge Angevaare

Our host David Fricker (NAA): “Everything will be turned on its head.”

Jan Hutar (now of New Zealand Archives) said in his presentation that the archives seem to be slower to take up digital preservation issues than the libraries. I think that is quite understandable: for libraries digital content has been a reality for some time now, whereas most archives are still living in an analogue day-to-day reality because of legal transfer periods (typically 20 years or more). For most archives, the real digital storm is still to come. But when it comes, it will be a big storm. Cassie Findlay (NSW) explained that no two agencies are the same and there is a huge variety among their systems and approaches. Jan Hutar emphasized that Archives New Zealand expects quite a few databases, and these, as yet, present more questions than answers. To archive and provide (authentic, reliable) access to all of that is truly a challenge.

The vision

Most speakers at ICA 2012 emphasized that access is what archives are all about. The most succinct summary of the digital playing field and the business case for archives came from our host, David Fricker of the National Archives of Australia:

NAA’s David Fricker’s summary


Reality, obviously, lags behind this vision. In most countries, there is no legal basis whatsoever for archives’ wish to comply with users’ demands for instantaneous access. The NAA’s digital archive itself is far from being equipped to provide access “online, anywhere”. Also, as Simon Foudre (South Australia) put it: “Technology that helps generate material will continue to outpace the technology needed to preserve it.” Embedding preservation and archiving in the source systems is, of course, the most logical answer to the challenges, and it gets right to the root of the problem. But can it be done? Can the archives muster enough understanding of the agencies’ workflows, enough manpower, technology, skills and expertise to really become a trusted partner for agencies in the entire information lifecycle? Tricky issues lie ahead, such as distributed custody, custody over long-term temporary records, etc.

I can see the larger archives taking this on, but what about the small ones, the ones in developing countries? There was a wonderful presentation by Fabiao Nhatsave and Arlanza Dias from Mozambique. The archives in Mozambique are just starting out. In terms of leveraging their available resources they did what they could: train 11,600 (!) records managers. But, obviously, they have a long way to go and with much less funding than Western archives.

Mozambique: training 11,600 records managers …

Collaboration between smaller archives (and perhaps even with libraries, as in New Zealand) seems a logical way out of the money predicament. But only one speaker expressly addressed this issue, Robert Kretzschmar of Baden-Württemberg:

Robert Kretzschmar on local archives

I know of a plan for a shared services centre for smaller archives in the Netherlands, which was presented in 2010, but which, alas, still awaits funding. Money is tight at the moment, and there are three tiers of government involved …

Some notes on different approaches to DP issues

Having painted the larger picture, here are some notes on specific aspects of digital preservation and the choices made by various institutions:

To normalize or not to normalize: National Archives Australia (NAA) and the Public Records Office Victoria (PROV) say yes to normalization, because you do not want to get corrupted files into your system and discover only later that they are corrupted. Plus, it minimizes the workload. Archives New Zealand: no normalization, but some “pre-conditioning”. The UK National Archives (UK NA) do not normalize; they say that file formats are standardizing themselves. Also, only 25% of the records will ever be used, so it can be a substantial waste of effort. Plus: 85% of what, e.g., the 2012 Olympics will generate, consists of MS Office formats.

To develop your own systems or buy a vendor’s solution: Michael Carden of NAA emphasized that he regards digital preservation as core business. Thus NAA develops and maintains all of its own systems, as do PROV and New South Wales (NSW). I myself wonder where this leaves the small archives. Can they ever muster enough resources to build their own systems? Archives New Zealand works with a commercial system (Rosetta by Ex Libris), but “that decision was forced upon us” (Fleming). The Tessella Safety Deposit Box is used by a number of archives (among others, NA Netherlands and Municipal Archives Rotterdam).

Ex Libris stand at ICA 2012: “working to empower your library” … – faux-pas or visionary?

To keep content and metadata together, or keep them separate: PROV, NSW: together, to avoid the risk of broken linkages; NZ Archives: separate, but with safeguards against broken linkages.

To quarantine or not to quarantine: NAA quarantine typically is 28 days; PROV quarantine is 3 months. Archives New Zealand: continuous virus checking, because some viruses crop up later than 3 months.

When is custody completed? PROV waits until the entire ingest process (including quarantine and two virus checks) is completed before sending the agency an explicit “custody report”. Archives New Zealand still has to arrange for something like this.
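The quarantine and custody rules above can be read as a simple gate. Here is a minimal sketch in Python, assuming that custody is confirmed only once the quarantine period has passed and two virus checks have succeeded. The 90-day figure is my approximation of PROV’s “3 months”. The class and function names are hypothetical and do not describe either archive’s actual software.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Quarantine lengths as reported at the workshop.
# Assumption: PROV's "3 months" is approximated as 90 days.
QUARANTINE_DAYS = {"NAA": 28, "PROV": 90}

@dataclass
class Transfer:
    archive: str
    received: date
    virus_checks_passed: int = 0

def quarantine_ends(transfer):
    """Date on which the quarantine period is over."""
    return transfer.received + timedelta(days=QUARANTINE_DAYS[transfer.archive])

def custody_complete(transfer, today, required_checks=2):
    """True once quarantine has ended and enough virus checks have passed."""
    return (today >= quarantine_ends(transfer)
            and transfer.virus_checks_passed >= required_checks)

transfer = Transfer("PROV", date(2012, 9, 1), virus_checks_passed=2)
print(quarantine_ends(transfer))                     # 2012-11-30
print(custody_complete(transfer, date(2012, 10, 1))) # False
```

Archives New Zealand’s continuous virus checking would not fit this one-off gate; it would need a recurring check instead.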

On security: NAA is pretty much a dark archive, which is very secure. PROV provides access immediately, and it has implemented 4 levels of security (firewall, etc.) within the system.

Physical and digital records integrated? Yes, says PROV. For the user a record is a record.

Online transfer or through physical media? PROV has an online inbox, but that is hardly ever used. Typically, records are delivered on physical media. So far, of course …

Preservation strategies: Jan Hutar (Archives New Zealand): “We consider both migration and emulation as viable strategies. Migration has to be done again and again. Emulation is more complicated.” Other archives report that they have not yet performed preservation actions (other than normalization at ingest).

A last note

My reporting on ICA 2012 has been nowhere near complete, of course. I have singled out the presentations that had something to do with digital preservation, in the context of the archival function in the digital age. There was plenty to learn, I am happy to say (from Taipei Airport, where the wifi is good; after some delays I am finally on my way home).

The next ICA congress will be ICA_2016 in Seoul, Korea. The theme: “archives, harmony & friendship”. I wonder how that is going to translate into the digital world …

Tangalooma, 50 years after the whaling station was closed



Comparing digital preservation approaches, continued – #ICA_2012 (7B)

Published: Sat 01 Sep 2012

Following on from my previous post from ICA_2012, on Friday the Australasian Digital Recordkeeping Initiative (ADRI) organized a truly inspiring workshop. Ably led by Simon Foudre of South Australia (the only state in Australia that does not yet have a digital archive for public records … [-- corrected by Andrew Wilson in comments below: PROV and NSW are the only states with a digital archive]), the workshop showcased four different approaches to digital preservation: the National Archives of Australia (NAA), New South Wales (NSW), Victoria (PROV) and Archives New Zealand. After an introduction to each of these initiatives, the 30 workshop attendees were invited to do a SWOT analysis of all four approaches. - by Inge Angevaare

Simon Foudre leading the ADRI workshop

What impressed me most about the workshop was the honesty and willingness to share and learn of each of the initiatives involved. What surprised me a bit, was that in a small country like Australia (I mean, there are only 22 million inhabitants), the states would have different approaches. When I asked about that, I was told that state autonomy is quite a big thing in Australia. Also, digital preservation is probably too young a discipline to put all of our eggs in one basket. We need room to experiment. And the States do exchange information, e.g., in ADRI, to learn from each other. Andrew Waugh (PROV, Victoria): “OAIS is great … for two months. Once you start designing a system, its usefulness declines fast. Connecting with existing systems is something OAIS does not provide for. Then you should talk to other archives.” (See also Waugh’s comments during the plenary session in previous post).

Each table was asked to discuss the strengths and weaknesses of a particular system.

On SWOT analyses

At the risk of stating the obvious: SWOT analyses are rarely black-and-white judgements. Any strength can be a weakness as well. Andrew Waugh: “Our greatest strength is that we have an operational digital archive [since 2005]. Our greatest weakness is that we have an operational digital archive, because it means that the urgency for granting funding for much-needed improvements is much less.” Similarly, the NAA’s system is very secure, because there is hardly any access – but if you make access your main goal, it turns into a weakness (see previous post). Some of the weaknesses identified, especially of the older systems (NAA, PROV), have been recognized by those that manage the systems, but require additional funding. As Simon Foudre emphasized, we live in an era of “heightened fiscal prudence” (some understatement) – which is aggravated by increased user expectations for instant access and fast technological changes.

I described the approaches of NAA and PROV in some detail in my previous post, so here is some more information about the other two participants: New South Wales and the New Zealand Archives.

Taking proactivity to a whole new level: State Records New South Wales

Cassie Findlay of State Records New South Wales (NSW) agreed with many of the principles outlined by Hofman of the Dutch Archives (earlier post): “State Records and NSW government agencies need to work together to tackle digital archives challenges.” Records are a by-product of the agencies’ core business, and therefore the archives have to offer whatever they can in terms of tools and advice to make it easier for agencies to manage and transfer their records. They’ve got to get involved at the creation phase.

Cassie Findlay: “Systems migrations is what agencies know.”

State Records NSW has decided not to accession individual records, but to migrate entire recordkeeping systems forward. For each agency, a migration project is set up which is specifically attuned to the agency’s requirements. “Each project will have different requirements, but all will involve analysis of recordkeeping system structures, metadata; preservation issues; possible migration protocols; indexing points/potential uses.” The options for storing the recordkeeping systems are:

  • as objects with metadata in text files
  • XML DB normalisation
  • RDF/XML DB normalisation
  • Migration to non-proprietary database, e.g., SQLite
  • … probably combinations of these.
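To give an idea of what the SQLite option might look like, here is a minimal sketch in Python using the standard `sqlite3` module. The two-table layout (records plus metadata terms), the function name and the sample data are my own assumptions for illustration; they do not describe the schema State Records NSW actually uses.

```python
import sqlite3

# Hypothetical target schema: one table for records, one for metadata terms.
SCHEMA = """
CREATE TABLE record (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    content BLOB
);
CREATE TABLE metadata (
    record_id INTEGER NOT NULL REFERENCES record(id),
    term      TEXT NOT NULL,
    value     TEXT NOT NULL
);
"""

def migrate_to_sqlite(source_rows, path=":memory:"):
    """Copy records exported from a source system into a new SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    for row in source_rows:
        cur = conn.execute(
            "INSERT INTO record (title, content) VALUES (?, ?)",
            (row["title"], row["content"]),
        )
        # Keep the metadata linked to the record through its generated id.
        conn.executemany(
            "INSERT INTO metadata (record_id, term, value) VALUES (?, ?, ?)",
            [(cur.lastrowid, term, value) for term, value in row["metadata"].items()],
        )
    conn.commit()
    return conn

# Made-up export from an agency system, for illustration only.
sample_rows = [
    {"title": "Minutes 2011-03", "content": b"minutes text",
     "metadata": {"creator": "Agency X", "date": "2011-03-14"}},
]
conn = migrate_to_sqlite(sample_rows)
print(conn.execute("SELECT COUNT(*) FROM metadata").fetchone()[0])  # 2
```

Keeping content and metadata in one database file is also one way to avoid the broken linkages that worry PROV and NSW.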

NSW also plans to make available a digital archives dashboard, with all sorts of tools and resources: a register of migration pathways, a register of metadata terms, file format identification tools, metadata mapping tools, forms and templates and a knowledge base. This dashboard will be based on best practices, and will grow over time. Also, the approach is open to distributed custody of records. More info on the project’s blog.

Discussions continue during lunch break. Is the NSW approach sustainable? Only time will tell.

Workshop attendees agreed that the approach takes digital archiving to a whole new level. They saw the flexibility of the approach as a great strength, and the development of entirely new relationships between agencies and archives as a great way to prove the value of archives. However, the approach is resource intensive, it requires skilled staff, and it does not (yet) nail the management of source records.

Sharing a system with the national library: Archives New Zealand

Alison Fleming of Archives New Zealand told the workshop about the “interesting journey” which has been the result of a May 2010 government decision that the core of the New Zealand National Library’s Rosetta system must be shared with the national archives.

Alison Fleming pointing to the part of the system (circle) that is shared with the National Library.

Some of the key principles of the Archives New Zealand approach are:

  • share central digital preservation system with National Library
  • describe digital archives at item level, not just aggregates
  • integration of digital and non-digital processes, automation
  • support for agencies is critical
  • “file formats we accept” depend on circumstances
  • digital repository will hold digitised material too
  • “We will not get it right first time, or even for a long time”
  • collaboration is critical.

Fleming reported that “libraries and archives do speak different dialects of the same language.” “This has led to robust discussions about occasions where the library and the archives make different choices.” In my own country, the Dutch National Archives and National Library are merging in 2013, which also was a political decision. A merger of the digital repositories has not yet been decided upon; perhaps they should talk to New Zealand first?


In a separate, more technical presentation about preservation strategies, Jan Hutar (who moved from the Czech Republic to New Zealand) summarised the “Rendering Matters” report recently published by the Archives:

Jan Hutar, now of Archives New Zealand

On the subject of preservation strategies, Hutar concluded:

In the workshop SWOT analysis, the use of a commercial system (Rosetta) was identified as a weakness rather than a strength; there were doubts whether the archives could get their information out if they wanted to make major changes. Interestingly, budget cuts were not only identified as a threat; they were also seen as an opportunity to get involved in actual collaboration rather than “tick-the-box collaboration”.

Noting strengths, weaknesses, opportunities and threats

So much for now; I will write a third post with some (tentative) conclusions. First I have some hassles to clear up. I am still down under. My flight was cancelled; a delay of at least 48 hours. So much for trying to save money …

On a positive note: yes, I did get to see some active whales and at least two beautiful jumps. Taking pictures was more of a challenge. The seas were rough, I needed both hands on the railing most of the time just to stay on the boat. At least I was one of the lucky ones who did not get sick. Interestingly, whale watching now brings in more money to the area than whale catching and slaughtering did in the 1950s and early 1960s. That’s the way.


Comparing digital preservation approaches – #ICA_2012 (7A)

Published: Tue 28 Aug 2012

For many years, Australia and New Zealand have been at the forefront of developments in digital preservation. As digital preservation is core business for all archives, ICA 2012 (International Council on Archives, Brisbane) presented an excellent opportunity to compare notes on different approaches. In what follows (and the next post) I shall be mixing and mashing notes from three plenary and breakout sessions with those from a full-day’s dedicated digital preservation workshop on Friday during which the attendees did a SWOT analysis of different Australasian approaches to digital preservation. - by Inge Angevaare

A full house to hear Michael Carden and Andrew Waugh speak about the “digital continuity” experiences of the Australian National Archives and Victoria Public Records Office respectively.

Setting the scene: the Australian National Archives

Michael Carden of the Australian National Archives (NAA) expertly set the scene for the discussion of Australasian digital preservation approaches, taking his cue from the 2002 Green Paper which defined the concept of “performance” as the “thing” to be preserved. NAA have chosen to “normalize” all incoming file formats to standards-based open file formats and developed the Xena software to do this. The so-called Digital Preservation Recorder (DPR) manages this process and leaves an audit trail – all details to be found in Carden’s informative full paper. Carden asserted that digital preservation is “core business” for archives, and therefore not something to be outsourced. All of the technology is developed in-house and made available to the community in open source, “which gives transparency to the process; others can check us.”

Carden was very confident in the NAA’s ability to deliver authentic and reliable records. He mentioned scaling as the major challenge ahead. During the workshop, another possible weakness was mentioned: the NAA is a preservation-only system. Very few people have access. This makes the system quite secure, of course. I also learned that it was a matter of funding: funding was only provided for a preservation system. So far that is not a problem, because the records in the system (from government agencies that have been decommissioned and some private records) are not yet eligible for access (due to the legal transfer period of 20 years, which equals the embargo period on access). But, eventually, more funding will have to be secured to build an access module. Also, the original NAA design deals only with records that have been legally transferred to the archive. As we shall see below, thinking has meanwhile evolved into a more proactive direction.

Michael Carden (right) and Andrew Waugh (middle) discussing their approaches with Hans Hofman of the Dutch Archives

Victoria Public Records Office: “The challenge is to get the records”

Andrew Waugh of the Victoria Public Records Office (PROV) readily admitted that his institution’s experience “stands on the shoulders of giants”, referring, among others, to the National Archives. But the PROV system was designed to provide access right away, in compliance with the legislation of the State of Victoria. PROV, like the NAA, normalizes the records it receives, while keeping a copy of the original. However, long-term access is guaranteed only for the preservation-format copies – “if rendering engines are not available, we will either build our own or migrate the long-term preservation formats.” Waugh added, “We do not consider that this will occur for many years,” echoing what David Rosenthal from LOCKSS has been arguing for some time in the library community.

PROV is built on the basis of commercially available, but fully compliant, components (Centera). It seeks to integrate physical and digital records into one system, as “a record is a record”. According to Waugh, one of the major flaws of the OAIS Reference Model is that it looks at digital records in isolation and does not make room for existing systems. The PROV system keeps digital objects and metadata together for fear of linkages being broken.

Andrew Waugh presenting the Victoria digital archive during the workshop.

Waugh drew special attention to the challenge of “getting the records”. This involves making sure that digital records are created and that they are transferred. “Currently,” Waugh asserted, “it is not economic for agencies to deal with archiving.” “We must work with agencies to make it less expensive to transfer records, and to transfer early. This includes dealing with non-mature (temporary) records.” This is a change from the traditional position of archives to deal only with mature records, and it is a change in attitude that was echoed by other speakers (including Hans Hofman). (Sorry, no full paper available for Waugh.)

UK National Archives: “Every decision should be based on usage”

Oliver Morley of the UK National Archives (no full paper available) disagreed with his Australian colleagues about file format normalization. He argued that “digital formats have standardized” by themselves: 99% of the records received have top-20 file formats, and most of those are regular Office formats. Besides, Morley argued, only 25% of what archives keep will ever be accessed, so normalizing everything can be a waste of effort.

Oliver Morley: “Social media still present a problem. But do not panic. Most content is not what you would collect.”

Web archives are another non-problem for Morley. “The challenge is not quite as big as I thought,” he said. “Public sector agencies typically produce small websites, and what is published is captured automatically.” Morley does acknowledge that e-mail is a problem. “We have no solution for e-mail yet. There are huge selection problems.”

Here are the principal features of the UK NA’s preservation policy:

So much for now; more (especially on the workshop) in the next post, scheduled for a few days from now. I cannot resist the temptation to go out and (try to) see some of the whales that are migrating past Australia’s east coast during the winter months.

A local tells me that until about 20 years ago, the church and the town hall were the highest buildings in Brisbane …

… I have no way to verify this, but the official guide book of the city proudly declares that the oldest surviving home in Brisbane dates from 1846 — which is perhaps why the visitor’s bureau advertises Brisbane as “Australia’s new world city”

“Change”, but how? The archival function in the digital age – #ICA_2012 (5)

Published: Thu 23 Aug 2012

At the ICA 2012 congress, Hans Hofman of the Dutch National Archives took a step back from practical issues in order to look more fundamentally at the core of “the archival function” in present-day society. What exactly has the digital age brought about in society at large, and what are the implications for the role the archives can and must play? - by Inge Angevaare

Hofman preparing for his speech

To my mind, we don’t do this enough: stepping back from “life as it has always been” and from day-to-day practicalities to look at the essence of what we are doing and why, so this is a welcome effort by the Dutch National Archives (for details, see Hofman’s full paper).

Here are the key changes in our society according to Hofman:

The digital revolution

As a consequence, key issues facing society include:

  • Erosion of citizen trust in government
  • Erosion of government services
  • Loss of corporate/national memory
  • Loss of individual identity
  • Threat to individual rights.

These are the questions that archives must ask themselves:

Questions to be asked

Pondering these questions, the Dutch National Archives have formulated the following guidelines for the changes they must make. The framework has been adopted only recently, Hofman stressed, and still awaits implementation. But it contains some very valuable principles, which I gladly reproduce in full:

Guiding principles for the archival function in the digital age

  1. The business process is leading, information management or record keeping is supportive and should be integrated.
  2. The National Archives should be involved right from the beginning (e.g., the planning and design of business processes, underlying (information) architectures and supporting systems).
  3. Focus on all information (or records) of government; comprehensive approach both in scope and in time; not only on archival records (This includes the need to be aware that different types of information exist and that each of them may have a different value to different audiences).
  4. All connections/linkages of a record (concept of a record) should be accounted for.
  5. A systematic approach to record keeping should be used based on a risk management approach (At which moments are records most vulnerable?)
  6. Government information is public and freely available, unless … (This entails that the National Archives aims to ensure the right to information for the public [“the public interest”], including both government and private sector information, while respecting privacy rights, freedom of information, etc.)

In other words: the “archival function” comprises the entire lifecycle of all government records, and there are more players in this lifecycle than archives. This is a fundamental change from the old situation, in which archives would only be involved in those records which are intended to be kept.

Lots of food for thought, I should think. The only thing I would add from the perspective of the National Coalition is an appeal to look beyond the borders of the archival sector at the entire information space. As described earlier, issues such as the preservation of the web and social media may require a broader approach which includes libraries, data centers and other stakeholders, nationally and internationally.

The Dutch delegation includes, among others, Jantje Steenhuis and Mies Langelaar (Rotterdam Municipal Archives), Kees Schabbing (North-Holland Province) and (behind them, at right) Reinder van der Heide (Dutch Parliament)


Seamless integration between source systems and digital archives: Can we have it? Do we need it? – #ICA_2012 (4)

Published: Wed 22 Aug 2012

To my mind, conferences are at their best when they manage to bring together different viewpoints on sticky issues within a single session. The ICA_2012 congress managed to do just that on Tuesday afternoon, when Estonian archivist Kuldar Aas and Tessella’s Robert Sharpe reflected on what – I assume – is the ultimate digital archivist’s dream: seamless integration or transition of records from government agencies to archives. - by Inge Angevaare

Kuldar Aas presenting; at left Rob Sharpe and Master of Ceremonies Steve Stuckey

First, let’s look at the problem. Records which are to be archived inevitably come from many sources with different systems and thus they are likely to have all sorts of different metadata schemas which, inevitably, do not match the archival description formats used by digital archives. Standards are designed to help solve these problems, but, as all of us know, they are typically not complied with. Aas: “Standards are only being used as inspiration for tenders.”
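To make the mismatch concrete, here is a minimal sketch of a metadata crosswalk in Python. It is purely illustrative: the field names and the mapping are hypothetical, and it is not taken from any of the tools discussed in this session. It only shows what happens to source fields that have no counterpart in the archival description format.

```python
# Hypothetical crosswalk: source field name -> archival description field.
# None of these names come from a real agency schema or archival standard.
CROSSWALK = {
    "doc_title": "title",
    "created": "date",
    "dept": "creator",
}


def map_to_archival(source_record, crosswalk=CROSSWALK):
    """Split a source record into mapped fields and fields without a mapping."""
    mapped, unmapped = {}, {}
    for key, value in source_record.items():
        if key in crosswalk:
            mapped[crosswalk[key]] = value
        else:
            # Kept aside for review rather than silently dropped.
            unmapped[key] = value
    return mapped, unmapped
```

Every source system needs its own mapping table, and whatever ends up in the unmapped pile has to be dealt with by hand, which is where much of the ingest effort goes.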

Kuldar Aas: “It’s feasible, in about 5 years”

To deal with the issue, the Estonian National Archives designed a software tool, the Universal Archiving Module or UAM. This tool is designed to streamline the ingest process (more details in Aas’s full paper on the ICA website):

UAM software tool

The tool has now been in use for a few years, and here is some feedback from the agencies:

Feedback from the agencies

Interestingly, the agencies report that the tool has forced them to get their records management better organised. Now, that’s music in any archivist’s ears. On the negative side, they regret losing flexibility, which is inherent in any standardization process. And implementation is still very time-consuming.

UAM: lessons learned

Aas concluded that “seamless integration” is not yet possible at the present time, but he expects it to become possible in, say, five years.

Robert Sharpe: “We don’t really need it.”

Robert Sharpe represents Tessella, a vendor of digital preservation systems (Safety Deposit Box). He agreed with the problem, but challenged Aas’s solution, arguing that the combined schema will change over time, thus requiring further conversions or forcing the system to work with multiple versions. Also, Sharpe argued, every conversion carries the risk of data loss. As an alternative, Tessella designed a system that can work with multiple metadata schemas (full details in Sharpe’s full paper).

Rob Sharpe: “Why bother?”

Here’s Tessella’s alternative:

Tessella’s alternative approach

And here are the advantages, according to Sharpe:

Advantages of the Tessella approach

Now, I am in no position to tell which approach is “better” (if such can be determined at all, at this stage), but there is one thing about Sharpe’s approach that appealed to me very much: the fact that it reduces barriers to ingest, that it allows organizations to get stuff into their systems without much ado. All too often valuable data remain “on the other side of the wall”, because ingest is too problematic. In this way, at least, the data gets into a system where it is protected and backed up. Extra metadata can always be added at a later stage. I am reminded of the social media debate (yesterday’s post): because it is complicated, nothing is done at the moment, and that is certainly the worst of options.

On the other hand: how does this compare with the adage “garbage in, garbage out”, in other words: “What about access?” Sharpe: “Nowadays metadata are not the only way to search content, there are such facilities as full-text search. Besides, if you make use of the original metadata, you might even be able to get in deeper.”

“Might this approach be too simple for complicated data such as census data?”, someone asked. “That is quite possible,” Sharpe allowed.
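The idea of ingesting records with their original metadata, searching across whatever is there, and adding metadata later can be sketched as follows. This is my own minimal illustration in Python, not Tessella’s design: the class, the function names and the naive substring search are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedRecord:
    content: bytes
    schema_name: str        # label of the source schema, kept as delivered
    source_metadata: dict   # original metadata, stored without conversion
    added_metadata: dict = field(default_factory=dict)  # enrichment added later


def ingest(content, schema_name, source_metadata):
    """Accept a record without converting its metadata to a common schema."""
    return ArchivedRecord(content, schema_name, dict(source_metadata))


def search(records, term):
    """Naive search over original metadata, added metadata and full text."""
    term = term.lower()
    hits = []
    for record in records:
        haystack = " ".join(
            [str(v) for v in record.source_metadata.values()]
            + [str(v) for v in record.added_metadata.values()]
            + [record.content.decode("utf-8", errors="ignore")]
        ).lower()
        if term in haystack:
            hits.append(record)
    return hits
```

Nothing is converted at ingest, so nothing can be lost in conversion; the price is that search has to cope with whatever field names each source happened to use.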

Entr’acte: archivist, diplomat’s wife, spy, novelist: the story of Dame Stella

On another note altogether, there was a charming presentation by Dame Stella Rimington. She started her career as an archivist, then became a diplomat’s wife in Delhi (mostly hosting tea parties), eventually to be recruited by MI5, the British Security Service, where she ended her career as the first female Director General. Since then, she has written seven novels …

Dame Stella discussing secrecy and freedom of information with ICA President (and Dutch National Archivist) Martin Berendse

Dame Stella described the Cold War years when everything revolved around secrecy. She indicated that the emergence of terrorism, and the consequent need to share information, in large part contributed to the present demand for more openness. However, she strongly condemned such initiatives as Wikileaks, which “indiscriminately” leak information, putting “live sources” (now there’s a spy-word!) at risk. She warned that the risk of leaks will cause government agencies to make decisions without leaving any paper trail, which is quite the opposite of what movements like Wikileaks profess to strive for.

She concluded by saying that since her own days as an archivist (when her main worry was to prevent parchment records from being turned into fashionable lamp shades), the life of an archivist has become more complicated, “and thus more interesting.” Now that’s the spirit!

Social media debate testimony to “climate of change” for archives – #ICA_2012 (3)

Published: Tue 21 Aug 2012

This year’s Congress of the International Council on Archives (or ICA_2012, Brisbane) is entitled “A climate of change”. That is quite a broad motto, and it could easily just be a cliché without much meaning. So I was more than pleasantly surprised when the first keynote speaker, US archivist David Ferriero, took on the complicated issue of social media. – by Inge Angevaare

David Ferriero: “I expect my staff to anticipate what technology government is going to use.”

Ferriero had no doubts at all about whether archives should involve themselves with social media. Government officials have embraced the new technologies, as witnessed, among others, by the Obama Administration, and thus quite a few social media messages will become public records. In fact, Ferriero’s office issued guidance on when and why social media content would become public records:

  • “Is the information unique and not available anywhere else?
  • Does it contain evidence of an agency’s policies, business, mission, etc.?
  • Is this tool being used in relation to the agency’s work?
  • Is use of the tool authorized by the agency?
  • Is there a business need for the information?

If the answers to any of the above questions are yes, then the content is likely to be a Federal record.”
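The guidance boils down to a simple “any of” rule, which can be sketched in a few lines of Python. The field names below are my own hypothetical labels for the five questions, not part of the official guidance, and the outcome is only the “likely” indication that the guidance itself gives.

```python
from dataclasses import dataclass, fields


@dataclass
class SocialMediaContent:
    """Answers to the five questions; the field names are illustrative only."""
    unique_information: bool = False           # unique, not available elsewhere?
    evidence_of_agency_business: bool = False  # evidence of policies, business, mission?
    used_for_agency_work: bool = False         # tool used in relation to the agency's work?
    tool_authorized: bool = False              # use of the tool authorized by the agency?
    business_need: bool = False                # business need for the information?


def likely_federal_record(content: SocialMediaContent) -> bool:
    """Return True if the answer to any of the five questions is yes."""
    return any(getattr(content, f.name) for f in fields(content))
```

Because a single “yes” is enough, the test is deliberately broad: content falls outside it only when all five answers are no.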

Ferriero indicated that 900 (!!) staff within his organization are involved in social media projects, but did not go into details as to what these projects are.

Some of the 1,000 delegates from 95 countries watching an Aboriginal dance during the opening ceremony.

I could not help but think back to an interview I did some years ago with the Dutch national archivist. As late as 2009 he argued that websites were publications, and belonged in a library rather than an archive. Times are indeed changing!

ICA President Martin Berendse: “A climate of change — and I am glad of it.”

Later in the day, Günther Schefbeck of the Austrian parliamentary administration took a closer look at just what these social media are. His analysis of our newly evolved “network society” and the dynamics that are at play there may have been a bit theoretical for some of the audience, but his conclusion was unmistakable: content on social media is totally dependent upon but a few large “hubs” (Facebook, Twitter, etc.). If any of those goes out of business, all of the content is lost with them.

Günther Schefbeck referred to the EU ARCOMEM project, which is to address some of the social media issues.

Is that something archives should worry about? Schefbeck readily admits that there is a lot of “Bored” and “Me too” content out there that really does not merit archiving. But, increasingly, important societal debates take place on social media which influence public decision-making. As such, Schefbeck argues that they should become part of the public record. But who is responsible for archiving such content? Archives and libraries keep discussing the issue amongst themselves, Schefbeck stated, but so far, nothing happens. Schefbeck: “We must discuss this issue seriously, and we must discuss it soon.” Because if we wait too long, the content may be lost forever.

A captive audience for Günther Schefbeck

Now this is just the type of debate that my own organization, the inter-sectoral Netherlands Coalition for Digital Preservation, might facilitate. But in my experience the likely custodians of social media content are reluctant to get involved. And who could blame them? Schefbeck listed the many technical and legal obstacles that need to be overcome (check out his paper, link to follow soon).

However, when you look at preservation from the point of view of risk, this is the type of content we should be putting our money into. Most of the paper records we are presently digitizing will easily keep for a few more decades. But then again, the issue of social media is so much more complicated.

That’s all for now. There is, as usual, more to tell, but that must wait. To be continued!

Brisbane Central Business District