
Theory and practice of digital preservation education – #iPRES2012 (1)


Published: Mon 01 Oct 2012

The most, what shall I say, dedicated digital preservation conference, iPRES2012, has come to Toronto, Canada. Five days of workshops, tutorials, presentations and panels – we have our work cut out for us. Let me reiterate how much I hate parallel sessions, because I have to forgo the one about practical emulation tools to attend one about an equally important subject: digital preservation education (website includes all of the slides). Let’s just hope the OPF and/or KEEP colleagues will blog about the emulation workshop. I’ll concentrate on education - by Inge Angevaare

Some of the workshop experts, from the left Helen Tibbo (U. of North Carolina), Joy Davidson (UK Digital Curation Centre), Cal Lee (U. North Carolina), Neil Grindley (UK JISC)

Unsurprisingly, there is a big need for digital preservation/digital curation training …

I need only remind you of one of Liz Lyon’s slides from the May LIBER curation workshop:

George Coulbourne of the Library of Congress (LoC) presented equally distressing data from a US survey:

… but your needs are not mine, and finding the right course if you want to learn something is not easy …

Joy Davidson (UK Digital Curation Centre or DCC): “There is a huge body of courses and trainings out there, but there is little guidance for prospective participants for making informed choices on where to go.” [As for the "huge body", I am not sure every country has the same level of offerings as the US and the UK, but that is another story.]

so there is a lot of work being done on structuring the marketplace, both the needs for training and the supply

A first, very useful and simple structure came from George Coulbourne of the US Digital Preservation Outreach and Education program (DPOE, pronounced “depot”, or Dutch: diepoo):

The DPOE pyramid

This one is pretty obvious. At the managerial level you need different bodies of knowledge/competences than at the practical level.

Other modelling efforts go a lot deeper. The intention is, of course, to define the needs as precisely as possible in order to design training that meets those needs. The UK Digital Curation Centre’s target audience is the sciences and the humanities. In the UK, “Vitae” has designed a “Researcher Development Framework” (or RDF – not the handiest of acronyms for those involved in data management …) listing all the competences and skills a researcher must have. That is a good starting point for defining researchers’ learning needs.

From another angle, the DCC’s Andrew McHugh highlighted “PORRO”, a Risk Relationship Ontology for digital preservation. That sounds (and is) a lot more complicated. PORRO is derived from DRAMBORA, a better-known tool for risk management. From what I understand, PORRO is about defining your goals, defining the threats to those goals, and defining the measures which you can take to mitigate those risks. Joy Davidson explained why this tool was presented at the workshop: because it is important for the education community to be aware of the work going on in the audit community. And the tool can also help to define training needs. (I haven’t found a URL for this work, so for more information, send an email to andrew.mchugh@glasgow.ac.uk.)

Among the workshop attendees were Erik Moore (U. of New Brunswick), Barbara Sierman (Dutch National Library), Annemieke de Jong (Dutch Sound & Vision archive) and Jill Sexton (U. of North Carolina Library).

Joy Davidson presented the DCC’s work on criteria for Researcher Information Literacy Skills. She explained: “These are things to think about when you start developing a course”. And, of course, to evaluate courses. And the DCC’s lifecycle model was included in the workshop as a teaching tool.

The European DigCurV project is also working on a structure to define educational needs, this time at the vocational rather than the academic level (see website for more details):

Before you get tired of all this theory, I might mention that Joy Davidson stressed that all of these models are, in fact, feeding into each other, which no doubt is the result of lots of international networking, e.g., at the 2011 ANADP conference (see my blog post from the educational panel).

What about hands-on experience? 

“There is far too little of it,” was George Coulbourne’s simple summary. Coulbourne leads the US Library of Congress’s DPOE Program, which also includes a “National Digital Stewardship Residency” (NDSR) initiative, now in its beta phase.

Coulbourne: “Graduating students typically lack hands-on experience and the ability to work with others.”

As we all know, trainees are often simply dropped in organizations to fend for themselves, with little learned at the end of their internship, so it was refreshing to hear of the NDSR approach: a two-week-long “immersion workshop” for post-graduates followed by “nine months of residency with well-defined, comprehensive hands-on projects with concrete deliverables.” In order to motivate the host institutions, they are involved in the very serious selection process. Good stuff, ambitious too.

The Library of Congress DPOE program also has a calendar of mostly practical courses which is very informative. In North Carolina, another platform for sharing curriculum materials and information has been developed, the Digital Curation Exchange (DCE). Someone suggested that all of this information should be brought together, but perhaps that is too much to ask. Maintaining a structured registry would probably be too large a task for any single institution to take on. Cal Lee of the University of North Carolina advocated a more spontaneous approach to such tools.

So, where is the rest of the world?

All of the above came from the US and the UK. Both are very active in digital curation education, but in a next workshop I would also like to hear what is happening in other countries, like Germany (nestor?), France, etc. As for the Netherlands, I was very happy with this workshop because it offered inspiration for things we might do in the Netherlands.

Shigeo Sugimoto: “Our culture is very different.”

When questioned about the situation in Japan, Shigeo Sugimoto reported that there are not many specialists in Japan. “My first job”, he said, “is to create awareness”. Sugimoto also suggested that digital curation for the sciences and humanities is quite different from digital preservation for (cultural) heritage institutions. Perhaps the curricula should be different also.

At the end of the day …

All of the above is very encouraging, as was emphasized by Neil Grindley (JISC) in his summing up. However, I did learn that there are two time bombs ticking underneath all of this wonderful work. One is budget cuts; everybody is feeling them. The other is the pace at which the world of digital preservation is changing. Curriculum development can hardly keep up. Obsolescence of the teaching tools themselves is just around the corner.

Other blog reports from iPRES 2012:

Sarah Jones on the same education workshop at http://www.dcc.ac.uk/blog/dcc-training-workshop-ipres-2012

David Anderson of the KEEP emulation project chastising yours truly during lunch for not attending the emulation workshop …

Toronto from the CN Tower, 450 meters above sea level – going up the tower seemed like the most efficient way to see the “whole city” in a couple of hours’ leisure time on Sunday

Comparing digital preservation approaches, part 3, conclusion – #ICA_2012 (7C)


Published: Mon 03 Sep 2012

Following up on my first and second posts on different digital preservation approaches among Australasian and UK archives, let’s see if some general conclusions can be drawn with regard to archives and digital preservation. - by Inge Angevaare

Our host David Fricker (NAA): “Everything will be turned on its head.”

Jan Hutar (now of New Zealand Archives) said in his presentation that the archives seem to be slower to take up digital preservation issues than the libraries. I think that is quite understandable: for libraries digital content has been a reality for some time now, whereas most archives are still living in an analogue day-to-day reality because of legal transfer periods (typically 20 years or more). For most archives, the real digital storm is still to come. But when it comes, it will be a big storm. Cassie Findlay (NSW) explained that no two agencies are the same and there is a huge variety among their systems and approaches. Jan Hutar emphasized that Archives New Zealand expects quite a few databases, and these, as yet, present more questions than answers. To archive and provide (authentic, reliable) access to all of that is truly a challenge.

The vision

Most speakers at ICA 2012 emphasized that access is what archives are all about. The most succinct summary of the digital playing field and the business case for archives came from our host, David Fricker of the National Archives of Australia:

NAA’s David Fricker’s summary

Reality

Reality, obviously, lags behind this vision. In most countries, there is no legal basis whatsoever for archives’ wish to comply with users’ demands for instantaneous access. The NAA’s digital archive itself is far from being equipped to provide access “online, anywhere”. Also, as Simon Foudre (South Australia) put it: “Technology that helps generate material will continue to outpace the technology needed to preserve it.” Embedding preservation and archiving in the source systems is, of course, the most logical answer to the challenges, and it gets right to the root of the problem. But can it be done? Can the archives muster enough understanding of the agencies’ workflows, enough manpower, technology, skills and expertise to really become a trusted partner for agencies in the entire information lifecycle? Tricky issues lie ahead, such as distributed custody, custody over long-term temporary records, etc.

I can see the larger archives taking this on, but what about the small ones, the ones in developing countries? There was a wonderful presentation by Fabiao Nhatsave and Arlanza Dias from Mozambique. The archives in Mozambique are just starting out. In terms of leveraging their available resources they did what they could: train 11,600 (!) records managers. But, obviously, they have a long way to go and with much less funding than Western archives.

Mozambique: training 11,600 records managers …

Collaboration between smaller archives (and perhaps even with libraries, as in New Zealand) seems a logical way out of the money predicament. But only one speaker expressly addressed this issue, Robert Kretzschmar of Baden-Württemberg:

Robert Kretzschmar on local archives

I know of a plan for a shared services centre for smaller archives in the Netherlands, which was presented in 2010, but which, alas, still awaits funding. Money is tight at the moment, and there are three tiers of government involved …

Some notes on different approaches to DP issues

Having painted the larger picture, here are some notes on specific aspects of digital preservation and the choices made by various institutions:

To normalize or not to normalize: the National Archives of Australia (NAA) and the Public Records Office Victoria (PROV) say yes to normalization, because you do not want to let corrupted files into your system and discover the corruption only later. Plus, it minimizes the workload. Archives New Zealand: no normalization, but some “pre-conditioning”. The UK National Archives (UK NA) do not normalize; they say that file formats are standardizing themselves. Also, only 25% of the records will ever be used, so normalizing everything can be a substantial waste of effort. Plus: 85% of what, e.g., the 2012 Olympics will generate consists of MS Office formats.

To develop your own systems or buy a vendor’s solution: Michael Carden of NAA emphasized that he regards digital preservation as core business. Thus NAA develops and maintains all of its own systems, as do PROV and New South Wales (NSW). I myself wonder where this leaves the small archives. Can they ever muster enough resources to build their own systems? Archives New Zealand works with a commercial system (Rosetta by Ex Libris), but “that decision was forced upon us” (Fleming). The Tessella Safety Deposit Box is used by a number of archives (a.o., NA Netherlands, Municipal Archives Rotterdam).

Ex Libris stand at ICA 2012: “working to empower your library” … – faux-pas or visionary?

To keep content and metadata together, or keep them separate: PROV, NSW: together, to avoid the risk of broken linkages; NZ Archives: separate, but with safeguards against broken linkages.

To quarantine or not to quarantine: NAA quarantine typically is 28 days; PROV quarantine is 3 months. Archives New Zealand: continuous virus checking, because some viruses crop up later than 3 months.

When is custody completed? PROV waits until the entire ingest process (including quarantine and two virus checks) is completed before sending the agency an explicit “custody report”. Archives New Zealand still has to arrange for something like this.

On security: NAA is pretty much a dark archive, which is very secure. PROV provides access immediately, and it has implemented 4 levels of security (firewall, etc.) within the system.

Physical and digital records integrated? Yes, says PROV. For the user a record is a record.

Online transfer or through physical media? PROV has an online inbox, but that is hardly ever used. Typically, records are delivered on physical media. So far, of course …

Preservation strategies: Jan Hutar (Archives New Zealand): “We consider both migration and emulation as viable strategies. Migration has to be done again and again. Emulation is more complicated.” Other archives report that they have not yet performed preservation actions (other than normalization at ingest).
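
To make the quarantine and custody notes above a little more concrete, here is a minimal Python sketch of the rule "custody is only confirmed once quarantine has elapsed and the virus checks have passed". It is not any archive's actual workflow: the function name, its parameters and the dates are invented for illustration, and it simply combines the 28-day quarantine and the two virus checks mentioned above.

```python
from datetime import date, timedelta

def custody_can_be_confirmed(received, today, quarantine_days,
                             virus_checks_passed, checks_required=2):
    """Return True once quarantine has elapsed and enough virus checks have passed.

    All names and defaults are hypothetical; they only mirror the notes above.
    """
    release_date = received + timedelta(days=quarantine_days)
    return today >= release_date and virus_checks_passed >= checks_required

received = date(2012, 8, 1)  # invented transfer date

# Exactly 28 days after receipt, with both checks passed: custody may be confirmed.
after_four_weeks = custody_can_be_confirmed(
    received, date(2012, 8, 29), 28, virus_checks_passed=2)

# Still inside the quarantine window: no custody report yet.
too_early = custody_can_be_confirmed(
    received, date(2012, 8, 20), 28, virus_checks_passed=2)
```

A continuous virus-checking regime, as at Archives New Zealand, would not fit this simple gate, because checking never "completes"; that is one reason the custody question is harder there.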

A last note

My reporting on ICA 2012 has been nowhere near complete, of course. I have singled out the presentations that had something to do with digital preservation, in the context of the archival function in the digital age. There was plenty to learn, I am happy to say (from Taipei Airport, where the wifi is good; after some delays I am finally on my way home).

The next ICA congress will be ICA_2016 in Seoul, Korea. The theme: “archives, harmony & friendship”. I wonder how that is going to translate into the digital world …

Tangalooma, 50 years after the whaling station was closed

 

 

Comparing digital preservation approaches, continued – #ICA_2012 (7B)


Published: Sat 01 Sep 2012

Following on from my previous post from ICA_2012, on Friday the Australasian Digital Recordkeeping Initiative (ADRI) organized a truly inspiring workshop. Ably led by Simon Foudre of South Australia (the only state in Australia that does not yet have a digital archive for public records … [-- corrected by Andrew Wilson in comments below: PROV and NSW are the only states with a digital archive]), the workshop showcased four different approaches to digital preservation: the National Archives of Australia (NAA), New South Wales (NSW), Victoria (PROV) and Archives New Zealand. After an introduction to each of these initiatives, the 30 workshop attendees were invited to do a SWOT analysis of all four approaches. - by Inge Angevaare

Simon Foudre leading the ADRI workshop

What impressed me most about the workshop was the honesty and willingness to share and learn of each of the initiatives involved. What surprised me a bit, was that in a small country like Australia (I mean, there are only 22 million inhabitants), the states would have different approaches. When I asked about that, I was told that state autonomy is quite a big thing in Australia. Also, digital preservation is probably too young a discipline to put all of our eggs in one basket. We need room to experiment. And the States do exchange information, e.g., in ADRI, to learn from each other. Andrew Waugh (PROV, Victoria): “OAIS is great … for two months. Once you start designing a system, its usefulness declines fast. Connecting with existing systems is something OAIS does not provide for. Then you should talk to other archives.” (See also Waugh’s comments during the plenary session in previous post).

Each table was asked to discuss the strengths and weaknesses of a particular system.

On SWOT analyses

At the risk of stating the obvious: SWOT analyses are rarely black-and-white judgements. Any strength can be a weakness as well. Andrew Waugh: “Our greatest strength is that we have an operational digital archive [since 2005]. Our greatest weakness is that we have an operational digital archive, because it means that the urgency for granting funding for much-needed improvements is much less.” Similarly, the NAA’s system is very secure, because there is hardly any access – but if you make access your main goal, it turns into a weakness (see previous post). Some of the weaknesses identified, especially of the older systems (NAA, PROV), have been recognized by those that manage the systems, but require additional funding. As Simon Foudre emphasized, we live in an era of “heightened fiscal prudence” (some understatement), a situation aggravated by increased user expectations for instant access and fast technological changes.

I described the approaches of NAA and PROV in some detail in my previous post, so here is some more information about the other two participants: New South Wales and the New Zealand Archives.

Taking proactivity to a whole new level: State Records New South Wales

Cassie Findlay of State Records New South Wales (NSW) agreed with many of the principles outlined by Hofman of the Dutch Archives (earlier post): “State Records and NSW government agencies need to work together to tackle digital archives challenges.” Records are a by-product of the agencies’ core business, and therefore the archives have to offer whatever they can in terms of tools and advice to make it easier for agencies to manage and transfer their records. They’ve got to get involved at the creation phase.

Cassie Findlay: “Systems migrations is what agencies know.”

State Records NSW has decided not to accession individual records, but to migrate entire recordkeeping systems forward. For each agency, a migration project is set up which is specifically attuned to the agency’s requirements. “Each project will have different requirements, but all will involve analysis of recordkeeping system structures, metadata; preservation issues; possible migration protocols; indexing points/potential uses.” The options for storing the recordkeeping systems are:

  • as objects with metadata in text files
  • XML DB normalisation
  • RDF/XML DB normalisation
  • Migration to non-proprietary database, e.g., SQLite
  • … probably combinations of these.
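
As a minimal sketch of the SQLite option in the list above, and emphatically not State Records NSW's actual tooling, the following Python snippet copies records exported from a hypothetical recordkeeping system into an SQLite database using only the standard library. The table layout, the field names and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical export from an agency's recordkeeping system (invented data).
exported_records = [
    {"record_id": "R-001", "title": "Licence application",
     "metadata": {"creator": "Agency A", "created": "2009-03-14"}},
    {"record_id": "R-002", "title": "Licence decision",
     "metadata": {"creator": "Agency A", "created": "2009-04-02"}},
]

def migrate_to_sqlite(records, db_path=":memory:"):
    """Copy records and their metadata into an SQLite database and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE record (record_id TEXT PRIMARY KEY, title TEXT NOT NULL)")
    # Metadata is stored as name/value rows so no fixed schema is needed up front.
    conn.execute(
        "CREATE TABLE metadata ("
        "record_id TEXT NOT NULL REFERENCES record(record_id), "
        "name TEXT NOT NULL, value TEXT NOT NULL)")
    for rec in records:
        conn.execute("INSERT INTO record VALUES (?, ?)",
                     (rec["record_id"], rec["title"]))
        conn.executemany(
            "INSERT INTO metadata VALUES (?, ?, ?)",
            [(rec["record_id"], name, value)
             for name, value in rec["metadata"].items()])
    conn.commit()
    return conn

conn = migrate_to_sqlite(exported_records)
record_count = conn.execute("SELECT COUNT(*) FROM record").fetchone()[0]
metadata_count = conn.execute("SELECT COUNT(*) FROM metadata").fetchone()[0]
```

Keeping the metadata as name/value rows is one possible design choice; it fits the remark that each migration project will have different requirements, at the cost of weaker typing than a purpose-built schema.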

NSW also plans to make available a digital archives dashboard, with all sorts of tools and resources: a register of migration pathways, a register of metadata terms, file format identification tools, metadata mapping tools, forms and templates and a knowledge base. This dashboard will be based on best practices, and will grow over time. Also, the approach is open to distributed custody of records. More info on the project’s blog.

Discussions continue during lunch break. Is the NSW approach sustainable? Only time will tell.

Workshop attendees agreed that the approach takes digital archiving to a whole new level. They saw the flexibility of the approach as a great strength, and the development of entirely new relationships between agencies and archives as a great way to prove the value of archives. However, the approach is resource intensive, it requires skilled staff, and it does not (yet) nail the management of source records.

Sharing a system with the national library: Archives New Zealand

Alison Fleming of Archives New Zealand told the workshop about the “interesting journey” which has been the result of a May 2010 government decision that the core of the New Zealand National Library’s Rosetta system must be shared with the national archives.

Alison Fleming pointing to the part of the system (circle) that is shared with the National Library.

Some of the key principles of the Archives New Zealand approach are:

  • share central digital preservation system with National Library
  • describe digital archives at item level, not just aggregates
  • integration of digital and non-digital processes, automation
  • support for agencies is critical
  • “file formats we accept” depend on circumstances
  • digital repository will hold digitised material too
  • “We will not get it right first time, or even for a long time”
  • collaboration is critical.

Fleming reported that “libraries and archives do speak different dialects of the same language.” “This has led to robust discussions about occasions where the library and the archives make different choices.” In my own country, the Dutch National Archives and National Library are merging in 2013, which also was a political decision. A merger of the digital repositories has not yet been decided upon; perhaps they should talk to New Zealand first?

 

In a separate, more technical presentation about preservation strategies, Jan Hutar (who moved from the Czech Republic to New Zealand) summarised the “Rendering Matters” report recently published by the Archives:

Jan Hutar, now of Archives New Zealand

On the subject of preservation strategies, Hutar concluded:

In the workshop SWOT analysis, the use of a commercial system (Rosetta) was identified as a weakness rather than a strength; there were doubts whether the archives could get their information out if they wanted to make major changes. Interestingly, budget cuts were not only identified as a threat; they were also seen as an opportunity to get involved in actual collaboration rather than “tick-the-box collaboration”.

Noting strengths, weaknesses, opportunities and threats

So much for now; I will write a third post with some (tentative) conclusions. First I have some hassles to clear up. I am still down under. My flight was cancelled; a delay of at least 48 hours. So much for trying to save money …

On a positive note: yes, I did get to see some active whales and at least two beautiful jumps. Taking pictures was more of a challenge. The seas were rough, I needed both hands on the railing most of the time just to stay on the boat. At least I was one of the lucky ones who did not get sick. Interestingly, whale watching now brings in more money to the area than whale catching and slaughtering did in the 1950s and early 1960s. That’s the way.

 

Comparing digital preservation approaches – #ICA_2012 (7A)


Published: Tue 28 Aug 2012

For many years, Australia and New Zealand have been at the forefront of developments in digital preservation. As digital preservation is core business for all archives, ICA 2012 (International Council on Archives, Brisbane) presented an excellent opportunity to compare notes on different approaches. In what follows (and the next post) I shall be mixing and mashing notes from three plenary and breakout sessions with those from a full-day’s dedicated digital preservation workshop on Friday during which the attendees did a SWOT analysis of different Australasian approaches to digital preservation. - by Inge Angevaare

A full house to hear Michael Carden and Andrew Waugh speak about the “digital continuity” experiences of the Australian National Archives and Victoria Public Records Office respectively.

Setting the scene: the Australian National Archives

Michael Carden of the Australian National Archives (NAA) expertly set the scene for the discussion of Australasian digital preservation approaches, taking his cue from the 2002 Green Paper which defined the concept of “performance” as the “thing” to be preserved. NAA have chosen to “normalize” all incoming file formats to standards-based open file formats and developed the Xena software to do this. The so-called Digital Preservation Recorder (DPR) manages this process and leaves an audit trail – all details to be found in Carden’s informative full paper. Carden asserted that digital preservation is “core business” for archives, and therefore not something to be outsourced. All of the technology is developed in-house and made available to the community in open source, “which gives transparency to the process; others can check us.”

Carden was very confident in the NAA’s ability to deliver authentic and reliable records. He mentioned scaling as the major challenge ahead. During the workshop, another possible weakness was mentioned: the NAA is a preservation-only system. Very few people have access. This makes the system quite secure, of course. I also learned that it was a matter of funding: funding was only provided for a preservation system. So far that is not a problem, because the records in the system (from government agencies that have been decommissioned and some private records) are not yet eligible for access (due to the legal transfer period of 20 years, which equals the embargo period on access). But, eventually, more funding will have to be secured to build an access module. Also, the original NAA design deals only with records that have been legally transferred to the archive. As we shall see below, thinking has meanwhile evolved into a more proactive direction.

Michael Carden (right) and Andrew Waugh (middle) discussing their approaches with Hans Hofman of the Dutch Archives

Victoria Public Records Office: “The challenge is to get the records”

Andrew Waugh of the Victoria Public Records Office (PROV) readily admitted that his institution’s experience “stands on the shoulders of giants”, referring, a.o., to the National Archives. But the PROV system was designed to provide access right away, in compliance with the legislation of the State of Victoria. PROV, like the NAA, normalizes the records it receives, while keeping a copy of the original. However, for the indefinite future access is guaranteed only to the preservation-format copies: “if rendering engines are not available, we will either build our own or migrate the long-term preservation formats.” Waugh added that “We do not consider that this will occur for many years,” echoing what David Rosenthal from LOCKSS has been arguing for some time in the library community.

PROV is built on the basis of commercially available, but fully compliant, components (Centera). It seeks to integrate physical and digital records into one system, as “a record is a record”. According to Waugh, one of the major flaws of the OAIS Reference Model is that it looks at digital records in isolation and does not make room for existing systems. The PROV system keeps digital objects and metadata together for fear of linkages being broken.

Andrew Waugh presenting the Victoria digital archive during the workshop.

Waugh drew special attention to the challenge of “getting the records”. This involves making sure that digital records are created and that they are transferred. “Currently,” Waugh asserted, “it is not economic for agencies to deal with archiving.” “We must work with agencies to make it less expensive to transfer records, and to transfer early. This includes dealing with non-mature (temporary) records.” This is a change from the traditional position of archives to deal only with mature records, and it is a change in attitude that was echoed by other speakers (including Hans Hofman). (Sorry, no full paper available for Waugh.)

UK National Archives: “Every decision should be based on usage”

Oliver Morley of the UK National Archives (no full paper available) disagreed with his Australian colleagues about file format normalization. He argued that “digital formats have standardized” by themselves: 99% of the records received have top-20 file formats, and most of those are regular Office formats. Besides, Morley argued, only 25% of what archives keep will ever be accessed, so normalizing everything can be a waste of effort.
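
Morley's argument rests on simple arithmetic: what share of a transfer falls within the most common formats? The sketch below shows that calculation in Python. It is illustrative only: the file names are invented, the function name is hypothetical, and the file extension is used as a crude stand-in for the format, whereas a real archive would rely on signature-based format identification.

```python
from collections import Counter
from pathlib import PurePath

def top_format_share(filenames, top_n):
    """Return the fraction of files whose extension is among the top_n most common."""
    extensions = [PurePath(name).suffix.lower() for name in filenames]
    if not extensions:
        return 0.0
    counts = Counter(extensions)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(extensions)

# Invented sample transfer, for illustration only.
sample = ["a.doc", "b.doc", "c.DOC", "d.xls", "e.xls", "f.pdf", "g.wpd", "h.dwg"]

# .doc (3 files) and .xls (2 files) together cover 5 of the 8 files.
share = top_format_share(sample, top_n=2)
```

Run against a real transfer list, a figure close to 1.0 for `top_n=20` would support Morley's point; a long tail of rare formats would argue for the normalization approach of his Australian colleagues.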

Oliver Morley: “Social media still present a problem. But do not panic. Most content is not what you would collect.”

Web archives are another non-problem for Morley. “The challenge is not quite as big as I thought,” he said. “Public sector agencies typically produce small websites, and what is published is captured automatically.” Morley does acknowledge that e-mail is a problem. “We have no solution for e-mail yet. There are huge selection problems.”

Here are the principal features of the UK NA’s preservation policy:

So much for now; more (especially on the workshop) in the next post, scheduled for a few days from now. I cannot resist the temptation to go out and (try to) see some of the whales that are migrating past Australia’s east coast during the winter months.

A local tells me that until about 20 years ago, the church and the town hall were the highest buildings in Brisbane …

… I have no way to verify this, but the official guide book of the city proudly declares that the oldest surviving home in Brisbane dates from 1846 — which is perhaps why the visitor’s bureau advertises Brisbane as “Australia’s new world city”

“Change”, but how? The archival function in the digital age – #ICA_2012 (5)


Published: Thu 23 Aug 2012

At the ICA 2012 congress, Hans Hofman of the Dutch National Archives took a step back from practical issues in order to look more fundamentally at the core of “the archival function” in present-day society. What exactly has the digital age brought about in society at large, and what are the implications for the role the archives can and must play? - by Inge Angevaare

Hofman preparing for his speech

To my mind, we don’t do this enough: stepping back from “life as it has always been” and from day-to-day practicalities to look at the essence of what we are doing and why, so this is a welcome effort by the Dutch National Archives (for details, see Hofman’s full paper).

Here are the key changes in our society according to Hofman:

The digital revolution

As a consequence, key issues facing society include:

  • Erosion of citizen trust in government
  • Erosion of government services
  • Loss of corporate/national memory
  • Loss of individual identity
  • Threat to individual rights.

These are the questions that archives must ask themselves:

Questions to be asked

Pondering these questions, the Dutch National Archives have formulated the following guidelines for the changes they must make. The framework has been adopted only recently, Hofman stressed, and still awaits implementation. But it contains some very valuable principles, which I gladly reproduce in full:

Guiding principles for the archival function in the digital age

  1. The business process is leading, information management or record keeping is supportive and should be integrated.
  2. The National Archives should be involved right from the beginning (e.g., the planning and design of business processes, underlying (information) architectures and supporting systems).
  3. Focus on all information (or records) of government; comprehensive approach both in scope and in time; not only on archival records (This includes the need to be aware that different types of information exist and that each of them may have a different value to different audiences).
  4. All connections/linkages of a record (concept of a record) should be accounted for.
  5. A systematic approach to record keeping should be used based on a risk management approach (At which moments are records most vulnerable?)
  6. Government information is public and freely available, unless … (This entails that the National Archives aims to ensure the right to information for the public [“the public interest”], including both government and private sector information, while respecting privacy rights, freedom of information, etc.)

In other words: the “archival function” comprises the entire lifecycle of all government records, and there are more players in this lifecycle than archives. This is a fundamental change from the old situation, in which archives would only be involved in those records which are intended to be kept.

Lots of food for thought, I should think. The only thing I would add from the perspective of the National Coalition is an appeal to look beyond the borders of the archival sector at the entire information space. As described earlier, issues such as the preservation of the web and social media may require a broader approach which includes libraries, data centers and other stakeholders, nationally and internationally.

The Dutch delegation includes, among others, Jantje Steenhuis and Mies Langelaar (Rotterdam Municipal Archives), Kees Schabbing (North-Holland Province) and (behind them, at right) Reinder van der Heide (Dutch Parliament)

 

Seamless integration between source systems and digital archives: Can we have it? Do we need it? – #ICA_2012 (4)


Published: Wed 22 Aug 2012

To my mind, conferences are at their best when they manage to bring together different viewpoints on sticky issues within a single session. The ICA_2012 congress managed to do just that on Tuesday afternoon, when Estonian archivist Kuldar Aas and Tessella’s Robert Sharpe reflected on what – I assume – is the ultimate digital archivist’s dream: seamless integration or transition of records from government agencies to archives. - by Inge Angevaare

Kuldar Aas presenting; at left Rob Sharpe and Master of Ceremonies Steve Stuckey

First, let’s look at the problem. Records which are to be archived inevitably come from many sources with different systems and thus they are likely to have all sorts of different metadata schemas which, inevitably, do not match the archival description formats used by digital archives. Standards are designed to help solve these problems, but, as all of us know, they are typically not complied with. Aas: “Standards are only being used as inspiration for tenders.”

Kuldar Aas: “It’s feasible, in about 5 years”

To deal with the issue, the Estonian National Archives designed a software tool, the Universal Archiving Module or UAM. This tool is designed to streamline the ingest process (more details in Aas’s full paper on the ICA website):

UAM software tool

The tool has now been in use for a few years, and here is some feedback from the agencies:

Feedback from the agencies

Interestingly, the agencies report that the tool has forced them to get their records management better organised. Now, that’s music in any archivist’s ears. On the negative side, they regret losing flexibility, which is inherent in any standardization process. And implementation is still very time-consuming.

UAM: lessons learned

Aas concluded that “seamless integration” is not yet possible, but he expects it to become possible in, say, five years.

Robert Sharpe: “We don’t really need it.”

Robert Sharpe represents Tessella, a vendor of digital preservation systems (Safety Deposit Box). He agreed with the problem, but challenged Aas’s solution, arguing that the combined schema will change over time, thus requiring further conversions or forcing the system to work with multiple versions. Also, Sharpe argued, every conversion carries the risk of data loss. As an alternative, Tessella designed a system that can work with multiple metadata schemas (full details in Sharpe’s full paper).
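For readers who like to see the difference spelled out, here is a minimal sketch of the two ingest strategies. It is my own illustration, not the UAM or the Tessella software: the field names, the `CROSSWALK` table and the schema label `agency-x-v1` are all hypothetical.

```python
from typing import Any

# Hypothetical crosswalk from one agency's field names to a
# hypothetical common archival schema (not the real UAM mapping).
CROSSWALK = {"doc_title": "title", "created_on": "date", "dept": "creator"}


def ingest_with_conversion(source: dict[str, Any]) -> dict[str, Any]:
    """Convert at ingest: only fields known to the crosswalk survive."""
    return {CROSSWALK[key]: value for key, value in source.items() if key in CROSSWALK}


def ingest_as_is(source: dict[str, Any], schema_name: str) -> dict[str, Any]:
    """Keep the original metadata untouched, labelled with its source schema."""
    return {"schema": schema_name, "original": dict(source)}


# A made-up agency record with one field the crosswalk does not know.
record = {
    "doc_title": "Budget memo",
    "created_on": "2012-08-21",
    "dept": "Finance",
    "case_no": "F-17",
}

converted = ingest_with_conversion(record)    # "case_no" is dropped
wrapped = ingest_as_is(record, "agency-x-v1")  # "case_no" is kept
```

In this toy example the converted record is uniform and easy to search, but the unmapped `case_no` field is lost, which is the kind of risk Sharpe pointed to. The as-is record loses nothing, but anyone searching it has to know about the agency's own field names.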

Rob Sharpe: “Why bother?”

Here’s Tessella’s alternative:

Tessella’s alternative approach

And here are the advantages, according to Sharpe:

Advantages of the Tessella approach

Now, I am in no position to tell which approach is “better” (if such can be determined at all, at this stage), but there is one thing about Sharpe’s approach that appealed to me very much: the fact that it reduces barriers to ingest, that it allows organizations to get stuff into their systems without much ado. All too often valuable data remain “on the other side of the wall”, because ingest is too problematic. In this way, at least, the data gets into a system where it is protected and backed up. Extra metadata can always be added at a later stage. I am reminded of the social media debate (yesterday’s post): because it is complicated, nothing is done at the moment, and that is certainly the worst of options.

On the other hand: how does this compare with the adage “garbage in, garbage out”, in other words: “What about access?” Sharpe: “Nowadays metadata are not the only way to search content; there are such facilities as full-text search. Besides, if you make use of the original metadata, you might even be able to get in deeper.”

“Might this approach be too simple for complicated data such as census data?”, someone asked. “That is quite possible,” Sharpe allowed.

Entr’acte: archivist, diplomat’s wife, spy, novelist: the story of Dame Stella

On another note altogether, there was a charming presentation by Dame Stella Rimington. She started her career as an archivist, then became a diplomat’s wife in Delhi (mostly hosting tea parties), and was eventually recruited by MI5, the British Security Service, where she ended her career as its first female Director General. Since then, she has written seven novels …

Dame Stella discussing secrecy and freedom of information with ICA President (and Dutch National Archivist) Martin Berendse

Dame Stella described the Cold War years when everything revolved around secrecy. She indicated that the emergence of terrorism, and the consequent need to share information, in large part contributed to the present demand for more openness. However, she strongly condemned such initiatives as Wikileaks, which “indiscriminately” leak information, putting “live sources” (now there’s a spy-word!) at risk. She warned that the risk of leaks will cause government agencies to make decisions without leaving any paper trail, which is quite the opposite of what movements like Wikileaks profess to strive for.

She concluded by saying that since her own days as an archivist (when her main worry was to prevent parchment records from being turned into fashionable lamp shades), the life of an archivist has become more complicated, “and thus more interesting.” Now that’s the spirit!

Social media debate testimony to “climate of change” for archives – #ICA_2012 (3)


Published: Tue 21 Aug 2012

This year’s Congress of the International Council on Archives (or ICA_2012, Brisbane) is entitled “A climate of change”. That is quite a broad motto, and it could easily just be a cliché without much meaning. So I was more than pleasantly surprised when the first keynote speaker, US archivist David Ferriero, took on the complicated issue of social media. – by Inge Angevaare

David Ferriero: “I expect my staff to anticipate what technology government is going to use.”

Ferriero had no doubts at all about whether archives should involve themselves with social media. Government officials have embraced the new technologies, as witnessed by, among others, the Obama Administration, and thus quite a few social media messages will become public records. In fact, Ferriero’s office issued guidance on when and why social media content would become public records:

  • “Is the information unique and not available anywhere else?
  • Does it contain evidence of an agency’s policies, business, mission, etc.?
  • Is this tool being used in relation to the agency’s work?
  • Is use of the tool authorized by the agency?
  • Is there a business need for the information?

If the answers to any of the above questions are yes, then the content is likely to be a Federal record.”
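The rule is simple enough to write down as a few lines of code. What follows is only a sketch of the quoted checklist, not an official tool: the class and field names are my own shorthand for the five questions.

```python
from dataclasses import dataclass, fields


@dataclass
class SocialMediaAssessment:
    """Answers to the five guidance questions (field names are hypothetical)."""
    unique_information: bool         # unique and not available anywhere else?
    evidences_agency_business: bool  # evidence of policies, business, mission?
    used_for_agency_work: bool       # tool used in relation to the agency's work?
    tool_authorized: bool            # use of the tool authorized by the agency?
    business_need: bool              # is there a business need for the information?


def likely_federal_record(assessment: SocialMediaAssessment) -> bool:
    """Return True if any answer is yes, as the quoted guidance puts it."""
    return any(getattr(assessment, f.name) for f in fields(assessment))
```

Note how wide the net is: a single yes is enough, so a post on an authorized agency account would already count as likely record material, whatever it says.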

Ferriero indicated that 900 (!!) staff within his organization are involved in social media projects, but did not go into details as to what these projects are.

Some of the 1,000 delegates from 95 countries watching an Aboriginal dance during the opening ceremony.

Bungala (http://www.bungala.com.au/)

I could not help but think back to an interview I did some years ago with the Dutch national archivist. As late as 2009 he argued that websites were publications, and belonged in a library rather than an archive. Times are indeed changing!

ICA President Martin Berendse: “A climate of change — and I am glad of it.”

Later in the day, Günther Schefbeck of the Austrian parliamentary administration took a closer look at just what these social media are. His analysis of our newly evolved “network society” and the dynamics that are at play there may have been a bit theoretical for some of the audience, but his conclusion was unmistakable: content on social media is totally dependent upon but a few large “hubs” (Facebook, Twitter, etc.). If any of those goes out of business, all of its content is lost with it.

Günther Schefbeck referred to the EU ARCOMEM project which is to address some of the social media issues.

Is that something archives should worry about? Schefbeck readily admits that there is a lot of “Bored” and “Me too” content out there that really does not merit archiving. But, increasingly, important societal debates take place on social media which influence public decision-making. As such, Schefbeck argues that they should become part of the public record. But who is responsible for archiving such content? Archives and libraries keep discussing the issue amongst themselves, Schefbeck stated, but so far, nothing happens. Schefbeck: “We must discuss this issue seriously, and we must discuss it soon.” Because if we wait too long, the content may be lost forever.

A captive audience for Günther Schefbeck

Now this is just the type of debate that my own organization, the inter-sectoral Netherlands Coalition for Digital Preservation, might facilitate. But in my experience the likely custodians of social media content are reluctant to get involved. And who could blame them? Schefbeck listed the many technical and legal obstacles that need to be overcome (check out his paper, link to follow soon).

However, when you look at preservation from the point of view of risk, this is the type of content we should be putting our money into. Most of the paper records we are presently digitizing will easily keep for a few more decades. But then again, the issue of social media is so much more complicated.

That’s all for now. There is, as usual, more to tell, but that must wait. To be continued!

Brisbane Central Business District

 

DNA to solve all our storage problems? – #ICA_2012 (2)


Published: Sat 18 Aug 2012

I am still travelling between Amsterdam and Brisbane for the #ICA_2012 conference. Because I tried to save money, this trip is taking forever. Between Bangkok (3-hour stopover) and Taipei (10-hour stopover), I stumbled upon this article in the Wall Street Journal (Vol. XXX No. 142, August 17-19, 2012, p. 8):

Wall Street Journal article on DNA & digital storage

For a moment I thought of science fiction, or of delusions caused by sleep deprivation, but, no, this is serious business. I quote from the article: “In the latest attempt to corral society’s growing quantities of digital data, Harvard University researchers encoded an entire book into the genetic molecules of DNA, the basic building block of life, and then accurately read back the text. … In that form, a billion copies of the book could fit in a test tube and, under normal circumstances, last for centuries, the researchers said.”

WSJ writer Robert Lee Hotz reports that the “unconventional exercise – one that is a ways from being commercially viable – highlights the potential of DNA as a stable, long-term archive for ordinary information, such as photographs, books, financial records, medical files and videos.”

Wow. The entire holdings of the soon-to-be-merged Dutch National Library and National Archives in one test tube.

“It is a very simple way to store information,” said bioengineer Sriram Kosuri at the Wyss Institute for Biologically Inspired Engineering at Harvard.

The article concludes on a cautionary note: “For the foreseeable future, however, the DNA book is expensive and time-consuming to read. It requires a series of laboratory procedures, microarray chips and a high-speed gene-sequencing machine to assemble the strands in the proper order.”

Ah, now there is a game we are all too familiar with: storage itself is the easy bit; the hard work is getting the information in and out of the systems in a way that people can understand (remember OAIS: “independently understandable”) and at a cost that society can afford.

I guess I won’t take the next flight back to Amsterdam but patiently await the plane that will take me to Brisbane where all the experts can mull over exactly those questions.

I guess this was not yet to be (Taipei airport)

A sizeable “cloud” here in Taipei …

Posted by Inge Angevaare

Back to business – and off to Brisbane (#ICA_2012)


Published: Thu 16 Aug 2012

Judging by my e-mail inbox, most of us are back from our holidays and getting ready to get back to business – the digital preservation business that is. Remember what that is all about? Here is a pictorial summary which I recorded at the recent LIBER conference in Tartu:

Here’s Yvonne Friese of ZBW, the German National Library of Economics, proudly presenting her organization’s newly developed automatic ingest procedure …

… and here is Yvonne using her 1-minute poster pitch to express her emotions about all those invalid PDF files that keep interfering with the process.

Can anybody please help Yvonne?

Brisbane: ICA 2012 conference

Meanwhile I am mentally preparing myself for the long, long trip to Brisbane, Australia, for the 2012 conference of the International Council on Archives (hashtag: #ICA_2012; the underscore is vital). After reporting on two library conferences this year (LIBER DP workshop + six preceding posts, LIBER annual conference + preceding posts), it is about time to give the archives their due attention. How are the national and regional public records archives dealing with the digital preservation challenge? Find out all about it next week right here.

 

(BTW: Thanks, Yvonne, for allowing me to use the images. Here is the full ZBW poster:)