University of Vienna PHAIDRA, object o:294257
Title
From the World Wide Web to Digital Library Stacks: Preserving the French Web Archives
Subtitle (en)
Paper - iPRES 2011 - Singapore
Language
English
Description (en)
The National Library of France (BnF) is mandated by French law to collect and preserve the French Internet. The project is now 10 years old, with collections ranging from 1996 to the present. To ensure their long-term preservation, the library chose to ingest these web archives into its existing digital preservation repository, SPAR (Scalable Preservation and Archiving Repository). This raised numerous implementation challenges, on both the modeling and the technical sides, which the library met with solutions drawn, whenever possible, from international collaboration and widely adopted standards.
– Web archive-specific formats (W/ARC files) lacked validation and characterization tools, which led to the development of a JHOVE2 module for the ARC format.
– The heterogeneity of BnF’s web archives in terms of formats, production workflows and tools was managed by aligning all of them on a single model: the current production workflow, which uses NetarchiveSuite.
– The specificities of web archives were matched to the PREMIS data model and dictionary and to SPAR’s global METS profile.
– Finally, the need to express technical information about ARC files in a concise, manageable fashion led us to define a format-specific metadata scheme for container files, containerMD, which will be released to the preservation community on BnF’s website.
All this development work means new services for digital curators in general and preservation experts in particular. They will be able to know their collection better, to check its comprehensiveness, and, with that deeper understanding, to investigate new preservation strategies. The logical next step of the web archives’ long-term preservation project will be to allow differentiated service level agreements for specific sets of documents, with richer metadata extraction, better quality assurance and differentiated preservation strategies.
Keywords (en)
iPRES, Singapore, Web archives, Metadata, Characterization tools, ARC file format
Author of the digital object
Clément Oury
Sébastien Peyrard
Format
application/pdf
Size
516.5 kB
Licence Selected
CC BY-SA 3.0 AT
Conferences
Conference 2011
Details
Object type
PDFDocument
Created
26.06.2013 10:50:24