Description (en)
The British Library’s web archive comprises several
terabytes of harvested websites. Like other content streams,
this data should be ingested into the library’s central
preservation repository. The repository requires
standardized Submission and Archival Information
Packages.
Harvested websites are stored in Archival Information
Packages (AIPs). Each AIP is described by a METS file.
Operational metadata for resource discovery as well as
archival metadata are normalized and embedded in the
METS descriptor using common metadata profiles such as
PREMIS and MODS.
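
As an illustration of how descriptive and archival metadata can be embedded in a METS descriptor, the following minimal sketch builds a small METS document with one MODS and one PREMIS section. It is not the British Library's actual profile: the function names, the section IDs (DMD1, AMD1, TECH1), the choice of a techMD section and the use of the PREMIS version 2 namespace are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the three standards.
METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
PREMIS = "info:lc/xmlns/premis-v2"  # assumption: PREMIS version 2

for prefix, uri in (("mets", METS), ("mods", MODS), ("premis", PREMIS)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Return a namespace-qualified tag name in ElementTree notation."""
    return f"{{{ns}}}{tag}"


def wrap_metadata(parent, section_tag, section_id, mdtype):
    """Create a METS metadata section and return its xmlData element."""
    section = ET.SubElement(parent, q(METS, section_tag), ID=section_id)
    md_wrap = ET.SubElement(section, q(METS, "mdWrap"), MDTYPE=mdtype)
    return ET.SubElement(md_wrap, q(METS, "xmlData"))


def build_descriptor(title, object_id):
    """Build a minimal METS descriptor (hypothetical layout)."""
    mets = ET.Element(q(METS, "mets"))

    # Descriptive metadata for resource discovery, expressed in MODS.
    dmd = wrap_metadata(mets, "dmdSec", "DMD1", "MODS")
    mods = ET.SubElement(dmd, q(MODS, "mods"))
    title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title

    # Archival (preservation) metadata, expressed in PREMIS.
    amd = ET.SubElement(mets, q(METS, "amdSec"), ID="AMD1")
    tech = wrap_metadata(amd, "techMD", "TECH1", "PREMIS")
    obj = ET.SubElement(tech, q(PREMIS, "object"))
    ident = ET.SubElement(obj, q(PREMIS, "objectIdentifier"))
    ET.SubElement(ident, q(PREMIS, "objectIdentifierType")).text = "local"
    ET.SubElement(ident, q(PREMIS, "objectIdentifierValue")).text = object_id

    # Structural map that links the website to both metadata sections.
    struct = ET.SubElement(mets, q(METS, "structMap"), TYPE="logical")
    ET.SubElement(struct, q(METS, "div"), TYPE="website",
                  DMDID="DMD1", ADMID="AMD1")
    return mets
```

Calling ET.tostring(build_descriptor("Example site", "obj-1"), encoding="unicode") serializes the descriptor; the title and identifier values here are placeholders.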
The British Library’s METS profile for web archiving
covers both dissemination and preservation use cases
while ensuring the authenticity of the data. The underlying complex
content model disaggregates websites into web pages,
associated objects and their actual digital manifestations.
This additional abstraction layer ensures accessibility over the
long term and the ability to carry out preservation actions
such as migrations. The library-wide preservation policies
and principles thus become applicable to web content as well.
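
The layered content model described above can be sketched as a small set of data classes. This is only an illustration under stated assumptions: the class names, field names and the migrate helper are hypothetical and are not taken from the British Library's profile. The sketch shows why separating an abstract object from its digital manifestations helps: a migration adds a new manifestation without altering the object or discarding the original.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    """One concrete digital form of an object (hypothetical fields)."""
    file_format: str
    location: str


@dataclass
class AssociatedObject:
    """Abstract object, such as an image or stylesheet used by a page."""
    url: str
    manifestations: List[Manifestation] = field(default_factory=list)

    def migrate(self, target_format: str, new_location: str) -> Manifestation:
        """Record a migrated copy; earlier manifestations are retained."""
        migrated = Manifestation(target_format, new_location)
        self.manifestations.append(migrated)
        return migrated


@dataclass
class WebPage:
    """A page of a website together with its associated objects."""
    url: str
    objects: List[AssociatedObject] = field(default_factory=list)


@dataclass
class Website:
    """A harvested website, disaggregated into pages."""
    seed_url: str
    pages: List[WebPage] = field(default_factory=list)
```

For example, an AssociatedObject that holds one GIF manifestation can be given a PNG manifestation through migrate, after which both forms remain attached to the same abstract object.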