Title (eng)

Considerations for High Throughput Digital Preservation: Paper - iPRES 2011 - Singapore

Author

Jason Pierson

Robert Sharpe

James Carr

Mark Evans

Description

In partnership with Tessella, FamilySearch is developing an automated approach to large-scale digitization, ingest and long-term preservation of electronic content. The set of proposed processes and underlying architecture must support required ingest rates in excess of 20Tb a day.
Significant effort has been placed on examining the preservation architecture and processes for potential bottlenecks. Digital preservation requires computationally intensive capabilities to provide functionality such as fixity checking, format identification and characterization of content. When operating at very large scale there is also a real need for high network bandwidth and high-speed storage systems.
With the need for human interaction minimized and software parallelization employed, our initial findings indicate that the primary bottleneck is not processor bound, but is directly associated with the movement of digital files into and within the application. In short, the scalability problem is really a systems engineering problem and not necessarily an issue for digital preservation per se.

Object languages

English

Rights

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Austria License (CC BY-SA 3.0 AT).

http://creativecommons.org/licenses/by-sa/3.0/at/

Classification

iPRES, Singapore, Digital Preservation, Digital Archiving, Scalability, Automation

Conferences, Conference 2011

Member of the Collection(s) (3)

o:424738 Openaire v3.0 collection
o:294299 iPRES 2011 - Proceedings of the 8th International Conference on Preservation of Digital Objects: iPRES 2011 - Singapore
o:168770 Open Access Collection