Considerations for High Throughput Digital Preservation: Paper - iPRES 2011 - Singapore
In partnership with Tessella, FamilySearch is developing an automated approach to large-scale digitization, ingest, and long-term preservation of electronic content. The proposed processes and underlying architecture must support ingest rates in excess of 20Tb a day.
Significant effort has gone into examining the preservation architecture and processes for potential bottlenecks. Digital preservation requires computationally intensive capabilities such as fixity checking, format identification, and characterization of content. Operating at very large scale also requires high network bandwidth and high-speed storage systems.
Our initial findings indicate that, once human interaction is minimized and software parallelization is employed, the primary bottleneck is not the processor but the movement of digital files into and within the application. In short, the scalability problem is really a systems engineering problem and not necessarily an issue for digital preservation per se.
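As a rough illustration of the two points above, the following Python sketch parallelizes fixity checking across files and converts a daily ingest volume into a sustained transfer rate. It is not the FamilySearch or Tessella implementation. The function names, the choice of SHA-256, the chunk size, and the worker count are all assumptions made for this example, and the rate calculation assumes the stated figure means 20 decimal terabytes per day.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # read files in 1 MiB chunks (illustrative value)


def sha256_of(path):
    """Stream one file through SHA-256 and return its hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(expected, workers=8):
    """Return the paths whose current digest differs from the recorded one.

    `expected` maps a file path to its recorded SHA-256 hex digest
    (a hypothetical manifest format chosen for this sketch).
    """
    paths = list(expected)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Files are hashed concurrently; results come back in input order.
        actual = pool.map(sha256_of, paths)
        return [p for p, d in zip(paths, actual) if d != expected[p]]


def sustained_rate_mb_per_s(terabytes_per_day):
    """Convert decimal terabytes per day into megabytes per second."""
    return terabytes_per_day * 1e12 / 86_400 / 1e6
```

Under the stated assumption, `sustained_rate_mb_per_s(20)` comes to roughly 231 MB/s, which has to be sustained around the clock. Hashing spreads naturally across workers, whereas every byte still has to cross the network and storage layers at least once, which is consistent with the finding that data movement rather than processing is the limiting factor.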
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Austria License (CC BY-SA 3.0 AT).
iPRES, Singapore, Digital Preservation, Digital Archiving, Scalability, Automation
Conferences, Conference 2011