Title (eng): LARGE-SCALE COLLECTIONS UNDER THE MAGNIFYING GLASS: FORMAT IDENTIFICATION FOR: Poster - iPres 2010 - Vienna

Author: Oury, Clément

Description (eng): Institutions that perform web crawls in order to gather
heritage collections have millions – or even billions – of
files encoded in thousands of different formats about
which they barely know anything. Many of these
heritage institutions are members of the International
Internet Preservation Consortium, whose Preservation
Working Group decided to address the issues related to
format identification in web archives.
Its first goal is to produce an overview of the formats
found in different types of collections (large-scale, small-scale…)
over time. This overview shows that the web seems to be
becoming a more standardized space. A small number
of formats – frequently open – cover from 90 to 95% of
web archive collections, and we can reasonably hope to
find preservation strategies for them.
However, this survey is mainly built on a source – the
MIME type of the file sent in the server response – that
gives good statistical trends but is not fully reliable for
every file. It therefore appears necessary to study how
identification tools developed for other kinds of digital
assets can be applied to web archives.
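
The two ideas in the description above, measuring how much of a collection a few formats cover and checking the server-declared MIME type against the file content, can be sketched as follows. This is only an illustrative sketch, not the method used in the survey: the signature table is a tiny hypothetical sample, and the function names (`sniff`, `normalize`, `coverage`, `mismatches`) are assumptions introduced here. Real identification tools rely on much larger signature registries.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical, deliberately tiny signature table (magic bytes -> MIME type).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff(payload: bytes) -> Optional[str]:
    """Guess a MIME type from leading bytes; None if no signature matches."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    return None


def normalize(declared: str) -> str:
    """Drop parameters such as '; charset=utf-8' and lowercase the type."""
    return declared.split(";", 1)[0].strip().lower()


def coverage(declared_types: Iterable[str], top_n: int) -> float:
    """Share of files whose declared type is among the top_n most common."""
    counts = Counter(normalize(t) for t in declared_types)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(top_n)) / total


def mismatches(records: Iterable[Tuple[str, bytes]]) -> List[Tuple[str, str]]:
    """Return (declared, sniffed) pairs that disagree.

    Payloads that match no signature are skipped, not reported.
    """
    result = []
    for declared, payload in records:
        found = sniff(payload)
        if found is not None and found != normalize(declared):
            result.append((normalize(declared), found))
    return result
```

Files whose payload matches no signature are left out of the mismatch list on purpose, so the sketch flags only clear disagreements between the server response and the content.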

Language of the object: English

Rights: Creative Commons License
This work or content is licensed under a CC BY-SA 2.0 AT - Creative Commons Attribution - ShareAlike 2.0 Austria License.

CC BY-SA 2.0 AT

http://creativecommons.org/licenses/by-sa/2.0/at/

Classification: iPRES, Vienna

Conferences, Conference 2010

Member of collection(s) (3):
o:424738 Openaire v3.0 collection
o:245914 iPRES 2010 - Proceedings of the 7th International Conference on Preservation of Digital Objects: iPRES 2010 - Vienna
o:168770 Open Access Documents in Phaidra