LARGE-SCALE COLLECTIONS UNDER THE MAGNIFYING GLASS: FORMAT IDENTIFICATION FOR: Poster - iPres 2010 - Vienna

Clément Oury

Title (en)

LARGE-SCALE COLLECTIONS UNDER THE MAGNIFYING GLASS: FORMAT IDENTIFICATION FOR

Subtitle (en)

Poster - iPres 2010 - Vienna

Language

English

Description (en)

Institutions that crawl the web to build heritage collections hold millions, or even billions, of files encoded in thousands of different formats, about which they know very little. Many of these heritage institutions are members of the International Internet Preservation Consortium, whose Preservation Working Group decided to address format identification in web archives. Its first goal is to produce an overview of the formats found in different types of collections (large-scale, small-scale…) over time. This overview suggests that the web is becoming a more standardized space: a small number of formats, frequently open ones, cover 90 to 95% of web archive collections, and we can reasonably hope to find preservation strategies for them. However, the survey relies mainly on a single source, the MIME type sent with the file in the server response, which gives good statistical trends but is not fully reliable for every file. For this reason, it appears necessary to study how identification tools developed for other kinds of digital assets can be used for web archives.
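As an illustration of the two ideas in this description (format coverage computed from server-declared MIME types, and the limited reliability of those declarations for individual files), here is a minimal Python sketch. It is not the method used in the survey. The function names, the tiny signature table, and the sample records are all assumptions made for the example; real identification tools rely on far larger signature registries.

```python
from collections import Counter

# Illustrative subset of well-known magic-number signatures (assumption:
# a real identification tool would use a much larger registry).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def normalize(declared):
    """Drop parameters such as '; charset=UTF-8' and lowercase the type."""
    return declared.split(";", 1)[0].strip().lower()


def sniff(payload):
    """Return a MIME type guessed from the leading bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    return None


def coverage(declared_types, top_n):
    """Share of files covered by the top_n most frequent declared types."""
    counts = Counter(normalize(t) for t in declared_types)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(top_n)) / total


def mismatches(records):
    """List (declared, sniffed) pairs where the signature contradicts the header.

    Files whose signature is not in the table are skipped, not flagged.
    """
    found = []
    for declared, payload in records:
        sniffed = sniff(payload)
        if sniffed is not None and sniffed != normalize(declared):
            found.append((normalize(declared), sniffed))
    return found


# Hypothetical sample: (declared MIME type, first bytes of the payload).
sample = [
    ("text/html; charset=UTF-8", b"<html><head>"),
    ("image/jpeg", b"\x89PNG\r\n\x1a\n\x00\x00"),  # declared JPEG, actually PNG
    ("application/pdf", b"%PDF-1.4"),
    ("Image/GIF", b"GIF89a\x01\x00"),
]
print(coverage([declared for declared, _ in sample], top_n=2))
print(mismatches(sample))
```

In this sketch, files with an unrecognized signature are left out of the mismatch list, so a short signature table understates how many declarations are unreliable.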

Keywords (en)

iPRES, Vienna

Author of the digital object

Clément Oury

Format

application/pdf

Size

103.9 kB

Licence Selected

CC BY-SA 2.0 AT

Conferences

Conference 2010

Type of publication

Article in collected edition

Citable links

Persistent identifier
https://phaidra.univie.ac.at/o:245900
Handle
https://hdl.handle.net/11353/10.245900
Details

Uploader

Andreas Rauber

Object type

PDFDocument

Created

20.11.2012 10:08:05 UTC
This object is in collection

Open Access Collection

iPRES 2010 - Proceedings of the 7th International Conference on Preservation of Digital Objects: iPRES 2010 - Vienna

Openaire v3.0 collection