Title (eng)

Will Formal Preservation Models Require Relative Identity? An exploration of data identity statements: Paper - iPRES 2012 - Digital Curation Institute, iSchool, Toronto


Simone Sacchi

Karen M. Wickett

Allen H. Renear


Digital Curation Institute, iSchool, University of Toronto


The problem of identifying and re-identifying data puts the notion of “same data” at the very heart of preservation, integration and interoperability, and many other fundamental data curation activities. However, it is also a profoundly challenging notion because the concept of data itself lacks a precise and univocal definition. When science is conducted in small communicating groups with homogeneous data, these ambiguities seldom create problems, and solutions can be negotiated in casual real-time conversations. However, when the data is heterogeneous in encoding, content and management practices, these problems can produce costly inefficiencies and lost opportunities. We consider here the relative identity view, which apparently provides the most natural interpretation of common identity statements about digitally encoded data. We show how this view conflicts with the curatorial and management practice of “data” objects, in terms of their modeling and of common knowledge representation strategies.

In what follows we focus on a single class of identity statements about digitally encoded data: “same data but in a different format”. As a representative example of this kind of statement, consider the dataset “Federal Data Center Consolidation Initiative (FDCCI) Data Center Closings 2010-2013”, available at Data.gov. Anyone can “Download a copy of this dataset in a static format”. The available formats include CSV, RDF, RSS, XLS, and XML. Each of these is presumably an encoding of the “same data”. We explore three approaches to formalization in first-order logic, and for each we identify distinctive tradeoffs for preservation models. Our analysis further motivates the development of a system that will provide a comprehensive treatment of data concepts.
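The statement “same data but in a different format” can be made concrete with a small sketch. The Python example below is an illustration only, not one of the three formalizations explored in the paper. It assumes a hypothetical two-row sample (not taken from the FDCCI dataset) encoded once as CSV and once as JSON, and it assumes that “same content” may be approximated by equality of the parsed rows. Under those assumptions the two objects are different when compared as byte sequences and the same when compared as content, which is the pattern a relative identity reading takes at face value.

```python
import csv
import io
import json

# Hypothetical sample records; NOT taken from the actual FDCCI dataset.
CSV_BYTES = b"agency,closings\r\nDOD,12\r\nDOE,3\r\n"
JSON_BYTES = b'[{"agency": "DOD", "closings": "12"}, {"agency": "DOE", "closings": "3"}]'


def rows_from_csv(raw):
    """Parse CSV bytes into a list of plain dicts (one per record)."""
    text = raw.decode("utf-8")
    return [dict(row) for row in csv.DictReader(io.StringIO(text, newline=""))]


def rows_from_json(raw):
    """Parse JSON bytes into a list of plain dicts (one per record)."""
    return json.loads(raw.decode("utf-8"))


# Identity relative to the encoding criterion: compare the raw bytes.
same_encoding = CSV_BYTES == JSON_BYTES

# Identity relative to the content criterion (an assumed proxy):
# compare the parsed records.
same_content = rows_from_csv(CSV_BYTES) == rows_from_json(JSON_BYTES)

print("same encoding:", same_encoding)
print("same content:", same_content)
```

The choice of parsed-row equality as the content criterion is itself a modeling decision; a different criterion (for example, one sensitive to column order or value typing) could give a different answer for the same pair of files.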

Object languages





Creative Commons License
This work is licensed under a
CC BY-NC-SA 3.0 AT - Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Austria License.




iPRES, iSchool, Toronto, Canada, data, identity, scientific equivalence, data curation, digital preservation

Conferences, Conference 2012

Member of the Collection(s) (1)

o:293685 iPRES 2012 - Proceedings of the 9th International Conference on Preservation of Digital Objects: iPRES 2012 - Digital Curation Institute, iSchool, Toronto