Description (en)
Format identification output needs to be assessed within an institutional context, taking into account provenance information that is not contained in the data itself but is provided by data producers by other means. In some cases, real issues in the data need to be distinguished from mere warnings. Ideally, this assessment should make it possible to decide where to invest effort in correcting issues, where simply to document them, and where to postpone action. The poster presents preliminary considerations from the ETH Data Archive of ETH-Bibliothek, the main library of ETH Zurich, on how to address file format identification and validation issues. The underlying issues are mostly independent of the specific tools and systems employed.