Title
Duplicate Detection for Quality Assurance of Document Image Collections
Subtitle (en)
Paper - iPRES 2012 - Digital Curation Institute, iSchool, Toronto
Description (en)
Digital preservation workflows for image collections that involve automatic and semi-automatic image acquisition and processing are prone to quality problems. We present a method for quality assurance of scanned content based on computer vision. A visual dictionary derived from local image descriptors enables efficient perceptual image fingerprinting, which is used to compare scanned book pages and detect duplicated pages. A spatial verification step involving descriptor matching makes the approach more robust. Results for a digitized book collection of approximately 35,000 pages are presented. Duplicated pages are identified with high reliability, in close agreement with results obtained independently by human visual inspection.
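The fingerprinting idea described above can be illustrated with a minimal sketch. It assumes that local descriptors have already been extracted for each page and that a visual dictionary (a list of representative descriptor vectors) is available. The function names, the cosine similarity measure, and the 0.9 threshold are all illustrative assumptions, not details taken from the paper, and the spatial verification step is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_word(descriptor: Vector, dictionary: Sequence[Vector]) -> int:
    """Return the index of the dictionary entry closest to the descriptor."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(dictionary):
        # Squared Euclidean distance is enough for finding the minimum.
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def fingerprint(descriptors: Sequence[Vector],
                dictionary: Sequence[Vector]) -> List[float]:
    """Build an L2-normalised histogram of visual-word occurrences."""
    hist = [0.0] * len(dictionary)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, dictionary)] += 1.0
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two already normalised fingerprints."""
    return sum(x * y for x, y in zip(a, b))


def duplicate_candidates(fingerprints: Sequence[Sequence[float]],
                         threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Return index pairs of pages whose fingerprints are at least
    `threshold` similar. The threshold value is an assumption."""
    pairs = []
    for i in range(len(fingerprints)):
        for j in range(i + 1, len(fingerprints)):
            if similarity(fingerprints[i], fingerprints[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy data: 2-D "descriptors" and a three-word dictionary (hypothetical).
dictionary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
page_a = [(0.1, 0.2), (9.8, 10.1), (0.3, -0.1)]
page_b = [(0.0, 0.1), (10.2, 9.9), (-0.2, 0.1)]   # rescan of page_a
page_c = [(0.1, 9.9), (0.2, 10.3), (9.9, 10.0)]   # a different page

prints = [fingerprint(p, dictionary) for p in (page_a, page_b, page_c)]
candidates = duplicate_candidates(prints)
```

In this toy data, pages 0 and 1 map to the same visual-word histogram and are flagged as a candidate pair, while page 2 is not. Pairs flagged this way would be the natural input to a later verification step based on descriptor matching. The all-pairs loop is quadratic in the number of pages, so a collection of tens of thousands of pages would likely need an index or another way of limiting the comparisons.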
Keywords (en)
iPRES, iSchool, Toronto, Canada, digital preservation, information retrieval, image processing