University of Vienna PHAIDRA, object o:931103
Title
Diverse Digital Collections Meet Diverse Uses: Applying Natural Language Processing to Born-Digital Primary Sources
Language
English
Description (en)
Use of primary sources often focuses on identifying and tracking entities (e.g. people, places, organizations, events) and other values (e.g. dates and times) across documents. There are many existing open-source natural language processing (NLP) tools that can identify and report on named entities, and projects in the digital humanities have previously demonstrated the scholarly value of NLP approaches when working with digitized materials. To date, there has been relatively little adoption of NLP tools for the analysis of born-digital materials by libraries, archives and museums (LAMs). There are a variety of challenges associated with applying NLP tools to born-digital primary source collections, including those forensically acquired from removable media. Many of the challenges relate to the diversity of materials and potential use cases. This paper reports on the BitCurator NLP project, which is developing software for LAMs to extract and expose features in text extracted from such materials. The resulting services and methods can be used by LAM professionals and the users they serve.
Keywords (en)
iPRES, Kyoto
Author of the digital object
Christopher Lee
Author of the digital object
Kam Woods
Format
application/pdf
Size
132.4 kB
Licence Selected
CC BY-SA 4.0 International
Conferences
Conference 2017
Type of publication
Article in collected edition
Details
Object type
PDFDocument
Created
20.02.2019 09:21:41