Description
Second LEARN Workshop Vienna: "Research Data Management towards Open Science – The Importance of Policies"
Recording of an event held on Wednesday, 6 April 2016, on the campus of the Universität Wien
Part 3. Andreas Rauber: Enabling precise identification and citability of dynamic data. Recommendations of the RDA Working Group
Moderator: Paolo Budroni (Universität Wien)
Camera and editing: Viktor Zdrachal
Andreas Rauber is a professor at the Institute of Software Technology and Interactive Systems of the Technische Universität Wien.
Abstract: To repeat an earlier study, or to apply data from an earlier study to a new model, we need to be able to identify precisely the subset of data that was used. Verbal descriptions of how the subset was created (e.g. by providing selected attribute ranges and time intervals) are hardly precise enough and do not support automated handling, while keeping redundant copies of the data in question does not scale to the big data settings encountered in many disciplines today. Conventional approaches, such as assigning persistent identifiers to entire data sets or to individual subsets or data items, are not sufficient to meet these requirements. The problem is exacerbated further if the data itself is dynamic, i.e. if new data keeps being added to a database, errors are corrected, or data items are deleted. In this talk we will review the challenges identified above and discuss the solutions and recommendations currently being elaborated by a Working Group of the Research Data Alliance (RDA) on Data Citation: Making Dynamic Data Citeable. These approaches are based on versioned and time-stamped data sources, with persistent identifiers assigned to the time-stamped queries/expressions used to create the subset of data. We will review examples of how this can be implemented for different types of data, including SQL-style databases and CSV or XML files, and see how it fits into the larger context of activities on Data Citation.
CONTENTS
========
Chapter Title Position
---------------------------------------------------------------------
1. Opening credits 00:00:00
2. Introductory remarks 00:00:16
3. Challenges in data identification and citation 00:03:04
4. Recommendations of the RDA Working Group 00:18:10
5. Pilots and adoption 00:27:21
6. Questions from the audience 00:32:10