User:HBrugman

From Open Annotation Collaboration

Hennie Brugman

Technical Coordinator, CATCHPlus Project

Meertens Institute, Royal Netherlands Academy of Arts and Sciences

Using the OAC Model for Annotation Interoperability

Project Description

Since 2005 the CATCH research program has funded research projects hosted by large Dutch cultural heritage institutions. Among these are the Rijksmuseum Amsterdam, the Dutch Royal Library, the Dutch National Archive, the central archive of the public broadcasting corporations (Sound and Vision), and several other archives, museums and libraries. Initially the program mainly supported applied computer science research addressing the needs of collection managers. The focus is now shifting towards e-Humanities research on digital cultural heritage collections, supported by innovative computer science. There are currently 14 CATCH projects, with more to come.

All CATCH project teams included programmers who delivered demonstrators and research prototypes. In 2009, the CATCHPlus valorization project was started with the task of turning CATCH research software into quality software that can be applied and sustainably maintained by the Dutch cultural heritage community. CATCHPlus is a 3-year project funded by the Dutch Ministry of Science (OC&W), the Netherlands Organization for Scientific Research (NWO), and the Dutch Ministry of Economic Affairs.

CATCHPlus tools and web services from 8 subprojects are required to operate as parts of the Dutch digital cultural heritage infrastructure. This is accomplished by collaborative work on the use of vocabularies, metadata standardization and publication, workspaces, user profiles, persistent identifiers and, not least, annotations.

CATCHPlus tools and services are primarily intended for use by cultural heritage professionals and (humanities) researchers, but visitors to online cultural heritage collections will also benefit.

Annotations in CATCHPlus

A wide range of annotations plays a role in CATCHPlus use cases:

  • Complete digital resources are annotated with ‘traditional’ textual metadata records, with textual or semantic values.
  • Segments of plain and semi-structured textual resources are annotated with text values, semantic values or elements from controlled vocabularies. This sometimes results in complex linguistic annotations over several connected tiers.
  • Scanned handwritten documents are annotated with transcription text. The transcriptions are aligned with the scan images at line or word levels.
  • Annotations are annotated with images: for example, handwritten ‘word zones’ are annotated with 2D graphical representations of image data.
  • Audio recordings are annotated with text generated by automatic speech recognition, or by automatic speech alignment processes.
  • Radio and television programs are described at scene level.
  • Annotations (and annotation values) can be further annotated: for example, segments of transcription texts from automatic speech recognition can be further annotated with semantic annotations.
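The variety in the list above can be sketched with one small data structure. The following Python sketch is only an illustration, not the CATCHPlus or Open Annotation format: the class names, the selector strings and the example.org URIs are all assumptions. It shows an annotation whose target is a whole resource, a segment of a resource, or another annotation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Segment:
    """Part of a resource, e.g. a time range or an image region (hypothetical encoding)."""
    resource: str  # URI of the annotated resource
    selector: str  # e.g. "t=12.0,15.5" or "xywh=10,20,200,40"


@dataclass(frozen=True)
class Annotation:
    """Something (the body) said about something else (the target)."""
    id: str
    # Target: a whole resource (URI string), a segment, or another annotation.
    target: Union[str, Segment, "Annotation"]
    # Body: plain text, or the URI of a vocabulary term or an image.
    body: str


def root_resource(annotation: Annotation) -> str:
    """Follow annotation-on-annotation chains down to the underlying resource URI."""
    target = annotation.target
    while isinstance(target, Annotation):
        target = target.target
    return target.resource if isinstance(target, Segment) else target


# Speech recognition output aligned with a segment of a radio recording.
asr = Annotation(
    "anno:1",
    Segment("http://example.org/radio/42", "t=12.0,15.5"),
    "de koningin opent de tentoonstelling",
)
# A semantic annotation on top of the transcription annotation.
semantic = Annotation("anno:2", asr, "http://example.org/vocab/Queen")
# A 'traditional' metadata value attached to a complete resource.
metadata = Annotation("anno:3", "http://example.org/radio/42", "Radio news, evening edition")

print(root_resource(semantic))
```

Allowing an annotation to be the target of another annotation is what makes the last use case in the list possible without a separate mechanism.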

CATCHPlus will collect heterogeneous annotations of (segments of) digital objects from different cultural heritage collections and institutions in a shared annotation repository. Such a repository has several advantages, one of them being that it forms a searchable index across cultural heritage collections, leading users directly to the right parts of objects. There are several other use cases for this repository.
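As a rough illustration of the cross-collection index idea, the sketch below builds a tiny in-memory word index over annotation texts. It is an assumption-laden toy, not the planned repository: the record layout, collection names, selector strings and example.org URIs are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (collection, object URI, segment selector, annotation text).
ANNOTATIONS = [
    ("archive-a", "http://example.org/scans/7", "xywh=10,20,200,40", "Amsterdam harbour"),
    ("broadcast-b", "http://example.org/radio/42", "t=12.0,15.5", "news from Amsterdam"),
    ("museum-c", "http://example.org/paintings/3", "", "portrait of a merchant"),
]


def build_index(records):
    """Map each lower-cased word to the set of (collection, URI, selector) it occurs in."""
    index = defaultdict(set)
    for collection, uri, selector, text in records:
        for word in text.lower().split():
            index[word].add((collection, uri, selector))
    return index


def search(index, word):
    """Return matching segments across all collections, in a stable sorted order."""
    return sorted(index.get(word.lower(), ()))


index = build_index(ANNOTATIONS)
for hit in search(index, "Amsterdam"):
    print(hit)
```

Because each hit carries a segment selector as well as the object URI, a search result can point at the relevant part of an object rather than at the object as a whole.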

Concrete tasks currently at hand are to agree on a sufficiently powerful annotation model and format, and to design and implement the repository and a REST web service to access it.

Our intention was to use the model presented in (Brugman, Malaisé & Hollink, 2008) as the basis for format and web service design. Since this model appears to be quite similar to the Open Annotation model, it will be very interesting to investigate in detail how the two models align. Open Annotation might very well cover all or most of our requirements.