User:AAshton
From Open Annotation Collaboration
Andrew Ashton
Director, Center for Digital Scholarship
Brown University
Michael Park
Repository Programmer, Center for Digital Scholarship
Brown University
Use case: Annotating digital texts in the Brown University Library
Summary
Brown University Library seeks to build a set of annotation services for the Brown Digital Repository, an institutional repository based on Fedora Commons. The new annotation framework will enable faculty, students, and researchers at Brown to interact with and create new content in the repository. In accordance with the work currently being done by the Open Annotation Collaboration (OAC), the annotation framework will be designed using existing web-based technologies, including REST, AtomPub, and RDF. The service will be built upon an existing prototype, and will evolve along with the OAC data model, to explore the opportunities and challenges presented by implementing contextually rich annotations as a feature of an institution-wide digital repository.
Details
The Brown Digital Repository (BDR) is a store for digital objects that inform the scholarly work at the university. The breadth of formats and media types supported by the BDR presents a challenge in designing core services, such as annotation, that function across contexts. The OAC data model offers a solution to this problem by leveraging the BDR’s web-centric representations of objects as a common vocabulary for creating links between annotations and digital objects. While the goal of this effort is to develop a generalized approach to annotating repository objects, the initial work will focus on developing tools to work with Brown’s substantial collections of digital texts.
Brown University is home to a number of prominent collections of texts encoded using the Text Encoding Initiative (TEI) guidelines. The structural markup within these texts offers an initial mechanism to target annotations at specific sections and passages in a text. While online annotation tools, including Brown’s own Virtual Humanities Lab, have been available for some time, the OAC data model encourages a new approach. It offers the potential:
- To treat annotations as first-class objects within the digital repository, so that they can use all of the services and tools available to other objects of the same class.
- To aggregate and republish annotations as new texts, which are then subject to the same activities as their target texts (e.g., annotation, dissemination).
- To make annotations interoperable within a broader, Web-centric annotation context.
Early versions of this prototype have focused on using AtomPub as a protocol for creating annotations of TEI-encoded editions of several works from the Italian Renaissance. These works include Giovanni Pico della Mirandola's Oration on Human Dignity (1486), as presented in its first printed edition (Bologna, 1496; see Figure 1), and Pico's Conclusiones Nongentae Disputandae (1486), or 900 Theses. When an annotation is created, the Pico Project web application creates an Atom XML document, which contains metadata about the annotation, the content of the annotation, and a URI that defines the portion of the text that the annotation addresses. The system uses HTTP to post the document to an AtomPub web service, which ingests it into the digital repository. Upon ingestion, the repository creates an RDF document that defines the relationships between the annotation document and other objects in the repository. The annotation is thus stored as a new, distinct object in the repository. Although the initial release of the Pico Project uses an internally developed model for addressing the portions of the text targeted by an annotation, future versions will seek to adopt the OAC data model.
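The annotation-creation step described above can be sketched as follows. This is a minimal illustration, not the Pico Project's actual code: the function names, the choice of a rel="related" link to carry the target URI, and the example URIs are all assumptions made for the sake of the example. Only the general shape (an Atom entry carrying metadata, the annotation content, and a target URI, posted over HTTP to an AtomPub collection) comes from the description above.

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)


def build_annotation_entry(entry_id, title, author, body, target_uri, updated=None):
    """Return an Atom entry (as UTF-8 bytes) describing one annotation.

    The target is carried in an atom:link with rel="related"; that choice
    is an assumption of this sketch, not the prototype's documented format.
    """
    if updated is None:
        updated = datetime.now(timezone.utc)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated.isoformat()
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    # URI identifying the portion of the text that the annotation addresses.
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", {"rel": "related", "href": target_uri})
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "text"})
    content.text = body
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)


def make_post_request(collection_url, entry_bytes):
    """Prepare (but do not send) an HTTP POST of the entry to an AtomPub collection."""
    return urllib.request.Request(
        collection_url,
        data=entry_bytes,
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",
    )
```

A caller would pass the prepared request to `urllib.request.urlopen` to submit the entry; the collection URL depends on the repository's AtomPub service and is not specified here.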
This experiment demonstrates several key benefits to treating annotations as distinct resources within the Web context. For example, a Mexican Pico scholar created annotations containing Spanish translations of each of Pico’s 900 Theses. Using the RDF and metadata from the annotation object, the Pico Project interface is able to aggregate that class of annotations into a new representation of the complete work – a Spanish translation of Pico’s text. That representation is then subject to all of the activities enabled by the project interface, including annotation, discovery, and comparison.
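The aggregation described above might look something like the following sketch. The `Annotation` fields (`kind`, `language`) and the function name are hypothetical stand-ins for whatever the repository's RDF and metadata actually record; the point is only that annotations sharing a class can be selected and ordered by their targets to yield a new representation of the whole work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    target: str    # URI of the passage the annotation addresses
    kind: str      # hypothetical class label, e.g. "translation" or "commentary"
    language: str  # language of the annotation body
    body: str      # the annotation text itself


def aggregate_translation(annotations, ordered_targets, language):
    """Assemble translation annotations into a text that follows the source order.

    Returns a list of (target, body) pairs, skipping any passage that has
    no translation in the requested language.
    """
    by_target = {
        a.target: a.body
        for a in annotations
        if a.kind == "translation" and a.language == language
    }
    return [(t, by_target[t]) for t in ordered_targets if t in by_target]
```

Because the result is keyed by the same target URIs as the source text, the aggregated translation can itself be addressed and annotated in the same way.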
Figure 1. Creating annotations for Pico's Oratio de hominis dignitate (1486)
The OAC data model specifically addresses several practical challenges that emerge when contemplating the deployment of an institution-wide annotation service. OAC support for ORE Resource Maps provides a mechanism for managing the convoluted network of object relationships and representations manifest in such an environment. Furthermore, OAC integrates with ORE to provide a framework for attaching annotations to a segment of a resource, regardless of the format of the target object. This avoids the risk of developing annotation tools that work with only a single type of object, such as TEI. In creating the annotation framework, the Brown University Library will expand upon another prototyped tool that allows semantically encoded fragments of TEI to be expressed as generic RDF. This prototype, which is being developed as part of a grant to explore the use of TEI with the SEASR framework, offers a model for working with fragments of complex objects using the OAC guidelines.
Finally, as objects are edited or otherwise change, their related annotations may become obsolete or misleading, which may in turn affect any number of other linked resources. OAC offers the option to tie an annotation to a time-dependent representation of an object, thus at least ensuring referential integrity. As work proceeds in integrating the OAC data model into Brown’s annotation prototype, a number of new challenges will arise. They include rights management, handling machine- or group-created annotations, and the use of the annotation framework with non-textual media. These efforts will require a deeper consideration of the interaction between annotation data and the many software layers that provide the framework for digital repository services.
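The idea of tying an annotation to a time-dependent representation can be illustrated with a small sketch. It assumes, purely for illustration, that the repository can list an object's versions as (timestamp, URI) pairs; the function name and data layout are hypothetical and are not part of the OAC data model or the BDR.

```python
from bisect import bisect_right
from datetime import datetime


def version_at(versions, when):
    """Return the URI of the version that was current at time `when`.

    `versions` is a list of (datetime, uri) pairs sorted by time, oldest first.
    Returns None if `when` precedes the first recorded version.
    """
    times = [t for t, _ in versions]
    i = bisect_right(times, when)  # number of versions created at or before `when`
    if i == 0:
        return None
    return versions[i - 1][1]
```

Resolving an annotation's target through its creation time in this way keeps the annotation pointing at the text its author actually saw, even after the object has been edited.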
