User:JBurns

From Open Annotation Collaboration

John Burns

Director, Platform Research

ProQuest

Use Case: Open Annotations for ProQuest

Collections

ProQuest holds large and significant collections of academic material, both primary and secondary literature, ranging across essentially all subject areas. The content includes journals, books, newspapers, magazines, datasets, image archives, historical records, and databases. In total it has more than 4 billion 'documents' distributed across many distinct repositories, including well-known primary resources such as EEBO, Chadwyck-Healey, and Dialog, as well as over 9,000 periodicals and 2 million doctoral theses. Given the increase in multidisciplinary research, it is likely that a researcher will use multiple repositories, including those in subject areas with which they are less familiar. An interoperable annotation store would give users a common annotation storage mechanism across and beyond the ProQuest collections, and would allow transparent migration of annotations as the delivery platforms evolve, split or merge.

Target Annotations

Given the scale and range of the content held by ProQuest and the number and diversity of users, any annotation facilities provided by ProQuest per se will have to be fairly generic, or targeted at fairly large groups of users with common practices. We would initially explore the provision of a basic annotation facility for text documents, i.e. the ability to annotate a 'chunk' of text in the target document with one or more 'chunks' in source documents plus literal 'commentary' text that would be embedded in the annotation body as described in paragraph 3.5 (URNs) of the (α3) OAC model document. In particular we would want to investigate issues around fragility of reference, as we would need annotation targets to be stable in the face of representation changes and perhaps even content changes. For example, image sub-parts used as the source and target of an annotation need to be robust to changes in resolution, and text references need to be robust in the face of differing document formats. We would expect to explore ways of ensuring that reference binding is as robust as possible, and we are aware of the work of Paul Watry et al. in this domain.
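To make the fragility-of-reference concern concrete, the following is a minimal sketch of one possible approach to text targets: instead of storing only character offsets, the annotation stores the quoted text together with a little surrounding context, and the target is re-located whenever the representation changes. The names used here (TextQuoteAnchor, make_anchor, resolve) and the context length are illustrative assumptions; they are not taken from the OAC model document or from any ProQuest system.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TextQuoteAnchor:
    """Hypothetical anchor: the quoted text plus nearby context."""
    exact: str
    prefix: str = ""
    suffix: str = ""


def make_anchor(text: str, start: int, end: int, context: int = 16) -> TextQuoteAnchor:
    """Capture the selected span and up to `context` characters on each side."""
    return TextQuoteAnchor(
        exact=text[start:end],
        prefix=text[max(0, start - context):start],
        suffix=text[end:end + context],
    )


def _common_suffix_len(a: str, b: str) -> int:
    """Number of characters shared at the ends of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def _common_prefix_len(a: str, b: str) -> int:
    """Number of characters shared at the starts of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def resolve(anchor: TextQuoteAnchor, text: str) -> Optional[Tuple[int, int]]:
    """Find the occurrence of anchor.exact whose context matches best.

    Returns (start, end) offsets in `text`, or None if the quoted
    text no longer occurs at all.
    """
    best: Optional[Tuple[int, int]] = None
    best_score = -1
    pos = text.find(anchor.exact)
    while pos != -1:
        end = pos + len(anchor.exact)
        # Score = how much of the stored context still surrounds this match.
        score = (_common_suffix_len(text[:pos], anchor.prefix)
                 + _common_prefix_len(text[end:], anchor.suffix))
        if score > best_score:
            best, best_score = (pos, end), score
        pos = text.find(anchor.exact, pos + 1)
    return best


original = "The cat sat on the mat. The dog sat on the rug."
start = original.index("sat", 20)            # the second "sat"
anchor = make_anchor(original, start, start + 3, context=8)

# A later representation with extra leading material shifts every offset.
revised = "Preface. " + original
span = resolve(anchor, revised)
```

In this sketch the stored offsets from the original document would point at the wrong characters in the revised one, whereas the quote-plus-context anchor still selects the second "sat". It only tolerates insertions and deletions outside the quoted span and its context; edits inside the quoted text itself would need fuzzy matching, which is not attempted here.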

We are aware both of the diversity of practice within the scholarly community and of our necessarily limited understanding of the nuances of such practice. We would therefore undertake to provide a well-documented API so that specialist clients could be created on a variety of delivery platforms such as Firefox, Chrome, Papers, iPad, etc.

Target Scholars

As noted above, archives such as ProQuest provide services to the entire research community, both in academia and in the commercial world. Consequently, the initial roll-out of any service by ProQuest must address large communities of practice and be fairly generic. We feel that widespread adoption of basic but useful services should come first and will encourage the creation of more specialized tools as the limits of the generic solutions become evident, and that such creation is initially best addressed within the communities of practice. Providing generally useful facilities will enable us to get early and rapid feedback and to identify widely sought-after additional features directly from the users. We would also expect to work with users in academia to refine an API that would enable domain-specific client facilities to be readily implemented.

Relevance

ProQuest and similar archives have enormous reach, and they could well be the deciding factor in the adoption of OAC. They are not tied to the academic funding cycle, and so they can move rapidly to implement services. They make good partners for initiatives such as OAC because they can expose it to almost every academic on the planet. From ProQuest's perspective, annotation is something we have to support, and interoperability could allow us to deliver a common service across multiple repositories quickly, whilst easing the problems of migration as the delivery platforms evolve and change. Moreover, the elegance and generality of the OAC model and its foundations in the linked data initiative mean that the same underlying infrastructure can potentially be used to provide other services.

Challenges

The existing closed services are not suited to the full range of academic needs. The range of ProQuest's user base and content demands a well-conceived solution that can be applied generally across every discipline and content type. The biggest challenge will be to ensure that the URI addressability requirements of the OAC model can be satisfied by the delivery platforms; the early implementation of OAC-conformant services on the ProQuest systems will force that issue to be addressed.