Open Annotation: Beta Data Model Guide
10 August 2011
The Open Annotation Data Model specifies an approach for associating annotations with resources, using a methodology conformant with the Architecture of the World Wide Web and the Linked Data initiative. It draws on the Annotea model, as well as more recent extensions of that model.
1.1 Namespaces Used
1.2 Note about Examples
2. Open Annotation Guiding Principles
3. Open Annotation Data Model
3.1 Baseline Model
3.2 Serialization of Model
3.3 Additional Properties and Relationships
3.4 Annotation Types
3.5 Inline Information
3.6 Multiple Targets
3.7 Segments of a Resource and Constrained Resources
3.7.1 Fragment Identifiers
3.7.2 Media Fragment Identifiers
3.7.3 Constrained Targets
3.7.3.1 Inline Constraint Data
3.7.3.2 RDF Constraint Data
3.7.4 Constrained Body
3.7.5 Multiple Constraint Precedence
3.7.6 Constraint Type List
3.8 Time Dependent Annotations
3.9 Structured Data Annotations
3.9.1 Inline Data Annotations
Annotating, the act of associating one piece of information with one (or more) other piece(s) of information, is a pervasive activity shared by all humanity across all walks of life. Web citizens make comments about online resources using tools built into the hosting web site, external web services, or the functionality of an annotation client. Comments about photos on Flickr, videos on YouTube, people's posts on Facebook, or resources mentioned on Twitter could all be considered annotations associated with the original resource. There is a plethora of closed and proprietary web-based "sticky note" systems, such as Google's Sidewiki, and stand-alone multimedia annotation systems. The primary complaint about these systems is that annotations are locked into the web site or tool used to create them, and cannot be seen, managed or leveraged in any way outside of it.
Annotating is also a pervasive element of scholarly practice for both the humanist and the scientist. It is a method by which scholars organize existing knowledge and facilitate the creation and sharing of new knowledge. It is used by individual scholars when reading as an aid to memory, to add commentary, and to classify, but more importantly it can also facilitate shared editing, scholarly collaboration, and pedagogy. Annotations can have value in their own right, as a means of scholarly communication.
Yet scholars also remain dissatisfied with the options available for annotating digital resources. Scholars who wish to annotate online resources face the same problems: they must learn different annotation clients for different content repositories, have no easy way to integrate annotations made on different systems or created by colleagues using other tools, and are often limited to simplistic and constrained models of annotation.
The Open Annotation data model provides an interoperable method of expressing annotations such that they can easily be shared between platforms, with sufficient richness of expression to satisfy scholars' needs while remaining simple enough to also allow for common use cases such as attaching a piece of text to a single web resource. Annotations are modeled as a set of connected resources, including a body and one or more targets, where the body is somehow about those targets.
Unlike previous attempts, the Open Annotation system does not prescribe a protocol for creating, managing and retrieving annotations. Instead it describes a web-centric method, founded on the ideas of the Linked Data initiative, which enables discovery and sharing of annotations without clients or servers having to agree on a particular set of operations on those annotations. This publish/discover approach is described in a companion document (not yet available).
This specification uses the following namespaces and prefixes to indicate those namespaces:
||Content in RDF [Content in RDF]|
||Dublin Core elements [DC Elements]|
||Dublin Core terms [DC Terms]|
||Dublin Core types [DC Types]|
||Friend of a Friend vocabulary [FOAF]|
||Open Annotation vocabulary [OAC]|
||RDF vocabulary [RDF Vocabulary]|
||RDF Schema vocabulary [RDF Vocabulary]|
Examples of the data model are available in separate, themed documents. Links to the relevant sections are included below, and will be updated as more examples become available.
Current example documents:
An Annotation is a document identified by an HTTP URI that describes an association created between a Body resource and a Target resource. The Body must be somehow "about" the Target for it to be considered the body of an Annotation. This gives rise to a tripartite base model with the same basic structure as that of Annotea [Annotea].
|Figure 1: Baseline Model|
Both the Body and the Target of the Annotation can be any resource on the web, identified by a URI. As such, they can have representations in any format or language (or no language), or have no representations at all, such as for abstract resources that denote a concept.
The Open Annotation ontology defines the following classes:
- oac:Annotation (A-1)
- A document identified by an HTTP URI that describes, at least, the Body and Target resources involved in the annotation.
- oac:Body (B-1)
- The body of the annotation. The Body is somehow about the Target resource. It is the information which is annotating the Target.
- oac:Target (T-1)
- The resource that is being annotated.
And the following relationships:
- The relationship between Annotation and Body.
- The relationship between Annotation and Target.
Please note that neither oac:Body nor oac:Target should be interpreted as an exclusive class when considered across multiple Annotations, as the Body of one Annotation may be the Target of another.
Dereferencing the HTTP URI of the Annotation document results in an RDF serialization of an instance of this data model. Any RDF serialization format is permissible; however, RDF/XML is recommended. It may be desirable to allow clients to request different serializations through Content Negotiation.
|Figure 2: Serialization of an Annotation|
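To make the baseline model concrete, the following Python sketch writes the three-node graph of Figure 1 as N-Triples using only the standard library. N-Triples is used here only because it is trivial to emit by hand; the guide itself recommends RDF/XML. The example.org URIs are hypothetical, and the oac namespace URI and the property names oac:hasBody and oac:hasTarget are assumptions, since the namespace table and relationship list in this guide do not preserve them.

```python
# Sketch: the baseline Annotation / Body / Target graph as N-Triples.
# ASSUMPTION: the oac namespace URI and the names oac:hasBody and oac:hasTarget
# are not preserved in this guide's tables; check them against the vocabulary.
OAC = "http://www.openannotation.org/ns/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(uri: str) -> str:
    """Format a URI as an N-Triples IRI term."""
    return f"<{uri}>"


def baseline_annotation(anno: str, body: str, target: str) -> list:
    """Return the three statements of the baseline model."""
    return [
        (iri(anno), iri(RDF_TYPE), iri(OAC + "Annotation")),
        (iri(anno), iri(OAC + "hasBody"), iri(body)),
        (iri(anno), iri(OAC + "hasTarget"), iri(target)),
    ]


def to_ntriples(triples: list) -> str:
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


# Hypothetical URIs: the Annotation document, a comment (Body), a web page (Target).
doc = to_ntriples(baseline_annotation(
    "http://example.org/anno1",
    "http://example.org/comment1",
    "http://example.org/page1",
))
print(doc, end="")
```

A server could return text of this shape, or an equivalent RDF/XML document, when the Annotation's HTTP URI is dereferenced.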
As an Annotation is a resource with a URI, additional properties and relationships can be associated with it. It should have a timestamp recording when the annotation relationship was created (dcterms:created) and a reference to the agent that created it (dcterms:creator). It is recommended that the object of the dcterms:creator relationship be a foaf:Agent with at least foaf:name and foaf:mbox properties.
Resources referenced by additional relationships may themselves have additional properties and relationships. The set of relationships below is by no means exhaustive: other relationships and properties may also be used.
It is also important to note that the properties and relationships of the Annotation do not necessarily apply to either the Body or the Target. The same property may be used on each of the three different resources with different values.
Additional relationships may also hold between the resources in the model; for example, a predicate may express the relationship between the Body and the Target. Figure 3.1 demonstrates this with the oac:annotates relationship. Example documents may contain additional uses of these constructions.
|Figure 3.1: Additional Properties and Relationships|
- foaf:Agent (U-1)
- An agent, in this case the creator of the Annotation
- The name of the Annotation
- The creator of the Annotation
- The time and date at which the Annotation was created
- The name of the creator
- The email address of the creator
- The subject resource is the body of an annotation and is somehow about the object resource
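The recommended provenance properties can be sketched as follows. This Python example attaches dcterms:created and dcterms:creator to the Annotation, describes the creator as a foaf:Agent with foaf:name and foaf:mbox, and adds an optional oac:annotates link from Body to Target as in Figure 3.1. The people, addresses and URIs are hypothetical, the oac namespace URI is an assumption, and typing the timestamp as xsd:dateTime is a choice of this sketch rather than something the guide mandates.

```python
from datetime import datetime, timezone

# Well-known vocabularies; the oac namespace URI is an ASSUMPTION.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
OAC = "http://www.openannotation.org/ns/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(uri):
    return f"<{uri}>"


def lit(value, datatype=None):
    """N-Triples literal with backslashes and quotes escaped, optionally typed."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"^^{iri(datatype)}" if datatype else "")


def provenance(anno, agent, name, email, created):
    """dcterms:created and dcterms:creator on the Annotation; the creator is a
    foaf:Agent with foaf:name and foaf:mbox, as recommended above."""
    return [
        (iri(anno), iri(DCTERMS + "created"), lit(created.isoformat(), XSD_DATETIME)),
        (iri(anno), iri(DCTERMS + "creator"), iri(agent)),
        (iri(agent), iri(RDF_TYPE), iri(FOAF + "Agent")),
        (iri(agent), iri(FOAF + "name"), lit(name)),
        (iri(agent), iri(FOAF + "mbox"), iri("mailto:" + email)),
    ]


created = datetime(2011, 8, 10, 12, 0, tzinfo=timezone.utc)
triples = provenance("http://example.org/anno1", "http://example.org/people/alice",
                     "Alice", "alice@example.org", created)
# Optional explicit Body-to-Target relationship (Figure 3.1).
triples.append((iri("http://example.org/comment1"), iri(OAC + "annotates"),
                iri("http://example.org/page1")))
for s, p, o in triples:
    print(s, p, o, ".")
```

Note that these statements describe the Annotation and its creator only; as stated above, they say nothing about who created the Body or the Target.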
While oac:Annotation is the base class for Annotations, there can be many other, more specialized types. For example, many systems allow users to reply to Annotations, and one might want to type a particular annotation as an oac:Reply in order to distinguish it from other sorts of Annotation.
Additional classes can also be used for additional requirements of the Annotation document. For example, a hypothetical oac:PhotoCommentary might require that the target be an image (a dcmitypes:Image). In this case the model would be exactly the same as Figure 2 above, but with oac:PhotoCommentary in place of oac:Annotation.
|Figure 4: Additional Types of Annotation: oac:Reply|
A list of known subClasses of oac:Annotation is available. The list is maintained separately from this guide for ease of discoverability and maintenance.
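Because specialized types are subclasses of oac:Annotation, a consumer that only understands oac:Annotation can still recognize an oac:Reply if it applies rdfs:subClassOf reasoning. The Python sketch below implements just that one RDFS rule over plain tuples. It assumes the vocabulary declares oac:Reply as an rdfs:subClassOf oac:Annotation; the oac namespace URI and the Annotation URI are assumptions.

```python
# Sketch: an Annotation typed with a subclass (oac:Reply) is still an
# oac:Annotation once rdfs:subClassOf is applied (the RDFS "rdfs9" rule).
# ASSUMPTION: the oac namespace URI, and that oac:Reply is declared a subclass.
OAC = "http://www.openannotation.org/ns/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def inferred_types(triples):
    """Return every (resource, class) pair entailed by rdf:type plus
    rdfs:subClassOf, following subclass chains until nothing new appears."""
    subclass = {(s, o) for s, p, o in triples if p == SUBCLASS_OF}
    types = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    changed = True
    while changed:
        new = {(r, sup) for r, c in types for sub, sup in subclass if sub == c}
        changed = not new <= types
        types |= new
    return types


graph = [
    (OAC + "Reply", SUBCLASS_OF, OAC + "Annotation"),        # schema statement
    ("http://example.org/anno2", RDF_TYPE, OAC + "Reply"),   # hypothetical reply
]
types = inferred_types(graph)
print(sorted(types))
```

The same approach covers a hypothetical oac:PhotoCommentary or any other subclass listed in the separately maintained document.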
The baseline model assumes that all resources have URIs and are available on the web. Some clients, however, may not be able to generate URIs on their own for the resources created as part of the annotation process. For example, it may be necessary for the client to transmit a single document that includes the Body as plain text, together with any other user-generated information.
To allow the client to embed the Body directly in the Annotation document, we assign a unique, non-resolvable URI (a URN) as the identifier for the Body. An identifier in the urn:UUID scheme is suggested, although any URN is possible. This would be appropriate for clients that function primarily offline or cannot generate URIs for the text entered by the user. Servers that discover these URNs should rewrite them into HTTP URIs that they control and assert an equivalence between each HTTP URI and the original URN. More information about this process is available in the companion publish/discovery document (not yet available).
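The client and server halves of this process can be sketched in Python. The client mints a urn:uuid identifier with the standard uuid module. The server replaces each such URN with an HTTP URI under a base it controls and records the equivalence. Because the companion document is not yet available, the choice of owl:sameAs as the equivalence predicate, the URI pattern under the hypothetical base http://example.org/bodies/, and the oac namespace URI are all assumptions.

```python
import uuid

OAC = "http://www.openannotation.org/ns/"            # ASSUMPTION: oac namespace URI
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # ASSUMPTION: equivalence predicate


def new_body_urn() -> str:
    """Client side: mint a non-resolvable urn:uuid identifier for the Body."""
    return uuid.uuid4().urn


def rewrite_urns(triples, base):
    """Server side: replace each urn:uuid term with an HTTP URI under `base`
    and assert its equivalence with the original URN."""
    mapping = {}

    def rewrite(term):
        if not term.startswith("urn:uuid:"):
            return term
        if term not in mapping:
            mapping[term] = base + term[len("urn:uuid:"):]
        return mapping[term]

    out = [(rewrite(s), p, rewrite(o)) for s, p, o in triples]
    out += [(http_uri, OWL_SAME_AS, urn) for urn, http_uri in mapping.items()]
    return out


body = new_body_urn()
client_graph = [("http://example.org/anno1", OAC + "hasBody", body)]
server_graph = rewrite_urns(client_graph, "http://example.org/bodies/")  # hypothetical base
print(server_graph)
```

Keeping the original URN in an explicit equivalence statement lets anyone who saw the client's offline copy of the Annotation match it to the published version.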
The Open Annotation data model leverages the W3C's "Representing Content in RDF" [Content in RDF] specification to include information directly within the Annotation document.
|Figure 5.1: Inline Body|
Relationships defined elsewhere:
- The representation of the resource, as plain text. In this case the resource is the Body of the Annotation.
- The name of the character encoding of the object of the cnt:bytes property, such as "utf-8" or "ascii"
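The following Python sketch shows an inline Body built with Content in RDF. The Body gets a urn:uuid, is typed cnt:ContentAsText, and carries the user's text in cnt:chars, escaped as an N-Triples literal. The property name used for the character encoding (cnt:characterEncoding) and the oac namespace URI are assumptions, because the relationship list above no longer shows the property identifiers. The example URIs and comment text are hypothetical.

```python
import uuid

CNT = "http://www.w3.org/2008/content#"
OAC = "http://www.openannotation.org/ns/"  # ASSUMPTION: oac namespace URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def nt_literal(text: str) -> str:
    """Quote a string as an N-Triples literal (backslash escaped first)."""
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, escaped)
    return f'"{text}"'


def inline_body_annotation(anno: str, target: str, comment: str) -> str:
    """Annotation document with the Body text embedded via Content in RDF."""
    body = uuid.uuid4().urn  # the Body has a URN, not a resolvable HTTP URI
    triples = [
        (f"<{anno}>", f"<{RDF_TYPE}>", f"<{OAC}Annotation>"),
        (f"<{anno}>", f"<{OAC}hasBody>", f"<{body}>"),
        (f"<{anno}>", f"<{OAC}hasTarget>", f"<{target}>"),
        (f"<{body}>", f"<{RDF_TYPE}>", f"<{CNT}ContentAsText>"),
        (f"<{body}>", f"<{CNT}chars>", nt_literal(comment)),
        # ASSUMPTION: property name for the character encoding
        (f"<{body}>", f"<{CNT}characterEncoding>", nt_literal("utf-8")),
    ]
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


doc = inline_body_annotation("http://example.org/anno1", "http://example.org/page1",
                             'Compare with "the 1851 edition".\nSee p. 12.')
print(doc, end="")
```

The quote and newline escaping matter because user-entered text routinely contains both, and an unescaped line break would corrupt a line-based serialization.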
Annotations can be about multiple resources, such as Annotations that compare or contrast two resources. The data model therefore needs to allow for multiple Targets for a single Annotation. This can be accomplished by expressing multiple hasTarget relationships.
The Annotation normally stands for the relationship between the Body and the Target. In the multiple Target scenario, however, the Body may relate differently to individual Targets; in that case creators are encouraged to make these relationships explicit by including them in the RDF graph.
|Figure 6: Multiple Targets|
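A minimal Python sketch of the multiple Target pattern follows: one hasTarget statement per Target, plus an optional explicit Body-to-Target predicate where the Body relates to one Target differently. The predicate http://example.org/vocab/disputes and all other example.org URIs are hypothetical, and the oac namespace URI is an assumption.

```python
OAC = "http://www.openannotation.org/ns/"  # ASSUMPTION: oac namespace URI


def multi_target(anno, body, targets, relations=None):
    """One oac:hasTarget statement per Target. `relations` optionally maps a
    Target to an explicit Body-to-Target predicate."""
    triples = [(anno, OAC + "hasBody", body)]
    for target in targets:
        triples.append((anno, OAC + "hasTarget", target))
        if relations and target in relations:
            triples.append((body, relations[target], target))
    return triples


def targets_of(triples, anno):
    """All Targets of `anno`, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == anno and p == OAC + "hasTarget")


graph = multi_target(
    "http://example.org/anno3",
    "http://example.org/essay",
    ["http://example.org/edition1", "http://example.org/edition2"],
    # hypothetical predicate: the essay disputes only the second edition
    {"http://example.org/edition2": "http://example.org/vocab/disputes"},
)
print(targets_of(graph, "http://example.org/anno3"))
```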
The user (or software agent) must be able to select a part of the resource as the Body or Target for an Annotation, not just the entire resource. Many Annotations have this requirement, as resources can be arbitrarily large and frequently there is only a small section which is of interest. For example, the Target may be an area within an image or video, or the Body may be a paragraph within a longer text.
There are several ways in which this can be accomplished. The first two approaches use the three-node model described above, while the later approaches introduce a new subClass of oac:Target for situations in which the base model is insufficient.
A fragment URI normally identifies a part of a resource, and the method for constructing and interpreting these URIs is dependent on the media type of the resource. In general, fragment URIs are created by appending a fragment that describes the section of interest to the URI of the full resource, separated by a '#' character. For more information about fragments in URIs, please see RFC 3986 [RFC 3986].
When a definition exists for how to construct a fragment URI for a particular document format, and such a fragment accurately describes the section of interest for the Annotation, it is recommended that this technique be used.
The known media types that define fragment construction rules are:
As the fragment URI identifies the resource that is the Body or Target of the Annotation, the data model is the same as the baseline described in section 3.1, with one exception: it is strongly recommended that the dcterms:isPartOf relationship be used to link back to the resource's URI without the fragment, both for user agents that do not understand fragments and for ease of querying.
|Figure 7.1: Fragment URIs|
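The recommended dcterms:isPartOf link can be derived mechanically by splitting the fragment off the URI. The Python sketch below does so with urllib.parse.urldefrag. The PDF URI is hypothetical and uses the PDF "page" fragment parameter only as an illustration.

```python
from urllib.parse import urldefrag

DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"


def fragment_links(resource_uri):
    """If the Body or Target is a fragment URI, link it back to the full
    resource with dcterms:isPartOf; otherwise add nothing."""
    full_uri, fragment = urldefrag(resource_uri)
    if not fragment:
        return []
    return [(resource_uri, DCTERMS_IS_PART_OF, full_uri)]


# Page 3 of a hypothetical PDF.
print(fragment_links("http://example.org/paper.pdf#page=3"))
```

A query for all Annotations on http://example.org/paper.pdf can then follow dcterms:isPartOf without parsing any fragment syntax.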
The W3C Media Fragment URI specification [Media Fragments] allows the creation of a URI that identifies a segment of image, video and audio resources for common scenarios. In contrast to (X)HTML fragments, where the fragment's target must be embedded in the data by the creator, Media Fragments can be generated by others as needed by following conventions.
The Media Fragment specification defines four fragment parameters, used in the same way as for PDFs and plain text, described above. Multiple parameters can be given, separated by an ampersand ('&').
The model and ontology for Media Fragment URIs is identical to that of fragment URI schemes defined explicitly per media type, as above in Figure 7.1.
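Building and reading Media Fragment URIs is plain string handling. The Python sketch below joins name=value pairs with '&', percent-encodes values while keeping the ',' and ':' that the Media Fragments syntax uses, and splits a fragment back into a dictionary. The parameter names in the example are the temporal (t) and spatial (xywh) dimensions. The video URI is hypothetical, and the sketch does not validate parameter syntax.

```python
from urllib.parse import quote, unquote


def media_fragment(uri, **params):
    """Append Media Fragment parameters to `uri` as name=value pairs joined by
    '&'. Values are percent-encoded, keeping the ',' and ':' the syntax uses."""
    pairs = [f"{name}={quote(str(value), safe=',:')}" for name, value in params.items()]
    return f"{uri}#{'&'.join(pairs)}" if pairs else uri


def parse_media_fragment(uri):
    """Split the fragment back into a name -> value dictionary."""
    _, _, fragment = uri.partition("#")
    return {name: unquote(value)
            for name, value in (pair.split("=", 1) for pair in fragment.split("&") if "=" in pair)}


# Seconds 10 to 20 of a hypothetical video, restricted to a 320x240 region.
clip = media_fragment("http://example.org/video.ogv", t="10,20", xywh="160,120,320,240")
print(clip)
print(parse_media_fragment(clip))
```

The resulting URI is used as the Body or Target exactly like any other fragment URI, including the dcterms:isPartOf link to the unfragmented resource.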
In order to allow for resource segments that cannot be described using fragment URIs, the recommended approach is to define a new resource identified by a URN (again, a urn:UUID is appropriate) as the Target of the Annotation, which we call a ConstrainedTarget (CT-1). We then describe how that resource is constrained in a Constraint resource (C-1), and link from the ConstrainedTarget to the full resource (T-1) using the oac:constrains predicate. The ConstrainedTarget should have only one Constraint; see section 3.7.5 on multiple constraint precedence if more are required.
The nature of the Constraint description will depend on the type of the resource whose segment is being conveyed, and it is up to the annotation client to interpret the segment description with respect to the full resource. For example, an SVG path element could be used to describe an area within an image, and a SPARQL or SQL query could be used to describe a slice of a database.
This Constraint resource cannot be attached directly to the Body or Target, as it is only true within the context of the Annotation rather than true in all circumstances. This is a restriction of the RDF data model, as all statements must be globally true, not only true within a particular graph or description.
This approach can also be used to constrain resources in ways other than describing a part of the resource. For example, one might want to annotate an image resource only as it appears within a particular HTML page, to say that it looks out of place there. A second example is annotating a particular version of a resource; for this use case, see section 3.8.
|Figure 7.3: Constraints|
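The ConstrainedTarget pattern can be sketched in Python as follows. The Annotation targets a fresh urn:uuid node (CT-1), which points to the full resource (T-1) with oac:constrains and to its Constraint (C-1). A small check flags ConstrainedTargets that do not have exactly one Constraint. The name of the CT-1 to C-1 link (oac:constrainedBy here), the class names, the oac namespace URI and the example.org URIs are assumptions for illustration only.

```python
import uuid

OAC = "http://www.openannotation.org/ns/"  # ASSUMPTION: oac namespace URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CONSTRAINED_BY = OAC + "constrainedBy"     # ASSUMPTION: name of the CT-1 -> C-1 link


def constrained_target(anno, full_resource, constraint):
    """Annotate a segment of `full_resource` through a ConstrainedTarget node."""
    ct = uuid.uuid4().urn                    # CT-1 is identified by a fresh urn:uuid
    return ct, [
        (anno, OAC + "hasTarget", ct),
        (ct, RDF_TYPE, OAC + "ConstrainedTarget"),
        (ct, OAC + "constrains", full_resource),   # CT-1 -> T-1
        (ct, CONSTRAINED_BY, constraint),          # CT-1 -> C-1
        (constraint, RDF_TYPE, OAC + "Constraint"),
    ]


def targets_breaking_single_constraint(triples):
    """ConstrainedTargets that do not have exactly one Constraint."""
    cts = {s for s, p, o in triples if p == RDF_TYPE and o == OAC + "ConstrainedTarget"}
    counts = {ct: 0 for ct in cts}
    for s, p, _ in triples:
        if p == CONSTRAINED_BY and s in counts:
            counts[s] += 1
    return sorted(ct for ct, n in counts.items() if n != 1)


ct, graph = constrained_target("http://example.org/anno4",
                               "http://example.org/photo.jpg",        # hypothetical image
                               "http://example.org/constraints/c1")   # hypothetical Constraint
print(targets_breaking_single_constraint(graph))
```

Because the Constraint hangs off the ConstrainedTarget rather than the image itself, nothing is asserted globally about http://example.org/photo.jpg.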
It is possible to include the Constraint information inline within the Annotation document using the same technique as used for including the Body, described in section 3.5. The Constraint is given a URN (normally a urn:UUID) and then the Constraint information is included as the value of the cnt:chars property. The requirements for doing this are the same as for including the Body inline within the Annotation document.
|Figure 7.3.1: Inline Constraints|
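An inline Constraint can be sketched the same way as an inline Body. The Python example below gives the Constraint a urn:uuid, stores an SVG path as the value of cnt:chars, and shows a client reading the path back out with xml.etree. The link name oac:constrainedBy, the use of dc:format as a hint to the client, and the oac namespace URI are assumptions; the path data is an arbitrary example.

```python
import uuid
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

CNT = "http://www.w3.org/2008/content#"
DC_FORMAT = "http://purl.org/dc/elements/1.1/format"
OAC = "http://www.openannotation.org/ns/"  # ASSUMPTION: oac namespace URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SVG_NS = "http://www.w3.org/2000/svg"


def inline_svg_constraint(ct, path_data):
    """Inline a Constraint: a urn:uuid node whose cnt:chars holds an SVG path."""
    constraint = uuid.uuid4().urn
    svg = f'<svg xmlns="{SVG_NS}"><path d={quoteattr(path_data)}/></svg>'
    return constraint, [
        (ct, OAC + "constrainedBy", constraint),   # ASSUMPTION: link name
        (constraint, RDF_TYPE, OAC + "Constraint"),
        (constraint, RDF_TYPE, CNT + "ContentAsText"),
        (constraint, CNT + "chars", svg),
        (constraint, DC_FORMAT, "image/svg+xml"),  # ASSUMPTION: format hint for the client
    ]


ct = uuid.uuid4().urn  # stands in for an existing ConstrainedTarget
constraint, graph = inline_svg_constraint(ct, "M10,10 L50,10 L50,50 Z")
chars = next(o for s, p, o in graph if p == CNT + "chars")
# A client parses the string before applying the path to the image.
path = ET.fromstring(chars).find(f"{{{SVG_NS}}}path")
print(path.get("d"))
```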
Instead of including the Constraint description in a single string value, it may be more appropriate to embed the information within the Annotation document directly using new relationships specific to the type of Constraint. For example, rather than requiring the client to parse out the required information from a string, it may be easier to retrieve the information directly from the RDF graph.
Examples of appropriate uses include segment descriptions that have no external standard that could be applied, and trivially simple segment descriptions where parsing the inline data would be significantly more work than inspecting the RDF graph. For example, it makes little sense to include an SVG path in the RDF graph directly, but doing so would be appropriate for a Constraint that describes a range of text using a copy of the text itself, along with the text immediately before and after it.
|Figure 7.3.2: RDF Constraints|
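Whatever RDF properties carry them, a description made of the quoted text plus its immediate context has to be resolved against the document by the client. The following Python sketch shows one way to do this: scan each occurrence of the quote and keep the first whose preceding and following text match. The function name and sample text are hypothetical, and this guide does not name the properties that would hold the three strings.

```python
def locate_quote(text, exact, prefix="", suffix=""):
    """Return (start, end) character offsets of the occurrence of `exact`
    whose neighbouring text matches `prefix` and `suffix`, or None."""
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return start, end
        start = text.find(exact, start + 1)
    return None


text = "the cat sat on the mat; the cat ran."
# The quoted copy alone is ambiguous; the context selects the second occurrence.
print(locate_quote(text, "the cat", prefix="; ", suffix=" ran"))
```

Carrying context as well as the quote is what lets such a Constraint survive ambiguity, which is why it is a good fit for plain RDF properties rather than an opaque inline string.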
The Body can be constrained in exactly the same way as the Target of the Annotation.
This might be required when one section of a longer text is an annotation on another resource, or when part of a video discusses a particular resource but it would not be appropriate to model the entire video as the Body of the Annotation.
|Figure 7.4: Constrained Body|
Multiple constraints may be needed to ensure that the annotation's body or target is fully and correctly described. There are several use cases where this is important, which fall under two broad areas:
This specification recognises the requirement for explicit constraint precedence within the model, but does not provide a final solution. Instead we invite further feedback on the suggested implementation described in this discussion document. The Open Annotation Collaboration appreciates your patience and engagement with this complicated issue.
The different types of Constraint that are known to the Open Annotation community are maintained in a separate document for ease of discovery.
The data served from a URI at any given time (the representation) is not necessarily the same as the representation served at any other time. For some resources, such as news pages and search results, the representation changes very frequently. An Annotation is likely to apply only to the resource as it was when the Annotation was created. Furthermore, the Annotation could be created at a different time than the resources involved.
The data model distinguishes three different types of annotation with respect to time. These can be distinguished by where the oac:when property is attached: to the Annotation, or to the Body and Targets.
Uniform Time Annotations need a single timestamp. This is attached to the Annotation node using the oac:when property.
|Figure 8.1: Uniform Time Annotations|
Varied Time Annotations need a timestamp for each resource involved. Each timestamp is attached to an oac:WebTimeConstraint for the corresponding resource, which records it using the oac:when property.
|Figure 8.2: Varied Time Annotations|
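The choice between the two patterns can be sketched in Python. If every resource was captured at the same instant, a single oac:when goes on the Annotation. Otherwise each resource is reached through a constrained node, following section 3.7.3, whose oac:WebTimeConstraint records that resource's own timestamp. Treating identical timestamps as "uniform" is a simplification of this sketch. The link name oac:constrainedBy, writing oac:when values as ISO 8601 strings, the oac namespace URI and the example URIs are all assumptions.

```python
import uuid
from datetime import datetime, timezone

OAC = "http://www.openannotation.org/ns/"  # ASSUMPTION: oac namespace URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def timed_annotation(anno, links):
    """links: (role, resource, datetime) tuples, role being "hasBody" or
    "hasTarget". Timestamps are written as ISO 8601 strings (an assumption)."""
    stamps = {when for _, _, when in links}
    if len(stamps) == 1:
        # Uniform Time: one oac:when on the Annotation node.
        triples = [(anno, OAC + role, res) for role, res, _ in links]
        triples.append((anno, OAC + "when", stamps.pop().isoformat()))
        return triples
    triples = []
    for role, res, when in links:
        # Varied Time: reach each resource through a constrained node whose
        # oac:WebTimeConstraint records that resource's own timestamp.
        node, constraint = uuid.uuid4().urn, uuid.uuid4().urn
        triples += [
            (anno, OAC + role, node),
            (node, OAC + "constrains", res),
            (node, OAC + "constrainedBy", constraint),  # ASSUMPTION: link name
            (constraint, RDF_TYPE, OAC + "WebTimeConstraint"),
            (constraint, OAC + "when", when.isoformat()),
        ]
    return triples


t1 = datetime(2011, 8, 10, tzinfo=timezone.utc)
t2 = datetime(2011, 8, 9, tzinfo=timezone.utc)
uniform = timed_annotation("http://example.org/anno5",
                           [("hasBody", "http://example.org/b", t1),
                            ("hasTarget", "http://example.org/news", t1)])
varied = timed_annotation("http://example.org/anno6",
                          [("hasBody", "http://example.org/b", t1),
                           ("hasTarget", "http://example.org/news", t2)])
print(len(uniform), len(varied))
```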
It is possible to retrieve appropriate representations of a resource involved in an Annotation using the Memento extension to HTTP [Memento]. For further information about persistent annotations, please see our JCDL 2010 paper [JCDL 2010].
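With Memento, a client asks for an earlier representation by sending an Accept-Datetime header containing an RFC 1123 date. The Python sketch below builds, but does not send, such a request for a timestamp that would come from oac:when. The news URI is hypothetical. The sketch assumes the request goes to a Memento-aware server or TimeGate for that resource and does not model the rest of the Memento negotiation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def memento_request(uri: str, when: datetime) -> Request:
    """Build, but do not send, a request for the representation of `uri`
    closest to `when` (a timezone-aware datetime), using Memento's
    Accept-Datetime header with an RFC 1123 date."""
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(uri, headers={"Accept-Datetime": stamp})


# The news page as it was when a (hypothetical) Annotation was created.
req = memento_request("http://example.org/news",
                      datetime(2011, 8, 10, 12, 0, tzinfo=timezone.utc))
print(req.header_items())
```

urllib stores the header name as "Accept-datetime"; HTTP header names are case-insensitive, so this does not affect the request.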
A common set of use cases for annotation is annotating data with other data, rather than with information intended for human consumption. It may be desirable, for example, to annotate text with the identifiers of the resources named or described, such as associating identifiers for places with mentions of the locations. Equally, the data may be more complex, involving relationships between resources, further properties and so forth.
The approach taken by the Open Annotation Data Model is to have an Annotation where the body is intended for machines, rather than humans. A review of a product could be considered an annotation where the rating and commentary are in the body, and the product is the target.
All of the previous requirements and principles apply: the Body is a document which contains the machine readable information, which is somehow "about" the target or targets. Further relationships and properties can be attached to the body, such as creator and format, and between the body and other resources. Please note that the other patterns, such as the use of Constraints, can also be applied to the Body of a DataAnnotation.
The Body resource may participate in other Annotations in different roles: it could be the Target of a further Annotation, or the Body of an Annotation intended for humans if the resource is also easily human-readable, such as XHTML with embedded RDF or microformats. Thus the creator's intention that the Annotation be interpreted by machines is signalled by the use of the oac:DataAnnotation subclass.
If the information is encoded in RDF, please note that the assertions are being made by the creator of the Body resource, not the creator of the Annotation resource. This is important for the web of trust and similar concepts. Secondly, the identifier(s) for the target(s) of the annotation can be used within the RDF to further clarify the relationships or properties being asserted.
|Figure 9: Structured Data Annotations|
The following class is defined to support machine readable annotations:
- An annotation in which the Body is intended for consumption by machines.
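The split between the two sets of statements can be sketched in Python. The Annotation document types the Annotation as oac:DataAnnotation and links it to a separate RDF Body. The Body document, asserted by its own creator, reuses the Target's URI as a subject. A helper then picks out the Body statements that are about a Target. The mentionsPlace predicate, the place URI and all other example.org URIs are hypothetical, and the oac namespace URI is an assumption.

```python
# ASSUMPTIONS: the oac namespace URI; the example.org URIs and the
# "mentionsPlace" predicate are hypothetical.
OAC = "http://www.openannotation.org/ns/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
TARGET = "http://example.org/letter.txt#line=12"
MENTIONS = "http://example.org/vocab/mentionsPlace"
PLACE = "http://example.org/places/paris"

# Statements in the Annotation document (made by the Annotation's creator).
annotation_graph = [
    ("http://example.org/anno7", RDF_TYPE, OAC + "DataAnnotation"),
    ("http://example.org/anno7", OAC + "hasBody", "http://example.org/data/places.ttl"),
    ("http://example.org/anno7", OAC + "hasTarget", TARGET),
]

# Statements in the Body document (asserted by the Body's creator); the
# Target's URI is reused as a subject to say what the Target mentions.
body_graph = [
    (TARGET, MENTIONS, PLACE),
    (PLACE, RDFS_LABEL, "Paris"),
]


def is_data_annotation(graph, anno):
    """True if the Annotation signals a machine-readable Body."""
    return (anno, RDF_TYPE, OAC + "DataAnnotation") in graph


def statements_about_targets(annotation_graph, body_graph):
    """Body statements whose subject is one of the Annotation's Targets."""
    targets = {o for s, p, o in annotation_graph if p == OAC + "hasTarget"}
    return [t for t in body_graph if t[0] in targets]


print(statements_about_targets(annotation_graph, body_graph))
```

Keeping the two graphs separate mirrors the trust point above: a consumer can weigh the Body's claims by who created the Body, independently of who created the Annotation.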
As with inline Annotation Bodies intended for humans, it is possible to embed the data within the Annotation document. The Content in RDF specification is used again, along with the dc:format property to inform the processing application of the type of data that it should expect from the inlined content. This content can be any sort of data, and might require the use of other subClasses from the Content in RDF specification, such as ContentAsBase64 or ContentAsXML.
Note that it would be possible to include RDF using various techniques, including Named Graphs and reification. For consistency, we recommend the use of ContentAsText unless there is a requirement or use case that other techniques can solve more easily.
|Figure 9.1: Inline Data Annotations|
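The following Python sketch shows an inline data Body following the recommended ContentAsText approach. The machine-readable payload (here a one-statement Turtle document) goes in cnt:chars, and dc:format tells the consumer how to parse it. A consumer function recovers the media type and data. The oac namespace URI, the example URIs and the mentionsPlace predicate are assumptions.

```python
import uuid

CNT = "http://www.w3.org/2008/content#"
DC_FORMAT = "http://purl.org/dc/elements/1.1/format"
OAC = "http://www.openannotation.org/ns/"  # ASSUMPTION: oac namespace URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def inline_data_annotation(anno, target, data, media_type):
    """Embed machine-readable data as a cnt:ContentAsText Body with dc:format."""
    body = uuid.uuid4().urn
    return [
        (anno, RDF_TYPE, OAC + "DataAnnotation"),
        (anno, OAC + "hasBody", body),
        (anno, OAC + "hasTarget", target),
        (body, RDF_TYPE, CNT + "ContentAsText"),
        (body, CNT + "chars", data),
        (body, DC_FORMAT, media_type),  # tells the consumer how to parse cnt:chars
    ]


def inline_payload(triples):
    """Consumer side: return (media type, data) of the inline Body."""
    body = next(o for s, p, o in triples if p == OAC + "hasBody")
    props = {p: o for s, p, o in triples if s == body and p != RDF_TYPE}
    return props.get(DC_FORMAT), props.get(CNT + "chars")


turtle = ("<http://example.org/letter.txt#line=12> "
          "<http://example.org/vocab/mentionsPlace> <http://example.org/places/paris> .")
graph = inline_data_annotation("http://example.org/anno8",
                               "http://example.org/letter.txt#line=12", turtle, "text/turtle")
media_type, data = inline_payload(graph)
print(media_type)
```

Dispatching on dc:format rather than sniffing the string keeps the consumer simple, and the same code path would serve a ContentAsXML or ContentAsBase64 Body with a different parser.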
|2010-04-19||rsanderson||Internal alpha release|
|2010-06-30||rsanderson||External alpha2 release|
|2010-10-15||rsanderson||External alpha3 release|
|2011-03-15||tcole||added links to additional examples|
|2011-08-10||rsanderson||External beta release|