Showing posts with label Annotation Framework (AF).

Friday, October 10, 2014

Annotopia 101 - Basic use for document/data annotation

This post explains how to get started with Annotopia as a server for document/data annotation. It assumes Annotopia is already installed and running and that you have admin access to the instance.


Step 1. Register your system

After logging in (as admin) to Annotopia, you will see a welcome screen:
  • Click on 'Administration Dashboard' (top left of the screen)


  • Select 'Create System'

  • Fill out the form and 'Save system'

  • Take note of the 'API key': your system will use it to communicate with Annotopia whenever Annotopia is not configured to use a stronger authentication mechanism.

 

Step 2. Create my first annotation (POST)

Assuming the server address is http://myserver.example.com:8090, we are going to create our first annotation with a POST. Normally your application will connect to the server through Ajax or a server-side call. For the sake of this tutorial we will use curl, which is easy to use from the command line.

The structure of the POST for an annotation item is very simple (see the API documentation):

  curl -i -X POST http://myserver.example.com:8090/s/annotation \
       -H "Content-Type: application/json" \
       -d '{"apiKey":"{+SYSTEM_API_KEY}", "outCmd":"frame", "item":{+ANNOTATION}}'

Where +SYSTEM_API_KEY is the API key from the previous step and +ANNOTATION is the actual annotation content. Notice also the parameter "outCmd":"frame": it is used to frame the JSON-LD result, which means that the result will always be returned with a precise hierarchical structure, so that clients don't have to deal with the variability of a graph-like representation.
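For applications that call Annotopia from code rather than from curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch under assumptions: the server address is the example one used in this post, `YOUR-SYSTEM-API-KEY` is a hypothetical placeholder for the key from Step 1, and `build_post_body` and `post_annotation` are hypothetical helper names, not part of any Annotopia client library.

```python
import json
import urllib.request

# Hypothetical values: use your own server address and the API key from Step 1.
SERVER = "http://myserver.example.com:8090"
API_KEY = "YOUR-SYSTEM-API-KEY"


def build_post_body(api_key: str, annotation: dict, frame: bool = True) -> bytes:
    """Wrap an annotation in the {"apiKey", "outCmd", "item"} envelope of the curl call."""
    envelope = {"apiKey": api_key, "item": annotation}
    if frame:
        # Ask for a framed JSON-LD result (a predictable tree rather than a raw graph).
        envelope["outCmd"] = "frame"
    return json.dumps(envelope).encode("utf-8")


def post_annotation(annotation: dict) -> dict:
    """POST one annotation to /s/annotation and return the decoded JSON response."""
    request = urllib.request.Request(
        SERVER + "/s/annotation",
        data=build_post_body(API_KEY, annotation),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the envelope construction separate from the network call makes the payload easy to inspect, or to test without a running server.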

A simple example of an annotation of type Highlight (conformant to the Open Annotation Model) is:

{
 "@context": "https://raw2.github.com/Annotopia/AtSmartStorage/master/web-app/data/OAContext.json",
 "@id": "urn:temp:7",
 "@type": "oa:Annotation",
 "motivatedBy": "oa:highlighting",
 "annotatedBy": {
  "@id": "http://orcid.org/0000-0002-5156-2703",
  "@type": "foaf:Person",
  "foaf:name": "Paolo Ciccarese"
 },
 "annotatedAt": "2014-02-17T09:46:11EST",
 "serializedBy": "urn:application:utopia",
 "serializedAt": "2014-02-17T09:46:51EST",
 "hasTarget": {
  "@id": "urn:temp:8",
  "@type": "oa:SpecificResource",
  "hasSelector": {
   "@type": "oa:TextQuoteSelector",
   "exact": "senior scientist and software engineer",
   "prefix": "I am a",
   "suffix": ", working in the bio-medical informatics field since the year 2000"
  },
  "hasSource": {
   "@id": "http://paolociccarese.info",
   "@type": "dctypes:Text"
  }
 }
}


Note that:
  • the '@context' is necessary for the server to interpret the content
  • the '@id' fields contain temporary values. When an annotation is posted for the first time, the server mints URIs for the Annotation and the Target and returns the updated content to the client in the response to the POST
  • the 'motivatedBy' property declares that the intent of the annotation is highlighting.
  • the 'hasTarget' identifies the annotated piece of content through a quote of it.
  • the 'hasSource/@id' represents the URI of the annotated resource
  • the 'hasSelector' identifies a fragment of that resource.
  • the 'serializedBy' declares which system created the artifact. In the above case it is the Utopia for PDF application; for Domeo it would be urn:application:domeo**.
** Note that this aspect is not fully implemented yet, therefore only specific systems are recognized by Annotopia and used for filtering. All others are managed but not exploited. In other words, currently only two values are fully managed: 'urn:application:domeo' and 'urn:application:utopia'. Alternative values can be used and stored, but they will not appear in the search facets.
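The highlight above can also be assembled programmatically. The sketch below assumes a hypothetical helper, `build_highlight`, that simply copies the field layout of the example: temporary `urn:temp:` identifiers for the annotation and its target, a TextQuoteSelector for the quoted fragment, and the annotated document's URI under 'hasSource'.

```python
# Context URL copied from the example above.
OA_CONTEXT = "https://raw2.github.com/Annotopia/AtSmartStorage/master/web-app/data/OAContext.json"


def build_highlight(source_uri, exact, prefix, suffix, annotator,
                    annotated_at, serialized_at,
                    serialized_by="urn:application:utopia",
                    temp_ids=("urn:temp:7", "urn:temp:8")):
    """Return an oa:highlighting annotation on a text quote of source_uri."""
    annotation_id, target_id = temp_ids
    return {
        "@context": OA_CONTEXT,
        "@id": annotation_id,  # temporary: the server mints the real URI
        "@type": "oa:Annotation",
        "motivatedBy": "oa:highlighting",
        "annotatedBy": annotator,
        "annotatedAt": annotated_at,
        "serializedBy": serialized_by,
        "serializedAt": serialized_at,
        "hasTarget": {
            "@id": target_id,  # temporary as well
            "@type": "oa:SpecificResource",
            "hasSelector": {
                "@type": "oa:TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
            "hasSource": {"@id": source_uri, "@type": "dctypes:Text"},
        },
    }
```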

An even simpler example is an annotation of type Comment on an entire resource:

{
    "@context": "https://raw2.github.com/Annotopia/AtSmartStorage/master/web-app/data/OAContext.json",
    "@id": "urn:temp:001",
    "@type": "http://www.w3.org/ns/oa#Annotation",
    "motivatedBy": "oa:commenting",
    "annotatedBy": {
        "@id": "http://orcid.org/0000-0002-5156-2703",
        "@type": "foaf:Person",
        "foaf:name": "Paolo Ciccarese"
    },
    "annotatedAt": "2014-02-17T09:46:11EST",
    "serializedBy": "urn:application:domeo",
    "serializedAt": "2014-02-17T09:46:51EST",
    "hasBody": {
        "@type": [
            "cnt:ContentAsText",
            "dctypes:Text"
        ],
        "cnt:chars": "This is an interesting document",
        "dc:format": "text/plain"
    },
    "hasTarget": "http://paolociccarese.info"
}
 
Note that:
  • the 'hasBody' shows how to encode textual content
  • the 'hasTarget' is just a URI**.
** Note that, as the target is a URI, anything identifiable can be annotated. In the above case we are annotating a web page, but the URI could just as well identify a data point.
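A comment like the one above can be built the same way. This is a sketch with a hypothetical helper name, `build_comment`; it mirrors the example's textual body ('cnt:chars' with 'dc:format' text/plain) and uses a bare URI as the target, so the same function works for a web page or for any other identifiable resource.

```python
# Context URL copied from the examples above.
OA_CONTEXT = "https://raw2.github.com/Annotopia/AtSmartStorage/master/web-app/data/OAContext.json"


def build_comment(target_uri, text, annotator, annotated_at, serialized_at,
                  serialized_by="urn:application:domeo", temp_id="urn:temp:001"):
    """Return an oa:commenting annotation whose target is a plain URI."""
    return {
        "@context": OA_CONTEXT,
        "@id": temp_id,  # temporary: replaced by the server on the first POST
        "@type": "oa:Annotation",
        "motivatedBy": "oa:commenting",
        "annotatedBy": annotator,
        "annotatedAt": annotated_at,
        "serializedBy": serialized_by,
        "serializedAt": serialized_at,
        "hasBody": {
            "@type": ["cnt:ContentAsText", "dctypes:Text"],
            "cnt:chars": text,
            "dc:format": "text/plain",
        },
        # Any identifiable resource: a web page, a data point, ...
        "hasTarget": target_uri,
    }
```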

Once the POST is sent, if everything is correct, the server (if "outCmd":"frame" was specified) will return a result message that has the following structure:

{"status":"saved", "result": {"duration": "1764ms","graphs":"1","item":[{
  "@context" : {
    ...
  },
  "@graph" : [ {
    "@id" : "http://myserver.example.com:8090/s/annotation/597C3DE9-8657-4FA6-ABCA-895A74B448E9",
    "@type" : "oa:Annotation",
    "http://purl.org/pav/previousVersion" : "urn:temp:7",
    "annotatedAt" : "2014-02-17T09:46:11EST",
    "annotatedBy" : {
      "@id" : "http://orcid.org/0000-0002-5156-2703",
      "@type" : "foaf:Person",
      "name" : "Paolo Ciccarese"
    },
    "hasTarget" : {
      "@id" : "http://myserver.example.com:8090/s/resource/ED20AE10-4916-485C-903D-54D6F11DF682",
      "@type" : "oa:SpecificResource",
      "http://purl.org/pav/previousVersion" : "urn:temp:8",
      "hasSelector" : {
        "@id" : "_:b0",
        "@type" : "oa:TextQuoteSelector",
        "exact" : "senior scientist and software engineer",
        "prefix" : "I am a",
        "suffix" : ", working in the bio-medical informatics field since the year 2000"
      },
      "hasSource" : {
        "@id" : "http://paolociccarese.info",
        "@type" : "dctypes:Text"
      }
    },
    "motivatedBy" : "oa:highlighting",
    "serializedAt" : "2014-02-17T09:46:51EST",
    "serializedBy" : "urn:application:utopia"
  } ]
}]}}

Note that:
  • the updated annotation is returned in the "item" section, inside a '@graph'
  • the '@id' values have been replaced with resolvable URIs
  • the property "http://purl.org/pav/previousVersion" returns the original temporary '@id' for matching.
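Since the server returns each minted URI together with the original temporary identifier, a client can match them up by walking the framed result. A minimal sketch, assuming the response shape shown above ('result', then 'item', then '@graph'); the function name `temp_id_map` is hypothetical.

```python
PREVIOUS_VERSION = "http://purl.org/pav/previousVersion"


def temp_id_map(response: dict) -> dict:
    """Map each temporary '@id' (e.g. urn:temp:7) to the URI minted by the server."""
    mapping = {}

    def walk(node):
        # Visit every nested object; any node carrying previousVersion was renamed.
        if isinstance(node, dict):
            if PREVIOUS_VERSION in node and "@id" in node:
                mapping[node[PREVIOUS_VERSION]] = node["@id"]
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    for item in response["result"]["item"]:
        walk(item.get("@graph", []))
    return mapping
```

A client would typically use this mapping to replace its local `urn:temp:` references with the new URIs, for example before sending an update.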

Step 3. How to include bibliographic metadata/identifiers

Annotopia can use identifiers (PubMed IDs, PubMed Central IDs, DOIs and PIIs) to resolve equivalent documents: for example, an HTML version and a PDF version of the same document, or multiple HTML versions of the same document.

To include bibliographic metadata identifiers in the annotation, it is sufficient to add them to the 'hasSource' section as follows:

"hasSource": {
 "@id": "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102893/",
 "@type": "dctypes:Text",
 "format": "text/html",
 "http://purl.org/vocab/frbr/core#embodimentOf": {
  "http://purl.org/dc/terms/title": "An open annotation ontology for science on web 3.0",
  "http://prismstandard.org/namespaces/basic/2.0/doi": "10.1186/2041-1480-2-S2-S4",
  "http://purl.org/spar/fabio#hasPII": "2041-1480-2-S2-S4",
  "http://purl.org/spar/fabio#hasPubMedCentralId": "PMC3102893",
  "http://purl.org/spar/fabio#hasPubMedId": "21624159"
 }
}
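Attaching identifiers to an existing 'hasSource' can be sketched as a small helper. The property URIs are copied from the fragment above; the short keyword names (`doi`, `pii`, `pmcid`, `pmid`, `title`) and the helper `add_bibliographic_ids` are hypothetical conveniences, not Annotopia names.

```python
# Property URIs copied from the 'hasSource' fragment above; the short keys are hypothetical.
ID_PROPERTIES = {
    "title": "http://purl.org/dc/terms/title",
    "doi": "http://prismstandard.org/namespaces/basic/2.0/doi",
    "pii": "http://purl.org/spar/fabio#hasPII",
    "pmcid": "http://purl.org/spar/fabio#hasPubMedCentralId",
    "pmid": "http://purl.org/spar/fabio#hasPubMedId",
}
EMBODIMENT_OF = "http://purl.org/vocab/frbr/core#embodimentOf"


def add_bibliographic_ids(has_source: dict, **ids: str) -> dict:
    """Return a copy of has_source with the identifiers nested under frbr:embodimentOf."""
    unknown = set(ids) - set(ID_PROPERTIES)
    if unknown:
        raise ValueError(f"unsupported identifier(s): {sorted(unknown)}")
    work = {ID_PROPERTIES[key]: value for key, value in ids.items()}
    # Shallow copy: the caller's dict is left untouched.
    return {**has_source, EMBODIMENT_OF: work}
```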

Step 4. Request a specific annotation (GET {$id})

To request a specific annotation through its URI, it is sufficient to execute (see API docs):

    curl -i -X GET +ANNOTATION_URI -H "Content-Type: application/json" \
       -d '{"apiKey":"+SYSTEM_API_KEY","outCmd":"frame"}'
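The curl call above sends its JSON parameters as the body of a GET request, and a Python client can do the same. A minimal sketch, assuming the same placeholder API key; `build_get_request` and `get_annotation` are hypothetical names. Some proxies ignore bodies on GET requests, so this simply mirrors the curl example rather than recommending the pattern in general.

```python
import json
import urllib.request

API_KEY = "YOUR-SYSTEM-API-KEY"  # hypothetical placeholder (see Step 1)


def build_get_request(annotation_uri: str, api_key: str) -> urllib.request.Request:
    """Mirror the curl call: a GET whose JSON body carries apiKey and outCmd."""
    body = json.dumps({"apiKey": api_key, "outCmd": "frame"}).encode("utf-8")
    return urllib.request.Request(
        annotation_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def get_annotation(annotation_uri: str) -> dict:
    """Fetch one annotation by its URI and decode the JSON response."""
    with urllib.request.urlopen(build_get_request(annotation_uri, API_KEY)) as response:
        return json.loads(response.read().decode("utf-8"))
```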

Step 5. Request the annotations for a document (GET)

It is common to request the annotations for a particular document by its URI (see API docs):

    curl -i -X GET http://myserver.example.com:8090/s/annotation \
      -H "Content-Type: application/json" \
      -d '{"apiKey":"+SYSTEM_API_KEY","tgtUrl":"http://www.jbiomedsem.com/content/2/S2/S4"}'


Or by bibliographic identifier:
    curl -i -X GET http://myserver.example.com:8090/s/annotation \
      -H "Content-Type: application/json" \
      -d '{"apiKey":"+SYSTEM_API_KEY","tgtIds":"{\"pii\":\"2041-1480-2-S2-S4\"}"}'
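Both document queries can be sketched with one helper that builds the body either from a URL ('tgtUrl') or from bibliographic identifiers ('tgtIds'). This is a sketch under assumptions: 'tgtIds' is sent as a string containing a JSON object, as in the curl example, and the server is assumed to parse that string as JSON; the helper names are hypothetical, and accepting exactly one criterion simply mirrors the two separate examples.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint: the example server used throughout this post.
SEARCH_URL = "http://myserver.example.com:8090/s/annotation"


def build_search_body(api_key: str, tgt_url: Optional[str] = None,
                      tgt_ids: Optional[dict] = None) -> dict:
    """Body for 'annotations of a document', by URL (tgtUrl) or by identifiers (tgtIds)."""
    if (tgt_url is None) == (tgt_ids is None):
        raise ValueError("pass exactly one of tgt_url or tgt_ids")
    body = {"apiKey": api_key}
    if tgt_url is not None:
        body["tgtUrl"] = tgt_url
    else:
        # tgtIds travels as a string holding a JSON object, e.g. '{"pii": "..."}'.
        body["tgtIds"] = json.dumps(tgt_ids)
    return body


def search_annotations(api_key: str, **criteria) -> dict:
    """Send the query as a GET with a JSON body, as the curl examples do."""
    data = json.dumps(build_search_body(api_key, **criteria)).encode("utf-8")
    request = urllib.request.Request(
        SEARCH_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `search_annotations(key, tgt_ids={"pii": "2041-1480-2-S2-S4"})` corresponds to the second curl call.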

Tuesday, July 08, 2014

Domeo and Utopia integration through Annotopia

Here is a very recent demo of an annotation created on an HTML document through Domeo and then displayed on the corresponding PDF in the Utopia PDF viewer, all through Open Annotation and Annotopia.

Thanks to Steve Pettifer and Dave Thorne for the Utopia plugin development, and to Thomas Wilkins for the OAuth implementation in Annotopia. Annotopia is currently architected and developed by me.

Friday, April 25, 2014

Annotopia: Creation/updates with Open Annotation (?)

I am currently developing the Annotopia Open Annotation server [GitHub, Living Slides, Talk 'I Annotate 2014'], and there are a few topics related to the application of the Open Annotation Model that might need further discussion within the community. I will start with one:

:: Date/agent of annotation creation and update ::

Even if we don't need to support versioning (this will be the subject of a future post), and even in the unlikely event that annotations cannot be edited, we often need to keep track of who created the annotation and when, and possibly who updated it and when.


Open Annotation Provenance Model

The Open Annotation Model now supports the following provenance relationships/properties:


  • oa:annotatedBy (Relationship) [subProperty of prov:wasAttributedTo]: The object of the relationship is a resource that identifies the agent responsible for creating the Annotation. This may be either a human or software agent. There SHOULD be exactly 1 oa:annotatedBy relationship per Annotation, but MAY be 0 or more than 1, as the Annotation may be anonymous, or multiple agents may have worked together on it.
  • oa:annotatedAt (Property): The time at which the Annotation was created. There SHOULD be exactly 1 oa:annotatedAt property per Annotation, and MUST NOT be more than 1. The datetime MUST be expressed in the xsd:dateTime format, and SHOULD have a timezone specified.
  • oa:serializedBy (Relationship) [subProperty of prov:wasAttributedTo]: The object of the relationship is the agent, likely software, responsible for generating the Annotation's serialization. There MAY be 0 or more oa:serializedBy relationships per Annotation.
  • oa:serializedAt (Property): The time at which the agent referenced by oa:serializedBy generated the first serialization of the Annotation, and any subsequent substantially different one. The annotation graph MUST have changed for this property to be updated, and as such represents the last modified datestamp for the Annotation. This might be used to determine if it should be re-imported into a triplestore when discovered. There MAY be exactly 1 oa:serializedAt property per Annotation, and MUST NOT be more than 1. The datetime MUST be expressed in the xsd:dateTime format, and SHOULD have a timezone specified.
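The cardinality and datetime constraints in the list above can be checked mechanically. The sketch below is a simplified, hypothetical checker (`check_oa_provenance`): it tests only the xsd:dateTime lexical shape with a regular expression (no range checks on months, days or hours) and treats a JSON list as "more than one value".

```python
import re

# Simplified xsd:dateTime lexical form: date 'T' time, optional fractional
# seconds, optional timezone (Z or +hh:mm / -hh:mm). No range validation.
XSD_DATETIME = re.compile(
    r"^-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(?P<tz>Z|[+-]\d{2}:\d{2})?$"
)


def check_oa_provenance(annotation: dict) -> list:
    """Return a list of problems with annotatedAt / serializedAt (empty if none)."""
    problems = []
    for prop in ("annotatedAt", "serializedAt"):
        value = annotation.get(prop)
        if value is None:
            continue
        if isinstance(value, list):
            problems.append(f"{prop}: MUST NOT appear more than once")
            continue
        match = XSD_DATETIME.match(value)
        if match is None:
            problems.append(f"{prop}: {value!r} is not an xsd:dateTime")
        elif match.group("tz") is None:
            problems.append(f"{prop}: SHOULD have a timezone")
    if "annotatedAt" not in annotation:
        problems.append("annotatedAt: SHOULD be present")
    return problems
```

A strict check like this rejects a suffix such as 'EST', which is not an xsd:dateTime timezone; the equivalent offset is '-05:00'.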

So we can encode when the annotation was created, which usually coincides with the time the user created the annotation in the user interface of the annotation client.

Then the annotation is sent to the server to be persisted.

Do nothing approach

One possible approach, which I would rather not advocate, is to forget about the concepts of creation and update: 'every time a change is performed on an annotation, the old instance is swapped with the new one. The new one replaces entirely the previous annotation and shares the same URI.' In this case, 'annotatedAt' always refers to the latest annotation event (whether it was the original creation or a subsequent update).
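The 'do nothing' semantics can be made concrete with a toy in-memory store, a hypothetical stand-in for a persistence layer and not Annotopia's actual implementation: saving simply overwrites whatever was stored under the same URI, so the stored 'annotatedAt' is always that of the latest save.

```python
import copy


def save_do_nothing(store: dict, annotation: dict) -> None:
    """'Do nothing' approach: the new version replaces the old one under the same URI."""
    # Whatever was stored before (including its annotatedAt) is simply discarded.
    store[annotation["@id"]] = copy.deepcopy(annotation)
```

After a creation followed by an update, nothing in the store records when the annotation was originally created.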

Use a richer provenance model

To be a little more exhaustive, in Annotopia, as I did in Domeo and in the Annotation Ontology, I could use a series of properties from the PAV (Provenance, Authoring and Versioning) ontology [paper]: pav:createdOn (when it was created), pav:createdBy (who created it), pav:lastUpdateOn (when it was last updated), pav:lastUpdateBy (who last updated the annotation).

So I could say:

Option A: Add lastUpdateOn

In this scenario we use annotatedAt/annotatedBy for the annotation creation and lastUpdateOn/lastUpdateBy for the last update.

{
    "@id" : "http://host/s/annotation/830ED7EE-BF7B-4A18-8AE1-A9AF96AC135B",
    "@type" : "oa:Annotation",
    "annotatedAt" : "2014-02-17T09:46:11EST",
    "annotatedBy" : {
      "@id" : "http://orcid.org/0000-0002-5156-2703",
      "@type" : "foaf:Person",
      "name" : "Paolo Ciccarese"
    },
    "pav:lastUpdateOn" : "2014-03-11T11:46:11EST",
    "pav:lastUpdateBy" : {
      "@id" : "http://example.org/johndoe",
      "@type" : "foaf:Person",
      "name" : "John Doe"
    }
...
}

In this case, both events would refer to when the action was performed in the user interface (?).
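Option A can be sketched as a merge step run when an update arrives. This is only an illustration with a hypothetical function name, `apply_update_option_a`: the creation provenance ('annotatedAt'/'annotatedBy') is carried over from the stored annotation, and the update is recorded with 'pav:lastUpdateOn'/'pav:lastUpdateBy'. Here `when` and `updater` are supplied by the caller, for instance from the client's user interface, which is exactly the open point raised above.

```python
import copy


def apply_update_option_a(stored: dict, updated: dict, updater: dict, when: str) -> dict:
    """Option A: keep the creation provenance and record the last update with PAV."""
    result = copy.deepcopy(updated)
    for prop in ("annotatedAt", "annotatedBy"):
        if prop in stored:
            # The original creation is never overwritten by an update.
            result[prop] = copy.deepcopy(stored[prop])
    result["pav:lastUpdateOn"] = when
    result["pav:lastUpdateBy"] = copy.deepcopy(updater)
    return result
```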

Option B: Add createdOn and lastUpdateOn

Here we make use of annotatedAt/annotatedBy, createdOn/createdBy and lastUpdateOn/lastUpdateBy.

{
    "@id" : "http://host/s/annotation/830ED7EE-BF7B-4A18-8AE1-A9AF96AC135B",
    "@type" : "oa:Annotation",
    "pav:previousVersion" : "urn:temp:001",
    "annotatedAt" : "2014-02-17T09:46:11EST",
    "annotatedBy" : {
      "@id" : "http://orcid.org/0000-0002-5156-2703",
      "@type" : "foaf:Person",
      "name" : "Paolo Ciccarese"
    },
    "pav:createdOn" : "2014-02-17T09:48:11EST",
    "pav:createdBy" : {
      "@id" : "http://orcid.org/0000-0002-5156-2703",
      "@type" : "foaf:Person",
      "name" : "Paolo Ciccarese"
    },
    "pav:lastUpdateOn" : "2014-03-11T11:46:11EST",
    "pav:lastUpdateBy" : {
      "@id" : "http://example.org/johndoe",
      "@type" : "foaf:Person",
      "name" : "John Doe"
    }
...
}

In this case it is necessary to agree on the semantics of all those properties. I could use:
(i) 'createdOn/createdBy' for the original creation on the (Annotopia) server
(ii) 'lastUpdateOn/lastUpdateBy' for the last update on the (Annotopia) server
(iii) and what is 'annotatedAt' going to indicate? The original creation or the latest update? And how do I keep track of the agents involved?
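Points (i) and (ii) can be sketched as server-side stamping, one possible reading and not a settled design: on the first save the server sets 'pav:createdOn'/'pav:createdBy', and on later saves it keeps those and sets 'pav:lastUpdateOn'/'pav:lastUpdateBy'. 'annotatedAt'/'annotatedBy' are passed through untouched, so question (iii) stays open. The names `server_stamp_option_b` and `now_xsd`, and the injectable `clock`, are hypothetical; `agent` stands for the authenticated user making the request.

```python
import copy
from datetime import datetime, timezone


def now_xsd() -> str:
    """Current UTC time as an xsd:dateTime with an explicit 'Z' timezone."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def server_stamp_option_b(incoming: dict, agent: dict, stored=None, clock=now_xsd) -> dict:
    """Stamp PAV creation/update provenance on the server side (readings (i) and (ii))."""
    result = copy.deepcopy(incoming)  # annotatedAt/annotatedBy left as sent
    if stored is None:
        # First save on the server: record the creation.
        result["pav:createdOn"] = clock()
        result["pav:createdBy"] = copy.deepcopy(agent)
    else:
        # Later save: keep the creation, record the latest update.
        result["pav:createdOn"] = stored["pav:createdOn"]
        result["pav:createdBy"] = copy.deepcopy(stored["pav:createdBy"])
        result["pav:lastUpdateOn"] = clock()
        result["pav:lastUpdateBy"] = copy.deepcopy(agent)
    return result
```

Injecting `clock` keeps the function deterministic in tests; in production the default UTC clock would be used.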

Saturday, February 08, 2014

From CATCH to HarvardX to Annotopia

On October 18, 2012, Philip Desenne (at the time Senior Product Manager, Academic Technology Services at Harvard), Martin Schreiner (Head of Maps, Media, Data and Government Information, Harvard College Library) and I were awarded a small grant from Harvard Library Labs called CATCH: Common Annotation, Tagging, and Citation at Harvard.

The idea was to create a federated network of servers for storing annotations created for pedagogical purposes. As we knew that many applications at Harvard create annotations, we wanted to provide a common back end for all of them to store, retrieve and search annotations. CATCH was also meant to produce some services for translating annotations into the Open Annotation format, so that we could store all the annotations coming from different tools in a uniform way, which would have made search a lot easier.

Obviously, as I've spent the last two years developing the Domeo Annotation Tool, the idea was also to have Domeo use the same technology for storing, retrieving and searching annotations.



However, the original grant was split into two phases, and only the first phase has been funded so far. As a result of the first phase, I produced, with the help of Justin Miranda, a back end for persisting annotations produced by an annotator client based on annotator.js technology.

Three weeks ago, both the client (thanks to the work by Daniel Cebrian Robles and Phil Desenne) and the CATCH server (developed by Justin Miranda and me) entered production in HarvardX for one class of about 14,000 students.

As the result of phase I was supposed to be just a prototype and not a production-quality server, this has been a stressful and at the same time exciting transition.
In just a few days, CATCH has already collected 21,000 annotations produced by more than 800 students, and the number of annotations is increasing steadily.
The future of CATCH is named Annotopia
The original plan for CATCH has not been fully realized, and the stream of funding ended. So, in agreement with Tim Clark (Director of MIND Informatics and PI of the Domeo project), we decided to create a new project called Annotopia that will develop the full potential of the original CATCH idea. Annotopia will also provide additional services: text mining, term search and support for semantic annotation. These features were already available in Domeo, but they will be generalized and made available through APIs for third-party annotation clients.

The CATCH codebase will merge with the new platform and, at least for now, we will still use the name CATCH for the HarvardX instance of the Annotopia annotation back end.

The first release of Annotopia is scheduled for the end of March.

Saturday, July 20, 2013

Domeo, Annotation Framework, Catch Annotation Hub and Grails Plugins architecture

I have always found organizing big projects into components a reassuring idea.
Component Oriented Programming? I let you decide if that is what I mean. I’ve read several discussions on Component Oriented Programming vs. Object Oriented Programming, and I am personally one of those who believe the two strategies are complementary and not in competition. As I am not interested in debating the theoretical differences, I will stick to what I normally do rather than what I think.
That is one of the reasons I’ve always liked - and I still like - OSGi, and that is also one of the reasons I’ve always been attracted to the Grails Plugins architecture.
The component-oriented approach did not always pay off. Occasionally I just gave up when I found myself fighting with the technology of the moment, which was getting a little in the way. I am sure most of my problems were related to my limited knowledge of that particular technology... still, deadlines are deadlines and I needed to get things done.
I am certainly neither the first nor the last developer celebrating the Grails Plugin Oriented Architecture. Here is a blog post that shows how a domain class defined in a plugin can be reused by other components of the architecture.

However, I have been thinking about the OSGi-based Eclipse architecture for a long time, and I even tried to develop a lighter Java framework for developing applications along the same lines. Naturally, since I've been using Grails, I’ve been thinking about how to reproduce the same behavior in web applications by using Grails plugins. Basically, I am talking about conveniently leveraging plugins to benefit from all the perks of the Grails platform: domain classes, services, controllers and views. I will defer some of the technical details to future posts; meanwhile, I wish to provide a little context.

I am thinking of leveraging the plugin architecture for a project called CATCH that I’ve been working on thanks to a tiny grant awarded by Harvard Library Labs. As the Domeo Annotation Tool already provided some of the features I need for CATCH, I've decided to refactor it and spin off some of its components. I've created a new GitHub project called Annotation Framework, which will collect all the new, improved modules that will later be used by both Domeo and CATCH.


CATCH Annotation Hub

The goal of CATCH is to provide a hub for collecting, searching and sharing annotations produced by several clients. These include the Domeo annotation client, HighBrow (an annotation client developed at Harvard by Reinhard Engels) and annotator.js, an open-source JavaScript library and tool developed by Nick Stenning that can be added to any webpage to make it annotatable.

Both CATCH and its older sister project Domeo are meant to be installed in several instances that should be able to communicate with each other in a federated architecture. You can think of this as a series of Annotation Framework Nodes that are distributed and connected, so that when a user performs a search on one of the nodes, it can also find results that have been created and stored in other authorized/linked nodes. All with access control...


Friday, February 04, 2011