Showing posts with label Interoperability. Show all posts

Friday, October 10, 2014

Annotopia 101 - Basic use for document/data annotation

This post explains how to get started in using Annotopia as a server for document/data annotation. It assumes Annotopia is already installed and running and that you have admin access to the instance.


Step 1. Register your system

After logging in (as admin) to Annotopia, you will see a welcome screen:
  • Click on 'Administration Dashboard' (top left of the screen)


  • Select 'Create System'

  • Fill out the form and 'Save system'

  • Take note of the 'API key', which your system will use to communicate with Annotopia when Annotopia is not set up to use a stronger authentication mechanism.

 

Step 2. Create my first annotation (POST)

Assuming that the server address is http://myserver.example.com:8090, we are going to create our first POST. Normally your application will connect to the server through Ajax or a server-side call. For the sake of this tutorial we are going to use curl, which is easy to use from the command line.

The structure of the POST for an annotation item is very simple (API documentation here):

  curl -i -X POST http://myserver.example.com:8090/s/annotation \
       -H "Content-Type: application/json" \
       -d '{"apiKey":"{+SYSTEM_API_KEY}", "outCmd":"frame", "item":{+ANNOTATION}}'
Where +SYSTEM_API_KEY is the API key from the previous section and +ANNOTATION is the actual annotation content. Notice also the parameter "outCmd":"frame". It is used to frame the JSON-LD result, which means that the result will always be returned with a precise hierarchical structure, so that clients don't have to deal with the variability of a graph-like representation.
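For readers who prefer a script to curl, here is a minimal Python sketch of the same request. It only builds the request and leaves sending it to you. The function name `build_post_request` is my own, and I am assuming that "item" carries the annotation object directly; check the API documentation for your Annotopia version before relying on it.

```python
import json
import urllib.request


def build_post_request(server, api_key, annotation):
    """Build (without sending) the POST request for a new annotation.

    Assumption: the annotation dict is placed as-is under "item".
    """
    payload = {
        "apiKey": api_key,    # the key noted in Step 1
        "outCmd": "frame",    # ask for a framed JSON-LD result
        "item": annotation,
    }
    return urllib.request.Request(
        url=server.rstrip("/") + "/s/annotation",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_post_request(
        "http://myserver.example.com:8090",
        "+SYSTEM_API_KEY",
        {"@type": "oa:Annotation"},
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Keeping the construction separate from the network call makes it easy to inspect the payload before anything reaches the server.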

A simple example of Annotation of type Highlight (conformant to the Open Annotation Model) would be:

{
 "@context": "https://raw2.github.com/Annotopia/AtSmartStorage/master/web-app/data/OAContext.json",
 "@id": "urn:temp:7",
 "@type": "oa:Annotation",
 "motivatedBy": "oa:highlighting",
 "annotatedBy": {
  "@id": "http://orcid.org/0000-0002-5156-2703",
  "@type": "foaf:Person",
  "foaf:name": "Paolo Ciccarese"
 },
 "annotatedAt": "2014-02-17T09:46:11EST",
 "serializedBy": "urn:application:utopia",
 "serializedAt": "2014-02-17T09:46:51EST",
 "hasTarget": {
  "@id": "urn:temp:8",
  "@type": "oa:SpecificResource",
  "hasSelector": {
   "@type": "oa:TextQuoteSelector",
   "exact": "senior scientist and software engineer",
   "prefix": "I am a",
   "suffix": ", working in the bio-medical informatics field since the year 2000"
  },
  "hasSource": {
   "@id": "http://paolociccarese.info",
   "@type": "dctypes:Text"
  }
 }
}


Note that:
  • the '@context' is necessary for the server to interpret the content
  • the '@id' fields contain temporary values. When an annotation is posted for the first time, the server mints URIs for the Annotation and the Target and returns the updated content to the client in the response to the POST
  • the 'motivatedBy' property declares that the intent of the annotation is highlighting
  • the 'hasTarget' uses a quote of the annotated piece of content
  • the 'hasSource/@id' represents the URI of the annotated resource
  • the 'hasSelector' identifies a fragment of that resource
  • the 'serializedBy' declares which system created the artifact. In the above case it is the Utopia for PDF application. Domeo would be urn:application:domeo**.
** Note that this aspect is not fully implemented yet, so only specific systems are recognized by Annotopia and used for filtering. All others are managed but not exploited. In other words, currently only two values are fully managed: 'urn:application:domeo' and 'urn:application:utopia'. Alternative values can be used and stored, but they will not appear in the search facets.
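If your client builds annotations programmatically, the highlight above can be assembled with a small helper. This is just a sketch: the function `make_highlight` and its parameters are hypothetical names of mine, and the temporary '@id' values are arbitrary placeholders that the server is expected to replace.

```python
OA_CONTEXT = (
    "https://raw2.github.com/Annotopia/AtSmartStorage/"
    "master/web-app/data/OAContext.json"
)


def make_highlight(source_uri, exact, prefix, suffix,
                   creator_uri, creator_name, annotated_at, serialized_at,
                   application="urn:application:utopia"):
    """Return a highlight annotation shaped like the example above."""
    return {
        "@context": OA_CONTEXT,
        "@id": "urn:temp:1",            # temporary, replaced by the server
        "@type": "oa:Annotation",
        "motivatedBy": "oa:highlighting",
        "annotatedBy": {
            "@id": creator_uri,
            "@type": "foaf:Person",
            "foaf:name": creator_name,
        },
        "annotatedAt": annotated_at,
        "serializedBy": application,
        "serializedAt": serialized_at,
        "hasTarget": {
            "@id": "urn:temp:2",        # temporary, replaced by the server
            "@type": "oa:SpecificResource",
            "hasSelector": {
                "@type": "oa:TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
            "hasSource": {"@id": source_uri, "@type": "dctypes:Text"},
        },
    }


if __name__ == "__main__":
    import json
    highlight = make_highlight(
        "http://paolociccarese.info",
        "senior scientist and software engineer",
        "I am a",
        ", working in the bio-medical informatics field since the year 2000",
        "http://orcid.org/0000-0002-5156-2703",
        "Paolo Ciccarese",
        "2014-02-17T09:46:11EST",
        "2014-02-17T09:46:51EST",
    )
    print(json.dumps(highlight, indent=1))
```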

A simpler example of Annotation of type Comment of an entire resource:

{
    "@context": "https://raw2.github.com/Annotopia/AtSmartStorage/master/web-app/data/OAContext.json",
    "@id": "urn:temp:001",
    "@type": "http://www.w3.org/ns/oa#Annotation",
    "motivatedBy": "oa:commenting",
    "annotatedBy": {
        "@id": "http://orcid.org/0000-0002-5156-2703",
        "@type": "foaf:Person",
        "foaf:name": "Paolo Ciccarese"
    },
    "annotatedAt": "2014-02-17T09:46:11EST",
    "serializedBy": "urn:application:domeo",
    "serializedAt": "2014-02-17T09:46:51EST",
    "hasBody": {
        "@type": [
            "cnt:ContentAsText",
            "dctypes:Text"
        ],
        "cnt:chars": "This is an interesting document",
        "dc:format": "text/plain"
    },
    "hasTarget": "http://paolociccarese.info"
}
 
Note that:
  • the 'hasBody' shows how to encode textual content
  • the 'hasTarget' is just a URI**.
** Note that as the target is a URI, anything identifiable can be annotated. In the above case we are annotating a web page, but the URI could be the identifier for a Data point as well.

Once the POST is sent, if everything is correct, the server (if "outCmd":"frame" was specified) will return a result message that has the following structure:

{"status":"saved", "result": {"duration": "1764ms","graphs":"1","item":[{
  "@context" : {
    ...
  },
  "@graph" : [ {
    "@id" : "http://myserver.example.com:8090/s/annotation/597C3DE9-8657-4FA6-ABCA-895A74B448E9",
    "@type" : "oa:Annotation",
    "http://purl.org/pav/previousVersion" : "urn:temp:7",
    "annotatedAt" : "2014-02-17T09:46:11EST",
    "annotatedBy" : {
      "@id" : "http://orcid.org/0000-0002-5156-2703",
      "@type" : "foaf:Person",
      "name" : "Paolo Ciccarese"
    },
    "hasTarget" : {
      "@id" : "http://myserver.example.com:8090/s/resource/ED20AE10-4916-485C-903D-54D6F11DF682",
      "@type" : "oa:SpecificResource",
      "http://purl.org/pav/previousVersion" : "urn:temp:8",
      "hasSelector" : {
        "@id" : "_:b0",
        "@type" : "oa:TextQuoteSelector",
        "exact" : "senior scientist and software engineer",
        "prefix" : "I am a",
        "suffix" : ", working in the bio-medical informatics field since the year 2000"
      },
      "hasSource" : {
        "@id" : "http://paolociccarese.info",
        "@type" : "dctypes:Text"
      }
    },
    "motivatedBy" : "oa:highlighting",
    "serializedAt" : "2014-02-17T09:46:51EST",
    "serializedBy" : "urn:application:utopia"
  } ]
}]}}

Note that:
  • the updated message is stored in the "item" section in a '@graph'
  • the '@id' values have been updated with resolvable URIs
  • the property "http://purl.org/pav/previousVersion" returns the original temporary '@id' for matching.
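On the client side, the matching mentioned in the last point can be done by walking the returned item and pairing each temporary '@id' with the minted one. The sketch below assumes the framed response structure shown above; the function name `collect_minted_ids` is mine and not part of Annotopia.

```python
PREVIOUS_VERSION = "http://purl.org/pav/previousVersion"


def collect_minted_ids(node, mapping=None):
    """Walk a (nested) JSON-LD structure and map temporary ids to minted URIs."""
    if mapping is None:
        mapping = {}
    if isinstance(node, dict):
        temporary = node.get(PREVIOUS_VERSION)
        if isinstance(temporary, str) and "@id" in node:
            mapping[temporary] = node["@id"]
        for value in node.values():
            collect_minted_ids(value, mapping)
    elif isinstance(node, list):
        for value in node:
            collect_minted_ids(value, mapping)
    return mapping


if __name__ == "__main__":
    # A trimmed-down version of the response shown above.
    response = {"status": "saved", "result": {"item": [{"@graph": [{
        "@id": "http://myserver.example.com:8090/s/annotation/597C3DE9-8657-4FA6-ABCA-895A74B448E9",
        PREVIOUS_VERSION: "urn:temp:7",
        "hasTarget": {
            "@id": "http://myserver.example.com:8090/s/resource/ED20AE10-4916-485C-903D-54D6F11DF682",
            PREVIOUS_VERSION: "urn:temp:8",
        },
    }]}]}}
    for temporary, minted in collect_minted_ids(response).items():
        print(temporary, "->", minted)
```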

Step 3. How to include bibliographic metadata/identifiers

Annotopia can use identifiers (PubMed IDs, PubMed Central IDs, DOIs and PIIs) to resolve equivalent documents, for example an HTML version of a document versus a PDF version, or multiple HTML versions of the same document.

To include bibliographic identifiers in the annotation, it is sufficient to add the data to the 'hasSource' section as follows:

"hasSource": {
 "@id": "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102893/",
 "@type": "dctypes:Text",
 "format": "text/html",
 "http://purl.org/vocab/frbr/core#embodimentOf": {
  "http://purl.org/dc/terms/title": "An open annotation ontology for science on web 3.0",
  "http://prismstandard.org/namespaces/basic/2.0/doi": "10.1186/2041-1480-2-S2-S4",
  "http://purl.org/spar/fabio#hasPII": "2041-1480-2-S2-S4",
  "http://purl.org/spar/fabio#hasPubMedCentralId": "PMC3102893",
  "http://purl.org/spar/fabio#hasPubMedId": "21624159"
 }
}
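A client that already knows the identifiers of a document can attach them to any 'hasSource' with a few lines of code. The following is a sketch under my own naming (`with_identifiers` and the short keys doi, pii, pmcid and pmid are not Annotopia conventions); the property URIs are the ones used in the fragment above.

```python
import copy

EMBODIMENT_OF = "http://purl.org/vocab/frbr/core#embodimentOf"
TITLE = "http://purl.org/dc/terms/title"

# Hypothetical short names mapped to the property URIs shown above.
ID_PROPERTIES = {
    "doi": "http://prismstandard.org/namespaces/basic/2.0/doi",
    "pii": "http://purl.org/spar/fabio#hasPII",
    "pmcid": "http://purl.org/spar/fabio#hasPubMedCentralId",
    "pmid": "http://purl.org/spar/fabio#hasPubMedId",
}


def with_identifiers(source, title=None, **identifiers):
    """Return a copy of a 'hasSource' dict enriched with bibliographic ids.

    Unknown identifier names raise KeyError rather than being silently dropped.
    """
    enriched = copy.deepcopy(source)
    expression = {}
    if title is not None:
        expression[TITLE] = title
    for name, value in identifiers.items():
        expression[ID_PROPERTIES[name]] = value
    enriched[EMBODIMENT_OF] = expression
    return enriched


if __name__ == "__main__":
    import json
    source = {
        "@id": "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102893/",
        "@type": "dctypes:Text",
        "format": "text/html",
    }
    print(json.dumps(
        with_identifiers(source, doi="10.1186/2041-1480-2-S2-S4", pmid="21624159"),
        indent=1,
    ))
```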

Step 4. Request for a specific annotation (GET {$id})

To request a specific annotation by its URI, it is sufficient to execute (see API docs):

    curl -i -X GET +ANNOTATION_URI -H "Content-Type: application/json" \
       -d'{"apiKey":"+SYSTEM_API_KEY","outCmd":"frame"}'

Step 5. Request for annotations for a document (GET)

It is common to request the annotations for a particular document by URI (see API docs):

    curl -i -X GET http://myserver.example.com:8090/s/annotation \
      -H "Content-Type: application/json" \
      -d '{"apiKey":"+SYSTEM_API_KEY","tgtUrl":"http://www.jbiomedsem.com/content/2/S2/S4"}'


Or by bibliographic identifier:
    curl -i -X GET http://myserver.example.com:8090/s/annotation \
      -H "Content-Type: application/json" \
      -d "{\"apiKey\":\"+SYSTEM_API_KEY\",\"tgtIds\":\"{'pii':'2041-1480-2-S2-S4'}\"}"
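The request bodies used in the GET examples can also be generated from code, which avoids quoting mistakes on the command line. This is a sketch only: the function names are mine, and the way several identifiers are joined inside "tgtIds" is an assumption extrapolated from the single-identifier example above.

```python
import json


def body_for_annotation(api_key):
    """Body for fetching one annotation by its URI (framed result)."""
    return json.dumps({"apiKey": api_key, "outCmd": "frame"})


def body_for_target_url(api_key, target_url):
    """Body for fetching the annotations of a document by URL."""
    return json.dumps({"apiKey": api_key, "tgtUrl": target_url})


def body_for_target_ids(api_key, identifiers):
    """Body for fetching the annotations of a document by identifier.

    "tgtIds" is a string with single-quoted keys and values, as in the
    example above. Joining several pairs with commas is an assumption.
    """
    pairs = ",".join("'%s':'%s'" % (k, v) for k, v in identifiers.items())
    return json.dumps({"apiKey": api_key, "tgtIds": "{" + pairs + "}"})


if __name__ == "__main__":
    print(body_for_annotation("+SYSTEM_API_KEY"))
    print(body_for_target_url(
        "+SYSTEM_API_KEY", "http://www.jbiomedsem.com/content/2/S2/S4"))
    print(body_for_target_ids("+SYSTEM_API_KEY", {"pii": "2041-1480-2-S2-S4"}))
```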

Tuesday, July 08, 2014

Adding bibliographic data to Open Annotation

One of the challenges for achieving interoperability between annotation clients that deal with different formats (for example PDF and HTML, see previous post Domeo and Utopia integration through Annotopia) is to be able to identify the annotated content.

For example, let's consider the paper about Annotation Ontology: Ciccarese P, Ocana M, Castro LJG, Das S, Clark, T. An open annotation ontology for science on web 3.0. J Biomed Semantics 2011, 2(Suppl 2):S4 (17 May 2011) [doi:10.1186/2041-1480-2-S2-S4]

Besides this PDF version (there might be others) of the article:
* PDF at Journal of Biomedical Semantics

The manuscript can be found in HTML format in at least these two locations (which exhibit different layouts):
* PubMed Central
* Journal of Biomedical Semantics

We know that the same content can be identified through identifiers:
* DOI (Digital Object Identifier) 10.1186/2041-1480-2-S2-S4
* PMID (PubMed ID) 21624159
* PMCID (PubMed Central ID) PMC3102893
* PII (Publisher Item Identifier) 2041-1480-2-S2-S4

To take all the available identifiers into account, the additional information can be included in the annotation target. So if the client is annotating the PubMed Central version of the document (identified by the URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102893/), the source of the target will be identified by:

 ...
    "hasSource": {
        "@id": "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102893/",
        "@type": "dctypes:Text",
        "frbr:embodimentOf" : 
        { 
            "prism:doi": "10.1186/2041-1480-2-S2-S4",
            "fabio:hasPII":"2041-1480-2-S2-S4",
            "fabio:hasPubMedCentralId":"PMC3102893",
            "fabio:hasPubMedId":"21624159"
        }
        
    }
Here I made use of the FaBiO (FRBR-aligned Bibliographic Ontology) ontology which, in turn, reuses terms from the FRBR ontology and the PRISM vocabulary. Kudos to Silvio Peroni for pointing out that the relationship between the Manifestation (HTML page) and the Expression should be frbr:embodimentOf and not fabio:manifestationOf. The latter would assume the identifiers are identifying the Work.
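The prefixed names in the fragment above are shorthand for full property URIs. As a rough illustration of what a JSON-LD processor does with them, here is a small sketch that expands the keys by hand. The prefix table is my assumption, inferred from the full URIs used in the Annotopia 101 post; a real client should rely on the '@context' and a JSON-LD library instead.

```python
# Assumed prefix table, inferred from the full URIs in the Annotopia 101 post.
PREFIXES = {
    "frbr": "http://purl.org/vocab/frbr/core#",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
    "fabio": "http://purl.org/spar/fabio#",
}


def expand_term(term):
    """Expand 'prefix:local' to a full URI when the prefix is known."""
    prefix, separator, local = term.partition(":")
    if separator and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


def expand_keys(node):
    """Recursively expand the keys (not the values) of a JSON structure."""
    if isinstance(node, dict):
        return {expand_term(k): expand_keys(v) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_keys(v) for v in node]
    return node


if __name__ == "__main__":
    import json
    source = {
        "@id": "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102893/",
        "@type": "dctypes:Text",
        "frbr:embodimentOf": {
            "prism:doi": "10.1186/2041-1480-2-S2-S4",
            "fabio:hasPII": "2041-1480-2-S2-S4",
            "fabio:hasPubMedCentralId": "PMC3102893",
            "fabio:hasPubMedId": "21624159",
        },
    }
    print(json.dumps(expand_keys(source), indent=1))
```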

Domeo and Utopia integration through Annotopia

Here is a very recent demo of an annotation created on an HTML document through Domeo and then seen on the corresponding PDF with the Utopia PDF viewer. All through Open Annotation and Annotopia.

Thanks to Steve Pettifer and Dave Thorne (for the Utopia plugin development); Thomas Wilkins for OAuth implementation in Annotopia. Annotopia is currently architected and developed by me.

Sunday, February 13, 2011

Which principles drive ontology adoption?

Several weeks ago, I started to think of the next version of the Annotation Ontology (AO). After one year spent developing the Annotation Framework and discussing with several colleagues and friends, I certainly have a little list of things I want to improve. Nothing major, mostly a clean up.

Before proceeding with the updates, I wanted to clarify the set of principles I want to follow in developing AO2. These are, in random order: Traceability, Orthogonality, Generality, Interoperability, Modularity, Extensibility, Adequate Documentation, Community Driven. I am listing these principles because I believe they influence adoption.

As you might have noticed, the number of available ontologies is constantly increasing. If you need to use an ontology, you have to go through the process of reviewing what is out there and selecting what you think is most appropriate. How many times have you done that? How many times did you succeed? How many times did you find the right ontology covering exactly what you needed? I am pretty sure that if you are involved in the development of a complex application the answer is something like: I found a few ontologies I could mix and match... I still need to add pieces... and, most importantly, I am not sure I agree with the way some of them are done. Right. Welcome to the Semantic Web, I would say.

I remember the old days, many years ago, when the Dublin Core Metadata Element Set, Version 1.1 (DC) was the answer to almost everything. When I started working on SWAN (Semantic Web Applications in Neuromedicine) in 2006, I immediately found DC to be insufficient for our needs. For days I struggled to understand what to do: use DC and be sloppy, or create something more appropriate at the risk of isolation and of increasing the entropy of the Semantic Web world.

Well, at that time my answer was the Provenance, Authoring and Versioning Ontology (PAV), now available in version 2. The choice was also dictated by practical reasons: if I was using DC for Annotation Properties and I wanted to be OWL DL, I could not also use it for other properties. Since then, PAV has been used in our applications but also in several others developed by people/groups I barely know - sometimes I wish they would just tell me something like: "hey I am using PAV and it's cool" or even "hey I am using PAV and it sucks because...". PAV has also been considered as one of the starting points for the W3C Provenance Incubator Group.

PAV was not such a bad idea in the end. But it was a risky business. If you are developing an application, you always need to keep one eye on what exists and the other on your requirements. This turns out to be even more complicated because it is hard to find appropriate ontologies and, when you find them, they often lack adequate documentation for you to understand whether they are what you are looking for. Surprise! The lack of shared knowledge about an ontology does not help it to emerge and does not help adoption... unless, of course, external factors - networking, important supporters, big institutions ... - come into play. And external factors are no small thing.