HckLab: JSON-LD

This post explains how to get started in using Annotopia as a server for document/data annotation. It assumes Annotopia is already installed and running and that you have admin access to the instance.

Step 1. Register your system

After logging in (as admin) to Annotopia, you will see a welcome screen:

Click on 'Administration Dashboard' (top left of the screen)

Select 'Create System'

Fill out the form and 'Save system'

Take note of the 'API key' which is going to be used by your system to communicate with Annotopia when Annotopia is not set up to use a stronger Authentication mechanism.

Step 2. Create my first annotation (POST)

Assuming that the server address is http://myserver.example.com:8090 we are going to create our first POST. Normally your application will connect to the server through Ajax or a server call. For the sake of this tutorial we are going to use curl that is easy to use in command line.

The structure of the POST for an annotation item is very simple (API documentation here):

  curl -i -X POST http://myserver.example.com:8090/s/annotation \
       -H "Content-Type: application/json" \
       -d'{"apiKey":"{+SYSTEM_API_KEY}", "outCmd":"frame", "item":{+ANNOTATION}}

Where +SYSTEM_API_KEY is the API key of the previous section and +ANNOTATION is the actual annotation content. Notice also the parameter "outCmd":"frame", this is used to frame the JSON-LD result, which means that the result will always be returned with a precise hierarchical structure so that the clients don't have to deal with the variability of a graph-like representation.

A simple example of Annotation of type Highlight (conformant to the Open Annotation Model) would be:

{
 "@context": "https://raw2.github.com/Annotopia/AtSmartStorage/master/web-app/data/OAContext.json",
 "@id": "urn:temp:7",
 "@type": "oa:Annotation",
 "motivatedBy": "oa:highlighting",
 "annotatedBy": {
  "@id": "http://orcid.org/0000-0002-5156-2703",
  "@type": "foaf:Person",
  "foaf:name": "Paolo Ciccarese"
 },
 "annotatedAt": "2014-02-17T09:46:11EST",
 "serializedBy": "urn:application:utopia",
 "serializedAt": "2014-02-17T09:46:51EST",
 "hasTarget": {
  "@id": "urn:temp:8",
  "@type": "oa:SpecificResource",
  "hasSelector": {
   "@type": "oa:TextQuoteSelector",
   "exact": "senior scientist and software engineer",
   "prefix": "I am a",
   "suffix": ", working in the bio-medical informatics field since the year 2000"
  },
  "hasSource": {
   "@id": "http://paolociccarese.info",
   "@type": "dctypes:Text"
  }
 }
}

Note that:

the '@context' is necessary for the server to interpret the content
the '@id' fields contain a temporary value. In fact, when posting an annotation for the first time, the server will mint URIs for Annotation and Target and will return the updated content to the client as a response of the POST
the 'motivatedBy' property declares that the intent of the annotation is of highlighting.
the 'hasTarget' uses a quote of the annotated piece of content.
the 'hasSource/@id' represents the URI of the annotated resource
the 'hasSelector' identifies a fragment of that resource.
the 'serializedBy' declares which system created the artifact. In the above case it is the Utopia for PDF application. Domeo would be urn:application:domeo**.

** Note that this aspect is not fully implemented yet, therefore only specific systems are recognized by Annotopia and used for filtering. All others are managed but not exploited. In other words, currently only two values are fully manage 'urn:application:domeo' and 'urn:application:utopia'. Alternative values can be used and stored but they will not appear in the facets. for search.

A simpler example of Annotation of type Comment of an entire resource:

{
    "@context": "https://raw2.github.com/Annotopia/AtSmartStorage/master/web-app/data/OAContext.json",
    "@id": "urn:temp:001",
    "@type": "http://www.w3.org/ns/oa#Annotation",
    "motivatedBy": "oa:commenting",
    "annotatedBy": {
        "@id": "http://orcid.org/0000-0002-5156-2703",
        "@type": "foaf:Person",
        "foaf:name": "Paolo Ciccarese"
    },
    "annotatedAt": "2014-02-17T09:46:11EST",
    "serializedBy": "urn:application:domeo",
    "serializedAt": "2014-02-17T09:46:51EST",
    "hasBody": {
        "@type": [
            "cnt:ContentAsText",
            "dctypes:Text"
        ],
        "cnt:chars": "This is an interesting document",
        "dc:format": "text/plain"
    },
    "hasTarget": "http://paolociccarese.info"
}

Note that:

the 'hasBody' shows how to encode textual content
the 'hasTarget' is just a URI**.

** Note that as the target is a URI, anything identifiable can be annotated. In the above case we are annotating a web page, but the URI could be the identifier for a Data point as well.

Once the POST is sent, if everything is correct, the server (if "outCmd":"frame" was specified) will return a result message that has the following structure:

{"status":"saved", "result": {"duration": "1764ms","graphs":"1","item":[{
  "@context" : {
    ...
  },
  "@graph" : [ {
    "@id" : "http://myserver.example.com:8090/s/annotation/597C3DE9-8657-4FA6-ABCA-895A74B448E9",
    "@type" : "oa:Annotation",
    "http://purl.org/pav/previousVersion" : "urn:temp:7",
    "annotatedAt" : "2014-02-17T09:46:11EST",
    "annotatedBy" : {
      "@id" : "http://orcid.org/0000-0002-5156-2703",
      "@type" : "foaf:Person",
      "name" : "Paolo Ciccarese"
    },
    "hasTarget" : {
      "@id" : "http://myserver.example.com:8090/s/resource/ED20AE10-4916-485C-903D-54D6F11DF682",
      "@type" : "oa:SpecificResource",
      "http://purl.org/pav/previousVersion" : "urn:temp:8",
      "hasSelector" : {
        "@id" : "_:b0",
        "@type" : "oa:TextQuoteSelector",
        "exact" : "senior scientist and software engineer",
        "prefix" : "I am a",
        "suffix" : ", working in the bio-medical informatics field since the year 2000"
      },
      "hasSource" : {
        "@id" : "http://paolociccarese.info",
        "@type" : "dctypes:Text"
      }
    },
    "motivatedBy" : "oa:highlighting",
    "serializedAt" : "2014-02-17T09:46:51EST",
    "serializedBy" : "urn:application:utopia"
  } ]
}]}}

Note that:

the updated message is stored in the "item" section in a '@graph'
the '@id' have been updated with resolvable URIs
the property "http://purl.org/pav/previousVersion" returns the original temporary '@id' for matching.

Step 3. How to include bibliographic metadata/identifiers

Annotopia can use identifiers (PubMed IDs, PubMed Central IDs, DOIs and PIIs) to resolve equivalent documents. For example a HTML version of the document vs a PDF version. Or multiple HTML versions of the same document.

To include bibliographic metadata identifiers in the annotation, is sufficient to add the data to the 'hasSource' section as follows:

"hasSource": {
 "@id": "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102893/",
 "@type": "dctypes:Text",
 "format": "text/html",
 "http://purl.org/vocab/frbr/core#embodimentOf": {

                "http://purl.org/dc/terms/title":"An open annotation ontology for science on web 3.0",

                "http://prismstandard.org/namespaces/basic/2.0/doi": "10.1186/2041-1480-2-S2-S4",
  "http://purl.org/spar/fabio#hasPII": "2041-1480-2-S2-S4",
  "http://purl.org/spar/fabio#hasPubMedCentralId": "PMC3102893",
  "http://purl.org/spar/fabio#hasPubMedId": "21624159"
 }
}

Step 4. Request for a specific annotation (GET {$id})

For requesting a specific annotation through its URI it is sufficient to execute (see API docs):

    curl -i -X GET +ANNOTATION_URI -H "Content-Type: application/json" \
       -d'{"apiKey":"+SYSTEM_API_KEY","outCmd":"frame"}'

Step 4. Request for annotations for a document (GET)

It is common to request the annotation for a particular document by URI (see API docs):

    curl -i -X GET http://myserver.example.com:8090/s/annotation \
      -H "Content-Type: application/json" \
      -d '{"apiKey":"+SYSTEM_API_KEY","tgtUrl":"http://www.jbiomedsem.com/content/2/S2/S4"}'

Or by bibliographic identifier:

    curl -i -X GET http://myserver.example.com:8090/s/annotation \
      -H "Content-Type: application/json" \
      -d '{"apiKey":"+SYSTEM_API_KEY","tgtIds":"{'pii':'2041-1480-2-S2-S4'}"}'

After working for a couple of years on the Domeo Annotation Tool I am now working on a couple of projects that focus on the creation of a back-end for saving/searching annotation. I am planning to use the Open Annotation model and some other ontologies such as: PAV (Provenance, Authoring and Versioning) ontology and maybe CO (Collections Ontology).

Named Graphs and JSON-LD

Most importantly I am going to make large use of Named Graphs and their serialization in JSON-LD format, which is the recommended format for Open Annotation. JSON-LD became very recently a W3C Recommendation.

A Named Graph is a collections of Statements that is identified by a URI.

JSON-LD is a lightweight Linked Data format. It is easy for humans to read and write. It is based on the already successful JSON format and provides a way to help JSON data interoperate at Web-scale.

JSON-LD provides a very slick way of representing Named Graphs. Here is an example of Named Graph used for representing a very basic annotation (with Open Annotation):

  
  {
     "@context": {
        ...
     },
     "@id": "http://example.org/graphs/1",
     "@graph":
     [
        {
          "@id": "http://www.example.org/ann/1",
          "@type": "oa:Annotation",
          "hasBody": "http://www.example.org/body/1",
          "hasTarget": "http://www.example.org/target/1"
        }
     ]
  }

  Figure 1 - JSON-LD representation of a Named Graph and Open Annotation data.
  You can find the full @context in the Open Annotation specifications.

Loading JSON-LD in memory with Jena API

I would like to store the above Named Graph for instance in the triple store Virtuoso Open-Source Edition. For this task I chose the Apache Jena API that makes use of the JSON-LD implementation for Java.

I will start by loading in memory the above JSON-LD code (figure 1) that is currently in a JSON file:

  
  JenaJSONLD.init(); // Only needed once
  
  Dataset dataset = DatasetFactory.createMem();
  InputStream inputStream = new FileInputStream(annotationFile);
  if(inputStream == null) {
    throw new IllegalArgumentException("File: " + annotationFile + " not found");
  }
  RDFDataMgr.read(dataset, inputStream, "http://example.com/", JenaJSONLD.JSONLD);

  Figure 2 - Jena API code for loading the JSON-LD file in an in-memory Dataset.

The reason why I used a Dataset rather than a Model is because the

Dataset is a collection of named graphs and a background graph (also called the default graph or unnamed graph)

And that fits exactly the needs we have with the code in Figure 1. And the needs of much more complex use cases related to Domeo. Also, this approach works for both the JSON-LD making and not making use of graphs. If the JSON-LD does not contain any graph, the Statements will belong to the default graph.

Note: When I tired to use the Model and not the Dataset for loading the JSON-LD files, I realized that only the files with no @graph declarations were loaded correctly. The ones with the @graph declaration were not generating any statement.

Persist the Named Graphs in Virtuoso

And these are the few lines of code I use to store the in-memory graphs in the Virtuoso store (I am sure there is a better way of doing this and combining the above step with these lines of code, however, this seems to work the way I want):

  // Default graph
  if(dataset.getDefaultModel()!=null && dataset.getDefaultModel().size()>0) {
    VirtGraph virtGraph = new VirtGraph (
      "jdbc:virtuoso://localhost:1111", "dba", "dba");
    VirtModel virtModel = new VirtModel(virtGraph);
    virtModel.add(dataset.getDefaultModel());
    // Print the triples
    println "graph: *"
    RDFDataMgr.write(System.out, dataset.getDefaultModel(), JenaJSONLD.JSONLD);
  }

  // Named graphs
  Iterator names = dataset.listNames()
  while(names.hasNext()) {
    String name = names.next();
    Model model = dataset.getNamedModel(name)
    VirtGraph virtGraph = new VirtGraph (name, 
      "jdbc:virtuoso://localhost:1111", "dba", "dba");
    VirtModel virtModel = new VirtModel(virtGraph);
    virtModel.add(model);

    // Print the triples
    println "graph: " + name
    RDFDataMgr.write(System.out, model, JenaJSONLD.JSONLD);
  }

  Figure 3 - Saving default and named graphs in Virtuoso

Software versions used in the example above

For the above examples I've used the following libraries/versions:

jena-core v. 2.11.0
jena-arq v. 2.11.0
jsonld-java-jena v. 0.2.99
virtjdbc4.jar
virt_jena2.jar

HckLab

Friday, October 10, 2014

Annotopia 101 - Basic use for document/data annotation

Step 1. Register your system

Step 2. Create my first annotation (POST)

Step 3. How to include bibliographic metadata/identifiers

Step 4. Request for a specific annotation (GET {$id})

Step 4. Request for annotations for a document (GET)

Monday, January 27, 2014

JSON-LD, Jena, Virtuoso and Named Graphs

Named Graphs and JSON-LD

Loading JSON-LD in memory with Jena API

Persist the Named Graphs in Virtuoso

Software versions used in the example above

About Me in Blogspot