Monday, January 27, 2014

JSON-LD, Jena, Virtuoso and Named Graphs

After working for a couple of years on the Domeo Annotation Tool, I am now working on a couple of projects that focus on building a back-end for saving and searching annotations. I am planning to use the Open Annotation model together with other ontologies such as the PAV (Provenance, Authoring and Versioning) ontology and possibly CO (Collections Ontology).

Named Graphs and JSON-LD

Most importantly, I am going to make extensive use of Named Graphs and of their serialization in JSON-LD, which is the recommended format for Open Annotation. JSON-LD very recently became a W3C Recommendation.
A Named Graph is a collection of statements (triples) identified by a URI.
JSON-LD is a lightweight Linked Data format. It is easy for humans to read and write. It is based on the already successful JSON format and provides a way to help JSON data interoperate at Web scale.
JSON-LD provides a very slick way of representing Named Graphs. Here is an example of a Named Graph representing a very basic annotation (using Open Annotation):
  
  {
     "@context": {
        ...
     },
     "@id": "http://example.org/graphs/1",
     "@graph":
     [
        {
          "@id": "http://www.example.org/ann/1",
          "@type": "oa:Annotation",
          "hasBody": "http://www.example.org/body/1",
          "hasTarget": "http://www.example.org/target/1"
        }
     ]
  }

  Figure 1 - JSON-LD representation of a Named Graph and Open Annotation data.
  You can find the full @context in the Open Annotation specifications.
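To make the meaning of Figure 1 concrete, here is a small, standard-library-only Java sketch that writes out by hand the quads (subject, predicate, object, graph) the document should expand to, in N-Quads syntax. It assumes that the Open Annotation @context maps the oa prefix to http://www.w3.org/ns/oa# and maps hasBody/hasTarget to oa:hasBody/oa:hasTarget with "@type": "@id"; check the full context before relying on these IRIs. The class name Figure1Quads is hypothetical, and this is a hand expansion, not a JSON-LD processor.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written expansion of Figure 1 into N-Quads (illustration only).
// ASSUMPTION: the Open Annotation @context maps "oa" to http://www.w3.org/ns/oa#
// and "hasBody"/"hasTarget" to oa:hasBody/oa:hasTarget with "@type": "@id".
public class Figure1Quads {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OA = "http://www.w3.org/ns/oa#";

    // Formats one quad whose four terms are all IRIs as an N-Quads line
    public static String nquad(String s, String p, String o, String g) {
        return "<" + s + "> <" + p + "> <" + o + "> <" + g + "> .";
    }

    public static List<String> quads() {
        String graph = "http://example.org/graphs/1"; // top-level "@id"
        String ann = "http://www.example.org/ann/1";   // node inside "@graph"
        List<String> out = new ArrayList<>();
        out.add(nquad(ann, RDF_TYPE, OA + "Annotation", graph)); // from "@type"
        out.add(nquad(ann, OA + "hasBody", "http://www.example.org/body/1", graph));
        out.add(nquad(ann, OA + "hasTarget", "http://www.example.org/target/1", graph));
        return out;
    }

    public static void main(String[] args) {
        for (String line : quads()) {
            System.out.println(line);
        }
    }
}
```

The fourth term on every line is the graph name http://example.org/graphs/1; that extra term is what the in-memory Dataset used below keeps track of.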

Loading JSON-LD into memory with the Jena API

I would like to store the above Named Graph in a triple store, for instance Virtuoso Open-Source Edition. For this task I chose the Apache Jena API together with the JSON-LD implementation for Java (jsonld-java), which provides a JSON-LD reader and writer for Jena.

I start by loading the JSON-LD document of Figure 1, which is stored in a file, into memory:
  
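  import java.io.FileInputStream;
  import java.io.InputStream;
  import com.hp.hpl.jena.query.Dataset;
  import com.hp.hpl.jena.query.DatasetFactory;
  import org.apache.jena.riot.RDFDataMgr;
  import com.github.jsonldjava.jena.JenaJSONLD;

  JenaJSONLD.init(); // Registers JSON-LD with Jena; only needed once

  Dataset dataset = DatasetFactory.createMem();
  // new FileInputStream(...) never returns null: a missing file throws
  // FileNotFoundException, so no null check is needed
  try (InputStream inputStream = new FileInputStream(annotationFile)) {
    RDFDataMgr.read(dataset, inputStream, "http://example.com/", JenaJSONLD.JSONLD);
  }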

  Figure 2 - Jena API code for loading the JSON-LD file in an in-memory Dataset.
I used a Dataset rather than a Model because a Dataset is a collection of named graphs plus a background graph (also called the default graph or unnamed graph).
That fits exactly what the code in Figure 1 needs, as well as the needs of much more complex use cases related to Domeo. This approach also works whether or not the JSON-LD makes use of graphs: if the JSON-LD does not contain any graph, its statements belong to the default graph.
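The difference between the default graph and the named graphs can be sketched with a toy data structure. This is an illustration only, not Jena's Dataset class: DatasetSketch and addQuad are hypothetical names, and statements are plain strings. A statement that arrives without a graph name goes to the default graph; one that carries a graph name goes to the named graph with that URI.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model of a dataset (NOT Jena's Dataset class): one default graph
// plus any number of named graphs. Statements are plain strings here.
public class DatasetSketch {
    public final Set<String> defaultGraph = new LinkedHashSet<>();
    public final Map<String, Set<String>> namedGraphs = new LinkedHashMap<>();

    // A null graph name means the statement came without @graph:
    // it is routed to the default graph; otherwise to the named graph.
    public void addQuad(String graphName, String statement) {
        if (graphName == null) {
            defaultGraph.add(statement);
        } else {
            namedGraphs.computeIfAbsent(graphName, k -> new LinkedHashSet<>())
                       .add(statement);
        }
    }

    public static void main(String[] args) {
        DatasetSketch ds = new DatasetSketch();
        // From a JSON-LD file without @graph
        ds.addQuad(null, "<ann/2> rdf:type oa:Annotation");
        // From a JSON-LD file like Figure 1
        ds.addQuad("http://example.org/graphs/1", "<ann/1> rdf:type oa:Annotation");
        System.out.println(ds.defaultGraph.size());  // 1
        System.out.println(ds.namedGraphs.keySet()); // [http://example.org/graphs/1]
    }
}
```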

Note: When I tried to use a Model instead of a Dataset for loading the JSON-LD files, I found that only the files with no @graph declaration were loaded correctly. The ones with an @graph declaration did not produce any statements.
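One plausible explanation for this behavior, stated here as an assumption rather than something verified in the Jena source, is that reading into a Model gives the parser a single-graph destination: statements that belong to a named graph have nowhere to go and are dropped. The hypothetical SingleGraphSink class mimics that assumed behavior for the three statements of Figure 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-graph destination. ASSUMPTION: this mimics what
// happens when quad-producing JSON-LD is read into one Model; statements
// without a graph name are kept, named-graph statements are dropped.
public class SingleGraphSink {
    public final List<String> triples = new ArrayList<>();
    public int dropped = 0;

    public void quad(String graphName, String statement) {
        if (graphName == null) {
            triples.add(statement); // default-graph statement: kept
        } else {
            dropped++;              // named-graph statement: nowhere to put it
        }
    }

    public static void main(String[] args) {
        SingleGraphSink model = new SingleGraphSink();
        String g = "http://example.org/graphs/1";
        // In Figure 1 every statement sits inside "@graph", so all carry a graph name
        model.quad(g, "<ann/1> rdf:type oa:Annotation");
        model.quad(g, "<ann/1> oa:hasBody <body/1>");
        model.quad(g, "<ann/1> oa:hasTarget <target/1>");
        System.out.println(model.triples.size() + " kept, " + model.dropped + " dropped");
        // prints "0 kept, 3 dropped"
    }
}
```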

Persisting the Named Graphs in Virtuoso

These are the few lines of code I use to store the in-memory graphs in the Virtuoso store. I am sure there is a better way of doing this, for example by combining the loading step above with these lines, but this seems to work the way I want:

  import java.util.Iterator;
  import com.hp.hpl.jena.rdf.model.Model;
  import virtuoso.jena.driver.VirtGraph;
  import virtuoso.jena.driver.VirtModel;

  // Default graph: only stored if it contains statements
  Model defaultModel = dataset.getDefaultModel();
  if (defaultModel != null && !defaultModel.isEmpty()) {
    VirtGraph virtGraph = new VirtGraph(
      "jdbc:virtuoso://localhost:1111", "dba", "dba");
    VirtModel virtModel = new VirtModel(virtGraph);
    virtModel.add(defaultModel);
    virtModel.close(); // Release the JDBC connection
    // Print the triples
    System.out.println("graph: *");
    RDFDataMgr.write(System.out, defaultModel, JenaJSONLD.JSONLD);
  }

  // Named graphs: each one is stored under its own graph URI
  Iterator<String> names = dataset.listNames();
  while (names.hasNext()) {
    String name = names.next();
    Model model = dataset.getNamedModel(name);
    VirtGraph virtGraph = new VirtGraph(name,
      "jdbc:virtuoso://localhost:1111", "dba", "dba");
    VirtModel virtModel = new VirtModel(virtGraph);
    virtModel.add(model);
    virtModel.close(); // Release the JDBC connection

    // Print the triples
    System.out.println("graph: " + name);
    RDFDataMgr.write(System.out, model, JenaJSONLD.JSONLD);
  }

  Figure 3 - Saving default and named graphs in Virtuoso
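The control flow of Figure 3 can be exercised without a running Virtuoso instance by replacing VirtGraph/VirtModel with a hypothetical TripleStore interface. This is a sketch of the loop structure only, with statements as plain strings: the default graph is written only when it is non-empty, and every named graph is written under its own URI. Using a null graph name for the default graph is an assumption of this sketch, not a Virtuoso convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Sketch of Figure 3's control flow with Virtuoso replaced by a
// hypothetical TripleStore interface, so it runs without a server.
public class PersistSketch {
    // HYPOTHETICAL stand-in for new VirtModel(new VirtGraph(name, ...)).add(model);
    // a null graphName stands for the default graph in this sketch.
    public interface TripleStore {
        void addAll(String graphName, Collection<String> statements);
    }

    public static void persist(Collection<String> defaultGraph,
                               Map<String, ? extends Collection<String>> namedGraphs,
                               TripleStore store) {
        // Default graph: written only when it actually contains statements
        if (!defaultGraph.isEmpty()) {
            store.addAll(null, defaultGraph);
        }
        // Named graphs: each one written under its own graph URI
        for (Map.Entry<String, ? extends Collection<String>> e : namedGraphs.entrySet()) {
            store.addAll(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        // Records what would have been sent to the store, keyed by graph name
        Map<String, List<String>> written = new LinkedHashMap<>();
        TripleStore recorder = (name, statements) ->
            written.computeIfAbsent(Objects.toString(name, "default"), k -> new ArrayList<>())
                   .addAll(statements);

        Map<String, List<String>> named = new LinkedHashMap<>();
        named.put("http://example.org/graphs/1",
                  Arrays.asList("<ann/1> rdf:type oa:Annotation"));
        persist(Collections.<String>emptyList(), named, recorder);
        System.out.println(written.keySet()); // [http://example.org/graphs/1]
    }
}
```

Replacing the recording lambda with an implementation backed by VirtModel would bring the sketch back to the setup in Figure 3.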

Software versions used in the example above

For the above examples I've used the following libraries/versions:
  • jena-core v. 2.11.0
  • jena-arq v. 2.11.0
  • jsonld-java-jena v. 0.2.99
  • virtjdbc4.jar
  • virt_jena2.jar