Thursday, February 24, 2011

Principle: Traceability [2] - Provenance and Doublin Core

For people working with Semantic Web technologies, for a long time, provenance has been called Dublin Core Metadata Element Set, a vocabulary of fifteen properties for use in resource description (as they currently state in the webpage). Let's take, for instance, the following property
'creator': an entity primarily responsible for making the content of the resource. Examples of a Creator include a person, an organization, or a service. Typically the name of the Creator should be used to indicate the entity.
You can find the guidelines for the usage of the creator property here. If we consider the RDF format (important for providing a syntactical framework) we can look at the following (RDF/XML) example:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
 xmlns:dc="http://purl.org/dc/elements/1.1/"> 
  <rdf:Description rdf:about="http://www.w3.org/TR/hcls-swan/">
   <dc:title>Semantic Web Applications in Neuromedicine (SWAN) Ontology</dc:title>
   <dc:creator>Paolo Ciccarese</dc:creator>
   <dc:date>2009-10-20</dc:date>
   <dc:format>text/html</dc:format>
   <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>
Of the above example I want to focus on the following triple (Turtle syntax):
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://www.w3.org/TR/hcls-swan/> dc:creator "Paolo Ciccarese".
Now if you take a look of the actual document I wrote where I appear as the Editor. It actually happened that somebody else created the file for that note and I've filled in the actual content. This situation is difficult to model using simply Dublin Core Element Set. Probably one way to go is to distinguish between the file and the content.

Another example. Let's say I want to create a file with a quote from a book or a speech. I create the HTML file (my resource). However, the actual content has been authored by somebody else. How do I represent it with Dublin Core Element Set. Let me give it a try:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
 xmlns:dcterms="http://purl.org/dc/terms/" 
 xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.paolociccarese.info/example/quotes/1">
   <dc:title>My favourite quote</dc:title>
   <dc:creator>Paolo Ciccarese</dc:creator>
   <dc:date>2011-02-23</dc:date>
   <dc:format>text/html</dc:format>
   <dc:language>it</dc:language>
   <dcterms:hasPart>
     <rdf:Description>
       <dc:creator>Dante Alighieri</dc:creator>
       <dc:description>Lasciate ogni speranza, voi ch'entrate</dc:description>
     </rdf:Description>
   </dcterms:hasPart>
  </rdf:Description>
</rdf:RDF>
As you probably noticed I have not defined a URI for the quote and therefore the generated triples will include a blank node. I can also think of making up a URI like http://www.paolociccarese.info/example/quotes/1#quote as long as I can make it resolvable. The above snippet does more or less what I wanted to. Now, one thing I don't like is that Dante Alighieri and I are both creators. As a matter of fact, in the quote there is some intellectual property involved, while in the making of the simple HTML page, not so much. However this could lead to problems as drawing the lines is not easy. I could also consider the use of the property contributor - see the guideline here -, however I am not sure that is appropriate in the present case.

No comments: