HckLab: 2008

Wednesday, December 31, 2008

Moving towards the SWAN Collections Ontology [2]

The idea of reasoning with sequential structures in OWL-DL is appealing. However, as already mentioned, we cannot use the RDF vocabulary in OWL-DL.

Drummond et al. [1] proposed a way of representing sequential structures in OWL-DL. Analyzing the work of Hirsh & Kudenko [2], they argued that "their representation requires extensive rewriting, the relation of the resulting structures to the original lists is not intuitive and, more importantly, the resulting structures grow as the square of the length of the list". They then describe a general list pattern, an intuitive approach related to that suggested by Hayes [3] and incorporated in the Semantic Web Best Practice Working Group’s note on n-ary relations.

The list pattern works as follows:

Each item is held in a “cell” (OWLList). Each cell has two pointers: one to its head (hasContents, functional) and one to the tail cells (hasNext, functional). The end of the list is indicated by a terminator (EmptyList), which also serves to represent the empty list. A transitive property, isFollowedBy, has also been defined as a super-property of hasNext. In other words, the members of any list are the contents of the first element plus the contents of all of the following elements. A separate OWL vocabulary has been defined because the RDF vocabulary cannot be used in OWL-DL.
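To make the cell structure concrete, here is a minimal Python sketch of the pattern. It is only an analogy, not the OWL vocabulary itself: the class and function names are my own, and None stands in for the EmptyList terminator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OWLListCell:
    """One cell of the list pattern (hypothetical Python stand-in)."""
    has_contents: str                           # functional: a single head value
    has_next: Optional["OWLListCell"] = None    # functional: a single tail; None = EmptyList


def is_followed_by(cell: Optional[OWLListCell]) -> List[OWLListCell]:
    """Cells reachable through one or more hasNext steps (the transitive closure)."""
    reachable = []
    current = cell.has_next if cell is not None else None
    while current is not None:
        reachable.append(current)
        current = current.has_next
    return reachable


def members(cell: Optional[OWLListCell]) -> List[str]:
    """Contents of the first cell plus the contents of every following cell."""
    if cell is None:
        return []
    return [cell.has_contents] + [c.has_contents for c in is_followed_by(cell)]


# Build the list (A, B, C) from the last cell backwards.
abc = OWLListCell("A", OWLListCell("B", OWLListCell("C")))
print(members(abc))
```

The members function follows the sentence above literally: the head's contents, then the contents of everything reachable through the transitive property.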

Through the transitive property isFollowedBy it is possible to ask things like "give me all the items that are followed by the sequence AC", and it doesn't matter what lies between the item and the sequence itself.
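As a rough illustration of that kind of query, the sketch below works on a plain Python list instead of an ontology. The function name and the sample sequence are made up for the example, and the class expression in the docstring is my own reading of how such a query could be phrased with the pattern.

```python
from typing import List


def followed_by_pair(sequence: List[str], first: str, second: str) -> List[int]:
    """Indexes of the cells that are eventually followed by `first` then `second`.

    Assumed reading of the query as a class expression:
      isFollowedBy SOME (hasContents SOME first AND
                         hasNext SOME (hasContents SOME second))
    """
    hits = []
    for i in range(len(sequence)):
        # The property is transitive, so any later position j qualifies,
        # no matter what sits between i and j.
        for j in range(i + 1, len(sequence) - 1):
            if sequence[j] == first and sequence[j + 1] == second:
                hits.append(i)
                break
    return hits


seq = ["G", "T", "A", "C", "G"]
print(followed_by_pair(seq, "A", "C"))  # -> [0, 1]
```

Only the first two cells qualify: the cell holding A starts the pair itself, so it is not followed by it.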

In Manchester Syntax:

OWLList can express:

For instance for the pattern (A*):

List_only_As --> List AND
hasContents ONLY A AND
isFollowedBy ONLY (List AND hasContents ONLY A)
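Read procedurally, the definition says that the head holds an A and that everything reachable through isFollowedBy holds an A too. Here is a small Python sketch of that check over a plain list. It is a closed-world analogy with a function name of my own, whereas a DL reasoner would work under the open-world assumption.

```python
from typing import Sequence


def is_list_only_as(sequence: Sequence[str], allowed: str = "A") -> bool:
    """True when every cell holds `allowed`, i.e. the sequence matches (A*)."""
    if not sequence:
        # The empty list has no contents, so the ONLY restrictions hold trivially.
        return True
    head, tail = sequence[0], sequence[1:]
    head_ok = head == allowed                         # hasContents ONLY A
    tail_ok = all(item == allowed for item in tail)   # isFollowedBy ONLY (List AND hasContents ONLY A)
    return head_ok and tail_ok


print(is_list_only_as(["A", "A", "A"]))
print(is_list_only_as(["A", "B", "A"]))
```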

Still, there are a number of constraints that cannot be expressed in OWL-DL (I suggest reading the paper for the complete list).

The list ontology page.

[1] Nicholas Drummond, Alan Rector, Robert Stevens, Georgina Moulton, Matthew Horridge, Hai Wang and Julian Seidenberg (2006). Putting OWL in Order: Patterns for Sequences in OWL. OWL Experiences and Directions (OWLED 2006), Athens, Georgia, USA.
[2] Hirsh, H. and D. Kudenko. Representing Sequences in Description Logics. in Fourteenth National Conference on Artificial Intelligence. 1997.
[3] Noy, N.F. and A. Rector, N-ary relations. 2004, Editors Draft, Semantic Web Best Practices Working Group, W3C.

Tuesday, December 30, 2008

Moving towards the SWAN Collections Ontology [1]

RDF Containers
RDF allows the usage of three kinds of containers:

rdf:Bag - A Bag represents a group of resources or literals, possibly including duplicate members, where there is no significance in the order of the members. For example, a Bag might be used to describe a group of part numbers in which the order of entry or processing of the part numbers does not matter.
rdf:Seq - A Sequence or Seq represents a group of resources or literals, possibly including duplicate members, where the order of the members is significant. For example, a Sequence might be used to describe a group that must be maintained in alphabetical order.
rdf:Alt - An Alternative or Alt represents a group of resources or literals that are alternatives (typically for a single value of a property). For example, an Alt might be used to describe alternative language translations for the title of a book, or to describe a list of alternative Internet sites at which a resource might be found. An application using a property whose value is an Alt container should be aware that it can choose any one of the members of the group as appropriate.

For example, the statement "The resolution was approved by the Rules Committee, having members Fred, Wilma, and Dino" can be written as the following triples:

ex:resolution exterms:approvedBy ex:rulesCommittee .
ex:rulesCommittee rdf:type rdf:Bag .
ex:rulesCommittee rdf:_1 ex:Fred .
ex:rulesCommittee rdf:_2 ex:Wilma .
ex:rulesCommittee rdf:_3 ex:Dino .

and this is much better than:

ex:resolution exterms:approvedBy ex:Fred .
ex:resolution exterms:approvedBy ex:Wilma .
ex:resolution exterms:approvedBy ex:Dino .

since these statements say that each member individually approved the resolution.
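The difference between the two encodings can be seen by treating the first set of triples as plain Python tuples. This is a minimal sketch with helper names of my own, not a real RDF library. Sorting by the numeric index is only there to give stable output, since the order carries no meaning for a Bag.

```python
import re

triples = [
    ("ex:resolution", "exterms:approvedBy", "ex:rulesCommittee"),
    ("ex:rulesCommittee", "rdf:type", "rdf:Bag"),
    ("ex:rulesCommittee", "rdf:_1", "ex:Fred"),
    ("ex:rulesCommittee", "rdf:_2", "ex:Wilma"),
    ("ex:rulesCommittee", "rdf:_3", "ex:Dino"),
]

# Container membership properties have the form rdf:_1, rdf:_2, ...
MEMBERSHIP = re.compile(r"^rdf:_([1-9][0-9]*)$")


def container_members(graph, container):
    """Members of `container`, sorted by the numeric part of rdf:_n."""
    indexed = []
    for s, p, o in graph:
        match = MEMBERSHIP.match(p)
        if s == container and match:
            indexed.append((int(match.group(1)), o))
    return [o for _, o in sorted(indexed)]


approvers = [o for s, p, o in triples if p == "exterms:approvedBy"]
print(approvers)                                          # the committee as a whole
print(container_members(triples, "ex:rulesCommittee"))    # its individual members
```

The resolution has a single approver, the committee, and the people appear only as members of that committee.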

For further examples, see the RDF Primer.

But containers only say that certain identified resources are members; they do not say that other members do not exist. There is no way to exclude that there might be another graph somewhere that describes additional members.
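A tiny sketch of why a container cannot be considered closed: merging a second graph simply adds a member. The second graph and ex:Barney are invented for the illustration, and the graphs are plain Python sets of tuples.

```python
graph_a = {
    ("ex:rulesCommittee", "rdf:type", "rdf:Bag"),
    ("ex:rulesCommittee", "rdf:_1", "ex:Fred"),
    ("ex:rulesCommittee", "rdf:_2", "ex:Wilma"),
    ("ex:rulesCommittee", "rdf:_3", "ex:Dino"),
}

# Hypothetical graph published somewhere else about the same container.
graph_b = {
    ("ex:rulesCommittee", "rdf:_4", "ex:Barney"),
}


def member_set(graph, container):
    """All objects attached to `container` through an rdf:_n property."""
    return {
        o for s, p, o in graph
        if s == container and p.startswith("rdf:_") and p[len("rdf:_"):].isdigit()
    }


merged = graph_a | graph_b  # merging RDF graphs is a union of their triples
print(sorted(member_set(graph_a, "ex:rulesCommittee")))
print(sorted(member_set(merged, "ex:rulesCommittee")))
```

Nothing in graph_a says that Fred, Wilma and Dino are the only members, so the merged graph is just as valid.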

RDF Collections
RDF provides support for describing groups containing only the specified members, in the form of RDF collections. An RDF collection is a group of things represented as a list structure in the RDF graph.

In RDF/XML a collection looks something like this:

<rdf:Description rdf:about="http://e.org/family/349">
   <s:familyMembers rdf:parseType="Collection">
      <rdf:Description rdf:about="http://e.org/person/Paolo"/>
      <rdf:Description rdf:about="http://e.org/person/Emanuele"/>
      <rdf:Description rdf:about="http://e.org/person/Maria"/>
      <rdf:Description rdf:about="http://e.org/person/Franco"/>
   </s:familyMembers>
</rdf:Description>

This can also be written in RDF/XML by writing out the same triples (without using rdf:parseType="Collection") with the collection vocabulary:

<rdf:Description rdf:about="http://e.org/family/349">
   <s:familyMembers rdf:nodeID="sch1"/>
</rdf:Description>

<rdf:Description rdf:nodeID="sch1">
   <rdf:first rdf:resource="http://e.org/person/Paolo"/>
   <rdf:rest rdf:nodeID="sch2"/>
</rdf:Description>

<rdf:Description rdf:nodeID="sch2">
   <rdf:first rdf:resource="http://e.org/person/Emanuele"/>
   <rdf:rest rdf:nodeID="sch3"/>
</rdf:Description>

<rdf:Description rdf:nodeID="sch3">
   <rdf:first rdf:resource="http://e.org/person/Maria"/>
   <rdf:rest rdf:nodeID="sch4"/>
</rdf:Description>

<rdf:Description rdf:nodeID="sch4">
   <rdf:first rdf:resource="http://e.org/person/Franco"/>
   <rdf:rest rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
</rdf:Description>
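The expanded form is a linked list, and reading it back means following rdf:rest until rdf:nil is reached. Below is a minimal Python sketch over plain tuples. The family: and person: prefixes are my own shorthand for the URIs above, and the function assumes the collection is well-formed.

```python
RDF_NIL = "rdf:nil"

triples = [
    ("family:349", "s:familyMembers", "_:sch1"),
    ("_:sch1", "rdf:first", "person:Paolo"),
    ("_:sch1", "rdf:rest", "_:sch2"),
    ("_:sch2", "rdf:first", "person:Emanuele"),
    ("_:sch2", "rdf:rest", "_:sch3"),
    ("_:sch3", "rdf:first", "person:Maria"),
    ("_:sch3", "rdf:rest", "_:sch4"),
    ("_:sch4", "rdf:first", "person:Franco"),
    ("_:sch4", "rdf:rest", RDF_NIL),
]


def collection_items(graph, head):
    """Walk rdf:first / rdf:rest from `head` until rdf:nil (assumes a well-formed list)."""
    first = {s: o for s, p, o in graph if p == "rdf:first"}
    rest = {s: o for s, p, o in graph if p == "rdf:rest"}
    items = []
    node = head
    while node != RDF_NIL:
        items.append(first[node])
        node = rest[node]
    return items


print(collection_items(triples, "_:sch1"))
```

Because the last node points to rdf:nil, the walk knows the list is complete, which is exactly what a container cannot express.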

For more examples, see the RDF Primer.

RDF imposes no "well-formedness" conditions on the use of the collection vocabulary; it is possible, for instance, to define multiple rdf:first elements for the same list node. RDF applications that require collections to be well-formed should therefore check that the collection vocabulary is being used appropriately if they want to be fully robust. Maybe OWL, which can define additional constraints on the structure of RDF graphs, can rule out some of these cases?
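The kind of check an application would have to perform can be sketched in a few lines of Python. This is an illustrative checker over plain tuples, with a function name and messages of my own, and it covers only the most obvious problems: a missing or duplicated rdf:first or rdf:rest, and cycles.

```python
from collections import defaultdict

RDF_NIL = "rdf:nil"


def collection_problems(graph, head):
    """Return a list of problems found while walking the list; empty means well-formed."""
    firsts, rests = defaultdict(list), defaultdict(list)
    for s, p, o in graph:
        if p == "rdf:first":
            firsts[s].append(o)
        elif p == "rdf:rest":
            rests[s].append(o)

    problems, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            problems.append(f"{node}: cycle detected")
            break
        seen.add(node)
        if len(firsts[node]) != 1:
            problems.append(f"{node}: expected 1 rdf:first, found {len(firsts[node])}")
        if len(rests[node]) != 1:
            problems.append(f"{node}: expected 1 rdf:rest, found {len(rests[node])}")
            break  # a missing or ambiguous tail cannot be followed
        node = rests[node][0]
    return problems


good = [
    ("_:a", "rdf:first", "ex:x"), ("_:a", "rdf:rest", "_:b"),
    ("_:b", "rdf:first", "ex:y"), ("_:b", "rdf:rest", RDF_NIL),
]
bad = good + [("_:a", "rdf:first", "ex:z")]  # two rdf:first values on one node

print(collection_problems(good, "_:a"))
print(collection_problems(bad, "_:a"))
```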

OWL and Ordering
OWL has no support for ordering, and the natural constructs from the underlying RDF vocabulary (rdf:List and rdf:nil) are unavailable in OWL-DL because they are used in its RDF serialization. In principle, rdf:Seq is not illegal, but it depends on lexical ordering and has no logical semantics accessible to a DL classifier. In other terms [1]:

(1) The elements in a container are defined using the relations rdf:_1, rdf:_2, and so on, which have no formal definition in RDF. Using them for the purpose of reasoning would require us to define and enforce the properties of these relations.
(2) It is not possible to define a container that has elements only of a specific type.
(3) To update a specific element in a container in a remote source, one is forced to transmit the whole container.
(4) It is not possible to associate provenance information with the elements in a container.
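One way to read the remark about lexical ordering is that the position of an element lives only in the name of the membership property, so a consumer has to parse that name to recover the order. The sketch below shows this under that assumption. The helper name is my own.

```python
import re


def membership_index(prop):
    """Extract n from an rdf:_n property name, or None if it is not one."""
    match = re.fullmatch(r"rdf:_([1-9][0-9]*)", prop)
    return int(match.group(1)) if match else None


props = ["rdf:_2", "rdf:_10", "rdf:_1"]

as_strings = sorted(props)                        # plain string comparison
as_numbers = sorted(props, key=membership_index)  # order recovered by parsing the name

print(as_strings)  # -> ['rdf:_1', 'rdf:_10', 'rdf:_2']
print(as_numbers)  # -> ['rdf:_1', 'rdf:_2', 'rdf:_10']
```

Neither ordering is something a DL classifier can see: to the reasoner, rdf:_1 and rdf:_2 are just two unrelated properties.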

But OWL has greater expressivity than RDF (with constructs such as transitive properties) and reasoning capabilities (for checking consistency and inferring subsumption). Thus, the idea of reasoning with sequential structures in OWL-DL looks appealing.

[1] Vinay K. Chaudhri, Bill Jarrold, John Pacheco. Exporting Knowledge Bases into OWL. OWL Experiences and Directions (OWLED 2006), Athens, Georgia, USA.

Friday, November 21, 2008

SWAN Ontology v. 1.2 almost ready to go

Over the last few months, I've been busy developing the new version of the ontology [SWAN Ontology] that represents the "backbone" of the SWAN project. In this iteration, I had two major goals in mind: modularity and provenance. The new ontology is composed of a set of modules; in fact, the SWAN ontology now consists of a collection of ontologies. I think this is an important step for several reasons:

first of all the SWAN ontology is growing in size. Modules can help in managing the increasing complexity
modules can improve the learning process of people that want to approach our ontology for modeling scientific discourse or simply for reusing a part of it
defining modules helped me in thinking a little bit more

The architecture of the SWAN ontology release candidate

As an additional feature, I was also thinking of providing sub-modules to increase reuse without asking potential users to write their own subset of the SWAN ontology. Thus, for instance, the Agents ontology is split into different modules that may or may not include provenance and/or collections. This is because I assume that not everybody wants to deal with the tedious ordered lists, and not everybody needs to define provenance at the level we need.

Provenance is one of the major aspects in the semantic web world that we are trying to build with SWAN. Our application is mashing up data coming from different sources and we would like to be able to export the new knowledge product giving credit to the original data provider and declaring which piece of software performed the conversion of such data into our format.

I will speak in more detail about provenance in my next post.

[SWAN Ontology] Ciccarese P, Wu E, Kinoshita J, Wong G, Ocana M, Ruttenberg A, Clark T. The SWAN Biomedical Discourse Ontology. Journal of Biomedical Informatics, in press. PMID: 18583197

Wednesday, March 05, 2008

SWAN - Semantic Web Applications in Neuromedicine [2]

Thus, SWAN is not like Wikipedia, because several "hypotheses" (consistent or inconsistent) can co-exist. To be more precise, I would say that the SWAN ontology is an ontology for modeling scientific discourse. Thus, I would define discourse elements as key entities in the SWAN ecosystem. They represent the hubs of the scientific discourse, or of discourse in general.

Figure 1 - Walsh Hypothesis in the SWAN browser

Looking at fig. 1, it is possible to see the title of the hypothesis, a description, and the authors of the hypothesis (in this case the authors are the authors of the journal article the hypothesis has been derived from). Then, after the journal article used as the source of the information related to the hypothesis, we have the contained discourse elements. Right, a hypothesis can contain a list of discourse elements. In this case we have a list of claims (scientifically proved discourse elements), but the discourse elements list can also contain other hypotheses, research questions or comments...

The SWAN Team: Tim Clark, June Kinoshita, Paolo Ciccarese, Marco Ocana, Gwen Wong, Elizabeth Wu.

Monday, March 03, 2008

SWAN - Semantic Web Applications in Neuromedicine [1]

Over the last few months, Marco and I have been coding for the SWAN project for Mass General Hospital (Neurology Dept) and Harvard Medical School. The SWAN project is the reason I moved to Boston to work. It is not easy to explain in a few words what SWAN (which stands for Semantic Web Applications in Neuromedicine) does (or is supposed to do). I could say that 'we are using Semantic Web technologies with the idea of making researchers' lives easier', but I understand that this is not really useful.

I'll try to explain it better with an example. When I was a student, I used to create summaries of the lessons, integrating my notes with what I found in books. Obviously (I am not that young anymore) I was using paper sheets, colors, drawings... and so on. I had my own formalism for highlighting a definition, a theorem, a short summary and so on. It was efficient: I could easily remember things (I have a visual memory) and it was faster than going through the book again and again. This was perfect for a single lesson. But what happened with an entire year of lectures, with different topics somehow connected to each other? Well, I tried to keep things updated, but it was hard and everything got terribly messy. At that time, word processors were really poor and clunky. Now we can think of organizing things in electronic documents... better still, we can use a wiki, where several students can cooperate to build things faster and with less effort. Everybody knows Wikipedia, right? Nice: we have wiki tools, we can decide on our formalism, the meaning of the colors... and this works if we have "one truth". Let's say I want to create a Wikipedia page about a politician and I really dislike him/her (something that often happens to me). I would probably be aggressive and biased. Somebody else could have a different perspective on the same person... this needs mediation and rules to follow.

Now, the same can be true in science. Hypotheses are not yet confirmed facts. Scientists need to prove them, and it is normal to have disagreement. Disagreement is part of the scientific process (and as we are not in the Middle Ages we don't risk our lives by saying something 'different'... I guess). In science, disagreement can be a real value.

SWAN is not Wikipedia; in some respects it is the opposite of it. In SWAN, several 'truths' or, better, 'hypotheses' (consistent or not) can exist at the same time... inconsistencies can be both declared and inferred (nice, huh?). In SWAN we can build the map of science (well, a part of it)... (TO BE CONTINUED)

Thursday, February 14, 2008

Classes which are things and classes 'about' things

One of the most interesting distinctions that I always keep in mind when I create an ontology is between what represents a 'real thing' and what is 'talking about a real thing'. Let's consider an example from bioinformatics. I want to create an ontology that models proteins. Nowadays there are different sources where we can find information about proteins. If we are building a system that performs data integration, we probably don't want to copy all the data from those sources into our knowledge base. It is more correct to build references, a sort of record that points to the original source when the user/system wants to know more. What we are building are records, entities that 'talk about' real things like proteins. Conversely, if we want to provide content about proteins (e.g. protein variants), we would probably model the real things, the actual proteins. It doesn't really make sense to say that a record 'hasVariant' another record. Maybe a record 'refersToVariantRecord' or something like that.

But why all this? Well, it helps in building the models. Let's say that I want to define an 'authoredBy' property for a scientific article. If I consider the real thing (i.e. the actual article), I can say 'authoredBy'; but if I am building a record of the article (a reference like the ones PubMed provides) and I say 'authoredBy', am I referring to the article or to the record? As ontologies are meant to define semantics... I guess this is a crucial point.
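A small Python sketch can make the ambiguity visible. All class names, field names and values here are invented for the illustration: the point is only that the article and the record about it are different objects, each with its own "author".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    """The real thing: the actual scientific article."""
    title: str
    authored_by: List[str]   # the people who wrote the article


@dataclass
class ArticleRecord:
    """A record that talks about an article, e.g. a bibliographic entry."""
    refers_to: Article       # pointer to the real thing
    source: str              # where the record comes from
    created_by: str          # who compiled the record, not who wrote the article


article = Article(title="Example Title", authored_by=["A. Author"])
record = ArticleRecord(refers_to=article, source="Example Database", created_by="A. Curator")

print(record.refers_to.authored_by)  # authors of the article
print(record.created_by)             # creator of the record
```

Keeping the two properties on two different classes removes the question of what 'authoredBy' refers to.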

Wednesday, February 13, 2008

Making ontologies

Over the last few years, I have been creating ontologies for different purposes. When I started, I investigated several languages and ended up using RDF (Resource Description Framework). Not really for its expressiveness, which is quite limited, but more because it was easier to find examples of its usage. It is true that the specifications have been published by the W3C, but still, I believe in examples. Thus, I started creating what I would define as data schemas in RDF. The idea was simple, and the goal was not to use reasoners but to express something semantically. And for simple things it was ok.

But when the ontologies started to grow, I started to feel the need for something better defined. And it was time for OWL (Web Ontology Language) and reasoners (at least to check consistency, not yet to infer classes). Now, if you are doing a pure ontology definition exercise it is fine, but when you have to produce real applications I would say "good luck". First, it is really hard to find good, exhaustive examples of OWL usage. I mean, you can find stuff here and there, but nothing well organized and well described. Sure, you have the W3C specifications and documents, but there's no cookbook or best practice described out there. Nothing explaining with some examples how to create modular ontologies (ontologies that actually reuse other ontologies rather than redefining them completely in the very same file) while keeping in mind the open- and closed-world assumptions.

And when you look at online ontologies, you can find all sorts of things (meta classes inserted in a hierarchy of real-thing classes, consistency failures, terms coming from WordNet used as classes in the hierarchy of real things, real things and records with no distinction). It is really hard to learn from this.