Thursday, February 14, 2008

Classes which are things and classes 'about' things

One of the most interesting distinctions that I always keep in mind when I create an ontology is between what represents a 'real thing' and what is 'talking about a real thing'. Let's consider an example from bioinformatics. I want to create an ontology that models proteins. Nowadays there are several sources where we can find information about proteins. If we are building a system that performs data integration, we probably don't want to copy all the data from those sources into our knowledge base. It is more appropriate to build references, a sort of record that points to the original source when the user or the system wants to know more. What we are building in that case are records, entities that 'talk about' real things like proteins. Conversely, if we want to provide content about proteins (for example, protein variants), we would probably model the real things, the actual proteins. It doesn't really make sense to say that a record 'hasVariant' another record. Maybe a record 'refersToVariantRecord' or something like that.
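To make the distinction concrete, here is a minimal sketch in Python rather than in an ontology language. Everything in it is an assumption made for illustration: the class names `Protein` and `ProteinRecord`, the `describes` link, the source name "ExampleDB" and the accession strings are all hypothetical. Only the idea behind 'hasVariant' and 'refersToVariantRecord' comes from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Protein:
    """The real thing: an actual protein."""
    name: str
    # 'hasVariant' relates a protein to other proteins.
    variants: List["Protein"] = field(default_factory=list)


@dataclass
class ProteinRecord:
    """An entity that 'talks about' a protein and points to a source."""
    source: str      # hypothetical source name
    accession: str   # hypothetical identifier inside that source
    # Hypothetical link from the record to the real thing it talks about.
    describes: Optional[Protein] = None
    # 'refersToVariantRecord' relates a record to other records.
    variant_records: List["ProteinRecord"] = field(default_factory=list)


# Real things: a protein and one of its variants.
protein = Protein("example protein")
variant = Protein("example protein, variant 1")
protein.variants.append(variant)

# Records: references that talk about those real things.
record = ProteinRecord("ExampleDB", "X0001", describes=protein)
variant_record = ProteinRecord("ExampleDB", "X0001-1", describes=variant)
record.variant_records.append(variant_record)
```

The record never gets a 'hasVariant' of its own: to reach the variants of the protein you go through `record.describes`, while `record.variant_records` only leads to other records.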

But why all this? Well, it helps in building the models. Let's say that I want to define an 'authoredBy' property for a scientific article. If I consider the real thing (i.e. the actual article), I can say 'authoredBy'. But if I am building a record of the article (a reference like the ones PubMed provides) and I say 'authoredBy', am I referring to the article or to the record? As ontologies are meant to define semantics, I guess this is a crucial point.
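One possible way to remove the ambiguity, sketched here with plain Python tuples standing in for triples, is to keep 'authoredBy' on the article only and to give the record its own properties. The identifiers, the 'describes' and 'createdBy' properties and the two helper functions are hypothetical and only serve the illustration.

```python
# Each triple is (subject, property, value). All identifiers are made up.
triples = [
    ("article:1", "type", "Article"),
    ("article:1", "authoredBy", "person:alice"),
    ("record:1", "type", "ArticleRecord"),
    ("record:1", "describes", "article:1"),
    ("record:1", "createdBy", "curator:bob"),
]


def values(subject, prop):
    """Return every value stated for the given subject and property."""
    return [v for s, p, v in triples if s == subject and p == prop]


def authors_of_described_article(record):
    """Follow 'describes' from a record to its article, then read 'authoredBy'."""
    return [
        author
        for article in values(record, "describes")
        for author in values(article, "authoredBy")
    ]
```

With this layout, `values("record:1", "authoredBy")` is empty, because the record itself has no author, while `authors_of_described_article("record:1")` reaches the author of the actual article through the record.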

Wednesday, February 13, 2008

Making ontologies

Over the last few years, I have been creating ontologies for different purposes. When I started, I investigated several languages and ended up using RDF (Resource Description Framework). Not really for its expressiveness, which is quite limited, but because it was easier to find examples of its usage. True, the specifications have been published by the W3C, but still, I believe in examples. Thus, I started creating what I would call data schemas in RDF. The idea was simple, and the goal was not to use reasoners but to express something semantically. And for simple things it was OK.

But when the ontologies started to grow, I started to feel the need for something better defined. And it was time for OWL (Web Ontology Language) and reasoners (at least to check consistency, not yet to infer classes). Now, if you are doing a pure ontology definition exercise, that is fine, but when you have to produce real applications I would say "good luck". First, it is really hard to find good, exhaustive examples of OWL usage. I mean, you can find stuff here and there, but nothing well organized and well described. Sure, you have the W3C specifications and documents, but there is no cookbook or description of best practices out there. Nothing that explains, with examples, how to create modular ontologies (ontologies that actually reuse other ontologies rather than redefining them completely in the very same file) while keeping the open and closed world assumptions in mind.

And when you look at the ontologies available online, you can find all sorts of things (meta classes inserted in a hierarchy of real-thing classes, consistency failures, terms coming from WordNet used as classes in the hierarchy of real things, real things and records with no distinction between them). It is really hard to learn from this.