Thursday, February 14, 2008

Classes which are things and classes 'about' things

One of the most interesting distinctions that I always keep in mind when I create an ontology is between what represents a 'real thing' and what is 'talking about a real thing'. Let's consider an example from bioinformatics. I want to create an ontology that models proteins. Nowadays there are many sources where we can find information about proteins. If we are building a system that performs data integration, we probably don't want to copy all the data from those sources into our knowledge base. It is better to build references, a sort of record that points to the original source when the user or system wants to know more. What we are building then are records, entities that 'talk about' real things like proteins. Conversely, if we want to provide content about proteins (e.g. protein variants), we would probably model the real things, the actual proteins. It doesn't really make sense to say that a record 'hasVariant' another record. Maybe a record 'refersToVariantRecord', or something like that.
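To make the distinction concrete, here is a minimal sketch in Python rather than in an ontology language. The class names `Protein` and `ProteinRecord`, the fields `source` and `identifier`, and the values "ExampleDB" and "X0001" are all hypothetical and only illustrate the idea: the real thing carries 'hasVariant', while the record carries 'refersToVariantRecord'.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Protein:
    """The real thing: an actual protein, which can have variants."""
    name: str
    # Corresponds to 'hasVariant': a protein has other proteins as variants.
    variants: List["Protein"] = field(default_factory=list)


@dataclass
class ProteinRecord:
    """An entity that talks about a protein and points to an original source."""
    source: str       # hypothetical name of the external source
    identifier: str   # hypothetical identifier inside that source
    # Corresponds to 'refersToVariantRecord': a record refers to other records.
    variant_records: List["ProteinRecord"] = field(default_factory=list)


# Modeling the real things.
protein = Protein("example protein")
protein.variants.append(Protein("example protein, variant 1"))

# Modeling records that talk about the real things.
record = ProteinRecord(source="ExampleDB", identifier="X0001")
record.variant_records.append(
    ProteinRecord(source="ExampleDB", identifier="X0001-1")
)
```

Keeping the two classes separate means a record can never be given a 'hasVariant' relation by accident, because that property simply does not exist on it.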

But why all this? Well, it helps in building the models. Let's say that I want to define an 'authoredBy' property for a scientific article. If I am modeling the real thing (i.e. the actual article), I can say 'authoredBy' without ambiguity. But if I am building a record of the article (a reference like the ones that PubMed provides) and I say 'authoredBy', am I referring to the article or to the record? As ontologies are meant to define semantics, I guess this is a crucial point.
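One possible way to remove the ambiguity, sketched in Python under my own assumptions: 'authoredBy' lives only on the article, and the record reaches it through an explicit link. The names `Article`, `ArticleRecord`, `describes` and `authors_of_described_article`, as well as the sample values, are hypothetical and are not taken from PubMed or any existing ontology.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    """The real thing: the actual scientific article."""
    title: str
    authored_by: List[str]  # 'authoredBy' is a property of the article itself


@dataclass
class ArticleRecord:
    """A reference that talks about an article held in some source."""
    source: str         # hypothetical name of the source holding the record
    describes: Article  # hypothetical link from the record to the real thing


def authors_of_described_article(record: ArticleRecord) -> List[str]:
    """Answer 'who wrote it?' by following the link to the actual article."""
    return record.describes.authored_by


article = Article(title="An example article", authored_by=["A. Author", "B. Author"])
record = ArticleRecord(source="ExampleIndex", describes=article)
authors = authors_of_described_article(record)
```

With this split, the question 'authoredBy of what?' has only one possible answer, because the record itself has no such property.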
