Tuesday, February 15, 2011

Principle: Adequate Documentation [1]

This is trivial to understand. The documentation for an ontology is very important for adoption and for data sharing. This works for software as well. One thing I found always awesome regarding Java was the quality of the API documentation, the JavaDocs are cool and easy to generate. For ontologies the reality is a bit more complicated.

Some ontologies, including some of those I made, are just the pure RDFS/OWL file. Often with no descriptions whatsoever of the classes/properties. The understanding of those ontologies relies totally on the ability of the users to interpret the labels/names correctly. Other ontologies include very fuzzy, sometimes poor, descriptions - "Document: a document. The Document class represents those things which are, broadly conceived, 'documents'.". The interpretation relies mostly on the users' common sense. To be fair, in recent versions of the FOAF vocabulary, the previous definition is accompanied by a note saying: "We do not (currently) distinguish precisely between physical and electronic documents, or between copies of a work and the abstraction those copies embody. The relationship between documents and their byte-stream representation needs clarification". And let's be honest, defining what a document is nowadays, in the digital era, is not trivial. Is every file a document?

In the case of OBO Foundry, a set of principles has been defined. Here are some of them:
"The ontologies include textual definitions for all terms. Many biological and medical terms may be ambiguous, so terms should be defined so that their precise meaning within the context of a particular ontology is clear to a human reader."
"The ontology is well documented."

The thing is, definitions are a necessary step but that is not enough. When I talk about 'Adequate Documentation' I mean many different things: definitions, examples of use cases, examples of resulting triples, motivations, related projects... In other words, a good amount of shared knowledge about the ontology and the process that generated it.

Unfortunately there are no clear rules, I keep trying different ways of translating the most tacit knowledge I can into explicit and I can't say I found the right recipe yet. Definitions can certainly help, I find valuable to include explanations of the ontology building process where motivations behind the different choices are given, explanatory figures, plenty of examples with actual triples, maybe a list of Frequently Asked Questions where the authors can publicly address some of the concerns of real users. All this takes time and effort, and, by personal experience, can also cause collateral damage...

No comments: