Monday, January 31, 2011

Annotation and Content Improvement

I recently attended the 'Beyond the PDF' workshop in San Diego, and I noticed several times that 'Annotation' is often understood as a task performed after the publication of a physical or digital document.

I consider Annotation to be more ubiquitous than that, and important at all stages: before, during and after publication. Also, Annotation is not only about classic textual documents. Images, database records and data sets can be annotated as well. Even physical objects can be digitally annotated when we create a corresponding digital record or - speaking in terms of ontologies - when we refer to the representation of that particular instance of a certain class.

An annotation can exist as such forever, or it can be incorporated back into the original document/resource or into a new version of it. Think of the old-fashioned paper encyclopedia: every year, or every few years, the editor collected the accumulated annotations to produce a new edition of the heavy volumes. This was very close to what the digital world calls versioning.

In the modern digital world annotation is everywhere. Tags attached to a document are annotations. Crowdsourcing makes it possible to use the most popular tags as keywords for that document. Delicious users experience this whenever they tag a new resource and receive suggestions of popular or appropriate tags. Reviews of catalog items on Amazon are annotations, and a statistical summary of them appears next to the selected item in the form of stars. To some extent, edits in a Wiki, where a user changes the current document content, can be seen as annotations and could be exported as such. However, I understand that using the term Annotation for edits might sound like a bit of a stretch.

Maybe, in today's digital world, a better way to refer to this process is 'Content Refinement', as everything can potentially be changed. But even the term 'refine' might fall short, as 'to refine' means to improve by making small changes. Sometimes edits are massive, and a Wikipedia article can evolve dramatically over time. It is not simply polishing and fixing: we can add or remove big chunks of the original document - adding missing items or removing items that are redundant or no longer valid - or we can make the original document more current, for instance by adding new evidence that was not available when the document was previously published. 'Content Improvement' is probably generic enough to cover both refinements and edits.

Sure, I am talking about evolving documents, but that does not preclude taking snapshots of them as a 'traditional publication' or as a version of that resource. Take online news. I have noticed more than once that the news at a specific URL was changing, with journalists incrementally adding new sections at the bottom of the page whenever updates became available. You might argue this is not good practice, but it happens more often than you think. The reason is simple: in the digital world, it is possible and cheap. We don't have to reprint a book or add an errata page to avoid reprinting. We just create a note or directly edit the content - hopefully while keeping track of the changes.

I see many attempts to redefine what a publication is. These days, I believe a publication is a multidimensional, evolving artifact including images, videos, live tables, data, metadata... and no matter what it includes or what it looks like, it has to manage change, or content improvement. Only snapshots of it, taken at particular times, would match the 'classic' concept of publication.