Friday, October 17, 2014

When SPARQL query length is an issue

While developing Annotopia, I wrote some code that dynamically creates the SPARQL queries behind a faceted search. Depending on the facet values selected, the queries can become extremely long... until I hit the limit:

Virtuoso 37000 Error SP031: SPARQL: Internal error: 
       The length of generated SQL text has exceeded 10000 lines of code

It seems that the SPARQL compiler stops because the SQL compiler, the next stage in the processing pipeline, would fail to compile the generated SQL in any reasonable time. After my initial surprise, I started digging into the structure of my queries. Here is what I learned.

Use the FILTER + IN construct instead of multiple UNIONs


When writing code, it might seem simpler to compose a query dynamically as a list of UNIONs. Unfortunately, that translates into much longer SQL queries. So this:

        
        { ?s oa:serializedBy <urn:application:domeo> }
        UNION
        { ?s oa:serializedBy <urn:application:utopia> }

for multiple items, should become:

        { ?s oa:serializedBy ?serializer .
            FILTER ( ?serializer IN 
               (<urn:application:domeo>, <urn:application:utopia>) )
        }
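
Since the query is generated by code, the same rewrite can be applied in the query builder itself. The following is only a minimal sketch in Python, not the code used in Annotopia: the function name `serialized_by_filter`, its parameters, and the character check are assumptions made for illustration. It produces a single graph pattern with FILTER + IN for any number of IRIs, so the query grows by one list entry per facet value instead of one UNION branch.

```python
def serialized_by_filter(iris, subject="?s", variable="?serializer"):
    """Build one graph pattern using FILTER ... IN instead of a chain of UNIONs.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if not iris:
        raise ValueError("at least one IRI is required")
    for iri in iris:
        # Reject characters that are not allowed inside <...> in SPARQL,
        # so that a facet value cannot break out of the IRI reference.
        if any(c in iri for c in '<>"{}|^`\\ '):
            raise ValueError(f"invalid IRI: {iri!r}")
    # One comma-separated list entry per IRI.
    values = ", ".join(f"<{iri}>" for iri in iris)
    return (
        f"{{ {subject} oa:serializedBy {variable} .\n"
        f"  FILTER ( {variable} IN ({values}) )\n"
        f"}}"
    )


if __name__ == "__main__":
    print(serialized_by_filter(["urn:application:domeo",
                                "urn:application:utopia"]))
```

The returned string is meant to be dropped into the WHERE clause of a larger query that already declares the `oa:` prefix.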
