Platform Release 5 Notes
These notes cover the new features and bug fixes planned for release 5, which was released to internal staging on 18th September 2007 and is due to go live on 27th September.
Enhanced Augmentation Services
Each store in the platform provides an augmentation service which can be used to augment or annotate search results from other stores (or, in fact, any RSS 1.0 feed). Until this release, the augmentation rules were specified on a store-by-store basis by the platform administrator. From this release onwards, each store has a default augmentation strategy which looks for any URIs in the input and adds any triples it knows about those URIs to the result. As an example, suppose you have one store full of RDF schema information and another store with some arbitrary RDF instance data. You could take the results of a search on the instance data and pass them into the schema store's augmentation service. With the changes introduced in release 5, the schema store would automatically annotate each class and property it found with triples such as labels and comments. These could then be used to build a nicer user interface display for the results. (Related Jira issues: jira:CAL-127)
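The default strategy described above can be sketched roughly as follows. This is only an illustration, not the platform's implementation: the in-memory `SCHEMA_STORE`, the `find_uris` and `augment` helpers, the example.org URIs and the choice to match on triple subjects only are all assumptions made for the example.

```python
import re

RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical schema store: a set of (subject, predicate, object) triples.
SCHEMA_STORE = {
    ("http://example.org/schema#Person", RDFS + "label", "Person"),
    ("http://example.org/schema#Person", RDFS + "comment", "A human being."),
    ("http://example.org/schema#name", RDFS + "label", "name"),
}

# Crude URI matcher, good enough for a sketch (assumption: http/https URIs only).
URI_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def find_uris(feed_text):
    """Return the set of URIs mentioned anywhere in the input text."""
    return set(URI_PATTERN.findall(feed_text))


def augment(feed_text, store):
    """Return every triple in the store whose subject is a URI found in the input."""
    uris = find_uris(feed_text)
    return sorted(t for t in store if t[0] in uris)


if __name__ == "__main__":
    feed = (
        '<item rdf:about="http://example.org/data/alice">'
        '<rdf:type rdf:resource="http://example.org/schema#Person"/>'
        "</item>"
    )
    for triple in augment(feed, SCHEMA_STORE):
        print(triple)
```

Here the instance data mentions the `Person` class, so the schema store contributes that class's label and comment, which a user interface could then display.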
Performance Enhancements
We are continually improving our performance testing capabilities for platform releases and are now at the point where we can make direct comparisons between versions on a range of performance criteria. We used some of this information to make significant performance improvements for this release. In terms of transactions per second, contentbox searching has improved by 25% and posting of changesets has improved by over 100% compared to the previous release. Performance testing is firmly embedded in our development and release cycle, and we hope to see further improvements in future releases as we continue to analyse and evaluate our test results.
Read-only Stores
From release 5, platform administrators can set stores to be read-only. This is in preparation for the upcoming feature that will allow stores to be snapshotted for backup purposes. The snapshot process puts the store into a read-only state while it runs, to ensure that the snapshot is consistent. While a store is in the read-only state, all mutating HTTP interactions will return a 503 Service Unavailable status. Applications should check for this status code when attempting to modify the data or configuration of a store and take appropriate action, such as informing the user. Where possible the platform sends a Retry-After header along with the 503 response, which applications can use to decide whether it is worth retrying the submission after a period of time or whether to simply fail the request. (Related Jira issues: jira:CAL-368)
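As a rough sketch of how an application might act on this, the following decides whether to retry a write based on the status code and the Retry-After header. The function names, the `max_wait` threshold and the plain-dict representation of headers (looked up with the exact key "Retry-After") are assumptions for illustration; none of them are part of the platform.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now=None):
    """Parse a Retry-After value (delta-seconds or HTTP-date) into seconds, or None."""
    if retry_after is None:
        return None
    value = retry_after.strip()
    if value.isdigit():
        return int(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable header: treat as if it were absent
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0, int((when - now).total_seconds()))


def plan_write(status, headers, max_wait=60):
    """Decide what to do after a mutating request: 'ok', 'retry' or 'fail'."""
    if status != 503:
        return ("ok", None) if 200 <= status < 300 else ("fail", None)
    delay = retry_delay_seconds(headers.get("Retry-After"))
    if delay is not None and delay <= max_wait:
        return ("retry", delay)  # store is probably mid-snapshot; try again later
    return ("fail", delay)  # no hint, or the wait is too long: inform the user


if __name__ == "__main__":
    print(plan_write(503, {"Retry-After": "30"}))
    print(plan_write(503, {}))
```

Keeping the decision separate from the HTTP call itself means the same logic can be reused with whichever HTTP client the application already uses.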
Expensive SPARQL Queries
We want to allow open SPARQL access to as many stores as possible, but certain types of SPARQL queries are expensive to execute. Poorly written queries involving cross products (e.g. when there are query patterns that do not share variables) or overly broad selection criteria can result in many hundreds of thousands of triples being selected. Such large queries make it harder for the platform to maintain an even level of service for all stores. We have therefore introduced a limit on the number of triples that can be selected from a store. Currently this limit stands at 50,000, but we plan to monitor and review it. Some types of queries select more results than you might otherwise expect, even when you use limit clauses. For example, using order by or filter can sometimes require all the possible triples to be evaluated before any limits are applied, which, for large stores, can hit the imposed limit quite easily. As suggested by the SPARQL Protocol for RDF working draft, the HTTP response code for a refused query will be 500. As usual, we also include an appropriate plain text message describing the nature of the refusal. (Related Jira issues: jira:CAL-399)
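To make the cross-product case concrete, here is a small sketch that checks whether a set of triple patterns falls into groups that share no variables. The tuple representation of patterns and the `connected_groups` helper are assumptions for illustration; this is not how the platform analyses queries.

```python
def variables(pattern):
    """Variables (terms starting with '?') in a (subject, predicate, object) pattern."""
    return {term for term in pattern if term.startswith("?")}


def connected_groups(patterns):
    """Group triple patterns that are linked, directly or indirectly, by shared variables."""
    groups = []  # each entry: (set of variables, list of patterns)
    for pattern in patterns:
        merged_vars, merged_patterns = variables(pattern), [pattern]
        remaining = []
        for group_vars, group_patterns in groups:
            if group_vars & merged_vars:
                # This group shares a variable with the new pattern: merge them.
                merged_vars |= group_vars
                merged_patterns = group_patterns + merged_patterns
            else:
                remaining.append((group_vars, group_patterns))
        groups = remaining + [(merged_vars, merged_patterns)]
    return [group_patterns for _, group_patterns in groups]


if __name__ == "__main__":
    cross = [("?x", "ex:foo", "?y"), ("?a", "ex:bar", "?b")]
    joined = [("?x", "ex:foo", "?y"), ("?y", "ex:bar", "?z")]
    print(len(connected_groups(cross)), "groups: cross product")
    print(len(connected_groups(joined)), "group: patterns are joined on ?y")
```

More than one group means the query multiplies the results of each group together: if each of two unrelated patterns matches 1,000 triples, the combined result has 1,000,000 solutions, well beyond the current limit.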
Improved SPARQL Support
Our SPARQL services have been upgraded to use ARQ 2.0, making them more compliant with the current SPARQL working draft. However, since SPARQL is still under development, the behaviour of some queries will change over time. So far we have found only one significant difference introduced by our upgrade. In earlier versions of ARQ the following two queries were treated equivalently, whereas in the latest version they give different answers:
ASK
{
  ?x ex:foo ?y .
  {
    FILTER ( ISIRI(?y) )
  }
}

ASK
{
  ?x ex:foo ?y .
  FILTER ( ISIRI(?y) )
}
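The difference comes down to empty group patterns and the scope of filters: in the first query the FILTER sits in its own nested group, which contains no triple patterns and so has no binding for ?y, while in the second it applies directly to the solutions of ?x ex:foo ?y. We are still watching for other subtle changes and will post them on the wiki when we find them, but let us know if you find any too.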
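The following toy evaluator illustrates one reading of group-scoped filters under which the two queries diverge. It is a simplified model written for this note, not ARQ's implementation, and the data, helper names and string encoding of terms are all assumptions.

```python
# One triple whose object is an IRI (terms are plain strings in this sketch).
DATA = [("ex:a", "ex:foo", "ex:b")]


def match_foo(data):
    """Solutions for the pattern `?x ex:foo ?y`."""
    return [{"?x": s, "?y": o} for s, p, o in data if p == "ex:foo"]


def is_iri_filter(solution):
    """FILTER(isIRI(?y)): an unbound ?y is an error, which removes the solution."""
    value = solution.get("?y")
    # Assumption: literals start with a quote and blank nodes with "_:".
    return value is not None and not value.startswith(('"', "_:"))


def join(left, right):
    """Join two solution lists, keeping pairs that agree on shared variables."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[v] == r[v] for v in l.keys() & r.keys())
    ]


def ask_nested(data):
    # The nested group has no triple patterns, so it starts from one empty
    # solution; the filter is applied there, where ?y is unbound.
    inner = [s for s in [{}] if is_iri_filter(s)]
    return bool(join(match_foo(data), inner))


def ask_flat(data):
    # The filter shares a group with the triple pattern, so it sees ?y bound.
    return bool([s for s in match_foo(data) if is_iri_filter(s)])


if __name__ == "__main__":
    print("nested filter:", ask_nested(DATA))
    print("flat filter:  ", ask_flat(DATA))
```

Under this model the nested form answers false even though a matching triple exists, because the filter eliminates the only solution of the inner group before the join takes place.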
Blank Nodes Eliminated in Versioned Changesets
When a changeset is posted to a Metabox Changesets Collection, the usual changeset handling rules are applied and the submitted changeset is also stored in the metabox. We recently discovered that this process was leaving blank nodes in the metabox, which made those triples impossible to address using changesets. That meant that, short of clearing the whole store, there was no way of removing or altering those triples. From this release we replace these blank nodes with URIs. (Related Jira issues: jira:CAL-316)
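The general idea of replacing blank nodes with URIs can be sketched as below. The URI scheme (a hypothetical example.org base plus a random UUID), the `replace_blank_nodes` name and the tuple representation of triples are assumptions; the notes do not say how the platform mints its URIs.

```python
import uuid


def replace_blank_nodes(triples, base="http://example.org/bnode/"):
    """Replace each blank node label ('_:...') with a freshly minted URI.

    The same label maps to the same URI within one call, so the shape of
    the graph is preserved.
    """
    minted = {}

    def convert(term):
        if term.startswith("_:"):
            if term not in minted:
                minted[term] = base + uuid.uuid4().hex
            return minted[term]
        return term

    return [tuple(convert(term) for term in triple) for triple in triples]


if __name__ == "__main__":
    stored = [
        ("_:b0", "ex:describes", "http://example.org/thing"),
        ("_:b0", "ex:note", "_:b1"),
    ]
    for triple in replace_blank_nodes(stored):
        print(triple)
```

Once every node has a URI, a later changeset can name the exact triples it wants to remove or alter.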
Other Changes
This release also fixes an issue that caused duplicate information to be returned from searches when a field predicate map contains mappings of different properties to the same field name. (Related Jira issues: jira:CAL-314)

