Primary Key Pattern

From n² wiki

Jump to: navigation, search

Nearby: Guides and Tutorials

This is a discussion of a pattern for faking primary keys in a store's metabox.

When creating a URI for a resource in Bigfoot you can simply post a changeset with a blank subject of change. The Platform will dutifully create a URI for the subject and guarantee that it is unique and not used for any other purpose. However, if you want to maintain your own URI structure (perhaps because you want [URIs]) then you need a way of ensuring that any URI you choose is new and unique.

In a traditional relation database this isn't usually a problem - just use an autoincrement primary key. In many circumstances it's possible to derive a URI from some part of the description, perhaps a reference number or title for a document. If you want simple URIs like:

 http://example.com/things/1

then you have to work a bit harder.

The issue is that the Platform doesn't give many atomicity guarantees which means that there are likely to be contention problems when creating unique keys. For example one (incorrect) strategy might be to look for the highest number allocated, increment it and then use that as the identifier. The problem is that to look up the highest number requires a Sparql query. Then to increment the number requires a changeset to be posted. In a single-user environment this would probably work, but when you have many thousands of users then it's likely that the number is gets updated between the read and the write. A naive application might simply retry the read/write combination until it works which could take a long time if there is a lot of contention on the key .

All is not lost though because Bigfoot makes one atomicity guarantee that can be used in this scenario: changesets containing removals and additions will not be applied unless all the triples to be removed already exist. This can be used to perform a conditional update.

The primary key pattern works like this: a pool of available numeric identifiers is added to the system before the application first runs. A client needing a numeric identifier issues a Sparql query to fetch an arbitrary 5 numbers from the pool. It selects one at random and generates a changeset that removes that number from the pool and adds a new number calculated by adding the size of the pool to the selected number. If the number is still available the changeset is applied, the selected number is removed from the pool and the replacement identifier is added. If the number is not available then the changeset will fail and no alterations will be made to the pool. The client can select another of the 5 initial candidates or ask for a fresh batch.

Suppose the pool size was 100, i.e. the numbers 1 to 100 have been added as available identifiers. The client asks for 5 identifiers and is given 5, 11, 52, 53 and 90. It selects 52 and generates a changeset removing 52 from the pool and adding 152 in its place. If 52 had been used between the initial selection of candidates and the update then the changeset would fail and the client could select another candidate.

The pool always remains with 100 available identifiers which reduces the contention on the primary key generation. The pool size can be tuned to the expected level of activity expected.

Moriarty contains a ValuePool class implementing this pattern.

Personal tools