Kniblet Tutorial Part 6

From n² wiki

<< Part 5 | Index | Part 7 >>

Cleaning Up

Now that we can add and edit articles, you're probably finding that your data is getting a bit messy and could do with a cleanup. One way to do this is to reset the store and reload the data. The Talis Platform supports this kind of bulk operation via what it calls jobs. A job is an operation that is scheduled to run at some time in the future. Each store has a queue of pending jobs, which can be accessed using a URL like:

  http://api.talis.com/stores/kniblet-dev1/jobs

You need to supply your username and password to view the job queue. Adding a job to the queue involves creating a Job Request and POSTing it to the queue. A Job Request consists of a small quantity of RDF/XML describing the type of job needed and the time at which it should start. The Platform will endeavour to invoke the job as soon as possible after the start time, but depending on load this could be several minutes later.

Resetting a Store

One job type supported by the Platform is the ResetDataJob which clears out a store's metabox and contentbox. We can use this to remove all the RDF from our store and then re-post our bootstrap data. Our job request looks like this:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#" > 
  <bf:JobRequest>
    <bf:jobType rdf:resource="http://schemas.talis.com/2006/bigfoot/configuration#ResetDataJob"/>
    <bf:startTime>2008-05-02T14:30:00Z</bf:startTime>
  </bf:JobRequest>
</rdf:RDF>

This job request describes a ResetDataJob that we'd like to run at or after 14:30 UTC on 2nd May 2008 (the trailing Z in the start time denotes UTC). Save it as request.rdf and POST it to the store's job queue with a curl command:

  curl -v -d @request.rdf --digest -u "user:password" -H content-type:application/rdf+xml http://api.talis.com/stores/kniblet-dev1/jobs
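The same request can also be assembled in code rather than written by hand. The following is a minimal sketch in Python using only the standard library; it is not part of Moriarty, and the function names build_job_request, prepare_job_post and digest_opener are hypothetical helpers invented for this illustration. It builds the RDF/XML shown above and prepares the POST without actually sending it.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BF = "http://schemas.talis.com/2006/bigfoot/configuration#"
# Ask ElementTree to serialise with the same prefixes the tutorial uses.
ET.register_namespace("rdf", RDF)
ET.register_namespace("bf", BF)


def build_job_request(job_type, start_time):
    """Return RDF/XML for a job such as 'ResetDataJob' starting at start_time."""
    root = ET.Element("{%s}RDF" % RDF)
    job = ET.SubElement(root, "{%s}JobRequest" % BF)
    ET.SubElement(job, "{%s}jobType" % BF, {"{%s}resource" % RDF: BF + job_type})
    ET.SubElement(job, "{%s}startTime" % BF).text = start_time
    return ET.tostring(root, encoding="unicode")


def prepare_job_post(queue_uri, body):
    """Build (but do not send) the POST that adds a job request to the queue."""
    return urllib.request.Request(
        queue_uri,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


def digest_opener(queue_uri, user, password):
    """Return an opener that answers HTTP digest challenges, like curl --digest."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, queue_uri, user, password)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))


body = build_job_request("ResetDataJob", "2008-05-02T14:30:00Z")
request = prepare_job_post("http://api.talis.com/stores/kniblet-dev1/jobs", body)
opener = digest_opener(request.full_url, "user", "password")
# opener.open(request) would send the job request; it is left out so the
# sketch can be read and run without touching a real store.
```

Building the document with an XML library rather than string concatenation means any special characters in the values are escaped for us.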

Once the job has run it disappears from the job queue. The store should be empty and we can start reloading our data.

Using Snapshots

If you have more than a small amount of bootstrapping data then POSTing it back in after a reset job has run can get tedious. This is especially true when you have binary content in the store too. The Platform supports a pair of bulk operations that make this process even easier: snapshot and restore. The snapshot job creates a copy of all the RDF and binary content in the store and makes it available for downloading. The restore job can take any snapshot and load it into the store, replacing any data already there. In effect it acts like the reset and reload we were doing in the previous section.

A snapshot job looks like this:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#" > 
  <bf:JobRequest>
    <bf:jobType rdf:resource="http://schemas.talis.com/2006/bigfoot/configuration#SnapshotJob"/>
    <bf:startTime>2008-05-02T14:30:00Z</bf:startTime>
  </bf:JobRequest>
</rdf:RDF>

When this job runs it dumps all the RDF and all the binary content contained in the store into a snapshot file which is then made accessible via the store's snapshot collection:

  http://api.talis.com/stores/kniblet-dev1/snapshots

The snapshot is just a standard UNIX tar file containing a set of folders corresponding to the store's contentbox and metabox. Snapshots can be downloaded for offsite backup or left online for easy restoration. Larger stores take longer to snapshot, and while a snapshot is in progress the store is put into read-only mode. This means that any HTTP interaction that would normally alter data in the store will instead return an HTTP status of 503 Service Unavailable. Applications need to be aware of this and report an appropriate message back to their users.
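One way an application might cope with the read-only window is to retry a write a few times and, if the store is still unavailable, hand a friendly message back to the user. The sketch below is written in Python purely as an illustration and makes several assumptions: the write is wrapped in a callable (send) that raises urllib.error.HTTPError on failure, and the function name try_write, the retry counts and the message text are all invented for this example.

```python
import time
import urllib.error

READ_ONLY_MESSAGE = (
    "The store is temporarily read-only while a snapshot is taken. "
    "Please try again in a few minutes."
)


def try_write(send, attempts=3, delay=30.0, sleep=time.sleep):
    """Run send(); return (True, result) or (False, message for the user).

    A 503 response is treated as "store is read-only" and retried after
    `delay` seconds. Any other HTTP error is re-raised for the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return True, send()
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise  # not the read-only case; let the caller decide
            if attempt < attempts:
                sleep(delay)  # wait before the next attempt
    return False, READ_ONLY_MESSAGE
```

The sleep function is passed in as a parameter so that the waiting can be replaced with a no-op when this logic is exercised in tests.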

We can restore an earlier snapshot with another job:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#" > 
  <bf:JobRequest>
    <bf:jobType rdf:resource="http://schemas.talis.com/2006/bigfoot/configuration#RestoreJob"/>
    <bf:snapshotUri rdf:resource="http://api.talis.com/stores/kniblet-dev1/snapshots/20071112155923.tar" />
    <bf:startTime>2008-05-02T14:30:00Z</bf:startTime>
  </bf:JobRequest>
</rdf:RDF>

A restore job uses an extra property, bf:snapshotUri, to specify the location of the snapshot to be restored.
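To make the role of the extra property concrete, here is a minimal Python sketch that generates a restore request for a given snapshot. It is an illustration only, not Moriarty code: build_restore_request is a hypothetical helper, and the snapshot URI passed to it is a made-up example in the same shape as the one above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BF = "http://schemas.talis.com/2006/bigfoot/configuration#"
# Serialise with the rdf: and bf: prefixes used throughout the tutorial.
ET.register_namespace("rdf", RDF)
ET.register_namespace("bf", BF)


def build_restore_request(snapshot_uri, start_time):
    """Return RDF/XML for a RestoreJob that loads snapshot_uri into the store."""
    root = ET.Element("{%s}RDF" % RDF)
    job = ET.SubElement(root, "{%s}JobRequest" % BF)
    ET.SubElement(job, "{%s}jobType" % BF, {"{%s}resource" % RDF: BF + "RestoreJob"})
    # The one property that distinguishes a restore from the other job types.
    ET.SubElement(job, "{%s}snapshotUri" % BF, {"{%s}resource" % RDF: snapshot_uri})
    ET.SubElement(job, "{%s}startTime" % BF).text = start_time
    return ET.tostring(root, encoding="unicode")


restore_xml = build_restore_request(
    "http://api.talis.com/stores/kniblet-dev1/snapshots/20071112155923.tar",  # example URI
    "2008-05-02T14:30:00Z",
)
print(restore_xml)
```

The resulting document is POSTed to the store's job queue in exactly the same way as the reset and snapshot requests.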

This combination of snapshot and restore can be used to return a store to a known state, which is useful when writing application unit tests. Starting with an empty store, we can load all the data we need and then take a snapshot. Our application's unit test suite can then restore this snapshot before every run so that the tests always start from the same data.
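Because jobs run asynchronously, a test suite that restores a snapshot has to wait for the restore to finish before the first test starts. Since a job disappears from the queue once it has run, one approach is to poll the queue until it is empty. The sketch below shows that pattern in Python as an illustration; restore_and_wait is a hypothetical helper, and the two callables it takes (one that schedules the restore, one that reports how many jobs are still pending) are assumptions standing in for whatever client code talks to the store.

```python
import time


def restore_and_wait(schedule_restore, pending_jobs, snapshot_uri,
                     timeout=600.0, interval=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Schedule a restore of snapshot_uri and block until the queue is empty.

    schedule_restore(uri) submits the restore job; pending_jobs() returns the
    number of jobs still in the store's queue. Raises TimeoutError if the
    queue has not drained within `timeout` seconds.
    """
    schedule_restore(snapshot_uri)
    deadline = clock() + timeout
    while pending_jobs() > 0:
        if clock() >= deadline:
            raise TimeoutError("restore job is still pending")
        sleep(interval)  # give the Platform time to run the job
```

A test suite would call this once in its setup step, before any test touches the store. Note that this simple version waits for the whole queue to drain, so it assumes nothing else is scheduling jobs on the same store at the same time.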

Moriarty Support

Moriarty provides special support for these bulk operations. The JobQueue class contains methods for each type of job. The typical pattern of usage looks like this:

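// Connect to the store with credentials that are allowed to schedule jobs
$store = new Store('http://api.talis.com/stores/mystore', new Credentials('user', 'mypassword') );
// Fetch the store's job queue and add a reset job to it
$queue = $store->get_job_queue();
$queue->schedule_reset_data();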

To create a snapshot simply replace the call to schedule_reset_data() with the following code:

$queue->schedule_snapshot();

Restoring a snapshot works similarly, passing in the URI of the snapshot:

$queue->schedule_restore("uri of existing snapshot");

All of these schedule methods accept an optional time parameter that indicates when the job should run. If it is omitted, they default to the current date and time, which means the job will run as soon as the Platform can schedule it.
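If you build job requests by hand instead of going through Moriarty, the same "now by default" behaviour is easy to reproduce. The Python sketch below is an illustration only: job_start_time is a hypothetical helper, and it produces a UTC timestamp in the same shape as the bf:startTime values used in the job requests earlier in this part (for example 2008-05-02T14:30:00Z). It says nothing about the format Moriarty's own time parameter expects.

```python
from datetime import datetime, timedelta, timezone


def job_start_time(when=None, delay=timedelta(0)):
    """Format a start time for a job request as a UTC timestamp ending in Z.

    With no arguments the current time is used, so the job becomes eligible
    to run straight away. `when` should be a timezone-aware datetime; `delay`
    pushes the start time further into the future.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    start = when.astimezone(timezone.utc) + delay
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")


# Run as soon as possible:
print(job_start_time())
# Run no earlier than one hour from now:
print(job_start_time(delay=timedelta(hours=1)))
```

Converting to UTC before formatting keeps the trailing Z truthful even when the input datetime carries a different time zone.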

Summary

In this part of the tutorial we covered:

  • Bulk operations are performed with jobs
  • Stores have a job queue containing pending jobs
  • Jobs run as soon as the Platform can schedule them after their start times
  • Use a ResetDataJob to clear all the RDF and content out of a store
  • Use a SnapshotJob to make a copy of all the RDF and content in a store
  • Stores have a collection of snapshots
  • Snapshots are UNIX tar files containing all the data held in a store
  • Snapshots can be loaded into a store using a RestoreJob
  • Moriarty provides simple methods for scheduling jobs


