Kniblet Tutorial Part 7
From n² wiki
Refining Search
As we add more articles and try to search for them, we soon notice that only the title is being indexed for searching. Stores use two types of configuration to control searching.
Field/Predicate Maps control how data is indexed and how RDF predicates are mapped to short names for easy field-based searching. This lets us search just the title of a record by prefixing the search term with the field name title and a colon, for example
title:cat
would only search the dc:title predicate in our data. If no field name is specified then the store uses a Query Profile to map the search terms to specific short names. The Query Profile also specifies what weight each field carries. We can use this to adjust the relevance calculations for our search applications.
Field/Predicate Maps
A Field/Predicate Map, as the name suggests, maps predicates to short field names. It lets us refer to http://purl.org/dc/elements/1.1/title simply as title, which makes field-based searching much more accessible. A typical Field/Predicate Map might look like the following:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#"
    xmlns:frm="http://schemas.talis.com/2006/frame/schema#"
    xml:base="http://api.talis.com/stores/kniblet-dev1/">
  <bf:FieldPredicateMap rdf:about="config/fpmaps/1">
    <frm:mappedDatatypeProperty>
      <rdf:Description rdf:about="config/fpmaps/1#title">
        <frm:property rdf:resource="http://purl.org/dc/elements/1.1/title"/>
        <frm:name>title</frm:name>
      </rdf:Description>
    </frm:mappedDatatypeProperty>
    <frm:mappedDatatypeProperty>
      <rdf:Description rdf:about="config/fpmaps/1#description">
        <frm:property rdf:resource="http://purl.org/dc/elements/1.1/description"/>
        <frm:name>desc</frm:name>
      </rdf:Description>
    </frm:mappedDatatypeProperty>
  </bf:FieldPredicateMap>
</rdf:RDF>
This Field/Predicate Map maps two predicates to short names. http://purl.org/dc/elements/1.1/title maps to title and http://purl.org/dc/elements/1.1/description maps to desc. That means our store could support queries like the following:
title:cat
desc:cat
title:cat desc:stray
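To make the structure concrete, here is a minimal sketch that uses PHP's built-in DOM extension to read a Field/Predicate Map document laid out like the example above into an array of short name => predicate URI. The function name read_fp_map is hypothetical and is not part of Moriarty. It also assumes the nested rdf:Description layout shown here; other RDF/XML serializations of the same data would need a proper RDF parser.

```php
<?php
// Sketch: read a Field/Predicate Map (RDF/XML, nested layout as in the
// example above) into [short field name => predicate URI].
// read_fp_map is a hypothetical helper, not part of Moriarty.
function read_fp_map(string $rdfXml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($rdfXml)) {
        throw new InvalidArgumentException('Field/Predicate Map is not well-formed XML');
    }

    $xp = new DOMXPath($doc);
    $xp->registerNamespace('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');
    $xp->registerNamespace('frm', 'http://schemas.talis.com/2006/frame/schema#');

    $map = [];
    // Each frm:mappedDatatypeProperty wraps one mapping: a predicate plus its short name.
    foreach ($xp->query('//frm:mappedDatatypeProperty/rdf:Description') as $mapping) {
        $name      = $xp->evaluate('string(frm:name)', $mapping);
        $predicate = $xp->evaluate('string(frm:property/@rdf:resource)', $mapping);
        $map[$name] = $predicate;
    }
    return $map;
}
```

Given the example map, this returns title => http://purl.org/dc/elements/1.1/title and desc => http://purl.org/dc/elements/1.1/description, the same name-to-predicate correspondence that lets queries such as title:cat or desc:stray target a single predicate.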
We want to replace the default Field/Predicate Map with one that maps our body predicate. We need something like:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#"
    xmlns:frm="http://schemas.talis.com/2006/frame/schema#"
    xml:base="http://api.talis.com/stores/kniblet-dev1/">
  <bf:FieldPredicateMap rdf:about="config/fpmaps/1">
    <frm:mappedDatatypeProperty>
      <rdf:Description rdf:about="config/fpmaps/1#title">
        <frm:property rdf:resource="http://purl.org/dc/elements/1.1/title"/>
        <frm:name>title</frm:name>
      </rdf:Description>
    </frm:mappedDatatypeProperty>
    <frm:mappedDatatypeProperty>
      <rdf:Description rdf:about="config/fpmaps/1#body">
        <frm:property rdf:resource="http://schemas.talis.com/kniblet/body"/>
        <frm:name>body</frm:name>
      </rdf:Description>
    </frm:mappedDatatypeProperty>
  </bf:FieldPredicateMap>
</rdf:RDF>
We can use a command-line cURL invocation to PUT this in the right place. Our field/predicate map is at
http://api.talis.com/stores/kniblet-dev1/config/fpmaps/1
We need to issue an HTTP PUT against that URI with a content type of application/rdf+xml:
curl -v -T fpmap.rdf --digest -u "user:pass" -H content-type:application/rdf+xml http://api.talis.com/stores/kniblet-dev1/config/fpmaps/1
Moriarty includes support for fetching, editing and updating Field/Predicate Maps. To replace a store's map you would use code like this:
// Assumes the Moriarty library, including its FieldPredicateMap class, has already been loaded
$fp = new FieldPredicateMap("http://api.talis.com/stores/kniblet-dev1/config/fpmaps/1");
$fp->add_mapping('http://purl.org/dc/elements/1.1/title', 'title');
$fp->add_mapping('http://schemas.talis.com/kniblet/body', 'body');
$response = $fp->put_to_network();
if ( $response->is_success() ) {
  // the new map is in place: do something with $fp...
}
Reindexing
One caveat to be aware of is that changes to Field/Predicate Maps are not automatically applied to existing data; they only apply to data added after the Field/Predicate Map has been updated. There are a couple of ways around this. For development purposes it's probably sufficient to reload the data from a snapshot, since the reloaded data will be indexed using the new mappings. If that isn't practical, as in a live application, you can instead request that the store be reindexed.
Like restoring a snapshot, this is a bulk operation, so we use a Job to invoke it. The ReindexJob request looks like this:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#">
  <bf:JobRequest>
    <bf:jobType rdf:resource="http://schemas.talis.com/2006/bigfoot/configuration#ReindexJob"/>
    <bf:startTime>2008-05-21T08:10:00Z</bf:startTime>
  </bf:JobRequest>
</rdf:RDF>
We can simply POST it to the store's job queue and wait for it to be executed.
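Rather than hand-editing the start time, the request can be generated. The sketch below is hedged: build_reindex_job and post_job are hypothetical helpers, the job-queue URI you pass to post_job is an assumption (this page only says "the store's job queue"), and the POST relies on PHP's curl extension with digest authentication, mirroring the cURL commands used elsewhere on this page.

```php
<?php
// Sketch: build a ReindexJob request (same shape as the example above) and
// POST it. build_reindex_job and post_job are hypothetical helpers; the
// job-queue URI passed to post_job is an assumption, not taken from this page.

function build_reindex_job(DateTimeInterface $start): string
{
    // Normalise to UTC in the form used above, e.g. 2008-05-21T08:10:00Z.
    $utc = (new DateTimeImmutable('@' . $start->getTimestamp()))->format('Y-m-d\TH:i:s\Z');

    return <<<XML
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#">
  <bf:JobRequest>
    <bf:jobType rdf:resource="http://schemas.talis.com/2006/bigfoot/configuration#ReindexJob"/>
    <bf:startTime>{$utc}</bf:startTime>
  </bf:JobRequest>
</rdf:RDF>
XML;
}

// POST a job request with digest authentication (requires the curl extension).
// Returns the HTTP status code, or 0 if no response was received.
function post_job(string $jobQueueUri, string $body, string $user, string $pass): int
{
    $ch = curl_init($jobQueueUri);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/rdf+xml'],
        CURLOPT_HTTPAUTH       => CURLAUTH_DIGEST,
        CURLOPT_USERPWD        => $user . ':' . $pass,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}
```

For example, build_reindex_job(new DateTimeImmutable('now')) schedules the job for the current time in UTC, whatever the server's local time zone is.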
We can test to see if our changes took effect by searching for text that only occurs in the body of an article, e.g.
body:feed
If all is working well then we should see a list of articles that contain the word feed somewhere in their body.
Query Profiles
As mentioned earlier, Query Profiles control the relative weight of each field when a user searches without any field prefixes. When the Platform receives a search term that is not prefixed by a field, it uses the query profile to rewrite the query behind the scenes. A simple query like:
cat
might get rewritten internally to:
title:cat body:cat
Queries that contain a mix of prefixed and unprefixed terms will only have their unprefixed ones rewritten. So
cat title:feeding
might become:
title:cat body:cat title:feeding
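The rewriting can be modelled in a few lines. This is only a sketch of the behaviour described above, not the Platform's implementation: rewrite_query is a hypothetical name, it splits the query on whitespace, it ignores field weights, and it does not understand quoted phrases or operators such as "+".

```php
<?php
// Sketch of the unprefixed-term rewriting described above (hypothetical helper,
// not the Platform's code). Terms are split on whitespace; weights, quoted
// phrases and operators such as "+" are deliberately not modelled.
function rewrite_query(string $query, array $profileFields): string
{
    $clauses = [];
    foreach (preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY) as $term) {
        if (preg_match('/^\w+:/', $term)) {
            // Already field-prefixed (e.g. title:feeding): keep as-is.
            $clauses[] = $term;
            continue;
        }
        // Unprefixed: search it in every field named by the query profile.
        foreach ($profileFields as $field) {
            $clauses[] = $field . ':' . $term;
        }
    }
    return implode(' ', $clauses);
}

echo rewrite_query('cat title:feeding', ['title', 'body']), "\n";
// title:cat body:cat title:feeding
```

An escaped colon such as http\://example.com does not match the prefix pattern, so this sketch treats it as an unprefixed term.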
The query profile controls which fields are used in this rewriting process and how much influence each one has on relevance ranking. Every field in a query profile must already be defined in the store's field/predicate map, but the reverse does not hold: not every field in the field/predicate map needs to appear in the query profile.
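Because of that constraint, it can be worth checking a query profile against the field/predicate map before PUTting it. The following is a minimal sketch with a hypothetical helper name. It reports weighted fields that are missing from the map and, as the paragraph above allows, ignores map fields that the profile leaves out.

```php
<?php
// Hypothetical pre-flight check: which fields weighted by a query profile are
// not defined in the store's Field/Predicate Map? Map fields that the profile
// omits are fine and are not reported.
function undefined_profile_fields(array $fpMapFields, array $profileWeights): array
{
    return array_values(array_diff(array_keys($profileWeights), $fpMapFields));
}
```

For the profile below, undefined_profile_fields(['title', 'body'], ['title' => '2.0', 'body' => '0.5']) returns an empty array.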
Here's what our query profile should look like:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bf="http://schemas.talis.com/2006/bigfoot/configuration#"
    xmlns:frm="http://schemas.talis.com/2006/frame/schema#"
    xml:base="http://api.talis.com/stores/kniblet-dev1/">
  <bf:QueryProfile rdf:about="config/queryprofiles/1">
    <bf:fieldWeight>
      <rdf:Description rdf:about="config/queryprofiles/1#title">
        <bf:weight>2.0</bf:weight>
        <frm:name>title</frm:name>
      </rdf:Description>
    </bf:fieldWeight>
    <bf:fieldWeight>
      <rdf:Description rdf:about="config/queryprofiles/1#body">
        <bf:weight>0.5</bf:weight>
        <frm:name>body</frm:name>
      </rdf:Description>
    </bf:fieldWeight>
  </bf:QueryProfile>
</rdf:RDF>
As with the field/predicate map, we can use cURL to send it to the Platform. Our query profile is at
http://api.talis.com/stores/kniblet-dev1/config/queryprofiles/1
We need to issue an HTTP PUT against that URI with a content type of application/rdf+xml:
curl -v -T qp.rdf --digest -u "user:pass" -H content-type:application/rdf+xml http://api.talis.com/stores/kniblet-dev1/config/queryprofiles/1
Once again Moriarty can make this easy:
$qp = new QueryProfile("http://api.talis.com/stores/kniblet-dev1/config/queryprofiles/1");
$qp->add_field_weight('title', '2.0');
$qp->add_field_weight('body', '0.5');
$response = $qp->put_to_network();
if ( $response->is_success() ) {
  // it's all ok
}
else {
  // report error
}
Exploring Search
Now that our data is being indexed appropriately and we have assigned weights to fields, we can explore the search syntax a little more. The syntax currently supports both multiple and single character wildcard searches, using the "*" and "?" symbols respectively.
A single character wildcard search matches terms in which the "?" is replaced by exactly one character. For example, to search for "text" or "test" you can use the search:
te?t
A multiple character wildcard search matches zero or more characters. For example, to search for test, tests or tester, you can use the search:
test*
You can also use wildcards in the middle of a term:
te*t
Note: You cannot use a * or ? symbol as the first character of a term. In addition, placing a wildcard too close to the start of a term may result in a query error due to the current implementation of wildcard searches. If this happens, and where possible, make your query more specific by moving the wildcard further towards the end of the term.
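To make the wildcard rules concrete, this sketch translates a wildcard term into an anchored regular expression and tests it against a single term. It illustrates the matching rules above rather than how the Platform evaluates wildcards: the function names are hypothetical, matching is case-sensitive, and only the leading-wildcard restriction from the note is modelled.

```php
<?php
// Sketch of the wildcard rules above: "?" matches exactly one character and
// "*" matches zero or more. Hypothetical helpers; matching is case-sensitive
// and against a single term, unlike the Platform's index.
function wildcard_to_regex(string $pattern): string
{
    // Mirrors the note above: a term may not start with a wildcard.
    if ($pattern === '' || $pattern[0] === '*' || $pattern[0] === '?') {
        throw new InvalidArgumentException('A term cannot start with * or ?');
    }
    $regex = '';
    for ($i = 0, $n = strlen($pattern); $i < $n; $i++) {
        $ch = $pattern[$i];
        if ($ch === '*') {
            $regex .= '.*';
        } elseif ($ch === '?') {
            $regex .= '.';
        } else {
            $regex .= preg_quote($ch, '/');
        }
    }
    // Anchor at both ends so the whole term must match.
    return '/\A' . $regex . '\z/u';
}

function wildcard_matches(string $pattern, string $term): bool
{
    return preg_match(wildcard_to_regex($pattern), $term) === 1;
}
```

With these helpers, te?t matches text and test but not tet, and te*t matches tempt but not tea.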
You can search for phrases by enclosing them in quotes:
"cat grooming"
The "+" or required operator requires that the term after the "+" symbol be present in every matching item.
For example, the query:
+title:cat title:grooming
searches for items where the title must contain the term "cat" and may contain the term "grooming"
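The effect of "+" can be illustrated with a toy evaluator. In this hedged sketch, match_query is a hypothetical helper, a document is an array of field => list of terms, every query term is assumed to carry a field prefix, and the "score" is simply the number of clauses that matched; real relevance ranking is far richer. A document missing any required term is rejected, while optional terms only raise the score.

```php
<?php
// Toy model of the "+" (required) operator. Hypothetical helper: a document is
// [field => list of terms], every query term must carry a field prefix, and the
// "score" is just the number of clauses that matched. Returns null for no match.
function match_query(string $query, array $doc): ?int
{
    $score = 0;
    $hasRequired = false;
    foreach (preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY) as $token) {
        $required = $token[0] === '+';
        if ($required) {
            $hasRequired = true;
            $token = substr($token, 1);
        }
        [$field, $term] = explode(':', $token, 2);
        $hit = in_array($term, $doc[$field] ?? [], true);
        if ($required && !$hit) {
            return null;            // a required term is missing: reject
        }
        if ($hit) {
            $score++;               // optional hits only raise the score
        }
    }
    // With no required terms, at least one optional term has to match.
    return ($hasRequired || $score > 0) ? $score : null;
}
```

For +title:cat title:grooming, a title containing both words scores 2, one containing only "cat" scores 1, and one containing only "grooming" is not matched at all.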
Certain characters need to be escaped by prefixing them with a \ before they can be used in a search. For example, to search for http://example.com/things?id=foo:bar use the query:
http\://example.com/things\?id=foo\:bar
The current list of special characters is:
+ - && || ! ( ) " * ? : \
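A small helper can apply these escapes before user input is placed in a query. This is a sketch with a hypothetical name. It escapes each character from the list above individually, so && and || become \&\& and \|\|; treating them that way is an assumption about how the Platform handles escaped single & and | characters.

```php
<?php
// Hypothetical helper: backslash-escape every character from the special-character
// list above. "&&" and "||" are handled by escaping each & and | individually,
// which is an assumption about how the Platform treats them.
function escape_search_term(string $term): string
{
    $special = '+-&|!()"*?:\\';
    $escaped = '';
    for ($i = 0, $n = strlen($term); $i < $n; $i++) {
        $ch = $term[$i];
        if (strpos($special, $ch) !== false) {
            $escaped .= '\\';
        }
        $escaped .= $ch;
    }
    return $escaped;
}

echo escape_search_term('http://example.com/things?id=foo:bar'), "\n";
// http\://example.com/things\?id=foo\:bar
```

The output reproduces the escaped query from the example above.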
Summary
- Search behaviour can be configured for each store
- Field/Predicate maps control how data is indexed and map property URIs to short field names
- Query Profiles map unprefixed queries to prefixed ones with specific weights
- Both are updated with an HTTP PUT to their URIs
- Updating a Field/Predicate Map doesn't automatically reindex the store
- Reindexing can be requested using a ReindexJob
- Search syntax supports wildcards, phrases and required terms

