Ttracker
From n² wiki
Web Page Traffic Analytics
Overview
A service which provides statistics about site visits (and data about the pages themselves).
A small piece of code is inserted into each page to be tracked. Whenever someone visits the page with a standard browser, details of the visit are passed into a Talis Platform store via a PHP service. The data in the store can then be queried to produce reports about visitor activity.
See also: Ttracker How It Works
Status
In short, the data acquisition side of the application is working and virtually complete. So far there is only minimal material ready on the reporting/visualization side.
2010-02-12
- Moving docs to n2 Wiki from http://hyperdata.org/wiki/wiki/Ttracker
- Simple query/display working: Ttracker Demo
- Data-gathering working
- collects visited page URI, visitor's IP address, referer, user-agent, datetime
- extracts any data embedded in the visited page: dc, erdf, openid, microformats, rdfa (this happens once only, when the page gets its first hit)
- Saves to a Talis Platform store (as well as a custom Apache2 error log - note you might need to refresh the log page, and there may be debugging junk in there)
- Code commented and explanation started at Ttracker How It Works; see also Notes on Cross-Domain Ajax
- CGI Environment Variables vocab created
- Marker script tested on local and remote domains (including this Wiki's template /usr/lib/python2.5/site-packages/Trac-0.11.4-py2.5.egg/trac/templates/theme.html )
- Confirmed operation for Firefox, Opera and IE
- demo, source (demo is in samples dir), SPARQL endpoint
- Key pages used during research are tagged ttracker on del.icio.us
Sample of RDF generated for a page hit
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ns0="http://www.w3.org/2006/http#"
         xmlns:ns1="http://purl.org/stuff/cgi#"
         xmlns:ns2="http://purl.org/stuff/hsh#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:nodeID="request">
    <rdf:type rdf:resource="http://www.w3.org/2006/http#Request"/>
    <ns0:requestURI rdf:resource="http://danny.ayers.name/test.html"/>
    <ns1:remoteAddr>79.9.5.104</ns1:remoteAddr>
    <ns2:referer rdf:resource="http://danny.ayers.name/test.html"/>
    <ns2:agent>Opera/9.64 (X11; Linux x86_64; U; en) Presto/2.1.1</ns2:agent>
    <dct:date>2009-08-02T07:47:36Z</dct:date>
  </rdf:Description>
</rdf:RDF>
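As a rough illustration of how a hit record of this shape could be assembled, here is a minimal Python sketch. It is not the PHP service described in the overview: the function name `hit_to_rdfxml` and the prefix names are hypothetical, and only the namespace URIs and property names are taken from the sample above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the sample; the prefix names below are arbitrary.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
HTTP = "http://www.w3.org/2006/http#"
CGI = "http://purl.org/stuff/cgi#"
HSH = "http://purl.org/stuff/hsh#"
DCT = "http://purl.org/dc/terms/"


def hit_to_rdfxml(request_uri, remote_addr, referer, agent, date):
    """Serialize one page hit as RDF/XML (hypothetical helper, not Ttracker code)."""
    for prefix, uri in (("rdf", RDF), ("h", HTTP), ("cgi", CGI),
                        ("hsh", HSH), ("dct", DCT)):
        ET.register_namespace(prefix, uri)
    root = ET.Element(f"{{{RDF}}}RDF")
    # The sample identifies the request with a blank node labelled "request".
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}nodeID": "request"})
    ET.SubElement(desc, f"{{{RDF}}}type",
                  {f"{{{RDF}}}resource": HTTP + "Request"})
    ET.SubElement(desc, f"{{{HTTP}}}requestURI",
                  {f"{{{RDF}}}resource": request_uri})
    ET.SubElement(desc, f"{{{CGI}}}remoteAddr").text = remote_addr
    ET.SubElement(desc, f"{{{HSH}}}referer",
                  {f"{{{RDF}}}resource": referer})
    ET.SubElement(desc, f"{{{HSH}}}agent").text = agent
    ET.SubElement(desc, f"{{{DCT}}}date").text = date
    return ET.tostring(root, encoding="unicode")
```

Calling `hit_to_rdfxml(...)` with the five values listed under Status (page URI, IP address, referer, user-agent, datetime) returns a string that could then be posted to a store.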
Sample of data extracted from a page
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ns0="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://hyperdata.org/ttracker/samples/page2.html">
    <ns0:format>text/html; charset=UTF-8</ns0:format>
    <ns0:title>Page Two</ns0:title>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Document"/>
  </rdf:Description>
</rdf:RDF>
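To give a feel for the simplest part of this extraction, the following Python sketch pulls a title and a content type out of an HTML page using only the standard library. It rests on assumptions: the class and function names are hypothetical, it reads the format from a `<meta http-equiv="Content-Type">` element (the real service may use the HTTP response header instead), and it does not attempt eRDF, OpenID, microformats or RDFa.

```python
from html.parser import HTMLParser


class PageMetadataParser(HTMLParser):
    """Collects the <title> text and the Content-Type meta value (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.format = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif (tag == "meta"
              and (attributes.get("http-equiv") or "").lower() == "content-type"):
            self.format = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title = (self.title or "") + data


def extract_page_metadata(html):
    """Return a dict with the values that would map to dc:title and dc:format."""
    parser = PageMetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "format": parser.format}
```

The returned dictionary holds the two literal values seen in the sample; turning them into RDF is a separate step.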
Sample Queries
SPARQL endpoint : http://api.talis.com/stores/danja-dev1/services/sparql
Info about requests:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix http: <http://www.w3.org/2006/http#>
prefix ns1: <http://purl.org/stuff/cgi#>
prefix dct: <http://purl.org/dc/terms/>
select ?s ?p ?o
where {
  ?s a http:Request .
  ?s ?p ?o .
}
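A query like the one above can be sent to a SPARQL endpoint over plain HTTP. The sketch below is one way to do that from Python, assuming the endpoint accepts a GET request with a `query` parameter and can return the standard SPARQL JSON results format; the helper names are hypothetical, and whether the store listed above still answers requests is not something this example checks.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as given on this page.
ENDPOINT = "http://api.talis.com/stores/danja-dev1/services/sparql"


def build_query_url(endpoint, query):
    """Encode a SPARQL query as a GET URL (SPARQL protocol 'query' parameter)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_select(endpoint, query, timeout=10):
    """Run a SELECT query and return the decoded JSON results document."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]
```

For example, `rows(run_select(ENDPOINT, query_text))` would give one dictionary per result row, with keys `s`, `p` and `o` for the query above.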
Info about pages:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?s ?p ?o
WHERE {
  ?s a foaf:Document ;
     ?p ?o .
}
Evidence that the most popular browsers are supported (the user agents seen for one test page):
prefix hsh: <http://purl.org/stuff/hsh#>
prefix h: <http://www.w3.org/2006/http#>
select distinct ?agent
where {
  ?s h:requestURI <http://danny.ayers.name/test.html> .
  ?s hsh:agent ?agent .
}
Get all request info from a specific domain:
PREFIX h: <http://www.w3.org/2006/http#>
SELECT DISTINCT ?p ?o
where {
  [ h:requestURI ?uri ;
    ?p ?o ].
  FILTER regex(str(?uri), "^http://dannyayers.com", "i")
}
Get all pages visited more than once, with their number of visits:
PREFIX h: <http://www.w3.org/2006/http#>
SELECT ?uri ( count(?uri) AS ?count ) WHERE {
  [ h:requestURI ?uri ].
}
GROUP BY ?uri
HAVING (?count > 1)
ORDER BY DESC(?count)
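Aggregates such as count() are not available on every SPARQL endpoint, so the same report can also be computed on the client from a plain list of request URIs. This is a minimal Python sketch of that idea; the function name `visit_counts` and its `min_visits` parameter are hypothetical, and the input is assumed to be one URI string per recorded hit.

```python
from collections import Counter


def visit_counts(request_uris, min_visits=2):
    """Count hits per URI, keep URIs with at least min_visits hits, most visited first.

    min_visits=2 mirrors the HAVING (?count > 1) clause of the query above.
    """
    counts = Counter(request_uris)
    return [(uri, n) for uri, n in counts.most_common() if n >= min_visits]
```

Passing `min_visits=1` would list every visited page rather than only those with repeat visits.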
Next Steps
(write up between steps)
- Figure out a better way of tracking clients than by IP address, ideally without using cookies
- Grab more data
- Simple SPARQL
- Figure out pre-post-to-store caching strategy
- Simple reporting via SPARQL plus XML/XSLT and/or JSON/Javascript
later...
- blog
- live deployment (on hyperdata.org for starters)
- hook up to Piwik or other existing reporting widgets

