“Getting Real” with RDF and SPARQL

The “Getting Real” approach of Web 2.0 poster child 37signals reverses the usual development process (from model to code to user interface) by going “from brainstorm to user interface sketches to HTML to coding.” Principles such as “Don’t write a functional specification” or “Essentials only” help developers stay focused and thus reduce the time to launch.

The Resource Description Framework (RDF) supports and accelerates many code-oriented aspects of Getting Real. To illustrate, this article describes how to create dooit, a simple to-do list manager (a type of software that is notoriously prone to feature creep during the design phase). You can try an online demo and download the source code for this application.

Pre-coding Steps and System Setup
The Getting Real process starts by formulating the overall application idea and then identifying the core feature set. In dooit’s case, this is a tool to add, edit, and tick off taggable to-do items. The second step is to create paper sketches, which are then turned into static HTML screens (see Figure 1).

 
Figure 1. From Sketch to Mockup: The paper designs are turned into HTML mockups to get a first impression of the application’s look and feel as soon as possible.

When you are satisfied with the HTML mockups, set up the backend and start programming. Instead of a conventional web framework, dooit was built with Trice, an RDF-based system. You can download the source code and reproduce the steps described in the following paragraphs. See readme.txt in dooit.zip for setup instructions; you need Apache, PHP, and a MySQL database. The unzipped project contains four directories, with dooit-specific subdirectories in code/ and themes/:

  • cache (should be write-enabled, used for CSS and JavaScript documents)
  • code
    • arc (the core RDF toolkit)
    • trice (reusable framework components)
    • dooit (the project controller, custom scripts, and templates)
  • config (database configuration and path dispatching rules)
  • themes (CSS and images)

Why RDF Instead of a Classical Relational Database?
RDF provides a data model that can represent any piece of information as a graph fragment, that is, as nodes and relations between nodes. A handy side effect of this graph model is that it can evolve freely and changes are cheap: you simply add relations and nodes, and the RDF system takes care of the storage layer. There is no need to ponder tables and column types. You don’t have to mess with the database when the schema changes, and you don’t need complicated JOIN syntax to retrieve information, because querying also takes place at the graph level, not at the storage level. Figure 2 shows a sample to-do item drawn as an RDF graph. Every pair of nodes (subject and object, in RDF parlance) connected by a directed relation (the predicate) forms a so-called triple.

 
Figure 2. RDF Graph Representation of a To-do Item: The graph consists of five triples. Predicates are always web identifiers (URIs); the single subject and the ical:Vtodo object are URIs, too. The remaining objects are literals.
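To make the triple idea concrete, here is a minimal Python sketch (not part of dooit) that stores the five triples of Figure 2 as plain tuples. The item URI is a hypothetical placeholder and prefixed names stand in for full URIs. The sketch also does not distinguish URIs from literals. The modification date added at the end is made-up example data.

```python
# Figure 2 as a set of (subject, predicate, object) tuples.
# ITEM is a hypothetical identifier; prefixed names stand in for full
# URIs, and URIs and literals are both plain strings in this sketch.
ITEM = "http://example.org/dooit/item1#self"

graph = {
    (ITEM, "rdf:type", "ical:Vtodo"),
    (ITEM, "ical:summary", "Create sketches"),
    (ITEM, "ical:status", "done"),
    (ITEM, "dc:subject", "dooit"),
    (ITEM, "dc:subject", "work"),
}


def objects(g, subject, predicate):
    """Return all object values for a subject/predicate pair, sorted."""
    return sorted(o for s, p, o in g if s == subject and p == predicate)


# Evolving the model means adding a triple; there is no table to alter.
# (The timestamp is made-up example data.)
graph.add((ITEM, "dct:modified", "2008-05-01T10:00:00Z"))

print(len(graph))                          # 6
print(objects(graph, ITEM, "dc:subject"))  # ['dooit', 'work']
```

Because the tags are simply two triples that share a predicate, a second tag needs no extra column or join table.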

Of course, you don’t want to draw circles and arrows every time you use RDF. Several serialization formats are available, for example the W3C-recommended RDF/XML syntax, or RDFa, a syntax for embedding graph data in HTML. This article uses Turtle, a text notation that is easy to read and write, yet machine-readable:

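@base <…> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix ical: <http://www.w3.org/2002/12/cal/ical#> .

<…> rdf:type ical:Vtodo ;
    ical:summary "Create sketches" ;
    ical:status "done" ;
    dc:subject "dooit" ;
    dc:subject "work" .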

This snippet contains the same information as the graph in Figure 2. In Turtle, a semicolon lets the next triple reuse the previous subject. Reusing existing RDF vocabularies such as Dublin Core (dc:) or iCalendar (ical:) is good practice: namespaced terms simplify global identification and prevent name clashes with other people’s models. If you are new to RDF, be careful not to waste too much time scanning schema repositories such as Swoogle for matching schemas. You can make up your own ad-hoc vocabulary and align your data with the public semantic web at a later stage.
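The subject-grouping that Turtle’s semicolon provides can be illustrated with a small Python serializer. This is a simplified sketch, not a complete Turtle writer. It assumes subjects are full URIs and predicates are prefixed names, and it treats an object as a prefixed name only when it starts with a known prefix. Literal escaping covers only backslashes and quotes. The namespace URIs are the commonly published ones for RDF, Dublin Core elements and the W3C RDF Calendar vocabulary, and they are assumed here. The item URI is hypothetical.

```python
from collections import defaultdict

# Namespace URIs as commonly published for these vocabularies (assumed).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ical": "http://www.w3.org/2002/12/cal/ical#",
}


def to_turtle(triples, prefixes):
    """Serialize (s, p, o) triples as Turtle, grouping each subject with ';'.

    Simplified: subjects are full URIs, predicates are prefixed names, and
    an object is written as a prefixed name if it starts with a known
    prefix, otherwise as a quoted literal.
    """
    lines = [f"@prefix {name}: <{uri}> ." for name, uri in prefixes.items()]
    by_subject = defaultdict(list)  # insertion order is preserved
    for s, p, o in triples:
        by_subject[s].append((p, o))

    def term(o):
        if ":" in o and o.split(":", 1)[0] in prefixes:
            return o
        return '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"'

    for s, pairs in by_subject.items():
        # One subject, many predicate/object pairs separated by ' ;'.
        body = " ;\n    ".join(f"{p} {term(o)}" for p, o in pairs)
        lines.append(f"<{s}> {body} .")
    return "\n".join(lines)


item = "http://example.org/dooit/item1#self"  # hypothetical identifier
triples = [
    (item, "rdf:type", "ical:Vtodo"),
    (item, "ical:summary", "Create sketches"),
    (item, "ical:status", "done"),
    (item, "dc:subject", "dooit"),
    (item, "dc:subject", "work"),
]
turtle = to_turtle(triples, PREFIXES)
print(turtle)
```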

Data First
Data first means starting with a real use case, an actual problem with a clear scope. The alternative is structure first, where you try to predict the long-term model your application might use one day. Interestingly, RDF supports both approaches. Structure first is still dominant in the semantic web community, which often creates the impression that writing an ontology is the recommended way to get started with RDF. I suggest giving data first a try, especially if you want your application to Get Real as quickly as possible. You may have to scribble the basic triple/graph structure on paper, but that is usually all you need to start. Just think of the first data item and design the input form (on paper and as an HTML mockup). Once this is done, you need to write the code that stores the data from a submitted form, but again, RDF helps keep things simple.

RDF-generating Forms
For easier editing, the form definition method (dooit/Dooit_getItemForm.php) is kept separate from the main controller. If you open it in a text editor (or see Listing 1, which is a copy of the method), you can see that the form is built from a simple array of field definitions. The editable fields section is where the visible input fields are defined. Following the mockup, only two fields are considered for now: the to-do item’s summary and a comma-separated list of tags:

...
/* editable fields */
'summary' => array(
  'required',
  'label' => 'Summary',
  'term' => 'ical:summary',
),
'tags' => array(
  'required',
  'label' => 'Tags',
  'term' => 'dc:subject',
  'value_type' => 'csv',
  'info' => 'comma-separated'
),
...
 
Figure 3. To-do Item Form: The underlying system auto-generates RDF from the provided form field definitions. It also assigns a web-friendly identifier to each created entry, which comes in handy if you want to publish your to-do list at a later stage or combine it with external data sources.

The term key is important: it indicates the RDF relation to use when the submitted form data is converted to RDF triples (namespace prefixes are defined in config/sys.php). Figure 3 shows the HTML form generated by these field definitions.
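The mapping from field definitions to triples can be sketched in Python. This is a hypothetical mirror of the idea, not Trice’s code. FIELDS copies the two field definitions above, and values with the csv value type are split on commas. The automatic rdf:type ical:Vtodo triple is an assumption, because the listing above elides the rest of the form definition. The item URI is also hypothetical.

```python
# Hypothetical Python mirror of the PHP field definitions above.
FIELDS = {
    "summary": {"required": True, "label": "Summary", "term": "ical:summary"},
    "tags": {"required": True, "label": "Tags", "term": "dc:subject",
             "value_type": "csv", "info": "comma-separated"},
}


def form_to_triples(subject, fields, data):
    """Turn submitted form data into (s, p, o) triples via each field's 'term'."""
    # Assumption: every item is typed as ical:Vtodo (see Figure 2).
    triples = [(subject, "rdf:type", "ical:Vtodo")]
    for name, spec in fields.items():
        raw = data.get(name, "").strip()
        if not raw:
            if spec.get("required"):
                raise ValueError(f"missing required field: {name}")
            continue
        if spec.get("value_type") == "csv":
            # "dooit, work" -> one triple per non-empty tag
            values = [v.strip() for v in raw.split(",") if v.strip()]
        else:
            values = [raw]
        triples.extend((subject, spec["term"], v) for v in values)
    return triples


triples = form_to_triples(
    "http://example.org/dooit/item2#self",  # hypothetical identifier
    FIELDS,
    {"summary": "Write article", "tags": "dooit, work"},
)
for t in triples:
    print(t)
```

The converter is generic. A new field only needs a new entry with a term key, and no new conversion code is required.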

What exactly happens when the form is submitted? The framework auto-generates an RDF graph by combining the received data with the term indicators specified in the form definition. These triples are then saved using a SPARQL INSERT query. SPARQL (http://www.w3.org/TR/rdf-sparql-query/), the RDF query language, works similarly to SQL, but it is optimized for graph-shaped data and web repositories.

Write operations in SPARQL were not yet standardized when dooit was written. There were only a few related proposals, however, and they were expected to converge, as they eventually did in the W3C’s SPARQL 1.1 Update. dooit uses SPARQL+ internally to enable INSERT operations.
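The generated triples can then be written with an INSERT query in the SPARQL+ style that dooit uses. The INSERT INTO <graph> { ... } form also appears in the status-update code later in this article. The Python sketch below only builds the query string. It derives the graph URI by stripping #self, as that update code does. The literal escaping and the set of objects treated as URIs are simplifying assumptions. PREFIX declarations are assumed to be supplied elsewhere.

```python
def literal(value):
    """Quote a string as a SPARQL literal, escaping backslashes, quotes, line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def insert_query(graph, triples, uri_objects=frozenset({"ical:Vtodo"})):
    """Build a SPARQL+-style INSERT INTO query for the given triples.

    Objects listed in uri_objects are written as prefixed names; all other
    objects become string literals. PREFIX declarations are omitted.
    """
    patterns = []
    for s, p, o in triples:
        obj = o if o in uri_objects else literal(o)
        patterns.append(f"  <{s}> {p} {obj} .")
    return f"INSERT INTO <{graph}> {{\n" + "\n".join(patterns) + "\n}"


item = "http://example.org/dooit/item2#self"  # hypothetical identifier
query = insert_query(
    item.replace("#self", ""),  # graph URI, as in dooit's update code
    [(item, "rdf:type", "ical:Vtodo"), (item, "ical:summary", 'Say "hi"')],
)
print(query)
```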

Building a simple RDF graph from the form values helps you keep custom code to a minimum, which in turn reduces the cost of changing your application later. Before you start thinking about new features, however, you still have to replace the static mockup lists (active and completed items, and tags) with real data.

SPARQL-driven Views
Using SPARQL as the data access mechanism allows you to query your dataset at the graph level rather than at the storage level. SPARQL has additional benefits, such as multiple result structures and formats or the ability to retrieve data from remote RDF repositories. Basically, though, it is like using SQL against a very simple three-column table layout:

DESCRIBE ?todo WHERE {
  ?todo a ical:Vtodo ;
        ical:status "done" ;
        dct:modified ?datetime .
}
ORDER BY DESC(?datetime)
LIMIT 5

This DESCRIBE query returns the five most recently modified to-do items marked as “done.” The results are then used to populate the fields of the to-do item template (dooit/t_item.php contains the complete template):

<?php
foreach (… as $o) {
  echo '…' . $o['value'] . '…';
}
?>

These views probably look familiar if you have worked with template-based systems before; there is not much that RDF or SPARQL can optimize here. The template pre-processing could be improved so that you wouldn’t have to use long, namespaced keys. For the purpose of this article, however, it seemed sensible not to hide RDF completely.
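The following Python sketch makes the “three-column table” comparison concrete. It evaluates the WHERE clause, ORDER BY and LIMIT of the DESCRIBE query above over a plain list of triples. It then groups each matching item’s triples by predicate, which is roughly the shape the template iterates over. The data is made up. The result structure is also a simplification, because the real store returns its own format (the template reads a 'value' entry per object). The sketch relies on the fact that ISO 8601 timestamps in one timezone and format sort correctly as strings.

```python
# Triples as rows of a three-column table (made-up example data).
TRIPLES = [
    ("ex:t1", "rdf:type", "ical:Vtodo"),
    ("ex:t1", "ical:summary", "Create sketches"),
    ("ex:t1", "ical:status", "done"),
    ("ex:t1", "dct:modified", "2008-05-02T09:00:00Z"),
    ("ex:t2", "rdf:type", "ical:Vtodo"),
    ("ex:t2", "ical:summary", "Build HTML mockups"),
    ("ex:t2", "ical:status", "done"),
    ("ex:t2", "dct:modified", "2008-05-03T14:30:00Z"),
    ("ex:t3", "rdf:type", "ical:Vtodo"),
    ("ex:t3", "ical:summary", "Write code"),
    ("ex:t3", "ical:status", ""),
    ("ex:t3", "dct:modified", "2008-05-04T08:15:00Z"),
]


def recently_done(triples, limit=5):
    """Mimic: ?todo a ical:Vtodo ; ical:status "done" ; dct:modified ?datetime
    ORDER BY DESC(?datetime) LIMIT limit."""
    todos = {s for s, p, o in triples if p == "rdf:type" and o == "ical:Vtodo"}
    done = {s for s, p, o in triples if p == "ical:status" and o == "done"}
    rows = [(o, s) for s, p, o in triples
            if p == "dct:modified" and s in todos and s in done]
    rows.sort(reverse=True)  # newest first; same-format UTC strings sort correctly
    return [s for _, s in rows[:limit]]


def describe(triples, subjects):
    """Group all triples about each subject as {subject: {predicate: [objects]}}."""
    result = {s: {} for s in subjects}
    for s, p, o in triples:
        if s in result:
            result[s].setdefault(p, []).append(o)
    return result


for subject, props in describe(TRIPLES, recently_done(TRIPLES)).items():
    for value in props.get("ical:summary", []):
        print(value)
```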

Finishing a first running version from here just means writing a few parameterized SPARQL queries (for active items, completed items, and tags) and tweaking the associated templates. As a last step, completing or re-activating to-dos is implemented via a JavaScript call that triggers a SPARQL operation, which in turn changes the item’s status and modification date:

// resource identifier and tick action
// are sent via a POST parameter
$res = $this->p('item');
$status = $this->p('tick-action') == 'on' ? '' : 'done';
$now = Trice::getXSDDate(time(), 1);
// the graph URI is the resource URI without its '#self' fragment
$g = str_replace('#self', '', $res);
// update the relevant triples: remove the old status and
// modification date, then insert the new values
$store->query('DELETE FROM <'.$g.'> {
  <'.$res.'> ical:status ?status ; dct:modified ?modified .
}');
$store->query('INSERT INTO <'.$g.'> {
  <'.$res.'> ical:status "'.$status.'" ; dct:modified "'.$now.'" .
}');
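If you want to experiment outside Trice, here is a Python sketch of the same two-step update that only builds the query strings. It mirrors the PHP above, including the mapping of 'on' to an empty status. The timestamp format is an assumption, because the exact output of Trice::getXSDDate is not shown here. The character check on the posted identifier is an addition. Since the identifier comes straight from a POST parameter, rejecting characters that could break out of the <...> IRI is a sensible precaution.

```python
from datetime import datetime, timezone


def toggle_queries(res, tick_action, now=None):
    """Build the DELETE/INSERT pair that updates an item's status and date."""
    # Added safeguard: the identifier comes from a POST parameter.
    if not res or any(ch in res for ch in '<>"{} \t\r\n'):
        raise ValueError(f"unsafe resource identifier: {res!r}")
    status = "" if tick_action == "on" else "done"  # same mapping as the PHP
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")  # assumed xsd:dateTime format (UTC)
    graph = res.replace("#self", "")  # graph URI = item URI without #self
    delete = (f"DELETE FROM <{graph}> {{\n"
              f"  <{res}> ical:status ?status ; dct:modified ?modified .\n}}")
    insert = (f"INSERT INTO <{graph}> {{\n"
              f'  <{res}> ical:status "{status}" ; dct:modified "{stamp}" .\n}}')
    return delete, insert


delete_q, insert_q = toggle_queries(
    "http://example.org/dooit/item1#self",  # hypothetical identifier
    "off",
    datetime(2008, 5, 4, 12, 0, tzinfo=timezone.utc),
)
print(delete_q)
print(insert_q)
```

Running the DELETE before the INSERT ensures that the item ends up with exactly one status and one modification date.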
 
Figure 4. First Version of Dooit: The initial release allows creating, editing, tagging, deleting, filtering, and completing to-do items.

A screenshot of the final application is shown in Figure 4.

Keeping It Real
Congratulations: by relying on RDF’s generic data model and the simple SPARQL query language, you have quickly implemented a running application with very little custom code (about 15KB of PHP and JavaScript). Equally important for an early first release, however, is the ability to keep iterating on the application, ideally without losing agility. So you should run at least one test to verify that the RDF system stays flexible. The first thing you may want to improve in dooit is item ordering, perhaps by adding a “priority” field and using it for sorting.

Add a new field to the form builder (dooit/Dooit_getItemForm.php), for example after the tags definition:

'prio' => array(
  'type' => 'selection',
  'label' => 'Priority',
  'term' => 'dooit:priority',
  'options' => array('A' => 'A', 'B' => 'B', 'C' => 'C')
),

If you do not want to search the ical vocabulary for a matching term, you can use an ad-hoc property such as dooit:priority instead. The editing form now contains an additional “Priority” drop-down, and the selected value is saved without any further code or database changes.

The views are not affected yet. To use the new attribute and sort the item list by priority, you simply call the query builder with a different sort argument. Open dooit/t_items.php and change the second parameter of getItemListHTML from ical:summary to the newly introduced dooit:priority. The to-do items are now sorted by priority.

Finally, you may want to display priorities in the item template (dooit/t_item.php).
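The query builder behind getItemListHTML is not listed in this article, so the following Python sketch is hypothetical. item_list_query and the ALLOWED_SORT_TERMS whitelist are invented names. They show how a status and a sort term might be turned into a SPARQL SELECT. Wrapping the sort term in OPTIONAL is a design choice that keeps items created before the priority field existed in the list. The whitelist prevents a caller-supplied term from being injected verbatim into the query.

```python
# Hypothetical whitelist of sortable properties.
ALLOWED_SORT_TERMS = {"ical:summary", "dct:modified", "dooit:priority"}


def item_list_query(status, sort_term="ical:summary"):
    """Build a SELECT query for to-do items with the given status, sorted by sort_term."""
    if sort_term not in ALLOWED_SORT_TERMS:
        raise ValueError(f"unsupported sort term: {sort_term}")
    escaped = status.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?todo ?sort WHERE {\n"
        "  ?todo a ical:Vtodo ;\n"
        f'        ical:status "{escaped}" .\n'
        # OPTIONAL keeps items that have no value for the sort term.
        f"  OPTIONAL {{ ?todo {sort_term} ?sort . }}\n"
        "}\n"
        "ORDER BY ?sort"
    )


print(item_list_query("", "dooit:priority"))  # active items, sorted by priority
```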

In three simple steps, you added a new field and improved the application, without writing a new method or having to touch the database. You can add more fields and features the same way.

What’s Next?

 
Figure 5. SPARQL Interface: The SPARQL protocol simplifies the deployment of APIs. It can also be used to debug and experiment with application data.

This was just a basic introduction to agile web development with RDF. For a closer look at the internal triple data and at SPARQL, RDF’s standardized query API, try adding a SPARQL endpoint (see Figure 5) to dooit by uncommenting and activating the sparql path definition in config/handlers.php. Only read operations are enabled by default. If you extend the feature configuration with load or insert, however, you can give your application a two-way API or even integrate data from the growing semantic web.
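Once the endpoint is enabled, any HTTP client can query it through the SPARQL protocol, which sends the query as a query URL parameter. The Python sketch below uses only the standard library. The endpoint URL is a hypothetical placeholder, since the actual path depends on config/handlers.php. The sketch also assumes the endpoint honors the Accept header for the standard SPARQL JSON results format; some endpoints expect a format parameter instead.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost/dooit/sparql"  # hypothetical URL


def sparql_get_request(endpoint, query):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_select(endpoint, query, timeout=10):
    """Send a SELECT query and return its result bindings (needs a live endpoint)."""
    with urlopen(sparql_get_request(endpoint, query), timeout=timeout) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]


req = sparql_get_request(ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")
print(req.full_url)
# run_select(ENDPOINT, "SELECT ...") would perform the actual HTTP call.
```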
