Implementing the Virtual Router and the XProc Engine
The code for the virtual router is actually quite simple (see Listing 10).
Most of the router code just involves determining which action is appropriate and retrieving the initial collection references. These references make heavy use of an imported library (xproc.xq), including the xproc:get-parameter() and xproc:set-parameter() functions, which retrieve or set critical information that will be needed during pipeline processing. Once everything is determined, the relevant action is passed to the xproc:process-action() method. The result is then passed into a simple XHTML page, along with a debugging option (set with the debug query-string parameter) that displays which parameters are in effect at the end of the processing.
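As a concrete sketch, the router's dispatch logic might look something like the following. The module URI, the services.xml structure, and the exact xproc:* signatures here are assumptions for illustration, not the actual Listing 10 code:

```xquery
xquery version "1.0";
(: The module URI and function signatures below are assumptions :)
import module namespace xproc="http://www.xmltoday.org/xproc" at "xproc.xq";

(: Work out which action the request maps to :)
let $action-name := request:get-parameter("action", "view")
let $debug       := request:get-parameter("debug", "false")

(: Store information the pipeline steps will need later :)
let $ignore := xproc:set-parameter("collection", "/db/twitter/statuses")

(: Retrieve the matching action definition from services.xml :)
let $action := doc("/db/twitter/services.xml")//action[@name eq $action-name]

(: Run the pipeline and wrap the result in a simple XHTML page :)
return
    <html xmlns="http://www.w3.org/1999/xhtml">
        <body>
            {xproc:process-action($action)}
            {if ($debug eq "true")
             then <pre>collection = {xproc:get-parameter("collection")}</pre>
             else ()}
        </body>
    </html>
```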
This virtual router does have one fairly significant design flaw: the XHTML document itself. In a full pipeline, the XHTML would likely be produced by an appropriate XInclude or XSLT transformation step rather than built directly in the router. Because much of this is a proof of concept, however, this isn't a fatal flaw, and subsequent versions will handle it more appropriately.
The real magic is done by the xproc.xq library (see Listing 11).
Aside from three utility functions (xproc:get-param, xproc:set-param and xproc:twitter-format, which will eventually be moved into their own libraries as xrx:get-param, xrx:set-param and twitter:twitter-format, respectively), most of the work is handled through recursion. The process-action() function invokes the first pipeline element in the action, which in turn calls process-pipeline(). The process-pipeline() function sets up the appropriate environment and then calls process-pipe() to handle the first pipe in the sequence.
It might be tempting to just use a for/in expression to loop through the pipes, but that approach simply generates a sequence of independent results rather than feeding each pipe's output into the next. Instead, after process-pipe determines which pipe handler needs to be called, it tests whether it has reached the last pipe in its sequence. If it has, the results are passed back. Otherwise, the output is passed back in as the input to the next pipe in the sequence (using the following-sibling::* axis to determine what that particular step is).
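The recursive walk just described can be sketched as follows; the handler dispatch and the xproc:run-* helpers are hypothetical simplifications of the real library:

```xquery
declare function xproc:process-pipe($pipe as element(), $input as item()*) as item()* {
    (: Dispatch to the handler for this kind of pipe (hypothetical helpers) :)
    let $output :=
        if (local-name($pipe) eq "xslt") then xproc:run-xslt($pipe, $input)
        else if (local-name($pipe) eq "xquery") then xproc:run-xquery($pipe, $input)
        else $input
    (: Locate the next step via the following-sibling axis :)
    let $next := $pipe/following-sibling::*[1]
    return
        if (empty($next)) then
            (: Last pipe in the sequence: return the results :)
            $output
        else
            (: Otherwise feed this pipe's output in as the next pipe's input :)
            xproc:process-pipe($next, $output)
};
```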
XProc is capable of creating subordinate pipelines so that you can have pipes running in parallel or as alternatives, but even here the same tree-walking algorithms can be applied to walk through the pipeline setup. In general, the key is to understand that a pipeline step (what has been referred to as a pipe) is essentially a node of context, and at any given time only one such context is applicable in the overall pipeline. The final XProc library will be considerably more complex, but even that differs only in scale, not design.
Updating the Database
One of the key things to remember when working with RESTful interfaces is that they do not preclude back-end actions. For instance, the application described here is essentially a spider: it reads content from an external site (in this case Twitter) and stores the results internally. This operation can occur concurrently with the RESTful interfaces because it originates from the server itself rather than from a client request.
In this particular instance, the update.xq script handles retrieving the remote feed and updating the database (see Listing 12).
The XQuery script in this case can actually be invoked in a couple of ways:
- Use an external cron or similar service to call the script periodically using the REST URI for the script:
0,5,10,15,20,25,30,35,40,45,50,55 * * * * wget http://localhost:8080/exist/rest//db/twitter/update.xq
This would invoke the code once every five minutes.
- Set up a startup.xq script that would use the internal scheduler to invoke the operation:
declare namespace scheduler="http://exist-db.org/xquery/scheduler";
scheduler:schedule-xquery-cron-job("/db/twitter/update.xq", "0 0/5 * * * ?")
There's no real difference in effect between the two, except that the latter runs entirely within the database, so the update script never needs to be invoked over an external HTTP call, making it the more secure option.
Note again that this provides only a bare-bones updater. In practice, it's likely that the update.xq file would perform a certain degree of filtering upon the incoming data, such as allowing only status updates if a URL is contained in the text body. After each iteration, zero or more status updates that merge the XML and Atom formats of each status will be displayed.
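As an illustration of that kind of filtering, a hypothetical predicate that keeps only statuses whose body contains a URL might look like this (the element names are assumptions about the stored format):

```xquery
(: Keep only status updates whose text body contains a URL.
   The <status>/<text> structure is assumed for illustration. :)
declare function local:filter-statuses($statuses as element(status)*)
        as element(status)* {
    for $status in $statuses
    where matches($status/text, "https?://")
    return $status
};
```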
The httpclient: namespace provides the mechanism for actually retrieving the external feeds. This is a fairly powerful tool: you can perform all of the HTTP verbs with the interface, and you can use it to work with authentication (the above example illustrates the use of BASIC authentication with the GET method).
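For reference, a GET with BASIC authentication through the httpclient module looks roughly like this; the feed URL and credentials are placeholders:

```xquery
declare namespace httpclient="http://exist-db.org/xquery/httpclient";

(: BASIC auth is just a base64-encoded "user:password" pair in a header :)
let $credentials := util:base64-encode("username:password")
let $headers :=
    <headers>
        <header name="Authorization" value="Basic {$credentials}"/>
    </headers>
return
    (: false() means don't persist the response into the database :)
    httpclient:get(xs:anyURI("http://twitter.com/statuses/friends_timeline.xml"),
                   false(), $headers)
```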
I leave it to you to create a Twitter status updater from the user interface. As a hint, it uses httpclient:post().
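As a nudge in that direction, here is a rough sketch; the endpoint, credentials and the form-encoded status field reflect the Twitter API of the period and should be treated as assumptions:

```xquery
declare namespace httpclient="http://exist-db.org/xquery/httpclient";

let $status  := request:get-parameter("status", "")
let $headers :=
    <headers>
        <header name="Authorization"
                value="Basic {util:base64-encode('username:password')}"/>
        <header name="Content-Type"
                value="application/x-www-form-urlencoded"/>
    </headers>
(: Twitter's status API of the period took a form-encoded status field :)
let $body := concat("status=", encode-for-uri($status))
return
    httpclient:post(xs:anyURI("http://twitter.com/statuses/update.xml"),
                    $body, false(), $headers)
```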
Too Much Code?
Looking at the amount of code involved here, it may seem like overkill for a simple web application. (The source code for the application is available at the XMLToday.org site.) In fact, it is considerably simpler to build an application using parameters and an XML-RPC-like approach, provided that this is the only thing you will ever do with the application.
However, much of this is a proof of concept to illustrate the idea of RESTful services. Once the foundational code is in place, adding or modifying a service becomes generally as simple as modifying the services.xml document to add a new service, method, face or pipeline. Pipelines can also be combined and stacked to handle more complex logic, such as validating incoming data against existing schemas. You also can use them to integrate multiple data stream formats into a single document, to generate RSS or Atom feeds (or JSON), and more.
What's more, by treating your application as a set of resource collections (real or virtual), you significantly reduce the complexity of your interfaces for users, you can provide different levels of display based upon authorization levels, and you make it easier to define components (both client and server) that can readily interact with your data, while still giving you a modicum of protection against corruption or abuse of that data.