The first time I encountered Twitter I dismissed it. At the time, people were keeping "logs" of their day-to-day activities, most of which were just dull. Unless you were actively following someone you cared about, such daily minutiae were irrelevant, and the Twitter interface didn't make it immediately obvious how you could subscribe to the people you cared about. So Twitter faced the critical mass problem that a lot of network-oriented technologies face.
However, when Tim O'Reilly started actively twittering, I followed him and began to realize that the Twitter universe was considerably more active than the last time I visited. Moreover, a subtle change had taken place. No longer limited to people talking about their day-to-day activities, Twitter was increasingly becoming an alternate channel to RSS feeds, a way for users to broadcast links to interesting, amusing, or relevant web content to the people who follow them.
Social applications aside, Twitter is also well suited to all kinds of useful programmatic projects, from invoking web services to creating link-lists to performing semantic analysis. This becomes especially powerful when combined with something like an XML database that can perform periodic queries of content and retrieve only the relevant pieces of information.
This article walks through the creation of a RESTful data application around an XML database, which will demonstrate how to use the Twitter API and how to make such an API work using virtual collections. This simple web application will periodically query the Twitter API for new status messages from a chosen user's account and download them to the database as XML, combining the XML and Atom formats to give access to both.
Twitter API and RESTful Interfaces
The Twitter API is remarkably simple to use, and it contains a number of goodies for XML and JSON developers. The API allows access to most of the primary feeds that Twitter provides, including the general Twitter stream, an authenticated user's personal and friends' streams, replies, personal messages, and system-level status messages. For the most part, these are RESTful interfaces. In essence, each URL is a collection of Twitter status messages (known colloquially as tweets). You use HTTP GET operations to receive a listing of tweets relevant to the particular collection feed, POST operations to add a new tweet to the appropriate collection, and DELETE to remove a tweet from a collection. These are good examples of "virtual collections." The tweet universe is in fact just a single table of tweets with these different views representing different "filterings" of that single list.
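To make the idea of a virtual collection a little more concrete, here is a minimal Python sketch in which one underlying list of tweets is exposed through several filtered views. It is only an illustration of the concept, not a description of how Twitter stores its data: the Tweet fields, the view names, and the TweetStore class are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Tweet:
    id: int
    author: str
    text: str
    in_reply_to: Optional[str] = None  # hypothetical field: user being answered


class TweetStore:
    """A single list of tweets, exposed as several filtered 'collections'."""

    def __init__(self) -> None:
        self._tweets: List[Tweet] = []
        self._next_id = 1

    def post(self, author: str, text: str,
             in_reply_to: Optional[str] = None) -> Tweet:
        # Plays the role of POST: append one tweet to the single table.
        tweet = Tweet(self._next_id, author, text, in_reply_to)
        self._next_id += 1
        self._tweets.append(tweet)
        return tweet

    def delete(self, tweet_id: int) -> None:
        # Plays the role of DELETE: remove the tweet from the single table,
        # which also removes it from every view.
        self._tweets = [t for t in self._tweets if t.id != tweet_id]

    def get(self, view: str, user: Optional[str] = None) -> List[Tweet]:
        # Plays the role of GET: each view is just a filter over the same list.
        views: Dict[str, Callable[[Tweet], bool]] = {
            "public_timeline": lambda t: True,
            "user_timeline": lambda t: t.author == user,
            "replies": lambda t: t.in_reply_to == user,
        }
        keep = views[view]
        return [t for t in self._tweets if keep(t)]


store = TweetStore()
store.post("alice", "Reading about RESTful services")
store.post("bob", "@alice so am I", in_reply_to="alice")
print([t.text for t in store.get("user_timeline", "alice")])
print([t.text for t in store.get("replies", "alice")])
```

Nothing is copied between views here; deleting a tweet from the store removes it from every "collection" at once, which is the behavior the filtering model implies.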
Except for the general feed, to access a Twitter feed you need to provide authentication for a given account. In a tool such as Curl, you can provide the authentication through a command line invocation. For instance, to get just the status messages (tweets) of a given user with the username uname and the password pword, you'd use the following:
curl -u uname:pword http://twitter.com/statuses/user_timeline.format
The format could be "XML," "Atom," "RSS," or "JSON." If the username and password are valid, then the XML format:
curl -u uname:pword http://twitter.com/statuses/user_timeline.xml
returns a collection of status items in the form shown in Listing 1, while the Atom format:
curl -u uname:pword http://twitter.com/statuses/user_timeline.atom
uses the Atom 1.0 specification to generate the output in Listing 2.
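Because the Atom variant follows the Atom 1.0 specification, any namespace-aware XML parser should be able to pull the entries out of it. The following is a minimal Python sketch using only the standard library. The sample feed is a hand-written stand-in rather than real Twitter output, and the function name parse_atom_entries is my own; only the Atom namespace and the entry, id, title, and updated elements come from the Atom specification itself.

```python
import xml.etree.ElementTree as ET

# The Atom 1.0 namespace, as defined by the Atom specification.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written stand-in for a timeline feed (an assumption, not Twitter output).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example timeline</title>
  <id>tag:example.org,2009:timeline</id>
  <updated>2009-01-01T12:05:00Z</updated>
  <entry>
    <title>uname: first status message</title>
    <id>tag:example.org,2009:status/1</id>
    <updated>2009-01-01T12:00:00Z</updated>
  </entry>
  <entry>
    <title>uname: second status message</title>
    <id>tag:example.org,2009:status/2</id>
    <updated>2009-01-01T12:05:00Z</updated>
  </entry>
</feed>"""


def parse_atom_entries(feed_xml):
    """Return one dict per atom:entry with its id, title, and updated values."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
        })
    return entries


for item in parse_atom_entries(SAMPLE_FEED):
    print(item["updated"], item["title"])
```

The same function could be pointed at the body returned by the curl call above, provided the real feed keeps its entries directly under the feed element as Atom requires.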
Technically speaking, these are not APIs; they are RESTful services. An API implies methods with parameters, returning a fixed result (typically invoked via some encapsulated SOAP message launched from bound code). RESTful services more properly describe a representation of a given collection of objects, in this case tweets. While the services do have parameters (including count, ID, page, and so forth), these are generally used to control the paging of the output rather than to perform internal calculations or otherwise invoke methods. The emphasis is on the resources as a collection rather than specific properties of the resources themselves.
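Since the parameters mostly control paging, they map naturally onto a query string appended to the collection URL. The sketch below, again in Python with only the standard library, builds such a URL along with the HTTP Basic authorization header that curl's -u option generates. It constructs the values without sending anything. The helper names timeline_url and basic_auth_header are hypothetical, and count and page are simply the parameters named above; check the service's own documentation for the full set.

```python
import base64
from urllib.parse import urlencode

BASE_URL = "http://twitter.com/statuses"


def timeline_url(fmt="xml", **params):
    """Build a user_timeline URL; params such as count or page become the query string."""
    # Drop parameters that were explicitly passed as None.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/user_timeline.{fmt}"
    return f"{url}?{query}" if query else url


def basic_auth_header(username, password):
    """Return the HTTP Basic Authorization header for the given credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(timeline_url("atom", count=20, page=2))
print(basic_auth_header("uname", "pword"))
```

Note that the parameters only select which slice of the collection is returned; the resource being addressed is still the same timeline.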
In order to add new tweets, Twitter also exposes the update service: http://twitter.com/statuses/update.format, where the service returns the created tweet in the given format (currently just XML or JSON). The service in this case does take two parameters, the status message itself and the parameter in_reply_to_status_id. This latter, optional parameter contains the Twitter ID of a message to which the current message refers back. Typically, you won't see this initial ID as part of the user interface, so this parameter is specifically targeted to Twitter client developers. Thus, from Curl, you could send a new tweet to the current stream as follows:
curl -u uname:pword \
  --data-urlencode "status=Coming to the conclusion that the Twitter API is technically a RESTful Service - RESTful operations on collections of resources (tweets)." \
  http://twitter.com/statuses/update.xml
This returns the XML result set in Listing 3.
It also adds the tweet to the Twitter stream.
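For client developers who would rather assemble the request in code than on the command line, the following Python sketch shows one way the form-encoded body for the update service might be built, including the optional in_reply_to_status_id parameter. It only constructs the request object and never sends it. The function name build_update_request is hypothetical, and a real client would also need to attach the same Basic authentication credentials that curl's -u option supplies.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

UPDATE_URL = "http://twitter.com/statuses/update.{fmt}"


def build_update_request(status, in_reply_to_status_id=None, fmt="xml"):
    """Build (but do not send) a POST request that would add a tweet."""
    fields = {"status": status}
    if in_reply_to_status_id is not None:
        # Optional: the ID of the tweet this message refers back to.
        fields["in_reply_to_status_id"] = str(in_reply_to_status_id)
    # urlencode percent-encodes spaces and punctuation for the form body.
    body = urlencode(fields).encode("ascii")
    # Supplying data makes urllib treat the request as a POST.
    return Request(UPDATE_URL.format(fmt=fmt), data=body)


req = build_update_request("RESTful operations on collections (tweets).",
                           in_reply_to_status_id=12345)
print(req.get_method(), req.full_url)
print(parse_qs(req.data.decode("ascii")))
```

Encoding the body this way keeps characters such as spaces, parentheses, and ampersands in the status text from being misread as part of the form syntax.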