Turn Twitter Into Your Personal Assistant

Microblogging service Twitter has become a disruptive everyday tool. It is increasingly replacing not only instant messaging clients, but also social bookmarking sites, interest tracking applications, support forums, email, and (to a certain extent) classical blogs.

A few simple conventions, together with RDF and SPARQL, can turn your Twitter feeds into rich information streams, which you can then use for a more productive microblogging experience.

The following sections explain how to:

  • Enhance microposts with machine-extractable data
  • Query the extracted data with SPARQL
  • Generate custom streams and reports to support your personal workflow

You can reproduce the examples in this article with the supplied source code that contains early components of a semantic microblogging system.

System Setup

Download the source code archive (smesher.zip) and copy its contents to your web server. Follow the setup instructions in the readme.txt file; you will need Apache, PHP, and a MySQL database. The project consists of five directories, with application-specific sub-directories in code/ and themes/:

  • cache: should be write-enabled, used for CSS and JavaScript documents
  • code:
    • arc: the core RDF toolkit
    • trice: reusable framework components
    • smr: the project controller, custom scripts, and templates
  • config: database configuration and path dispatching rules
  • logs: should be write-enabled, used for system messages
  • themes: CSS and images

Step 1: Subscribe to Your Twitter Feeds

First, you need some input data to work with, such as the most recent posts mentioning your username or interesting keywords (see Figure 1). Luckily, Twitter provides Atom feeds for all pages, and the demo system includes an Atom-to-RDF converter, so you don’t have to learn how the Twitter API works. You can directly import user timelines and search results instead. Click Settings in the upper right navigation to open a simple Feeds form. For the sake of simplicity, you only have to enter your username and a set of tags that are then used internally to generate corresponding feed URLs.
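To illustrate what the Feeds form does internally, the following Python sketch builds feed URLs from a username and a list of tags. It is an illustration, not code from the bundle: the URL patterns mirror Twitter's historical Atom endpoints, and the function name is made up.

```python
# Hypothetical sketch: derive Atom feed URLs from a username and a set of
# tags, roughly as the Feeds form does internally. The URL patterns follow
# Twitter's historical Atom endpoints and may not match the demo code exactly.
from urllib.parse import quote

def build_feed_urls(username, tags):
    urls = [
        # the user's own timeline
        "http://twitter.com/statuses/user_timeline/%s.atom" % quote(username),
        # posts mentioning the user
        "http://search.twitter.com/search.atom?q=%s" % quote("@" + username),
    ]
    # one search feed per subscribed tag
    for tag in tags:
        urls.append("http://search.twitter.com/search.atom?q=%s" % quote("#" + tag))
    return urls

for url in build_feed_urls("example_user", ["rdf", "sparql"]):
    print(url)
```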

When you are done, return to the main screen by clicking on the logo in the upper left corner. Instead of cronjobs or background processes, the demo simply checks and periodically refreshes your subscriptions when you access the start page. After a few seconds (you might have to reload the page to see the changes), the first items should appear, as shown in Figure 2.
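The refresh-on-access strategy can be sketched in a few lines. This Python fragment is an illustration only (the demo is written in PHP, and all names and the 300-second interval are assumptions): a subscription is re-fetched only when its cached copy is older than a threshold.

```python
import time

REFRESH_INTERVAL = 300  # seconds; the demo's actual interval may differ

def due_for_refresh(last_fetched, now=None, interval=REFRESH_INTERVAL):
    """Return True when a subscription's cached copy is stale."""
    now = time.time() if now is None else now
    return last_fetched is None or now - last_fetched >= interval

def refresh_subscriptions(subscriptions, fetch, now=None):
    """Re-fetch every stale subscription; invoked on each start-page hit."""
    for sub in subscriptions:
        if due_for_refresh(sub.get("last_fetched"), now):
            fetch(sub["url"])
            sub["last_fetched"] = time.time() if now is None else now
```

Because the check runs on every page view, no cronjob or daemon is required; the cost is that feeds only update while someone is actually using the application.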


Figure 1. Import settings: Based on the provided information, the demo application imports a selection of microfeeds.
 
Figure 2. Initial timeline: So far, the microposts can (only) be filtered by author.

Step 2: Explore the Data

Figure 3. SPARQL API Example: The COUNT feature is not part of the current SPARQL specification yet, but a new W3C Working Group has just launched to explore aggregate functions and similar extensions.

The individual items carry a number of structured elements, which you can use for formatting (for example, displaying an image instead of a raw avatar URL) or basic filtering (for example, by author). Together with SPARQL's REGEX function, you can already run some interesting queries against the API at /sparql. For example, this SPARQL query returns the names and Twitter accounts of people who mentioned "Berners-Lee" in their posts:

SELECT ?author ?account WHERE {
  ?post a sioct:MicroBlogPost ;
        dc:creator ?author ;
        sioc:has_creator ?account ;
        content:encoded ?content .
  FILTER(REGEX(?content, "Berners-Lee", "i"))
}

The data itself is not too spectacular, but the exciting part is that a semantic API lets you retrieve exactly the elements you need (see Figure 3). Twitter's search feature can only return a list of posts; SPARQL lets you generate a list of persons, dates, or any other available attribute. This greatly simplifies data integration and repurposing.

Step 3: Increase the Granularity

While the default structures are a handy starting point, the really interesting data is still hidden in the post’s body. People are addressed (leading @name) or mentioned (@name somewhere in the text), hashtags (#tag) and links (http://…) are embedded, and quoted Tweets are marked up with a leading RT.

The demo system contains a PHP class (located at code/smr/SMR_RDFExtractor.php) that auto-extracts these elements from the otherwise opaque content and turns them into RDF triples. The converter is based on simple regular expressions and you can extend them with custom patterns (more on this later).
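The extraction idea itself is language-agnostic. As an illustration, here is a Python sketch of the same approach; the patterns are simplified guesses and are not copied from SMR_RDFExtractor.php, which handles more edge cases.

```python
import re

def extract_elements(text):
    """Pull mentions, hashtags, links, and retweet status out of a micropost.

    Simplified, hypothetical patterns -- not the ones used by the PHP bundle.
    """
    return {
        # @name anywhere in the text (a leading @name is an address)
        "mentions": re.findall(r"@(\w+)", text),
        # #tag hashtags
        "tags": re.findall(r"#(\w+)", text),
        # embedded http(s) links
        "links": re.findall(r"https?://\S+", text),
        # quoted tweets are marked up with a leading RT
        "is_retweet": text.startswith("RT "),
    }

post = "RT @timberners_lee: Linked Data rocks #rdf http://example.org/paper"
print(extract_elements(post))
```

Each extracted element would then be written to the RDF store as a triple about the post, which is what makes the faceted filtering in the next step possible.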

After the granular information is added to the RDF store, you can add corresponding filters to the main view. The facets are defined in code/smr/options/SMR_Options_DefaultBox.php. Add entries to the getTabs method, and then write a matching method that specifies the SPARQL pattern with its RDF relation:

 
function getTabs() {
  return array(
    [...]
    'tags' => array('label' => 'Tags'),
    'users' => array('label' => 'Mentioned Users'),
    'links' => array('label' => 'Links'),
  );
}

function getUsersTabHTML() {
  $pattern = '?res smr:mentionedUser ?val . ';
  return $this->getFilterList($pattern, 'smr:mentionedUser');
}

function getLinksTabHTML() {
  $pattern = '?res smr:link ?val . ';
  return $this->getFilterList($pattern, 'smr:link');
}

Figure 4. Filtered Stream: The advanced facets help you find out who re-tweeted any of your posts, which posts contain a certain link, or which links are popular in general.

The application can now generate clickable filter lists (see Figure 4). You can browse your subscriptions in a more fine-grained way, for example, to discover popular links or socially active users.

Step 4: Create Custom Reports with SPARQL

Fine-grained stream filtering is already helpful in terms of noise reduction and interest tracking. But in order to let Twitter assist you even more, you will also need quick access to custom query results. After all, it’s not the post, but the information in the post that enables automation. One option is to simply bookmark query URLs from the SPARQL API. For example, a query to generate a list of your online contacts (not the people in your followers list, but those you really interact with) is not too complicated:

SELECT DISTINCT ?name WHERE {
  # people mentioned by me
  ?post sioc:has_creator ?account ;
        smr:mentionedUser ?name .
  ?account rdfs:label "your_twitter_username" .
  # people who mentioned me
  ?post2 sioc:has_creator ?account2 ;
         smr:mentionedUser "your_twitter_username" .
  ?account2 rdfs:label ?name .
}
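Bookmarking such a query simply means URL-encoding it into a GET request against the endpoint. The /sparql path comes from the article; the `format` parameter name and the helper below are illustrative assumptions, not the demo's documented API.

```python
from urllib.parse import urlencode

def sparql_bookmark_url(base, query, fmt="json"):
    """Build a bookmarkable GET URL for a SPARQL endpoint.

    The 'format' parameter name is an assumption about the demo API.
    """
    return base + "?" + urlencode({"query": query, "format": fmt})

query = 'SELECT DISTINCT ?name WHERE { ?post smr:mentionedUser ?name . }'
url = sparql_bookmark_url("http://localhost/sparql", query)
print(url)
```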

A single SPARQL operation alone might not be enough, though, especially if you want to inject additional data or reformat the results. In this case, you can extend the main page with your own tabs. Open code/smr/SMR_ViewBox.php and add your own entries to the getTabs method. This works similarly to extending the filters, but this time you have to create a view class in code/smr/custom/ for each new tab:

function getTabs() {
  return array(
    'all' => array('label' => 'All posts'),
    'contacts' => array('label' => 'My Contacts'),
    'bookmarks' => array('label' => 'Bookmarks'),
  );
}

function getAllTabHTML() {
  $tab = Trice::getObject('SMR_Stream_Tab', $this->a, $this->caller);
  return $tab->getHTML();
}

function getContactsTabHTML() {
  $tab = Trice::getObject('SMR_Custom_ContactsTab', $this->a, $this->caller);
  return $tab->getHTML();
}

function getBookmarksTabHTML() {
  $tab = Trice::getObject('SMR_Custom_BookmarksTab', $this->a, $this->caller);
  return $tab->getHTML();
}
 
Figure 5. Custom Tabs with Bookmarks and Proven Contacts: The bookmark query was aligned with the filtering mechanism, so that you can use facets to narrow down the results.

You can also fine-tune the stylesheet information in code/smr/custom/custom.css if you like. Listing 1 contains the complete class for a basic bookmarks tab that orders entries by popularity (see Figure 5).

Other possible use cases include project reports or a birthday reminder. To a certain extent, you can implement the former based on simple tags, but the latter requires the ability to detect and extract the month and calendar day of a person's birthday from a micropost.

Step 5: Type Your Tags and Run More Powerful Queries

You can now filter your microposts and create custom reports. However, as just mentioned, a personal assistant needs even more granularity. Technically, this is not much of a problem, as the data model of your RDF-driven application can be freely extended at run-time. The question is rather how to add the functionality to existing Twitter clients: you want more structure than plain tags provide, but you are restricted to a simple input form and a maximum length of 140 characters. The solution proposed in this article is a slightly extended tag syntax, similar to machine tags on Flickr, but for now without the namespace-qualified keys.

You can tweak the regular expressions in the RDFExtractor (code/smr/SMR_RDFExtractor.php) to add support for such typed tags that follow a #key=value pattern. (You actually don't have to write this method yourself; it's already part of the code bundle.) Now you can add something like #birthday #month=08, #todo #priority=A, or #done #task="DevX article" #hours=16 to your tweets and let the application extract the classified tags (which use a local my: namespace in the resulting RDF). Create conventions you feel comfortable with and let them evolve.
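The typed-tag convention is easy to reproduce. Here is a Python sketch of a #key=value extractor; it is simplified and hypothetical, and the bundled PHP method may differ, for example in how quoted values are handled.

```python
import re

# Matches #key=value, where value is either quoted ("DevX article") or a
# single token (08, A, 16). Simplified; the bundled PHP patterns may differ.
TYPED_TAG = re.compile(r'#(\w+)=(?:"([^"]*)"|(\S+))')

def extract_typed_tags(text):
    """Return the #key=value pairs of a micropost as a dict."""
    return {key: quoted if quoted else plain
            for key, quoted, plain in TYPED_TAG.findall(text)}

print(extract_typed_tags('#done #task="DevX article" #hours=16'))
```

Plain tags like #done carry no "=" and are left to the ordinary hashtag extraction; only the classified pairs end up as my: properties in the store.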

 
Figure 6. Basic Summary of Working Hours: The report uses SPARQL+ to aggregate working hours and then groups the results by related tags.

Typed tags open the door to very sophisticated and automated reports on top of Twitter, for example, a personal software bug tracker (see Figure 6):

SELECT DISTINCT ?post ?prio ?text WHERE {
  ?post smr:tag "todo" , "bug" ;
        my:priority ?prio ;
        content:encoded ?text .
}
ORDER BY ASC(?prio)

or a calculator for working hours:

SELECT ?project SUM(?hours) as ?sum WHERE {
  ?post my:hours ?hours ;
        my:project ?project .
}
GROUP BY ?project

or upcoming birthdays:

SELECT ?text WHERE {
  ?post content:encoded ?text ;
        smr:tag "birthday" ;
        my:month "2" ;
        my:day ?day .
  FILTER(?day > 10)
}
ORDER BY ?day

Further Steps

This article demonstrated the basic building blocks and possibilities for accessing and processing semantic Twitter posts. The obvious next step is to create a dedicated microblogging client that keeps selected tweets or tags private. A long-running background script that sends alerts for pre-defined events would also be handy. Another compelling idea is a script that uses SPARQL to combine the simple working hours example with external information to generate a more complete project report; or perhaps a monthly invoice.
