Login | Register   
LinkedIn
Google+
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

An Introduction to the Yahoo! Query Language Platform : Page 2

Yahoo! has released a base platform that opens up Yahoo! user data via web standard APIs and also provides a framework for how developers, publishers, and advertisers can build applications on and off Yahoo!.


advertisement

Functional Basics

Dot Syntax

The SELECT statement's "what" and local filtering clauses use a "dot" style syntax to address parts in the table item data structure. Each dot-separated part in the field path refers to the name of the element (for JSON and XML documents), an attribute, or CDATA/text portion of the element (for XML documents). The more "dots" in a path, the deeper into the data structure the final element is found. This dot syntax provides a simple and expressive mechanism to address parts of an item's structure. Consider the following XML item.

<item> <title>some item title</title> <link> <content>http://foo</content> </link> <image width=100 height=100 src="http://bar" /> <geo:lat>37.123</geo.lat> </item>

The following are valid dot-paths that refer to different parts of this structure:

Item.title Item.link Item.link.content Item.image.width Item.lat

The namespace of any attributes or elements in XML is ignored.



Remote vs. Local Processing

YQL accesses a diverse range of services and data. Many queries access data sources with thousands of items in them. To manage these large remote set sizes and to work with existing web services without modification, YQL provides two distinct ways—remote and local—to filter the data being queried in the table.

Remote filtering simply means pushing as much of the filter (or query) to the remote data provider or service. The most common way that services enable their collections to be queried and filtered is through REST-style syntax and query/path parameters. Because these often do not map closely to the result structure that is returned by that query, YQL exposes them explicitly as remote "keys" in the WHERE cause, some of which must be provided. For example, the local.search table requires both "query" and "location" keys (what to search for and where):

SELECT * FROM local.search WHERE query="sushi" AND location="san francisco, ca"

Keys support only "equality" comparisons. Describing a table (DESC local.search) shows what keys or combinations of keys are possible for a given table.

Local Filtering

After data arrives at the YQL service, you can filter it through any combination of ANDs, Ors, and comparators. You can refer to particular parts of each item in the result set through the dot-style syntax (the root/item level is not needed). For example, consider the local example above. If you are interested only in restaurants with a high average review, you can locally filter the results using the Rating element like this:

SELECT * FROM local.search WHERE query="sushi" AND location="san francisco, ca" AND Rating.AverageRating>4

Paging

By default, YQL only provides 10 items from a table. If you want to get more data locally processed, or return more results, you need to instruct YQL to retrieve more items by inserting the number into parenthesis after the table name:

SELECT * FROM flickr.photos.interestingness(20)

If you want to start at an offset in the table, you can use a second parameter in the parentheses:

SELECT * FROM flickr.photos.interestingness(10,20)

This returns 20 items starting at offset 10. If you'd like YQL to process as much data as it can in 30 seconds (the default timeout for any YQL statement), you can use (0) as the table sizing parameter.

Projection

Often, you don't want all of the data in each item that's coming back from a table. For example, perhaps you only want the title of an RSS feed item. The "what" part of the SELECT statement enables you to select (or project) which parts of the table data that satisfy the WHERE clause are actually returned, including any parent parts of the structure that are necessary. Each dot-style path included in the comma separated list selects a sub-tree to return for each item in the table.

Consider a web search example:

select * FROM search.web where query="pizza"

Each item returned from this result looks like this:

<result xmlns="http://www.inktomi.com/"> <abstract><![CDATA[Official site for the international <b>pizza</b> chain, with online
ordering, <b>Pizza</b> Hut store finder, coupons, and menu. Also in Spanish.]]=</abstract> <clickurl>http://lrd.yahooapis.com/_ylc= =10t5eod6j/**http%3A//www.pizzahut.com/</clickurl> <date>2008/12/01</date> <dispurl><![CDATA[www.<b>pizzahut.com</b>]]=</dispurl> <size>12880</size> <title><![CDATA[<b>Pizza</b> Hut]]=</title> <url>http://www.pizzahut.com/</url> </result>

If you need only the title and URL, for example, you can project only those pieces of the result:

SELECT title, url FROM search.web where query="pizza"

This query returns:

<result xmlns="http://www.inktomi.com/"> <title><![CDATA[<b>Pizza</b> Hut]]=</title> <url>http://www.pizzahut.com/</url> </result>

Joins

One of the most powerful aspects of YQL is its ability to join data across different tables (and web services) through sub-SELECTs. For example, to get your own profile information, use this statement:

SELECT * FROM social.profile where guid=me

The social.profile table returns the profile for a given GUID, the identifier for a person within Yahoo! (and me is a convenience constant that expands the current user's profile key or identifier).

Do this to get your friends:

SELECT * FROM social.connections where owner_guid=me

The result of this table query is a list of friends that don't have much information; it's mostly a list of GUID's.

Now, you can combine both of these tables using a join to obtain all the profiles for each of your friends:

SELECT * FROM social.profile WHERE guid IN (SELECT guid FROM social.connections WHERE owner_guid=me)

In essence, the sub-SELECT needs to return the foreign-keys that are used for the join with the outer-SELECT. The project part of the sub-SELECT needs to point to a leaf node of the sub-SELECTs results, like the GUID for each social connection returned.

You can do joins whenever you get a table that provides input to another table. Consider getting an international weather forecast for any location on the planet. The weather.forecast table provides forecasts but requires a location that is either a U.S. ZIP code or an international location identifier. The weather.com site provides a great service that returns this identifier from a string. So now you can join these two services together to get the weather using a string:

SELECT * FROM weather.forecast WHERE location IN (SELECT id FROM xml WHERE url="http://xoap.weather.com/search/search?WHERE=prague" AND itemPath="search.loc")

This query also uses YQL's ability to bring in data from any place on the Internet through the XML table.

Piping Output Through Post-query Functions

After a SELECT has finished filtering and projecting the items, YQL allows you to add additional "post-query" functions to the results before they are returned. Appending the "pipe" symbol to the end of the statement performs these functions:

SELECT … |sort(field=item.date) SELECT … |unique(field=item.title)| …



Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap