Functional Basics
Dot Syntax
The
SELECT statement's "what" and local filtering clauses use a "dot" style syntax to address parts in the table item data structure. Each dot-separated part in the field path refers to the name of the element (for JSON and XML documents), an attribute, or CDATA/text portion of the element (for XML documents). The more "dots" in a path, the deeper into the data structure the final element is found. This dot syntax provides a simple and expressive mechanism to address parts of an item's structure.
Consider the following XML item.
<item>
<title>some item title</title>
<link>
<content>http://foo</content>
</link>
<image width=100 height=100 src="http://bar" />
<geo:lat>37.123</geo.lat>
</item>
The following are valid dot-paths that refer to different parts of this structure:
Item.title
Item.link
Item.link.content
Item.image.width
Item.lat
The namespace of any attributes or elements in XML is ignored.
Remote vs. Local Processing
YQL accesses a diverse range of services and data. Many queries access data sources with thousands of items in them. To manage these large remote set sizes and to work with existing web services without modification, YQL provides two distinct waysremote and localto filter the data being queried in the table.
Remote filtering simply means pushing as much of the filter (or query) to the remote data provider or service. The
most common way that services enable their collections to be queried and filtered is through REST-style syntax and
query/path parameters. Because these often do not map closely to the result structure that is returned by that query,
YQL exposes them explicitly as remote "keys" in the WHERE cause, some of which must be provided. For
example, the local.search table requires both "query" and "location" keys (what to search for and where):
SELECT * FROM local.search WHERE query="sushi" AND location="san francisco, ca"
Keys support only "equality" comparisons. Describing a table (
DESC local.search) shows what keys or combinations of keys are possible for a given table.
Local Filtering
After data arrives at the YQL service, you can filter it through any combination of
ANDs,
Ors, and comparators. You can refer to particular parts of each item in the result set through the dot-style syntax (the root/item level is not needed). For example, consider the local example above. If you are interested only in restaurants with a high average review, you can locally filter the results using the
Rating element like this:
SELECT * FROM local.search WHERE query="sushi" AND location="san francisco, ca" AND Rating.AverageRating>4
Paging
By default, YQL only provides
10 items from a table. If you want to get more data locally processed, or return more results, you need to instruct YQL to retrieve more items by inserting the number into parenthesis after the table name:
SELECT * FROM flickr.photos.interestingness(20)
If you want to start at an offset in the table, you can use a second parameter in the parentheses:
SELECT * FROM flickr.photos.interestingness(10,20)
This returns
20 items starting at offset
10. If you'd like YQL to process as much data as it can in
30 seconds (the default timeout for any YQL statement), you can use (
0) as the table sizing parameter.
Projection
Often, you don't want all of the data in each item that's coming back from a table. For example, perhaps you only
want the title of an RSS feed item. The "what" part of the
SELECT statement enables you to select
(or project) which parts of the table data that satisfy the
WHERE clause are actually returned, including any parent parts of the structure that are necessary. Each dot-style path included in the comma separated list selects a sub-tree to return for each item in the table.
Consider a web search example:
select * FROM search.web where query="pizza"
Each item returned from this result looks like this:
<result xmlns="http://www.inktomi.com/">
<abstract><![CDATA[Official site for the international <b>pizza</b> chain, with online
ordering, <b>Pizza</b> Hut store finder, coupons, and menu. Also in Spanish.]]=</abstract>
<clickurl>http://lrd.yahooapis.com/_ylc= =10t5eod6j/**http%3A//www.pizzahut.com/</clickurl>
<date>2008/12/01</date>
<dispurl><![CDATA[www.<b>pizzahut.com</b>]]=</dispurl>
<size>12880</size>
<title><![CDATA[<b>Pizza</b> Hut]]=</title>
<url>http://www.pizzahut.com/</url>
</result>
If you need only the title and URL, for example, you can project only those pieces of the result:
SELECT title, url FROM search.web where query="pizza"
This query returns:
<result xmlns="http://www.inktomi.com/">
<title><![CDATA[<b>Pizza</b> Hut]]=</title>
<url>http://www.pizzahut.com/</url>
</result>
Joins
One of the most powerful aspects of YQL is its ability to join data across different tables (and web services) through sub-
SELECTs. For example, to get your own profile information, use this statement:
SELECT * FROM social.profile where guid=me
The
social.profile table returns the profile for a given GUID, the identifier for a person within Yahoo! (and
me is a convenience constant that expands the current user's profile key or identifier).
Do this to get your friends:
SELECT * FROM social.connections where owner_guid=me
The result of this table query is a list of friends that don't have much information; it's mostly a list of GUID's.
Now, you can combine both of these tables using a join to obtain all the profiles for each of your friends:
SELECT * FROM social.profile WHERE guid IN
(SELECT guid FROM social.connections WHERE owner_guid=me)
In essence, the
sub-SELECT needs to return the foreign-keys that are used for the join with
the
outer-SELECT. The project part of the
sub-SELECT needs to point to
a leaf node of the
sub-SELECTs results, like the GUID for each social connection returned.
You can do joins whenever you get a table that provides input to another table. Consider getting an international weather forecast for any location on the planet. The weather.forecast table provides forecasts but requires a location that is either a U.S. ZIP code or an international location identifier. The weather.com site provides a great service that returns this identifier from a string. So now you can join these two services together to get the weather using a string:
SELECT * FROM weather.forecast WHERE location IN
(SELECT id FROM xml WHERE
url="http://xoap.weather.com/search/search?WHERE=prague"
AND itemPath="search.loc")
This query also uses YQL's ability to bring in data from any place on the Internet through the XML table.
Piping Output Through Post-query Functions
After a
SELECT has finished filtering and projecting the items, YQL allows you to add additional "post-query" functions to the results before they are returned. Appending the "pipe" symbol to the end of the statement performs these functions:
SELECT … |sort(field=item.date)
SELECT … |unique(field=item.title)| …