statement's "what" and local filtering clauses use a "dot" style syntax to address parts in the table item data structure. Each dot-separated part in the field path refers to the name of the element (for JSON and XML documents), an attribute, or CDATA/text portion of the element (for XML documents). The more "dots" in a path, the deeper into the data structure the final element is found. This dot syntax provides a simple and expressive mechanism to address parts of an item's structure.
Consider the following XML item.
<title>some item title</title>
<image width=100 height=100 src="http://bar" />
The following are valid dot-paths that refer to different parts of this structure:
The namespace of any attributes or elements in XML is ignored.
Remote vs. Local Processing
YQL accesses a diverse range of services and data. Many queries access data sources with thousands of items in them. To manage these large remote set sizes and to work with existing web services without modification, YQL provides two distinct waysremote and localto filter the data being queried in the table.
Remote filtering simply means pushing as much of the filter (or query) to the remote data provider or service. The
most common way that services enable their collections to be queried and filtered is through REST-style syntax and
query/path parameters. Because these often do not map closely to the result structure that is returned by that query,
YQL exposes them explicitly as remote "keys" in the WHERE cause, some of which must be provided. For
example, the local.search table requires both "query" and "location" keys (what to search for and where):
SELECT * FROM local.search WHERE query="sushi" AND location="san francisco, ca"
Keys support only "equality" comparisons. Describing a table (DESC local.search
) shows what keys or combinations of keys are possible for a given table.
After data arrives at the YQL service, you can filter it through any combination of
, and comparators. You can refer to particular parts of each item in the result set through the dot-style syntax (the root/item level is not needed). For example, consider the local example above. If you are interested only in restaurants with a high average review, you can locally filter the results using the Rating
element like this:
SELECT * FROM local.search WHERE query="sushi" AND location="san francisco, ca" AND Rating.AverageRating>4
By default, YQL only provides 10
items from a table. If you want to get more data locally processed, or return more results, you need to instruct YQL to retrieve more items by inserting the number into parenthesis after the table name:
SELECT * FROM flickr.photos.interestingness(20)
If you want to start at an offset in the table, you can use a second parameter in the parentheses:
SELECT * FROM flickr.photos.interestingness(10,20)
This returns 20
items starting at offset 10
. If you'd like YQL to process as much data as it can in 30
seconds (the default timeout for any YQL statement), you can use (0
) as the table sizing parameter.
Often, you don't want all of the data in each item that's coming back from a table. For example, perhaps you only
want the title of an RSS feed item. The "what" part of the SELECT
statement enables you to select
(or project) which parts of the table data that satisfy the WHERE
clause are actually returned, including any parent parts of the structure that are necessary. Each dot-style path included in the comma separated list selects a sub-tree to return for each item in the table.
Consider a web search example:
select * FROM search.web where query="pizza"
Each item returned from this result looks like this:
<abstract><![CDATA[Official site for the international <b>pizza</b> chain, with online
ordering, <b>Pizza</b> Hut store finder, coupons, and menu. Also in Spanish.]]=</abstract>
If you need only the title and URL, for example, you can project only those pieces of the result:
SELECT title, url FROM search.web where query="pizza"
This query returns:
One of the most powerful aspects of YQL is its ability to join data across different tables (and web services) through sub-SELECT
s. For example, to get your own profile information, use this statement:
SELECT * FROM social.profile where guid=me
table returns the profile for a given GUID, the identifier for a person within Yahoo! (and me
is a convenience constant that expands the current user's profile key or identifier).
Do this to get your friends:
SELECT * FROM social.connections where owner_guid=me
The result of this table query is a list of friends that don't have much information; it's mostly a list of GUID's.
Now, you can combine both of these tables using a join to obtain all the profiles for each of your friends:
SELECT * FROM social.profile WHERE guid IN
(SELECT guid FROM social.connections WHERE owner_guid=me)
In essence, the sub-SELECT
needs to return the foreign-keys that are used for the join with
. The project part of the sub-SELECT
needs to point to
a leaf node of the sub-SELECTs
results, like the GUID for each social connection returned.
You can do joins whenever you get a table that provides input to another table. Consider getting an international weather forecast for any location on the planet. The weather.forecast table provides forecasts but requires a location that is either a U.S. ZIP code or an international location identifier. The weather.com site provides a great service that returns this identifier from a string. So now you can join these two services together to get the weather using a string:
SELECT * FROM weather.forecast WHERE location IN
(SELECT id FROM xml WHERE
This query also uses YQL's ability to bring in data from any place on the Internet through the XML table.
Piping Output Through Post-query Functions
After a SELECT
has finished filtering and projecting the items, YQL allows you to add additional "post-query" functions to the results before they are returned. Appending the "pipe" symbol to the end of the statement performs these functions:
SELECT … |sort(field=item.date)
SELECT … |unique(field=item.title)| …