ne of Yahoo!’s key goals is to become more open through the Yahoo! Open Strategy (Y!OS). At the heart of Y!OS is an open and extensible platform that allows developers to rapidly access Yahoo! network data and develop applications with access control using an open authentication standard.
Yahoo! makes structured data available to developers through its web services, such as Flickr and Yahoo! Local, and through other sources like RSS feeds or CSV documents. There are also numerous external web services and APIs outside of Yahoo! that provide valuable data. For example, Programmableweb.com shows more than a thousand APIs available on topics ranging from project management to the Bible.
These disparate services require developers to locate the correct URLs for accessing the APIs and the documentation for querying them. Data remains isolated and separated, requiring developers to combine and work on the data after it’s returned to them.
What Is YQL?
The Yahoo! Query Language (YQL) platform provides a mediator service that enables developers to query, filter, and combine data across the web. YQL exposes a SQL-like SELECT syntax that is both familiar to developers and expressive enough for getting the right data. Through YQL’s SHOW and DESC commands, Yahoo! attempts to make YQL self-documenting, enabling developers to discover the available data sources and structure without opening another web browser or reading a manual.
The SELECT statement is the primary verb for YQL and borrows much from a SQL-like syntax:
SELECT what FROM table WHERE filter
“Tables” in YQL refer to external data sources that often contain very large collections of structured data. You can think of each item in a table as a “row” in a more traditional relational database. All data in YQL is treated as XML, and if the underlying table source does not provide XML, it converts into an XML-like representation.
You can join any table or data source with another data source through sub-selects—if each table has a matching value—similar to a foreign key in SQL.
Consider the following YQL statement:
SELECT filename,thumbnail_url FROM search.images(50) WHERE query="hat" AND mimetype LIKE "%jpeg%"
AND height<400 AND width<400 LIMIT 2
This returns the thumbnail URL and filename of up to two results FROM the search.image table for images of "hats" that are jpeg encoded and under a certain size:
yahoo:lang="en-US" yahoo:updated="2008-12-02T12:50:59Z" yahoo:uri="http:// query.yahooapis.com/v1/
http://boss.yahooapis.com/ysearch/images/v1/hat?count=50&start=0&format=xml 320 258 2008.12.01.15:20 RedHat4151.jpg http://ac4.yt-thm-a01.yimg.com/image/45b413c52b2d4a88 red-hat-tshirt-p-lg.jpg http://ac4.yt-thm-a01.yimg.com/image/239a055f9d7c0fb8
You can run this query by simply copying the following URL into your browser:
The SELECT statement's "what" and local filtering clauses use a "dot" style syntax to address parts in the table item data structure. Each dot-separated part in the field path refers to the name of the element (for JSON and XML documents), an attribute, or CDATA/text portion of the element (for XML documents). The more "dots" in a path, the deeper into the data structure the final element is found. This dot syntax provides a simple and expressive mechanism to address parts of an item's structure.Consider the following XML item.
some item title http://foo 37.123
The following are valid dot-paths that refer to different parts of this structure:
The namespace of any attributes or elements in XML is ignored.
Remote vs. Local Processing
YQL accesses a diverse range of services and data. Many queries access data sources with thousands of items in them. To manage these large remote set sizes and to work with existing web services without modification, YQL provides two distinct ways?remote and local?to filter the data being queried in the table.
Remote filtering simply means pushing as much of the filter (or query) to the remote data provider or service. The most common way that services enable their collections to be queried and filtered is through REST-style syntax and query/path parameters. Because these often do not map closely to the result structure that is returned by that query, YQL exposes them explicitly as remote "keys" in the WHERE cause, some of which must be provided. For example, the local.search table requires both "query" and "location" keys (what to search for and where):
SELECT * FROM local.search WHERE query="sushi" AND location="san francisco, ca"
Keys support only "equality" comparisons. Describing a table (DESC local.search) shows what keys or combinations of keys are possible for a given table.
After data arrives at the YQL service, you can filter it through any combination of ANDs, Ors, and comparators. You can refer to particular parts of each item in the result set through the dot-style syntax (the root/item level is not needed). For example, consider the local example above. If you are interested only in restaurants with a high average review, you can locally filter the results using the Rating element like this:
SELECT * FROM local.search WHERE query="sushi" AND location="san francisco, ca" AND Rating.AverageRating>4
By default, YQL only provides 10 items from a table. If you want to get more data locally processed, or return more results, you need to instruct YQL to retrieve more items by inserting the number into parenthesis after the table name:
SELECT * FROM flickr.photos.interestingness(20)
If you want to start at an offset in the table, you can use a second parameter in the parentheses:
SELECT * FROM flickr.photos.interestingness(10,20)
This returns 20 items starting at offset 10. If you'd like YQL to process as much data as it can in 30 seconds (the default timeout for any YQL statement), you can use (0) as the table sizing parameter.
Often, you don't want all of the data in each item that's coming back from a table. For example, perhaps you only want the title of an RSS feed item. The "what" part of the SELECT statement enables you to select (or project) which parts of the table data that satisfy the WHERE clause are actually returned, including any parent parts of the structure that are necessary. Each dot-style path included in the comma separated list selects a sub-tree to return for each item in the table.
Consider a web search example:
select * FROM search.web where query="pizza"
Each item returned from this result looks like this:
pizza chain, with online
ordering, Pizza Hut store finder, coupons, and menu. Also in Spanish.]]=
http://lrd.yahooapis.com/_ylc= =10t5eod6j/**http%3A//www.pizzahut.com/ 2008/12/01 pizzahut.com]]= 12880 Pizza Hut]]= http://www.pizzahut.com/
If you need only the title and URL, for example, you can project only those pieces of the result:
SELECT title, url FROM search.web where query="pizza"
This query returns:
Pizza Hut]]= http://www.pizzahut.com/
One of the most powerful aspects of YQL is its ability to join data across different tables (and web services) through sub-SELECTs. For example, to get your own profile information, use this statement:
SELECT * FROM social.profile where guid=me
The social.profile table returns the profile for a given GUID, the identifier for a person within Yahoo! (and me is a convenience constant that expands the current user's profile key or identifier).
Do this to get your friends:
SELECT * FROM social.connections where owner_guid=me
The result of this table query is a list of friends that don't have much information; it's mostly a list of GUID's.
Now, you can combine both of these tables using a join to obtain all the profiles for each of your friends:
SELECT * FROM social.profile WHERE guid IN(SELECT guid FROM social.connections WHERE owner_guid=me)
In essence, the sub-SELECT needs to return the foreign-keys that are used for the join with the outer-SELECT. The project part of the sub-SELECT needs to point to a leaf node of the sub-SELECTs results, like the GUID for each social connection returned.
You can do joins whenever you get a table that provides input to another table. Consider getting an international weather forecast for any location on the planet. The weather.forecast table provides forecasts but requires a location that is either a U.S. ZIP code or an international location identifier. The weather.com site provides a great service that returns this identifier from a string. So now you can join these two services together to get the weather using a string:
SELECT * FROM weather.forecast WHERE location IN (SELECT id FROM xml WHERE url="http://xoap.weather.com/search/search?WHERE=prague" AND itemPath="search.loc")
This query also uses YQL's ability to bring in data from any place on the Internet through the XML table.
Piping Output Through Post-query Functions
After a SELECT has finished filtering and projecting the items, YQL allows you to add additional "post-query" functions to the results before they are returned. Appending the "pipe" symbol to the end of the statement performs these functions:
SELECT … |sort(field=item.date)SELECT … |unique(field=item.title)| …
Using the Service
Getting Started: The Console
|Figure 1. YQL Console Explore what YQL can do.|
The easiest way to explore what YQL can do, and to start using the service, is to use the console (see Figure 1). The right-hand side of the page shows the set of "built-in" tables that YQL exposes, primarily most of Yahoo!'s web services. You can enter the YQL commands into the main text area at the top of the page and the results display underneath. Click any of the table names or sample queries to cause examples to "run" in the console.
Using the Web Service
After using the console you've probably come up with a few queries that you'd like to execute from your application. Accomplish this by sending a GET request to one of the two YQL web service endpoints, either:
The default output is XML for both endpoints, regardless of the table source that you access. If you like, JSON can append an output parameter to the call:
In addition, the JSON response also supports JSONP to allow script includes in the browser:
Authentication and Access Control
Each table in YQL has an authentication scope required for access. Typically, this corresponds to the type of data the table exposes. Private data, like user profile information, is limited to applications that can verify they are acting on behalf of a user. Public data, like web search results, is generally accessible to any application.
Public tables are accessible through the /v1/public/yql endpoint. You access private tables (and public tables) through the main /v1/yql endpoint, which requires that you sign the call using OAuth. The public URL has stricter rate limiting, so if you plan to use YQL heavily, we recommend you access the OAuth-protected URL.
OAuth is an open standard that enables providers of user data to share some or all of that data with an external application with the user's explicit permission. YQL (and Yahoo!) uses this standard to enable users to share their information, like their name or social network details, with third-party developers (you!).
The second step, if you need it, is to get permission from the user using your application to access their private data in Yahoo!. After you've done that, you'll have an "access token" and an "access token secret."
You can find more information here. Getting a request signed using OAuth may seem a little daunting at first. Fortunately, there are libraries for most languages that take care of the details for you. One of these, the Yahoo! social SDK provides a PHP library that makes accessing YQL and other Yahoo web services easy:
See Listing 1 for an example of how to get the most of your social information from Yahoo!
YQL enables developers to easily access, combine, and filter remote data to fit their application requirements. It provides consistent and unified semantics for accessing user-specific private data and public data in Yahoo!, as well as the rest of the web. Data is returned in a normalized form.More detailed documentation and information can be found here.To learn more about Y!OS and to see how it provides new avenues for you to reach and connect with Yahoo! users like never before, visit Yahoo!'s Open Strategy home page.