The json Module
JSON stands for "JavaScript Object Notation," a popular data exchange format, especially for AJAX applications. JavaScript can evaluate JSON directly and get a ready-made data structure to work on. JSON can represent arbitrarily nested lists (arrays in JavaScript) and dictionaries (objects in JavaScript) of numbers, strings, Booleans, and null. You can find the JSON specification in RFC 4627.
Python 3.0's json module provides a pickle-like interface built around the loads and dumps calls. Here's a simple example:
>>> import json
>>> d = {'a': [(1, 4), (2, 5), (3, 6)], 'b': [1, 2, 3, 1, 2, 3]}
>>> json_text = json.dumps(d)
>>> print(json_text)
{"a": [[1, 4], [2, 5], [3, 6]], "b": [1, 2, 3, 1, 2, 3]}
>>> json.loads(json_text)
{'a': [[1, 4], [2, 5], [3, 6]], 'b': [1, 2, 3, 1, 2, 3]}
As you can see, JSON is pretty similar to Python lists and dictionaries. Tuples are converted to arrays, and single quotes are converted to double quotes, but overall, it should look pretty familiar.
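The type conversions can be checked with a quick round trip. This is only a minimal sketch; the dictionary and its key names are made up for illustration:

```python
import json

# Illustrative data only: a tuple, a Boolean, and None.
original = {'point': (1, 4), 'flag': True, 'missing': None}

encoded = json.dumps(original)
print(encoded)   # {"point": [1, 4], "flag": true, "missing": null}

decoded = json.loads(encoded)
print(decoded)   # {'point': [1, 4], 'flag': True, 'missing': None}

# The tuple comes back as a list, so the round trip is not an exact copy.
print(decoded == original)   # False
```

If the distinction between tuples and lists matters to your program, you have to restore it yourself after decoding.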
As a test, I tried calling the Google AJAX search API, which returns data in JSON format. I used the urllib module to get the results for the query "python rocks," which returned some JSON. I then used the json module to decode the results into accessible Python data structures:
from urllib.request import urlopen
from urllib.parse import urlencode
import json
query = urlencode(dict(q='python rocks'))
#url_mask = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&{0}&start={1}&rsz=large'
#url = url_mask.format(query, 0)
url_mask = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&{0}'
url = url_mask.format(query)
# urlopen(...).read() returns bytes; decode them as UTF-8 before parsing
text = urlopen(url).read().decode('utf-8')
response = json.loads(text)
results = response['responseData']['results']
for r in results:
    print(r['url'])
Output:
http://pythonrocks.com/
http://mail.python.org/pipermail/python-list/2000-September/051415.html
http://personalpages.tds.net/~kent37/stories/00020.html
http://personalpages.tds.net/~kent37/blog/
The response data structure contains a lot of information. I drilled down directly to the results (response['responseData']['results']). Each result is a dictionary that uses the following keys: ['GsearchResultClass', 'visibleUrl', 'titleNoFormatting', 'title', 'url', 'cacheUrl', 'unescapedUrl', 'content'].
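The drill-down can be tried offline as well. The sketch below assumes a hand-written stand-in for the response: only the key names come from the article, while the URLs and titles are invented for illustration.

```python
import json

# Hand-written stand-in for a search response; the values are invented and
# only the key names ('responseData', 'results', 'url', 'titleNoFormatting')
# are taken from the article.
sample_text = '''
{
  "responseData": {
    "results": [
      {"url": "http://example.com/a", "titleNoFormatting": "First"},
      {"url": "http://example.com/b", "titleNoFormatting": "Second"}
    ]
  }
}
'''

response = json.loads(sample_text)

# Fall back to empty containers if a level is missing or null,
# instead of raising KeyError or TypeError.
results = (response.get('responseData') or {}).get('results', [])

urls = [r['url'] for r in results]
for r in results:
    print(r['titleNoFormatting'], '->', r['url'])
```

Using get() with a default is a design choice for responses that may omit a level; plain indexing, as in the article, is fine when the structure is guaranteed.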
By default, you get only four results. When I added a couple of query parameters (see the commented lines) to get more results, json broke down. It appeared that the json module couldn't properly handle some of the Unicode in the Google search response, even though the response is valid JSON. A comparison of various Python JSON implementations reports some bugs in Unicode handling, even though the standard library json module is based on simplejson, which gets perfect marks in the Unicode part of that comparison.
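One possible explanation for failures of this kind, offered as an assumption rather than a diagnosis of the author's run, is how the response bytes are turned into text. Decoding the bytes as UTF-8 gives json real characters to parse, whereas calling str() on a bytes object and slicing off the b'...' wrapper leaves non-ASCII bytes as \x escapes, which are not valid JSON. The payload below is made up for illustration:

```python
import json

# Made-up payload standing in for a raw HTTP response body (UTF-8 bytes).
raw = '{"title": "Python \u2013 caf\u00e9"}'.encode('utf-8')

# Decoding first: json sees real characters and parses them.
decoded = json.loads(raw.decode('utf-8'))
print(decoded['title'])

# Using the repr of the bytes instead: non-ASCII bytes become \xNN escapes,
# which the JSON grammar does not allow.
mangled = str(raw)[2:-1]
try:
    json.loads(mangled)
    mangled_parsed = True
except ValueError as exc:   # json.JSONDecodeError is a subclass of ValueError
    mangled_parsed = False
    print('parse failed:', exc)
```

Pure ASCII responses survive both routes, which would be consistent with the problem appearing only once more results came back.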
The ssl Module
The ssl (Secure Sockets Layer) module is a wrapper around the OpenSSL library, which should be available on any modern OS. The ssl module lets you create encrypted sockets and authenticate the other side of the connection. It can be used for both client-side (connect to a secure server) and server-side (accept secure connections from clients) applications. The main function is wrap_socket(), which takes a standard network socket and returns an SSLSocket object. Authentication relies on certificates: a certificate binds a public key to an identity and is normally issued by a certificate authority, while the matching private key stays with its owner. I don't have access to a certificate, so I couldn't test the ssl module; however, here's some sample code from the ssl module's documentation for client-side operation:
import socket, ssl, pprint
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# require a certificate from the server
ssl_sock = ssl.wrap_socket(s,
                           ca_certs="/etc/ca_certs_file",
                           cert_reqs=ssl.CERT_REQUIRED)
ssl_sock.connect(('www.verisign.com', 443))
print(repr(ssl_sock.getpeername()))
print(ssl_sock.cipher())
print(pprint.pformat(ssl_sock.getpeercert()))
# Send a simple HTTP request -- use http.client in actual code.
# In Python 3 the socket expects bytes, hence the b prefix.
ssl_sock.write(b"GET / HTTP/1.0\r\n"
               b"Host: www.verisign.com\r\n\r\n")
# Read a chunk of data. Will not necessarily
# read all the data returned by the server.
data = ssl_sock.read()
# note that closing the SSLSocket will also
# close the underlying socket
ssl_sock.close()
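On newer Python versions the module-level wrap_socket() is no longer available (it was deprecated in 3.7 and removed in 3.12), and the same client-side steps are expressed through an SSLContext. The following is a sketch under that assumption, not a replacement for the documentation sample. The helper names make_client_context and describe_server are hypothetical, and the function that opens a connection is defined but not called here, because it needs network access:

```python
import socket
import ssl


def make_client_context():
    """Return a context that verifies servers against the system CA store."""
    # create_default_context() enables certificate verification and
    # hostname checking for client-side use.
    return ssl.create_default_context()


def describe_server(host, port=443, timeout=10.0):
    """Connect to host over TLS and return peer, cipher and certificate info.

    Hypothetical helper for illustration; calling it needs network access.
    """
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # server_hostname is used for SNI and for hostname verification.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return {
                'peer': tls_sock.getpeername(),
                'cipher': tls_sock.cipher(),
                'certificate': tls_sock.getpeercert(),
            }


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)   # True
print(context.check_hostname)                     # True
```

A call such as describe_server('www.python.org') would then return the same three pieces of information that the sample above prints. Note that the client needs no certificate of its own here, only a set of trusted CA certificates to check the server against.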