In March of this year, Amazon.com opened its servers to developers with an online-storage offering. As described in a previous DevX news article, Amazon.com's Simple Storage Service (S3) provides access to a large volume of storage capacity through a relatively simple Web-services API. Unofficial testing suggests the service is fast enough to support applications requiring near real-time access to data. Further, the service's pricing makes it a great option for a diverse set of applications requiring fast and reliable online storage.
Amazon's S3 offering is only the most recent step in a larger trek to provide a wide range of developer-facing services. Among Amazon.com's current Web service offerings
are the E-Commerce Service (access to extensive information about the books, music, and movies that Amazon sells), Historical Pricing (historical sales and volume information), and the Alexa line of offerings (which lets developers build custom search engines based on an extensive repository of Web pages and associated information). Amazon also has the Amazon Simple Queue Service and the Mechanical Turk Service in beta at the moment.
This article focuses on delivering a simple application built to store information using the Amazon S3 service. You will learn how the service works at a high level, including how the storage space is organized. The article then drills down into the methods used to store and retrieve data using the S3 SOAP API. Finally, you can inspect a sample application built on S3's SOAP interface. For a more complete marketing-style description of the S3 service, visit Amazon's official S3 homepage.
Amazon S3 Pricing
|Author Note: The Amazon S3 service was designed to support a very wide range of applications, some of which may have very large storage requirements and need specific access control mechanisms. The service's APIs include methods for handling large data objects efficiently and for enabling fine-grained access control. Those methods are not covered in this article, which focuses instead on how you manipulate data using the service. Consult Amazon's published documentation for more information on these topics.
Amazon has positioned S3 as a zero-entry-cost solution, requiring no up-front costs from developers signing up for the service; in other words, the cost of the service depends solely on the data volume you use: there are no minimum or recurring overhead fees. Storage costs $0.15 per GB per month, so 100 GB of storage would cost only $15 per month. However, in addition to storage costs, there are significant costs for transferring data: $0.20 per GB transferred, and Amazon charges for data transfer in both directions. Nevertheless, it's difficult to imagine building a storage service that costs less while matching the capacity, speed, and reliability claimed by Amazon.
Still, the cost of data transfer might all but eliminate the feasibility of using the service for applications such as backing up personal computers. For a 50 GB backup set, the user would pay $7.50 per month for storage and at least $10.00 per month for data transfer. That $10.00 is likely to increase dramatically, because traditional backup applications perform multiple from-scratch backups each month along with many more incremental backups, and retrieving data from the service incurs an additional charge. At $20-25 per month, S3 seems too costly for use as a personal backup service. However, the service remains a good fit for many other applications, as you will discover once you understand the S3 API.
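The arithmetic above is easy to capture in a short calculator. The following Python sketch uses the rates quoted in this article ($0.15 per GB-month of storage, $0.20 per GB transferred in either direction); the function name and parameters are this article's own illustration, not anything provided by Amazon, and actual pricing may change.

```python
STORAGE_RATE = 0.15   # USD per GB per month, as quoted in this article
TRANSFER_RATE = 0.20  # USD per GB transferred, charged in both directions

def monthly_cost(stored_gb, uploaded_gb=0.0, downloaded_gb=0.0):
    """Estimate one month's S3 bill under the article's quoted rates."""
    storage = stored_gb * STORAGE_RATE
    transfer = (uploaded_gb + downloaded_gb) * TRANSFER_RATE
    return storage + transfer

# The article's example: a 50 GB backup set uploaded once during the month.
print(monthly_cost(50, uploaded_gb=50))  # about $17.50: $7.50 storage + $10.00 transfer
```

Retrievals would add `downloaded_gb` charges on top, which is why a restore-heavy month pushes the bill toward the $20-25 range mentioned above.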
High-Level Functionality Overview
As advertised by Amazon.com, "Amazon S3 is intentionally built with a minimal feature set." It provides the developer with the ability to read, write, and delete "objects." At the highest level, objects are stored in "buckets." Buckets give the developer a way to associate a namespace with each object. In addition, the system can maintain access control at the bucket level, simplifying the maintenance that would otherwise come with managing access control object by object. This scheme also requires bucket names to be unique across the entire S3 service. Unfortunately, each S3 account is limited to 100 buckets. It is unclear why Amazon imposed this cap, which seems to constrain the diversity of applications that can be built using S3.
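The two bucket rules just described (service-wide unique names and a 100-bucket cap per account) can be modeled with a small in-memory sketch. The class and method names here are purely illustrative; they are not part of the actual S3 API.

```python
MAX_BUCKETS_PER_ACCOUNT = 100  # the per-account limit described above

class S3Namespace:
    """Toy in-memory model of the S3 bucket namespace (illustrative only)."""

    def __init__(self):
        self.buckets = {}  # bucket name -> owning account; names are global

    def create_bucket(self, account, name):
        # Bucket names are unique across the entire service, not per account.
        if name in self.buckets:
            raise ValueError(f"bucket name '{name}' is already taken service-wide")
        owned = sum(1 for owner in self.buckets.values() if owner == account)
        if owned >= MAX_BUCKETS_PER_ACCOUNT:
            raise ValueError("account has reached its 100-bucket limit")
        self.buckets[name] = account

ns = S3Namespace()
ns.create_bucket("alice", "photos")
# ns.create_bucket("bob", "photos")  # would fail: "photos" is taken service-wide
```

The global-uniqueness rule is why real applications typically prefix bucket names with something account-specific, rather than competing for short generic names.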
There's a five-gigabyte limit on objects stored in S3, but you can store an unlimited number of objects in each bucket. You reference objects using a unique developer-assigned key; object-key uniqueness is enforced at the bucket level. While updates to objects are not officially supported, you can accomplish the same result by writing an object with the same key as an existing object, effectively replacing it.
In addition to storing object data, developers can associate metadata with each object. Metadata entries are key-value associations that are stored with the object. Developers may create any metadata entries necessary to support the application: Amazon doesn't publish a maximum number of metadata entries that may be associated with an object.
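The object semantics described in the last two paragraphs (per-bucket key uniqueness, "update" by rewriting the same key, the 5 GB object limit, and key-value metadata) can be sketched the same way. Again, these names model the behavior only; they are not actual S3 API calls.

```python
MAX_OBJECT_BYTES = 5 * 1024**3  # the five-gigabyte per-object limit

class Bucket:
    """Toy model of one bucket's object store (illustrative only)."""

    def __init__(self):
        self.objects = {}  # key -> (data, metadata); keys are unique per bucket

    def put_object(self, key, data, metadata=None):
        if len(data) > MAX_OBJECT_BYTES:
            raise ValueError("objects are limited to 5 GB")
        # Writing to an existing key replaces the stored object --
        # this is S3's substitute for an update operation.
        self.objects[key] = (data, dict(metadata or {}))

    def get_object(self, key):
        """Return the (data, metadata) pair stored under key."""
        return self.objects[key]

bucket = Bucket()
bucket.put_object("notes/todo.txt", b"version 1", metadata={"author": "alice"})
bucket.put_object("notes/todo.txt", b"version 2")  # same key: replaces version 1
```

After the second `put_object`, only `b"version 2"` remains; the metadata travels with whichever write stored the object, mirroring how S3 keeps metadata entries alongside the object data itself.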