In March of this year, Amazon.com opened its servers to developers with an online storage offering. As described in a previous DevX news article, Amazon.com’s Simple Storage Service (S3) provides access to a large volume of storage capacity through a relatively simple Web services API. Unofficial testing suggests that the service is fast enough to support applications requiring near-real-time access to data. Further, the service’s pricing makes it an attractive option for a diverse set of applications requiring fast and reliable online storage.
Amazon’s S3 offering is only the most recent step in a larger effort to provide a wide range of developer-facing services. Among Amazon.com’s current Web service offerings are the E-Commerce Service (provides access to extensive information about the books, music, and movies that Amazon sells), Historical Pricing (provides historical sales information and volume), and the Alexa line of offerings (enables developers to build custom search engines based on an extensive repository of Web pages and associated information available on the Internet). Amazon also has the Amazon Simple Queue Service and the Mechanical Turk Service in beta at the moment.
This article focuses on delivering a simple application built to store information using the Amazon S3 service. You will learn how the service works at a high level, including how the storage space is organized. You will then drill down into the methods used to store and retrieve data using the S3 SOAP API. Finally, you can inspect a sample application that uses S3’s SOAP interface. For a more complete marketing-style description of the S3 service, you are encouraged to visit Amazon’s official S3 homepage.
|Author Note: The Amazon S3 service was designed to support a very wide range of applications, some of which might have very large data storage requirements and possibly need specific access control mechanisms. The service’s APIs support methods for handling large data objects efficiently and for enabling fine-grained access control. This article does not cover those methods; instead, it focuses on how you manipulate data using the service. Consult Amazon’s published documentation for more information on these topics.|
Amazon S3 Pricing
Amazon has positioned S3 as a zero entry-cost solution, requiring no up-front costs from developers signing up for the service. In other words, the cost of the service depends solely on the data volume you use; there are no minimum or recurring overhead fees. The cost of storage using the service is $0.15 per GB per month, so 100 GB of storage would cost only $15 per month. However, in addition to storage costs, there are significant costs for transferring the data: $0.20 per GB transferred, and Amazon charges for data transfer in both directions. Nevertheless, it’s difficult to imagine building a cheaper storage service that could match the capacity, speed, and reliability claimed by Amazon.
Still, the cost of data transfer might all but eliminate the feasibility of using the service for applications such as backing up personal computers. For a 50 GB backup set, the user would pay $7.50 per month for storage and at least $10.00 per month for data transfer. That $10.00 is likely to increase dramatically, because traditional backup applications perform multiple full backups each month and many more incremental backups. There would be an additional charge for retrieving data from the service. At $20-25 per month, S3 seems too costly for use as a personal backup service. However, there are many other ways to use this type of service, as you will discover once you understand the S3 API.
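The arithmetic behind these estimates can be sketched in a few lines. The example below is written in Python rather than the VB.NET used by this article's sample code, and the function and constant names are illustrative only. It assumes the two prices quoted above and ignores any other charges.

```python
STORAGE_USD_PER_GB_MONTH = 0.15  # storage price quoted in this article
TRANSFER_USD_PER_GB = 0.20       # transfer price quoted in this article (either direction)


def monthly_cost(stored_gb, transferred_gb):
    """Return the estimated monthly charge in US dollars."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    transfer = transferred_gb * TRANSFER_USD_PER_GB
    return round(storage + transfer, 2)


# One full 50 GB upload in a month: $7.50 storage + $10.00 transfer
print(monthly_cost(50, 50))   # 17.5
# Storing 100 GB with no transfer that month
print(monthly_cost(100, 0))   # 15.0
```

Each additional full backup in the same month adds another 50 GB to the transferred figure, which is why the transfer charge tends to dominate for backup-style workloads.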
High-Level Functionality Overview
As advertised by Amazon.com, “Amazon S3 is intentionally built with a minimal feature set.” It provides the developer with the ability to read, write, and delete “objects.” At the highest level, objects are stored in “buckets.” Buckets provide the developer with a way to associate a namespace with each object. In addition, the system can maintain access control to objects at the bucket level, simplifying the maintenance that would otherwise be associated with managing access control at the object level. This scheme also requires bucket names to be unique across the entire S3 service. Unfortunately, each S3 account is limited to 100 buckets. It is unclear why Amazon imposes this limit; the restriction seems to narrow the range of applications that can be built using S3.
Each object stored in S3 is limited to five gigabytes, but you can store an unlimited number of objects in each bucket. You reference objects using a unique developer-assigned key; object key uniqueness is enforced at the bucket level. While updates to objects are not officially supported, you can accomplish the same result by writing an object with the same key as an existing object, effectively replacing it.
In addition to storing object data, developers can associate metadata with each object. Metadata entries are key-value associations that are stored with the object. Developers may create any metadata entries necessary to support the application: Amazon doesn’t publish a maximum number of metadata entries that may be associated with an object.
Amazon S3 SOAP API Methods
The S3 SOAP interface is composed of two classes of method calls: operations on buckets and operations on objects, as defined below:
Operations on Buckets:
- CreateBucket creates a new bucket with the specified name. Because bucket names must be unique across the entire Amazon S3 service, the system returns an error if a bucket with the specified name already exists, even if that bucket name is owned by another account holder.
- DeleteBucket deletes the bucket with the specified name. S3 will return an error if you attempt to delete a bucket containing any objects. To delete a bucket, you must first delete all the objects in the bucket.
- ListBucket returns a list of the objects contained in the specified bucket. Because there’s no limit to the number of objects that you can store in a bucket, this method automatically supports paging through the list of objects contained in a bucket.
- ListAllMyBuckets returns the names of all the buckets owned by the Amazon.com account specified in the request.
- GetBucketAccessControlPolicy and SetBucketAccessControlPolicy get and set the access control policy assigned to the specified bucket. As these are more advanced, I won’t cover these methods in this article.
Operations on Objects:
- PutObjectInline places an object into a bucket. You provide the name of the bucket, the name of the object, and the object contents as parameters to the method call.
- GetObject retrieves an object with the specified bucket and object names.
- DeleteObject deletes an object with the specified bucket and object names.
- PutObject places an object into a bucket, but instead of specifying the object data in a parameter to the method call as with PutObjectInline, you provide it in a DIME attachment. I won’t cover this method in this article.
- GetObjectExtended is similar to the GetObject method, but supports advanced features such as reading specific byte-ranges from the object and conditional reads. I won’t cover this method in this article.
- GetObjectAccessControlPolicy and SetObjectAccessControlPolicy get and set the access control policy assigned to the specified object. I won’t cover these methods in this article.
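To make the rules above concrete, here is a small in-memory model, written in Python rather than the VB.NET used by the sample application. It is not Amazon's API and makes no network calls. The class name, method names, and the `marker`/`max_keys` paging parameters are hypothetical, chosen only to mirror the behavior described in the list: service-wide bucket names, no deletion of non-empty buckets, replacement on rewrite, and paged listings.

```python
class ToyS3:
    """In-memory stand-in for the bucket and object rules described above.

    This is NOT Amazon's API; the class and method names are hypothetical.
    """

    def __init__(self):
        self.buckets = {}  # bucket name -> {object key -> (data, metadata)}

    def create_bucket(self, name):
        # Bucket names are unique across the whole service.
        if name in self.buckets:
            raise ValueError("bucket already exists: " + name)
        self.buckets[name] = {}

    def delete_bucket(self, name):
        # A bucket must be empty before it can be deleted.
        if self.buckets[name]:
            raise ValueError("bucket not empty: " + name)
        del self.buckets[name]

    def put_object(self, bucket, key, data, metadata=None):
        # Writing to an existing key replaces the stored object.
        self.buckets[bucket][key] = (data, dict(metadata or {}))

    def get_object(self, bucket, key):
        return self.buckets[bucket][key]

    def delete_object(self, bucket, key):
        del self.buckets[bucket][key]

    def list_bucket(self, bucket, marker="", max_keys=2):
        # Return one page of keys after `marker`, plus a "more pages" flag.
        keys = sorted(k for k in self.buckets[bucket] if k > marker)
        return keys[:max_keys], len(keys) > max_keys


s3 = ToyS3()
s3.create_bucket("demo-bucket")
for key in ("a.txt", "b.txt", "c.txt"):
    s3.put_object("demo-bucket", key, b"v1", {"author": "me"})
s3.put_object("demo-bucket", "a.txt", b"v2")  # replaces the first a.txt

# Page through the listing until no more pages remain.
marker, more, seen = "", True, []
while more:
    page, more = s3.list_bucket("demo-bucket", marker)
    seen.extend(page)
    marker = page[-1] if page else marker
print(seen)  # ['a.txt', 'b.txt', 'c.txt']
```

The paging loop is the part most likely to carry over to real code: a caller keeps requesting pages, passing the last key it saw, until the service reports that the listing is complete.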
When you sign up for an Amazon Web services (AWS) account, the system assigns you an Access Key ID and a Secret Access Key that uniquely identify you as the account owner. You use these keys for authentication when making Web service calls.
For each Web service call, you must specify three parameters to authenticate successfully: the Access Key ID, a timestamp specifying the time the method call was made, and a signature for the method call. These parameters are described below, and are further documented in the downloadable code for this article.
You pass the Access Key ID exactly as it was assigned to your AWS account. You can generate the timestamp from the current system time, formatting it as specified in the AWS documentation. Here’s a VB.NET code sample that generates a timestamp in the specified format:
strISOTimestamp = TimeStamp.ToUniversalTime.ToString( _
    "yyyy-MM-ddTHH:mm:ss.fffZ", _
    System.Globalization.CultureInfo.InvariantCulture)
For example, given a local time of 4/1/2006 8:31:13.891 PM in a time zone six hours behind UTC (such as U.S. Central Standard Time), the preceding code would produce the timestamp string 2006-04-02T02:31:13.891Z.
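For readers working outside .NET, the same conversion can be sketched in Python. The function name `iso_timestamp` is illustrative, and the six-hour UTC offset is an assumption made so that the input matches the example time above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the article's example local time is six hours behind UTC.
LOCAL_ZONE = timezone(timedelta(hours=-6))


def iso_timestamp(moment):
    """Format an aware datetime as yyyy-MM-ddTHH:mm:ss.fffZ in UTC."""
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000  # keep exactly three fractional digits
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % millis


local = datetime(2006, 4, 1, 20, 31, 13, 891000, tzinfo=LOCAL_ZONE)
print(iso_timestamp(local))  # 2006-04-02T02:31:13.891Z
```

The important details are the conversion to UTC before formatting and the fixed three-digit millisecond field.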
Building the signature is more complex, but you can copy the code directly from the sample application. The signature is an HMAC-SHA1 digest of the string formed by concatenating the literal AmazonS3, the name of the operation being called, and the timestamp.
As an example, using the timestamp shown above with a PutObjectInline method call, the string would be AmazonS3PutObjectInline2006-04-02T02:31:13.891Z. To create the signature, you compute the HMAC-SHA1 digest of that concatenated string, using your Secret Access Key as the key, and then Base64-encode the resulting digest. You can see a sample of the code to do this below. You pass the signature the code produces as the third of the three authentication parameters used in all S3 method calls.
' Requires these statements at the top of the source file:
'   Imports System.Text
'   Imports System.Security.Cryptography
Public Function aws_GetSignature( _
    ByVal Operation As String, _
    ByVal TimeStamp As DateTime) As String

    Dim strSig_Raw As String
    Dim strSig_UTF8 As Byte()
    Dim strSignature As String
    Dim objUTF8Encoder As UTF8Encoding
    Dim objHMACSHA1 As HMACSHA1

    ' String to sign: "AmazonS3" + operation name + ISO timestamp
    strSig_Raw = "AmazonS3" & Operation & _
        aws_GetISOTimestamp(TimeStamp)

    objUTF8Encoder = New UTF8Encoding()
    strSig_UTF8 = objUTF8Encoder.GetBytes(strSig_Raw)

    ' Key the HMAC with the Secret Access Key
    objHMACSHA1 = New HMACSHA1( _
        objUTF8Encoder.GetBytes(m_strSecretAccessKey))

    ' Base64-encode the HMAC-SHA1 digest of the UTF-8 bytes
    strSignature = Convert.ToBase64String( _
        objHMACSHA1.ComputeHash(strSig_UTF8))

    Return strSignature
End Function
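The same signing steps can be sketched in Python using only the standard library. This is an illustration of the recipe described above, not code from the sample application. The function name `s3_soap_signature` is hypothetical, and the key shown is a made-up placeholder.

```python
import base64
import hashlib
import hmac


def s3_soap_signature(secret_access_key, operation, timestamp):
    """Return Base64(HMAC-SHA1(secret, "AmazonS3" + operation + timestamp))."""
    string_to_sign = "AmazonS3" + operation + timestamp
    digest = hmac.new(
        secret_access_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


# "EXAMPLE-SECRET-KEY" is a made-up placeholder, not a real credential.
sig = s3_soap_signature(
    "EXAMPLE-SECRET-KEY", "PutObjectInline", "2006-04-02T02:31:13.891Z"
)
print(len(sig))  # 28: a 20-byte SHA-1 digest is 28 Base64 characters
```

Note that HMAC-SHA1 is a keyed hash, not encryption: the Secret Access Key never travels with the request, and the signature cannot be reversed to recover it.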
Getting Started with Amazon S3
To get started using Amazon S3, you must first create an Amazon Web services account. You’ll need to provide some minimal personal information to create the account. If you already have an Amazon.com account (an account you use to make purchases from Amazon.com), you can simply log in to your existing account.
|Figure 1. Signing Up: After logging into your Amazon Web services account, navigate to http://aws.amazon.com/S3 and click this button.|
After logging in, you must enable the S3 service for your account. Navigate to http://aws.amazon.com/S3 and click the Amazon S3 link in the left panel. That will take you to a page where you should see and click the “Sign Up for Web service” graphic link (see Figure 1).
If you haven’t already provided a credit card for normal Amazon billing, you will be asked to do so at this time. After enabling the S3 service for your account, click the “View Access Key Identifiers” link (see Figure 2). You’ll see a screen that displays your Access Key ID and your Secret Access Key, as shown in Figure 3.
At this point your account is configured properly to use S3.
The Sample Application
|Figure 4. Change your Access Key ID and your Secret Access Key in the project settings before attempting to run the sample application.|
Before attempting to run the sample code, you must first modify the project’s application settings to update your Access Key ID and Secret Access Key. Using Visual Studio 2005, there are two ways to change the settings. First, you can right-click the project name and select Properties. Then select the Settings tab, and modify the settings appropriately. See Figure 4.
The second method of modifying the identifiers is to modify the values directly in the app.config file. Now you’re ready to test the application.
To exercise the functionality of the sample application, enter a bucket name and click the “Create Bucket” button. The list of buckets should automatically refresh and should now include the name of the bucket you just entered. Click the “View Contents” link for the bucket you just created, and verify that there are no objects listed in the bucket. Now create several sample objects in the bucket. Your sample application should look similar to the example shown in Figure 5.
Click the “View Object Data” link for one of the sample objects you created. A dialog will appear showing the data contained in the selected object (see Figure 6).
In addition to displaying the object’s name and value, the dialog displays metadata entries associated with the object.
The sample application focuses on the AmazonS3 class, which provides connectivity to the Amazon S3 Web services interface, constructs the required authentication parameters, and calls the specific Web methods.
As described above, each of the AWS method calls includes three parameters used for authentication: the Access Key ID, a timestamp, and the signature. As illustrated in the code, it is important that the timestamp (a DateTime variable) passed as the second authentication parameter be the same timestamp used to generate the signature. It is equally important that you generate the signature using the name of the method you are actually calling.
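The two pitfalls just described can be demonstrated with a short Python sketch. It is not taken from the sample application: the function names, the dictionary keys, and the credentials are all hypothetical placeholders. The idea is to capture the time once, derive both the transmitted timestamp and the signature from that single value, and observe that a signature built for a different method name no longer matches.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone


def sign(secret_access_key, operation, stamp):
    """Base64-encoded HMAC-SHA1 of "AmazonS3" + operation + stamp."""
    mac = hmac.new(
        secret_access_key.encode("utf-8"),
        ("AmazonS3" + operation + stamp).encode("utf-8"),
        hashlib.sha1,
    )
    return base64.b64encode(mac.digest()).decode("ascii")


def auth_params(access_key_id, secret_access_key, operation, now):
    """Build all three authentication values from ONE captured time."""
    utc = now.astimezone(timezone.utc)
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % (utc.microsecond // 1000)
    return {
        "access_key_id": access_key_id,
        "timestamp": stamp,  # the same string that was signed
        "signature": sign(secret_access_key, operation, stamp),
    }


# Placeholder credentials, not real ones.
now = datetime(2006, 4, 2, 2, 31, 13, 891000, tzinfo=timezone.utc)
params = auth_params("EXAMPLE-KEY-ID", "EXAMPLE-SECRET-KEY", "GetObject", now)

# Recomputing from the transmitted timestamp and the right method name agrees...
print(sign("EXAMPLE-SECRET-KEY", "GetObject", params["timestamp"]) == params["signature"])
# ...but a signature built for a different method name does not.
print(sign("EXAMPLE-SECRET-KEY", "PutObjectInline", params["timestamp"]) == params["signature"])
```

The first comparison prints True and the second prints False. A common bug is to read the system clock twice, once for the timestamp parameter and once inside the signing routine, so that the two values differ by a few milliseconds and the request is rejected.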
The primary goal of the sample application provided with this article is to provide a rapid introduction to the Amazon S3 SOAP API. Amazon S3 provides a considerably more extensive set of features for reading and writing objects as well as managing access control to buckets and objects. These features are documented in the Amazon S3 Developer Guide, and I urge you to explore them.