Depending on which side of the consumer-business equation you are on, you might expect to transact either with another machine or with a person on the other end. When you run a business that requires legitimate user accounts, you may be surprised to find that some of your accounts belong to a single person: one using a skillfully crafted script on his machine to create many “virtual” accounts with your business. These accounts tie up your resources, bandwidth, time, and materials.
The process by which such scripts create accounts is called identity spoofing, and for most simple sites it can be accomplished rather easily. All the spoofer needs to do is create an HTML form that contains fields identical to those in your login or account-creation form and then “HTTP-POST” the data to your server, where your user-account creation process takes place. The problem is even worse if you allow your login forms to be processed via “HTTP-GET”, because then the spoofer needs nothing more than a URL. After successfully creating an account once, nothing stops the spoofer from automating the whole process.
With an automated script, spoofers can literally create hundreds of accounts with a single command. If your server doesn’t validate the data, you risk being swamped by a huge number of useless “virtual” accounts. If your server does validate it, the sheer number of requests can tie up your system resources and slow down or crash your application.
Another potential spoofing problem occurs because it’s easy to write scripts that log in using the same user account from many different Web browsers. While this may not be a problem for some applications, it can waste resources and bandwidth, particularly when your business application allows clients to download files or other resources. Some applications check whether a user is already logged in before allowing them to create another instance of the application in their browser. Multiple-client attacks on these applications tie up resources such as database connections and system memory as the server repeatedly performs the login check.
The CAPTCHA Solution
There are many ways to prevent spoofing of the user-account creation process. This article discusses a technique called “word-verification technology” and the pros and cons of implementing it in your applications. Popular sites such as Yahoo and MSN Hotmail use it to help reduce spam originating from accounts in their mail domains. Yahoo’s word-verification technology, called the “CAPTCHA Project,” was developed with Carnegie Mellon University.
A captcha (short for “Completely Automated Public Turing test to tell Computers and Humans Apart”) is a technique for differentiating humans from machines. A captcha presents a problem that’s relatively easy for humans to solve but difficult or impossible for computers to solve at this time. One common captcha method presents users with an image containing some embedded text. Users must decipher the text and enter it along with the login or user-account creation form they submit. For example, Figure 1 shows a sample captcha image.
Figure 1. Sample Captcha Image: The image contains a human-readable, randomized mix of uppercase and lowercase letters, obscured by a patterned background and extra lines that make it difficult for non-human text-recognition schemes to read the text accurately.
A human can easily read “PVHKf” from the image in Figure 1, but it’s far more difficult for a machine to read the letters. For a machine to successfully decipher the text, it would need to read the characters using an optical character recognition (OCR) engine. While OCR engines are becoming increasingly accurate, they’re still easily stymied by colored or patterned backgrounds, and/or extraneous lines or dots mixed in with the letters. You can use the difference in machine/human reading capability to your advantage by presenting text that you know is difficult for OCR engines to read, giving you a modicum of assurance that the request for a new account or a login comes from a human user rather than a machine.
It’s worth noting that no captcha yet devised is completely hack-proof. OCR technology has advanced to the point where a concerted attack can break captcha techniques. In particular, simple text-obfuscation techniques are vulnerable to sophisticated attacks that use OCR to scan the captchas and read the characters inside the image. In response, you can make captchas harder to read by obscuring the text with more random arcs, lines, or background patterns; however, if you overdo it, deciphering the text becomes a challenge and a chore for legitimate users.
Fortunately, OCR-based attacks are neither perfect nor common yet, so the word-verification technique discussed here probably offers enough protection to deter all but the most determined attackers.
How Captchas Work
The login or user-creation process must include logic that compares the letters drawn onto an image with the text the user typed after reading the characters in that image. If the drawn characters and the user-entered characters match, you can assume the user is a person rather than an automated script running on a machine.
Checking the match by itself doesn’t stop spoofing attacks that simply throw large numbers of requests at the server; if the server had to remember the expected answer for every request, each request would still cost it resources. To achieve a robust solution, the application must meet these constraints:
- To minimize system resource usage, the application should not use a data store of any sort: it should not write files, store information in databases, etc.
- The application may not use sessions to manage state. Sessions are not a scalable solution, both because they’re stored on the server (violating the preceding constraint) and because they may not work well in typical load-balanced Web-farm environments.
- The application must protect itself from multiple-request spoofing attacks by keeping the state needed to check the user’s response on the client rather than on the server, so that floods of requests don’t accumulate server-side state.
These constraints mean that you must keep the generated random letters on the client. That data must be protected so spoofers can’t simply read it and submit it via an automated script. The sample application protects it by hashing the letters with the SHA1 hash algorithm.
A hash is a value or key generated from the content of a string. Once hashed, applications can use the (usually) shorter hash value in place of the original string. Unlike encryption, hashing is one-way: you can’t recover the original string from the hash value. Many hashing algorithms are available. The SHA1 hash algorithm produces hash values such that (a) hashing a given string always produces the same hash value, and (b) it’s extremely unlikely that any other string produces the same hash value. Therefore, if you have the hash values for two strings, you can compare the hash values rather than the strings themselves to determine whether the original strings are identical. SHA1 is specified in the FIPS 180 Secure Hash Standard.
In this case, you can hash the characters embedded in the captcha on the server and send the hash to the client in an HTTP cookie. When the client sends the user-entered text back to the server, the server hashes that text and compares the result with the hash value stored in the cookie. If the two hash values match, the user has entered the same characters that were embedded in the captcha.
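To make properties (a) and (b) concrete, here is a minimal VB.NET sketch, assuming a standalone module; Sha1Demo and Sha1Hex are hypothetical names, not part of the downloadable code. It hashes the UTF-16 bytes of a string, as the article’s code does with UnicodeEncoding, and formats the digest as uppercase hex:

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

' Hypothetical demo module; not part of the downloadable code.
Public Module Sha1Demo

    ' Hashes the UTF-16 bytes of s (as UnicodeEncoding does) with SHA1 and
    ' returns the 20-byte digest as 40 uppercase hex characters.
    Public Function Sha1Hex(ByVal s As String) As String
        Dim bytes As Byte() = Encoding.Unicode.GetBytes(s)
        Using sha As SHA1 = SHA1.Create()
            Dim digest As Byte() = sha.ComputeHash(bytes)
            Dim sb As New StringBuilder(digest.Length * 2)
            For Each b As Byte In digest
                sb.Append(b.ToString("X2"))
            Next
            Return sb.ToString()
        End Using
    End Function

End Module
```

Calling Sha1Hex("PVHKf") twice returns the same 40-character string, while Sha1Hex("PVHKF"), which differs only in the case of one letter, returns a different value. That case sensitivity also means users must type the captcha letters exactly as drawn.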
Next, you need to decide exactly how and where to store the hash value when you create the page.
A Web Form can return either a generated image or HTML content, but not both in the same response. To return a JPEG-formatted image, for example, the page must set the Response.ContentType to “image/jpeg,” whereas to return HTML content, the Response.ContentType should be “text/html”. Therefore the simplest solution is to use two pages: one to return the text content, and another to create and return the captcha image.
When the server creates the HTML content, it includes an img tag. The browser parses the img tag and requests the image from the server as a separate request. That request generates the random string and the image, but it can’t write text content back to the browser, because for the browser to render the response as an image, the Response.ContentType must be “image/jpeg.” So you need a way to pass the hashed value to the client along with the image, so that the client can return it to the server with the user’s entry for validation.
Because the constraints rule out sessions, files, and database fields as storage mechanisms, you’re left with cookies. Before drawing the image from the random letters, generate the hash and add a cookie containing the hash value to the HTTP response headers. The cookie travels to the client in the response header, and because the response body contains only image data, the “image/jpeg” Response.ContentType renders without a hitch.
So there you have a solution that meets all the constraints. Time to dig into the code:
Exploring the Captcha Code
The main page (Index.aspx) includes the user input controls, the random image, and a submit button: a text box named AccessKey for the user’s entry, an asp:image control that displays the captcha, a submit button named Button1, and a label named lblResult that reports the result.
The asp:image control renders as a standard HTML img tag; the only difference is that its URL points to the page that generates the image (DrawRandomImage.aspx) rather than directly at a static image file.
There are two critical parts of the application logic to make this happen. First, you need a random string generator:
The following GenerateRandomString function takes an integer parameter that signifies how many letters the random string should hold. The range of acceptable characters is between a-z and A-Z. You can modify the range to include numeric values if you want to create an alpha-numeric string instead.
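Public Shared Function GenerateRandomString( _
        ByVal iLength As Integer) As String
    Dim iStartBC, iEndBC, iStartSC, iEndSC, _
        iCount, iTmpC As Integer
    Dim sRandomString As String = ""
    ' Note: a Millisecond seed allows only 1,000 distinct sequences
    Dim rRandom As New Random( _
        System.DateTime.Now.Millisecond)
    ' Convert characters into their integer equivalents
    ' (their ASCII values)
    iStartSC = Asc("a")
    iEndSC = Asc("z")
    iStartBC = Asc("A")
    iEndBC = Asc("Z")
    ' Now loop as many times as is necessary to build
    ' the string length we want
    While (iCount < iLength)
        ' Pick a code between "A" (65) and "z" (122) inclusive
        iTmpC = rRandom.Next(iStartBC, iEndSC + 1)
        ' Keep it only if it is a letter; this skips the
        ' punctuation codes (91-96) between "Z" and "a"
        If ((iTmpC >= iStartSC) And (iTmpC <= iEndSC)) Or _
           ((iTmpC >= iStartBC) And (iTmpC <= iEndBC)) Then
            sRandomString &= Chr(iTmpC)
            iCount += 1
        End If
    End While
    Return sRandomString
End Function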
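A System.Random seeded with DateTime.Now.Millisecond can produce only 1,000 distinct sequences, and any two requests whose timestamps share the same millisecond component get the same string. If that predictability worries you, one option (a swapped-in technique, not the article’s code) is to draw the letters from a cryptographic random number generator. This sketch assumes the hypothetical names SecureCaptchaText and GenerateSecureRandomString and keeps the same A-Z/a-z alphabet:

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

' Hypothetical alternative to GenerateRandomString (not the article's code):
' letters come from a cryptographic RNG instead of a time-seeded Random.
Public Module SecureCaptchaText

    ' Same 52-letter alphabet as the article's generator.
    Private Const Alphabet As String = _
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

    Public Function GenerateSecureRandomString(ByVal length As Integer) As String
        If length < 1 Then Throw New ArgumentOutOfRangeException("length")
        Dim sb As New StringBuilder(length)
        Dim oneByte(0) As Byte
        Using rng As RandomNumberGenerator = RandomNumberGenerator.Create()
            While sb.Length < length
                rng.GetBytes(oneByte)
                ' 208 = 4 * 52: rejecting 208-255 keeps every letter equally likely.
                If oneByte(0) < 208 Then
                    sb.Append(Alphabet.Chars(oneByte(0) Mod Alphabet.Length))
                End If
            End While
        End Using
        Return sb.ToString()
    End Function

End Module
```

Rejecting byte values of 208 and above matters because 208 is the largest multiple of 52 that fits in a byte; taking every byte Mod 52 without that check would slightly favor the first 48 letters of the alphabet.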
Creating a Hash Value
Next, you need a hashing algorithm to create the hash value for the random string that GenerateRandomString returns. However, hashing alone doesn’t solve all the problems. A spoofer can easily spot the vulnerability in this technique once he notices that all the page does is send a hashed version of a string to the client and then compare it with the hashed version of whatever the user entered. He can create a spoof to:
- Post a hash version of any string he chooses.
- Post his chosen string in clear text to the server's processing logic.
All that the processing logic does is:
- Hash the clear text.
- Compare that value with the posted hash value.
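The following VB.NET sketch plays out that attack against an unkeyed SHA1 check. NaiveVerify stands in for the processing logic just described and ForgeSubmission for the spoofer’s script; both names are hypothetical. Because SHA1 is public, the spoofer can compute a valid hash for any string without ever seeing the captcha image:

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

' Hypothetical illustration of forging an unkeyed SHA1 captcha check.
Public Module UnkeyedForgery

    ' Plain SHA1 of the UTF-16 bytes, as uppercase hex. No secret involved.
    Public Function UnkeyedHash(ByVal s As String) As String
        Using sha As SHA1 = SHA1.Create()
            Dim digest As Byte() = sha.ComputeHash(Encoding.Unicode.GetBytes(s))
            Return BitConverter.ToString(digest).Replace("-", "")
        End Using
    End Function

    ' The naive processing logic: hash the clear text, then compare it
    ' with the posted hash value.
    Public Function NaiveVerify(ByVal postedHash As String, _
                                ByVal clearText As String) As Boolean
        Return String.Equals(postedHash, UnkeyedHash(clearText))
    End Function

    ' The spoofer's script: choose any string, hash it with the same public
    ' algorithm, and post both values. Returns {hash, clear text}.
    Public Function ForgeSubmission(ByVal chosen As String) As String()
        Return New String() {UnkeyedHash(chosen), chosen}
    End Function

End Module
```

ForgeSubmission("anything") returns a hash/answer pair that NaiveVerify accepts, which is exactly the weakness a secret key has to close.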
So, you need a way to prevent the spoofer from simply hashing any string he chooses. Because a hash is a one-way function that anyone can compute, the server must be able to verify that the hash could only have been generated by your application.
Creating a Unique Hash
This is where a Machine Authentication Check (MAC), more commonly called a message authentication code, comes in handy. A MAC is a hash value that only a party holding a secret key can produce. You append the secret key to the plain text before hashing the whole message. When you do that, only the machine that knows the secret key can produce or verify the hash; no one else can generate the exact same hash, even if they know the plain text message. Here are the steps:
1. Generate a unique GUID key (the MAC key) with the Guid.NewGuid method.
2. Save this MAC key in the application’s Web.config file.
3. Retrieve the stored MAC key and append it to the string you want to hash.
4. Hash the entire message.
5. Save the hash in a cookie.
6. To verify, your form only needs to HTTP-POST the string the user entered.
7. When the application receives this string, append the same MAC key from the Web.config file to the end of the string.
8. Hash the whole message.
9. Retrieve the cookie value from the HTTP request headers and compare the hash from step 8 with the cookie value (the hash from step 4).
10. If the values don’t match, you either have a spoof or the user simply misread or mistyped the captcha text.
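Here is a minimal, ASP.NET-free sketch of those steps, assuming the MAC key is passed in as a parameter instead of being read from Web.config; KeyedCaptchaCheck, HashWithKey, and Verify are hypothetical names. It mirrors the article’s approach of appending the key to the text and hashing the result with SHA1:

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

' Hypothetical sketch of steps 3-4 and 7-9; the MAC key is a parameter
' here rather than a Web.config setting.
Public Module KeyedCaptchaCheck

    ' Append the secret key to the text, then SHA1-hash the whole message
    ' (UTF-16 bytes, uppercase hex), as HashMACMe does.
    Public Function HashWithKey(ByVal text As String, _
                                ByVal macKey As String) As String
        Using sha As SHA1 = SHA1.Create()
            Dim digest As Byte() = _
                sha.ComputeHash(Encoding.Unicode.GetBytes(text & macKey))
            Return BitConverter.ToString(digest).Replace("-", "")
        End Using
    End Function

    ' Recompute the keyed hash of the user's entry and compare it with
    ' the value that came back in the cookie.
    Public Function Verify(ByVal cookieValue As String, _
                           ByVal userEntry As String, _
                           ByVal macKey As String) As Boolean
        Return String.Equals(cookieValue, HashWithKey(userEntry, macKey))
    End Function

End Module
```

With a key from Guid.NewGuid().ToString(), Verify(HashWithKey("PVHKf", key), "PVHKf", key) returns True. A spoofer who hashes his own string without the key, as in HashWithKey("abc", ""), produces a cookie value that Verify rejects for "abc".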
As you can see from the steps above, no one can produce the same hash, even for the same plain text, without knowing the MAC key. This also keeps a man-in-the-middle (MITM) from forging or altering the hash in transit. ASP.NET’s EnableViewStateMac page attribute uses a similar model to protect view state against tampering. Here’s the code for the hashing function, which you’ll find in the RandomStringGenerator.vb class file in the downloadable code:
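Imports System.Text
Imports System.Security.Cryptography
' Importing the class lets the code call AppSettings(...) directly
' (on .NET 2.0 and later, ConfigurationManager.AppSettings replaces it)
Imports System.Configuration.ConfigurationSettings

Public Class RandomStringGenerator

    ' ... GenerateRandomString, shown earlier ...

    Public Shared Function HashMACMe(ByVal s As String) _
            As String
        Dim b As Byte
        Dim HashValue() As Byte
        Dim retString As String = ""
        ' Create a new instance of the UnicodeEncoding
        ' class to convert the string into an array of
        ' Unicode bytes
        Dim UE As New UnicodeEncoding
        ' Append the secret MAC key from Web.config, then
        ' convert the whole message into an array of bytes
        Dim MessageBytes As Byte() = UE.GetBytes(s & _
            AppSettings("MACKey"))
        ' Create a new instance of the SHA1Managed class
        ' to create the hash value.
        Dim SHhash As New SHA1Managed
        ' Create the hash value from the array of bytes.
        HashValue = SHhash.ComputeHash(MessageBytes)
        ' Return a hexadecimal representation of the hash
        For Each b In HashValue
            retString += b.ToString("X2")
        Next
        Return retString
    End Function

End Class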
The code reads the MAC key from the web.config file and appends it to the initial string before hashing the final output message. The page sends that hash value as a cookie.
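Appending a secret to the message before hashing works for this purpose, but the standard construction for a keyed hash is HMAC (specified in RFC 2104), which .NET exposes as the HMACSHA1 class. If you want to swap it in, a sketch might look like the following; HmacCaptcha, HmacHex, and VerifyHmac are hypothetical names, and the key is passed in rather than read from Web.config:

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

' Swapped-in technique (not the article's code): HMAC-SHA1 instead of
' appending the key to the message before hashing.
Public Module HmacCaptcha

    ' Keyed hash of the captcha text as 40 uppercase hex characters.
    Public Function HmacHex(ByVal text As String, _
                            ByVal macKey As String) As String
        Using hasher As New HMACSHA1(Encoding.UTF8.GetBytes(macKey))
            Dim digest As Byte() = hasher.ComputeHash(Encoding.Unicode.GetBytes(text))
            Return BitConverter.ToString(digest).Replace("-", "")
        End Using
    End Function

    ' Verification recomputes the HMAC of the user's entry with the same key
    ' and compares it with the cookie value.
    Public Function VerifyHmac(ByVal cookieValue As String, _
                               ByVal userEntry As String, _
                               ByVal macKey As String) As Boolean
        Return String.Equals(cookieValue, HmacHex(userEntry, macKey))
    End Function

End Module
```

HmacHex returns a 40-character uppercase hex string, the same length and format as HashMACMe, so the cookie handling in Page_Load and the comparison in the Submit handler would not need to change.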
Drawing the Captcha Image
You still need to draw the string as an image. The .NET framework simplifies the drawing process into just a few fairly intuitive lines of code. Listing 1 shows the drawing code, which you can find in the DrawRandomImage.aspx.vb file in the downloadable code.
The code in Listing 1 takes the string parameter (ds) and draws it in a bold typeface onto a graphic image. Next, it adds some random lines to the image to throw off OCR scanning software. Finally, the page sets the ContentType to "image/jpeg" and saves the image to Response.OutputStream for rendering.
To tie it all together, here’s the Page_Load event code for DrawRandomImage.aspx:
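Listing 1 itself isn’t reproduced here. As a rough stand-in, the following sketch uses System.Drawing to draw a string in a bold font, add a few random lines, and save the result as a JPEG to any Stream. CaptchaImageSketch and DrawCaptcha are hypothetical names, and the image size, font, and line count are arbitrary choices:

```vbnet
Imports System
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO

' Hypothetical stand-in for Listing 1; not the article's drawing code.
' Requires a reference to System.Drawing (GDI+).
Public Module CaptchaImageSketch

    ' Draws ds in bold on a white 150x50 bitmap, adds random lines to
    ' hinder OCR, and writes the image to output as a JPEG.
    Public Sub DrawCaptcha(ByVal ds As String, ByVal output As Stream)
        Dim lineRandom As New Random()
        Using bmp As New Bitmap(150, 50)
            Using g As Graphics = Graphics.FromImage(bmp)
                g.Clear(Color.White)
                Using f As New Font("Arial", 20.0F, FontStyle.Bold)
                    g.DrawString(ds, f, Brushes.Black, 10.0F, 8.0F)
                End Using
                ' A few random lines across the image to throw off OCR.
                For i As Integer = 1 To 6
                    g.DrawLine(Pens.Gray, lineRandom.Next(150), lineRandom.Next(50), _
                               lineRandom.Next(150), lineRandom.Next(50))
                Next
            End Using
            bmp.Save(output, ImageFormat.Jpeg)
        End Using
    End Sub

End Module
```

In the image page you would set Response.ContentType = "image/jpeg" and pass Response.OutputStream as the output stream.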
Private Sub Page_Load(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles MyBase.Load
    ' Conjure up some random characters in a string.
    ' GenerateRandomString and HashMACMe are Shared, so call
    ' them through the class name.
    Dim s As String = RandomStringGenerator.GenerateRandomString(5)
    ' Hash the random string together with a secret key
    ' (Machine Authentication Check) to prevent tampering
    Dim hMACIString As String = RandomStringGenerator.HashMACMe(s)
    ' Store the result in an HttpCookie
    Dim c As HttpCookie = New HttpCookie("hMACIString")
    c.Value = hMACIString
    Dim dtNow As DateTime = DateTime.Now
    ' Set expiration of 365 days - change this to your requirements;
    ' a shorter lifetime narrows the window for replaying the cookie
    Dim tsYear As New TimeSpan(365, 0, 0, 0)
    c.Expires = dtNow.Add(tsYear)
    Response.Cookies.Add(c)
    ' Draw the string as a JPEG with the DrawStringImage routine (Listing 1)
    Call DrawStringImage(s)
End Sub
Verifying the Data
Now all that's left is verifying the data, which the page does in the Submit button event handler:
Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click
    Dim hMACIString As String = Nothing
    ' Read the hash written by DrawRandomImage.aspx; the cookie
    ' is missing if the image was never requested
    Dim c As HttpCookie = Request.Cookies("hMACIString")
    If Not c Is Nothing Then
        hMACIString = c.Value
    End If
    ' Send to next page in a real application
    If (Not hMACIString Is Nothing) AndAlso _
       (hMACIString = RandomStringGenerator.HashMACMe(AccessKey.Text)) Then
        lblResult.Text = "Real Person"
    Else
        lblResult.Text = "SPOOFED"
    End If
End Sub
The AccessKey control contains the string the user enters. The client posts it back to the server, which appends the secret MAC key, hashes the string, and compares the result with the cookie value written earlier by the DrawRandomImage.aspx page. Figure 2 shows the completed page in a browser.
Figure 2. The Completed Captcha: Users read the text of the obfuscated image, enter that into the text field, and click submit.
As you can see from the code snippets above, it’s fairly simple to implement a captcha-like technology in .NET. Be careful not to overdo the obfuscation of the random images, though: over-engineered images make reading the characters a challenge and a chore for legitimate users and will definitely turn them off.
Please note that I only recommend this as an added layer of security on top of the other security features your application implements. All the layers must work together as part of your application’s overall authentication and security design.
To extend this idea, you might want to expose this captcha feature as a Web service and implement it as part of your organization's service-oriented architecture so that your other login pages can use the feature as well. To do that, you may want to try WS-Attachments and DIME to stream the image binaries across the wire. Be careful though, because WS-Attachments and DIME are not supported in Indigo, Microsoft's upcoming Web services framework.