The process by which such scripts create accounts is called identity spoofing and, for most simple sites, can be accomplished rather easily. All the spoofer needs to do is create an HTML form that contains fields identical to those in your login form and then submit the data to your server with an HTTP POST request, where your user-account creation process takes place. The problem is even worse if you allow your login forms to be processed via HTTP GET, because a GET request needs nothing more than a URL with the field values in its query string. After successfully creating an account once, there's nothing stopping the spoofer from automating the whole process.
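To see why the server cannot tell a scripted submission from a real browser, consider the following minimal Python sketch. It only builds the request object and does not send anything. The URL and the field names (`username`, `password`) are hypothetical placeholders, not taken from any real site.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(url: str, fields: dict) -> Request:
    """Build the same kind of request a browser sends when a form is submitted."""
    # A submitted form is just URL-encoded name=value pairs in the request body.
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical endpoint and field names, for illustration only.
request = build_form_post(
    "https://example.com/signup",
    {"username": "user1", "password": "secret"},
)
```

Nothing in the resulting request identifies the HTML page it came from, which is why checking the form fields alone offers no protection.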
With an automated script, spoofers can create hundreds of accounts with a single command. If your server doesn't validate the data, you risk being swamped by a huge number of useless "virtual" accounts. If your server does validate it, the sheer number of requests can tie up your system resources and slow down or crash your application.
Another potential spoofing problem occurs because it's easy to write scripts that log in using the same user account from many different Web browsers. While this may not be a problem for some applications, it can waste resources and bandwidth, particularly when your business application allows clients to download files or other resources. Some applications check whether a user is already logged in before allowing them to create another instance of the application in their browser. Multiple-client attacks on these applications tie up resources such as database connections and system memory as the server repeatedly performs the login check.
The CAPTCHA Solution
There are many ways to prevent spoofing of the user-account creation process. This article discusses a technique called "word-verification technology" and the pros and cons of implementing it in your applications. Popular sites such as Yahoo and MSN Hotmail implement it in their applications to help reduce spam originating from their mail domain accounts. Yahoo's word-verification technology, called the "CAPTCHA Project," was developed with Carnegie Mellon University.
A captcha is a technique for differentiating humans from machines. A captcha method presents a problem that's relatively easy for humans to solve, but difficult or impossible for computers to solve at this time. One common captcha method presents users with an image containing some embedded text. Users must decipher the text and enter that along with the submitted login or user-account creation form. For example, Figure 1 shows a sample captcha image.
Figure 1. Sample Captcha Image: The image contains a humanly-readable randomized mixture of capital and lower-case letters obfuscated by the patterned background and some additional lines that help to make it difficult for non-human text-recognition schemes to read the text accurately.
A human can easily read "PVHKf" from the image in Figure 1, but it's far more difficult for a machine to read the letters. For a machine to successfully decipher the text, it would need to read the characters using an optical character recognition (OCR) engine. While OCR engines are becoming increasingly accurate, they're still easily stymied by colored or patterned backgrounds, and/or extraneous lines or dots mixed in with the letters. You can use the difference in machine/human reading capability to your advantage by presenting text that you know is difficult for OCR engines to read, giving you a modicum of assurance that the request for a new account or a login comes from a human user rather than a machine.
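The server-side half of this exchange can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by any of the sites mentioned above: the `session` dictionary stands in for whatever per-user session store your framework provides, the function names are hypothetical, and rendering the text into an obfuscated image is left out entirely. The choice to drop look-alike letters is also an assumption made to spare human readers.

```python
import hmac
import secrets
import string

# Mixed-case letters, as in the Figure 1 example. Look-alike letters
# (I, l, O, o) are removed here as an assumption, not a requirement.
ALPHABET = "".join(c for c in string.ascii_letters if c not in "IlOo")


def new_challenge(session: dict, length: int = 5) -> str:
    """Pick random text, remember it server-side, and return it for rendering."""
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    # Only the image goes to the client; the expected answer stays on the server.
    session["captcha"] = text
    return text


def check_response(session: dict, submitted: str) -> bool:
    """Compare the submitted text with the stored answer (case-sensitive)."""
    # pop() makes every challenge single-use, so an answer cannot be replayed.
    expected = session.pop("captcha", None)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), submitted.strip().encode())


session = {}
answer = new_challenge(session)            # text to draw into the captcha image
first_try = check_response(session, answer)
second_try = check_response(session, answer)  # same answer again: rejected
```

Discarding the stored text after every attempt, right or wrong, means a script cannot solve one image by hand and then reuse the answer for hundreds of automated submissions.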
It's worth noting that no captcha yet devised is completely hack-proof. OCR technologies have evolved and advanced to the point where a concerted attack can break captcha techniques. In particular, simple text-obfuscation techniques are subject to sophisticated attacks that use OCR to scan the captchas and successfully read the characters inside the image. In response, you can make the captchas more difficult to read by obscuring the text with even more random arcs, lines, or background patterns; however, if you overdo it, deciphering the text becomes a challenge and a chore for legitimate users.
Fortunately, OCR-based attacks aren't yet either perfect or common, so the word-verification technique discussed here probably offers sufficient protection to deter all but the most determined attackers.