

All Input Data Is Evil—So Make Sure You Handle It Correctly and with Due Care : Page 2

Neglecting to check all application input to ensure it contains only valid content is simply asking for trouble.

What's Input for a Web Application?
Input data is any data that users or external sources bring into an application. The list includes, of course, configuration and helper files, databases (why not?), and the user interface controls that interact directly with the end user.

Note that the list is even longer (and subtler, to some extent) for Web applications. For a Web application, user input comes from any form fields displayed in the page such as text boxes, check boxes, list boxes, and their combinations. Combinations? Yes. How would you otherwise define custom ASP.NET controls that combine together, say, text boxes and drop-down lists?

Is the list complete now? No. Consider arguments in the query string. And what about cookies? Cookies are not what some urban legends claim: some sort of executable code that may format your hard drive on the next full moon. Cookies are small pieces of text that the browser stores and sends back, and they become legal input for a Web application. For example, a Web application uses cookies to transmit the ID of the current session and the encrypted credentials of an authenticated user. Cookies are a form of input for applications and, as such, you should carefully verify their content before use.
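As a sketch of treating a cookie like any other input, the snippet below validates a session-ID cookie against an expected format before trusting it. The 24-character lowercase-hex format is an assumption for illustration; real frameworks define their own formats.

```python
import re

# Hypothetical session-ID format: 24 lowercase hex characters.
# (This format is an assumption for illustration only.)
SESSION_ID_RE = re.compile(r"[0-9a-f]{24}")

def is_valid_session_cookie(value):
    """Accept the cookie only if it matches the expected format exactly."""
    return SESSION_ID_RE.fullmatch(value) is not None

print(is_valid_session_cookie("0123456789abcdef01234567"))   # True
print(is_valid_session_cookie("<script>alert(1)</script>"))  # False
```

Anything that fails the check is simply rejected; the application never acts on it.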

Now consider HTTP headers. HTTP headers are chunks of information that qualify an HTTP request, for example, to tell the browser about the type of resource being downloaded. You can define custom HTTP headers to move custom data, that is, input for the application.
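A hypothetical example of reading such a custom header defensively (the `X-App-Data` name is invented for illustration): reject any value containing control characters, which enable header-splitting tricks.

```python
def read_custom_header(headers, name="X-App-Data"):
    """Return the value of a custom header, or None if it is missing
    or contains control characters (CR/LF and friends are never legal
    inside a header value the application should act on)."""
    value = headers.get(name)
    if value is None or any(ord(ch) < 0x20 for ch in value):
        return None
    return value

print(read_custom_header({"X-App-Data": "report-42"}))                    # report-42
print(read_custom_header({"X-App-Data": "x\r\nSet-Cookie: stolen=1"}))    # None
```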

In general, you should carefully monitor and examine any source of information for an application, including data you retrieve from a database query or receive from a Web service, read out of an e-mail message, or get through an FTP connection. You should carefully verify everything before use. But why? What could happen if you fail to check input data?

Last September, MITRE Corp—a company founded to provide engineering and technical services to the U.S. federal government—reported that attackers are now changing their preferences and paying much more attention to application flaws that allow for code injection. MITRE maintains a list of standardized names and descriptions for publicly known IT security vulnerabilities and exposures. According to their latest statistics, more than 20 percent of the vulnerabilities reported in 2006 fall under the umbrella of cross-site scripting (XSS) and 14 percent can be classified as SQL injection. About 10 percent are due to PHP "includes" and less than 8 percent to buffer overflows. You can see the full statistics on MITRE's Web site.

Given these numbers, a couple of considerations spring up quite naturally. The prevalence of XSS and SQL injection indicates clearly and neatly that Web applications are an easy target for attackers. However, just because those two categories account for more than 34 percent of reported vulnerabilities, can you really conclude that one attacker out of every three is planning an XSS or SQL injection attack as you read this? Of course not. Likewise, you can't determine that one third of Web sites have such vulnerabilities.

For an attacker today, XSS and SQL injection exploits are easier to find than overflows, for two reasons: a Web site is more likely to be vulnerable than a desktop application, and its databases may contain sensitive and, therefore, attractive information.

But why is a Web site inherently more at risk than a desktop application? Because the nature of the application—open to any user with an Internet connection—makes it easy to reach and therefore it's exposed to attacks. And because most Web developers overlook security issues. A managed platform like .NET significantly reduces the attack surface for buffer overflows; the same doesn't happen for SQL injection and XSS attacks.

In ASP.NET, a first line of defense is built into the framework, but it is not sufficient for you to sleep easy. It still requires that you change your programming habits.

Script Injection
Cross-site scripting (XSS) is an attack that manifests itself when untrusted input is echoed to the HTML page. XSS owes most of its power to the fact that HTML is a markup language. For this reason, in any HTML page some characters are given a special meaning and prepare the ground for special browser behavior. For instance, some characters request particular formatting or trigger the execution of script code. The most typical example is the "<" character. There's nothing wrong with this fact as long as the page author consciously inserts any special characters.

What if, instead, an attacker silently and sneakily injects markup characters into a regular page and the user's browser processes them? What if the user is then directed to view a malicious page just crafted to steal information or execute code on the local machine? This is more or less the typical effect of an XSS attack.

How could a hacker incorporate external text into a page? There's just one way actually—via input data.

To reliably avoid XSS vulnerabilities, you must HTML encode any text that you display programmatically. In this way, you neutralize any malicious script embedded in user-provided data and display the HTML as plain text instead of letting the browser interpret and execute it.
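In Python, for instance, the standard library's `html.escape` performs exactly this kind of encoding; a minimal sketch:

```python
import html

# A classic script-injection payload supplied as "user input."
user_input = '<script>alert("xss")</script>'

# html.escape converts &, <, >, and quotes into HTML entities,
# so the browser renders the text instead of executing it.
safe = html.escape(user_input)
print(safe)  # &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

Every server-side platform offers an equivalent facility; the key is to apply it to every piece of programmatically displayed text, with no exceptions.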

The drawback is that this approach also ignores any benign HTML formatting that users legally apply. This is not necessarily a problem for most Web applications; however, forums, most portals, and news-driven applications may have the need to host user-provided rich text. Now what?

If your application needs to display user-provided HTML markup, you should use whitelists. A whitelist consists of a list of valid and authorized characters and expressions. Hence, your code will just parse the input and strip off any text that doesn't appear in the whitelist. Period.

Security specialists generally consider the whitelist approach more secure than the opposite blacklist approach. A blacklist designates characters and expressions that you know to be potentially malicious and dangerous. A blacklist quickly becomes outdated and, more importantly, is beyond your control: attackers are generally quick to find new ways to encode malicious code so that it appears safe and innocuous.
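A toy whitelist filter might look like the following Python sketch. The allowed-tag set is an assumption for illustration; a production application should rely on a vetted sanitizer library rather than hand-rolled regular expressions.

```python
import re

# Illustrative whitelist only (an assumption for this sketch):
# a real application should use a vetted sanitizer library.
ALLOWED_TAGS = {"b", "i", "em"}
TAG_RE = re.compile(r"</?([a-zA-Z0-9]+)[^>]*>")

def strip_disallowed(markup):
    """Keep whitelisted tags (re-emitted without attributes); drop all
    other tags. Inner text of dropped tags remains as plain text."""
    def keep_or_drop(m):
        tag = m.group(1).lower()
        if tag not in ALLOWED_TAGS:
            return ""
        closing = "/" if m.group(0).startswith("</") else ""
        return "<%s%s>" % (closing, tag)
    return TAG_RE.sub(keep_or_drop, markup)

print(strip_disallowed('<b onclick="x">hi</b><script>evil()</script>'))
# <b>hi</b>evil()
```

Note that even the allowed `<b>` tag is re-emitted bare, so an attacker cannot smuggle in event-handler attributes like `onclick`.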

What could a hacker gain from an XSS attack? Frequently, they could steal your cookies, including session and authentication cookies. A hacker who holds your authentication cookie can use the application as if they were you. At the very minimum, they could change your user settings; depending on the type of application, they could gain access to your private and likely sensitive data. On average less dangerous, but still quite annoying, is the theft of your session cookie. In this case, the attacker controls your session data. Again, what they can do depends on what the application does with session data.

Due to XSS, an application that relies on cookies to implement some of its features may receive poisoned data that affects its behavior and, in turn, the user. False advertising is another possible effect of an XSS exploit. But there's more. XSS can be used to exploit vulnerabilities in other Web sites in a largely anonymous way. The hacker may inject code into an unwitting application so that whenever a user performs an innocuous action (for example, sending an e-mail or viewing a report) it also triggers a denial-of-service attack against another site. More generally, an XSS hole leaves a door open for uninvited guests to come in and exploit other security bugs that may exist in the application, the browser and, worse yet, the server. Most of the time, though, the highest cost of an XSS hole is a damaged reputation.

What can you do as a user to fight XSS? Ideally, you should disable JavaScript and raise your browser's security settings to high. Another good defense is to follow links only within the same Web site. Links pointing to external services and Web sites may be dangerous; likewise, clicking links from unknown senders is dangerous, especially when the link is not a plain http://www.somewebsite.com.

What can you do, instead, as a developer? You should never trust any user input and should always encode meta characters such as the notorious < and >, but also round parentheses, the hash symbol (#), and the ampersand (&). Table 1 lists the safe representations for these HTML meta characters.

Table 1: Safe meta character encodings.

    Meta  Encoded  Description
    <     &lt;     Used to signal the beginning of a markup segment.
    >     &gt;     Used to signal the end of a markup segment.
    (     &#40;    May indicate a function call.
    )     &#41;    May indicate a function call.
    #     &#35;    May indicate a section in the page with potentially unknown content.
    &     &#38;    May indicate a query string parameter.
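As an illustration, here is a small Python sketch that applies exactly the encodings from Table 1, character by character, so nothing is double-encoded:

```python
# The safe representations from Table 1.
ENCODINGS = {
    "<": "&lt;",
    ">": "&gt;",
    "(": "&#40;",
    ")": "&#41;",
    "#": "&#35;",
    "&": "&#38;",
}

def encode_meta(text):
    """Replace each HTML meta character with its safe representation."""
    return "".join(ENCODINGS.get(ch, ch) for ch in text)

print(encode_meta("<img src=# onerror=alert(1)>"))
# &lt;img src=&#35; onerror=alert&#40;1&#41;&gt;
```

Because the lookup is per character, the order of the table entries doesn't matter and an already-encoded ampersand can't be encoded twice in one pass.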

Will I Be Totally XSS-Safe with HTTPS?
A site working over an HTTPS channel sends data to, and receives data from, the browser in encrypted form, but note that HTTPS per se provides no guarantee of protection against XSS. HTTPS is secure because we all trust the authority that issued the certificate used for encryption; its guarantees concern the channel, not the content that travels over it. HTTPS is a great barrier against attacks such as eavesdropping or man-in-the-middle, but it can't do anything against XSS. XSS takes place on the client, after the page has been downloaded and before the data is uploaded.
