espite having a good understanding of the term null, most developers still fail to adequately accommodate null values in their databases. Accommodating null values appropriately is an important part of protecting the validity of your data. At the very least, mismanaging nulls leads to erroneous data. If you don’t catch the bad data, it could lead to unsound business decisions.
The concept of null is often misunderstood. Some experts describe null as nothing or an empty value. In other words, no value has been entered for that entity. Others describe a null value as an unknown value. The value may or may not exist, but it is presently unknown. Technically, the ANSI standard states that null is not a data value, but an indicator that data is missing.
Now, let’s apply the concept of null to actual data. For example, when reviewing a new customer record, the user might see a main phone number but not a fax or cell phone number. At that point, the user doesn’t know whether the new customer actually doesn’t have a fax or cell phone or simply failed to provide that data. If dealt with properly, these nulls would have shared important information, but instead the data is not known.
Dealing with Nulls
There’s no right or wrong way to deal with nulls; the keys are awareness and consistency. The place to start is at the table level. For instance, properly normalized tables usually eliminate the phone number type of null because they won’t even create a record for a fax or cell phone number until a user enters that information.
Not all nullables can be normalized away though. When that’s the case, eliminate null values at the column level by requiring a value. Then, you have to decide whether to set a default value or allow a zero-length string. Invariably, however, you will face a few nullable columns. Either a business rule will force the issue or you’ll encounter them in a legacy application.
Finding Nulls Before They Do Damage
Any expression that encounters a null can return bad data. For instance, Figure 1 shows the results of comparing a null to a null. This statement has the potential to return no records, depending on your ANSI_NULLS setting.
|Figure 1. Results of Null-to-Null Comparison: By ANSI standards, any comparison that includes a null returns null.|
The ANSI_NULLS option is off (False) by default. That means that any null-to-null comparison returns True instead of null. When this option is on (True), any null-to-null comparison returns null. This particular default violates ANSI standards, but it prevents errors that might otherwise occur. How you set this option is a matter of preference, but knowing how it determines null evaluation is important.
The following statement turns the option on so you can see the potential for trouble:
USE AdventureWorksSET ANSI_NULLS ONSELECT EmployeeID FROM HumanResources.EmployeeWHERE ManagerID = NULL
If you peruse the records, you’ll find one record with a null value in the ManagerID column. However, executing the preceding statement returns an empty result set?in other words, it doesn’t return the existing record. Even worse, instead of a traditional error, you probably wouldn’t catch the “error” exemplified by the empty result set.
|Figure 2. IS NULL to Find Null Values: Regardless of the ANSI_NULLS setting, IS NULL catches null values.|
The correct way to find null values, regardless of this option’s setting, is to use IS NULL, as follows:
USE AdventureWorksSELECT EmployeeID FROM HumanResources.EmployeeWHERE ManagerID IS NULL
As you can see in Figure 2, you don’t need to know the ANSI_NULLS option’s setting. In addition, you guarantee that the statement will work correctly should someone reset that option.
Using IS NULL, you can easily find null values in a specific column, but searching several or all columns is a bit more complicated. At first, you might consider a complex statement using numerous OR clauses, but that isn’t necessary. Simply concatenate fields using the following syntax:
SELECT list|* FROM tableWHERE nulcol1 + nulcol2 + ... IS NULL
You can mix and match the columns or concatenate them all. Figure 3 shows the results of concatenating three fields:
USE AdventureWorksSELECT * FROM Person.ContactWHERE Suffix + EmailAddress + Phone IS NULL
|Figure 3. Results of Concatenating Three Fields: You can mix and match columns or concatenate them all.|
You still have to explicitly reference each column that you’re checking, but this method saves a bit of typing. Don’t try to concatenate columns that aren’t nullable, as doing so will return an error. This method works only with nullable fields.
While this behavior can come in handy, it also can return unexpected results. For instance, the following statement concatenates first, last, and middle names:
Use AdventureWorksSELECT FirstName + ' ' + MiddleName + ' ' + LastNameFROM Person.Contact
What Figure 4 shows might be a surprise: The ANSI_NULLS setting makes no difference in this case. If any name value is null, the resulting value is null. As shown in Figure 5, you can avoid this problem using ISNULL() as follows:
Use AdventureWorksSELECT FirstName + ' ' + ISNULL(MiddleName, '') + LastNameFROM Person.Contact
You don’t have to wrap every column in ISNULL(); if column properties require a value, there’s no possibility of a null. (FirstName and LastName aren’t nullable.) The ISNULL() function replaces any null value with a specified replacement value using the following syntax, where checkexpression is the value or column you’re checking for null values and replacementvalue is the value you’re using to replace nulls:
In this case, the only nullable column in the example is MiddleName. Be aware that records with no middle name value will end up with two space characters between the first and last names.
|Figure 6. ISNULL() Function Displays “None”: Give users more meaningful information when they encounter null values.|
SQL Server’s CONCATENATE_NULL_YIELDS_NULL setting will affect the results of any expression that concatenates null values. The default is on (True), which means concatenating a null returns null. When turned off (False), SQL Server ignores the null, returning the concatenated results of the other non-null values. Similarly to the earlier IS NULL example, ISNULL() eliminates unexpected results should someone reset this option.
Even when a null value does no damage, it doesn’t provide much value to users. They would benefit from more meaningful data. When viewing data, users might guess at what NULL means, but do you really want them to? Instead of displaying NULL (Figures 3 and Figure 4), the ISNULL() function in the following statement displays “None” as shown in Figure 6:
USE AdventureWorksSELECT Contact, ISNULL(Suffix, 'None') FROM Person.Contact
That, the user understands.
How Aggregates Evaluate Null
Because nulls have no value, the aggregate functions SUM(), AVG(), MIN(), and MAX() ignore them. Using best practices, you should set a default value, such as 0, to avoid the problem of nulls in your calculations. However, keep in mind that aggregates will evaluate 0 and that might not be what you want. SQL Server lends a bit of help in this regard. When eliminating null values, SQL Server can warn you if you simply turn on the following option:
SET ANSI_WARNINGS ON
However, the SET ANSI_WARNINGS ON setting also affects how SQL Server handles division by zero, arithmetic overflow, and character/binary data truncation, so be sure you learn its full potential. The important distinction is that adding null to any value returns null and the aggregate functions circumvent that standard by eliminating nulls.
If you want to evaluate all values, you must account for those values at the table level by forcing a default value. If that’s not possible, use COALESCE() to return a value other than null in the expression. For example, the first AVG() function shown in Figure 7 doesn’t evaluate null values. However, the second one does?technically, the second one is evaluating 0’s, not nulls. Neither statement is better than the other. Whether a function ignores nulls or not depends upon your objective.
|Figure 7. First AVG() Function Doesn’t Evaluate Null Values: Control whether aggregates evaluate null values.|
The COUNT() function is a bit different from the other four aggregates because it has two forms:
- COUNT(column): Returns the count of rows in a table, less the null values in column
- COUNT(*): Returns the count of rows in a table
As you can see in Figure 8, the following forms return the same results:
SELECT COUNT(*) ? Count(column) AS NullCountSELECT COUNT(*) AS NullCountFROM tblWHERE column IS NULL
However, if you specify the column as follows, the statement returns 0:
|Figure 8. COUNT(column) Eliminates All Nulls Before Evaluating: Depending on the form you use, COUNT() will ignore or evaluate nulls.|
SELECT COUNT(column) AS NullCountFROM tblWHERE column IS NULL
That’s because COUNT(column) eliminates all the nulls before it evaluates anything. Consequently, there are no null values to count.
If You Allow Them, You Must Handle Them
If even one column in your database is nullable, you have to consider how a null value will affect your data. Knowing how SQL Server evaluates nulls and how to handle them will prevent those nulls from working their way into your data and returning errors.