ut here in the field, there is a popular urban legend: “If your application issues only ANSI standard SQL, then your application is portable.” Many developers actually believe it. Unfortunately in some cases, identical SQL works differently on different RDBMS. This article presents some examples when the same query runs against identical data on SQL Server and Oracle and returns different results. It also discusses what it takes to develop truly portable applications and whether the applications ought to be portable in the first place.
The following sections describe examples when identical SQL works differently on SQL Server and Oracle.
NULLs in Indexes
Consider the following table:
CREATE TABLE Names(Name VARCHAR(10), OtherData VARCHAR(100))CREATE UNIQUE INDEX UQ_Names ON Names(Name)INSERT INTO Names(Name) VALUES(NULL)INSERT INTO Names(Name) VALUES(NULL)
Oracle just doesn’t store NULL entries in the index, so both inserts succeed. SQL Server includes NULL entries in the index, so the second INSERT will fail.
This difference also may cause significant performance differences. For example, should you drop the unique index and recreate it as non-unique as follows:
DROP INDEX Names.UQ_NamesgoCREATE INDEX IX_Names ON Names(Name)
This index may speed up a query only on SQL Server and never on Oracle:
SELECT Name, OtherData FROM Names WHERE Name IS NULL
To emulate Oracle’s indexes on SQL Server, you can use a “nullbuster” approach (this name was coined by Steve Cass), which I described in “Indexes on Computed Columns: Speed Up Queries, Add Business Rules“. To emulate SQL Server’s indexes on Oracle, you can use a function-based index on (Name, CASE WHEN Name IS NULL THEN 1 ELSE NULL END).
Zero Length Strings
Consider the following table:
CREATE TABLE Party(FirstName VARCHAR(30), MiddleName VARCHAR(30), LastName VARCHAR(30))
and some sample data:
INSERT INTO Party VALUES('Alex','','Kuznetsov')INSERT INTO Party VALUES('Scott',NULL,'Larson')
I am Russian. As such, I do not have a middle name. The table provides a zero length string to store that fact in the database. On the other hand, I do not know if my buddy Scott Larson has a middle name, nor what it is if he does. Therefore, the table provides a null to store that quite different fact in the database.
On SQL Server, these two facts are stored differently. So I can easily retrieve parties without middle names as follows:
SELECT FirstName, MiddleName, LastName FROM Party WHERE MiddleName = ''
Also on SQL Server, I can separately select all the parties whose middle names I do not know as follows:
SELECT FirstName, MiddleName, LastName FROM Party WHERE MiddleName IS NULL
Unfortunately, Oracle does not distinguish between a known zero length string and a null value; it converts a zero length string into a null. Because of that, the former query will return nothing, and the latter query will return both rows.
As you can see, identical queries against identically populated tables return different results on different RDBMSs, but that is not all. Suppose you have an empty table Names (as described above) as well as the unique index UQ_Names. On Oracle, the following inserts succeed no matter how many times you issue them:
INSERT INTO Names(Name) VALUES('')INSERT INTO Names(Name) VALUES(NULL)
On SQL Server, the batch succeeds only for the first time.
Don’t Forget About Collations and Case Sensitivity
Many SQL Server shops use case-insensitive collations, so the following query against the table defined and populated in the previous section will return one row:
SELECT FirstName, MiddleName, LastName FROM Party WHERE FirstName = 'SCOTT'
Many Oracle shops’ string comparisons are case sensitive, so the value Scott stored in the party table will not match the condition WHERE FirstName = ‘SCOTT’. Because many businesses do not need case sensitivity, you can uppercase all strings as you save them. If you go for this approach, you can also use a case-sensitive collation on SQL Server, which is somewhat more efficient because comparing case-sensitive strings requires less CPU.
Also, do not assume that sort order is the same on all servers. Suppose the Names table is populated and selected from the following:
DELETE FROM NamesINSERT INTO Names(Name) VALUES('ABC')INSERT INTO Names(Name) VALUES('A_B')SELECT MAX(Name) FROM Names
Depending on the RDBMS and its configuration, this query may return either ABC or A_B.
Because you need only one connection to reproduce all the problems previously discussed, it is relatively easy to expose them using proper unit and regression testing. For regression testing, you need to compare data in Oracle and SQL Server tables. Many tools are available for accomplishing that. For instance, you could import all Oracle data into a SQL Server database, and use Red Gate Software’s Data Compare tool. Alternatively, you could export all the data into XML files and compare the files.
Some differences in Oracle’s and SQL Server 2000’s behavior manifest themselves only when multiple connections modify and select data. Even then, they happen only intermittently, which makes them difficult to reproduce and understand. The following section provides a short example of this.
Suppose you need to implement a simple business rule stating that no more than one technical lead can report to a manager, and the table is as follows:
CREATE TABLE Employee(EmployeeId INT NOT NULL IDENTITY PRIMARY KEY, ManagerId INT FOREIGN KEY REFERENCES Employee(EmployeeId ), Role VARCHAR(20), FullName VARCHAR(50))
Suppose you embed in your application an insert statement to insert one employee and implement that business rule as follows:
INSERT INTO Employee(ManagerId, Role, FullName) SELECT ManagerId, @Role, @FullName FROM EmployeeWHERE ManagerId = @ManagerId AND (@Role <> 'Technical Lead' OR NOT EXISTS(SELECT 1 FROM Employee WHERE ManagerId = @ManagerId
AND @Role = 'Technical Lead' ))
This statement will insert anybody except for a technical lead as long as you provide a valid ManagerId. On SQL Server 2000 (or on SQL Server 2005, if you do not use snapshot isolation), this statement will prevent you from inserting a second technical lead reporting to the same manager. However, on Oracle (and on SQL Server 2005 using snapshot isolation), you can do the following:
- Add a new manager and commit.
- Begin a new transaction, add a technical lead, but do not commit yet.
- Open another connection and add another technical lead reporting to the same manager. Because this connection does not see uncommitted changes from the first one, the insert will succeed.
- Commit the transaction in the first connection, and note that two technical leads report to the same manager.
The biggest problem with bugs like this is that they are intermittent; you cannot reproduce them if you are the only user on the system. In many cases, this is exactly what is happening?the bug ends up with ‘Cannot Reproduce’ status. In order to reproduce such issues, you need to understand concurrency really well and use test scenarios like the previously demonstrated ones.
Portability Is Not as Easy as It Seems
As you have seen, portability is not an easy-to-accomplish feature. Achieving portability requires a good understanding of the differences between different back ends, a lot of work, and a lot of testing. If you really want your application to work on both Oracle and SQL Server, you need to be proactive. Take the following steps:
- Learn about the differences between these two RDBMSs.
- Review all your source code, and identify and fix potential problems.
- In your test plans, incorporate test cases exposing potential differences. For example, make sure that your application distinguishes between an unknown middle name and a known absence of a middle name.
- Do regression testing, making sure that all your unit tests work identically on both platforms.