Globally unique identifiers (GUIDs) distinguish hardware, software, and users. Originally, Microsoft designed the GUID value for use with ActiveX controls; Windows uses a 128-bit GUID to identify ActiveX controls. A 48-bit version generates an Ethernet adapter address. Because no two systems should have the same Ethernet adapter address, a GUID based on your computer shouldn’t be duplicated on any other computer. To assure this precaution, a GUID datatype produces unique values using the time and address of your network card to ensure that no other computer can generate the same values.
These values, which are very long, consist of both numbers and letters. SQL Server and Access provide a 16-byte GUID datatype, a hexadecimal number with a guarantee of uniqueness for the next century. (Presumably, by then the duplicate value problem will be solved for the next millennium.) This article focuses on the nature and use of the GUID datatype in Access and SQL Server.
You’ll Outgrow Your System?Sooner or Later
The availability of the GUID datatype brings to mind the ancient Chinese curse: “May you live in interesting times.” Suppose you have at least one table that could store more than two billion rows. For instance, seismologists depend on a thousand different instruments buried in the ground to measure seismic activity every five seconds. As you can imagine, a table recording these measurements grows rather quickly. In fact, it isn’t all that unusual for a seismologist’s database to grow an average of 100MB per month.
You also can outgrow your system because primary key values are susceptible to the same recent trends affecting the marketplace, such as acquisitions, downsizing, and consolidations. Increasingly, information technology departments are expected to marry databases that different companies designed for different purposes. Suddenly you could find yourself having to merge varying databases?which all use an auto-incrementing datatype for their primary keys. As a result, all your tables have primary key values that increment by one and begin with the value of one. You’re faced with a multitude of duplicate primary key values.
GUID primary keys could solve your problem, but at an intolerable performance cost. Don’t rush out and convert all primary key fields to the GUID datatype. For the most part, system-generated key values need to be unique only to a single server. In most situations, changing an auto-incrementing primary to a GUID has no advantage, only the following disadvantages:
- GUID values are up to four times larger than the alternatives (Access’ AutoNumber and SQL Server’s Identity columns are Long Integer datatypes), so GUIDs require more overhead.
- GUID values take longer to search and sort. Consequently, a GUID primary key suffers a performance penalty that cascades throughout the database.
- The number of characters in and the awkward composition of the actual GUID values themselves make them hard to remember or work with directly, although neither is really necessary when used as a primary key value.
The GUID datatype can assist with a number of database challenges:
- Global databases that must guarantee unique key values across many servers
- Replicated databases that must guarantee unique key values across many servers
- Large databases that have outgrown incremental primary key values
- Companies faced with combining database files where primary key values are duplicated
Large databases that have outgrown their incremental primary key values (tables with more than 2 billion records) and developers who must combine two or more systems with duplicate primary key values require key values that are unique across an indeterminate number of servers. When feasible, a cost-effective and efficient solution in both cases is simply to declare the primary key values in all tables suspect and replace them all with GUID values (generated by the GUID datatype). The process requires updating every primary and foreign key value, which might be tedious but it guarantees that none of the primary key values are duplicated. The process won’t cost you anything but time.
Microsoft discourages the use of GUID as a primary key, but that’s because most people don’t really need universal uniqueness. When uniqueness across many systems is vital, however, the GUID datatype is definitely the way to go. Just remember that you pay a price in performance.
GUIDs in Replicated Databases
By and large, the most common use for GUID values is in replicated databases, where users insert new records and update copies of the same records on different servers. At any moment, two database replicas could contain new records, so a row or rows might contain identical primary key values. Adding a GUID value to each record would guarantee that all replicas (as a group) contain a unique primary key, reducing the synchronization problem to a series of simple append queries.
However, updating existing data poses a more difficult problem?users could change the same record in different replicas. Simply replacing an incremental primary key with a GUID primary key won’t solve this problem, because you still have n different versions of a row identified by the same GUID. For this reason, Access doesn’t convert primary key fields to the GUID datatype in replicated files. It adds GUID values to each record as version trackers. Access changes the incremental primary key to random, which minimizes (but doesn’t eliminate) the possibility of duplication within replicas. Then, during synchronization, the Replication Manager automatically detects and resolves primary key collisions, reassigning new primary keys to any duplicated numbers. However, that change handles only inserts. So what about updates and deletes? To that end, the Replication Manager adds four new columns to every table (see Figure 1).
|Figure 1: The Concept Behind SQLXML|
These new columns are read-only. If you change a value, Access tracks the change by changing the value in the s_Generation and the s_Lineage columns, accordingly. If you modify a column’s definition (which you can do only in the design master), Access updates the s_ColLineage column. Thus both data and definition changes are automatically tracked and can be propagated successfully during synchronization.
You might expect replication to incur the same performance penalty as GUID primary keys, but it doesn’t. Access searches these extra columns only during synchronization. In addition, typically only a few rows are added or changed between synchronizations, so the process isn’t lengthy. (Of course, the longer the interval between synchronizations, the longer the process takes.) You can also use GUID values to randomly select rows from a table or view, and you don’t have to create a uniqueidentifier column to take advantage of this technique. You can use the NEWID() function to generate GUID values in a query. To illustrate this unique use, execute the following query in SQL Server Desktop or SQL Server:
SELECT TOP 10 dbo.Products.ProductID, dbo.Products.ProductName, dbo.Products.UnitPriceFROM dbo.ProductsORDER BY NEWID()
The resultset contains 10 seemingly random product records (see Figure 2). This technique is beneficial when featuring random products on a Web site or selecting random records for an audit.
|Figure 2: Using GUID Values to Select a Random Set of Records|
Add a GUID Column
Because the GUID datatype is native to Access (95 and later) and SQL Server (7.0 and later), generating GUID values is a very simple process. Access GUIDS are called Replication ID’s. To generate a GUID in Access, you simply define a Replication ID AutoNumber field. In SQL Server, you set a uniqueidentifier column’s Is RowGuid property to Yes. Creating a GUID primary key column in Access, SQL Server Desktop (MSDE), or SQL Server is very similar:
- Launch your database system and create a new table.
- Name the first column.
- Access users: select AutoNumber from the Data Type column. (Notice that the default Field Size setting is Long Integer.) SQL Server users: choose uniqueidentifier.
- Access users: select Replication ID from the Field Size property dropdown list (see Figure 3).
Figure 3: Choose the Replication ID Field Size Setting
- In SQL Server, change the Is RowGuid property setting to Yes. Both SQL Server Desktop and SQL Server will set the Default Value property to NEWID(), the Transact-SQL function that generates a new GUID.
- Click the Primary Key tool on the Table Design toolbar in Access. In SQL Server, click the Set primary key tool. (Figure 4 shows an Access project, and Figure 5 shows a similar table in SQL Server’s Enterprise Manager.)
Figure 4: Change the Is RowGuid Property to Yes in an Access Project Figure 5: The GUID Column Defaults to the NEWID() as the Default Value Property
- Add at least one other column so you can add data and watch the GUID primary key respond (see Figure 6).
Figure 6: Add Data So You Can Watch the System Generate GUID Values
The GUID datatype is here to stay. Its presence in Access and SQL Server makes both products viable systems for the largest of projects, even if most of us never use it. Fortunately, the technology behind the GUID datatype is simple to understand and implement?just being aware of it and its functionality opens the door to easier solutions. Not many database developers hit the 2-billion-record ceiling, but if it could happen to you, design for success with GUIDs.