Login | Register   
LinkedIn
Google+
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

Consolidating Tables for Easier Database Maintenance

There is a way to reduce the complexity of a system by reducing the number of tables. This reduction is accomplished by consolidating tables based on their domains.


advertisement
n designing a relational system, the simplicity of each individual relation belies the complexity engendered by the multiplicity of tables created by implementing that model. However, simply by observing "good housekeeping"—meaning, observing the old saw that there's a place for everything and everything in its place—you can arrive at a serviceable design.

What Is a Serviceable Design?
By "serviceable design," I mean a design where updates that must occur do so in one table. Similarly, deletions and insertions that must occur do so in one table. The problem is that the number of tables necessary to isolate those database changes soon overwhelms you. Fortunately, there is a way to reduce the complexity of a system by reducing the number of tables. This reduction is accomplished by consolidating tables based on their domains.

Representations and Operations
For a quick refresher, we can say that a domain is a conceptual pool of values—meaning some generalization of a representation, like a string of length n or an integer—and the operations allowed on them. (Hm, this sounds suspiciously like an object, but that's another discussion!)



Knowing what we know about domains, we can tell what operators it makes sense to use on its members. Can we use less-than or greater-than tests to compare members of the domain? Are addition and multiplication appropriate against the members? There are other properties of the domain that can be exploited to improve the fidelity of our representation of reality. Is there a format or mask applied to the member, like dashes or dots in a phone number (914-555-1212) or social security number (123-45-6789)? At the other end of the spectrum, we can hone our model by recognizing that the possible values we need to represent are really an enumerated type, a restricted list of values. For example, the only things that make sense in recording a hospital patient's status may be limited to "admitted," "discharged," and "died."

Honing Domains with User-Defined Types
There is an assumption implicit in the characterization of a representation that, once cast, it stands for only the column defined. For example, it's hard to picture working on the specification of the representation of a phone number—with its individual segments for area code, separators, exchanges, numbers, and extensions—and then trying to use that specification to represent addresses. Yet there are times when a general definition of a domain into a user-defined type is useful if the definition is going to share some property, like its nullability or the presence of a default value. This can be illustrated by applying a default value against a non-null column using a user-defined type—say, by using VARCHAR(20) and naming it dom_Med_String (with a default of def_Unknown) instead of just dom_City. Now you can get the non-null with the default value bound to dom_Med_String and use it to represent "Last Name" as well as "City."

The moral of this story is "ya pays yer money and ya takes yer choice" on the general applicability of your definition versus its specificity.

How Does This Reduce Complexity?
With the preceding discussion under your belts, where's the 10-Minute Solution in all this? I started out talking about the multiplicity of tables in a relational model, and then got to talking about user-defined types as a way to refine your representation of data, hinting at both the one-to-one correspondence between a domain definition and its applicability to a particular column in your model, as well as the power of a more general definition allowing you to leverage and apply the definition over more than one column.

A Little More Than 10 Minutes
The meat in all this is recognizing the similarity of construction between two tables so you can exploit the repetition of the same domains by combining tables. How often have you noticed that several tables in your model are constructed identically, only the columns are named differently?

For example, let's say you have a table of Salutations and a table of Titles, both of which are used as foreign keys into your Person table. The schema for salutations is this:

Salutations [SalutationID Int, SalutationDescription Char(30)]

Some sample values might be as follows:

ID

Description

1

Mr.

2

Mrs.

3

Ms.

4

Governor

The schema for titles is this:

Titles [TitleID Int, TitleDescription Char(30)]

Some sample values might be the following:

ID

Description

1

Junior

2

III

3

IV

4

Adjunct Professor

You can quickly see that, but for the names, both are founded on the same domains. Each table has an Int column that is used as its primary key column and a description column that is Char(30). You could say that the general form of the code table is:

Code [ID Int, Description Char(30)]

The Next Step
If you embed the contributing table name in a new table, you can create a table of "virtual tables." You can call the new attribute of your expansion for the general form of the code table "Code Type." Its values will come from the table names you've identified as having the same general structure as your ideal code table. That is, it has an ID and a description, and now you are going to add CodeType to your table, so the new schema is this:

CodeMaster [ID Int, CodeType Char(10), Description Char(30)]

The new column carries the old names of the tables you are consolidating. Some sample values in the new table would be this:

ID

CodeType

Description

1

Salutation

Mr.

2

Salutation

Mrs.

3

Salutation

Ms.

4

Salutation

Governor

5

Title

Junior

6

Title

III

7

Title

IV

8

Title

Adjunct Professor

There is a little cheating going on here, because you took advantage of a surrogate key in your new table. The "natural" key is CodeType and Description, meaning that their combination cannot be repeated in the table, but you could have "Professor" appearing as both a salutation and a title, depending on the enterprise's usage for that element. So while you did denormalize the table, you know you have a little insurance in unambiguously identifying a row in the table.

Some Caveats
Before you go off combining all your codes into one table, there are a few things to be aware of:

  • More complicated triggers: Depending on your use of natural or surrogate keys, the triggers to cascade changes from the code table could be more or less difficult to write. If you use natural keys, then changes made to the code types or the descriptions will have to be propagated to the child table that uses them—in other words, a mess. If you use surrogate keys, then the child tables are insulated from changes that users may feel they need to make to the descriptions of their codes.
  • Extra joins: Because the paths for the foreign keys are concentrated in one table, you will find it necessary to use aliases in the select list of your queries so that you can reference the code table more than once. To keep with the example of the new CodeMaster table contributing both a SalutationID and TitleID to the Person table, you would have to use CodeMaster twice in the query to resolve both foreign keys. Let's use the table aliases "S" and "T" to stand for salutation and title as the two uses of the code table:

    SELECT S.Salutation, P.FirstName, P.LastName, T.Title FROM CodeMaster CodeMaster S, Person P, CodeMaster CodeMaster T WHERE P.SalutationID = S.ID AND P.TitleID = T.ID AND T.CodeType = "Title" AND S.CodeType = "Salutation"

  • Self-joins to organize codes in use: Another wrinkle you will encounter is in reporting on all the codes in use in your system, because it requires you to master the techniques of the self-join to produce those reports. You will need one instance of the code table to act as the Master (the code types) and another instance of the code table to act as the Detail (the descriptions) so you can keep track of which items belong to which code types:

    SELECT M.Codetype, D.Description FROM CodeMaster CodeMaster M, CodeMaster CodeMaster D WHERE M.Codetype = D.Codetype ORDER BY Codetype, Description

Other Things to Try
You may find it profitable to add columns to the code table, so that you can bracket the in-service and out-of-service times that a particular code is in use. This will also preserve the value of a particular code through time, potentially saving you hours in conversion for retrospective studies. You can also consider attaching factors or other numeric descriptions to your code table for even more flexibility in calculation-intensive applications or, as mentioned above, attaching factors on code that goes out of date.



   
Drew Georgopulos is the Database Architect for GoAmerica Communications. He moonlights as an Adjunct Professor of Computer Science at Marymount College of Fordham University in Tarrytown, NY teaching systems analysis and relational database design. He holds a certification in systems analysis and relational design from Columbia University, has presented at SQL Connections on SQLDMO, and is a member of the Worldwide Institute of Software Architects.
Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap