Of the many challenges in creating, distributing, and obtaining consensus for documentation, I find the chief challenge is keeping the statements in the documentation synchronized with changing business conditions, conditions that change due to competitive forces in the marketplace (sales and acquisitions) or regulatory changes.
Typically, someone above your head promulgates a change that demands that your product comply with a new specification (e.g., Sarbanes-Oxley, HIPAA, or an acquisition). The first challenge is keeping that specification at hand and, if possible, stored together with your product so that the two stay related.
The second challenge is similar but more complicated. The documentation preceding development is either foundational or prospective. The documentation supporting a report comes after the fact, so instead of guiding goals and behavior, it's used to remember the assumptions you used to create an ad-hoc report. If the report is canned, then changes to its lineage are well understood, because it probably was run from a stored procedure, a stored job, or a DTS package. Since you can follow the chain out to the presentation layer (Crystal or Reporting Services), you can understand what created the report. If the report was in response to a special situation such as an acquisition or a regulatory change, however, chances are there is no code base that represents the new problem. So you develop your solution in Query Analyzer (QA), save its text to VSS, and send your work to senior management. This presents another wrinkle in the synchronization problem: the work you did to create the result gets separated from the result. You then need to remember (or create a naming convention that helps you remember) how things were related at the time of the report.
Solutions to this problem range from version control software and COTS help file generators all the way to homegrown documentation engines that work a lot like JavaDoc (i.e., they read tags in the source and create HTML from them). (As an aside, did ansi.sys and a batch file ever equate to IIS and HTML? End of aside, with apologies to Date.)
The problem also has been broached from other directions, such as full text indexing. These tools either help control code changes or help you find what is already there, but they do not help maintain the link between an ad-hoc report result and the code that produced the result.
This 10-Minute Solution demonstrates how I took advantage of MS SQL Server's extended properties: storage areas attached to schema objects in which you can store strings specific to the table and column you want to document. You can name and describe them according to your own convention, but you have to fill in some required fields in the property definition (the names of the table and column and the schema they belong to, so that the properties are associated with the right database object; see the String Creation Elements columns in Table 2). By using extended properties, you can name and describe your objects according to the source specification, with minimal loss of fidelity to the topic. (You can consider other things to do with these strings later, like maybe execute them.)
The solution to the second problem (documenting ad-hoc report source code) is to take advantage of the comment box in Excel: take the contents of the QA screen, copy them to the clipboard, and paste them into the comment. This lets you store the assumptions used to create a result with the result itself, so that they stay together and document exactly which filters were applied to the data to generate it.
How do I maintain the synchronization between my documentation, database objects, and the code that produced a particular result?
Leverage SQL Server’s extended properties, storage areas attached to schema objects in which you can store strings specific to the table and column you want to document, and MS Excel’s comment box.
State Mandated Change to Data Elements
The project in which I applied these techniques started when the New York State Health Department sent out a document that specified the data elements they wanted in record structures. The document prescribed the acceptable sources and procedures you could use to fill its regulatory reports, leaving you responsible for assuring that your schema and procedures satisfied their requirements.
This immediately called for coordinating three dimensions:
- The specification’s name for the data element
- The data source (database, table and column) where the specification’s data element could be found
- The implementation’s table and column names, from which the data was going to be conditioned, processed, and reported (see Table 1 for a sample of the specification)
The “meta” aspects of the problem were:
- What is the specification’s requirement conceptually? (What is its meaning?)
- From where in your current schema is this information available? (What is its source?)
- To where and in what format must you map that attribute output? (What is its destination?)
|Table 1. Sample of the New York State Health Department Specification
To coordinate these aspects, I used extended properties. I saved the state's specification as text and then re-opened it in Excel so that I could more easily expand the table with attributes we would need further on in the process. The Excel file adds the attributes necessary to map each requirement in the document to the data source that satisfies it and to the implementation store used to report it. Table 2 presents the names of all the columns in the Excel sheet.
|Table 2. The Names of All the Columns in the Excel Sheet. Its column groups are From the Specification (e.g., "Admit date for the stay"), Destinations for Reporting, and String Creation Elements (fragments such as " 'user'," " dbo," and " 'table'," Institutional, which are concatenated into property-creation statements).
As mentioned previously, I collected the data in these columns by completing two simple steps:
- Copying the Word document into Excel
- Filling in the values by querying information_schema.columns in the database that contains the transaction tables and in the database that contains the implementation tables
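That lookup can be sketched as follows (the filtering and ordering here are illustrative; run it once against the transaction database and once against the implementation database):

```sql
-- Sketch: inventory every table and column in the current database,
-- with data types, for pasting into the mapping worksheet.
SELECT  table_schema,
        table_name,
        column_name,
        data_type
FROM    information_schema.columns
ORDER BY table_schema, table_name, ordinal_position
```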
The work is all between the ears here because you have to know your data well enough to effect the mapping. It is tedious, but without it, knowledge of the meaning of the tables and columns in your organization is reduced to oral history, not science.
Fill in the Blanks of Your Worksheet
Once you have sweated out the connections between what the state is asking for and where you think you can get it, you are prepared to fill in the blanks. Your resulting Excel product will look something like Figure 1, which shows the "money columns" for changes to the schema above. ConcatMedsForLookup is a calculated column from the state's specification; Implementation table is the table it will report from, and Mapped from is the source of the data.
|Figure 1. Excel Product After Mapping Is Complete
Once the Excel worksheet has the facts recorded in it, you can use it to create a string of the form required to create an extended property. To create an extended property for the meaning of an attribute, you could run the following statement in Query Analyzer:
EXEC sp_addextendedproperty
    'Meaning',
    'A = Adjusting, V = Voiding a previously submitted record.',
    'user', dbo,
    'table', CommonDetail,
    'column', RecordType
Make sure that you are referencing objects that exist in your schema before trying this; otherwise, SQL Server will raise an error complaining that the object does not exist. Alternatively, you could add the extended property by right-clicking the object and using the interface provided. Figure 2 illustrates the interface, which is a convenient way to enter a property or two; if you have a lot of properties to add or edit, however, the script method is better.
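One way to sidestep that error is to guard the call with an existence check. This is a defensive sketch, reusing the CommonDetail/RecordType names from the example above:

```sql
-- Sketch: add the 'Meaning' property only if the target table exists.
IF OBJECT_ID('dbo.CommonDetail') IS NOT NULL
    EXEC sp_addextendedproperty
        'Meaning',
        'A = Adjusting, V = Voiding a previously submitted record.',
        'user', dbo,
        'table', CommonDetail,
        'column', RecordType
```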
|Figure 2. Singleton Extended Property Entry
You can retrieve the properties you create by using:

select * from ::fn_listextendedproperty(default, default, default, default, default, default, default)

Next time, I will talk about how you can animate other processes by using the strings stored in the extended properties.
Since the above approach binds the documentation to the schema, how do I keep track of the chain between what was reported and the assumptions underlying the report in ad-hoc report situations? Look back at Figure 1 and you’ll notice a mapping to something called Calculated rather than a table and column combination, and a cell named CODE (in boldface) in place of a column name. The purpose of this artifact is to store the SQL necessary to fill that requirement. In other words, I copy the query text out of QA and insert it into a comment in Excel. As a result, I don’t have to remember which combination of stored procedures and/or DTS or other interventions I needed to create that result. I can just store the code with the report.
In Figure 3, the ClaimEncounterIndicator for the Dental Segment is calculated with the SQL in the comment field.
|Figure 3. SQL That Calculates the ClaimEncounterIndicator for the Dental Segment
I applied the above technique to document our fulfillment of the state’s definition of this attribute, but you can use the SQL in the comment field to document a report result, which is what I do with it 99 percent of the time.
Little Islands of Intervention
The good thing about extended properties is that, because they are part of the database object, they are easy to keep with the database object. The bad thing is that, because there is no bulk-editing interface for them, someone has to write the statements to add and edit the properties. As I stated earlier, you can get Excel to do most of the work.
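For example, assuming the worksheet has been imported into a staging table (PropertyMap here is a hypothetical name, as are its Meaning, TableName, and ColumnName columns; adjust them to match your import), a query along these lines can generate the property-creation statements for you:

```sql
-- Sketch: emit one sp_addextendedproperty statement per worksheet row.
-- PropertyMap is a hypothetical staging table loaded from the Excel sheet.
SELECT 'EXEC sp_addextendedproperty ''Meaning'', '''
       + REPLACE(Meaning, '''', '''''')   -- escape embedded single quotes
       + ''', ''user'', dbo, ''table'', ' + TableName
       + ', ''column'', ' + ColumnName
FROM   PropertyMap
```

Paste the result set back into Query Analyzer and execute it to apply the whole worksheet in one pass.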
I hope this discussion has given you ideas about how to leverage extended properties. These little “islands of intervention” have plenty of uses beyond documentation or mapping. Next time, I’ll look into storing executable statements in extended properties and retrieving them in ActiveX scripts in DTS.