Database Design Issues
For audit trails to be effective and useful, the following database design issues must be considered:
- User identifiers
- Schema design
- Date and time handling
- Bulk data transfer
- Binary data storage
- Archiving and restoring
The database codes that identify users should be unique, not only when the user is active on the system but up until the records are retained. These identifiers (typically USER_IDs) should never be reallocated or deleted. USER_IDs that are not in use may be marked as retired or inactive.
Clear policies about how to deal with records that were 'owned' by retired (or inactive) users should be established during the design stage. With long-term data retention policies and archives of records, the software system should provide explicit functions to reassign the retired (or inactive) user's 'work item' to another user, since the original user may not be around to perform the operations.
Using a running sequence of numbers that allocates USER_IDs is not a good idea. With the advent of mergers and acquisitions, contract operations, and outsourcing, you can't guarantee of the uniqueness of numbers. Cases where USER_IDs in two organizations are the same render audit trails worthless.
Appending a location or department in the USER_ID field is also a bad idea. For instance, the USER_ID UK-12326 may change when the employee transfers to another location. User location is clearly a separate attribute and should have no dependency to the USER_ID attribute. Also, the latter part of the identifier (12326) should never be subject to rollover. A better alternative may be the user's initials followed by the employee code (e.g., MNS-9953) or his or her joining date (e.g., MNS-20040512).
The USER_ID assigned to a user should never be updated. (In general, avoid updating primary keys all together.) Systems with long data-retention policies (10-15 years) do not keep all their data online at all times. They archive some data. If the system allows updates, it updates only the records currently in the database. The records that have been archived would have a different USER_ID.
The organization may use (or may decide to migrate to) authentication infrastructures like LDAP, NTLM, NIS, or Active Directory. System designers must take such possibilities into account and keep user identifiers in a format that is compatible with these technologies. The record management system should also blend into such infrastructures.
The USER_ID is also the key to attestation or electronic signatures. All attestation actions should be audited with a reason. Without this, the action will not be treated as an attestation and will not have the validity of a paper signature.
Schema Design and the Audit Trail
System designers need to consider reporting requirements for the audit data. When you de-normalize tables, you end up with columns that don't really belong in the tables they occupy. When such tables get updated (only in these de-normalized columns), an audit trail shows up for the table. Users are likely to get confused when viewing reports from this audit trail.
A similar situation exists for fields that contain summary values. For example, if the LIBRARY_USER table has a BOOKS_ISSUED_SO_FAR column, it is subject to updates on every book issue. This generates an unnecessarily large audit trail for the LIBRARY_USER table. Such a column may be required to take care of restrictions like "books issued per user should be limited to 6".
Audit trails in tables that are the result of many-to-many relationships tend to grow faster than audit trails other tables. Reporting on such tables also is a tricky affair. For example, group membership changes to USERS, GROUPS, and GROUP_USERS tables will be reflected in the audit table for GROUP_USERS.
Date and Time Handling
Dates have always been a tricky issue with database applications. Books that deal with SQL always have a chapter or two dealing with dates and their difficulties in logic and operations. Dates are also important when considering audit trails and record management. Applications should be designed to correctly capture the data and time of an operation, as well as when the data was manipulated.
Dates formats should remain uniform throughout the lifetime of the application or else additional logic has to be built in when manipulating data. The date formatting should be unambiguous, say, YYYY-MM-DD. If a timestamp is also involved, it also should have a uniform standard throughout the application. Applications where date formats changed for some reason, say, Y2K, require extra testing to ensure that historical data archived prior to the format change can be correctly imported into the system again. For the sake of reporting and consistency, the audit trail should consistently be of the same time zone, across all the years the system has been in operation.
Some PDA-based (and some PC-based) applications support an offline mode where data entry takes place on the PDA and then the PDA data and the system data merge (or synch). In such cases, it is better to have multiple columns for RECORD_DT and MERGE_DT. This ensures that reports fired prior to the merge and after the merge are consistent.
Bulk Data Transfer
In an electronic record management system, the audit trail is as integral a part of the data as the data itself. Not only must audit trail information be correctly transferred, but (if the system specification so demands) the auditing operations should also be performed even in bulk transfer modes. When data is transferred to or imported from another system, provisions must be made for transferring audit information at the same point in time as the rest of the data as well. Because Fine Grained Audit Control (FGAC) interferes with export/import operations, the user who performs these operations must have privileges to access all the data in the system.
The manner in which audit trail information (and the data) is transferred should be consistent with the original NLS settings. All character string data, number data, and dates should be transferred in a manner that does not cause corruption. In Oracle, direct-path operations bypass Oracle Label Security and constraints, and NLS translation happens at the client.
Binary Data Storage
When designing an application that stores binary data (like PDF files), the choice is between BFILEs and BLOBs. Because BFILEs don't reside in the database, revisioning is implemented on the file system itself. For audit trails (and data integrity), BLOBs are preferable to BFILEs.
Dealing with BLOBs has its own issues though (not really related to audit trails, but related to record management in general). One issue relates to using SQL*Loader to load BLOBS. Failure to load a LOB does not reject the record. Instead, it results in an empty LOB in the record column. An integrity check should follow an SQL*Loader operation.
Also, even for minor changes, you need to store the complete binary object again, resulting in much larger space usage.
Archiving and Restoring
Any system that maintains records usually comes with a utility to archive records that are not required for day-to-day operations. Various archiving strategies exist, such as the following:
- Retaining only summary records and archiving detail records
- Retaining the current version and archiving previous entries
- Retaining records for the quarter and archiving those in previous quarters
The archival utility may also be used to archive the audit log table. Archiving programs should always archive a consistent state of the database records. Depending on the duration of the archiving program's run and the requirement to allow simultaneous normal operations, the developer may choose to set the TRANSACTION ISOLATION LEVEL. If the application is of 24x7 type, it is advisable to SET TRANSACTION ISOLATION LEVEL SERIALIZABLE and opt for SET TRANSACTION ISOLATION LEVEL READ ONLY only as a last resort. The serializable mode will allow inserts/updates/deletes, while the read-only mode will prevent any updates.
As with archiving, the restore should also ensure that the system transports from one consistent state to another consistent state. This includes restoring it with the audit trail too. Restoring only part of the data compromises the integrity of the data in the system.