Java and XML: Learn a Better Approach To Data Validation

As concern has grown over the security and efficiency of Web-based applications, validating user input has become correspondingly more important. Relying on scripting for client-side validation is unmanageable, inefficient, and non-portable. Hard-coding data validation rules into server application code ties critical business logic to the presentation tier and makes maintenance extremely troublesome. Web developers need an efficient, secure, and flexible server-side data validation mechanism.

A Java/XML-based data validation approach separates the implementation of common data validation reasoning code from the business rules and criteria data used to validate user input. The validation reasoning code is implemented in Java, while the business-specific rules and data are specified in XML. This approach provides a powerful and flexible way for application developers to specify data validation in a manner that is secure and easy to manage, while decoupling validation rules from the server-side business logic implementation.

Common Approaches to Data Validation
Before delving into the mechanics of Java/XML-based data validation, let’s examine some common approaches to handling this problem, along with their relative strengths and weaknesses.

Client-side Scripting: Inflexible and Non-portable
Client-side data validation was the first generation of user input validation technology for Web applications, and it is still widely used today. The validation logic is typically implemented in HTML and JavaScript and embedded in the Web pages transmitted to the browser. While it provides faster response time by reducing the number of round trips to the server, this approach has the following shortcomings:

  • Successful data validation relies heavily on client-side configuration. An end user might choose to have JavaScript disabled in her browser for security reasons or simply to avoid popup advertisements. Also, different browsers support subtly different feature sets of JavaScript; it’s often necessary to use different APIs in different browsers to do the same thing.
  • Complex data validation logic usually needs to access data stored on the server. For example, a client-side snippet of Javascript is able to check whether a ZIP code is well-formed, but a more sophisticated application would want to further validate the ZIP code by searching a ZIP code database stored on a Web or database server.
  • The data validation logic, an integral part of the business logic, is removed from the business layer and combined with the presentation layer. This violates the principles of three-tiered architecture, introducing unnecessary coupling that reduces flexibility.
Java-based Validation: Cumbersome and Hard to Customize
A more robust way of handling data validation is to embed data validation logic on the server in a multi-tier architecture. The ability to code the validation logic in Java directly provides unmatched flexibility to deal with very complex logic. However, for sizable forms, this approach soon becomes cumbersome. It generally takes many lines of code to validate each input parameter. The coding process will be labor-intensive, and the final code will be repetitive and lengthy.
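
To make the repetitiveness concrete, here is a minimal sketch of hand-coded validation for a three-field form. The class name, field keys, limits, and messages are all hypothetical, chosen only for illustration, and a plain Map stands in for the servlet request parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical example: every rule and message is hard-coded in Java.
class HandCodedRegistrationValidator {

    // Returns one message per violated rule; an empty list means the input is valid.
    static List<String> validate(Map<String, String> params) {
        List<String> errors = new ArrayList<>();

        String firstName = params.get("first-name");
        if (firstName == null || firstName.trim().isEmpty()) {
            errors.add("First Name is required.");
        } else if (firstName.trim().length() > 40) {
            errors.add("First Name must be at most 40 characters.");
        }

        // The same pattern is repeated almost verbatim for the next field.
        String lastName = params.get("last-name");
        if (lastName == null || lastName.trim().isEmpty()) {
            errors.add("Last Name is required.");
        } else if (lastName.trim().length() > 40) {
            errors.add("Last Name must be at most 40 characters.");
        }

        String age = params.get("age");
        if (age == null || age.trim().isEmpty()) {
            errors.add("Age is required.");
        } else {
            try {
                int value = Integer.parseInt(age.trim());
                if (value < 18 || value > 60) {
                    errors.add("Age must be between 18 and 60.");
                }
            } catch (NumberFormatException e) {
                errors.add("Age must be a whole number.");
            }
        }
        return errors;
    }
}
```

Changing any limit or message in this sketch means editing and redeploying the class, which is the maintenance problem discussed next.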

The fatal weakness of this approach is the inability to customize the validation rules once an application is in production; the entire software development process (code, debug, test, deploy) needs to be repeated to modify the Java code that implements the rule changes.

Database Retrieval: Inefficient and Expensive
Storing the data validation rules in a database is a different server-side approach, one that is both flexible and easy to customize. The business logic tier retrieves the rules from the database at runtime and checks the data items against them. Compared to the pure Java approach discussed above, this approach sacrifices some flexibility for manageability and ease of customization. However, validation rules tend to change far less often than the typical data items stored in a database, so retrieving them from the database at runtime carries a high performance cost for little benefit. Also, managing rules stored in a database requires either an administrative user interface, which increases development cost, or technical personnel with different skill sets (such as DBAs), which increases maintenance cost.

Data Validation Musts: Flexible, Scalable, and Portable
What we need is a data validation approach with the following features:

  • Independence from any client-side operations and the ability to support client types beyond traditional web browsers (e.g. multiple-channel applications that support HTML/WML/voice clients from the same server code base).
  • Seamless integration into the business logic tier of a J2EE-based Web application.
  • Sufficient power to handle typical data validation logic.
  • Ease of customization and administration in production, ideally without necessitating a system outage.
  • Efficiency, incurring only minimal performance overhead.
A Java/XML-based approach satisfies all of these criteria. It defines a meta-language, or grammar, in which application developers specify rules in XML; a Java library that supports this grammar parses and executes the rules at runtime. The rest of this article explains this approach in detail.

A Java/XML Server-Side Data Validation Engine

Capabilities and Architecture
Our Java/XML-based data validation application will include the following:

  • Definition of a form and its fields (through an XML configuration repository or through direct manipulation of form and field definitions)
  • Instantiation of form instances based on a form definition
  • Update of forms from text-based input sources
  • Validation of input criteria, including field-specific error message generation
  • Support for field attributes beyond validation criteria (such as field labels)
  • Support for localization of all field validation criteria and attributes
  • Support for customized field-specific error messages, including localization of message text
  • User-customizable field types (definitions)
The actual definition of a form that can be submitted is maintained in a form definition repository XML file that describes all the forms in an application. For each field in each form the following properties can be specified:

  • The expected type of this particular field: plain text, integer, decimal, Boolean, choice, multi-choice, etc.
  • A key to identify this field
  • Whether this field is required (i.e., user input is mandatory for this field)
  • Type-specific criteria such as minimum/maximum length for text fields, inclusive/exclusive minimum/maximum boundary values for numerical fields, valid date range for date fields, etc.
  • Error messages that can be displayed when certain rules are violated.
Form handling can then be automated by invoking the validation engine from the servlet code that handles the form submission. If any of the data is invalid, the program responds with the specific error message associated with that field. A set of error message expansion keywords is defined so that error messages can be generated dynamically from the field constraints.

A Meta-language for Validation Rules
Table 1 shows the XML definitions for the text field types in the application. The other field types (not included) have similar attribute sets and semantic structure:

Text Field Definition

Description: A simple text field with length validations.
XML Element
Parsed Value: java.lang.String
Class: com.cysive.framework.forms.TextDefinition

Field Element Attributes
  key: The unique ID of this field (only needs to be unique within a form).
  required: If “true”, the field is required. If trim is also specified, the field must contain something other than whitespace. If “localized”, the resource property keyed by “required” must be “true” for the field to be required; all other values are considered false.
  trim: If “true”, the field will be trimmed of all leading and trailing whitespace prior to validation. If “localized”, the resource property keyed by “trim” must be “true” for the field to be trimmed; all other values are considered false.

Nested Elements
  label: The label (name) for the field.
  default: The default text for the field. It is considered valid to supply a default that is invalid according to the validation criteria.
  min-length: The minimum length in characters for this field. Must be >= 1 and <= max-length. Values outside of integer range are wrapped.
  max-length: The maximum length in characters for this field. Must be >= 1 and >= min-length. Values outside of integer range are wrapped.

Error Message Keys
  missing: The field is required and no value has been supplied.
  length-out-of-range-min-max: The number of characters in the field is out of range and both min-length and max-length have been specified.
  length-out-of-range-min: The number of characters in the field is out of range and only min-length has been specified.
  length-out-of-range-max: The number of characters in the field is out of range and only max-length has been specified.

Error Expansion Keywords
  ${label}: The field’s label.
  ${max-length}: The maximum length specified for the field.
  ${min-length}: The minimum length specified for the field.
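
The text field semantics listed above can be sketched in a few lines of Java. This is not the com.cysive.framework.forms.TextDefinition class, whose source is not shown in this article; the class name TextFieldSketch, its constructor, and the decision to skip length checks on an empty optional field are assumptions made for illustration.

```java
import java.util.Optional;

// Illustrative stand-in for a text field definition; not the framework's actual class.
class TextFieldSketch {
    final String label;
    final boolean required;
    final boolean trim;
    final Integer minLength; // null when min-length is not specified
    final Integer maxLength; // null when max-length is not specified

    TextFieldSketch(String label, boolean required, boolean trim,
                    Integer minLength, Integer maxLength) {
        this.label = label;
        this.required = required;
        this.trim = trim;
        this.minLength = minLength;
        this.maxLength = maxLength;
    }

    // Returns one of the error message keys, or empty when the value is valid.
    Optional<String> validate(String input) {
        String value = (input == null) ? "" : input;
        if (trim) {
            value = value.trim();
        }
        if (value.isEmpty()) {
            // Assumption: an empty optional field is valid and skips length checks.
            return required ? Optional.of("missing") : Optional.empty();
        }
        boolean tooShort = minLength != null && value.length() < minLength;
        boolean tooLong = maxLength != null && value.length() > maxLength;
        if (!tooShort && !tooLong) {
            return Optional.empty();
        }
        if (minLength != null && maxLength != null) {
            return Optional.of("length-out-of-range-min-max");
        }
        return Optional.of(minLength != null
                ? "length-out-of-range-min"
                : "length-out-of-range-max");
    }

    // Substitutes the expansion keywords into a message template.
    String expand(String template) {
        return template
                .replace("${label}", label)
                .replace("${min-length}", String.valueOf(minLength))
                .replace("${max-length}", String.valueOf(maxLength));
    }
}
```

For example, a field created with `new TextFieldSketch("First Name", true, true, 2, 40)` reports `missing` for an input of only spaces, and expands the template `${label} must be between ${min-length} and ${max-length} characters.` using its own label and limits.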


The DTD file for the form definition XML files can be found at http://cymbio.cysive.com/dtds/formdef_1_0.dtd or requested from the author by email.

The Java Implementation
A high-level class diagram of the validation engine is shown in Figure 1:

Figure 1. Validating User Input: This high-level class diagram shows the relationships between the classes that will perform the validation for each user input field.

The FormDefinitionRepository class maintains the repository of all form definitions loaded for an application.

The FormDefinition class maintains the definition of a form. A form definition consists of a set of individual field definitions. Each field definition specifies validation and other details about the field.

The FieldDefinition class provides the abstract definition of a field in a form. A field definition describes the validation and parsing criteria for a form field, as well as additional display attributes.

Each of the subclasses of FieldDefinition implements concrete behaviors for the type of field it implements.

The Form class represents an instance of a form for which input is being received or whose data is being presented. A form has one or more Field instances that contain the actual field data. Each form is backed by a FormDefinition and each field in a form is backed by a FieldDefinition. All parsing and validation tasks are delegated to the definitions, which makes a form and its fields a type-independent container for form data (see Figure 2).
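
The delegation described above might look roughly like the following sketch. The class names FormDefinitionRepository, FormDefinition, FieldDefinition, and Form come from the article, but every method signature, the IntegerFieldDefinition subclass, and the error keys are assumptions; the Field class is omitted and replaced by a simple map of raw values to keep the sketch short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Abstract definition: owns the parsing and validation for one kind of field.
abstract class FieldDefinition {
    final String key;
    FieldDefinition(String key) { this.key = key; }
    // Hypothetical contract: returns an error key, or null when the text is valid.
    abstract String validate(String rawText);
}

// Hypothetical concrete subclass for integer fields with inclusive bounds.
class IntegerFieldDefinition extends FieldDefinition {
    final int min;
    final int max;
    IntegerFieldDefinition(String key, int min, int max) {
        super(key);
        this.min = min;
        this.max = max;
    }
    @Override
    String validate(String rawText) {
        try {
            int value = Integer.parseInt(rawText.trim());
            return (value < min || value > max) ? "value-out-of-range" : null;
        } catch (NumberFormatException e) {
            return "not-an-integer";
        }
    }
}

// A form definition is a set of field definitions.
class FormDefinition {
    final String key;
    final Map<String, FieldDefinition> fields = new LinkedHashMap<>();
    FormDefinition(String key) { this.key = key; }
    FormDefinition add(FieldDefinition definition) {
        fields.put(definition.key, definition);
        return this;
    }
    Form newForm() { return new Form(this); }
}

// Holds every form definition loaded for an application.
class FormDefinitionRepository {
    private final Map<String, FormDefinition> forms = new LinkedHashMap<>();
    void register(FormDefinition definition) { forms.put(definition.key, definition); }
    FormDefinition lookup(String formKey) { return forms.get(formKey); }
}

// A form instance only stores raw text; all checks are delegated to the definitions.
class Form {
    private final FormDefinition definition;
    private final Map<String, String> values = new LinkedHashMap<>();
    Form(FormDefinition definition) { this.definition = definition; }
    void update(Map<String, String> input) { values.putAll(input); }

    // Maps each invalid field key to its error key.
    Map<String, String> validate() {
        Map<String, String> errors = new LinkedHashMap<>();
        for (FieldDefinition field : definition.fields.values()) {
            String raw = values.get(field.key);
            String error = field.validate(raw == null ? "" : raw);
            if (error != null) {
                errors.put(field.key, error);
            }
        }
        return errors;
    }
}
```

Because Form never inspects field types itself, supporting a new kind of field in this sketch only requires a new FieldDefinition subclass.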



Extending The Data Validation Engine
An application developer can also add more field types or more complex validation logic as needed with the above architecture. For example, the user input to a ZIP code text field might need to be validated against a regular expression that formally defines the validity rules of ZIP codes. In this case, the developer may subclass the CustomField class to provide her own ZipCodeDefinition class that handles ZIP code field definitions such as:

                        ^\d{5}(-\d{4})?$
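
As a rough illustration of the check such a definition would perform, the sketch below tests input against a pattern for a five-digit ZIP code with an optional four-digit extension. It is a standalone class rather than a subclass of CustomField, because that class's API is not shown in this article; the name ZipCodeCheck and its method are hypothetical.

```java
import java.util.regex.Pattern;

// Standalone sketch; a real ZipCodeDefinition would plug into the engine instead.
class ZipCodeCheck {
    // Five digits, optionally followed by a hyphen and four more digits.
    private static final Pattern ZIP = Pattern.compile("^\\d{5}(-\\d{4})?$");

    // Returns true when the trimmed input is a well-formed ZIP or ZIP+4 code.
    static boolean isWellFormed(String input) {
        return input != null && ZIP.matcher(input.trim()).matches();
    }
}
```

A check like this only confirms the format; confirming that the ZIP code actually exists still requires a lookup against server-side data.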

Figure 2. Forming a Sequence: This sequence diagram shows activity in the validation engine at request processing time.

The behavior of the existing field definition types described in the Java implementation section above can also be customized by extending the definition implementation class and specifying an extra “class” attribute on the field definition element in the XML file. For example, an application might need to verify that the value entered in a decimal “Amount To Transfer” field is less than the funds available in the current customer’s account. Instead of hard-coding this validation logic in the servlet code, a developer may extend the existing DecimalFieldDefinition class with a new TransferAmountDefinition class, which reads the customer’s account balance from a database and validates the transfer amount against it.
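
A minimal sketch of this kind of extension follows. Since the API of DecimalFieldDefinition is not shown in this article, the sketch uses a hypothetical stand-in base class, and a Supplier stands in for the database lookup of the account balance; the class names, method signatures, and error keys are all assumptions.

```java
import java.math.BigDecimal;
import java.util.function.Supplier;

// Hypothetical stand-in for the engine's decimal field definition.
class DecimalDefinitionSketch {
    // Returns an error key, or null when the text parses as a decimal number.
    String validate(String rawText) {
        if (rawText == null) {
            return "not-a-decimal";
        }
        try {
            new BigDecimal(rawText.trim());
            return null;
        } catch (NumberFormatException e) {
            return "not-a-decimal";
        }
    }
}

// Adds a business rule on top of the inherited decimal checks.
class TransferAmountSketch extends DecimalDefinitionSketch {
    // Assumption: supplies the current customer's balance, e.g. from a database.
    private final Supplier<BigDecimal> balanceLookup;

    TransferAmountSketch(Supplier<BigDecimal> balanceLookup) {
        this.balanceLookup = balanceLookup;
    }

    @Override
    String validate(String rawText) {
        String error = super.validate(rawText);
        if (error != null) {
            return error;
        }
        BigDecimal amount = new BigDecimal(rawText.trim());
        if (amount.signum() <= 0) {
            return "amount-not-positive";
        }
        // The amount must be strictly less than the available funds.
        return amount.compareTo(balanceLookup.get()) < 0 ? null : "insufficient-funds";
    }
}
```

Keeping the balance lookup behind a Supplier is a design choice of this sketch: it keeps the servlet code free of the rule and lets the check be exercised without a database.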

Sample Form Definition
A sample forms definition file is shown below (the DTD specification has been omitted):

   
[Listing: sample form definition; surviving element values: 40, 40, 25, 18, 60]
As you can see, the form has an attribute “key,” as does each field. These values provide the unique identifier with which to reference forms and fields. Here are detailed explanations of the field properties specified:

  • label
The label is used by the engine when generating error messages and can be used to specify the actual label text when rendering the form to submit.

  • required
The required attribute specifies that some input must be supplied for the field to be considered valid.

  • default
The default attribute specifies the text that the field should default to.

  • max-length
Applies to text fields. In the example, it is used to specify the maximum length in characters for the First Name and Last Name fields.

  • min-inclusive
Applies to integer or decimal fields. Specifies the minimum acceptable value for the Age field in the example.

  • max-inclusive
Applies to integer or decimal fields. Specifies the maximum acceptable value for the Age field in the example.
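
To show how a definition with these properties might be read at startup, the sketch below embeds a hypothetical form definition in a Java string and loads it with the standard DOM parser. The element and attribute names are guesses based on the properties described above, not the actual grammar defined by the DTD, and the values are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class FormDefinitionXmlSketch {
    // Hypothetical form definition; element and attribute names are assumptions.
    static final String SAMPLE =
          "<form key=\"registration\">"
        + "<text key=\"first-name\" required=\"true\">"
        + "<label>First Name</label><max-length>40</max-length></text>"
        + "<text key=\"last-name\" required=\"true\">"
        + "<label>Last Name</label><max-length>40</max-length></text>"
        + "<integer key=\"age\" required=\"true\">"
        + "<label>Age</label><default>25</default>"
        + "<min-inclusive>18</min-inclusive><max-inclusive>60</max-inclusive></integer>"
        + "</form>";

    // Reads each field's attributes and nested criteria into a map keyed by field key.
    static Map<String, Map<String, String>> load(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse form definition", e);
        }
        Map<String, Map<String, String>> fields = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (!(children.item(i) instanceof Element)) {
                continue; // skip whitespace and comments
            }
            Element field = (Element) children.item(i);
            Map<String, String> props = new LinkedHashMap<>();
            props.put("type", field.getTagName());
            props.put("required", field.getAttribute("required"));
            NodeList nested = field.getChildNodes();
            for (int j = 0; j < nested.getLength(); j++) {
                if (nested.item(j) instanceof Element) {
                    Element criterion = (Element) nested.item(j);
                    props.put(criterion.getTagName(), criterion.getTextContent().trim());
                }
            }
            fields.put(field.getAttribute("key"), props);
        }
        return fields;
    }
}
```

Calling `FormDefinitionXmlSketch.load(FormDefinitionXmlSketch.SAMPLE)` yields one entry per field key, each holding that field's type, required flag, and nested criteria as strings ready to be turned into field definitions.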

Inter-Field Validation Rules
It is common for applications to need to apply validation rules to a certain field depending on the result of validation on another field. For example, an application might validate a ZIP code field only if a user specifies that she is inputting a U.S. address. The DTD for the meta-language we described above can be augmented to allow inter-field rules such as:

   
[Listing: form definition with an inter-field rule; surviving element values: 40, 10]
In the example above, the “required” attribute of the “zip-code” field is obtained at validation time by evaluating rule “zip-code-rule,” which checks whether the input value of the “country” field equals “United States.”
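
The evaluation described here can be sketched in Java as follows. The field keys “country” and “zip-code” and the rule’s meaning come from the description above; the class name, the use of a Predicate to represent the rule, and the “missing” return value are assumptions made for illustration, and a Map stands in for the submitted form.

```java
import java.util.Map;
import java.util.function.Predicate;

class InterFieldRuleSketch {
    // Stand-in for "zip-code-rule": true when the country field equals "United States".
    static final Predicate<Map<String, String>> ZIP_CODE_RULE =
            form -> "United States".equals(form.get("country"));

    // Returns "missing" when the rule makes zip-code required and no value was supplied.
    static String validateZipCode(Map<String, String> form) {
        // The "required" flag is computed at validation time, not fixed in advance.
        boolean required = ZIP_CODE_RULE.test(form);
        String zip = form.get("zip-code");
        boolean blank = zip == null || zip.trim().isEmpty();
        return (required && blank) ? "missing" : null;
    }
}
```

With this sketch, a form whose country is “United States” and whose ZIP code is blank is rejected, while the same blank ZIP code is accepted for any other country.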

The Balanced Approach
Data validation must be flexible, scalable, and portable; typical solutions are deficient in at least one of these key areas. The Java/XML-based data validation engine presented in this article addresses each of these concerns and yields the following benefits:

  • The XML meta-language handles typical data validation logic and can be easily extended.
  • The engine integrates easily and seamlessly into any application that needs to validate data input. This is especially valuable for applications that need to support multi-channel clients (browsers, mobile devices, voice, and Web services) from a common codebase.
  • The XML-based repository makes modifying validation rules and criteria easy and intuitive.
  • Localized validation rules and error messages simplify internationalization.
  • The XML-based repository is lightweight and efficient.
