

Build an XML-Based Scheduling Utility

Complex applications often consist of many individual tasks, each of which may depend on the successful completion of other tasks. For example, you may want an application to run only if a preceding series of steps completes without failure and in a specific sequence. Managing such dependency sequences manually quickly becomes a burden. Learn to automate process sequence dependencies with this XML-based scheduling utility.





Complex applications such as data warehousing need a strong operational support infrastructure to manage the daily tasks that keep these systems running. Typically, such applications rely on multiple programs that must execute in sequence and on a specific schedule. Scheduling utilities manage that process and are an important part of the infrastructure. A basic scheduling utility runs a task, checks its return code, and then, depending on the return code, either runs the next task or exits. Ideally, a scheduling utility also captures operational metadata, such as execution time, CPU and I/O usage, task output, and error codes, which makes continual improvement and process control possible. You can use almost any data structure, even ASCII flat files, to capture the process flow metadata such an application requires, but XML is a better choice because it represents hierarchical information naturally.
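As a rough illustration of that basic loop, here is a minimal Java sketch. It is not the article's Xerces-based implementation: the class `SequentialScheduler`, the `Task` interface, the `TaskResult` record, and the `command` helper are hypothetical names chosen for this example. Each task reports an integer return code (0 meaning success, as with process exit codes); the scheduler records the return code and elapsed time as simple operational metadata and stops at the first non-zero code.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: run tasks in order, record metadata, stop at the first failure. */
public class SequentialScheduler {

    /** A unit of work that reports a return code (0 = success). */
    @FunctionalInterface
    public interface Task {
        int run() throws Exception;
    }

    /** Operational metadata captured for one task run (illustrative fields only). */
    public record TaskResult(String name, int returnCode, Duration elapsed) {}

    /** Wraps an external program so that its exit status becomes the task's return code. */
    public static Task command(String... cmd) {
        return () -> new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    /**
     * Runs tasks in the map's iteration order and returns metadata for every task that ran.
     * If a task fails, it is the last entry and the remaining tasks are not run.
     */
    public static List<TaskResult> runAll(Map<String, Task> tasks) {
        List<TaskResult> results = new ArrayList<>();
        for (Map.Entry<String, Task> entry : tasks.entrySet()) {
            long start = System.nanoTime();
            int rc;
            try {
                rc = entry.getValue().run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                rc = -1;
            } catch (Exception e) {
                rc = -1;                            // treat any exception as a failed run
            }
            results.add(new TaskResult(entry.getKey(), rc,
                    Duration.ofNanos(System.nanoTime() - start)));
            if (rc != 0) {
                break;                              // non-zero return code: exit instead of running the next task
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Simulated tasks, named after the sample application's units described below.
        Map<String, Task> tasks = new LinkedHashMap<>();
        tasks.put("INITIALIZE_SYSTEM", () -> 0);
        tasks.put("ARCHIVE_DATA", () -> 1);   // simulated failure
        tasks.put("RUN_ANALYTICS", () -> 0);  // not run, because ARCHIVE_DATA failed
        for (TaskResult r : runAll(tasks)) {
            System.out.println(r.name() + " rc=" + r.returnCode()
                    + " elapsed=" + r.elapsed().toNanos() + "ns");
        }
    }
}
```

A real utility would also capture the CPU and I/O figures the paragraph mentions; this sketch records only return codes and wall-clock timing.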

In this article, you'll see how to use XML to structure the process flow information of a sample data warehouse application. The application has several jobs that must run in a certain order. A simple implementation of an XML-based scheduling utility, built on the Apache Xerces Java parser, uses an XML file to control the job flow. The article also discusses some possible ways to enhance the scheduling utility using XML tools and techniques.

Define a Sample Data Warehousing Application
Imagine a data warehousing application with the following tasks, called units, which must run in a specified order. Table 1 shows the unit descriptions and names.

| Unit Description | Unit Name |
|---|---|
| Initialize the system | INITIALIZE_SYSTEM |
| Archive data from the previous run | ARCHIVE_DATA |
| Run analytics on the data (calls a core engine) | RUN_ANALYTICS |
| Process the information obtained from the previous RUN_ANALYTICS job and store it in a database | PROCESS_ANALYTICS |
| Free the system | FREE_SYSTEM |

Each unit contains subunits that must succeed for the parent unit to succeed. For example, the PROCESS_ANALYTICS unit consists of programs that perform the following activities, each of which is a subunit:
| Subunit Description | Subunit Name |
|---|---|
| Drop Type 1 table indexes | Drop_Index1 |
| Drop Type 2 table indexes | Drop_Index2 |
| Drop Type 3 table indexes | Drop_Index3 |
| Load tables with data from ASCII files | Load_Tables |
| Aggregation program | Aggr_Prgm |
| Build indexes | Build_Index |

Furthermore, subunits may have complex run dependencies. For example, the Load_Tables job should run only if the Drop_Index1 and Drop_Index2 subunits succeed, and the aggregation program (Aggr_Prgm) should run only if the Load_Tables and Drop_Index3 jobs succeed.
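One way to honor these dependencies is to order the subunits topologically and run a subunit only when all of its prerequisites have succeeded. The Java sketch below does this with Kahn's algorithm; the class `DependencyScheduler`, the `Status` values, and the `Predicate`-based runner are illustrative assumptions, not the article's code. Build_Index is left out because the article does not state its prerequisites. Following the description of units above, the parent unit is treated as successful only if every subunit succeeds.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: run subunits in dependency order, skipping dependents of failures. */
public class DependencyScheduler {

    public enum Status { SUCCEEDED, FAILED, SKIPPED }

    /** Kahn's algorithm: orders subunits so each one comes after all of its prerequisites. */
    public static List<String> topologicalOrder(Map<String, List<String>> deps) {
        Map<String, Integer> unmet = new HashMap<>();              // prerequisites not yet ordered
        Map<String, List<String>> dependents = new HashMap<>();    // prerequisite -> subunits waiting on it
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            unmet.put(e.getKey(), e.getValue().size());
            for (String pre : e.getValue()) {
                dependents.computeIfAbsent(pre, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String name : deps.keySet()) {
            if (unmet.get(name) == 0) ready.add(name);
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            order.add(name);
            for (String next : dependents.getOrDefault(name, List.of())) {
                if (unmet.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != deps.size()) {
            throw new IllegalArgumentException("cyclic or undeclared dependency");
        }
        return order;
    }

    /** Runs each subunit only if all of its prerequisites succeeded; runner returns true on success. */
    public static Map<String, Status> run(Map<String, List<String>> deps, Predicate<String> runner) {
        Map<String, Status> status = new LinkedHashMap<>();
        for (String name : topologicalOrder(deps)) {
            boolean prerequisitesOk = deps.get(name).stream()
                    .allMatch(pre -> status.get(pre) == Status.SUCCEEDED);
            status.put(name, !prerequisitesOk ? Status.SKIPPED
                    : runner.test(name) ? Status.SUCCEEDED : Status.FAILED);
        }
        return status;
    }

    public static void main(String[] args) {
        // PROCESS_ANALYTICS subunits and the dependencies stated in the article (Build_Index omitted).
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("Drop_Index1", List.of());
        deps.put("Drop_Index2", List.of());
        deps.put("Drop_Index3", List.of());
        deps.put("Load_Tables", List.of("Drop_Index1", "Drop_Index2"));
        deps.put("Aggr_Prgm", List.of("Load_Tables", "Drop_Index3"));

        // Simulate a failure in Drop_Index2; every other subunit succeeds.
        Map<String, Status> result = run(deps, name -> !name.equals("Drop_Index2"));
        result.forEach((name, s) -> System.out.println(name + ": " + s));

        // The parent unit succeeds only if every subunit succeeded.
        boolean parentOk = result.values().stream().allMatch(s -> s == Status.SUCCEEDED);
        System.out.println("PROCESS_ANALYTICS: " + (parentOk ? "SUCCEEDED" : "FAILED"));
    }
}
```

With Drop_Index2 failing, Drop_Index1 and Drop_Index3 succeed, Load_Tables and Aggr_Prgm are reported as SKIPPED, and PROCESS_ANALYTICS fails. Skipping dependents, rather than stopping the whole run, lets independent branches such as Drop_Index3 still complete.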

The data warehousing example above illustrates the hierarchical nature typical of process flow metadata. XML is a good candidate for storing hierarchical information of this type, and it can also be used to model the process flow visually.
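To make that concrete, the following sketch stores the PROCESS_ANALYTICS subunits and their dependencies in an XML document and reads them with the standard JAXP DOM API, which Apache Xerces implements. The element and attribute names (`process`, `unit`, `subunit`, `dependsOn`) and the class `JobFlowReader` are hypothetical; the article does not define a schema in this section. Build_Index is omitted because its prerequisites are not given.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: read unit/subunit dependencies from an XML job-flow document. */
public class JobFlowReader {

    // Illustrative element and attribute names; not a schema defined by the article.
    public static final String SAMPLE = """
            <process name="DataWarehouse">
              <unit name="PROCESS_ANALYTICS">
                <subunit name="Drop_Index1"/>
                <subunit name="Drop_Index2"/>
                <subunit name="Drop_Index3"/>
                <subunit name="Load_Tables" dependsOn="Drop_Index1 Drop_Index2"/>
                <subunit name="Aggr_Prgm" dependsOn="Load_Tables Drop_Index3"/>
              </unit>
            </process>
            """;

    /** Maps each subunit of the named unit to its prerequisites, in document order. */
    public static Map<String, List<String>> readDependencies(String xml, String unitName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, List<String>> deps = new LinkedHashMap<>();
        NodeList units = doc.getElementsByTagName("unit");
        for (int i = 0; i < units.getLength(); i++) {
            Element unit = (Element) units.item(i);
            if (!unit.getAttribute("name").equals(unitName)) continue;
            NodeList subunits = unit.getElementsByTagName("subunit");
            for (int j = 0; j < subunits.getLength(); j++) {
                Element sub = (Element) subunits.item(j);
                // getAttribute returns "" when the attribute is absent, meaning no prerequisites.
                String dependsOn = sub.getAttribute("dependsOn").trim();
                List<String> prerequisites = dependsOn.isEmpty()
                        ? List.of() : Arrays.asList(dependsOn.split("\\s+"));
                deps.put(sub.getAttribute("name"), prerequisites);
            }
        }
        return deps;
    }

    public static void main(String[] args) throws Exception {
        readDependencies(SAMPLE, "PROCESS_ANALYTICS")
                .forEach((name, pre) -> System.out.println(name + " <- " + pre));
    }
}
```

Nesting `subunit` elements inside `unit` elements mirrors the unit/subunit hierarchy directly, which is the property the paragraph above points to; a flat file would need an extra column to encode the same parent relationship.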
