art I of this two-part series exploring DTS tasks illustrated how to chain different types of tasks together to collect the information you need to process data contained in a file location table. It’s a technique I used during a SQL Server project in which I built a solution that automatically refreshes check register data from Excel for a client’s Finance department.
The department needed to maintain a current copy of a check register that was subject to subsequent edits. The organization’s business rules allowed users to reopen check registers from a prior period and make corrections or additions. I had to figure out how to import an indeterminate number of items from a directory (the check register directory) into a SQL Server staging table, so that the records would be stamped with the directory from which they came and the time of the import.
The solution I chose was using the DTS Execute Process task to execute a batch file that redirects the result of a dir command to a text file. I then used the DTS Connection for a Text File Source to pipe the contents of the text file into a Transform Data task. This filled a table that held the names of the files I wanted to process.
This second and final installment of the DTS task series completes my solution by illustrating how I used multi-step DTS packages to consolidate fifty Excel workbooks worth of check register data in one SQL Server table.
How can I import an indeterminate number of data items from an Excel directory into a SQL Server table?
For your source, use the Excel 2000 driver from the dropdown on the Excel Connection. To transform an Excel workbook or a text file, open the DTS Designer and drag an Excel connection onto the work surface. Fill in the information needed to complete the connection on the design surface.
The Elements of a Data Move
Moving data from one place to another takes three elements: a data source, a data transformation, and a destination. The source can be anything for which you have an OLEDB driver, the transformation is the parsing of the data into a SQL Server table, and the destination is your SQL Server. In this case, for your source, use the Excel 2000 driver from the dropdown on the Excel Connection, which lets you point to the thing you want to convert. To transform an Excel workbook or a text file, open the DTS Designer and drag an Excel connection onto the work surface. You fill in the information needed to complete the connection (the path and filename for the Excel source) on the design surface.
All in all, these data-import activities are no more complicated than the following steps:
- Drag two connection objects from the designer (one for the Excel connection and one for the SQL Server connection).
- Fill in the folder and file name of the source connection.
- Select a Copy Column transformation from a Transform Data task.
- Indicate the destination on the Destination Tab of the Transform Data task.
If the destination table does not exist, DTS lets you create it right on the spot. The only issue I had with the default settings was that they mimicked the Excel data types, which can be a tad funky in terms of the data types and the lengths that appear as the defaults. Fortunately, you can edit the create table definition to types that are more reasonable for SQL Server. For example, I doubt you want varchar(254) or floats in your table. You can edit and fix the table definitions easily before the target table is created.
Another caveat here is that Excel looks at a small portion (eight or ten rows) of the worksheet to determine the correct data type for the column. Most of the time it is correct, but other times it gets confused either by missing values or ambiguous ones (numbers in text columns). You can override this “excess intelligence” in the connection properties window.
Finish the task by connecting the source connection to one end of the transform data task and then connecting the other end to your SQL Server connection. Hit the little green arrowhead to execute the package, and you’re done. The data moves from the Excel workbook into the table you defined. This is the essence of the task, and it works great for an ad hoc, one-time move. However, my solution calls for an indeterminate number of worksheets to process.
Once Is Not Enough
My business problem, as you’ll recall from Part I, was that I had about fifty workbooks to import, each of which needed to be refreshed because any one could have been updated since the last time it was imported. In effect, I needed a total refresh of the Excel data as kind of an all-or-nothing snapshot, so that the checks would match up to the claims each time the Finance department wanted to perform a check reconciliation. So I had a working model, but it worked only once!
I needed a way to repeat this import process so I would catch the changed registers. I also needed to wrap this process in a loop so that it would be executed once for each Excel file. My instincts told me that creating fifty such packages was not the answer. I needed a header package that would read the file location table I created in Part I and send that file name into this little transformation mill, so that it would do the same extraction for each Excel file. Figure 1 shows the header package I created.
|Figure 1: The Header Package I Created|
In this figure, the Execute SQL Task task issues a Select statement against the file location table from the previous installment. By using the checkbox on the last page of the task that tells the task to act as a rowset provider, I got the result set to persist. This is the feature that lets me manipulate the result via ADO later on in the process.
I also learned that for this implementation a Package can have Package Variables, which wasn’t obvious at first because they appear right on the Designer’s work surface (not a menu). You can access the Package Variables by right-clicking anywhere on the DTS Designer and picking them from the menu that appears. This brings up the page shown in Figure 2, where you can create a Global Variable that the package can reference.
|Figure 2: The Global Variables Dialog|
I created a Global Variable for each of the following:
- The file name that was retrieved from the table (providing an example of one)
- To hold the result parsed out of the file name, which indicates its file type
- To hold the current date and time of the transformation
- To hold the record count from the file location table
With these Global Variables in place, I assigned the Global Variable holding the file name to be Outer Variables, meaning they would be visible to the child process (or any other process that wanted to use them). Think of them as “exposed”.
On the receiving package’s side, the child process, I had to create a Dynamic Properties task for the child package. This acts as the receiver for the Outer Variable’s values. Once the file name was inserted into the child package, I was almost home!
The File Name Extraction
So the global variables enabled the plumbing to feed the stream of Excel workbook file names into the child package. As mentioned above, the Dynamic Properties task acts as the receiver for the value contained in the Global Variable gcFileNam (see Figure 3). Every iteration of the process feeds a new file name to the child. The Dynamic Properties task lets you reference the Parent’s Outer Variables, which the Parent assigns in the header package and which the child reads as “incoming”. The contents of the Global Variable are used to dynamically change the data source property of the Excel connection (see Figure 4). Sweet.
|Figure 3: Specifying the Named Package Variables as Global|
|Figure 4: Specifying the Data Source for the Child’s Dynamic Properties Task|
Specifying the Data Source for the child’s Dynamic Properties task, Dynamic, because they can be changed during package execution via the Global Variable.
The problem remaining was how to use these dynamically assigned data sources to invoke the child package’s extraction routine. Enter the Active X scripting task. Active X scripts are as useful as they are unfriendly (i.e., the script will parse correctly but that doesn’t mean it will do as you intend). There is no IntelliSense available and you can’t include other objects by References. In other words, any objects you need have to be created by CreateObject rather than New. All in all, it is a pretty hostile environment.
Luckily, you don’t have to stay there long. You need only create a reference to the child package, load up the Global Variables, and send it. The following is the code for the Active X script:
'**********************************************************************' Visual Basic ActiveX Script'************************************************************************Function Main() dim rst set rst = createobject("adodb.recordset") set rst = dtsglobalvariables("gncount").value rst.movefirst Dim oPackage Dim sServer, iSecurity, sPackageName sServer = "NTSRV2" iSecurity = DTSSQLStgFlag_UseTrustedConnection sPackageName ="DrewExperiment" set oPackage = CreateObject("DTS.Package2") oPackage.Name = sPackageName oPackage.LoadFromSQLServer sServer, , , iSecurity, , , ,sPackageName for i = 1 to rst.recordcount oPackage.GlobalVariables.Item(2).Value=rst.fields(0) oPackage.GlobalVariables.Item(1).Value=rst.fields(1) oPackage.Execute rst.movenext next oPackage.Uninitialize Set oPackage = Nothing Main =DTSTaskExecResult_SuccessEnd Function
The result of the select statement in the parent package is used as an index for the for next loop, and is transmitted by the Global Variable gncount.
DTS Package for the Child Package
The next issue is to create an instance of a DTS Package that represents the child package, whose functionality I described at the beginning of the article (basically, an Excel connection, a Transform Data task, and a SQL Server connection). Once the references are created, the for next loop refreshes the globals with the data contained in the file location table and executes the child package (see Figure 5). This way, you can feed the extraction mill with the information it needs to perform the data transfer from as many tables as exist in the Excel directory.
|Figure 5: Active X Script Transformation|
The last step was to create another small script in the child package to update the contents of the scrubbing table with the contents of the Global Variables. This one does not need a loop, because you are controlling the child package execution from the parent. It fires only once every time it’s called. Figure 6 shows the result of the import.
|Figure 6: Consolidation of Check Register Data in One Table|
This figure shows the finished product: a consolidation of fifty workbooks worth of check register data in one SQL Server table.
Notice that over 11,000 lines of Excel spreadsheets have been consolidated into a SQL Server table. This seems to be crying out for some application development, since SQL Server is much more secure and robust. But you know Finance guys, they love them spreadsheets!
Finally, I created a wrapper for the parent and child process so that they could be executed with one click (or one call from a GUI). I accomplished this by connecting two Execute Process tasks with an “On Completion” precedence constraint (see Figure 7).
|Figure 7: Two Execute Package Tasks Communicating with Each Other|
This figure shows two Execute Package tasks communicating with one another via a precedence constraint, demonstrating the ability to modularize complex DTS packages.
The Execute Process tasks are very simple to invoke. You only have to provide the name of the task to run. Using these tasks lets you make multi-step DTS packages more modular and, paradoxically, easier to debug. For some reason, the error handling and messages are better from these top-level tasks. Whereas I found the lower-level error reporting, shall we say, inscrutable (see Figure 8 and Figure 9)?
|Figure 8: Execute Package Dialog Asks for the Name and Location of the Other Package to Run|
|Figure 9: Execute Package Dialog Asks for the Name and Location of the Other Package to Run|
Good Solution?Could’ve Been Better
So that’s how I managed to import an indeterminate number of Excel worksheets into SQL Server. Having the check register data available in the database allowed me to give the Finance guys an automated report that isolated mismatches or missing checks, which saved them hours of manual searching through paper registers or spreadsheets.
As usual, after I finished the work, I got a couple more ideas for tightening it up. One big improvement to the process would be to use the lookup tab on the Data Driven Query task to check the imported line against the contents of the table. This would change the total refresh into a differential one, where the lookup would import the row only if it did not exist in the check register table.
Lookups are not without their own limitations, however. The biggest of which is the extra time they take to run, since they don’t work by sets but by a row at a time. Testing would establish its applicability. If the number of changes were small, it may improve execution speed, but if many changes occurred, it would be best to manage them in a set-based solution.
Finally, I would like to thank my colleague, Sesha Chivakula, for her help in creating this solution.