Browse DevX
Sign up for e-mail newsletters from DevX


Coordinate Your DTS Tasks, Part II: Import an Indeterminate Amount of Excel Data into SQL Server-2 : Page 2




Building the Right Environment to Support AI, Machine Learning and Deep Learning

The Elements of a Data Move
Moving data from one place to another takes three elements: a data source, a data transformation, and a destination. The source can be anything for which you have an OLEDB driver, the transformation is the parsing of the data into a SQL Server table, and the destination is your SQL Server. In this case, for your source, use the Excel 2000 driver from the dropdown on the Excel Connection, which lets you point to the thing you want to convert. To transform an Excel workbook or a text file, open the DTS Designer and drag an Excel connection onto the work surface. You fill in the information needed to complete the connection (the path and filename for the Excel source) on the design surface.

All in all, these data-import activities are no more complicated than the following steps:

  1. Drag two connection objects from the designer (one for the Excel connection and one for the SQL Server connection).
  2. Fill in the folder and file name of the source connection.
  3. Select a Copy Column transformation from a Transform Data task.
  4. Indicate the destination on the Destination Tab of the Transform Data task.

If the destination table does not exist, DTS lets you create it right on the spot. The only issue I had with the default settings was that they mimicked the Excel data types, which can be a tad funky in terms of the data types and the lengths that appear as the defaults. Fortunately, you can edit the create table definition to types that are more reasonable for SQL Server. For example, I doubt you want varchar(254) or floats in your table. You can edit and fix the table definitions easily before the target table is created.

Another caveat here is that Excel looks at a small portion (eight or ten rows) of the worksheet to determine the correct data type for the column. Most of the time it is correct, but other times it gets confused either by missing values or ambiguous ones (numbers in text columns). You can override this "excess intelligence" in the connection properties window.

Finish the task by connecting the source connection to one end of the transform data task and then connecting the other end to your SQL Server connection. Hit the little green arrowhead to execute the package, and you're done. The data moves from the Excel workbook into the table you defined. This is the essence of the task, and it works great for an ad hoc, one-time move. However, my solution calls for an indeterminate number of worksheets to process.

Once Is Not Enough
My business problem, as you'll recall from Part I, was that I had about fifty workbooks to import, each of which needed to be refreshed because any one could have been updated since the last time it was imported. In effect, I needed a total refresh of the Excel data as kind of an all-or-nothing snapshot, so that the checks would match up to the claims each time the Finance department wanted to perform a check reconciliation. So I had a working model, but it worked only once!

I needed a way to repeat this import process so I would catch the changed registers. I also needed to wrap this process in a loop so that it would be executed once for each Excel file. My instincts told me that creating fifty such packages was not the answer. I needed a header package that would read the file location table I created in Part I and send that file name into this little transformation mill, so that it would do the same extraction for each Excel file. Figure 1 shows the header package I created.

Click to enlarge
Figure 1: The Header Package I Created

In this figure, the Execute SQL Task task issues a Select statement against the file location table from the previous installment. By using the checkbox on the last page of the task that tells the task to act as a rowset provider, I got the result set to persist. This is the feature that lets me manipulate the result via ADO later on in the process.

I also learned that for this implementation a Package can have Package Variables, which wasn't obvious at first because they appear right on the Designer's work surface (not a menu). You can access the Package Variables by right-clicking anywhere on the DTS Designer and picking them from the menu that appears. This brings up the page shown in Figure 2, where you can create a Global Variable that the package can reference.

Click to enlarge
Figure 2: The Global Variables Dialog

I created a Global Variable for each of the following:

  1. The file name that was retrieved from the table (providing an example of one)
  2. To hold the result parsed out of the file name, which indicates its file type
  3. To hold the current date and time of the transformation
  4. To hold the record count from the file location table

With these Global Variables in place, I assigned the Global Variable holding the file name to be Outer Variables, meaning they would be visible to the child process (or any other process that wanted to use them). Think of them as "exposed".

On the receiving package's side, the child process, I had to create a Dynamic Properties task for the child package. This acts as the receiver for the Outer Variable's values. Once the file name was inserted into the child package, I was almost home!

Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date