devxlogo

Sync Your Database to SharePoint Using SSIS

Sync Your Database to SharePoint Using SSIS

ustom SharePoint lists are like tables in a traditional database; in some ways they’re even better. But as organizations adopt this new data store, they introduce additional complexities in integrating new list-based solutions with existing database applications. This article shows you how to solve the data mismatch problem by using the Extract Transform and Load (ETL) capabilities of Microsoft SQL Server Integration Services (SSIS) and Microsoft’s new Collaborative Application Markup Language (CAML).

Why Yet Another Data Store?
If you aren’t aware of the power of SharePoint custom lists yet, the topic is worth investigating. SharePoint custom lists combine the ease-of-use that has made Excel Spreadsheets and Access databases so successful among “knowledge workers” in many organizations with the benefits of traditional databases. They offer Access-like simplicity of both administration (creating and modifying lists) and data access, while overcoming one of Access’ biggest challenges: decentralization.

In addition to centralizing data on a server, SharePoint lists benefit from other traditional database features such as referential integrity, indexes, and views. They also benefit from SharePoint-provided features such as native support for workflows and automatic versioning.

And if that isn’t enough to convince you of the likelihood that you’ll interact with them in the near future, consider how seriously Microsoft is pushing the technology by integrating SharePoint and Microsoft Office 2007. Even if you aren’t using lists today, the chances are good you will be before long.

The Bane of a New Data Store: Integration
As capable as custom lists may be, they suffer from a problem common to all new technologies: the need for interoperability. At some point, list-based solutions will need to interact with other systems. Organizations may want to sync customer information with CRM solutions, import financial data from ERP systems, export to data warehouses, or perhaps migrate entire list-based solutions to consolidate stovepipe systems. Regardless of where the data is coming from or going to, Microsoft offers a powerful tool for these kinds of extract, transform, and load (ETL) operations: SQL Server Integration Services (SSIS).

SSIS to the Rescue
Even if you’ve used SSIS previously, interacting with list-based data in SharePoint is a little tricky. SharePoint exposes access to almost all of its inner workings through a comprehensive web services API.

While SSIS is still the best tool for the job of moving your data (far better than its obsolete cousin DTS), you’ll need to learn a few new techniques. Most of these techniques are based around using a new XML-based language called Collaborative Application Markup Language (CAML).

The remainder of this article provides you with basic tools for retrieving and storing list data using CAML including:

  • Calling Web Services in SSIS
  • Retrieving list data with CAML
  • Batch inserting data into lists using CAML

To illustrate these tasks, this article shows you how to build a SSIS application that performs a one-way sync from the Northwind database’s Suppliers table into an equivalent custom list.

Web Services in SSIS
If you’re new to accessing web services in SSIS you might be inclined to use the “Web Services Task” in the toolbox. However, the task is difficult to use, and you’ll have more flexibility and fewer problems if you generate the web services proxy class with Visual Studio’s WSDL.exe tool in a class library and access it in SSIS.

This approach has the additional benefit that you can write helper methods in C#, which you can reuse across all your script tasks. Furthermore, if you’re more comfortable in C# code, you can minimize the amount of Visual Basic (the mandatory language of script tasks) that you would otherwise need to write in SSIS.

Creating Your Web Service Proxy
To begin creating the sample application, open Visual Studio and create a new project of type “Class Library,” and name it something like “SharePointServices” (see Figure 1).

Next add a web reference to the Lists service of your SharePoint site by following this procedure:

  1. Right click on References and select “Add Web Reference.”
  2. In the URL field, enter http://[servername]/[site name]/_vti_bin/Lists.asmx.
  3. Optionally, enter a reference name and click “Add Reference” (see Figure 2).
?
Figure 1. Class Library Project: Create a new class library project and name it “SharePointServices.”
?
Figure 2. Adding the Reference: Use this dialog to add a reference to a Web service.

Visual Studio automatically runs WSDL.exe in the background to create a proxy class that shields you from the Web services communication details and also provides simple asynchronous access to the Web services methods.

At this point you can add a class with some convenience functions. You’re welcome to use the classes available in the downloadable solution, and that you can also find in Listing 1. This article will discuss the helper functions in more detail as it uses them.

Deploying the Class Library to SSIS
To enable SSIS to use your class library, you must sign the assembly, copy it from SharePointServicesinDebugSharePointServices.dll to C:WindowsMicrosoft.NETFrameworkv2.0.50727, and install the assembly to the Global Assembly Cache (GAC).

?
Figure 3. Creating a Strong Key: Enter the project name in the Create Strong Name Key dialog.

If you aren’t familiar with the process for signing an assembly, follow these steps:

  1. Right click on your project.
  2. Select Properties.
  3. Select the “Signing” tab.
  4. Select from the “Choose a strong name key file” dropdown.
  5. Enter the name of the project in the “Create Strong Name Key” dialog box (see Figure 3).
  6. Uncheck “Protect my key file with a password.”
  7. Click OK.
  8. Compile the project.

Then, to install the Assembly to the GAC:

  1. Open up a “Visual Studio 2005 Command Prompt” from the start menu.
  2. Change to the directory where you copied the assembly (cd “C:WindowsMicrosoft.NETFrameworkv2.0.50727”).
  3. Install it by executing the command:
       gacutil --i SharePointServices.dll

Now when you open up a script task in SSIS you can add a reference to SharePointServices. If you make a change to the file you’ll need re-copy the changed file and reinstall it to the GAC.

Set Up SSIS for Success

?
Figure 4. Integration Services Project: The Integration Services Project template option appears only after you’ve installed SSIS.

You’re now ready to create an SSIS package to migrate data. To do this, open Visual Studio and create a new project, selecting the “Integration Services Project” template that you’ll find under “Business Intelligence Projects” (see Figure 4). Note that this option appears only after you install SSIS from the SQL Server 2005 install disk.

Next, create a connection to Northwind. You’ll use this connection throughout the SSIS project, so if you change databases you only need to make the change in one place. Figure 5 illustrates the steps shown below:

?
Figure 5. New SSIS Connnection: The circled numbers show the progression of steps required to create a new SSIS database connection.
  1. Right click in the Connection Manager pane.
  2. Select “New OLE DB Connection.”
  3. Click “New” in the “Configure OLE DB Connection Manager” window.
  4. Enter your server name in the Connection Manager window.
  5. Select your database.
  6. Hit OK to close both dialogs.

You’ll want to create variables to hold your CAML XML commands (see Figure 6). You’ll use these to build up insert, update, or delete statements and then execute them all at once. Here are the steps to create the variables:

?
Figure 6. Create Variables: The circled numbers show the steps to create variables to hold your CAML XML commands.
  1. Right click somewhere in the workspace and select “Variables” to ensure the variables are scoped at the Package level (incorrectly scoped variables can be a real pain to debug in SSIS, so be careful).
  2. Click the New Variable toolbar button.
  3. Enter a variable name to hold an XmlDocument (xmlDoc) and one to hold a batch of commands for each list (elementSuppliersBatch).
  4. Unfortunately there are not many predefined data types in SSIS, so you’ll have to create variables of type Object and cast when you use them.
Author’s Note: For simplicity this solution maintains all data in variables in memory. If you have a large amount of data (perhaps 10,000 records or more), then you would want to build the XML files on disk. However, most techniques in this article will all still apply.

You must initialize the variables in a Script Task, which allows you to run Visual Basic code. Primarily the code will create a Batch CAML command:

    

That command will contain all the individual insert commands. Continue is the most common value for the OnError attribute, but you may also set it to Return. Figure 7 illustrates the steps described in the following procedure:

?
Figure 7. Script Task: Initialize the variables in a new script task by following the numbered steps.
  1. Drag a Script Task from the Toolbox onto the Control Flow Design Surface.
  2. Double click the Script Task to open up the Script Task editor.
  3. Click the Script tab.
  4. Enter the variables from step 3, comma separated to enable this script task to update their values.
  5. Click Design Script. The Visual Studio for Applications window should open.

To complete the task, you’ll need to add a reference to System.Xml, SharePointServices, and System.Web.Services by right clicking on “References” in Solution Explorer, just as you would in the regular Visual Studio environment. Then, add the following Imports statements:

          Imports System.Xml          Imports SharePointServices          Imports SharePointServices.NorthwindSync   

Finally, paste in the following code:

   Public Sub Main()     Try       ' Create an XmlDocument        Dim xmlDoc As XmlDocument = New XmlDocument()          ' Create two XML elements that look like        '        Dim elementSuppliersBatch As XmlElement = _          ListHelper.CreateBatch(xmlDoc)          Dim listService As Lists = ListHelper.GetListsService()       ListHelper.AddDeleteAllItemsBatch( _         listService, "Suppliers", xmlDoc, elementSuppliersBatch)          ' Save the results back to the SSIS variables       Dts.Variables("xmlDoc").Value = xmlDoc       Dts.Variables("elementSuppliersBatch").Value = elementSuppliersBatch          Dts.TaskResult = Dts.Results.Success     Catch Ex As Exception       ' Write out an error       Dts.Events.FireError(1, Ex.TargetSite.ToString(), Ex.Message, "", 0)          ' Return gracefully, never return with an        ' exception from a script task       Dts.TaskResult = Dts.Results.Failure     End Try   End Sub   

Retrieving and Deleting Data with CAML
While the Main subroutine code shown in the preceding section is documented in-line, the following two lines are worth noting because they add commands to the element to delete all existing records from the Suppliers list. They also demonstrate how to retrieve list items from SharePoint.

   Dim listService As Lists = ListHelper.GetListsService()   ListHelper.AddDeleteAllItemsBatch( _       listService, "Suppliers", xmlDoc, elementSuppliersBatch)

The first line retrieves a Lists object, which is the object you will interact with for all list-based requests. The Lists class was generated by Visual Studio when you created the web reference. The helper method GetListsService simply instantiates an instance of the Lists class, then sets the login credentials and the URL:

   public static Lists GetListsService() {       Lists listService = new Lists();       listService.Credentials = CredentialCache.DefaultCredentials;       listService.Url = "http://[server]/[site]/_vti_bin/Lists.asmx";       return listService;   }

The second command is a call to the ListHelper, which adds commands to delete all existing list items. Unfortunately, there is no Truncate statement in CAML, so AddDeleteAllItemsBatch() retrieves all list items in the list and adds a “Delete” command for each. The following line retrieves all records:

   listService.GetListItems(      strListName, null, null, null, "1000000", null, null);

While the details of the GetListItems() function are beyond the scope of this article, if you use it in your own code you’ll most frequently interact with the third argument, which takes a Where CAML clause. However, to return all records you can simply pass in null.

One trick worth mentioning is that this method retrieves about 20 items by default (or whatever is specified in your default view). Consequently, you must pass in a large number for the fifth argument (which specifies the maximum number of results to return) to prevent your results from being limited unexpectedly in production.

The GetListItems() function returns an XML node containing the query results. The AddDeleteAllItemsBatch() method then loops through these results using the following two lines of code:

   XmlNode nodeRsData = nodeListItems.ChildNodes[1];   foreach (XmlNode nodeRow in nodeRsData.ChildNodes) {

It then retrieves the ID fields, which are the hidden primary key for all list items with this line:

   nodeRow.Attributes["ows_ID"].Value;

Finally, the method appends Method elements (which are individual CAML commands) that look like the following:

         [Id]   

The Method CAML command’s ID attribute provides an identifier (which incidentally does not need to be unique to the batch) that SharePoint will return to you when you commit the batch, along with a notification of success or failure.

The Cmd attribute of the Method element identifies this command as a Delete statement. The two other valid values are Insert (which you’ll see later) and Update.

Finally the Field element in this context is like a WHERE Id = ‘[Id]’ command in SQL. You’ll see the Field element used in a different context shortly.

Streaming Data to CAML
After deleting all existing suppliers, you need to create a data flow task that streams the updated data from the Suppliers data table into a script component that will append Insert CAML statements to the batch. The numbered circles in Figure 8, Figure 9, Figure 10, and Figure 11 correspond to the numbered steps in the following procedure:

?
Figure 8. Data Flow Task: Create a new Data Flow task and connect it to the Script Task.
  1. Drag a Data Flow Task onto the Control Flow design surface.
  2. Connect the script task to the Data Flow Task.
  3. Double click the Data Flow Task to get to the data flow screen, where SSIS will manage the stream-based flow of data from the Suppliers table to the elementSuppliersBatch variable.
  4. Add an OleDb Source.
  5. Double click to get to the OleDb Source Editor.
  6. Select a Data Access Mode of “SQL Command.”
  7. Enter the SQL to retrieve some data from the Suppliers table: SELECT SupplierId, CompanyName FROM Suppliers;
  8. Click OK.
  9. ?
    Figure 9. Create an OleDb Source: The circled numbers illustrate the process to get the Suppliers data from an OleDb source.
  10. Drag a Script Component onto the Data Flow design surface
  11. When the dialog prompts for the script type, select “Destination,” so that SSIS knows that the flow of data will go no further than this task.
  12. Connect the OleDb Source to the Script Component
  13. Double click the Script Component to get the Script Transformation Editor. Notice that this is similar to but not identical to the Script Task in the Control Flow.
  14. Click both input columns that SSIS retrieved from the OleDb Source to indicate to SSIS that you will access them in code (in the background SSIS generates code to make these available).
  15. Select the Script tab.
  16. ?
    Figure 10. Choosing a Script Destination: The numbered steps show the process of connecting the data to a controlling script.
    ?
    Figure 11. Designing the Script: In the Script Transformation Editor, select the “ReadOnlyVariables” option.
  17. In the script window enter the XML Document and the Element you will write to as ReadOnlyVariables (see Figure 11). Don’t use the ReadWriteVariables option, because SSIS locks the variables in that category, which prevents you from using them in other tasks. In this case, because you won’t be changing the variable stack references?only their data on the heap?you can get away with using ReadWriteVariables, and make things easier down the road.
  18. Click Design Script, add references to System.Xml and SharePointServices, add the lines Imports System.Xml and an Imports SharePointServices and then enter the follwing code:
   Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)     Try       ' Retrieve XML variables       Dim xmlDoc As XmlDocument = CType(Variables.xmlDoc, XmlDocument)       Dim elementSuppliersBatch As XmlElement = _         CType(Variables.elementSuppliersBatch, XmlElement)          ' Create an "insert" CAML statement under the  element,       ' which will look like:       '        Dim elementMethod As XmlElement = ListHelper.CreateMethod( _         xmlDoc, elementSuppliersBatch, "New", Row.SupplierId.ToString())          ' Create a two  elements under the new  element,       ' which will look like:       ' [SupplierId]       ListHelper.AppendField( _         xmlDoc, elementMethod, "SupplierId", Row.SupplierId.ToString())       ListHelper.AppendField( _         xmlDoc, elementMethod, "CompanyName", Row.CompanyName)     Catch ex As Exception       Me.ComponentMetaData.FireError( _         1, "Script Component", ex.Message, "", 0, True)     End Try   End Sub

The preceding code is commented, but basically it adds the following CAML statement for every record in Suppliers:

         [SupplierId]      [CompanyName]   

These should look familiar because they’re similar to the Method statements you saw when deleting list items. This time, though, the Cmd attribute specifies an Insert command (New), and the Field elements specify data to insert rather than acting as a where clause.

One gotcha is that the Field element’s Name attribute should refer to the internal name of the field?and that’s not necessarily the same as the display name. If you are unsure of an internal field name, the easiest way to find it is to navigate to the list in SharePoint, go to the List Settings Page, click on the column, and copy and UrlDecode the value after the Field= parameter of the URL. The most common problem here is that SharePoint converts spaces to _x0020_.

Incidentally, don’t forget the Try/Catch block, this is even more important in a Data Flow for debugging if something goes wrong.

Executing the Batch
The last step is to write the code to execute the Batch of delete and insert commands in SharePoint. Figure 12 shows the completed code flow:

?
Figure 12. Complete Code Flow: Here’s the completed code flow in the designer.
  1. Add a second script task following the instructions given previously.
  2. Connect the Data Flow Task to it and then open the script task.
  3. Under the “ReadOnlyVariables” enter elementSuppliersBatch.
  4. Edit the code, and add a reference to and Imports statement for System.XML and SharePointServices
  5. This time also add a reference to System.Web.Services
  6. Paste the code in Listing 2.

The code in Listing 2 is commented and fairly straightforward, but there are two important things to note. First, if you execute a CAML statement batch that has more than about 500 statements in it you will receive the error “Some part of your SQL statement is nested too deeply. Rewrite the query or break it up into smaller queries.” I’ve documented this problem in the blog post, “CAML: Nested Too Deep,” but fortunately, there’s a workaround. Because of this potential error the code in Listing 2 uses a helper function called ExecuteLargeQuery() that groups CAML statements into smaller batches and executes them.

If you don’t think the size limitation will be a problem after your application reaches production then you can call Lists.UpdateListItems() directly. The function takes two arguments: the name of the list to execute against, and the XmlElement that contains your CAML statements. UpdateListItems() returns XML that specifies the success or failure of every command in the batch.

?
Figure 13. Successful Migration: When you run the SSIS script, you’ll see the 29 suppliers appear in the list.

The second thing to note in the code is the Try-Catch block. If UpdateListItems fails on something other than an individual method command, the error message isn’t in the Exception.Message property as with a traditional exception. Instead you will need to catch a SoapException and retrieve the SoapException.Detail.InnerText property to retrieve the error message.

Let’s Run It
At this point you can run the SSIS script and, hopefully, watch SSIS insert the 29 suppliers into your list. And, if you run it again you should still have 29 suppliers because SSIS deletes the first 29.

The downloadable code includes a command line application that can create a Suppliers list (also using CAML) in the SharePointServices solution (see Listing 3).

So now, using the techniques outlined in this article, you should have the foundations to integrate data between SharePoint and just about any other data source. Good luck!

devxblackblue

About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.

About Our Journalist