Problem: your company is going through yet another "process re-alignment" and you've been given the unenviable task of merging multiple databases into one Damn Big Mega System (DBMS). To make the project even more unappealing, your sales group has been using Informix, billing is in SQL Server, the new company you just acquired is married to Oracle, and some R&D guys have been keeping everything in Derby. Oh, and the Powers That Be want it all converted to DB2.
Before you pull out the whiteboard and order cases of Red Bull, take a moment to assess the situation. The hardest, probably the most important part of a project like this is how you start it. How will you get a birds-eye view of all of the schemas? How are you going to map the fields of all these different systems? Most of all, how long will all of this take?
Got a few minutes?
Ok, a few minutes might be a slight exaggeration. But once you discover what IBM has in store for you next, you can start planning a few extra "work from home" days after the project launches ahead of schedule. Rational Data Architect is the tool you've been waiting on for years. IBM, what took you so long?
Introducing IBM Rational Data Architect (RDA)
RDA is a data modeling tool designed for discovering, mapping, and integrating your databases. With RDA, you can create fresh schemas or connect to data sources and automatically generate models of existing schemas. Use RDA's intelligent mapping technology to discover how data is related between heterogeneous databases. This way, you can quickly model a solution based on integrated information or even design a federated data solution.
First, RDA uses JDBC to connect to a variety of data sources, then communicates with them using native queries. Probably the friendliest database tool on the market, it'll hook up with just about anything:
DB2 Universal Database for Linux, UNIX, and Windows, Enterprise Edition and Workgroup Edition
Version 8.1
Version 8.2
DB2 Universal Database for iSeries
Version 5 Release 2
Version 5 Release 3
DB2 Universal Database for z/OS
Version 7.1
Version 8.0
Derby
Version 10.0
Informix Dynamic Server
Version 9.2
Version 9.3
Version 9.4
Microsoft SQL Server Enterprise
2000 SP2
Oracle 8i
Release 3 8.1.7
Oracle 8i Enterprise Edition
Release 3 8.1.7
Oracle Enterprise Edition 9i
Version 9.2
Oracle 10g
Sybase Adaptive Server Enterprise
Version 12.0
Version 12.5
Second, after your data sources are connected, you can create data models based on your existing databases, then create mapping models that analyze your schemas and discover possible relationships. The mapping model provides a number of algorithms that can be used to do the discover function.
Third, with the mappings in place, you can add some transformations using a prebuilt set of functions. Generating SQL scripts that can be deployed in the platform of your choice completes the integration.
This is just one type of process that RDA gives the harried database developer. When designing a new database solution, the bulk of the work goes into creating and agreeing on the architecture. Use RDA to quickly draw and review your schema. Automatically generate PDF or HTML reports. Fiddle with it until you get it right, then pop it into SQL and make it happen. RDA has a way with all the little tasks that go into planning and executing these types of projects.
Eclipse Lovers Rejoice
RDA is part of a larger suite of tools called the IBM Rational Software Development Platform (RSDP). Based on Eclipse, RSDP makes good use of the incredibly rich UI of this open source favorite. While viewing your Workbench, or working environment, all the information you need is organized into Views, discreet windows of metadata and interactivity. Stick these views just about anywhere on the screen, shuffle them around, then save the entire Perspective, or layout, for use later. RSDP ships with multiple Perspectives that load automatically based on what kind of project you start. Swap or tweak your layout on the fly or customize the entire Workbench to your heart's content.
New users will especially appreciate RSDP's use of Capabilities. Essentially, the more advanced features are hidden but accessible when needed. RSDP remembers what's been used and offers those options later, but without the all-or-nothing tact that a lot of IDEs take. Think of it as progressive robustness, an IDE that keeps pace with your level of knowledge.
Handholding Done Right
As one of its most impressive, and rare, characteristics, RSDP seems to embrace new users with open and loving arms. With a friendliness that is, quite frankly, very un-IBM, RSDP provides newbies with huge amounts of resources, starting with its Welcome Page, seen in Figure 1.
The clean, elegant Welcome screen gives you the following options:
- Overview: descriptions of core concepts, including Data Modeling, Federated Data Sources, and Metadata Modeling the kind of basic information most products assume you already know.
- What's New: a description of the full RSDP family, dubbed The Rational Advantage.
- Tutorials: a complete set of step-by-step walkthroughs in a variety of learning styles, i.e. "Watch and Learn" (sit back and watch the movie), "Play and Learn" (watch a little, test your skills with an emulator, watch some more), and "Do and Learn" (walk through a detailed set of tasks using the actual product).
- Samples: launch one of the many sample applications that come with the tool.
- First Steps: baby steps to help you get started on a particular type of task, such as "Creating a data design project" and "Connecting to a database".
- Web Resources: not just a page of links, but a complete suite of online resources, from the developerWorks Training site to a UML Resource Center to online tech support.
Unlike most developer products on the market, including a lot of other IBM tools, Rational Data Architect makes no assumptions about your level of knowledge, but rather welcomes everyone. New users can start from page one and go through the tutorials and resources they need to understand the tool. More advanced users can skip right to the features they need without wading through newbie stuff.
A particularly nice treat: RDA installs with an instance of Derby, along with a sample database. It's like thisyou're going through one of the hands-on tutorials and get to the point where you need to connect with a database. Anyone who has gone through the task of installing and configuring an open source database, heck any database, will be shocked to discover that they already have a fully functioning sample ready to play with. Just follow the tutorial step by step. Believe it or not, it works beautifully right out of the box.
The only problem I encountered while testing the beta came when I was trying to switch between the tutorial and the workbench while I had a wizard or dialog box open. Sometimes the IDE was reluctant to give up control of my screen. Usually this wasn't a problem, but when I had to scroll down in the documentation but couldn't, I sometimes had to make an educated guess as to which button to push next in the wizard. Overall, I would sometimes run into a minor UI problems like this, but for the most part this was a surprisingly strong beta.
Quickie Walkthrough: Model and Create a DB2 Database
Enough chit chat. Let's take a look at this thing. If you download RDA from IBM's software site then you can follow along. But keep in mind the following breezethrough is intended just to give you an idea of what the product looks like and how it works. The tutorials included with the product will give you a much more extensive hands-on experience of RDA's features.
A reader recently asked me if I knew of some software that would visually model a database and then generate creation scripts for DB2. That pretty much sounds like a description of RDA, so that's where I'll start. RDA lets you visually model databases using Logical Data Modeling, which is not based on a particular database, or Physical Data Modeling, which is. Here are the basic steps to creating, for example, a DB2 database. In this example, I want to create a Physical Model, then export the SQL creation scripts, which I can run in DB2.
Step 1: Start by Setting Workbench Preferences and Creating a New Data Modeling Project
- Enable the Capabilities for Data Modeling. Window -> Properties -> Workbench -> Capabilities. Check the two options under "Data". (See Figure 2.)
- After you finish with the Workbench preferences, expand the preferences for Data and take a look at your Transform options. Under Data Type Map change the Database vendor type to "DB2 UDB" (see Figure 3).
- Create a new Data Design project. File -> New -> Data Design Project (Figure 4). In this example, I'm calling the project "DB2Model".
Step 2: Add a Physical Data Model to the Project
- Right-click on the Data Models folder in the Data Project Explorer and choose New -> Physical Data Model (see Figure 5). I'll call the model "DB2 Model".
- You should see a couple of additional panes for your new DBM diagram (see Figure 6).
Step 3: Create a Visual Model
We don't want to get too fancy, so this example represents an over-simplified client management system, with a Clients table linked to Contacts, Campaigns, and Invoices.
- In the palette on the right, click Table and draw four tables. You can also hover over the diagram workspace to get "Add New Table" and "Add New View" icons, a nice little feature. Call these tables Clients, Contacts, Campaigns, and Invoices.
- Hover over the table name to see a pop-up of options, such as "Add New Key" and "Add New Column". For each table, add an ID field as the Primary Key, such as ClientID for the Clients table, ContactID for Contacts, etc. Use Integer as the Data Type for each of these.
- Add some new columns to each of the tables, specifying the fields in Figure 6 or some of your own. For Contacts, Campaigns, and Invoices, add a ClientID column.
- Click Identifying from the Palette to create Identifying relationships between the Clients table and the other tables. Alternatively, you can hover over the Clients table until two stubs come up, then drag and drop the outgoing stub to each of the other three tables. You'll notice that when you create the relationship, RDA identifies the field you're relating to and pops it to the top as a Foreign Key.
Step 4: Create the SQL Scripts
In the Data Project Explorer, if you expand New Database, you'll see all the objects you just created.
- Right-click on New Database and choose Generate DDL… to start the SQL creation script wizard (see Figure 7).
- As you go through the wizard, just accept all the defaults for now. You'll end up with a series of CREATE statements for your tables, as well as Primary and Foreign Key definitions. At the end of the wizard, you'll have a .sql file that you can execute right there from the Scripts folder or suck into your database of choice. RDA also includes a rich, full-featured SQL editor, as well as a visual SQL builder. (Note that you may not have much control over where that file ends up. By default, you can find it in the project folder in the workspace you specify when you first start up RSDP. In my case, it was "C:\Documents and Settings\Justin\IBM\rda6.1\workspace\DB2Model".)
For extra credit, be sure to explore the advanced features of the Physical Data Model, particularly when modeling a DB2 database. This includes using the Properties view to alter a number of different aspects of the database and its objects, such as:
- Alias Properties
- Column Properties
- User-Defined Types
- Indices
- Materialized Query Table (MQT) Properties
- Multidimensional Cluster Table (MDC) Properties
- Stored Procedures and their parameters
- Triggers
Quickie Walkthrough: Map to a Derby Table
RDA allows you to map the relationships between data sources by generating models of existing tables and mapping them to one another. Mapping helps you get a quick understanding of the relationships involved and potential complications. In addition, by visually building transformations, you can get a jumpstart on some tricky SQL code when it comes time to merge your data sources.
As mentioned, RDA comes pre-installed with a working Derby sample database, called SAMP. No configuration necessary. So for this next trick, I'm going to map to an existing table called PROJECT, which I'll pretend is the equivalent of the Campaigns table created in the previous walkthrough. In a multi-database scenario like the one at the beginning of this article, you can use this same basic process once you add connections to your various databases.
Step 1: Generate a Model for PROJECT
- If you didn't already have the connection to your sample Derby database, you could add it by clicking New Connection in the Database Explorer. This brings up a wizard like the one in Figure 8, which I'm including here just for your edification.
- Since you already have the Sample database connected, expand it in the Database Explorer view and drill down to the PROJECT table, as in Figure 9.
- Drag and drop the PROJECT table to the Data Models folder in the Data Project Explorer. This will automatically generate another Physical Data Model for the Sample database (see Figure 10).
Step 2: Create a New Mapping Between Campaigns and PROJECT
- Right-click on the Mappings folder and click New -> Mapping Model. This starts a wizard that leads you through creating a new Mapping.
- Name the mapping. I'll call this one ProjectMap.msl.
- Specify the Mapping Source: this will be the schema for the table containing data you want mapped to another table. In this case, I'll select sample.dbm.
- Specify the Mapping Target: this should be the schema you want to map into. In this case, I'll specify the DB2 Model.dbm.
- The final stage of the wizard shows you a summary, as in Figure 11.
- Click Finish to create the new mapping (see Figure 13). When you first create it, the mapping won't have any relationships, so the middle area will be blank.
Step 3: Draw Some Mappings
If you were working on a more complex project, you would have many more fields and thus many more potential parallels between your tables. For example, if merging an Employees table from your original company with an EmployeeList table from a recently acquired company, you would have several natural matches between fields like LastName, Address, etc. One of the best features of RDA is its ability to discover these relationships and give you a big head start on your mappings. To do this, you would just right-click the Mappings space between the two tables and select Discover Relationships, as in Figure 12.
You can learn more about this process by using the tutorials on Integration Modeling included with RSDP. For now, since the tables in question are so light, you can just create the mappings manually.
- Click on the PROJNO field in the Source pane on the left.
- Right-click on the CampaignID field under the Campaigns table in the Target pane on the right. Choose Create Mapping.
- Click on PROJNAME and right-click on CampaignTitle, then Create Mapping. You should now have two mappings in the middle pane (see Figure 13).
Step 4: Create Transformations
If you look closely at the data types, you may notice that right off, you have a problem. Your original key is CHAR(6) while your new schema specifies INTEGER. A quick way to solve this problem is to Transform the PROJNO field en route to its new identity as CampaignID.
- Right-click the mapping between PROJNO and CampaignID and choose Transform -> Add.
- It's a bit of a cheat, but should get the job done: in the Functions list on the right, expand the String functions and double-click ASCII(CHAR) – INTEGER.
- In the Transformation Expression, you'll see "ASCII()". Put your pointer inside the parentheses and double-click PROJNO on the left. You should end up with the Expression "ASCII(SAMP.PROJECT.PROJNO)", as in Figure 14.
Step 5: Pull It All Together
Now that you have a mapping, what do you do with it? For a start, right-click on the mapping you created in the Data Project Explorer, in this case ProjectMap.msl. Then choose Generate Script. You'll see that the transformation you just created can be described as SQL code, which can be integrated into your migration solution. In addition, now that you have a solid visual for your inter-database relationships, you can communicate this to other team members or generate additional SELECT and INSERT scripts to assist with data migration.
Now, imagine this same process but with four or five databases, each with dozens of tables, each of those with dozens of fields. You can see how much pain RDA will save you. You can take that Informix database from Sales, that SQL Server db from billing, the new Oracle system, and even the little Access tool from Marketing Guru, and automatically generate data models, automatically discover relationships, and quickly create transformations. When you're done, generate SQL script and you have the beginnings of a solution that will impress the boss and make you a hero in the eyes of adoring and relieved co-workers.
Where to Go Next
You just got a birds-eye view of RDA. This article touched on some of the core features, but touched on them very lightly. For a more detailed explanation of the product and its potential, download it from here (415MB download). Also, be sure to read Davor Gornik's article, "Use Rational Data Architect to Integrate Data Sources."