Object-Relational Mapping – Taking the Horror Out of Data Access
1. Horror stories and the object-data divide
Let’s face it: the world is getting more and more object oriented, but we are still pretty happy with our relational databases. In fact, the RDBMS is the foundation on which practically all of our data-driven applications rest. That said, we are well aware that the relational data models we deploy are limited. Sometimes they aren’t much more than a bunch of tables, increasingly cursed for their limitations and design flaws.
To develop great applications, we need a solid foundation based on a great data model, a data model that matches real life as closely as possible: the so-called domain model. How many database schemas actually resemble the real-world problem? Data access has become a time-consuming horror story that can take the fun out of any development project. But it doesn’t have to be this way.
2. Benefits of OO in stateless business methods
Even though relational theory is great for storing and retrieving large amounts of structured data, it hardly makes for readable business methods. The reason is that the relational data model contains very little information (despite what the name implies) about the relationships and intended use of the data. Its level of abstraction is low, much as C’s is compared to Visual Basic’s. A properly implemented object-oriented data model, on the other hand, gives you the full picture and allows business methods to contain more straightforward, readable code. Comparing two code snippets proves our point. The first snippet interfaces with the relational data model through SQL; the second interfaces with the object-oriented data model through OO notation (in this example we use the API of an O/R DAL generated by Pragmatier Data Tier Builder; see the reference at the end of this article):
Public Sub GiveRaise(ByVal ConnStr As String, _
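The original VB6 listing is truncated in the source. As a language-neutral sketch of the same contrast, here is a hypothetical GiveRaise operation in Python, written both ways; the table, column and class names are illustrative assumptions, not Pragmatier’s API:

```python
import sqlite3

# --- Relational style: the business method speaks SQL directly.
# Table and column names are hypothetical.
def give_raise_sql(conn, employee_id, amount):
    conn.execute(
        "UPDATE Employee SET Salary = Salary + ? WHERE EmployeeID = ?",
        (amount, employee_id),
    )

# --- OO style: a minimal hand-rolled mapping class hides the SQL,
# so the business method reads as plain object manipulation.
class Employee:
    def __init__(self, conn, employee_id):
        self._conn = conn
        self.id = employee_id

    @property
    def salary(self):
        row = self._conn.execute(
            "SELECT Salary FROM Employee WHERE EmployeeID = ?", (self.id,)
        ).fetchone()
        return row[0]

    @salary.setter
    def salary(self, value):
        self._conn.execute(
            "UPDATE Employee SET Salary = ? WHERE EmployeeID = ?",
            (value, self.id),
        )

def give_raise_oo(conn, employee_id, amount):
    emp = Employee(conn, employee_id)
    emp.salary = emp.salary + amount
```

The business method in the OO version contains no SQL at all; the mapping class absorbs it, which is exactly the abstraction the O/R DAL provides at scale.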
4. Explaining object-relational mapping
Now that we have convinced you of the benefits, let’s take a look at object-relational mapping in practice. A good O/R DAL attacks two recurring problems:
• Database coupling
• Repetitive coding
What is the design goal of an O/R DAL?
The main reason you would want to hide the columns and tables of your relational database is productivity. The increased abstraction allows more natural modelling of data and simpler handling of complex relationships between data entities. Thus the one-class-maps-to-one-table mapping that some mistakenly believe is O/R mapping turns out to be pretty useless: all overhead, no abstraction.
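To make the abstraction point concrete, here is a minimal Python sketch (hypothetical schema and names) of a many-to-many relationship. A one-class-per-table mapping would force callers to juggle the OrderProduct link table themselves; an abstracted mapping hides it behind a plain collection:

```python
import sqlite3

# Hypothetical schema: Order <-> OrderProduct <-> Product.
class Order:
    def __init__(self, conn, order_id):
        self._conn = conn
        self.id = order_id

    @property
    def products(self):
        # The link table disappears behind a property: callers just iterate.
        rows = self._conn.execute(
            "SELECT p.Name FROM Product p "
            "JOIN OrderProduct op ON op.ProductID = p.ProductID "
            "WHERE op.OrderID = ? ORDER BY p.Name",
            (self.id,),
        )
        return [name for (name,) in rows]
```

Business code now reads `for name in order.products`, with no knowledge of the join or the link table, which is the abstraction a naive table-per-class mapping fails to deliver.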
The other important goal is to remove the dependency of your business methods and UI on the underlying database and database schema. There are several good reasons to do this. After extensive use, an application might require refactoring to remove deadlocks or performance bottlenecks. You might want to upgrade or even switch the RDBMS ("never happens", some say, but that is often a confusion of cause and effect: if your application is closely coupled to your database, you would never be tempted to suggest switching it). What this amounts to is greatly simplified maintenance. Any change to the data source requires a limited maintenance effort, confined to a well-defined part of the application.
What is the penalty?
The benefits of the O/R DAL are all very well, but there is a cost, at design time, during implementation and at run-time. First, you need to know elementary OO modelling techniques, including the fine art of NOT over-engineering your object-oriented data model. Like painting, which takes a minute to learn and a lifetime to master, OO data modelling is an art that requires experience. But it is great fun, and the alternative, pure relational data modelling, is really not a serious option anymore.
The biggest drawback is the cost of implementing a good O/R DAL. Whereas the business methods benefit immensely, a typical method being half the size and practically self-documenting, actually writing the O/R DAL takes years of experience and a lot of patience. There are many pitfalls for those who choose a DIY approach; if you do, make sure you keep the O/R DAL really, really, really simple. Even simple O/R DALs can be useful as long as you don’t lose out on the abstraction (remember what we said about one-object-maps-to-one-table). At the end of this article you will find resources that will help you if you choose to write your own O/R DAL.
On the other hand, excellent tools have emerged that generate the O/R DAL for you. Just make sure that the tool you choose also generates the database schema, returns objects in relationships so you can navigate with the OO dot syntax, and gives you full flexibility to generate and map different database schemas. If it isn’t flexible enough you could end up in trouble, but if it is you’ll be laughing all the way to the deadline (on time). Not to mention maintenance, which is a joy with a standardized and accurately documented API.
Another issue raised when introducing an O/R DAL is performance. A well-written O/R DAL typically adds less than one percent of overhead compared to the corresponding dynamic SQL or stored-procedure calls. However, there are some cases where a SPROC is a lot faster, such as "update all product prices by 10%". Since all data is stored in the RDBMS, you can easily bypass the O/R DAL and write a SPROC to handle such a use case. Just don’t forget to ask yourself whether the performance optimisation is really necessary. Even if the update takes a couple of minutes longer through the O/R DAL, it might be such an unusual event that it doesn’t matter. The rule is "don’t fix what isn’t broken", and the priority should always be maintenance over performance, as long as performance is good enough.
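The difference between the two approaches can be sketched in a few lines of Python (hypothetical table and function names; the price factor is a parameter). The object-at-a-time version issues one round trip per row, which is what makes the set-based bypass so much faster on large tables:

```python
import sqlite3

# Object-at-a-time update through a (toy) O/R layer: one statement per row.
def raise_prices_via_objects(conn, factor):
    ids = [i for (i,) in conn.execute("SELECT ProductID FROM Product")]
    for product_id in ids:
        (price,) = conn.execute(
            "SELECT Price FROM Product WHERE ProductID = ?", (product_id,)
        ).fetchone()
        conn.execute(
            "UPDATE Product SET Price = ? WHERE ProductID = ?",
            (price * factor, product_id),
        )

# Set-oriented bypass: one statement for the whole table, the kind of
# operation you would put in a SPROC.
def raise_prices_set_based(conn, factor):
    conn.execute("UPDATE Product SET Price = Price * ?", (factor,))
```

Both produce the same result; choose the bypass only when the object-at-a-time cost actually hurts.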
Functionality over performance
The design principle is to develop your entire application using the O/R DAL first. When you have verified that you have met your functional requirements, implemented a ton of changes and made sure your customer is happy with the application, then you can solve performance issues. You basically use a Model | Implement | Test | Tune approach.
When you tune your application, you start by identifying business methods that are performance bottlenecks, estimate how these limitations affect the functionality of the application, and implement a performance fix if you consider it necessary. Since performance fixes are usually geared towards special circumstances, this approach retains flexibility for as long as possible.
What about caching?
Data caching is really, really important when creating an O/R DAL, not only for performance, but also to achieve consistency during a transaction. In fact, you will be left eating dust if you skip data caching and later try to support real transaction management.
Imagine displaying a value, such as the number of page visits, at both the top and the bottom of a web page. If the value isn’t cached during the life span of the page, it is quite possible that the two numbers will differ. This is unacceptable. To avoid it you would normally declare local variables in your ASP/ASP.NET page, but if the O/R DAL supports caching, this is something you don’t have to think about. Very neat: it saves lines of code and reduces bugs.
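The page-visits scenario can be sketched in a few lines of Python. All names here are hypothetical; a dict stands in for the database, and the cache has the life span of one page or transaction:

```python
# A minimal read cache scoped to one "page" (or transaction).
# Repeated reads inside the scope return the same value even if the
# underlying store changes mid-page.
class PageScopedStore:
    def __init__(self, backend):
        self._backend = backend   # e.g. a dict standing in for the database
        self._cache = {}

    def read(self, key):
        if key not in self._cache:
            self._cache[key] = self._backend[key]
        return self._cache[key]

    def flush(self):
        # Called at the end of the page/transaction.
        self._cache.clear()
```

With this in place, the top and bottom of the page read the same number by construction, with no local variables in the page itself.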
If the O/R DAL supports global caching, you can improve the performance of your entire application by an order of magnitude when there is a lot of consecutive reading of the same data. Special care should be taken to avoid inconsistency between caches on different servers in a web farm. If you only implement a transaction cache that is flushed after each transaction and kept isolated from other caches, you won’t have any problems. On the other hand, if each server has a shared cache, it should probably only be used for read-only data such as articles or user data.
Should we use lazy load?
Lazy loading improves scalability and performance by reducing the amount of data read when you are merely navigating around the data object hierarchy. The idea is that any given code snippet only uses a subset of an object’s properties, and that subset may change with run-time conditions. It is therefore a good idea to postpone the decision of which data to actually read for as long as possible. In the cases where you benefit from a chunky recordset, you might choose to bypass the O/R DAL, but don’t fix what isn’t broken.
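A minimal lazy-loading sketch in Python (hypothetical names; the loader callback stands in for a database read, and a counter shows how often the "database" is actually hit):

```python
# Lazy loading: the expensive column is fetched only on first access.
class LazyProduct:
    def __init__(self, product_id, load_description):
        self.id = product_id
        self._load_description = load_description
        self._description = None
        self.loads = 0            # instrumentation: how often we hit the "DB"

    @property
    def description(self):
        if self._description is None:
            self._description = self._load_description(self.id)
            self.loads += 1
        return self._description
```

Creating the object and navigating to it costs nothing; only code that actually touches `description` pays for the read, and only once.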
When implementing your own O/R DAL you really want to choose the features you support carefully. Keep it really, really, really simple. One neat feature is inheritance: it allows reuse within your object-oriented data model and is a great improvement over a non-reusable model.
Another neat feature is polymorphism. Without it you lose the possibility of writing generic methods that process objects inherited from the same base class.
Late binding lets you store any object in a property or collection, allowing, for instance, a generic bookmark feature.
Finally, introspection allows you to write more flexible, and more robust, applications by letting you dynamically inspect the structure of a given class. This is useful if you want to implement a generic service that takes an object, searches for a certain set of properties on that object and manipulates only those.
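Such a generic service is easy to sketch in Python, where introspection is built in (the service and class names are hypothetical). The service trims whitespace on any string property it finds, ignoring everything else:

```python
# Introspection sketch: a generic service that finds a given set of
# properties on any object and manipulates only those.
def trim_string_fields(obj, field_names):
    for name in field_names:
        if hasattr(obj, name):                 # inspect the structure
            value = getattr(obj, name)
            if isinstance(value, str):
                setattr(obj, name, value.strip())

class Customer:
    def __init__(self):
        self.name = "  Ada Lovelace  "
        self.city = " London "
        self.visits = 3
```

The same service works unchanged on any class, which is the flexibility the article describes; in VB6 you would reach for TypeLib information or a naming convention to achieve a similar effect.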
5. Fitting the O/R DAL into your N-Tier application
The O/R DAL slips logically right on top of the database. This means that it encapsulates any CRUD operations you might otherwise have coded in SPROCs. We say logically because if the DB is placed on a separate server, you would physically deploy the O/R DAL components on the same server as your business logic components. The reason for this physical partitioning is that the overhead of marshalling objects over a network is substantial. The SQL statements sent between the O/R DAL and the DB are much chunkier and better suited for network traffic.
Note that you should avoid coding business logic, such as automatically calculating taxes or rebates, in SPROCs or in the O/R DAL. Such rules are part of the business logic, not the data access. All decisions made by the business should be confined to the business logic layer. If a business rule is likely to change frequently, or contains variables such as interest rates or taxes, it should be broken out in a way that is easy to update, for example using VBA, ASP scripting, or application preference objects stored in the database.
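The preference-object idea can be sketched in a few lines of Python (hypothetical names; a dict stands in for preference objects stored in the database). The DAL only stores and fetches the rate; the decision to apply it lives in the business layer, so changing the rate requires no SPROC or DAL change:

```python
# Business rule kept out of the data access layer: the tax rate lives in an
# application preference store rather than being hard-coded in a SPROC.
preferences = {"tax_rate": 0.25}

def price_including_tax(net_price, prefs=preferences):
    # Business-logic layer decision: apply the current configured rate.
    return net_price * (1 + prefs["tax_rate"])
```

Updating the rate is then a data change, not a code change.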
fig 4. The components of the O/R DAL are physically deployed on the same server as the components that access it. If the UI partly accesses the O/R DAL directly, it too should be deployed on the same physical server.
This architecture allows later partitioning of the database across several machines. The O/R DAL mapping can be updated accordingly, and suddenly you have a database that scales out, albeit with some natural limitations.
Get all the benefits of .NET typed datasets and more
To those who haven’t yet switched to .NET, it is worth noting that once you have created the O/R DAL, even a simple one, you basically have the equivalent of a .NET typed dataset. If you implement data caching, it mimics the disconnected nature of the .NET dataset. And if you implement inheritance in your VB6 object model, you will also ease a future migration to VB.NET, where OO features such as inheritance are supported natively, unlike in VB6 where they are rudimentary.
A VB6 code example
This is some sample code from a business component in VB6. The business method accesses a fully featured O/R DAL; for the sake of this example we use the API generated by Pragmatier Data Tier Builder.
Public Sub OrderItemsInShoppingBasket(ByVal CustomerID As Long)
    Dim Customer As Customer
    Dim Order As Order
    Dim OrderLine As OrderLine
    Dim BasketItem As BasketItem

    'Fetch the customer
    Set Customer = ObjectFactory.GetCustomer(CustomerID)

    'Create a new order (by sending True to the second
    'parameter, called "Create")
    Set Order = ObjectFactory.GetOrder(, True)

    'Set the customer of the order to our customer
    Set Order.Customer = Customer

    'Iterate through all the basket items
    'in the customer's shopping basket
    For Each BasketItem In Customer.ShoppingBasket.GetAllBasketItems