How Does the (Relational) Database Fit In?
I honestly am starting to feel that relational databases are overrated when it comes to applications. For some time, I've been in the camp that thinks databases are just a necessary detail in application designwe need to persist the data somewhere, and hey, relational databases are a good way to do it because they've got so many tools to get at the data for other purposes.
|We need to realize that the underlying domain in business software is process-centric.|
The problem is that thinking like this leaves us thinking that our relational database is, first, the right place for other applications to get at our data and, second, that the database should be the center of the application's world. This latter misconception is quite rampant among developers today in my experience and is, as I alluded to in my "I Object" article (CoDe Magazine, Jan/Feb 2006), contributed to by the recordset mentality that was popularized during the VB and ASP era.
Once we allow for the possibility that the database is notand indeed should notbe thought of as the center of our applications, it allows us to think more in terms of how the business thinks, chiefly how to get things done. Put another way, it enables us to think in terms of the business process rather than the business data.
If we allow ourselves to think this way, in a process-centric way, we see how the database, like the data, should serve the application, and not the other way around. This is a truly revolutionary way to think about applications if you come from a data-oriented background. Just stop for a minute and let it sink in and ponder the implications.
One implication is that relational databases may not be the best solution for data persistence in your application. Maybe an object database would be better, or maybe just POXML (plain ol' XML) would work. For instance, I can tell you that my blog software, dasBlog, does just fine using XML as its backing store, and folks far more prolific and popular than I use it as well without a problem.
Even SQL Server 2005 has made XML a first-class citizen, which speaks to the validity of storing data as XML. In fact it is so good at XML that it's tempting to not even bother splitting the data out into separate columns. And the .NET Framework itself treats XML very, very well, which leads one to think twice that maybe XML is an okay medium for persistent storage.
The point here is not to talk about the virtues of XML, though they do abound. It is simply to suggest that there are other viable alternatives to an RDBMS for application data persistence. In fact, one might suggest that the alternatives play nicer in an application world, particularly in a distributed and service-oriented world.
I'm not about to herald the demise of the RDBMS, though. Even if we all agreed that this technology is over the hill, it will still be with us for an extremely long time. At the same time, maybe it is time to seriously consider alternatives in your application design.
If you do want to use a relational database with a process-oriented application, consider taking advantage of XML even if it is within the context of the RDBMS. And if you still want to break out your objects' individual properties in the database, I suggest the following:
Everything that can possibly be nullable should be. This goes back to putting the validation where it properly goesin the validation layer/service.
The only constraints you should have should be relational, i.e., those proper in a relational database. They shouldn't be enforced or checked; they should really just be there for informational purposes; otherwise, you're putting validation where it shouldn't be.
Fashion your data model as closely to your object (domain) model as possible. As Ted Neward summed up in a controversial blog post that likened the Vietnam War to object-relational mapping, ORM is just not clean, architecturally speaking. If you can't avoid it altogether (as I have suggested above), minimize the impedance mismatch.
Some good reasons to countermand these guidelines:
Many, many records (in the millions or more) being reported on in real time means that you need the most efficient data retrieval for data that your application actively modifies. As far as I know, a relationally-designed database is still the best choice for a situation like this.
You have a DBA that makes you do things in a specific way or controls the design himself. If he or she can't come up with a good reason other than that it is an unfamiliar approach or against policies, consider pressing your point about it if you can. If they have valid reasons that make sense to you, go with the flow.
You have an existing database you have to use. Here, I'd at least consider the possibility of having a transient data store to use during application processes and only stick it in this final reference store when the process is complete, if it does complete. Otherwise, go with the flow.
Note that these guidelines only apply to application databases, meaning those designed expressly as a persistent store for data used in an application. I think we too often try to multi-purpose databases when having multiple databases would make more sense.
In particular, I'm thinking of reporting and business intelligence. If you're not familiar with BI, I strongly suggest you gain at least a passing familiarity with it. BI is an extremely important aspect of business automation that isn't best served directly off of a transactional business application. By this I mean that you should consider creating a real data warehouse for your application and do reporting and analytics off of it, not your transactional store, which should be basically a persistence medium for a process-driven application (as I've been talking about up to this point).
If we design our applications to persist data in the manner that I am advocating, we won't be pushing validation checks off to what is arguably a necessary evil, that is, persistent storage. Rather, we will be enabling our application to proceed in a way that makes sense to the process and validate data only when it needs to be validated and how it needs to be validated within that process.