Are You Ready for Enterprise Systems That Fix Themselves?

Technology to monitor and maintain distributed computer systems is going through a metamorphosis. The dream: computer systems that can quickly heal themselves in response to a wide array of fault and performance conditions, and even to changing business conditions. Get a realistic appraisal of today's self-healing technology.




It is more important to understand the problem than the solution.
—Albert Einstein

Decades ago, engineers who wanted to build more resilient networks coined the term "self-healing system" to describe computer networks that would be able to remedy error conditions without human intervention. Since then the self-healing concept has been researched for application in areas such as robotics, control systems, programming languages, software architectures, fault-tolerant computing, and neural networks.

What does it mean? Manufacturers are freely creating their own definitions of "self-healing," which makes a single overarching definition difficult. For the purposes of this article, self-healing is the ability of a software system to adapt at run time to changing user needs, system faults, and resource variability. A functional self-healing system would locate and isolate problems and then execute a remedy. The goal of self-healing computer networks is to be fault-tolerant and high-performing.
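To make the locate, isolate, and remedy sequence concrete, here is a minimal Python sketch of a single healing pass. It is an illustration under stated assumptions, not any vendor's implementation: the Component class, the heal_once function, and the check and remedy callbacks are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    """A hypothetical managed component; every name here is illustrative."""
    name: str
    check: Callable[[], bool]    # returns True when the component is healthy
    remedy: Callable[[], None]   # attempts a repair, such as a restart
    isolated: bool = False


def heal_once(components: List[Component]) -> List[str]:
    """Run one locate / isolate / remedy pass and return a log of actions."""
    log = []
    for c in components:
        if c.check():
            continue                     # healthy, nothing to do
        log.append(f"located fault in {c.name}")
        c.isolated = True                # take the component out of service
        log.append(f"isolated {c.name}")
        c.remedy()                       # execute the remedy
        if c.check():
            c.isolated = False           # repair worked, return it to service
            log.append(f"healed {c.name}")
        else:
            log.append(f"escalated {c.name} to an operator")
    return log


if __name__ == "__main__":
    # A toy "web" service that is down and can be fixed by a restart.
    state = {"up": False}
    web = Component(
        name="web",
        check=lambda: state["up"],
        remedy=lambda: state.update(up=True),
    )
    for line in heal_once([web]):
        print(line)
```

The sketch leaves a component isolated and hands it to a human operator when the remedy does not restore health, which reflects the assumption that an automated remedy will not always succeed.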

A worm is the epitome of a self-healing system: cut it in half and the head end will usually survive, regenerating a new tail end for itself. Of course, the more complicated the organism, the less likely it is that "self-healing" can be achieved.

Try to imagine the organic equivalent of today's computer infrastructure. You might imagine it constructed with, say, a pig's heart, a raccoon's body, a duck's feet, a turkey's brain, and a weak immune system, with every part made by a different manufacturer. Clearly, the self-healing challenge is not a simple one.

