Are You Ready for Enterprise Systems That Fix Themselves?

It is more important to understand the problem than the solution.
- Albert Einstein

Decades ago, engineers who wanted to build more resilient networks coined the term “self-healing system” to describe computer networks that would be able to remedy error conditions without human intervention. Since then the self-healing concept has been researched for application in areas such as robotics, control systems, programming languages, software architectures, fault-tolerant computing, and neural networks.

What does it mean? Manufacturers are freely creating their own definitions of “self-healing,” which makes a single overarching definition difficult. For the purposes of this article, self-healing is the ability of a software system to adapt at run time to changing user needs, system faults, and resource variability. A functional self-healing system would locate and isolate problems and then execute a remedy. The goal of self-healing computer networks is to be fault-tolerant and high performing.

A worm is the epitome of a self-healing system: cut it in half and the head end will usually survive, regenerating a new tail end for itself. Of course, the more complicated the organism, the less likely it is that “self-healing” can be achieved.

Try to imagine the organic equivalent of today’s computer infrastructure. You might imagine it constructed with, say, a pig’s heart, a raccoon’s body, duck feet, a turkey’s brain, and a weak immune system, with every part made by a different manufacturer. Clearly, the self-healing challenge is not a simple one.

Self-healing Objectives
In order for a self-healing computer system to do its job it must subscribe to one of two approaches for creating diagnoses: it must either learn the appropriate reaction to a stimulus/problem or it must use a pre-defined set of instructions for reacting to a stimulus/problem.

More specifically, the de facto objective among industry contributors seems to be to create an infrastructure that can:

  • Define/discover itself and export that definition to external systems; for example, the Intelligent Platform Management Interface (IPMI), which defines hardware platform specifications for management, or the Web-Based Enterprise Management initiative and Common Information Model of the Distributed Management Task Force (WBEM/CIM), which define the data model for management data and the APIs for the exchange of this data.
  • Detect faults and publish these faults via standard mechanisms; for example, Simple Network Management Protocol (SNMP) or WBEM/CIM.
  • Take unattended (i.e., machine-automated, without relying on human interaction), yet auditable, corrective actions based on either:
    - faults and performance events “published” by the infrastructure components
    - new demands (e.g., additional business load)
  • Know the system’s historical response to new business demand; for example, statistical baseline models of historical norms at different times of day and/or month are used today as reference points for flagging ‘out of norm’ conditions for corrective actions. As new data updates the statistical model, the baseline models grow more refined.
  • Replace resources that are defective without prompting; for example, fault-tolerant systems such as Stratus and HP’s Himalaya have redundant components that automatically fail over to a spare if a primary component fails.
  • Adapt to peaks and valleys of demands; for example, workload balancers sense the transaction activity on a multi-resource system and can distribute that workload across those resources as priorities dictate. The tricky part here is the setting of priorities because priorities can change with business circumstances that are not necessarily related to the machine resource infrastructure.
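The baseline idea in the list above can be made concrete with a small sketch. This is only an illustration, not how any particular product works: the class name `HourlyBaseline`, the three-standard-deviation threshold, and the choice to key the history by hour of day are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


class HourlyBaseline:
    """Keeps per-hour history of a metric and flags out-of-norm readings.

    Hypothetical illustration: threshold and min_samples are assumed values.
    """

    def __init__(self, threshold=3.0, min_samples=5):
        self.threshold = threshold      # allowed deviation, in standard deviations
        self.min_samples = min_samples  # readings needed before anything is flagged
        self.history = defaultdict(list)

    def is_out_of_norm(self, hour, value):
        """Return True if value deviates from the historical norm for this hour."""
        samples = self.history[hour]
        if len(samples) < self.min_samples:
            return False  # not enough history to judge
        mu = mean(samples)
        sigma = stdev(samples)
        if sigma == 0:
            return value != mu
        return abs(value - mu) / sigma > self.threshold

    def observe(self, hour, value):
        """Check a reading against the baseline, then refine the baseline with it."""
        flagged = self.is_out_of_norm(hour, value)
        if not flagged:
            # Only normal readings refine the model, so one spike
            # does not immediately become the new norm.
            self.history[hour].append(value)
        return flagged


baseline = HourlyBaseline()
for load in [100, 104, 98, 101, 97, 103]:
    baseline.observe(9, load)       # build up a 9 a.m. history

print(baseline.observe(9, 102))     # False: typical 9 a.m. load
print(baseline.observe(9, 400))     # True: far outside the 9 a.m. norm
```

Excluding flagged readings from the history is a design choice for this sketch; a real system would also need a way to accept a genuine, lasting change in business load as the new norm.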

Today’s State of the Art
Problems that self-healing systems are designed to solve are classified in two categories: those that cause fault events and those that cause performance events.

Faults can normally be identified and located with well-known correlation techniques. The diagnosis, however, can be challenging because the exact context for the fault is not always available. Therefore, the state of the art for fault control today avoids root-cause diagnosis altogether by simply replacing the failed component with another component. This doesn’t work, however, for software that fails due to unexpected data conditions. For these situations, companies such as InCert deliver specialized products that can trace and package up the set of data that led to a failure and then ship it over the Internet to a diagnostic site for analysis.
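The "replace rather than diagnose" approach can be sketched in a few lines. The names here (`RedundantPool`, `ComponentFailed`, and the two stand-in components) are hypothetical and exist only for this example; real fault-tolerant platforms do this in hardware and system software, not in application code.

```python
class ComponentFailed(Exception):
    """Raised by a component that can no longer do its work."""


class RedundantPool:
    """Runs work on a primary component and swaps in a spare on failure."""

    def __init__(self, components):
        self.active = components[0]
        self.spares = list(components[1:])
        self.audit_log = []  # unattended actions should still be auditable

    def call(self, request):
        while True:
            try:
                return self.active(request)
            except ComponentFailed as err:
                if not self.spares:
                    raise  # nothing left to fail over to
                # No root-cause diagnosis: retire the failed part, promote a spare.
                failed, self.active = self.active, self.spares.pop(0)
                self.audit_log.append(
                    f"{failed.__name__} failed ({err}); "
                    f"switched to {self.active.__name__}"
                )


def primary(request):
    # Stand-in for a component with a hardware fault.
    raise ComponentFailed("disk error")


def spare(request):
    return f"handled {request}"


pool = RedundantPool([primary, spare])
print(pool.call("order-42"))  # handled order-42
print(pool.audit_log)         # ['primary failed (disk error); switched to spare']
```

Note what the sketch cannot do: if the request itself carries the data that triggers the failure, every spare fails the same way, which is the software case described above.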

The state of the art for performance bottlenecks is problem location and isolation. Software is available today that measures real and synthetic transactions across distributed environments to isolate transaction bottlenecks and failures. OpenView Transaction Analyzer (OVTA) from Hewlett-Packard is a good example of transaction management software. Flamenco Networks is another company in the Web services management space that can trace transactions over a distributed environment.

In order to comprehend the complexity of self-healing technology, you must first have a deep understanding of the causes of unplanned application downtime. Independent estimates verify my own experience: 20 percent of all downtime is due to hardware, OS, network, and environmental factors; 40 percent is due to bugs and performance issues at the software layer; and 40 percent is caused by operator errors.

Figure 1. Causes of Unplanned Downtime: Each of the three general categories comprises scores of variables, all of which must be tracked and analyzed in order for a diagnosis to be made.

In a typical business environment, the number of variables that can affect any of these three categories of error is myriad. Therefore, the full set of possible problems and possible diagnoses grows factorially rather than linearly, making autonomous self-healing a challenging vision. Fortunately, some breakthrough technology exists today and, through research and development, it is being improved and extended to achieve even higher goals of machine-based systems management.
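To get a feel for the difference between linear and factorial growth, consider a rough illustration. It assumes, purely for the sake of the arithmetic, that a diagnosis depends on the order in which n interacting variables changed, so that n variables give n! possible sequences to consider; the function name `orderings` is made up for this example.

```python
from math import factorial


def orderings(n_variables):
    """Number of distinct sequences in which n variables could have changed."""
    return factorial(n_variables)


for n in (5, 10, 15, 20):
    # Linear growth: n individual checks. Factorial growth: n! orderings.
    print(f"{n:>2} variables: {n:>2} single checks, {orderings(n):,} orderings")
```

Ten variables already yield 3,628,800 orderings against only ten single-variable checks, which is why exhaustive diagnosis is not a realistic goal.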

The Building Reality
The next evolutionary step in the self-healing vision is diagnosis, which up until now has been the weak link in most management systems.

Diagnosis involves analyzing and correlating all critical system activity and changes in state related to a problem and selecting the best corrective option.

Of course, in order to diagnose an event, a system must first analyze it. Customers are looking for ways to simplify the analysis process and to get just the data they need to help them keep the mission-critical components of their business running, which really means up and performing correctly. Accomplishing this requires viewing the threads of execution across a series of components in a distributed environment and then correlating fault and bottleneck location data with detailed CPU, memory, and I/O metrics.

Savvy CTOs will recognize the importance of a system that provides “just enough” analysis and diagnosis data to comprehend and prioritize systems and network operations problems. Exposing too much data can be counterproductive. When computer resources are well instrumented and documented, with error information, performance information, and workarounds for every type of problem readily accessible, it is easy to slip into pedantry. Similarly, when critical failures occur, it’s more important to repair the fault quickly than it is to let the business process languish while employees search for the root source of the problem. Businesses need to be able to prioritize issues and a good self-healing system will facilitate that effort rather than work against it.

Ultimately, self-healing will become so advanced that it will achieve “business virtualization,” a term used to describe an infrastructure that evolves intelligently as needed to make the system resistant to faults and performance-related downtime. There are already examples of these kinds of infrastructures in use. A new class of utility data center software and hardware can virtualize the physical resources of the infrastructure by allowing a developer to choose from a menu the hardware, operating system, middleware, and applications in use, which causes all the necessary components for operation and fault and performance management to download. By masking the complexity of building and running n-tier enterprises, business virtualization constitutes a genuine tactical advantage for CTOs and IT departments.

Figure 2. Toward Self-healing Systems: Most of the current research and development effort around self-healing systems is in the area of location, isolation, and diagnosis. Some technology exists today, but enterprises lack a management system that is fully diagnosis-capable.

What to Look For
There are many software management solutions in the marketplace today that can easily recognize a fault or performance problem. The next logical technical plateau of management systems is locating, isolating, and diagnosing problems in a distributed environment.

As a CTO or IT manager your first step is to decide whether (and when) the advantages of self-healing are required in your organization and whether they can provide a reasonable return on your investment. The larger the resource infrastructure and the wider the distribution of those resources, the more important it is to consider the benefits of self-managing features. Then you must determine what level of sophistication your requirements demand.

In most legacy corporate IT environments, human brain power is still the predominant method of isolating and fixing network elements that cause failures or performance degradation. A person with sufficient domain knowledge will use hunches, educated guesses, past experience, and the knowledge of co-workers to test hypotheses and rule out failed assumptions. To date, this process has not been extensively duplicated by computers, even through the use of knowledge bases and artificial intelligence. From our experience to date, we can suggest CTOs take their next step toward self-healing systems by looking for vendors with a track record of improving problem location, isolation, and diagnosis, while concurrently advancing the state of control architectures like fault-tolerant systems and utility data centers. Experience also suggests:

  • The most basic self-healing system should be expected to locate and isolate faults and performance bottlenecks in mission-critical applications. Performance bottleneck analysis requires the system to track a transaction’s execution across a distributed environment and determine how much time is spent at each node. Once the problems are located, the system must be able to “drill down” and expose further information about the problem at that location without unduly taxing performance. An accepted industry heuristic is that management solutions should impose less than 5 percent overhead on a managed node.
  • Products from some vendors offer valuable workarounds for more detailed drill-down analysis. Look, for example, to see if there are features that can re-initialize failed nodes or add capacity to bottlenecked nodes.
  • Look for products that include correlation engines. Correlation engines can analyze problems from the bottom-up in addition to top-down by using CPU, I/O, and memory data to create well-known problem sets that can aid diagnosis.
  • Ask the vendor about implementing a pilot solution. The sophisticated technologies used in service-centric self-managing systems are costly and complex, yet justifiable for large enterprises. A vendor should be willing to do the legwork to prove the ROI of the system it wants to sell. Small to mid-sized businesses will likely have simpler needs and require less complexity and more ease of use. These companies typically manage fewer than 500 nodes and need to isolate fundamental network problems quickly within concentrated local area networks in just a few geographic locations. As such, solutions that offer limited history, network mapping, and basic network diagnosis, in conjunction with alerting capabilities and some basic performance graphing, will demonstrate a better ROI than more complex management solutions. As the company grows, however, the infrastructure becomes more complex and application-specific management tools will be required.
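When evaluating a product against the first point in the list above, it helps to picture what "time spent at each node" looks like in practice. The following is a minimal sketch under stated assumptions: the trace format (node name, start time, end time), the function names, and the choice to measure the 5 percent heuristic as added elapsed time are all hypothetical, and commercial tools expose this information in their own ways.

```python
from collections import defaultdict


def time_per_node(spans):
    """Sum the elapsed time recorded at each node for one transaction.

    spans: iterable of (node_name, start_seconds, end_seconds) tuples
    (an assumed trace format for this example).
    """
    totals = defaultdict(float)
    for node, start, end in spans:
        totals[node] += end - start
    return dict(totals)


def find_bottleneck(spans):
    """Return (node, seconds, share_of_total) for the slowest node."""
    totals = time_per_node(spans)
    node = max(totals, key=totals.get)
    return node, totals[node], totals[node] / sum(totals.values())


def within_overhead_budget(unmanaged_seconds, managed_seconds, budget=0.05):
    """Check the 5 percent heuristic, measured here as added elapsed time."""
    overhead = (managed_seconds - unmanaged_seconds) / unmanaged_seconds
    return overhead < budget


# One transaction passing through three tiers; the app tier is visited twice.
trace = [
    ("web", 0.00, 0.05),
    ("app", 0.05, 0.25),
    ("db", 0.25, 1.05),
    ("app", 1.05, 1.10),
]
node, seconds, share = find_bottleneck(trace)
print(f"{node}: {seconds:.2f}s ({share:.0%} of transaction time)")
# db: 0.80s (73% of transaction time)
```

Locating the slow node is only the first step; the drill-down and correlation features described above are what turn "the database tier is slow" into something an operator, or eventually the system itself, can act on.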

The operational-management software industry is focusing on problem location, isolation, and diagnosis and making great strides. Self-healing technology is reaching an advanced state. Still, customers should not assume that self-healing systems will find and fix every possible fault and bottleneck, because the number of possibilities is astronomical. A dose of reality will insulate you, and your enterprise, against the negative effects of over-excited expectations.
