Don DeMarco, IBM Director
As companies become more dependent on information, particularly in e-business, their tolerance for information loss shrinks, says Don DeMarco, Director, IBM Business Continuity and Recovery Services. It's a lesson learned in recent years from Y2K compliance efforts and ERP (Enterprise Resource Planning). Although recovery management (maintaining an IT-based contingency plan and IT recovery plan) is part of the systems management discipline, DeMarco explains that "the decision as to the acceptable amount of risk for information loss must come from upper management."
During his speaking engagements at conferences and industry events, DeMarco has noticed a "chasm between business units and the IT community" with regard to business continuity. The two sides are not always on the same page because "IT might not understand what's going on the business side." DeMarco asserts that business units must decide which objectives and applications are business-critical, and IT should make sure its systems procedures meet those priorities. If any of these objectives are unclear, DeMarco says it's IT's duty to engage senior management to learn those priorities. Which business processes matter most? What should the level of redundancy be? Should servers be load-balanced to ensure optimal performance for end users?
IBM identifies two objectives that management must consider in determining its business-continuity tolerance: the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). RTO is the one that readily comes to mind for recovery: how soon must the business be up and running following an outage? Management must set a time, whether an aggressive 48 hours or longer, so IT can set up its procedures accordingly. RPO, on the other hand, is something people tend to neglect, says DeMarco. How much data can the business afford to lose between an outage and recovery? How fresh must the recovered data be? If you're only as good as your most recent backup, how valuable is that data when it's one or more days old? With synchronous mirroring, for example, a financial company could recover its data with only one transaction missed.
"When IT is allotted its budget each year, it must take business continuance expenditures into account and leverage these costs to identify and address the most important risks to the enterprise," says DeMarco.
While research firms suggest what percentage of IT budgets should go to business continuity, IBM takes no position on the acceptable amount of risk a company should carry. "The only real honest answer is 'it depends'," says DeMarco. He explains that even if two companies are identical in size and revenue, it's not safe to assume they'd have the same business continuance plan. There's the human element: different people accept different levels of risk, just as two people driving the same type of car can carry different insurance deductibles.
According to DeMarco, companies have three ways of dealing with risk: ignore it, accept it, or transfer it to a third party. It's up to upper management to decide which option to choose, but they must first be aware of the risk. Ignoring it means treating it as an acceptable risk that the company is willing to tolerate. Accepting it means acknowledging the risk and putting procedures in place to address it. Transferring it means hiring a business continuity provider such as Sungard, Hewlett-Packard, or IBM to fully design, maintain, and manage the recovery services for you. The role of any business continuity professional a company may choose to employ is to determine how important information is to the company and to coach it accordingly on the acceptable level of risk.
While the concept of IT recovery may conjure up images of hurricane damage or terrorist attacks, DeMarco says another risk, performance degradation, is as challenging to understand and manage as a complete outage. Citing a company phrase, "two clicks and you're fired," he explains that a Web user clicks once on a site's search engine and, if the site's performance isn't robust enough, uses the next click to replace that site with another.