Sunday, March 16, 2008

Business Continuity and Disaster Recovery Planning [3] Chief Security Officer and DRP

Business continuity and disaster recovery planning is a tedious company-wide effort. For it to succeed, it needs a sponsor at the executive level. That person is usually the chief security officer (CSO). The responsibilities of a CSO are:
1. Oversee the plan
2. Provide input and support
3. Put the plan into action during an emergency.

A chief security officer needs to make the case for disaster recovery by analyzing and documenting the potential financial losses. The CSO should work with the company’s legal and financial departments to document the total losses per day that the company would face if it could not recover quickly. A thorough review of the company’s business continuity and disaster recovery plans can then identify the gaps that might prevent a successful recovery.
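
The per-day loss figure described above comes down to simple arithmetic. The following is a minimal Python sketch, not a method taken from the report: the cost categories (lost revenue, idle payroll, contractual penalties) and every number in it are hypothetical placeholders that the legal and financial departments would replace with real figures.

```python
from dataclasses import dataclass


@dataclass
class DowntimeCosts:
    """Hypothetical per-day cost categories for one business unit."""
    lost_revenue_per_day: float
    idle_payroll_per_day: float
    contract_penalties_per_day: float

    def loss_per_day(self) -> float:
        """Sum of all cost categories for a single day of downtime."""
        return (self.lost_revenue_per_day
                + self.idle_payroll_per_day
                + self.contract_penalties_per_day)


def total_outage_loss(costs: DowntimeCosts, days_down: float) -> float:
    """Estimated total loss for an outage lasting `days_down` days."""
    if days_down < 0:
        raise ValueError("days_down must be non-negative")
    return costs.loss_per_day() * days_down


# Illustrative numbers only; none of these come from the report.
example = DowntimeCosts(
    lost_revenue_per_day=250_000.0,
    idle_payroll_per_day=80_000.0,
    contract_penalties_per_day=20_000.0,
)

print(example.loss_per_day())          # 350000.0
print(total_outage_loss(example, 42))  # loss over six weeks of downtime
```

Even a rough model like this gives the executive sponsor a concrete number to weigh against the cost of the recovery plan itself.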


Case Study

The importance of disaster recovery planning is illustrated by the case of Northrop Grumman during Hurricane Katrina in late August and early September 2005. Northrop Grumman, a major defense contractor with annual sales of $30 billion, is based in Los Angeles, California. Its Ship Systems division, however, is based in Pascagoula, Mississippi, and was directly hit by Hurricane Katrina. The 13,500 employees based there had no means of communicating with their company because power and utility infrastructures were heavily damaged. Northrop Grumman’s corporate buildings in Pascagoula were flooded, rendering most of the office space unusable. The data center, which housed some 300 servers and associated networking and storage equipment, was completely destroyed.

Here is where Northrop Grumman made a major mistake. The company had a backup site for its Pascagoula IT operations, but it was located only 100 miles away in Avondale, Louisiana. Since the backup site was also in the path of the same hurricane, it was rendered useless as well. With both the main and backup IT operations down, it took the Ship Systems division several weeks to recover. Had the backup site been located farther away, Northrop Grumman could have recovered its IT operations in a matter of hours.

This is a classic mistake that many firms make. In the effort to save time and money, organizations tend to favor close proximity when picking locations for backup sites. But natural disasters like hurricanes often span hundreds of miles. So the strategy of maintaining a nearby backup location may indeed save money through faster data replication and cheaper equipment transport, but it also exposes the organization to the danger of having no safety net at all! In this case, there is no information on the long-term damage to Northrop Grumman’s Ship Systems division. But in the worst-case scenario, if important data involving major defense contracts had been permanently lost, the damage could have totaled several hundred million dollars. Even with a full recovery, news of such poor planning can generate negative publicity that undermines customer confidence.
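
The proximity problem described above can be checked mechanically when candidate backup sites are being evaluated. The sketch below is one hedged way to do it in Python, using the standard haversine formula for great-circle distance. The coordinates are rough approximations chosen for illustration, and the 300-mile minimum separation is a hypothetical policy value, not a figure from the report or from Northrop Grumman.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def backup_is_far_enough(primary, backup, min_miles):
    """True if the backup site is at least `min_miles` from the primary."""
    return haversine_miles(*primary, *backup) >= min_miles


# Approximate (latitude, longitude) pairs, assumed for illustration only.
PRIMARY_SITE = (30.37, -88.56)  # roughly Pascagoula, Mississippi
BACKUP_SITE = (29.91, -90.20)   # roughly 100 miles west, in Louisiana
MIN_SEPARATION_MILES = 300      # hypothetical policy threshold

distance = haversine_miles(*PRIMARY_SITE, *BACKUP_SITE)
print(f"Separation: {distance:.0f} miles")
print(backup_is_far_enough(PRIMARY_SITE, BACKUP_SITE, MIN_SEPARATION_MILES))
```

Distance alone is a crude test. A fuller check would also ask whether the two sites share the same hazard zone, power grid, or telecommunications provider.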

One thing Northrop Grumman did do right was its coordination of workers after the disaster. The hurricane caused many employees to lose not only their workplace but also their homes, family, and livelihood. In response, the company distributed food, water, and clothes to help its workers get back on their feet. These efforts were essential because the psychological damage of a disaster can be tremendous on employees. Devastated employees are not productive, and a company’s success depends on the productivity of its employees. Therefore, when designing a disaster recovery plan, firms must always remember the people factor. Human beings are not like machines, where a damaged one can be swapped for a new one. Human beings are emotionally and psychologically fragile, so firms must help their people recover if they want operations to recover. Overall, Northrop Grumman learned four key “survival tips” from this experience:
• Plot out backup sites.
• Connect with suppliers.
• Find your people.
• Pick a point person.

But the most important lesson is this: despite all the planning and simulation for potential disasters, an actual disaster will still expose weaknesses in your disaster recovery efforts. In Northrop Grumman’s case, the company had to throw out many of its original DR plans because they didn’t address the needs of the Hurricane Katrina disaster. Luckily, it managed to get three of the four survival tips right during the recovery process. That led to the Pascagoula operations ramping back up to full speed within six weeks.


Globalization and Outsourcing

Globalization and outsourcing have further heightened the need for good disaster recovery and business continuity planning. In the manufacturing sector, a number of firms have suppliers and subcontractors in China. In the services sector, many firms have direct or outsourced operations in India. Yet both China and India are developing nations that present numerous risks. In China, for example, the SARS epidemic of 2002-2003 caused a nationwide panic and discouraged people from traveling to and within China. More recently, toy manufacturers have had to recall products tainted by lead paint when their Chinese suppliers failed to adequately safeguard manufacturing standards. In India, flooding in the city of Mumbai during the summer of 2005 slowed some outsourced IT and call center operations. The following year, the same city experienced a terrorist bombing that killed at least 170 people. These examples illustrate the dangers of maintaining operations in foreign destinations that do not have the same stability of government, law, infrastructure, and society as a nation like the United States. But because of cost advantages, talent shortages, and business opportunities, firms in developed nations continue to do significant business in riskier destinations. So what strategies can companies employ to mitigate their risk in these locations?

Again, the answer lies in good DR and BC planning. The most fundamental concept in business continuity planning is to not place all your eggs in one basket, i.e., always have an alternative plan ready. In the case of slowdowns in Mumbai’s IT operations, a well-prepared firm would have had operations in other locations that could temporarily take over the responsibilities of the Mumbai site. In fact, some companies that had call centers in India rerouted customers to their Philippines operations as a temporary measure. But sometimes it’s not possible to escape the dangers of a disaster, and finding an alternative solution is infeasible. In these situations, companies should consider insuring against disasters. Just as a homeowner buys homeowners insurance to guard against the risk of fire or burglary, organizations often have the option of purchasing insurance to hedge against the risk of disaster. Thus, even if a disaster takes place, the firm will at least receive some monetary compensation to offset its costs.

Technology also plays a major role in mitigating the effects of disasters. For instance, since the SARS epidemic caused a fear of traveling to China, video conferencing technology could have solved part of that problem. Video conferencing allows people at remote sites to communicate with one another and is a good alternative to face-to-face meetings. Likewise, VPN technology allows people to connect to their workplace from outside locations such as a worker’s home. So if a worker can’t physically be at the office for whatever reason, the worker can log on from home and continue to work. These are just some examples of how corporations can deal with disasters in risky overseas operations.
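
The rerouting of call center traffic described above is, at its core, a priority-ordered failover rule. As a rough illustration only, it might be expressed in Python as follows. The site names and the idea of a simple "unavailable" set are assumptions made for this sketch; a real routing system would rely on live health checks and capacity limits.

```python
from typing import Optional, Sequence, Set


def pick_active_site(sites_in_priority_order: Sequence[str],
                     unavailable: Set[str]) -> Optional[str]:
    """Return the first site that is not marked unavailable, or None."""
    for site in sites_in_priority_order:
        if site not in unavailable:
            return site
    return None  # every site is down; no alternative is left


# Hypothetical site list: the primary first, then the alternatives.
CALL_CENTER_SITES = ["mumbai", "philippines", "home-agents-via-vpn"]

print(pick_active_site(CALL_CENTER_SITES, unavailable=set()))       # mumbai
print(pick_active_site(CALL_CENTER_SITES, unavailable={"mumbai"}))  # philippines
```

The case where the function returns None is the "no safety net" situation, which is where insurance becomes the remaining hedge.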


Conclusion

In the risky world we live in, disasters will always occur. Whether they are man-made (e.g. terrorism) or natural (e.g. earthquakes), disasters are a fact of life that we can never avoid. And with the movement towards globalization and outsourcing, disasters are even more likely to affect organizations in the future. The best thing to do is to acknowledge their existence and try our best to prevent, deal with, and recover from them. To achieve this, organizations must first do a business impact analysis to identify their critical assets. Then they should create a detailed disaster recovery plan to deal with the dangers of disasters. It’s also important to test and simulate disaster situations, although no amount of pre-planning can fully prepare for the actual event itself. When a real disaster does take place, organizations should learn from the experience and update their DR plans accordingly. This way, past mistakes are not repeated in future disasters. By combining all these techniques, corporations will be in good shape to combat almost any type of disaster, whether human, technological, or natural.



To be continued. Please catch the other parts of this posting series.
------------------------------
You can read the other parts of this posting series from the list below.

This posting series provides information about Business Continuity and Disaster Recovery Planning. It includes the BCM Model, Business Impact Analysis, and many ideas on Disaster Recovery Planning (DRP) that are useful for a chief security officer (CSO).
Please note: this posting is copied from a report for the software security class that I attended at San Jose State University in Fall 2007.
