BCP, or Business Continuity Planning for long, is the art practiced to ensure that when disaster strikes, your business doesn’t crumble. It can, and should, include such things as site evacuation plans, key communication information, details on business processes you or others may have to follow from scratch should the worst happen, any dependencies for the smooth operation of those business processes, recovery time frame requirements, and, of course, information on key computing technology configurations to make recovery of required services possible…if not easy. While all of these things are important aspects of BCP, and deserve your focus and attention when putting together a comprehensive plan, I would like to spend some time talking specifically about business continuity and disaster recovery for your computing infrastructure, and what you absolutely MUST have squirreled away somewhere safe to reduce headaches and downtime when tornadoes land, the ground quakes, or hard drives fail.
While this information is probably most useful for those of you operating your own small or medium-sized business, it applies just as well to home use or to larger organizations. This is not meant to be a comprehensive list, but a set of general tips for reducing recovery time and consultants’ bills when you have to “recover blind” after disaster strikes. BCPs are very organization-specific, and while best practices exist, only you and your business stakeholders can decide what level of risk management is best for your situation. If you are looking into BCP for regulatory compliance, you may want to consult with a planning specialist to ensure success.
Systems and Network Audit
I always get a fair number of raised eyebrows when I bring up auditing the existing infrastructure before talking about things like backups and documenting configurations, but when you think about it, it makes perfect sense. You need to know what you have before you can know what you need to back up or document. The audit is the foundation on which all the rest of your BCP building blocks will rest. Often this can be sort of like finding money in the pocket of a jacket you hardly ever wear: you may discover resources you never knew, or forgot, you had.
Key things of which to take special note are:
- Brand, model, serial number, and warranty/service/support information for ANY networked equipment (DSL modems, firewall/gateway devices, switches/hubs, computers, printers, scanners, etc.).
- Network configuration details, including Internet Protocol (IP) addresses assigned to your organization by your ISP, any username and password requirements (like for PPPoE) to connect to the Internet via your ISP, firewall rules defined to expose services to the Internet, firewall rules defined to prevent internal machines from connecting to services on the Internet (like blocking access to multimedia content, etc.), remote access setups, wireless networking details (including WPA/WEP encryption pass-phrases or keys), and other local LAN IP information.
- Computer configuration details, including the operating system installed (with as much detail as possible…discerning the distinction between Windows XP Home and Windows XP Pro, Windows 2000 Pro and Windows 2000 Server, etc.), hard drive space (total and used), system memory, installed applications, running services (like a web server, file sharing, printer serving, etc.), and any processes and users that rely on the machine (with why and how).
- Any information or documentation you may still have about your computing infrastructure from work performed on your behalf or in partnership with you by consultants, contractors, or vendors.
Once you have this audit information, you will find you know a lot more about your computing uses and needs, and are well on your way to meeting the technology infrastructure requirements of your BCP.
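Some of the per-computer details can be gathered by a script rather than by hand. What follows is a minimal sketch in Python, assuming Python is installed on the machine being audited. The function name collect_inventory and the field names are my own inventions for illustration. It only captures what the standard library can see (operating system, architecture, disk space); serial numbers, warranty details, installed applications, and who relies on the machine still have to be written down the old-fashioned way.

```python
import json
import platform
import shutil
import socket


def collect_inventory(path="."):
    """Gather a few audit facts about the machine this script runs on."""
    # Disk usage is reported for the volume that holds `path`.
    usage = shutil.disk_usage(path)
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "architecture": platform.machine(),
        "disk_total_gb": round(usage.total / 1e9, 1),
        "disk_used_gb": round(usage.used / 1e9, 1),
    }


if __name__ == "__main__":
    # Print the record as JSON so it can be saved alongside the rest of the audit.
    print(json.dumps(collect_inventory(), indent=2))
```

Run it on each machine and file the output with your audit notes. The point is a consistent record per machine, not a replacement for walking around with a clipboard.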
Backup of Key Data
Many smaller organizations fall squarely into one of two categories of current backup practice: those that don’t back up at all, and those that back up everything. To those that don’t back up at all, I wish you well. Can I get you my rate sheet? The other extreme, backing up everything on every computer, including operating systems, applications, and other data that can be easily regenerated, is almost as bad, as it wastes valuable resources (time, money, and patience). While data storage media is much less expensive than it once was, more often than not this over-zealousness results in people who eventually decide the trouble isn’t worth the pain, and they become the type that don’t back up at all. Sad, but true.
Now, what about the ‘tweens? Well, in between backing-up nothing and everything, I have seen two other strategic approaches that warrant some caution. The first is used by those that have a well-established backup plan, but it relies on human interaction, and is an often forgotten (or put-off) task which results in out-of-date backups should recovery become necessary. The second also has a well-established backup plan, but the backup task is machine automated to occur regularly. The only problem is that, as the organization grows, and data gets spread out or machines are added, the automated backup is not updated to include the changes. This results in an unexpected inability to recover critical information in a disaster.
The best plans are automated, and regularly audited. In addition, introducing some change control processes to ensure that changes in data storage or usage are immediately captured in the backup process can be helpful as well. Obviously, this can be handled in various ways, the best one being determined by the organization’s workflows and its size.
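To make “regularly audited” a bit more concrete, here is a minimal sketch of a check that could run on a schedule. It looks for the two failure modes described above: data locations that no backup job covers, and backup files that are missing or stale. The function audit_backups, its parameters, and the seven-day default are all assumptions of mine for illustration; your own lists of directories and backup files would come from your audit and your backup tool.

```python
import time
from pathlib import Path


def audit_backups(data_dirs, backed_up_dirs, backup_files, max_age_days=7, now=None):
    """Return a list of human-readable problems; an empty list means the audit passed."""
    now = time.time() if now is None else now
    problems = []

    # Coverage: every directory known to hold data must be one of the
    # backed-up directories, or sit somewhere underneath one.
    covered = [Path(d).resolve() for d in backed_up_dirs]
    for d in data_dirs:
        target = Path(d).resolve()
        if not any(target == c or c in target.parents for c in covered):
            problems.append(f"not covered by any backup job: {d}")

    # Freshness: every expected backup file must exist and be newer than the cutoff.
    cutoff = now - max_age_days * 86400
    for f in backup_files:
        p = Path(f)
        if not p.exists():
            problems.append(f"backup file missing: {f}")
        elif p.stat().st_mtime < cutoff:
            problems.append(f"backup older than {max_age_days} days: {f}")

    return problems
```

The design choice here is to report problems rather than fix them. When a new machine or data folder shows up, a person should decide whether it belongs in the backup, which is exactly the change control step mentioned above.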
What exactly should be backed up? The simplest answer to this is a bit convoluted, but bear with me…any data generated by you (think documents of various sorts), data generated for you (think application configuration information containing preferences or settings that cannot be easily regenerated), messaging data (think email and IM logs/records), and data contained in databases or binary application formats (not the application or database software itself). Did you catch the theme? Data. Your data needs to be backed up. Your data are the only electronic assets you have that cannot be easily replaced by re-installation of your operating system and applications. Drives dedicated specifically to data, and the user directories contained in C:\Documents and Settings\, are good places to start looking for these valuable resources.
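As a rough illustration of the “only data” idea, the sketch below bundles a chosen list of data directories into a single compressed archive and leaves everything else alone. The function name archive_data is hypothetical, and which directories you pass in is entirely up to your audit. Note one limitation of this simple version: two directories with the same final name would land in the same place inside the archive, so rename or nest them if that applies to you.

```python
import tarfile
from pathlib import Path


def archive_data(data_dirs, archive_path):
    """Write the given data directories into one gzip-compressed tar archive."""
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for d in data_dirs:
            d = Path(d)
            # Store each tree under its own directory name so a restore
            # unpacks into tidy, recognizable folders.
            tar.add(str(d), arcname=d.name)
    return archive_path
```

A call such as archive_data([r"D:\Data"], r"E:\Backups\data.tar.gz") would be typical, with both paths here being made-up examples.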
Other stuff you may want to back up: You can generate an Emergency Repair Disk (ERD) for most Windows operating systems, and you can optionally create a copy of the registry with this process. This can be of value when restoring a machine back to a previous known-good state, including the local machine user account information. The ERD must be updated anytime significant changes are made to the machine, including the installation of software, patches, and security updates, as well as the addition of new hardware.
There are exceptions to the “only data” backup rule. If you no longer have easy access to the installation files for an application you may need to re-install, because the CD is lost or damaged or it was an electronic download that is no longer available, you may want to back up the application and all its associated run-time files (possibly including registry keys required for proper operation).
Backing up an entire machine, operating system, applications, and all, using standard file manipulation tools is not practical. You may, however, still wish to periodically “image” your hard drive, capturing all information stored on it, including operating system, applications, and configuration data, using a utility like Norton Ghost for faster disaster recovery. This software creates an “instance-in-time” copy of the hard drive in a highly compressed file format, allowing you to save the time it takes for the re-installation of the operating system and applications.
Cautions about open files and running databases: If a file is in the process of being edited when a backup occurs, it may not be backed up properly, or with the most current data. Also, backing up a running database (a “hot” backup) requires special software or configuration consideration. “Cold” backups of the database data files themselves can take place only when the database software is completely shut down. This also applies to some binary file formats used by applications like QuickBooks and FileMaker Pro.
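To show what “special software” for a hot backup can look like, here is a small sketch using SQLite, chosen only because Python ships with support for it. SQLite is my stand-in here, not something the applications named above necessarily use, and other database products have their own hot backup tools that you should consult instead. The function name hot_backup is hypothetical.

```python
import sqlite3


def hot_backup(live_db_path, backup_db_path):
    """Copy a SQLite database that may be in use into a consistent backup file."""
    src = sqlite3.connect(live_db_path)
    dst = sqlite3.connect(backup_db_path)
    try:
        # Connection.backup uses SQLite's online backup API, so other
        # connections can keep reading and writing while the copy runs.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

The contrast with simply copying the database file is the whole point: a plain file copy taken mid-write can capture a half-finished change, while the database’s own backup mechanism produces a copy it knows to be consistent.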
Tools for successful backups: With the pervasiveness of CD/DVD writers in most modern machines, using CD/DVD-R/RW for backups is a natural leap. Remember, though, our goal is automation. Maintaining backups on CD/DVD will require human involvement, which brings with it human errors. I prefer to use CD/DVD technologies for archiving data no longer kept on the “live” file system (to make space without buying new or larger hard drives). For disaster recovery backups, I tend to opt for external, USB 2.0 connected drive enclosures containing a hard drive large enough for my needs, or a dedicated backup server easily accessible over the local network or the Internet using an encrypted channel. In all cases, I encrypt the backups to ensure that, if the drives/files fall into the wrong hands somehow, the data they contain is useless without the decryption key (a topic of a future blog entry, perhaps).
Location, Location, Location! Storing your hard-earned backups in a safe place that will be accessible in the event of disaster is just as important as creating the backups. This usually implies you have off-site arrangements for storing your most recent backup sets (either at home across town, or at a commercial storage facility). In the server scenario, a perfect example is one of my colleagues. He has a server at his house, and a server at his brother’s house. They each perform backups of their own machines and store them on the server located at the other’s house. Pretty smart. The idea is to make sure your backup data isn’t destroyed by the same disaster that destroyed your working copies. This can be done as easily as keeping the external drive enclosure for backups in a fire safe when not in use, rotating hard drives, or through the more exotic means already mentioned. This all presumes you are not ready to take the leap to the more expensive proposition offered by AIT or DLT tape backups.
Also, it is useful to store all of your application media, BCP documentation, and backups (at least copies) in one secure, accessible place. This way, your entire plan and the tools needed to execute it are available in one location. In my experience, it has always been good to have at least two copies of everything required: one on-site, and one off-site.
Documenting Your Recovery Process
Just about everyone first approaches BCP and disaster recovery thinking they need to account for every possible disaster scenario that could come up. While it can be worthwhile taking a closer look at some “what-if’s”, having a general plan that you can execute and adapt quickly is usually of most practical use. The “biggies” are who, what, when, where, how, and sometimes why.
The who assigns responsibilities for the processes surrounding the various tasks in your BCP, and also establishes communication and authorization chains. In a small business, the who may be you. Stating it formally in BCP documentation allows the process to grow with your business without significant redesign later.
The what will define the actions taken, be it phoning vendors for replacement hardware or support, installing software, or restoring data. Each system and/or process will have its own set of whats that will be required for success.
The when defines the criteria for plan execution at the various levels. This is important to ensure proper resource allocation during a crisis, as well as in making sure business processes and computing services are restored in proper order to meet recovery requirements and inter-service dependencies.
The where aspect is really only important if your recovery requirements dictate recovery from a natural disaster, fire, flood, or other catastrophe at a secondary site due to the primary site’s unavailability. Having an action plan for off-site recovery, and agreements already in place for location, equipment, and network bandwidth needs, can be important aspects of the BCP.
The how documents in detail the steps one must go through for recovery. This type of process documentation should be written with the assumption the person that wrote it will not be available, and that a relative novice must perform the steps to recover the business systems. Drawings, screen captures, and very clear prose are critical to success. Every system captured in your audit should have a place here indicating a path to full enterprise recovery.
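One piece of the when and the how that lends itself to a small worked example is restore order. If you write down which systems depend on which, a restore sequence that respects those inter-service dependencies can be computed rather than guessed. The sketch below assumes Python 3.9 or later for the graphlib module; the function name recovery_order and the service names in the usage example are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def recovery_order(dependencies):
    """Return services in an order where each one comes after everything it depends on.

    `dependencies` maps a service name to the set of services it needs
    to be running first.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical example: the accounting application needs the database
    # and the file server, and both of those need the network.
    deps = {
        "accounting_app": {"database", "file_server"},
        "database": {"network"},
        "file_server": {"network"},
        "network": set(),
    }
    for step, service in enumerate(recovery_order(deps), start=1):
        print(step, service)
```

If the dependencies you wrote down contain a loop (A needs B, B needs A), graphlib raises a CycleError, which is a useful discovery to make before the disaster rather than during it.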
The sometimes why is a little funny, ha ha. I used to work with a group of guys who swore by the “fail forward” method of system upgrades. In cases where it was nearly impossible to get money and authority to upgrade systems or software, a sure way to get management on the same page with IT was to force a failure to get a “clean slate”.
“Since the system failed, and has to be rebuilt anyway, why don’t we just…”.
You get the idea. Sometimes reevaluating what is most important in the face of a crisis can be valuable, and may force the re-prioritization of deployments previously thought to be set in stone. Of course, doing this as part of the master plan is far better than doing it as a form of management manipulation.
Testing. One, Two…
OK. You’ve got your processes documented. You’re ready to go. Now what? You need to verify that what you have written down and the resources you’ve collected actually work to recover your systems. This is usually a process best left to when you’ve forgotten what you’ve written, or when you have an unsuspecting friend or employee available to run through the stuff without your active input.
Auditors tend to recommend that businesses go through the recovery of one critical system chain (group of systems with inter-dependencies) at least once a year, documenting the recovery process, and introducing corrections and refinements based on the experience. This can be tough without spare systems, so contracting with third parties to provide assistance can be well worthwhile.
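For the data portion of a recovery drill, one simple way to check that a restore actually worked is to compare checksums of the restored files against the originals. This is a minimal sketch for a drill where the original files are still around to compare against; in a real disaster you would need a checksum list saved at backup time instead. The function names tree_checksums and verify_restore are hypothetical, and each file is read fully into memory, which is fine for documents but not ideal for very large files.

```python
import hashlib
from pathlib import Path


def tree_checksums(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    sums = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            sums[p.relative_to(root).as_posix()] = hashlib.sha256(p.read_bytes()).hexdigest()
    return sums


def verify_restore(original_root, restored_root):
    """Return (missing, changed): files absent from the restore, and files whose contents differ."""
    original = tree_checksums(original_root)
    restored = tree_checksums(restored_root)
    missing = sorted(set(original) - set(restored))
    changed = sorted(k for k in original if k in restored and original[k] != restored[k])
    return missing, changed
```

Two empty lists mean every original file came back with identical contents. Anything else goes into the corrections and refinements you record after the drill.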
Round, and Round We Go
Most business processes are cyclical (think payroll and taxes…yuck). Just because you finish them in one cycle doesn’t mean you’ve finished them for all cycles. BCP is no different. Once you’ve got a decent plan for business continuity and disaster recovery, you must continue to revisit it, refining it, and keeping it up-to-date with the current state of your business. A stale BCP is a BCP that will fail when you need it most. Take that to heart, and reevaluate the strength of your BCP at least once a year. This can be incorporated with your plan testing and the company barbecue to make one rip-roaring annual party.
Remember, the disaster for which you plan may be your own…