Cool Tools With Michael Tamburo
Can Your Firm Survive A System Meltdown?
Making weekly network backups is hardly a foolproof recovery plan. Know what you can do to be up and running ASAP if your system goes down.
Your firm’s ability to survive a computer disaster is directly related to how dependent you are on your computers and your data. If you’re a small or midsize firm, chances are that you depend on your system pretty heavily. Have you thought about what would happen if your network went down for a few days, or if some or all of your databases vanished? Imagine the dollars per hour in lost revenues and the potential damage to client relations. Now, think about what you can do to prevent such consequences. Proper backups, fault-tolerant computers, disk images and replacement servers are among your best safety tools. Be prepared.
Data backups are essential to any disaster prevention program. Typically, you want to back up file servers to tape on a daily and a monthly basis. The daily tapes (one for each day of the month) are used over and over from month to month. The monthly tapes (one for each month of the year) are used once and stored off-site in a safety deposit box or other secure location. You want to guarantee that copies of your data are available, regardless of what happens at the office.
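The rotation described above can be written down as a simple rule, which is handy if you want a script to tell whoever swaps tapes which one to load. The sketch below is only an illustration: the tape labels and the choice of the first of the month for the monthly tape are assumptions, not something a particular backup product requires.

```python
from datetime import date


def tapes_for(day: date) -> list[str]:
    """Return the (hypothetical) labels of the tapes to load on `day`."""
    # One daily tape per day of the month, reused month after month.
    tapes = [f"DAILY-{day.day:02d}"]
    # Assumption: the monthly tape is written on the 1st, used once,
    # and then sent to off-site storage.
    if day.day == 1:
        tapes.append(f"MONTHLY-{day.year}-{day.month:02d}")
    return tapes


print(tapes_for(date(2024, 3, 15)))
print(tapes_for(date(2024, 3, 1)))
```

Under these assumptions the office keeps up to 31 daily tapes in circulation and retires 12 monthly tapes per year to the off-site location.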
While most firms already run tape backups, very few of them verify that their backups are actually good. The easiest way to verify a backup is to try restoring from it. I was called into a company to restore its data after a disaster, only to find that its backups had not been running properly for months. The company’s consultant never bothered to verify that the backups were any good. As a result, we had to restore from a tape that was months old.
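One way to make a test restore more than a glance at the file list is to restore to a scratch folder and compare checksums against the live files. The following is a minimal sketch, assuming the backup has already been restored to a separate directory; the function names are made up for illustration and are not part of any backup product.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MB chunks to keep memory use low."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return a list of files that are missing or differ in the restored copy."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(restored):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list means every live file has an identical restored copy. Files that were edited after the backup ran will show up as differing, so a check like this is most meaningful right after the backup completes or on data that rarely changes.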
Backup frequency matters as well. Some firms run a backup only once a week, or even less often. This might be fine if your data never changed, but that’s doubtful. If your most recent backup is days or weeks old when you need to restore, it will be difficult to determine exactly what you are restoring and what has changed since. For example, one of my clients had a problem with a corrupt accounting database. The system had been running fine the previous day, and the staff knew exactly where they stood with their various accounting entries at the end of that day. The client had a backup from the previous night from which the data could be restored, so they were able to continue their work right where they left off. If their most recent backup had been days or weeks old, they would not have been able to determine what information needed to be reentered to bring the system up to date.
Although there are a number of fault-tolerant drive configurations, all operate on the same principle: While the server is running, data is written to multiple drives simultaneously, so that if one drive fails, the remaining drive, or drives, can still be used. The term RAID (redundant array of inexpensive disks) refers to this family of configurations, and the RAID level describes how a computer or server achieves its fault tolerance. The most common types are RAID 1 and RAID 5.
RAID 1. RAID 1 uses two hard drives set up in a mirrored configuration. Any data changes are stored to both drives (hence the term mirror). If one drive fails, the other drive is used to operate the computer while the bad drive is replaced. There is, though, a drawback with some RAID 1 configurations. If the drive that fails is the one from which the computer boots, you have to make configuration changes to get the computer to boot from the remaining good drive.
RAID 5. RAID 5 uses three or more hard drives set up to form a striped set. Any data changes are written across all the drives. Additional information, known as parity, is also saved across the drives, and it allows the contents of any one drive to be rebuilt on the fly. So, should one of the drives fail, the computer will continue to operate as normal. The bad drive, which is usually hot-swappable, can be pulled out and replaced. In many cases, the computer doesn’t even need to be powered down and users never know there was a problem.
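The additional information that RAID 5 stores is parity, commonly computed by XOR-ing the data blocks in each stripe. The short sketch below shows why that is enough to rebuild one lost drive. It is a simplification: it models a single stripe with the parity kept in one place, whereas a real RAID 5 controller rotates the parity across the drives, and the sample block contents are made up.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data drives (illustrative contents).
data = [b"ACCT", b"2024", b"DATA"]
parity = xor_blocks(data)  # stored on a fourth drive

# Simulate losing the second drive: XOR the survivors with the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Because XOR-ing a value with itself cancels it out, combining the surviving blocks with the parity leaves exactly the missing block. The same property explains why RAID 5 tolerates the loss of one drive but not two at once.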
So, your backups are solid and you are running RAID 5 servers. You are in decent shape, but other things can and do go wrong. It is not a pleasant thought, but imagine you lose an entire server owing to fire, theft or water damage. What do you do if the upgrade that was just installed to the server did not work and now the server cannot even boot up? Some companies run totally redundant systems in off-site locations, so that in the event of such a failure, they can revert to the redundant system. This typically is too expensive for smaller firms.
A practical alternative is to periodically take complete images of your servers and store those images to CD or DVD. An image is a complete snapshot of a computer hard disk. This snapshot can be used to easily set up a replacement computer with the full operating system. Once you have the operating system up and running on the replacement computer, your tape drive will be functional and you can then restore your data from the most recent tape backup.
Some backup programs also save the full operating system to tape. The problem is that you cannot read the tape until the tape drive is up and running, and that requires a working operating system. Remember, the operating system is needed to get the computer to boot and to get devices attached to the computer up and running.
Note that you may also want to save disk images of key workstations, particularly ones with special configurations. Should the particular computer fail and need to be replaced, the image can be used to set up a new computer just like the one that failed.
Some companies, as mentioned, store a replacement server at an off-site location. The replacement server is configured just like the live server, but typically does not have the current data that is on the live server. The main point is that the replacement server has the exact same hardware configuration as the live server. If the live server fails, the replacement is brought in, updated using the latest image and tape backup, and then brought online.
If you do not have a replacement server on hand and your live server fails, you’ll need to quickly purchase a new server and get it up and running. Although you can use a disk image to set up the new server, it is easier to load an image to a computer with the same hardware configuration as the computer that was the source of the original image.
Get Up and On with It
No one wants to face a computer system meltdown, but if you’re prepared, you can recover from it and get back to business. Make sure that you run reliable backups regularly and store them in a secure off-site location. Take images of servers and key computers and save those images to CD or DVD, and keep those in a secure off-site location. If you’re not running fault-tolerant servers, consider upgrading–and make sure that new servers incorporate some degree of fault tolerance. Finally, consider investing in a replacement server. Compared to the costs of losing all your data, the investment is a meager one.
Michael Tamburo (firstname.lastname@example.org.) is President of ConexNet, Inc., a privately held Chicago-based company he co-founded to provide connectivity-based solutions. He specializes in the implementation of Microsoft-based hardware and software solutions.