Business Continuity (Part 3 - final)

Posted by: Tony Peake in IT

This is the third and final discussion point on Business Continuity. It covers simple plans to ‘work around’ IT system failures.


As a refresher - we set out to develop some basic plans in advance of any mishap, to help us keep the business going if we suffer a prolonged outage of our communications (phone, fax, Internet) or of our IT systems.


Let's assume our IT systems (in this case our server) have failed and we have determined that they may not be available for at least two days. The cause may be a hardware component failure such as a main circuit board, corrupted data, or something else. In any case the rescue operation must be undertaken by IT experts, and we need to focus on keeping the business running.

 

Remember from the earlier articles that the plan we develop must be kept on paper, preferably with one copy stored safely at home as well as one at work (electronic copies may not be accessible in the event of a computer failure).

 

As we have covered the loss of our communication systems in the earlier discussions, we will start with documenting our second plan - loss of IT systems.

 

Plan 2 - loss of IT Systems


Individual PCs or notebooks
We need to ensure that if we lose access to an individual PC (electrical damage, disk corruption, major virus, theft), our business can continue to function, working around the loss for a day or thereabouts.

Key preventative steps are :-


1. We keep all software disks and codes in a central storage area (ideally offsite) and carefully label them with the name of the PC on which they are installed.


2. We have a file server for storing files. We do not allow staff to store any business data on their PCs - all data is on the server, where it is backed up to tape for retrieval in the case of a mishap.


3. We have a priority arrangement with our IT provider to ensure that they can rapidly recover our PC, either by fixing the damaged unit or by replacing it, AND set the new / fixed unit up exactly as it was beforehand (same programs etc.).

If we have these steps in place and we know the main function of the PC, we should be able to work around its loss for a day or even two. Problems may arise if it is the key PC for accounts or payroll. In this case we can ask our IT provider to do one of the following :-


a) Put the key application/s on another PC until we have the original one back

b) Provide us with a loan unit and load the applications on that

c) Provide us with a new unit and build that with the same functionality as the damaged one.


There are other options, such as having ‘hot standby’ PCs or stored images of the PC, but all of them involve additional spending for a generally low-frequency situation.

Server


The loss of a PC can be little more than a minor inconvenience. The loss of a server could be business threatening, and we MUST have a workaround plan.

Again, we will assume that this is not a bank or a multi-national with millions invested in IT. We are conscious of the risk but have a minimal budget, and we are prepared to lose a little of our most recent data and have our systems unavailable for up to a day. First we will address the preparation we need under ‘preventative steps’, then the plan to work around this loss.

Key preventative steps are :-


1. As with the PCs, we will keep all software disks and authorisation codes in a central storage area (ideally offsite) and carefully label them with the name of the server on which they are installed. Do not leave the disks with the IT provider, as when you change providers it is unlikely that the disks will be located and handed over.


2. We will religiously write down our server passwords and store them in a safe place - off premises or in a safe. We will update the record every time we change them. We will record ALL server passwords - not just the Administrator password, but passwords to all key applications and utilities such as backup, anti-virus, ISP, databases, etc. There could be up to 10 or 12 of these - all may be required to re-establish our server.


3. Having adequate backup is as important as locking the doors when we leave at night. More on this in a later article, but NEVER leave backup tapes at the work premises - the only one there should be the tape in the server, ready to copy the data overnight. Make sure that you have backups that go back at least a week - sometimes massive data corruptions can go on for days (even weeks) without being discovered, so it is important that we can reconstruct our data from a ‘clean’ backup.


4. As with all things of this nature, documentation is the key to survival. There is no point trying to remember key information when a crisis has hit. Write down all applications that are on the server, and write down who you can contact to help you (IT company, application company, Internet provider, etc.). It is no use having this filed only electronically, as it may not be available if you have no server.


5. If funding allows, look at building a certain amount of redundancy into the server. The most likely component to fail is the Power Supply, which takes 240 volt electricity and converts it into the very low voltages the computer needs. Many systems can have a redundant Power Supply which can take over if the main unit fails (sometimes seamlessly). The next most likely failure is the hard disk/s. Speak to your IT supplier regarding disk redundancy (known as RAID). For as little as $200 it may be possible to have a duplicate disk installed, so if the main one fails you have an instant copy available. There are more sophisticated systems, generally available for under $1,000, which have outstanding data redundancy capabilities.


6. Have a plan in place to work through this crisis. Write it down and keep a copy away from the office so you can always retrieve it. We'll discuss some possible plan scenarios in the following section.
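
Step 3 above asks for backups reaching back at least a week, so that a clean copy still exists when corruption is discovered late. As a rough illustration only, the short Python sketch below picks the restore point from a list of backup dates. The function names, the dates and the seven-day figure are assumptions made for the example and do not come from any particular backup product.

```python
from datetime import date, timedelta

def latest_clean_backup(backup_dates, corruption_started):
    """Return the newest backup taken before the corruption began, or None."""
    clean = [d for d in backup_dates if d < corruption_started]
    return max(clean) if clean else None

def covers_at_least(backup_dates, today, days=7):
    """True if the oldest backup we still hold is at least `days` old."""
    return bool(backup_dates) and (today - min(backup_dates)) >= timedelta(days=days)

# Hypothetical example: nightly tapes kept for the last 8 nights.
today = date(2024, 3, 15)
tapes = [today - timedelta(days=n) for n in range(1, 9)]

# Corruption is found today but actually began 4 days ago.
corruption_started = today - timedelta(days=4)

restore_point = latest_clean_backup(tapes, corruption_started)
print("Retention covers a week:", covers_at_least(tapes, today))
print("Restore from the tape dated:", restore_point)
```

If only the last two or three nights were kept, the same corruption would leave no clean tape to restore from, which is the situation the one-week rule is meant to avoid.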

Possible scenarios :-


a) If you simply have a data corruption issue, there are some simple but vital steps to take :-
Shut down the server before it becomes worse
Call your IT service provider for expert guidance
Fetch the data backup tapes from the off-site storage


b) Keep the old server which you replaced. It may be a simple matter to reinstate it to perform one or two key tasks (albeit more slowly).


c) If you have a second server doing specific tasks, speak to your IT provider about ‘virtualisation’. This concept should allow you to load the functionality of the corrupt server onto another server and continue with key functions with minimal disruption, though perhaps with much reduced performance.


d) Ask your IT provider to take a "snapshot" of your server after every major update - perhaps twice a year. This should allow them to load your configuration onto another server in minimal time. Remember that keeping a copy of the actual data (as distinct from the server's programs and settings) is your responsibility - not that of your IT provider. Also note that reloading the data from your backups will take a number of hours, so even the most minor of failures (say to a data disk) can take up to a day to rectify.
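
To see why even a minor failure can cost most of a day, it can help to add up the recovery stages. Every figure in this Python sketch (data volume, restore speed, and the time allowed for fetching tapes and repairing the disk) is invented for illustration; substitute estimates from your own IT provider.

```python
def restore_hours(data_gb, restore_gb_per_hour):
    """Hours needed to copy the data back from backup."""
    return data_gb / restore_gb_per_hour

# All figures below are assumptions for illustration only.
fetch_tapes_hours = 1.0      # collect tapes from off-site storage
repair_hours = 2.0           # replace the failed disk, prepare the server
data_gb = 200                # size of the business data
restore_gb_per_hour = 50     # speed of reading the backup back

total = fetch_tapes_hours + repair_hours + restore_hours(data_gb, restore_gb_per_hour)
print(f"Estimated outage: {total:.1f} hours")
```

Writing these estimates into the paper plan gives staff a realistic idea of how long the workaround needs to last.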

Summary


This plan is an outline only, and there are details which need to be discussed with your IT and key application suppliers to ensure that things run smoothly when they are needed.

Just remember that having an IT failure is inevitable, but having a plan will generally save the day!
