Tek-Tips is the largest IT community on the Internet today!


Disaster Recovery Plan for Management

Status
Not open for further replies.

ManagerJay

IS-IT--Management
Jul 24, 2000
302
US
The organization I work for has finally decided to document business continuity plans. I have a good portion of the IT stuff documented, but was thrown a curve on Friday.

Friday, I was told I need to rewrite the procedures so other people can perform the recovery. So, I asked the question, "Do you mean so a technical consultant can perform the recovery?"

I was told, "No. If you are hit by a bus and killed tomorrow, I want documentation written so any member of the management team can pick up the documentation and perform the recovery, and keep the network and all of the systems running."

The management team consists of three ex-teachers, a paralegal, a lawyer, and a "member benefits" person. None have any technical experience. And, frankly, most have problems burning CDs using Windows XP.

My attempts to convince management that maintaining a network and all of the equipment is not as simple as it seems have failed.

So, the question becomes, how is documentation like this written?

Would it be best for me to buy a FreeBSD for Dummies, a Windows 2003 Server for Dummies and an Exchange 2003 Server for Dummies book to start the documentation, and then go on from there?

Thanks,



Jay
 
Jay,

As is usual with managers who think I.T. is simple, yours have made a simple error. From the assigned task description, they obviously think that backup & recovery = disaster recovery = business continuity, whereas these are three distinct activities which are interrelated.

For example, if a db dies and needs to be restored and recovered this is a backup issue.

If a storm blows the roof off the server room, and it's being filled with rainwater, this is a disaster recovery situation. Having all the backups in the world will not help in this circumstance.

Making sure that customers can still get limited assistance from a one-server minimum capability standby system is a business continuity issue, and nothing to do with disaster or backups. For example, you might plan to go to the wimpy server once a month, during an off-peak time, to allow routine maintenance on your server farm. Business would continue uninterrupted, but with degraded performance.

Business continuity also has a larger remit, whereby managers need to do some strategic thinking about how they could ensure continuity, if a disaster does strike. This is more relevant to 'continuous process' businesses.

Please, please, please let your managers know what they're failing to consider here. Now, to answer what I believe is the real question:-

I suspect that your managers want the ability to press a button and have a restore from backup occur automatically. This is a good idea, since if you do encounter a bright red London double-decker, they'll probably be needing such a facility.

They have asked for an impossible task, unless they are willing to accept that you can program an orderly shutdown and reboot of the entire system. How can software possibly help if a router fails? If Captain Sensible puts his JCB blade through the main power cable in the street, what can you do? You need to push back at them and offer the ability to automatically recover your major systems either individually or collectively, and stipulate what you cannot guard against (in terms of hardware failure and/or circumstances beyond your control). Also, once they have agreed what to do, the only kind of system worth having is one where at least once a month, one of the ex-teachers or paralegals 'hits the big bad button' and it all kicks into action.

Yours at somewhat heated length

Tharg

Grinding away at things Oracular
 
To satisfy the requirement, I would just document the basics of what you would do for the different situations. If your Management staff has any common sense, they would understand that they may need to bring in additional help.

Something else to possibly try is to walk your Manager through the process and explain the importance of what is being done. This may jar up something inside and they will begin to see the light.

Just my opinion,
Chad
 
Jay,

I just went through this last year. The exact same scenario. It's really not as hard as you think!

The first thing I did was use one of our old servers as a test server. Then I crashed it, killed it, wiped it out. (OK, so that was a little more fun than I thought it would be, but anyway...)

When I went through my restore procedures, I created screenshots and jotted down notes. I create all documentation at work as if I were making it for my husband (he plays games online and thinks that AV software is a virus). I created a fantastic step by step computer recovery procedures manual. (Note that Step One is DON'T PANIC.) I also run through these procedures at least every 6 months, step by step.

I have also put a large label on the front of my manual that states that it contains only computer recovery procedures, and that it does not include any type of business disaster recovery (that's something completely different for another thread).

Then, I asked which manager was going to be the first one to test the "bus" theory. I've sat with two of them and they have both been able to follow my "for dummies" book.

It's not hard, but it will take a while to build it!
 
So here is your opportunity - make sure they buy you a practice suite for the non technical people to practice on!

Seriously, Dolly has a good approach. I would not do this until they agreed that once a month the people who would be expected to do this sit down, with you observing only, and actually perform a full recovery from one or more specific problems. Explain to them that the time you need the recovery is usually a critical stress period and that no one should be expected to perform it without learning in a less stressful environment.

Make sure your step by step book has the emails and phone numbers of all the support people for all the equipment that you have and possibly the phone numbers for some good technical consultants as well. That way if you are hit by a bus just as the network dies and they can't follow the steps, at least they have the contact numbers they need.

Passwords they need should be sealed in an envelope and locked in a safe or sturdy cabinet. Change them after practice sessions that use them. Make sure you change the sealed passwords whenever you change a password or they will do no good.
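
One way to keep the sealed envelope honest is to note, for each account, when the live password last changed and when the copy in the safe was last resealed. What follows is only a minimal sketch in Python, assuming a hand-maintained record per account; the `SealedCredential` layout, the `stale_envelopes` helper, and the example accounts and dates are all hypothetical and not taken from anyone's actual setup.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SealedCredential:
    """One entry in the sealed envelope (hypothetical record layout)."""
    account: str
    password_changed: date   # when the live password was last changed
    envelope_sealed: date    # when the copy in the safe was last updated


def stale_envelopes(records):
    """Return the accounts whose live password is newer than the sealed copy."""
    return [r.account for r in records if r.password_changed > r.envelope_sealed]


# Example data, invented purely for illustration.
records = [
    SealedCredential("domain-admin", date(2005, 9, 1), date(2005, 9, 1)),
    SealedCredential("exchange-service", date(2005, 10, 3), date(2005, 9, 1)),
]

for account in stale_envelopes(records):
    print(f"Reseal the envelope for: {account}")
```

Any account the check reports needs a fresh envelope before the next practice session, otherwise the sealed copy will do no good.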

Questions about posting. See faq183-874
 
And look on the bright side. This is a no-lose scenario. If your management team ever need to carry out the procedure, presumably it's because you won't be there to worry about it all going wrong. In the lucky event that you are in a position to come back a week or two afterwards, and it all went bad, hopefully they'll have realised that even with clear instructions, it's still hard, and therefore your skills are valuable skills.

I really like Dolly's approach.

 
Let me know if any help is needed ManagerJay, I'd be happy to send over some examples.
 
Some examples would be greatly appreciated. I have only written for technical people in the past and have not had to write any documentation of this depth for a non-technical person.

Thanks,


Jay
 
The great part is that you really don't need to go terribly in-depth for this. Simple instructions to go with screenshots are best.

Send me a note to my posting name with an F at the end, yahoo.com, and I'll send some docs over!
 
I second Dollie's idea.

Personally, I haven't really thought much in your direction Jay, only because the places I have worked have been small businesses where I'm the only IT (as I'm sure many here are).

Part of D/R is to think of the absolute worst case scenario and that the company needs to survive...and you're dead or someplace where there is no cell phone reception and the only survivor of your company is one of the people you mentioned above....
 
'The management team consists of three ex-teachers, a paralegal, a lawyer, and a "member benefits" person.'

i am not qualified to do your job, but i would think it is as easy to write this plan as it would be for them to do the same for not missing a step in all of their duties if they were hit by a bus, allowing in the procedures that someone who is not a lawyer, paralegal, or member benefits person was going to use the plan to do their jobs.

my other suggestions would come from teaching bilingual persons who were not literate in any written language to perform warehouse management software interfacing, on a fork lift, wireless as400 terminal, in a paperless warehouse process management system.

i used a software called screen cam. if not familiar, it records the monitor output for playback. i then turned it on, went through the individual processes, and turned it off. i played the files on a projector using the app which is placed in the recorded file for playback only. screencam is no longer available, but there is probably an equivalent somewhere.

here is something similar

there is a shareware version for eval.

or do a search for screen cam in a search engine.

hope this helps. might be easier to record the process on screen with a narration. suggest starting out with 'if you are listening to this, i am probably under a bus, please follow along closely. for additional technical support there is a ouija board under my desk'
or,
'your mission, should you choose to accept it, is to do the job which probably drove me to my demise by throwing myself, i mean falling in front of a bus. for your own safety, please do not go anywhere near a bus during this process'.

hope you do not mind the lame attempt at humor, it is the price of this free advice, and probably of equal value.

i do ip telephony systems, not network admin, etc., but do understand that users on the network are about the same level of tech savvy as your typical voicemail user who has not recorded their name or greeting in their voicemail box until after they have to pay for the support to help because the years before this exceed the warranty period. for the record, they record these during training before the phone system goes live in my project timelines.
 
New initiative at my hospital...
Document every change to a live system. Also document steps to go back in case of failure.

This all occurred after a phone system problem caused by a vendor. Management didn't see the 10000 things that went live over the last year without a single issue.

The management has to approve the change before it goes in place. The problem I have is that Management doesn't understand anything about what I do, so their approval is pointless.

At this point in my career, it would take 2+ years (full time) to document every system/application I support. Even at a technical level.

There's a reason management are management. They can not and should not touch anything technical. Leave that to the professionals they hired. If the bus comes...find a new geek!!!

Mark
 
Mark,

There is a reason management is management. They are supposed to help the owners maintain the bottom line. We 'geeks' see things at the system level. If system A goes down, it will stop x users from doing their job. Management sees that if system A goes down, x users can't do their job and the company loses $$$ money and maybe clients.

A hospital can be worse....what do your systems do? Do they keep track of a patient's medications? Don't you think that's important to have properly documented and have formal change management for?

-SQLBill

Posting advice: FAQ481-4875
 
hey, job security in that mandate, smile and be happy. the documentation mandate just added a lot of time to each of the system wide things you do. they have to, or will soon, realize this as it will be apparent in times to complete. you will now need more lead time just to write up the downgrade plan, before upgrading. it sounds to me like they are recognizing the seriousness of what you do, and understanding the mission critical role you fill. this shows they see your role, and your technology's role, as more important than ever before. disaster recovery planning, etc. may now get the resources it deserves.

one thing, if you are salaried do not do it out of regular business hours. if not, then you may be in for some ot more frequently. you may not want the ot hours, but the money is ok, even if it does not make up for not having a life.

the other thing is this, you have to submit your plans for approval now, so they have to turn them around to you, and sign off on them. that takes more responsibility on them, and that is not always a bad thing.

i feel sorry for you if you are salaried though. this will almost assuredly mean you will get approvals so late that you should push the changes back a day, but they will say just stay and do them late. i would not do the salary thing for less than a 40 percent increase over my current rate of annual pay, including my ot. even then, i would be reluctant. i am a phone guy that does ip telephony systems myself. i have considered getting some certifications for computer professionals, but it seems a better plan not to, so i can not be classified out of my ot as easily. if all my certifications are phone system, and phone system related, it may be harder to classify me as a computer professional.



 
SQLBill,
While I agree with a certain level of change management, my situation is somewhat unusual (I believe).

I'm 33 years old and have worked at the same hospital for 15 years. Started in XRay at age 17 transporting patients. Worked up to Lead Systems Analyst. I don't think anyone would question my loyalty to that hospital. I think more about patient safety, users' ability to connect and do their jobs, and the bottom line more so than some management. My only point was that, in my situation, management is not technical enough to make decisions on technical matters. They need to step out of the way and let the geeks do what's right.

A few years ago, management started a technical "Change Management" process in which I took part. Myself and another analyst made it about 2 meetings, then we were never asked back. It seemed that management did not like to be corrected when they misspoke. We would correct them in those cases.

Since then, I let them do their management things and I continue to keep everything running. I make improvements as needed. All are fully tested and backups occur as needed. Until now, they've kept their noses out of that side of things and nothing has gone wrong. Now I am expected to add another unnecessary layer of paperwork to the process. A TPS report if you will.

Here is a copy of the document they receive from me...
1. backup
2. do my work
3. backup
4. If there's a problem...restore from #1.

Any more detail will simply confuse them.
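
For whoever inherits the job, the four steps above can be sketched as a small wrapper. This is only an illustrative sketch in Python, not the procedure actually used at the hospital: `apply_change` and the `backup`, `change` and `restore` callables are hypothetical stand-ins for whatever tools a site really uses, and the in-memory `state` dictionary simply plays the role of a live system.

```python
import copy


def apply_change(backup, change, restore):
    """Run a change between two backups; roll back to the first on failure.

    backup, change and restore are caller-supplied callables (hypothetical
    stand-ins for real backup and restore tooling).
    """
    snapshot = backup()        # 1. backup
    try:
        change()               # 2. do the work
    except Exception:
        restore(snapshot)      # 4. if there's a problem, restore from #1
        return False
    backup()                   # 3. backup the new, working state
    return True


# Toy "system" used only to demonstrate the flow.
state = {"version": 1}


def backup():
    return copy.deepcopy(state)


def restore(snapshot):
    state.clear()
    state.update(snapshot)


def failing_upgrade():
    state["version"] = 2
    raise RuntimeError("upgrade failed halfway")


ok = apply_change(backup, failing_upgrade, restore)
print(ok, state)
```

The point of the wrapper is that the rollback path is written down and exercised, rather than living in one person's head.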

Mark
 
True, more information may confuse management. But the poor sob they bring in to fix the problem after you're gone will like not having to reinvent the wheel.
 
