Best practices for server maintenance

Status
Not open for further replies.

johnv20

Programmer
Sep 26, 2001
US
Hi,
Does anybody out there have any thoughts on day-to-day maintenance and upkeep of Dell servers?
 
Just the usual stuff, checking event logs, backup logs, etc.
Are you looking more for hw, or sw maintenance?
 
Hi,
Either HW or SW. On the HW side I already check RAID, ESM, event logs, etc., and on the SW side I do sanity checking and golden syncing. However, any thoughts at all would be appreciated, or even just a list of what you do for maintenance without having to bring a server down.
 
You haven't mentioned your OS, so I'll assume NT4; it doesn't change much with Win2K.
Besides the SPs and SRPs for the OS, there may on occasion be a need to upgrade the BIOS. I'm of the school where you only do it if required, but if you call Dell for support, their first response, besides "reboot", will be to ask if you have the latest drivers installed, including BIOS and all that stuff.
If NT4, depending on the traffic to the server, you can expect to have to reboot either every month or every other month (you'll find out when you wait too long and it hangs!), so you can schedule time for the BIOS upgrades then.
If running apps and shares, you'll see the frequency go up.

With NT4, or even Win2K, bringing the server down is maintenance. They are not very reliable for cleaning up after errant processes, and a reboot is often the only way.
 
Thanks,
The BIOS upgrade is a very good idea, as are the scheduled reboots. The only problem I can see is that we have a HUGE number of servers with not a lot of failover allowed.

Incidentally, the OSes involved are NT4, W2K, and NetWare 4.11 and 5.
 
By failover, I'm assuming you mean downtime. Same here, it's impossible to get a chance to do downtime maintenance, but if you do run into problems with an unscheduled outage, you'll get a resolution a lot quicker with Dell if you're up to date. BTW, Dell doesn't care if your server is supposed to be 24x7; they'll want reboots anyway.
If your business doesn't want extended, or any downtime, then tell them to prove it by kicking in funds for redundancy/failover boxes, or even better, clusters.
 
When upgrading BIOS don't forget to do the ESD and PERC FW as well - actually you'll get an out of date nag message on bootup if the BIOS rev gets too far ahead.

I got burned about a year ago - a RAID container on a PE4300 went bad and the system wouldn't boot (it couldn't find the boot partition). Dell suggested flashing the BIOS etc., and after I did, I found that TWO of the six drives in the container were not there. Nothing amiss was reported beforehand, and there were no red lights on the drives.

I was later told by Dell that the drives probably didn't go down together, but nothing in boot messages, or FAST etc showed anything wrong. Now I check regularly for updates and install the latest every 6 months or as soon as it's available. Glad I have a good backup scheme and a spare server handy at all times (not using clustering though)...

My backup rotation is a modified GFS - 2 sets of mon-sat, 5 weeklies that are used in rotation on Sundays, 12 monthlies used on the last business day of a month in place of the regularly scheduled backup for the day, and a year-end tape.
Weekly, monthly and year-end tapes are sent offsite to a commercial storage place; dailies are kept onsite. All backups are full (using a PV100 SDLT drive, no danger of filling one of these tapes yet!). Finally, I always back up each server to two different tapes - simplified example using 3 servers: A backs up A and B, B backs up B and C, C backs up C and A. Every other day I restore about 10 GB just to make sure the tapes aren't blank.
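
For anyone who wants to script a rotation like the one described above, here is a rough Python sketch, not the poster's actual setup. The post doesn't say how the two daily sets alternate, which weekly tape goes on which Sunday, or exactly when the year-end tape is written, so the following are all assumptions: daily sets alternate by ISO week parity, the weekly tape is picked by ISO week number modulo 5, "last business day" means the last Monday-to-Friday date (holidays ignored), and the year-end tape is written on December 31. The tape labels and function names are made up for illustration.

```python
import calendar
from datetime import date, timedelta

DAY_NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]

def last_business_day(year, month):
    """Last Monday-to-Friday date of the month (public holidays ignored)."""
    d = date(year, month, calendar.monthrange(year, month)[1])
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

def tape_for(day):
    """Pick a tape label for a given date under the assumptions above."""
    # Assumption: the year-end tape is written on December 31.
    if day.month == 12 and day.day == 31:
        return f"YEAR-END-{day.year}"
    # Monthly tape replaces the regular backup on the last business day.
    if day == last_business_day(day.year, day.month):
        return f"MONTHLY-{day.month:02d}"
    week = day.isocalendar()[1]
    # Sundays rotate through five weekly tapes (assumed: by ISO week number).
    if day.weekday() == 6:
        return f"WEEKLY-{week % 5 + 1}"
    # Monday to Saturday alternate between two daily sets (assumed: week parity).
    daily_set = "A" if week % 2 else "B"
    return f"DAILY-{daily_set}-{DAY_NAMES[day.weekday()]}"

def backup_ring(servers):
    """Each server backs up itself and the next one, wrapping around,
    so every server ends up on two different tapes."""
    n = len(servers)
    return {s: [s, servers[(i + 1) % n]] for i, s in enumerate(servers)}

if __name__ == "__main__":
    print(tape_for(date.today()))
    print(backup_ring(["A", "B", "C"]))
```

Using the ISO week number keeps the sketch short, but the weekly and daily sequences can hiccup at the turn of the year; a real schedule would more likely keep a running counter.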

Scheduled dress rehearsals for a disaster occur every six months (that's what the spare server is usually used for). It's a great feeling to know that after a total failure I can get any of our servers back on line in a couple of hours - depending on how long the restore takes.
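
The restore spot-check mentioned earlier ("just to make sure the tapes aren't blank") could be taken one step further by comparing the restored files against the originals. The sketch below is only an illustration under stated assumptions: the restore has already been written to a scratch directory by whatever backup software is in use, and both trees are readable from the machine running the script. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def compare_restore(original_root, restored_root):
    """Return a list of files that are missing or different in the restored tree."""
    original_root = Path(original_root)
    restored_root = Path(restored_root)
    problems = []
    for src in sorted(original_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_root)
        dst = restored_root / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

Files that changed after the backup ran will show up as "differs", so a check like this is most meaningful on data that is static or on a snapshot taken at backup time.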

Sorry for rambling on, but I think backups are about the most important thing a server admin can do - not only for the company, but also for his/her own job security! (I got my first server admin job after my then-boss was fired for mucking up the backups - server went down and he couldn't find a tape with a good backup newer than a week old)

I hope this helps somebody out there!

Cheers!

Darts
 
Might I add one word to Darts' comment on backups, "but also for his/her own job security": "Sanity". Good evening, ladies and gentlemen, new member here. Hope to talk to you all as time goes along.
Later!


FarOut
V-Peace-V
 