Exchange didn't used to take forever to backup

gregmuir2 · Feb 16, 2007

Exchange 2003, Backup Exec 10d. It used to take around 12 hours to back up 100+ gigs. The current job has been running for 21 hours, has written 94gb, and is 68% done. If I did my math right, and it's early and I'm only on my first cuppa, that means this job is going to run for about 30 hours and cover 138gb.
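For anyone who wants to redo that estimate, here is a minimal Python sketch of the arithmetic. It assumes the job keeps its average rate all the way to the end, which a real backup may not do, and the function name is made up for illustration.

```python
def project_backup(elapsed_hours, done_gb, fraction_done):
    """Linearly extrapolate total run time and size from progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done
    total_gb = done_gb / fraction_done
    rate_gb_per_hour = done_gb / elapsed_hours  # average rate so far
    return total_hours, total_gb, rate_gb_per_hour


# Figures from the post: 21 hours elapsed, 94 GB written, 68% done.
total_hours, total_gb, rate = project_backup(21, 94, 0.68)
print(f"{total_hours:.1f} h total, {total_gb:.1f} GB total, {rate:.2f} GB/h")
```

That works out to roughly 31 hours and 138 GB, at an average of about 4.5 GB per hour.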

I've seen directory traversals cited as the cause of prolonged backups; five 10gb files would back up a lot quicker than 50gb spread across a thousand directories. That may be true but dang, just how inefficient is Backup Exec? Robocopy can chew through even the largest directory structures like a pitbull through a leg.

My confidence in the software is really at a low ebb right now.

Zelandakh · Feb 16, 2007

Look at the logs. What are they telling you?

Single store, multiple stores or what?

How many users?

Brick and store or just store?

Online defrags completing in Exchange?

I'm doing similar and do 210GB in 2 hours so it isn't the software.

gregmuir2 · Feb 16, 2007

GM: Thanks for replying!

Look at the logs. What are they telling you?

GM: BackupExec logs (job history) or Exchange server logs?

Single store, multiple stores or what?

GM: One exchange server, multiple stores.

How many users?

GM: 99

Brick and store or just store?

GM: Not sure what the question is here. We're backing up the entire mailbox for the users.

Online defrags completing in Exchange?

GM: Do you mean a defragging of the hard drives via the standard drive defrag in Windows or is this an Exchange-specific utility I am unaware of?

I'm doing similar and do 210GB in 2 hours so it isn't the software.

GM: That's encouraging! Some of the other servers take an age and a day to back up, so my problem may not be unique to Exchange.

Zelandakh · Feb 17, 2007

BE logs.
How many stores?
99 users and 100GB? Cool.
"entire mailbox". Please state exact wording of lines in backup job for Exchange Server - Information Store, Mailboxes, both?
Online defrags in Exchange are done by the Exchange System Manager and are reported into Event Log/Application log at whatever intervals they are set to run.

If everything is slow (Exchange is my slowest backup at 100GB/hr), you have a bottleneck.

Can you detail something similar to:
Make: HP DL380,
RAM:4GB RAM,
Disks: 6 x 73GB 15k Ultra 320 as one RAID5 volume for OS and Apps
Disks: 14 x 146GB Ultra 320 as RAID5 for stores connected using 4Gb/s fibre HBA
Backup device: HP Storageworks 360 LTO-3 Ultrium 16 tape library, one drive on Ultra320 SCSI, no other SCSI device on chain, active terminator.

But check the sections in BE logs as it may run really fast for some bits and slow for others.

gregmuir2 · Feb 19, 2007

I'm pulling the rest of the info together to post here. In the meantime, I think the fragment problem is probably where we're at. The drive hosting the Exchange database is all red. I would have thought a standard Windows defrag would take care of everything. Nope! The only files that didn't defrag are the database files.

From reading the link below, it sounds like Exchange defrag is its own unique process.

http://www.petri.co.il/defragment_exchange_2000_2003_server_databases.htm

And the process involved in the offline defrag sounds like it would create a big headache, but the online process is already scheduled and not doing the job. The way the server is set up, the OS is on C:, Exchange and the data files are on D:. That's a 135gb array with 29.1gb free. The stores are 44.45, 13.36, and 26.02gb respectively.

Zelandakh · Feb 19, 2007

Yes, Daniel's a thorough fellow. You may not need an offline defrag - you need to check if your online defrag is working though. Look for event id 1221.

Personally, I'd shunt the 13GB store into the 26GB store. You may even find that speeds the server up. Possibly increase the maintenance window for Exchange online to get it done and up to speed remembering that it will kill your backup speed if that is running too.

Do you have another volume on that server?

gregmuir2 · Feb 19, 2007

Aha! I see that problem you are talking about in there. 1221, The database "First Storage Group\Store1 (Corporate)" has 467 megabytes of free space after online defragmentation has terminated.

According to the KB article, that means there's 467mb of slack in the file that COULD be released by offline defragmentation but it will likely be used in the future.

This is also coupled with 1217.
The Exchange store 'First Storage Group\Store2 (Fort Myers)' has unlimited storage capacity. The current physical size of this database (the .edb file and the .stm file) is 28 GB. The only size constraint for this database is the maximum size that is supported by the database storage engine (8000 GB).

Microsoft isn't suggesting any action here, I guess this is more of a heads up if you were planning on constraining the database size and forgot to.

gregmuir2 · Feb 19, 2007

Ok, hardware info:

Make: HP DL380,
RAM:3GB ECC RAM,
OS Array: 2 x 69GB SCSI Ultra 320 as one RAID1 volume
Data Array: 4 x 69GB SCSI Ultra 320 as one RAID5
Backup device: Exabyte XVA 1x10 1U SCSI (1 drive 10 tape changer) Tape drive attached to domain controller, backing up over 1gb ethernet, teamed network cards (two cards acting as one, doubling bandwidth)

Zelandakh · Feb 19, 2007

Data array is 4 x 69GB (they'll be 73GB SCSI disks then) on a RAID-5 but you say D is 135GB. That is only TWO disks worth. Check disk manglement to see how the RAID array is actually config'd - you may have a disk pair that are not being used.

1217 is either a limit of xGB (16 to 75) or unlimited (specified as 8000 to cover themselves).

So you are backing up over the network using teaming. That is probably your issue. Check all network cards in use are forced to gigabit full duplex. Check switches that they plug into have their specific ports set to gigabit full duplex. Windows update both servers, check BIOS of both servers and ensure network cards have latest drivers.
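To put a number on how far below gigabit the job is running, here is a rough Python sketch that converts the two rates quoted in this thread (94 GB in 21 hours for the slow job, 210 GB in 2 hours for mine) into megabits per second. It assumes decimal gigabytes and ignores protocol overhead, and the helper name is made up for illustration.

```python
GIGABIT_LINE_RATE_MBIT = 1000.0  # nominal rate of a single gigabit link


def gb_per_hour_to_mbit_per_s(gb_per_hour):
    """Convert GB/hour (decimal GB) to megabits per second."""
    bytes_per_second = gb_per_hour * 1e9 / 3600
    return bytes_per_second * 8 / 1e6


observed = gb_per_hour_to_mbit_per_s(94 / 21)    # the slow job
reference = gb_per_hour_to_mbit_per_s(210 / 2)   # the 210 GB in 2 hours job

for label, mbit in (("observed", observed), ("reference", reference)):
    share = 100 * mbit / GIGABIT_LINE_RATE_MBIT
    print(f"{label}: {mbit:.1f} Mbit/s, {share:.1f}% of one gigabit link")
```

The slow job averages around 10 Mbit/s, about 1% of a single gigabit link. A figure that low cannot say on its own whether the link negotiated badly or the disks are stalling, but it does show the wire is nowhere near saturated.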

Exchange doesn't seem to have a prob though what are the 1221 events for the other 2 stores?

gregmuir2 · Feb 19, 2007

Ok, let me clarify my previous statement, I mistyped there. The array is actually 3 x 69GB drives, not 4. I'm reading this information off of the System Management Homepage provided by HP.

Physical Drives
Port 2 Drive 2 69460 MB
Port 2 Drive 3 69460 MB
Port 2 Drive 4 69460 MB

Here's the setup of the drive taken from the Array Diagnostic Utility.
Logical Drive 2:
Configuration Signature: 0xa003ad7f
Mapping Scheme: Multiple Block
Physical Drives: 5 (number not valid after drive movement)
This Logical Drive: 3 (excluding spare drives)
Fault Tolerance Mode: Distributed Data Guard (RAID 5)
Logical Param Table: cyl=34866 heads=255 sec/track=32 xlate sig=0x0
Distribution Factor: 32
Operating System: 64768
Controller Order: 0
Additional Information: 0
Offset to Data: 0
Backed-out Write drives: 0
Stripes for Parity: 16
Distribution Mode: 0x00
Int 13h Support Enabled: Yes
Sectors on Volume: 284506560
Sectors per Drive: 142253280
Big Drive Assignment Map: 0x0000 0x001c 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
Big Spare Assignment Map: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
Array Accelerator is enabled for this logical drive.

http://en.wikipedia.org/wiki/RAID-5#RAID_5

According to the wiki article, the usable space formula is (N-1)*Smin which for me is (3-1)*69gb=138gb. So accounting for the reported disk size always being a few % smaller than the rated disk size, we have 135gb.
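As a cross-check, the same number falls out of the sector count in the Array Diagnostic Utility dump. The Python sketch below makes two assumptions that the HP output does not state: that sectors are 512 bytes, and that the "MB" in the drive sizes means 2^20 bytes.

```python
SECTOR_BYTES = 512   # assumption: not stated in the ADU output
MIB = 2 ** 20
GIB = 2 ** 30


def raid5_usable(n_drives, smallest_drive):
    """RAID 5 usable capacity: (N - 1) * Smin, in the unit passed in."""
    return (n_drives - 1) * smallest_drive


# Route 1: three drives reported as 69460 MB each.
usable_mb = raid5_usable(3, 69460)
gib_from_drives = usable_mb * MIB / GIB

# Route 2: "Sectors on Volume: 284506560" from the logical drive dump.
gib_from_sectors = 284506560 * SECTOR_BYTES / GIB

print(f"from drive sizes:  {gib_from_drives:.1f} GiB")
print(f"from sector count: {gib_from_sectors:.1f} GiB")
```

Under those assumptions both routes land at about 135.7, which matches the 135gb that Windows shows for D:.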

Hardware checks you recommended: in the process of looking now.

Additional error codes:
Here are the ones popping up as of last night's Exchange maintenance, triggering at 12am.

Event 510
Information Store (3524) First Storage Group: A request to write to the file "D:\EXCHANGE\STORE1\priv1.edb" at offset 31879282688 (0x000000076c274000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (76 seconds) to be serviced by the OS. In addition, 421 other I/O requests to this file have also taken an abnormally long time to be serviced since the last message regarding this problem was posted 7848 seconds ago. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

This repeats several times, different offsets and delay values. I sent the array's diagnostic file to HP and they said everything is fine. (I don't trust them.)
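Since these entries repeat with different offsets and delays, a small script can pull the numbers out of the event descriptions so they can be sorted or totalled. This is only a sketch: the pattern is written against the wording of the 509/510 messages quoted in this thread, other builds may phrase it differently, and the function name is made up for illustration.

```python
import re

# Matches the slow-I/O wording quoted in this thread (events 509 and 510).
SLOW_IO = re.compile(
    r'request to (?P<op>write to|read from) the file "(?P<path>[^"]+)" '
    r'at offset (?P<offset>\d+) \(0x[0-9a-fA-F]+\) '
    r'for (?P<size>\d+) \(0x[0-9a-fA-F]+\) bytes succeeded, '
    r'but took an abnormally long time \((?P<seconds>\d+) seconds\)'
    r'.*?In addition, (?P<others>\d+) other I/O requests',
    re.DOTALL,
)


def parse_slow_io(description):
    """Return the key numbers from one slow-I/O description, or None."""
    m = SLOW_IO.search(description)
    if m is None:
        return None
    return {
        "op": "write" if m["op"].startswith("write") else "read",
        "path": m["path"],
        "offset": int(m["offset"]),
        "size": int(m["size"]),
        "seconds": int(m["seconds"]),
        "others": int(m["others"]),
    }


# The event 510 text from this post, trimmed after the I/O count.
sample = (
    r'Information Store (3524) First Storage Group: A request to write to '
    r'the file "D:\EXCHANGE\STORE1\priv1.edb" at offset 31879282688 '
    r'(0x000000076c274000) for 4096 (0x00001000) bytes succeeded, but took '
    r'an abnormally long time (76 seconds) to be serviced by the OS. In '
    r'addition, 421 other I/O requests to this file have also taken an '
    r'abnormally long time to be serviced.'
)

parsed = parse_slow_io(sample)
print(parsed)
```

Feeding every 509/510 description through `parse_slow_io` and grouping by `path` would show whether one store's file is taking most of the delays or whether the whole D: volume is affected.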

Event Type: Information
Event Source: MSExchangeIS Public Store
Event Category: General
Event ID: 1221
Date: 2/19/2007
Time: 12:18:22 AM
User: N/A
Computer: EXCHANGE
Description:
The database "First Storage Group\Public Folder Store (EXCHANGE)" has 207 megabytes of free space after online defragmentation has terminated.

Now for the next store.

Event Type: Warning
Event Source: ESE
Event Category: Performance
Event ID: 509
Date: 2/19/2007
Time: 12:49:55 AM
User: N/A
Computer: EXCHANGE
Description:
Information Store (3524) First Storage Group: A request to read from the file "D:\EXCHANGE\STORE2\Store2.edb" at offset 651264 (0x000000000009f000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (61 seconds) to be serviced by the OS. In addition, 6 other I/O requests to this file have also taken an abnormally long time to be serviced since the last message regarding this problem was posted 6699 seconds ago. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

So there are some troubleshooting steps they've suggested that I will look into.

Several more of those lag errors.

Event Type: Information
Event Source: MSExchangeIS Public Store
Event Category: General
Event ID: 1221
Date: 2/19/2007
Time: 1:11:56 AM
User: N/A
Computer: EXCHANGE
Description:
The database "First Storage Group\Public Folder Store (EXCHANGE)" has 206 megabytes of free space after online defragmentation has terminated.

Here's a weird error.

Event Type: Warning
Event Source: Perflib
Event Category: None
Event ID: 1016
Date: 2/19/2007
Time: 2:15:00 AM
User: N/A
Computer: EXCHANGE
Description:
The data buffer created for the "MSExchangeIS" service in the "C:\Program Files\Exchsrvr\bin\mdbperf.dll" library is not aligned on an 8-byte boundary. This may cause problems for applications that are trying to read the performance data buffer. Contact the manufacturer of this library or service to have this problem corrected or to get a newer version of this library.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Data:
0000: 28 dd 31 01 74 2e 00 00 (Ý1.t...

Hmm, this is weird. I'm trying to find the 1221 events for all three stores. Each store should have one, right? I'm seeing duplicates here.

Well, it seems like it would make sense to start at the bottom of this mess and work our way up. If it could be a hardware issue, I'll get on the phone with HP (sigh) and try working through it from there. If it's a hardware problem, yay. If we can rule that out, then it's time to look at Exchange again. And if that can be ruled out, then I can start casting the evil eye at Backup Exec again.

*mumbling to self* I'm getting paid for it, I'm getting paid for it, this is more interesting than fixing the copier *mumble mumble*

Zelandakh · Feb 19, 2007

OK, that's a huge amount of progress - get some java, take a smoking break if that is your thing.

Your problem appears to stem from the array, possibly the RAID controller. I'd guess the controller is the issue, as disk probs would poss give you 1018 errors, but just whip down the event log and check for event id 1018. Fingers crossed you haven't got one of those.

Do you have a spare RAID array you can plug in? If so I reckon we can crack it real fast, if not it could be a warranty job and that gets dicey as you need both arrays up to migrate the data.

3 disks means all is well on the disk front in terms of numbers.

gregmuir2 · Feb 20, 2007

I've been on the phone with HP. They gave me a pile of updates to install and then some diagnostics to run. I have to do this during non-working hours since it will disrupt everything. Oh, joy.

What bothers me is that I can never get a straight answer. The guy I spoke to yesterday said that they cannot use webex and look at the server at the lower level of tech support, the only people who can do that are the expensive tech support. He gives me a number to call. I speak to the guy at that number today and he says that's flat out false, I am still under warranty and should be getting their best support. I get transferred to a customer care division where they take my info and then transfer me to another tech who seems to know less than the tech from yesterday.

One of the things he was telling me had me concerned. Now as far as I am aware, the software that resides on devices in a computer is called firmware, to differentiate it from what resides on the disk drive. The specific chips they sit on located on the device are called ROMs if they cannot be rewritten, EPROM or EEPROM depending on the technology if they can be rewritten, but most of it now is on FLASH RAM. Some manufacturers are smart and include a ROM with the shipped version of the driver on the device in case a flashing goes bad; other manufacturers don't so if you have a bad flash, you've now got an electronic brick.

What this tech was saying is that there's a jumper you can set inside the machine that will wipe the NVRAM. I asked him to clarify if he was talking about the flash ram for the firmware. He said yes, NVRAM. Ok. He says there's a jumper you can use to reset your CMOS. Ok. But as I recall, most RAID's are configured using a BIOS located on the RAID controller, said controller either an add-on card or something built onto the motherboard. If we clear the NVRAM, won't we also clear the RAID info? He says he's positive that won't happen. Hmm. Somehow, this doesn't give me that warm fuzzy feeling of certainty I like when I go mucking around in servers.

So at this point I'm running exmerge to get all the mailboxes off the server. If anything happens to break it, at least I've got a backup of the important data. It just bugs the snot out of me that I get the "take two aspirin and call me in the morning" sort of treatment when dealing with tech support. The company doesn't want to take the time and spend the money to train their techs up and then they expect us to keep giving them big $$$ for these servers. I'm not a world-class expert on this stuff and I'd really like to be getting my support from someone who is. Ugh. HP: Highly Problematic.

I'll keep the thread updated as developments occur.

Zelandakh · Feb 20, 2007

To be safe, check you've got a good backup and ensure that the transaction logs are being written to another set of disks (C if there is room). Then you have a full play forward option if you ever hit a 1018.

Keep hassling HP - pain but required.

Pref go to the Boss and ask for a new array. They are less expensive than losing Exchange. And tell him that...
