ArcserveIT 6.6 tape inventory causes NW5.1 high utilization

Status
Not open for further replies.

BazG (MIS, US; member since Oct 4, 1999)
When a tape is inserted into the Compaq TL891 2-drive/10-slot tape library, the Arcserve tape inventory starts, loading each cartridge into a drive one at a time. This causes 100% utilization on the NetWare 5.1 server, which is a node in a 2-node NWCS cluster. The node eventually abends because of a "poison pill" condition. Setup as follows:

ArcserveIT 6.6 build 111.106
ELO 6.6 build 111.121
TLO 6.6 build 111.121
NetWare 5.1 SP2A
NetWare Cluster Services 1.01
Compaq TL892 Tape Library
Compaq FC SAN Switch 16EL
Compaq Modular Data Router

Anyone got any idea why an Arcserve tape inventory could cause such high processor utilization?


 
Yes, your problem is described in the following document:
Problem Description

Constant high processor utilization on NetWare 5.x servers when ARCserveIT 6.6 performs tape inventory on tapes in TL89x tape libraries.

Symptoms

One or more of the following symptoms will be apparent:

Constant high processor utilization (in excess of 65%-70%) during the time ARCserveIT 6.6 takes to complete the tape inventory;
Slow server performance;
Slow loading of NLMs on server;
Clients lose connections to server;
Server Abends;
Processor utilization returns to normal once tape inventory has completed.
Cause

THIS IS NOT A HARDWARE PROBLEM. At the time of writing, the high processor utilization caused by inventorying tapes in the TL89x DLT tape library appears to stem from ARCserveIT and/or NetWare and the way they access the server's DOS partition in real mode. The problem has occurred with direct SCSI connections as well as with fibre channel configurations. Currently, ARCserveIT always inventories the tapes when it is started at the server console (ASTART6), and there does not appear to be any way to disable this. Processor utilization stays at 65%-70% (even with no additional load) throughout the inventory, which can take more than 30-35 minutes with a full magazine of ten tapes; with additional load, utilization will be much higher. Backup Exec also raises processor utilization, but to a much lesser degree: its tape inventory is much quicker, and the inventory on startup can be disabled.

Resolution/Workaround

THIS IS A WORKAROUND OBTAINED FROM NOVELL SUPPORT, NOT A PERMANENT FIX.

Temporarily mount the DOS partition as an NSS volume before ARCserveIT inventories the tapes. Currently, a DOS partition created by Caldera DR-DOS 7 cannot be mounted as an NSS volume, so do a manual installation of NetWare 5.x using MS-DOS 6.2x boot diskettes only. Do not use the Caldera DR-DOS 7 supplied on the NetWare Operating System CD.

Procedure

1. Do a manual installation of NetWare 5.x using MS-DOS 6.2x boot diskettes (use FDISK and FORMAT to create the DOS partition).

2. Copy updated POLIMGR.NLM v5.85 dated 21-June-2001 to C:\NWSERVER (see attached).

Note: At the time of writing, it is not clear what impact this updated version of POLIMGR.NLM has on the workaround, as the version shipped with NetWare 5.x does not seem to cause additional problems.

3. Install Novell NetWare 5 Support Pack 6A & Patch OS5PT2A.EXE or Novell NetWare 5.1 Support Pack 2A & Patch OS5PT2A.EXE or Novell NetWare 5.1 Support Pack 3 as appropriate

4. Install ARCserveIT 6.6 for NetWare

5. Install ARCserveIT 6.6 for NetWare Enterprise Library Option

6. Install ARCserveIT 6.6 for NetWare Tape Library Option

7. Install ARCserveIT 6.6 for NetWare Support Pack 4

8. Install ARCserveIT 6.6 for NetWare Patch LO98880.CAZ

9. Install latest NSSD

When the server is up:


10. LOAD NSS (if not already loaded)

11. SET CDBE DIRTY WAIT TIME=1779

12. SET AUTO RESTART AFTER ABEND=0

Note: It is VERY important to set this SET parameter. Failure to do so may cause corruption of the DOS partition. See Novell TID 10017928 for more information.


13. LOAD DOSFAT.NSS (this mounts the DOS partition as the NSS volume DOSFAT_C; if Step 12 has not been done, a warning message will appear).

14. ASTART6 (start ARCserveIT)

It is highly recommended that the DOS partition be mounted as an NSS volume only when absolutely necessary, and not left mounted permanently or for long periods, because of the high risk of corrupting the DOS partition. Once the ARCserveIT tape inventory has completed, do the following:

15. UNLOAD DOSFAT.NSS (this will automatically dismount DOSFAT_C)

16. SET AUTO RESTART AFTER ABEND=1 (set back to default value)
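The ordering constraints in steps 10-16 (NSS before DOSFAT.NSS, auto restart disabled before the DOS partition is mounted, ARCserveIT started only once DOSFAT_C is up, and the partition dismounted before auto restart is re-enabled) can be checked mechanically. The following is a hypothetical Python helper, not a Novell or Computer Associates tool. It assumes the commands are listed one per entry exactly as typed at the console, and it expects an explicit LOAD NSS even though the procedure says that step is only needed if NSS is not already loaded.

```python
import re
from typing import List

# Steps 10-16 of the procedure above, as typed at the server console.
PROCEDURE = [
    "LOAD NSS",
    "SET CDBE DIRTY WAIT TIME=1779",
    "SET AUTO RESTART AFTER ABEND=0",
    "LOAD DOSFAT.NSS",
    "ASTART6",
    "UNLOAD DOSFAT.NSS",
    "SET AUTO RESTART AFTER ABEND=1",
]


def normalize(cmd: str) -> str:
    """Upper-case, collapse runs of whitespace and drop spaces around '='."""
    return re.sub(r"\s*=\s*", "=", " ".join(cmd.split())).upper()


def check_order(commands: List[str]) -> List[str]:
    """Return ordering problems in a console command sequence (hypothetical helper)."""
    problems: List[str] = []
    nss_loaded = restart_off = dos_mounted = False
    for cmd in map(normalize, commands):
        if cmd == "LOAD NSS":
            nss_loaded = True
        elif cmd == "SET AUTO RESTART AFTER ABEND=0":
            restart_off = True
        elif cmd == "SET AUTO RESTART AFTER ABEND=1":
            if dos_mounted:
                problems.append("auto restart re-enabled while DOSFAT_C is mounted")
            restart_off = False
        elif cmd == "LOAD DOSFAT.NSS":
            if not nss_loaded:
                problems.append("DOSFAT.NSS loaded before NSS")
            if not restart_off:
                problems.append("DOSFAT.NSS loaded with auto restart after abend enabled")
            dos_mounted = True
        elif cmd == "ASTART6" and not dos_mounted:
            problems.append("ARCserveIT started before DOSFAT_C was mounted")
        elif cmd == "UNLOAD DOSFAT.NSS":
            dos_mounted = False
    # The DOS partition should not stay mounted once the inventory is done.
    if dos_mounted:
        problems.append("DOSFAT.NSS left loaded (DOSFAT_C still mounted)")
    return problems


if __name__ == "__main__":
    print(check_order(PROCEDURE) or "order looks OK")
```

Swapping steps 12 and 13, for example, yields a single complaint that DOSFAT.NSS was loaded while auto restart after abend was still enabled.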


Hope this helps

See also the following:

CPU Utilization Goes to 100% During Backup Type Functions in Netware

Problem Description

CPU utilization goes to 100% during backup-type functions, mainly retention, format, erase, backup and restore.



Symptoms

The server goes to 100% utilization, and users may either be dropped by the server or be unable to log in.



Cause

The issue is an architectural issue with NWPA.NLM.

All I/O requests, including tape requests, are put in a HACB (host adapter control block) message. All of the HACBs are stored in a list that NWPA services and sends to the CDM (which passes them back to NWPA, then to the actual HAM driver for the device, and finally to the device itself). Almost every request made in the OS is a protected mode call (32-bit, extended processor mode, etc.). However, a few calls are real mode (e.g., calls that read/write data on the DOS partition); these are 16-bit calls.

Okay, here comes the problem:

If we are in the middle of servicing some protected mode messages (say, a long tape erase is one of 15 or so on a busy server) and a real mode request comes in, here is what NWPA does:

1. Queue the real mode request in the HACB list.

2. Issue NPA_Squelch_All_IO in a server thread.



This function tells the server that all I/O is now on hold. So, every new HACB message generated is received and queued, but not serviced UNTIL the real mode HACB is done. In other words, all I/O access is gone for new HACBs.

3. Service and finish all of the current HACB messages that were queued before the real mode HACB was queued.

4. Service the real mode HACB.

5. “Release” the lock on the I/O functionality in the server and begin servicing the other HACBs that have been queued since the Squelching function was called and allow new HACBs and disk requests to be made/serviced.



Hopefully you can see where this is headed...

So, did you lose LAN connectivity? No. However, you do lose your disk access. When someone logs in, the login must access the directory to update information, and that access is put on hold until the real mode request is done: it is a HACB request queued up and waiting until the system releases the lock on the I/O channel (when the real mode HACB finishes). To users it will appear as if the server has shut down. It hasn’t, but in effect it has ... does that make sense?

So, when you say it will happen when you issue a ‘load’ command from the system console prompt ... guess what? You’re probably making a real mode request and that (as you’ve read previously) will cause the server to “hang”.

I’ve heard of tape erases taking upwards of an hour. If that’s the case, you will be waiting one solid hour until disk access is granted again server-wide. Ouch!!
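To make the timing concrete, here is a toy Python model of the squelch sequence above. Everything in it is an assumption for illustration and it is not NWPA code: the names Request, finish_times and SCENARIO are hypothetical, each device services its HACBs one at a time in arrival order, devices run in parallel, and times are whole seconds. It compares a one-hour tape erase, followed by a real mode DOS read (from a console 'load') and a login's directory update, with and without the squelch.

```python
from typing import Dict, List, NamedTuple


class Request(NamedTuple):
    name: str        # label used in the results
    device: str      # device the HACB targets ("tape", "disk", ...)
    seconds: int     # time the device needs to service it
    real_mode: bool  # True for a 16-bit real mode call (e.g. DOS partition access)
    submitted: int   # time (s) at which the HACB is queued


def finish_times(requests: List[Request], squelch: bool = True) -> Dict[str, int]:
    """Completion time of each request, taken in submission order.

    squelch=True models the NPA_Squelch_All_IO behaviour described above;
    squelch=False models every DOS call arriving as a protected mode call.
    """
    device_free: Dict[str, int] = {}  # time each device becomes idle
    barrier = 0                       # new HACBs may not start before this time
    done: Dict[str, int] = {}
    for r in requests:
        start = max(r.submitted, device_free.get(r.device, 0))
        if squelch:
            start = max(start, barrier)  # held while I/O is squelched
            if r.real_mode:
                # Steps 3-4: finish every HACB queued earlier, then service this one.
                start = max([start, *done.values()])
        end = start + r.seconds
        device_free[r.device] = end
        done[r.name] = end
        if squelch and r.real_mode:
            barrier = end  # Step 5: release the I/O lock only when it finishes
    return done


SCENARIO = [
    Request("tape_erase", "tape", 3600, False, 0),  # one-hour erase
    Request("load_cmd", "disk", 1, True, 10),       # console 'load' -> real mode DOS read
    Request("login", "disk", 1, False, 20),         # directory update for a login
]

if __name__ == "__main__":
    print("real mode DOS call:    ", finish_times(SCENARIO, squelch=True))
    print("protected mode calls:  ", finish_times(SCENARIO, squelch=False))
```

Under these assumptions the login's disk request completes at 3602 s when the squelch is modelled (only after the erase and the real mode read) and at 21 s when every DOS call is protected mode, which is the one-hour stall described above.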

Now, I talked with the architect for NWPA and he has verified everything that I’ve written above (I wanted to make sure I wasn’t giving you bad information). He also says that this works EXACTLY the same in NetWare 4.x and NetWare 5.x; i.e., if a real mode request comes in the middle of a long protected mode request, the same “squelch” functionality in the system will execute.

NetWare 5.x does happen to have more reasons to hit the DOS partition than NetWare 4 does (accessing the registry being the big one here); so this may be one reason we see it more in 5 than 4.

So, how is this fixed, or how can this be fixed?

One way and one way only. All DOS calls must be protected mode calls, not real mode calls. Once this happens, then NWPA will not call the NPA_Squelch ... function and we will not get in this hang condition.

NetWare 6 has a pretty good interface for allowing protected mode calls to get down to DOS. NetWare 5 has an okay interface, but it is not mature and there are problems with it. This interface in NetWare 5.x is called:

DOSFAT.NSS

NetWare 4.x has no such interface for 32-bit access to DOS. Obviously this NLM is part of NSS. It hooks all of the DOS calls (open, close, etc.) that are made in the OS and funnels them through this code. With that, the calls become 32-bit protected mode calls, no longer the 16-bit real mode ones that can cause problems if a long protected mode call (like a tape erase) is running. So, with only 32-bit calls going through NWPA, we get no hangs. However, as I said before, this module is not mature in NetWare 5; it is much better in NetWare 6. What I mean is that you should not trust it to run all of the time without giving you some problems (abends, hangs, maybe other things; whatever you want to classify as problematic). I just want to make sure that you’re warned before believing that this is a ‘perfect’ solution in NetWare 5 ... it isn’t.

You may want to load this module only when you do a tape erase, and then unload it immediately after the erase is done.
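The hooking idea, and the advice to load the module only around a tape erase, can be sketched as a dispatch table whose entries are swapped on load and restored on unload. This is a hypothetical Python illustration only: DosCallTable, real_mode_call and protected_mode_call are invented names that merely label which path a call would take, and nothing here reflects how DOSFAT.NSS is actually implemented.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str, str], str]


def real_mode_call(op: str, path: str) -> str:
    # Stand-in for a 16-bit real mode DOS call; per the text, this is
    # what makes NWPA issue NPA_Squelch_All_IO.
    return f"real mode {op} {path}"


def protected_mode_call(op: str, path: str) -> str:
    # Stand-in for the 32-bit path the hook provides; no squelch needed.
    return f"protected mode {op} {path}"


class DosCallTable:
    """Hypothetical dispatch table for DOS partition calls (not a NetWare API)."""

    OPS = ("open", "close", "read", "write")

    def __init__(self) -> None:
        self.handlers: Dict[str, Handler] = {op: real_mode_call for op in self.OPS}
        self._saved: Optional[Dict[str, Handler]] = None

    def load_hook(self) -> None:
        """Like loading the module: route every DOS call via the protected mode path."""
        if self._saved is None:  # a second load is ignored
            self._saved = dict(self.handlers)
            self.handlers = {op: protected_mode_call for op in self.OPS}

    def unload_hook(self) -> None:
        """Like unloading the module: restore the original real mode handlers."""
        if self._saved is not None:
            self.handlers = self._saved
            self._saved = None

    def call(self, op: str, path: str) -> str:
        return self.handlers[op](op, path)


if __name__ == "__main__":
    table = DosCallTable()
    table.load_hook()    # just before the tape erase
    print(table.call("read", r"C:\NWSERVER\STARTUP.NCF"))
    table.unload_hook()  # immediately after the erase
    print(table.call("read", r"C:\NWSERVER\STARTUP.NCF"))
```

Saving and restoring the whole table keeps the unload step symmetric with the load step, so the server returns to exactly its previous behaviour once the erase is finished.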



Resolution/Workaround



A resolution will come from Novell at a later time. This appears to be strictly a Novell issue.
 
Hi Wanda
I am suffering from abending cluster nodes in a SAN environment when I start AS jobs. It is NW 5.1 SP3, Clusterpatch SP2 and AS 7 with all patches. The abends do not happen every time, only occasionally. Could this be the issue described above?
KR
Gery
 