Tek-Tips is the largest IT community on the Internet today!


L700 Robotic arm failures

Status
Not open for further replies.

RenC (MIS, US), Apr 7, 2003
Our robotic arm has failed five times over the last few months. StorageTek keeps telling us that residue has built up on the gripper, but we don't buy it. They have replaced the arm once and cleaned it several times, and it still keeps failing. It did not start failing until the last library firmware upgrade. I am curious to know whether anyone else is having this issue and whether anyone knows the root cause. Details of the problem are below.

Environment:

NetBackup 4.5 FP5
Storage Tek L700E library at firmware level 3.06.00
IBM Ultrium LTO1 drives at firmware level 3481

Failures/error codes:

Backups fail with status 52 (media mount timeout) errors. This is usually our first indication that there is an issue.

Ran robtest and received the error "Logical unit not ready, Manual intervention required". If you go into "Drive info" on the control panel, you get a targeting error.

Checked the tape library, and the red "Service Required" light was on. Opened the door on the tape library and there was the robot, sound asleep.

When you reset the library the robot comes back to life and errors go away.

Any help on this issue would be appreciated.
 
Hi,
Is the targeting error always on the same drive, or does it move from drive to drive? To start with, I would upgrade the drives to the latest firmware. The media mount timeout could be due to a communication problem between the library and the drive.

 
The targeting error is more of a symptom. We would get a targeting error on any drive we tried; by then the robotic arm is down. Unfortunately, the CE reset our FSC logs when they reset the system. I know that the last time it happened we received a 3204 error with no component listed, and we received a 3E18 error on one drive at about the time the library started having problems. StorageTek keeps telling us that the gripper hands need to be cleaned, but they are cleaned once a month. The last time the robotic arm died, it was only two days after the previous failure, when the gripper had just been cleaned, so I doubt that is really the problem.
 
Why don't you put the old firmware back in and see if the problem disappears?
 
I had the same issues. Replaced it more than 10 times. The last replacement was a "newer" version and our tech said the old ones had problems. Ask if you can get the latest revision of the arm/hand hardware itself. We've been running just fine for over 2 months now.
 
Here's some info I found on Hand / Z assembly problems...


***********************************
Technical Bulletin

StorageTek MPSS 'Open Systems Support'



Subject: Z assembly replacement due to skipping teeth or excessive vibration in L700/180
Number: MPSS-436c
Date: 8/24/01

References:
Action: Info

Product/System: L700, L180
Contact: MPSS, Open Systems Support

************Last update: 11/04/2003***************




UPDATE to Z-assembly replacement on L700/180 11/4/03.

Problem:

We have had several field incidents involving the Z-belt skipping teeth in the L700 and L180.

Along with Z-belt slippage, there have been reports of extreme vibration during initialization.

Each time, CSEs have spent multiple service calls and swapped dozens of parts before isolating the problem.

NOTE: Due to the length of this bulletin, please read carefully and completely.

Explanation of Z-belt Slippage Problem:

A small number of L180/700 units have been identified as having a problem with the Z belt due to a manufacturing defect in the belt itself. This has been known to cause the belt to skip teeth on the Z pulley and cause library failures. The errors generated when the belt skips vary widely, but they occur primarily when the unit has moved from a top cell location to, say, a bottom drive; the longer downward Z move has caused the belt to skip.

The problem can cause tapes to be left sticking out of drives during mounts or dismounts. It can also cause excessive retries for gets/puts at cells. The unit generally has a hard time doing anything when this occurs. A re-initialization from a door opening or an IPL allows Z to recalibrate itself, and operation is then normal until another skip occurs. A one-tooth skip puts Z 0.118" away from where it should be. The library does not currently know this has occurred.
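Because each skipped tooth shifts Z by a fixed amount, the positioning error grows linearly with the number of teeth skipped. The sketch below only multiplies out the bulletin's 0.118" figure; `z_offset_inches` is a hypothetical helper name, not part of any StorageTek tool.

```python
# Sketch: Z displacement after a belt skip, using the bulletin's 0.118" per tooth.
# z_offset_inches is a hypothetical helper, not a StorageTek utility.

INCHES_PER_TOOTH = 0.118  # figure quoted in MPSS-436c for a one-tooth skip
MM_PER_INCH = 25.4


def z_offset_inches(teeth_skipped: int) -> float:
    """Return how far Z is displaced, in inches, after the given number of skipped teeth."""
    if teeth_skipped < 0:
        raise ValueError("teeth_skipped must be non-negative")
    return teeth_skipped * INCHES_PER_TOOTH


if __name__ == "__main__":
    for teeth in (1, 2):
        inches = z_offset_inches(teeth)
        print(f'{teeth} skipped: {inches:.3f}" ({inches * MM_PER_INCH:.2f} mm)')
```

For the two-tooth failure described later in the bulletin, this works out to roughly 0.236", which is enough to explain missed gets/puts at cells and drives.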

Possible FSCs observed from a failing unit (FSC, component):

3381 NONE
3329 NONE
3311 NONE
30b1 REACH
309d REACH
3087 REACH
3216 NONE
3094 REACH
3304 NONE
308c Z
30bd -

** We can now detect the belt skip under most circumstances and post an FSC:

30bd ERR_SRV_BELT_TOOTH_SKIP_ERROR


Additional FSCs that have also been reported (this list is not all-inclusive):

3096 REACH
30A2 Z
3201 NONE
*****3204 NONE*******
3098 NONE
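When going back through FSC logs, it can help to check observed codes against both lists above at once. The following is a minimal sketch under the assumption that FSCs are recorded as hex strings; the set and function names are hypothetical. A match only suggests a belt skip: the bulletin notes the list is not exhaustive, and one of the identification tests below is still required.

```python
# Sketch: match observed FSCs against the belt-skip FSCs listed in MPSS-436c.
# Names are hypothetical; codes are normalized to lowercase hex strings.

PRIMARY_BELT_SKIP_FSCS = {
    "3381", "3329", "3311", "30b1", "309d", "3087",
    "3216", "3094", "3304", "308c", "30bd",
}
ADDITIONAL_BELT_SKIP_FSCS = {"3096", "30a2", "3201", "3204", "3098"}
DIRECT_SKIP_FSC = "30bd"  # ERR_SRV_BELT_TOOTH_SKIP_ERROR


def belt_skip_matches(observed):
    """Return (direct_detection, sorted list of observed FSCs that appear in either list)."""
    normalized = {code.strip().lower() for code in observed}
    known = PRIMARY_BELT_SKIP_FSCS | ADDITIONAL_BELT_SKIP_FSCS
    return DIRECT_SKIP_FSC in normalized, sorted(normalized & known)


if __name__ == "__main__":
    # FSCs from the original post: 3204 from the library, 3E18 from a drive.
    print(belt_skip_matches(["3204", "3E18"]))
```

Run against the codes from the original post, 3204 matches the additional list, while 3E18 (a drive error) does not; 30bd is not present, so the skip is not directly confirmed.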

In addition to FSCs being generated, the statistics will also show excessive retries for gets and puts to cells and drives. The ftune data may also show reach depths at drives to be significantly shorter than they should be. Typical reach depths for DLT drives at the drive tower are -100 +/- 40. The failing unit had reach depths to drives of -150 to -230; before the belt skipped, they were -97 to -108. Anything beyond -180 will most likely result in drive errors of some kind.
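The reach-depth thresholds above can be applied mechanically to ftune numbers. This is a sketch only: the function name is hypothetical, it treats "beyond -180" as strictly more negative than -180, and the -100 +/- 40 band is the bulletin's figure for DLT drives, so it may not apply to other drive types such as the LTO drives in the original post.

```python
# Sketch: classify a drive reach depth using the DLT figures quoted in MPSS-436c.
# classify_reach_depth is hypothetical; thresholds are the bulletin's DLT numbers.

TYPICAL_CENTER = -100      # typical DLT reach depth at the drive tower
TYPICAL_TOLERANCE = 40     # +/- band around the typical value
LIKELY_ERROR_LIMIT = -180  # depths more negative than this usually cause drive errors


def classify_reach_depth(depth: int) -> str:
    """Return 'likely error', 'typical', or 'out of typical range' for a reach depth."""
    if depth < LIKELY_ERROR_LIMIT:
        return "likely error"
    if abs(depth - TYPICAL_CENTER) <= TYPICAL_TOLERANCE:
        return "typical"
    return "out of typical range"


if __name__ == "__main__":
    # Values quoted in the bulletin: before the skip, and on the failing unit.
    for depth in (-97, -108, -150, -230):
        print(depth, classify_reach_depth(depth))
```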

Identifying the Problem:

There are two ways of identifying this failure. Either or both of these steps can identify a unit whose Z belt may have skipped, and the results must be entered into the Pinnacle issue for approval review.

Either:

1)

* Take a white paint pen, or whiteout, and mark the Z belt and pulley together at a given location. Whenever Z is returned to that location, the marks should re-align if the belt has not skipped a tooth. If they do not, note how many teeth were skipped.

or

2)

* From ELS, enter the servo environment and read the raw Z position while the hand assembly is held at the top of the Z channel. A unit that has not skipped will show a tach location of about 20, depending on how hard you are pushing Z against the top crash stop; it will not vary much.

This number is very repeatable from init to init. After a suspected failure, take the Z reading at the same location and compare it with the previous reading.

Attach to J282 on the MPC card: the second RJ-45 port from the top, labeled TEST on the cover.

This is the engineering test port. Connect to it with any dumb-terminal utility or HyperTerminal, using the supplied RJ-45 cable and adapter to a PC serial port, with settings of 9600 baud, 8 data bits, no parity, and 1 stop bit; you should get an "ELS>" prompt. The correct serial port on your PC must also be selected. Input is case-sensitive: type in lowercase only, because capitals are not recognized. If the unit has 2.20.00 code or higher, the baud rate is 38,400.
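Since the baud rate depends on the library code level, it is easy to connect at the wrong speed. The sketch below picks the settings from a dotted code level; the function name is hypothetical, and it assumes code levels are purely numeric "major.minor.patch" strings (a placeholder like "2.4x.xx" would not parse). The dictionary keys are chosen to match the keyword arguments of pyserial's `serial.Serial`, but pyserial is not imported here and using it is an assumption, not something the bulletin specifies.

```python
# Sketch: ELS test-port serial settings by library code level (per MPSS-436c).
# els_serial_settings is hypothetical; keys mirror pyserial's Serial() kwargs.


def els_serial_settings(code_level: str) -> dict:
    """Return 8-N-1 settings with 38400 baud for code >= 2.20.00, else 9600."""
    version = tuple(int(part) for part in code_level.split("."))
    baud = 38400 if version >= (2, 20, 0) else 9600
    return {"baudrate": baud, "bytesize": 8, "parity": "N", "stopbits": 1}


if __name__ == "__main__":
    # The library in the original post is at firmware 3.06.00.
    print(els_serial_settings("3.06.00"))
```

By this rule the 3.06.00 library in the original post would need 38,400 baud, not 9600.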

Do not power cycle, IPL, or open and close the door before looking at these numbers once you think a belt skip has occurred. Open the door, move Z to the top, and take the reading.

Note: Move the hand SLOWLY to the top of the Z column. It is very easy to make the belt skip teeth if the hand is moved too fast.

At this prompt, type "gos" and press Enter.

You should see the prompt change to "Servo>".

At the servo prompt, type "where".

This gives the current location of the hand (reach), theta, and Z.

Holding Z to the top and typing "where" will give you the current Z location in tach counts. After reading the number in the suspected failing state, you may then close the door. After allowing Z to initialize, open the door and read Z value at the top again.

Comparing these two values will allow you to determine if the belt has skipped or not.

The reading for the non-failed state may also be taken at any earlier time.

The previous failing unit was skipping two teeth, which resulted in a Z reading of -283. The belt typically skips in the direction that lowers the count, which produces a negative number at the top. A reading more than 100 counts away from the initial number (about 20) indicates that the belt has skipped teeth.
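The comparison described above reduces to a single threshold test on two tach-count readings taken at the top of the Z column. A minimal sketch, with a hypothetical function name, using the bulletin's 100-count threshold:

```python
# Sketch: decide whether the Z belt has skipped, from two "where" readings (tach counts)
# taken with the hand held at the top of the Z column. z_belt_skipped is hypothetical.

SKIP_THRESHOLD_COUNTS = 100  # bulletin: more than 100 counts from the initial reading


def z_belt_skipped(baseline_counts: int, suspect_counts: int) -> bool:
    """Return True if the suspect reading differs from the baseline by more than the threshold."""
    return abs(suspect_counts - baseline_counts) > SKIP_THRESHOLD_COUNTS


if __name__ == "__main__":
    # Baseline of about 20 versus the two-tooth failure reading of -283.
    print(z_belt_skipped(20, -283))
    # Normal init-to-init variation stays well inside the threshold.
    print(z_belt_skipped(20, 24))
```

The absolute difference is used so the check still works in the less common case of a skip that raises the count.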

Explanation of Z-Assembly Vibration Problem:

Another issue that can cause Z-column failure is extreme vibration.

This is caused by the rivets used in manufacturing and is not field-repairable. It has been seen in some Z columns with a manufacturing date of 2002.

This problem does not cause belt slippage; instead, an initialization failure occurs, noise or vibration is observed, and the following FSCs may appear:

3089, ERR_SRV_CANT_START__NOT_IN_STOPLOCK,

308d, ERR_SRV_MECH_DROPPED_OUT_OF_STOPLOCK,

308e, ERR_SRV_MECH_FAILED_TO_SETTLE_INTO_STOPLOCK,

Field Action:

If a library has experienced the Z-belt tooth-skipping problem or the vibration problem, the following actions should be taken:

1. Open a Clarify case or Pinnacle issue, assign it to Open Systems T3 Tape, and assign a secondary to "OPENTECHT" for review. Make sure complete details of the problem and the actions taken are included in the issue. The output from the ELS entries should also be provided for any belt slippage problem. The library serial number must also be included.

2. A Tech Specialist must be involved before consideration of replacement parts for this problem.

3. Once the information has been reviewed by Technical Support, and it has been determined that the library has a defective belt or vibration within the Z-assembly, a replacement Z-assembly may be ordered.

4. Once the Z-assembly has been replaced, the defective assembly must be returned as quickly as possible. There is a very limited number of field bill spares available, and they need to be turned around quickly.

Long Term Solutions:

Engineering is planning to release library code, 2.4x.xx, that will have a recovery function built in should a tooth slip be detected. Z belts are being manufactured using the 9710 formula, which has been proven to hold up to the L-series performance requirements. Spring tension is also being increased in the Z-assembly.

Additional questions:
Q- Can only the belt be replaced in the field?
ANS- Engineering does not recommend just replacing the belt, rather the complete assembly should be replaced.

Q- Are the FSC's listed going to occur together or are there individual possibilities?
ANS- If the positioning ends up one tooth off, multiple, different error scenarios are possible.

Q-Has the belt problem been resolved?
ANS- The problem has been resolved at the factory. The cause was due to the belt supplier changing manufacturing locations. Manufacturing purged all of the defective stock once the problem was discovered. They have also implemented accelerated test code in Systems Testing to screen out possible failing belts. Unfortunately, quite a bit of product shipped from the time of the first error until the cause was understood.

Q-How do you order a replacement Z-assembly for the L180 or L700 should one be needed?
ANS-The following field bills have been released EC Exceptional:
L700 = FB-101509
L180 = FB-101507

** When returning the failed parts, refer to the actual P/N and not the FB:
FB-101507 - L180
P/N 313739501 - Z Column Assy
FB-101509 - L700
P/N 313739401 - Z Column Assy

Note:

1. A Pinnacle issue is used by STK international field operations.

2. A Clarify case is used by STK US field operations.

A Pinnacle or Clarify number must be provided to Code-A before the replacement assembly is shipped, and the results of at least one of the tests must be entered into the Pinnacle issue for approval review.
Note:

Make sure the library has the issues described above rather than the problem described in Tech Bulletin MPSS584, summarized here:


Problem:
L700 and L180 libraries may fail to initialize with FSC-308F, Z out-of-range or FSC-3084, destination outside operating range errors.
Symptoms:
The library may fail during the initialization Z-sweep with:
FSC-308F, ERR_SRV_OPERATING_RANGE_OUT_OF_SPEC,
The range of motion on initialization is not within specification.
The library may initialize and fail later when attempting to move to the lower cells with:
FSC-3084, ERR_SRV_DEST_OUTSIDE_OPER_RANGE,
The requested destination is not within the allowable operating range
NOTE: FSC-3084 is seen with FSC-331B (Audit Timeout) and possibly other vision-related errors (e.g., 331A, 3319).
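Since the bulletin warns against confusing the Z-assembly problems with the MPSS584 operating-range problem, the two initialization-failure signatures can be told apart by their FSCs. This is a hedged sketch with hypothetical names: it only reflects the codes quoted in this thread, and it deliberately ignores 331B and the vision errors because those accompany 3084 rather than identify it.

```python
# Sketch: point observed initialization FSCs at the bulletin that most likely applies.
# suggest_bulletin and the set names are hypothetical; codes come from the text above.

VIBRATION_FSCS = {"3089", "308d", "308e"}  # Z-assembly vibration (MPSS-436c)
MPSS584_FSCS = {"308f", "3084"}            # Z out of range / destination outside range


def suggest_bulletin(observed):
    """Return a list of bulletin hints whose FSC signature appears in the observed codes."""
    codes = {code.strip().lower() for code in observed}
    hints = []
    if codes & VIBRATION_FSCS:
        hints.append("Z-assembly vibration (MPSS-436c)")
    if codes & MPSS584_FSCS:
        hints.append("Z operating range (MPSS584)")
    return hints


if __name__ == "__main__":
    print(suggest_bulletin(["3084", "331B"]))
    print(suggest_bulletin(["308D"]))
```

An empty result does not rule anything out; it only means none of these specific initialization FSCs were seen.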
 