FC5600 RDY* state

Status
Not open for further replies.

POPKORN

Technical User
Jan 10, 2005
US
Hello everyone,

I just had a major drive failure for some unknown reason: the amber lights on 10 drives lit up. I brought the system down, powered off the SPSs, waited for the cache to be emptied, and finally powered down the controller.

After powering up the system again, all the amber lights are now green except for the top right one on the controller.

Here is a paste of the symptoms:


07/02/2005 01:49:56
fcli> ls
Logical Unit Summary:

     Raid   Dflt.                           Unit
LUN  Group  Owner  Type    Capacity  Cache  State    Frus
---  -----  -----  ------  --------  -----  -------  ------------------
1    1      SP-B   RAID-5  82.7 GB   RW-    ENA      3 4 5 6 7 8
3    3      SP-B   RAID-5  148.8 GB  RW-    RDY*     10(DEAD) 11(DEAD) 12(DEAD) 13(DEAD) 14(DEAD) 15(DEAD) 16(DEAD) 17(DEAD) 18(DEAD) 19(DEAD)
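
For anyone who wants to scan a listing like this automatically, here is a minimal Python sketch that flags LUNs with DEAD FRUs. It assumes each LUN row splits on whitespace into LUN, group, owner, type, capacity value, capacity unit, cache, state, and then the FRU list, which matches the rows pasted above but has not been checked against other firmware revisions. The helper names `parse_lun_row` and `dead_frus` are hypothetical, not part of any CLARiiON tool.

```python
import re

# Two LUN rows copied from the `ls` output above.
SAMPLE_ROWS = [
    "1 1 SP-B RAID-5 82.7 GB RW- ENA 3 4 5 6 7 8",
    "3 3 SP-B RAID-5 148.8 GB RW- RDY* 10(DEAD) 11(DEAD) 12(DEAD) 13(DEAD) "
    "14(DEAD) 15(DEAD) 16(DEAD) 17(DEAD) 18(DEAD) 19(DEAD)",
]

# A FRU token is a disk number with an optional status in parentheses.
FRU_RE = re.compile(r"(\d+)(?:\((\w+)\))?")


def parse_lun_row(row):
    """Split one LUN row into named fields; return None for non-data lines."""
    parts = row.split()
    if len(parts) < 9 or not parts[0].isdigit():
        return None  # header, separator, or blank line
    frus = []
    for token in parts[8:]:
        match = FRU_RE.fullmatch(token)
        if match is None:
            return None  # unexpected token, so do not guess
        frus.append((int(match.group(1)), match.group(2)))
    return {
        "lun": int(parts[0]),
        "group": int(parts[1]),
        "owner": parts[2],
        "type": parts[3],
        "capacity_gb": float(parts[4]),
        "cache": parts[6],
        "state": parts[7],
        "frus": frus,
    }


def dead_frus(lun):
    """Return the FRU numbers that the listing marks as DEAD."""
    return [number for number, status in lun["frus"] if status == "DEAD"]


if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        lun = parse_lun_row(row)
        print(lun["lun"], lun["state"], "dead FRUs:", dead_frus(lun))
```

On the rows above, LUN 1 reports no dead FRUs and LUN 3 reports all ten of its disks (10 through 19), which is the pattern you would expect when a whole enclosure drops off the loop rather than ten disks failing independently.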

I did have 2 hot spares, but I had to change the drives and they were not formatted, so I had to unbind them to start the process. This is rather concerning, as Veritas failed to back up all the work that was performed this week.

Any ideas as to how to fix this without losing data?

Or am I pretty much $cr3w3d?!

Any help would be greatly appreciated.


 
Just wanted to add some info that I found in the CLARiiON logs, just in case this helps.

Event Date CRU Event (Message) Extended Status
120. 07/02/05 01:45:31 SP B 0x6F9 (Peer Bus Soft Error) 0x02
121. 07/02/05 01:45:31 SP A 0xA11 (SP Removed) 0x03
122. 07/02/05 01:47:00 SP A 0x643 (SP Initializing) 0x00
123. 07/02/05 01:47:02 SP A 0x644 (SP Inserted) 0x00
124. 07/02/05 01:47:08 SP B 0x6F9 (Peer Bus Soft Error) 0x02
125. 07/02/05 01:47:08 SP B 0x6F9 (Peer Bus Soft Error) 0x02
126. 07/02/05 01:47:09 SP B 0x6F9 (Peer Bus Soft Error) 0x02
127. 07/02/05 01:47:09 SP B 0x6F9 (Peer Bus Soft Error) 0x02
128. 07/02/05 01:47:09 SP B 0x6F9 (Peer Bus Soft Error) 0x02
129. 07/02/05 01:47:09 SP B 0x944 (Peer Bus Hard Error) 0x01
130. 07/02/05 01:47:09 SP A 0xA11 (SP Removed) 0x03
131. 07/02/05 01:48:40 SP A 0x643 (SP Initializing) 0x00
132. 07/02/05 01:48:42 SP A 0x644 (SP Inserted) 0x00
133. 07/02/05 01:48:48 SP B 0x6F9 (Peer Bus Soft Error) 0x02
134. 07/02/05 01:48:48 SP B 0x6F9 (Peer Bus Soft Error) 0x02
135. 07/02/05 01:48:49 SP B 0x6F9 (Peer Bus Soft Error) 0x02
136. 07/02/05 01:48:49 SP B 0x6F9 (Peer Bus Soft Error) 0x02
137. 07/02/05 01:48:49 SP B 0x6F9 (Peer Bus Soft Error) 0x02
138. 07/02/05 01:48:49 SP B 0x944 (Peer Bus Hard Error) 0x01
139. 07/02/05 01:48:49 SP A 0xA11 (SP Removed) 0x03
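
A log like this is easier to read once it is tallied. The Python sketch below is only an illustration: it assumes every entry has the shape "number. date time CRU code (message) extended-status" seen in the paste, and it uses entries 120 to 130 as sample data. The names `parse_events`, `tally`, and `seconds_between` are hypothetical helpers, not CLARiiON utilities. Only the time of day is compared, because the date format (day first or month first) is not clear from the paste.

```python
import re
from collections import Counter
from datetime import datetime

# Entries 120-130 copied from the log above.
LOG_SAMPLE = """\
120. 07/02/05 01:45:31 SP B 0x6F9 (Peer Bus Soft Error) 0x02
121. 07/02/05 01:45:31 SP A 0xA11 (SP Removed) 0x03
122. 07/02/05 01:47:00 SP A 0x643 (SP Initializing) 0x00
123. 07/02/05 01:47:02 SP A 0x644 (SP Inserted) 0x00
124. 07/02/05 01:47:08 SP B 0x6F9 (Peer Bus Soft Error) 0x02
125. 07/02/05 01:47:08 SP B 0x6F9 (Peer Bus Soft Error) 0x02
126. 07/02/05 01:47:09 SP B 0x6F9 (Peer Bus Soft Error) 0x02
127. 07/02/05 01:47:09 SP B 0x6F9 (Peer Bus Soft Error) 0x02
128. 07/02/05 01:47:09 SP B 0x6F9 (Peer Bus Soft Error) 0x02
129. 07/02/05 01:47:09 SP B 0x944 (Peer Bus Hard Error) 0x01
130. 07/02/05 01:47:09 SP A 0xA11 (SP Removed) 0x03
"""

LINE_RE = re.compile(
    r"^(\d+)\.\s+(\S+)\s+(\d\d:\d\d:\d\d)\s+(SP [AB])\s+"
    r"(0x[0-9A-Fa-f]+)\s+\((.+)\)\s+(0x[0-9A-Fa-f]+)$"
)


def parse_events(text):
    """Turn each matching log line into a dict; skip anything else."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            seq, date, time, cru, code, message, ext = match.groups()
            events.append({"seq": int(seq), "date": date, "time": time,
                           "cru": cru, "code": code, "message": message,
                           "ext": ext})
    return events


def tally(events):
    """Count events per (CRU, message) pair."""
    return Counter((e["cru"], e["message"]) for e in events)


def seconds_between(start, end):
    """Seconds from one HH:MM:SS stamp to a later one on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    events = parse_events(LOG_SAMPLE)
    for (cru, message), count in tally(events).most_common():
        print(f"{cru}  {message}: {count}")
    print("Removed -> Inserted:", seconds_between("01:45:31", "01:47:02"), "s")
```

Run against the sample, the tally shows the same short cycle the full log repeats: a burst of peer bus soft errors logged by SP B, a hard error, and then SP A being removed and coming back roughly a minute and a half later.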
 
07/02/2005 02:09:35
fcli> di -l
Fru   Vendor    Model             Rev.  Serial no.    Capacity
====  ========  ================  ====  ============  =========
0.    SEAGATE   ST118202 CLAR18   350C  LK754693      0x2141301
1.    SEAGATE   ST118202 CLAR18   AV28  LKJ11799      0x0
2.    SEAGATE   ST118202 CLAR18   350C  LK834621      0x2141301
3.    SEAGATE   ST118202 CLAR18   350C  LK795670      0x2141301
4.    SEAGATE   ST118202 CLAR18   350C  LK823383      0x2141301
5.    SEAGATE   ST118202 CLAR18   350C  LK832069      0x2141301
6.    SEAGATE   ST118202 CLAR18   350C  LK627323      0x2141301
7.    SEAGATE   ST318304 CLAR18   3A90  3EL02FJN      0x2141301
8.    SEAGATE   ST318203 CLAR18   AV44  LR936857      0x2141301
9.    SEAGATE   ST118202 CLAR18   AV28  LKJ40479      0x0


All these drives are from the DPE enclosure. The drives on the DAE are not listed here for some reason, and they should be.

Sorry for the reposts; I'm just tired and frustrated, as I can't find the solution.
 
It looks like SPB failed and rebooted itself. The question is: why?
If the amber light you are talking about is the right-most light on the DAE (on the front of the unit), it indicates that the fibre loop is not closed and the DAE is being bypassed. The normal path is from the SP through the LCC, thru the odd disks, back thru the LCC to the SP to close the loop, and then it runs thru the even disks. Since a portion of the loop was hosed, I suspect that is why you got all the amber lights initially. Possibilities:
A bad disk that is locking up the fibre loop.
A bad LCC card.
A bad cable (DPE to DAE).
A bad SP.

Verify that you have no fault lights on the back of the unit.
Double-check your connections DPE to DAE.
If the connections look good, swap the cables, then the LCCs, then the SPs, one at a time, to see whether the symptoms move with the swapped part.

A look deeper into the log might point to a culprit.

Good luck.
 
Yup, you were right. It was a faulty LCC card. That's why it was not even seeing the other drives: the loop was broken. After I replaced the LCC card and rebooted the system, everything came back to normal.

Thanks....


As for the RAID 0 that is on this unit now: the drives were replaced with new ones. I formatted the drives on the SP, then bound them as a RAID 0, and here is what it looks like:

fcli> ls
Logical Unit Summary:

     Raid   Dflt.                             Unit
LUN  Group  Owner  Type      Capacity  Cache  State    Frus
---  -----  -----  ------    --------  -----  -------  ------------------
0    0      SP-B   RAID-0    49.6 GB   RW-    ENA      0 1 2
1    1      SP-B   RAID-5    82.7 GB   RW-    ENA      3 4 5 6 7 8
2    2      SP-B   HotSpare  16.5 GB   ---    ENA      9
3    3      SP-B   RAID-5    148.8 GB  RW-    ENA      10 11 12 13 14 15 16 17 18 19
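
As a quick sanity check on the bind, the capacities in this listing can be compared against the disk counts. The Python sketch below assumes the usual rules: RAID-0 uses the capacity of every disk, RAID-5 gives up one disk's worth of capacity to parity, and a hot spare is a single disk. The function name `data_disks` is a hypothetical helper; the numbers come straight from the listing above.

```python
# (LUN, type, reported capacity in GB, number of FRUs) from the listing above.
LUNS = [
    (0, "RAID-0", 49.6, 3),
    (1, "RAID-5", 82.7, 6),
    (2, "HotSpare", 16.5, 1),
    (3, "RAID-5", 148.8, 10),
]


def data_disks(raid_type, n_disks):
    """Number of disks' worth of usable capacity for a given layout."""
    if raid_type in ("RAID-0", "HotSpare"):
        return n_disks          # no redundancy overhead
    if raid_type == "RAID-5":
        return n_disks - 1      # one disk's worth goes to parity
    raise ValueError(f"unhandled RAID type: {raid_type}")


def per_disk_capacity(raid_type, capacity_gb, n_disks):
    """Implied usable capacity of each member disk, in GB."""
    return capacity_gb / data_disks(raid_type, n_disks)


if __name__ == "__main__":
    for lun, raid_type, capacity_gb, n_disks in LUNS:
        implied = per_disk_capacity(raid_type, capacity_gb, n_disks)
        print(f"LUN {lun} ({raid_type}, {n_disks} disks): "
              f"{implied:.1f} GB per disk")
```

Every LUN works out to roughly 16.5 GB per disk, so the new RAID-0 was bound with the same usable disk size as the rest of the array.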



The question is: when I try to bring it back online on Solaris 9, I get "bad superblock" or "invalid magic number". If I format the RAID 0 on Solaris and wait for it to verify all the media, will this solve the issue, or do I have to make any changes in vfstab? The naming convention is still the same as it originally was, and when I ran format the path shown was the same as well.

Any ideas?


Thanks in advance.




 
I'm not a Solaris guru but I suspect you are correct in thinking that formatting via Solaris will correct the problem. Solaris is probably looking for an id on that disk that was written when it was originally formatted. Since it no longer exists, you get the errors. I'd go ahead with the format.
 