AIX 4.3.3 to 5.1 migration problem


swortsoul (MIS), Jul 9, 2003
We just migrated our production servers from AIX 4.3.3 to 5.1. It worked OK on the test machines, but we had a severe problem on production. We've not been able to figure out what happened (nor has IBM). Here are our systems and a timeline of events; the problem follows.

2 x 7017-S85 in an HA cluster

AIX 4.3.3 -> 5.1.0.0
SDD 1.3.3.6 -> 1.5.0.0 to communicate with the ESS
HA 4.4.1
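
(For anyone comparing notes: before touching either node, something along these lines should snapshot the vpath layout so you have a baseline to compare after each reboot. SDD command names are from memory, so check the syntax for your level.)

  lsvpcfg                  # vpath -> hdisk -> ESS serial number mapping
  datapath query device    # per-vpath path states (OPEN/CLOSE/DEAD)
  datapath query adapter   # state of each adapter SDD is driving
  lspv                     # hdisk/vpath to volume group assignments
  lsvg -o                  # which VGs are currently varied on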

We failed over from secondary to primary, so the apps were running on primary.
Removed SDD from the secondary.
Ran the migration install on the secondary.
Right at the end, we believe just after the automatic reboot at the end of the install, the primary machine lost sight of some vpaths, which crashed the prod DB running on primary.
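
(Side note: when that happened, running something like the following on the surviving node is what should show whether the vpaths really dropped and what the error log recorded; nothing exotic, just standard SDD/AIX commands.)

  datapath query device    # look for paths going DEAD/CLOSE on the affected vpaths
  errpt | more             # error log summary (disk/adapter/SDD entries around the crash)
  errpt -a | more          # same entries with full detail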

Finished the upgrade on the secondary (including ML05, HA patches, and reinstalling the new SDD).
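
(As a sanity check after that step, something like this should confirm the new levels actually took; fileset name patterns are from memory.)

  oslevel -r               # should report the 5.1 maintenance level
  instfix -i | grep ML     # shows which MLs report all filesets found
  lslpp -l | grep -i sdd   # installed SDD fileset and level
  lslpp -l "cluster*"      # HACMP filesets and levels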

Brought the systems up and failed over to the secondary.

Removed SDD from the primary.
Ran the migration install on the primary.
Right at the end, we believe just after the automatic reboot at the end of the install, the secondary machine lost sight of some vpaths, which crashed the prod DB running on secondary (same issue as before).

I had IBM software support look at my plan beforehand, and they saw no issues with it.

What actually happened was that the machine that was out of the cluster at that point put a persistent reserve on the disks being used by the other machine. What could have caused that? I thought only a varyonvg would do that, and I didn't do one. The concurrent VGs are not set to auto-varyon, and HA is the only thing that will vary them on, and it doesn't come up automatically.
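
(In case it helps anyone else chasing this: as I understand it, you can at least see the reserve and, if needed, clear it with the utilities that ship with SDD. The commands below are from memory and the exact flags depend on your SDD and ODM levels, so please check the docs before clearing anything on disks another node might be using. hdisk10/vpath10 are just placeholders.)

  lsattr -El hdisk10 | grep -i reserve   # on ESS hdisks at this level the attribute is usually reserve_lock; newer drivers may use reserve_policy
  lquerypr -vh /dev/vpath10              # SDD utility: query the persistent reserve on a vpath
  lquerypr -ch /dev/vpath10              # clear the reserve (only when you're sure the other node isn't using the disk)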

Has anyone experienced this before? Any ideas? I've run out of them.

One more note: we had a hardware failure on the Shark a couple of weeks ago, which basically made the disks unavailable and had the same effect (crashing the DB). So when it first happened on the primary machine, it looked just like another hardware failure on the Shark (not my machine), which is redundant, and I pushed on with the upgrade. When it happened the second time, I realized it probably wasn't another hardware failure on the Shark. :)

Thanks
Tim


 
Hi,

I'm not an expert on ESS, but I had something similar happen on SSA: after a reboot, certain disks in a loop on one machine could no longer be accessed. One reason was the way we physically had the disks installed in the SSA cab. Is there something similar on ESS, i.e. do you have to have disks installed in particular slots, a certain number of dummies, etc.?

I assume you have fibre connecting your machines to the ESS? Are there any issues there relating to card timeouts or physical connections?
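
On the timeout side, something like this should show what the FC adapters are set to; attribute names vary a bit by adapter and driver level, so treat it as a rough sketch:

  lsdev -Cc adapter | grep fcs   # list the fibre channel adapters
  lsattr -El fcs0                # adapter attributes (e.g. num_cmd_elems, max_xfer_size)
  lsattr -El fscsi0              # FC SCSI protocol device; error recovery / timeout related attributes live here
  errpt | grep -i fcs            # any adapter errors logged around the failures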

Hope this gives you a bit more to think about.
 