Cannot boot up old RS/6000 box

Status
Not open for further replies.

alteh

Programmer
Aug 31, 2007

Hello world,

I need to boot up this 10-year-old IBM RS/6000 server. I can't ping it on the network. It's supposed to be running AIX.

The LED is showing an error code of FFE.

It's one of a sequence of Firmware Checkpoint Codes.

The manual says FFE means: "No boot - critical error(s) reported by IPL ROS -or- F1 key pressed", but I did not press F1.

But it's not really helpful as I have no other explanation. What does it really mean? What could the critical error(s) be?

I think it couldn't get beyond the firmware, so the disk was never read, hence no boot. The tape is still stuck in the tape drive, and the diskette and CD-ROM drives are not working, etc., presumably because no controller processes are running.

What do I do next? Change the battery on the motherboard, which may be for the IPL ROS Flash EPROM? I don't even know if the hard drives (x2) are OK or not.

I've also got an even older green console that can be plugged in, but it's trial and error when following the screen prompts and menus. It doesn't seem to like rebooting the box either.

I'm thinking perhaps I might try plugging the hard drives into another AIX box?

Any suggestions, tips, information... welcome.

I MUST rescue the data on the box!!!!!!

Thanks,

Al

 
Putting the drives in another AIX machine is probably going to be the quickest way of recovering your data. It should be straightforward to import the volume group.

Apart from that, it would be helpful to include the machine type and model, as well as the version of AIX you are expecting on the machine.

Judging by the types of errors in the same range/family as FFE, it looks like a hardware fault. My first inclination would be to open it up, make sure all connections are firmly seated, and blow out any excess dust. The description for FFE doesn't say that any particular part has failed, which usually points me toward loose connections.
 
Thanks. It was a hard disk crash! I heard a clicking noise whilst attempting to boot up, so I opened the front panel to check. The amber LED is not lit, but another small amber LED on the circuit board is on.

Got to get a replacement drive then. I found a boot tape made with smit mksysb, put it in, and got Base Operating System on the green monitor.

BTW - has this tape got the whole world on it? There are other tapes too - rootvg, applvg, and Sybase dumps - but they're all no use to me as I've got no O/S yet.

Anyway, I eventually got the Installation and Maintenance screen, with options:
(1) Start Install Now with Default Settings
(2) Change/Show Installation Settings and Install
(3) Start Maintenance Mode for System Recovery

Any ideas? I don't know how the box is set up, and I'm concerned that data might be spread out over several disks. I want to ensure existing data stays intact and is not overwritten by the tape. Shall I just pull the other disks out and leave only the new blank drive in? Do I need to worry about volume groups? I don't know much about them.

Need to get it up and running before planning to move everything off to another box.

I'm a developer with mainly Solaris background and Sybase, never got this involved with an AIX box, especially this old (1997-1999).
 
Volume groups are very important; yes, you need to pay attention to them.

Do you have a method of identifying which physical disks belong to which volume groups?

Try pulling the dead disk out, and powering up.

If the system still refuses to boot, the dead disk was probably a member of rootvg (operating system), and you'll have to go through a mksysb process to proceed with recovery.

If the system boots, the dead drive may have been a member of another volume group, or it may have been a mirror copy of rootvg; you'll just have to figure out what's present and what's missing to make that determination.
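If the box does come up, `lspv` lists each physical volume with its PVID and the volume group it belongs to, which is one way to see what's present and what's missing. Below is a minimal sketch, in Python, of sorting such a listing by volume group. The sample output, hdisk names and PVIDs are made up for illustration, and the exact column layout on your AIX level is an assumption.

```python
from collections import defaultdict

# Hypothetical sample of `lspv` output; the hdisk names and PVIDs are made up.
SAMPLE_LSPV = """\
hdisk0          000123456789abcd    rootvg
hdisk1          00012345aaaa1111    applvg
hdisk2          00012345bbbb2222    applvg
hdisk3          none                None
"""


def group_disks_by_vg(lspv_text):
    """Map each volume group name to the hdisks listed under it.

    Disks with no volume group (shown as "None") are collected under the key None.
    """
    groups = defaultdict(list)
    for line in lspv_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Assumed column order: disk name, PVID, volume group (extra columns ignored).
        disk, _pvid, vg = fields[:3]
        groups[None if vg == "None" else vg].append(disk)
    return dict(groups)


if __name__ == "__main__":
    for vg, disks in group_disks_by_vg(SAMPLE_LSPV).items():
        print(vg or "(no volume group)", "->", ", ".join(disks))
```

A volume group you expect (rootvg, applvg) that shows fewer disks than it should, or does not show at all, is a hint about where the dead drive belonged.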

If you have to do a mksysb recovery though, you're likely to have a problem: with rootvg missing a disk, you may not have enough disk space to recover the mksysb. If/when you get the disk replaced, the mksysb procedure should identify any existing/remaining rootvg disks, disks that belong to other VGs, and disks that do not belong to any VG (your replacement disk). You will have to make sure that the mksysb process knows which disks it can recover to.
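The disk space concern is simple arithmetic, sketched here in Python. The function name and the "required" figures are assumptions for illustration only; the real requirement depends on how large rootvg was when the backup was taken, which this thread does not state.

```python
def restore_shortfall_mb(required_mb, target_disks_mb):
    """Return how many MB are missing on the target disks (0 if they are big enough).

    required_mb     -- space rootvg needs, as recorded in the backup (assumed figure)
    target_disks_mb -- {disk label: size in MB} for the disks you allow the restore to use
    """
    available = sum(target_disks_mb.values())
    return max(0, required_mb - available)


if __name__ == "__main__":
    # Hypothetical case: rootvg needed 1500 MB, but only a 1.08GB replacement is offered.
    print(restore_shortfall_mb(1500, {"replacement": 1080}))
    # Same requirement with a 4.3GB disk offered as well.
    print(restore_shortfall_mb(1500, {"replacement": 1080, "second": 4300}))
```

This is only a rough check: it ignores LVM overhead, and offering a second disk to the restore means whatever was on that disk is overwritten.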
 
The RS/6000 has got 2 internal SCSIW disks: one is 1.08GB, the other 4.3GB.

There's also an external SCSI tower with 2 SCSIW disks.

I don't know what's on which disks. I just know that AIX 4.1 must be on the crashed 1.08GB disk, because the box isn't booting up, with or without it inserted. Thus, the small 1.08GB disk must be rootvg or part thereof. I don't know if any other disk(s) is/are part of rootvg too.

I assume the raw Sybase database device is on the fat 4.3GB disk. I've also got applvg on another tape, with Sybase files, the applications and user files, I guess. Once AIX is up and running, I can restore applvg from tape using smit, if necessary.

The AIX installation manual says there're 3 options for restore:
(1) Complete new/blank restore/installation overwriting/erasing everything
(2) Restore over existing AIX installation (overwriting AIX files but preserving others)
(3) Migration mode from AIX 3.2 up to and including AIX 4.1.0 ??

So, I think I will pull out the other disks, put in the new blank disk, and restore the smit mksysb boot tape (which I assume holds rootvg only) to it, then reboot and see what happens. This way, it won't destroy data on the other disks. And if any of them are needed, the BOS should complain? If all else fails, I can then reinsert the other disks and repeat the process.

Only thing is, the manual just talks about hdisk0. I'm not sure if I've seen that on the green screen.

How do I refer to the disks from the machine? I know that SCSI requires unique SCSI IDs. The internal SCSIW disks both have manufacturer's labels saying SCSI ID = 0. Is it that the circuit board has got 1 SCSI controller per socket, to enable them to be distinguished from each other? I can't see any SCSI ID dip switches or pins and jumper leads. Does it matter which disk goes into which socket? Or does AIX automatically recognise them wherever you put them? I have no idea!

I can't even begin to contemplate plugging them into another AIX box that I'm borrowing just to list the contents of the tapes. I don't want to have to reboot that other box and mess it up. Can the SCSI tower be plugged into a running AIX box? Or must the box be rebooted in order for me to read those disks? How do I read those disks to find out what's on them? What are they called? Etc.


 
Without having a map that tells exactly what physical disk is which, you've already worked out the best solution.

Remove all other disks from the system, plug in the new HD so it is the only disk on the system, and restore the mksysb tape onto it. After the OS is restored (and the ensuing LVM mess cleaned up), add the other disks back into the system and import the volume groups.
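For the last step, the command is `importvg -y <vgname> <hdisk>`, run once per volume group and pointed at any one disk that belongs to it. Here is a small Python sketch that just builds those command lines from a map of volume group to disk. The hdisk numbers and the `datavg` name are hypothetical; use whatever `lspv` reports once the disks are back in.

```python
def importvg_commands(vg_to_disk):
    """Build one `importvg -y <vg> <disk>` command line per non-rootvg volume group.

    vg_to_disk maps a volume group name to any one disk that belongs to it;
    importvg reads the rest of the group's membership from that disk.
    """
    return [
        f"importvg -y {vg} {disk}"
        for vg, disk in sorted(vg_to_disk.items())
        if vg != "rootvg"  # rootvg comes back via the mksysb restore, not importvg
    ]


if __name__ == "__main__":
    # Hypothetical layout; the disk names are assumptions, not taken from this machine.
    for cmd in importvg_commands({"applvg": "hdisk2", "datavg": "hdisk3", "rootvg": "hdisk0"}):
        print(cmd)
```

The sketch only prints the commands so they can be reviewed before anything is typed on the real box.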

Managing SCSI IDs will be up to you. The best recommendation is to set the replacement drive to the same SCSI ID as the dead drive; that way there should theoretically be no conflicts.
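To make the "no conflicts" rule concrete: every device on one SCSI bus, including the adapter itself, needs its own ID. A minimal Python sketch of that check follows. The device names are hypothetical, and the adapter sitting at ID 7 is an assumption (it is the usual default, but check your hardware).

```python
from collections import defaultdict

ADAPTER_ID = 7  # assumption: the host adapter uses ID 7, the usual default


def scsi_id_conflicts(devices, adapter_id=ADAPTER_ID):
    """Return {scsi_id: [device names]} for every ID claimed more than once on one bus.

    devices maps a device label to the SCSI ID it is set to.
    """
    by_id = defaultdict(list)
    by_id[adapter_id].append("adapter")
    for name, scsi_id in devices.items():
        by_id[scsi_id].append(name)
    return {i: names for i, names in by_id.items() if len(names) > 1}


if __name__ == "__main__":
    # Two disks both labelled ID 0 on the same bus would clash.
    print(scsi_id_conflicts({"disk_a": 0, "disk_b": 0, "tape": 4}))
```

An empty result means no two devices in the map share an ID. It says nothing about whether the drive bay or backplane overrides the ID printed on the label.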

It is generally a bad idea to hotplug SCSI hardware.
 