updated fixes to 5.3, now the system won't boot

Status
Not open for further replies.

peterbokunet

Technical User
Apr 8, 2003
67
US
I put a fresh build of 5.3 on a 43P (TCP05287) and all went well. I downloaded the latest fixes and that went well. It appeared to bosboot just fine, so I rebooted. Now it won't boot. I'll get the:

-------------------------------------------------------------------------------
Welcome to AIX.
boot image timestamp: 04:58 06/29
The current time and date: 14:14:33 06/29/2007
number of processors: 1 size of memory: 256MB
boot device: /pci@80000000/scsi@10/sd@0:2
kernel size: 10941966; 32 bit kernel
-------------------------------------------------------------------------------

and the next thing that happens is the firmware banner. This just loops.

No errors appear on the serial console, and I can't see the system codes because the machine is in another state. I'm working on coordinating an onsite tech, but I'm not sure what my next step should be once he's in front of the machine.

Any ideas?
 
You could always boot off the CD and import rootvg to check things out.

did you verify you had the latest firmware for the host?
 
Try booting into maintenance mode, or boot from installation media into repair mode, and see if anything obvious shows up.

By obvious, I mean I would first check the following:
full filesystems
full volume group
damaged filesystems
not enough paging space
anything in the errpt
too much software starting up for the memory size
format errors in inittab
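
A minimal sketch for the "full filesystems" item on the list above, assuming the AIX df -k column layout in which %Used is the fourth field. The function name flag_full_filesystems and the default threshold of 90 are illustrative choices, not anything that ships with AIX.

```shell
# Read `df -k` output on stdin and print every filesystem whose %Used
# is at or above the given threshold (default 90).
# Assumption: %Used is field 4, as in AIX `df -k` output.
flag_full_filesystems() {
    limit="${1:-90}"
    awk -v limit="$limit" '
        $4 ~ /^[0-9]+%$/ {            # skips the header and rows such as /proc
            used = $4
            sub(/%$/, "", used)
            if (used + 0 >= limit + 0) print $1, $4, $NF
        }
    '
}
```

From the maintenance shell it could be used as: df -k | flag_full_filesystems 85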
 
I thought the TCP05287 was the latest firmware -- it was required to get the system to load 5.3 initially. I'll check this again though.

When eyes/hands become available I'll verify the obvious:
- see if there's an error code just as it resets
- boot from CD in single user, mount rootvg:
- look at the filesystem for signs of errors
or other abnormalities
- check/increase paging space

...now if the onsite would just ring me back.
 
It would be much easier if you could provide the LED!

Regards,
Khalid
 
I agree, but the onsite guy wasn't overly helpful.

It would be helpful if the LEDs displayed on the console once the firmware posts and hands off to the kernel.

I'm going to drive there on Monday and I'll watch the LEDs. The onsite guy had trouble reading the numbers flipping by, though he was able to get me the E1DC. He was able to get CD 1 and a fresh tape in their respective drives. I have the system booted single-user using the CDROM.

I didn't see anything significant in the config, space looked fine and so forth. I opted to back the system up to a fresh tape at this point and that is still verifying. I'll at least have the user data backed up.

 
The system's last post appears to be 0517. If memory serves, this is a volume group code. As the system has been using just one drive, I'd presume it shouldn't be a quorum issue. This means it must come just as it's about to mount /.

Looking around on hd4, I don't see any apparent reason why root wouldn't mount. From single-user off the CDROM:

Code:
# df -k
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/ram0           65536     46632   29%     1484     5% /
/dev/cd0            65536     46632   29%     1484     5% /SPOT
/dev/hd4            65536     46632   29%     1484     5% /
/dev/hd2          4063232    271844   94%    30981     4% /usr
/dev/hd3            65536     46724   29%       92     1% /tmp
/dev/hd9var         65536     49176   25%      390     3% /var
/dev/hd1            65536     63404    4%       18     1% /home
/proc                   -         -    -         -     -  /proc
/dev/hd10opt       196608     24892   88%     6900    15% /opt
# lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     1     1    closed/syncd  N/A
hd6                 paging     16    16    1    open/syncd    N/A
hd8                 jfslog     1     1     1    open/syncd    N/A
hd4                 jfs        1     1     1    open/syncd    /
hd2                 jfs        62    62    1    open/syncd    /usr
hd9var              jfs        1     1     1    open/syncd    /var
hd3                 jfs        1     1     1    open/syncd    /tmp
hd1                 jfs        1     1     1    open/syncd    /home
hd10opt             jfs        3     3     1    open/syncd    /opt
#

I'm looking in the smit.log of the SUMA (Easy/Download All Latest Fixes) task. Nothing in it points to any problem during the update.

I guess I could look for what caused it -- but is there a good command beyond fsck to check the sanity of the volume group or make it think it's happy?


 
You can only run 5.3 on some of the 43Ps ... what model is it?
 
It's a 43P-150. I guess I never questioned its ability to run 5.3. I guess I'll need to do some research...
 

You have the latest firmware and AIX 5.3 support was added in the previous level so firmware is not the issue.

LED 517: Mounting client remote file system during network IPL.

So the hdisk boot is failing and it is looking for some other device to boot from, tries the network, fails and then loops back through the boot list.

Service mode boot from CD and import rootvg, then try lppchk -v, -c, -l, etc.
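
As a rough sketch of that last step, assuming rootvg has already been imported and its filesystems are mounted from the maintenance shell, the checks can be run in one pass. The wrapper name run_lppchk_all is made up for illustration; the four flags are the standard lppchk ones.

```shell
# Run each lppchk consistency check in turn and label its output.
# -v: fileset version consistency, -c: checksums,
# -l: symbolic links, -f: file existence and size.
run_lppchk_all() {
    for flag in -v -c -l -f; do
        echo "== lppchk $flag =="
        lppchk "$flag" 2>&1 || echo "(lppchk $flag exited non-zero)"
    done
}
```

Calling run_lppchk_all with no arguments checks every installed fileset, so on a slow machine the -c pass in particular can take a while.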
 
Similar thing happened to a couple of my servers a while back while upgrading from 5300-02 to 5300-05 if I'm not mistaken.

Some filesets were left "Broken" after an ML upgrade, and the servers were unable to reboot (they didn't even write a boot image correctly anymore).

I had to resort to re-installing the OS. Nearly pulled an all-nighter on that occasion.


HTH,

p5wizard
 
Code:
# lppchk -l
# lppchk -c
lppchk:  File /home/guest could not be located.
lppchk:  File /home/lost+found could not be located.
# lppchk -v
lppchk:  The following filesets need to be installed or corrected to bring
         the system to a consistent state:

  csm.core 1.6.0.0                        (not installed; requisite fileset)

# lppchk -f
lppchk:  File /home/guest could not be located.
lppchk:  File /home/lost+found could not be located.
# lslpp -l | grep csm.core
  csm.core                  1.4.1.10  COMMITTED  Cluster Systems Management
  csm.core                  1.4.1.10  COMMITTED  Cluster Systems Management
#
Okay, that was interesting and potentially fruitful. No, this isn't a part of a cluster, but it did report an anomaly. Trying to installp the 1.6.0.11 that's on disk, it fails pre-requisites including:
bos.mp 5.3.0.60 # Fileset Update
bos.mp 5.3.0.61 # Fileset Update
bos.mp 5.3.0.50 # Fileset Update
bos.mp 5.3.0.40 # Fileset Update
bos.sysmgt.serv_aid 5.3.0.50 # Fileset Update
bos.sysmgt.serv_aid 5.3.0.60 # Fileset Update
bos.sysmgt.serv_aid 5.3.0.61 # Fileset Update
bos.sysmgt.trace 5.3.0.50 # Fileset Update
bos.sysmgt.trace 5.3.0.60 # Fileset Update

None of these were included in the SUMA update. The 'mp' ones for seemingly obvious reasons.
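
For anyone repeating this check, here is a small sketch that pulls the flagged filesets out of lppchk -v output like the one above. It assumes the "(not installed; ...)" wording shown in that output; the function name list_missing_filesets is hypothetical.

```shell
# Read `lppchk -v` output on stdin and print "fileset level" for each
# entry reported as not installed.
# Assumption: such entries carry the text "(not installed", as in the
# output quoted in this thread.
list_missing_filesets() {
    awk '/\(not installed/ { print $1, $2 }'
}
```

Typical use would be: lppchk -v 2>&1 | list_missing_filesets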

I may just wipe the system and start fresh.

p5wizard: when you re-installed, did you again attempt to pull the system(s) up to ML05?

 
I reinstalled with more recent media, it was installed at 5300-05, and I only needed a CSP after that.


HTH,

p5wizard
 