BOSBOOT: Error in STRIP

Status
Not open for further replies.

szewczykm

MIS
Jul 31, 2002
37
US
This is an extension of my "Broke my 7013" thread. I've done so many things to try to fix this system, short of a MKSYSB restore (it's old and management really doesn't want to do that).

Recap: I installed some drivers and now my system gets to

{299}

and then gives me a flashing 888 102 700 0c8.

No, it doesn't get any further than 299.

So I've restored just the /usr/lib/boot directory, which restored the kernel, etc. I still fail on boot. After another reboot, I did a "bosboot -a -d /dev/hdisk0" and, like all the other times I ran bosboot, got this in the middle:

STRIP: illegal option --K

then it continues and tells me the size of the boot image in 512 blocks. There's no hard error. No "FAILURE! DON'T REBOOT UNTIL YOU FIX THE BOSBOOT PROBLEM!!!"

It finished normally. I'm wondering if this is my problem. This was an initial suspect, but not as strong as others. Having eliminated the other suspects, I'm now onto this.

I've found the source for BOSBOOT (egad!) and sifted through it. Of course, there are some strip commands inside it. I'm wondering if it's creating the boot image OK, but what's in the boot image is messed up because something wasn't stripped.

I don't think it's a coincidence that Kernel starts with a "K". I'm wondering if my BOSBOOT command is screwy.

I'm restoring it from my old MKSYSB tape now to try it out, but in the mean time:

Does anyone know of any issues with BOSBOOT on AIX 4.2.0.0? Are there other files that BOSBOOT uses that may be screwing up the command? I'm certain my syntax is correct (I've seen it in a zillion different places) and I'm certain that hdisk0 is the correct device.
 
Use `lslv -m hd5` or `lslv -l hd5` and pass bosboot the disk listed first (on top). For example:

(spcws4)/spdata/sys1/install/#lslv -l hd5
hd5:N/A
PV COPIES IN BAND DISTRIBUTION
hdisk0 001:000:000 100% 001:000:000:000:000
hdisk1 001:000:000 100% 001:000:000:000:000

(spcws4)/spdata/sys1/install/#lslv -m hd5
hd5:N/A
LP PP1 PV1 PP2 PV2 PP3 PV3
0001 0001 hdisk0 0001 hdisk1
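
As a rough sketch of "use the disk listed first", the function below pulls the first PV name out of `lslv -l hd5` style output. The name `first_boot_pv` is made up for illustration, and the parsing assumes the two header lines shown above; on a live system you would pipe `lslv -l hd5` into it instead of the sample text.

```shell
# first_boot_pv: print the first physical volume listed in `lslv -l` output.
# Assumes the layout shown above: an "hd5:N/A" line, a column-header line,
# then one line per PV. The function name is illustrative, not an AIX tool.
first_boot_pv() {
    awk 'NR > 2 && NF > 0 { print $1; exit }'
}

# Sample input copied from the output above.
# On AIX the equivalent would be: lslv -l hd5 | first_boot_pv
sample='hd5:N/A
PV COPIES IN BAND DISTRIBUTION
hdisk0 001:000:000 100% 001:000:000:000:000
hdisk1 001:000:000 100% 001:000:000:000:000'

boot_pv=$(printf '%s\n' "$sample" | first_boot_pv)
echo "bosboot target would be /dev/$boot_pv"
```

Deriving the device name from the command output, rather than typing it, also avoids slips in the disk name.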
 
From the man page for bosboot:

-k Allows to specify an alternate kernel file. If this flag is not specified,
/unix is the default.

I am wondering if you don't have a /unix file so it cannot build the image.
 
I'm certain that I have a /unix file. I've tried using the -k to specify the unix file, to make sure it wasn't missing anything.

But it's not BOSBOOT that's throwing the error. It's STRIP which is called in the BOSBOOT routine:

# Function state_func4
state_func4 () {
    #
    # Strip required libraries
    #
    trap 'error_func 0' 1 2 15
    cwd=`pwd`
    cd /tmp
    #
    # Housekeep and strip libraries.
    #
    for blibs in libodm.a liblvm.a libcfg.a libbsd.a libsrc.a librpcsvc.a libs.a
    do
        $rm -f $blibs
        $ar x /usr/lib/$blibs shr.o
        $strip shr.o >/dev/null
        $ar cq $blibs shr.o
        $rm -f shr.o
    done
    cd $cwd
} # End of state_func4

It does this in a couple of different places. Obviously bosboot has run before. I wonder if there's a problem in maintenance mode, or because I've booted from CD, or something else that's keeping a variable from being set correctly. There's no doubt that the command is bosboot -a -d /dev/hdisko.

It doesn't end with errors, except for this STRIP error in the middle. It's quite odd.
 
There would not be any problem that I know of creating a new image from maintenance mode, but I never use -k because I just let it default, which is /unix.

I assume it is a typo where you listed the command as `bosboot -ad /dev/hdisko` (you have the letter "oh", but it should be a zero): `bosboot -ad /dev/hdisk0`

A bosboot more than likely was performed when you installed the filesets for SSA, and that probably damaged the boot image.
 
From the man page for strip:

-- (Double hyphen) Interprets all arguments following this flag as file names.
This allows you to strip files whose names start with a hyphen.

I am wondering if the "-- K" you get comes from the behavior described in the man page excerpt above, with strip treating K as a file name. What syntax did you use?
 
Try this:

rmlv hd5
chpv -c hdisk#            --> for ALL disks; -c clears the boot record of the given PV
mkboot -cd /dev/hdisk#    --> use this if the above (chpv) command fails
mklv -y hd5 -t boot -a e rootvg 1 hdisk#
bosboot -ad /dev/hdisk#
sync;sync;sync
reboot
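
Since these steps are destructive, one cautious approach is to print the exact sequence for a given disk and review it before typing anything. The helper below is a sketch only: `blv_rebuild_plan` is a hypothetical name, it prints the commands from the list above without running them, and it covers a single disk (the `chpv -c` step above is meant for all disks).

```shell
# blv_rebuild_plan: print (do not run) the BLV rebuild sequence for one disk.
# Hypothetical helper; the commands themselves are the ones listed above.
blv_rebuild_plan() {
    disk=$1
    case $disk in
        hdisk[0-9]*) ;;      # accept hdisk0, hdisk1, ...
        *) echo "usage: blv_rebuild_plan hdiskN" >&2; return 1 ;;
    esac
    cat <<EOF
rmlv hd5
chpv -c $disk
mklv -y hd5 -t boot -a e rootvg 1 $disk
bosboot -ad /dev/$disk
sync;sync;sync
reboot
EOF
}

blv_rebuild_plan hdisk0
```

The name check rejects anything that is not `hdisk` followed by a digit, so a mistyped name such as `hdisko` (letter "oh") is refused instead of being passed along.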
 
OK, thanks much for the suggestions. So far I still haven't had success.

I tried both suggestions and it all worked as advertised, but I still get the 888 102 700 0c8 right after the {299}.

Here is exactly what happens:

#bosboot -a -d /dev/hdisk0
strip: illegal option -- K
strip: -- Usage: strip [-V] {-l[-r|-x]|-r|-t|-x|-H} File ...

bosboot: Boot image is 5571 512 byte blocks.
#
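
For comparing image sizes between runs, the reported block count can be converted to bytes. This is a small sketch that parses the message text copied from the output above; the field positions are an assumption based on that one line.

```shell
# Convert the block count that bosboot reports into bytes.
# The message text is copied from the output above.
msg='bosboot: Boot image is 5571 512 byte blocks.'

# Fields: $5 is the block count, $6 is the block size in bytes.
blocks=$(printf '%s\n' "$msg" | awk '{ print $5 }')
bsize=$(printf '%s\n' "$msg" | awk '{ print $6 }')
bytes=$((blocks * bsize))
echo "$bytes bytes"
```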

There are pauses between the bosboot -a, etc and the strip: message. Also, there's a long pause between the strip: message and the bosboot: message.

It's actually doing something...

Can you provide me with the "set" list for your root account? Maybe there's an environmental variable that isn't being set in maint mode?

There are a few things that don't quite make sense. If I boot from tape, I can't mount the CDROM. If I boot from CDROM, the /dev/cd0 now appears. I mount it to the /SPOT directory and the /SPOT directory is empty...

My guess is that there are some things that need to be defined correctly for bosboot to work as advertised. Using maintenance mode isn't doing it.

Is there a way to load the root .profile after I've logged on?

I finally got smit working by adding an export statement. Just setting TERM=ibm3151 didn't work; I had to export TERM=ibm3151. No one was able to give me this info until I stumbled across it while looking for something else.

I'm wondering if the exact same thing is going on here with the bosboot command: some variable isn't being set or exported, so the shell environment isn't complete enough for bosboot to work correctly.
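
The set-versus-export difference, and loading a profile into an already running shell, can be shown in a few lines. This is a generic POSIX shell sketch, not anything specific to maintenance mode: `DEMO_TERM` and the temporary profile file are stand-ins invented for the demonstration.

```shell
# Stand-in for a profile file; the path and variable name are hypothetical.
profile=$(mktemp)
cat > "$profile" <<'EOF'
DEMO_TERM=ibm3151
EOF

unset DEMO_TERM
. "$profile"                                    # "dot" command: run the file in the current shell
before=$(sh -c 'echo "${DEMO_TERM:-not set}"')  # set but not exported: child cannot see it

export DEMO_TERM
after=$(sh -c 'echo "${DEMO_TERM:-not set}"')   # exported: child inherits it

echo "before export: $before"
echo "after export:  $after"
rm -f "$profile"
```

Programs such as smit run as child processes, so they only see variables that have been exported. The same "dot" command applied to root's profile would load it after login.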

Any ideas on that?

BTW: Thanks so much for your attention to this so far.
 
Not fixed yet, but I did a cat bosboot and looked at the one actually running on my system, not one I found online. Lo and behold:

strip -Kernel $skernel > /dev/null

So, strip -Kernel is interpreted as strip -K.

Now, in what world is it ok to use strip -Kernel and in what world isn't it OK?
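
One way to see why `-Kernel` gets reported as `K`: with standard single-letter option parsing, the letters after one hyphen are read one at a time, so the first letter that is not an accepted flag gets rejected. The sketch below uses the shell's `getopts` with the flag letters from the usage message above. `parse_flags` is a made-up stand-in, not the real strip, and it says nothing about which strip versions might accept such an argument.

```shell
# parse_flags: hypothetical stand-in for a command that accepts only the
# single-letter flags from the usage message above (V, l, r, x, t, H).
parse_flags() {
    OPTIND=1
    while getopts ":VlrxtH" opt "$@"; do
        case $opt in
            \?) echo "illegal option -- $OPTARG"; return 1 ;;
        esac
    done
    echo "ok"
}

parse_flags -Kernel /unix || true   # letters are read one at a time; K is rejected first
parse_flags -x /unix                # a flag that is in the accepted list
parse_flags -- -Kernel              # after "--" everything is treated as a file name
```

The last call matches the double-hyphen behavior quoted from the strip man page earlier in the thread.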
 
I went into the command, removed the -Kernel, and now the boot image is smaller when it's done. Something is being stripped.

The command ran without error.

I rebooted and got the same error.

So I'm going to try AIXSPAdmin's suggestion again (I'm assuming it's killing all boot info and boot devices and rebuilding them) but following it up with my modified bosboot.

If this doesn't work, I think I'm at the end of that path. I haven't found anything yet that says 4.2.0.0 bosboot is faulty. Does anyone know of a repository of release notes for AIX?

Can I look up all the bug fixes between versions?

I mean, it had to work at least once right? They installed the OS and the drivers at least one time and made a boot image. It can't be completely broken, can it?
 
I believe this is your problem:

#bosboot -a -d /dev/hdisk0
strip: illegal option -- K

And in my earlier post from the man page for strip:

-- (Double hyphen) Interprets all arguments following this flag as file names.
This allows you to strip files whose names start with a hyphen.

So it seems that when you do the bosboot it is interpreting the "K" as a filename following -- although you just want to create a new image from /unix the default.

Do try to remove hd5, the boot logical volume, and recreate it. The chpv -c will clear a boot record from the PV if one exists, which should be done.
 
Here's another clue. I may have to drop this path I'm on for something else because I'm not making any progress here.

I did a shutdown and at the end of the shutdown I got the same 888 error.

No parameters. Just "shutdown".

Can you help me understand the significance of the:

/dev/hd1....6.. etc..

/dev/hdisk0.....3 etc..

/dev/rhd1.... etc

/dev/rhdisk??

Is it possible that the place I'm trying to write the boot image is damaged? If I do a "mount" I see /dev/hd1 {2..etc} mounted to different directories. /dev/hd5 is mounted to nothing. Can I do an "fsck" on hd5 somehow? Is it possible that I've got a corrupt disk?

I have 4 physical scsi disks in my 7013. None are flashing defunct. But maybe I've gotten corruption on one of the disks that isn't allowing the boot image to be written correctly?

I'm rebooting now so I'll check SMIT when it comes back online. I'll check DIAG as well. But how does "hd5" relate to hdisk0 ?

Is the hdisk# the physical device and the hd# a logical volume (partition?) on the device?

Any tips for going through the devices looking for something that may be messing with my boot partition? Maybe I should move hd5 off to some other disk?
 
hd5 is the boot logical volume (BLV) and is on any disk that is bootable. If you have rootvg on hdisk0 only, then hd5 is only on hdisk0; but if rootvg is mirrored to, say, hdisk1, then you have 2 BLVs, one each on hdisk0 and hdisk1. The LV has no mount point because it is only used for booting, and it has a state of closed/syncd.

If you do an lslv -l hd5 or lslv -m hd5 then you will see which disks are bootable, for example on one of my nodes:
(x984)/usr/lpp/xlC/lib#lslv -l hd5
hd5:N/A
PV COPIES IN BAND DISTRIBUTION
hdisk0 001:000:000 100% 001:000:000:000:000
hdisk1 001:000:000 100% 001:000:000:000:000
(x984)/usr/lpp/xlC/lib#lslv -m hd5
hd5:N/A
LP PP1 PV1 PP2 PV2 PP3 PV3
0001 0001 hdisk0 0001 hdisk1

In the case above, hdisk1 is the mirrored copy. If you are booting off hdisk0 first, then you can break the mirror and remove hdisk0 and only have hdisk1 to boot from. If you do that you need to clear the boot copy from hdisk0 with `chpv -c hdisk0`. But since it is mirrored, whatever happened on hdisk0 probably happened to hdisk1. Have you tried to remove and make hd5 using the steps I outlined above?

I don't know what you mean with "/dev/hd1....6.. etc"; what is the ...6? hd1 is the logical volume for /home.
The rhdisk special file provides raw I/O access and control functions to physical-disk device drivers for physical disks. The /dev/hdisk block special files are reserved for system use in managing file systems, paging devices and logical volumes.
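
To summarize the naming in one place, here is a small sketch that classifies a device name by its prefix: `hdisk#` is a physical disk, `hd#` is a logical volume, and a leading `r` is the raw (character) device for the same object. `classify_dev` is a hypothetical helper built from the naming convention only; it does not query the system.

```shell
# classify_dev: rough mapping from an AIX device name to what it refers to,
# based only on the naming convention. Hypothetical helper, not an AIX tool.
classify_dev() {
    name=${1#/dev/}                 # drop a leading /dev/ if present
    case $name in
        rhdisk[0-9]*) echo "physical disk, raw (character) access" ;;
        hdisk[0-9]*)  echo "physical disk, block access" ;;
        rhd[0-9]*)    echo "logical volume, raw (character) access" ;;
        hd[0-9]*)     echo "logical volume" ;;
        *)            echo "unknown" ;;
    esac
}

for d in /dev/hdisk0 /dev/rhdisk0 /dev/hd5 /dev/rhd1; do
    printf '%s: %s\n' "$d" "$(classify_dev "$d")"
done
```

The longer prefixes are tested first so that `rhdisk0` is not mistaken for a logical volume.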
 
I have tried your suggestion of recreating the BLV. I get the same results. I can destroy and recreate hd5 with no errors but I still get the same 888 on bootup.

This is really killing me. I'm beginning to wonder if this isn't a hardware problem after all. Nothing on bootup identifies a hardware problem.

I can read and write to hdisk0 with no problem. (BTW, it's the only one in the "lslv -l hd5" list.)

The other disks (hdisk1, 2, and 3) all have logical volumes (lv1, lv2, lv3). It looks as if everything resides on hdisk0. I'm wondering if maybe hdisk1-3 are empty.

I tried determining that through SMIT but haven't been able to as of yet. Maybe I can move hd5 out to another disk?

I'm really fried on this project. (VENT) I'm a Windows/Intel guy. I've been off my "regular" assignment for over two weeks because I've got some Unix experience. Don't get me wrong, this has been an excellent learning experience, but I don't know what else to do.

You've got to remember, there's an application on this machine, and data that goes with it. That's what we're trying to get to. I'd love to wipe it out and start over, but then my next problem will be restoring the darned application and data. There are even fewer people using that application than there are on AIX 4.2. I'll have no help at all on that one.

So what the heck do I do?

It's an RS-6000 Model 7013. PON tests all execute fine. Once we get to {299} I hit an 888 102 700 0c8.

We've rebuilt the boot image, the BLV. I've done a ton of different things but that damn thing won't boot!!!!!

How do I fix this thing!?


(End Vent)

 
I would say it goes back to when the drivers were loaded for SSA. SSA needs to have the correct microcode level or you will have problems.

You can only have hd5 on a disk that is mirrored for rootvg.

The 888-102-700-0c8 is usually a kernel panic or trap.

If your data resides on another volume group (for example, nothing of yours is on rootvg and the apps are on, say, datavg), then reload the operating system and import the data volume group. The level of AIX you are on is not supported any longer by IBM, so you cannot even get help from them for this problem. I would do the new OS load.
 
How do I tell if there is mirroring going on with the other hdisk's?

I found:

Error Description

InfoExplorer article "How to Mirror rootvg for Maximum Operating
System Availability" should advise the user that whenever the
boot lv hd5 is updated it is the users responsibility to update
the secondary boot lv (hd5x). This is done via bosboot, and
will need to be done any time hd5 is rebuilt using bosboot.

And this is specific to AIX 4.2.0

Problem Summary

The bosboot command doesn't allow a user to specify a target
disk where a mirror of the boot logical volume resides. In
other words, if hd5 is the boot logical volume, and hd5
resides on hdisk0 and it is mirrored on hdisk1, the user can
specify hdisk0 to the bosboot command, but the bosboot
command will not allow the user to specify hdisk1, even
though a mirror of hd5 is on hdisk1.

 
Do an `lsvg -l rootvg`; the PPs column will be double the LPs column if the LV has 2 copies, or triple if it has 3 copies. This will tell you if it is mirrored.
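
The PP-to-LP comparison can be done mechanically. The sketch below divides the PPs column by the LPs column for each logical volume. The function name `lv_copies` is hypothetical, and the sample input is made up for illustration (it is not from the poster's system); it assumes the usual column order of LV NAME, TYPE, LPs, PPs.

```shell
# lv_copies: for each LV line in `lsvg -l` style output, print "name copies",
# where copies = PPs / LPs. Assumes columns: LV NAME, TYPE, LPs, PPs, ...
lv_copies() {
    awk 'NR > 2 && $3 > 0 { printf "%s %d\n", $1, $4 / $3 }'
}

# Made-up sample for illustration only (not taken from the poster's system).
# On AIX the equivalent would be: lsvg -l rootvg | lv_copies
sample='rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 2 2 closed/syncd N/A
hd1 jfs 4 8 2 open/syncd /home'

printf '%s\n' "$sample" | lv_copies
```

A result of 1 for every logical volume would mean rootvg is not mirrored.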
 
Hmmm... I decided to make some backups of my system in its current state. I told my management that I've lost hope of being able to fix the system without doing something that's initially destructive.

(System rebuild from tape, or re-install OS)

So I'm making a MKSYSB tape and a regular file level backup. During the MKSYSB I got the same "strip" error, which makes sense. It uses bosboot to make a boot image on the tape just like it does on the disk.

I'm going to reboot the system with the new mksysb tape. If I get the 888 error then I know it has nothing to do with my disk hardware at all and there's something messed up with my current environment.

So... that means that if I can use a different environment to create the boot image I may be able to save this thing. The question is, how?

Can I use the files on my working mksysb tape to make a boot image without restoring the tape over the current setup?

Can I do it with the CD I have? Use all those executables, kernel, etc, to make my boot image?

Or, I restored the /usr/lib/boot directory from my mksysb tape once. Maybe I missed another critical directory? Do you know what directories are used in making a boot image?

Where are all the drivers stored? Are there other components?
 
Re-install a current OS that is supported by IBM. It won't affect any data if that data is not on the root volume group (rootvg). It will only take a couple of hours to install the OS, which is a lot less than you have spent already trying to fix something you are not going to be able to fix, hence the reason for having a current backup; because there are those times when something severe happens and it cannot be repaired.
 