SCO reboots every two weeks 2

Status
Not open for further replies.

damccasl

Technical User
Oct 10, 2002
US
We have had an issue with a server that has recently begun spontaneously rebooting every two weeks. I cannot find anything in any logs to indicate why, and it does not appear to be shut down in a disorderly manner. The hardware is an HP TC4100, and we are running Informix OLS. We currently have 5 other machines out there with the same config and no issues with them. Any ideas?
 
If it comes back up clean I would suspect something in cron is doing it. You might want to look in all possible crontab scripts.

Ed Fair
Give the wrong symptoms, get the wrong solutions.
 
Ed,
Thanks for the reply...
Here are our current scheduled jobs
(sorry about the format):

#
17 5 * * 0 /etc/cleanup > /dev/null
0 2 * * 0,4 /usr/lib/cron/logchecker
0 1 * * * /usr/bin/calendar -
3 3 * * * /usr/lib/cleantmp > /dev/null
1 3 * * * /etc/setclk -rd1800 > /dev/null 2>&1
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A
0 4 * * 0 /etc/custom -V symlinks;# CUSTOM_SYMLINK_REPORT
0 0 * * 1-5 scosh cronsched -r
0 0 * * 1 scosh cronsched -wr
00,05,10,15,20,25,30,35,40,45,50,55 * * * * /users/uscss/bin/DailyEdi
30 23 * * * /users/sac500/sql/nocard
~
 
You probably have an email at root showing the autoboot times. Does anything in your crontab match the time it rebooted?

Does it reboot on the same day and at the same time, or do alternate reboots match in date and time? If it is cron related, there should be some matches.

Ed Fair
Give the wrong symptoms, get the wrong solutions.
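
Ed's question can be checked mechanically by testing each crontab entry against a reboot time. What follows is only a minimal sketch, not something posted in the thread: `cron_matches` is a hypothetical helper name, it assumes an awk is available, and it understands only `*`, single values, ranges, and comma lists. It also ignores cron's special rule for combining day-of-month with day-of-week, which does not matter for the entries above because they all use `*` for day-of-month. The time in the demo (02:40 on a Saturday, October 11) is a sample; substitute the autoboot time from root's mail.

```shell
#!/bin/sh
# cron_matches MIN HOUR DOM MON DOW
# Reads crontab lines on stdin and prints every entry whose five
# time fields all match the given moment. (Hypothetical helper.)
cron_matches() {
    awk -v t="$1 $2 $3 $4 $5" '
    # Return 1 if value v satisfies one cron field.
    function field_ok(field, v,    n, parts, i, r) {
        if (field == "*") return 1
        n = split(field, parts, ",")
        for (i = 1; i <= n; i++) {
            if (split(parts[i], r, "-") == 2) {
                # Range such as 1-5
                if (v >= r[1] + 0 && v <= r[2] + 0) return 1
            } else if (parts[i] + 0 == v) {
                # Single value; "+ 0" makes 05 compare equal to 5
                return 1
            }
        }
        return 0
    }
    BEGIN { split(t, now, " ") }
    /^[ \t]*#/ { next }              # skip comment lines
    NF >= 6 {
        for (i = 1; i <= 5; i++)
            if (!field_ok($i, now[i] + 0)) next
        print
    }'
}

# Demo with two of the entries quoted above; only DailyEdi runs at 02:40.
cron_matches 40 2 11 10 6 <<'EOF'
0 2 * * 0,4 /usr/lib/cron/logchecker
00,05,10,15,20,25,30,35,40,45,50,55 * * * * /users/uscss/bin/DailyEdi
EOF
```

On a live system the same function could be fed from `crontab -l`. An entry that matches a reboot time is a suspect, not proof; an empty result for every reboot time points away from cron.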
 
I went back and found what you suggested...so it appears that the shutdown is not a controlled one...
System autoboot at Sat Oct 11 02:41:15 MST 2003
Mounted /stand filesystem
fsstat: /dev/users okay
Mounted /users filesystem
fsstat: /dev/images okay
Mounted /images filesystem
fsstat: /dev/temp okay
Mounted /temp filesystem
fsstat: /dev/spool okay
Mounted /spool filesystem

/dev/tmp
HTFS File System: tmp Volume: tmp

** Phase 1 - Check Blocks and Sizes
DANGER: Filesystem being checked is larger than the device in which it is
stored (/dev/tmp). The filesystem is 4184901K while the
device is 4176900K. Backup filesystem and recreate as soon as possible.
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Synchronous Write Log
** Phase 6 - Check Free List Bitmap

CANNOT READ: BLK 8355260
CONTINUE? yes

THE FOLLOWING DISK SECTORS COULD NOT BE READ: 8355260, 8355261,
disk error reading logical block 4177630
Failed to mount /tmp filesystem

/dev/root
HTFS File System: Volume:

** Root file system
NO PARTIAL TRANSACTIONS PENDING
FILE SYSTEM STATE SET TO OKAY
34864 files 361724 blocks 15831749 free

*** ROOT FILE SYSTEM WAS MODIFIED ***

*** ROOT REMOUNTED MODIFIED ***

/dev/tmp
HTFS File System: tmp Volume: tmp

** Phase 1 - Check Blocks and Sizes
DANGER: Filesystem being checked is larger than the device in which it is
stored (/dev/tmp). The filesystem is 4184901K while the
device is 4176900K. Backup filesystem and recreate as soon as possible.
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Synchronous Write Log
** Phase 6 - Check Free List Bitmap

CANNOT READ: BLK 8355260
CONTINUE? yes

THE FOLLOWING DISK SECTORS COULD NOT BE READ: 8355260, 8355261,
disk error reading logical block 4177630
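
The numbers in the fsck output above can be cross-checked with a little arithmetic. The sketch below is an illustration only: `check_overhang` is a hypothetical name, and it assumes 512-byte sectors (two per 1K logical block), which is consistent with the log reporting sector 8355260 as logical block 4177630.

```shell
#!/bin/sh
# check_overhang FS_KB DEV_KB SECTOR
# Compares the filesystem size with the device size and reports where
# an unreadable sector falls. (Hypothetical helper.)
check_overhang() {
    fs_kb=$1; dev_kb=$2; sector=$3
    block_kb=$(( sector / 2 ))   # assumes 512-byte sectors: two per 1K block
    echo "filesystem exceeds device by $(( fs_kb - dev_kb ))K"
    echo "sector $sector is logical block $block_kb"
    # Device blocks are numbered 0 .. dev_kb-1
    if [ "$block_kb" -ge "$dev_kb" ]; then
        echo "that block lies past the end of the device"
    else
        echo "that block lies inside the device"
    fi
}

# Values taken from the fsck log above
check_overhang 4184901 4176900 8355260
```

With the logged values the filesystem overhangs the device by 8001K, and the unreadable block falls in that overhang. That suggests the read errors come from the size mismatch fsck warns about rather than from a bad spot on the disk, though the log alone does not prove it.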
 
What is the value of PANICBOOT in your /etc/default/boot file?

Hope This Help
PH.
 
#ScoAdminInit BOOTMNT {RO RW NO} RO
#
DEFBOOTSTR=hd(40)unix swap=hd(41) dump=hd(41) root=hd(42)
AUTOBOOT=YES
FSCKFIX=YES
MULTIUSER=YES
PANICBOOT=NO
MAPKEY=YES
SERIAL8=NO
 
This is now beyond me. But I would look in the /tmp filesystem to see what could be removed to get the usage under 90% or so. And I would monitor the filesystem sizes as time goes on until it crashes again.
I don't know the ramifications of something storing temp files until the filesystem is full, but it can't be healthy. There may also be a cron job that is failing to clear temporary files. You could change the reporting of the cleanup job to see if anything is coming up with an error.

Ed Fair
Give the wrong symptoms, get the wrong solutions.
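
Ed's suggestion to watch filesystem usage can be scripted. This is a rough sketch under stated assumptions: `fs_over` is a hypothetical helper, and it expects rows laid out like the `df -kvi` listing quoted later in this thread (mount point in column 1, `%used` in column 6). Other df variants order their columns differently and would need the field numbers changed.

```shell
#!/bin/sh
# fs_over LIMIT
# Reads "df -kvi"-style rows on stdin and prints the mount point and
# usage of every filesystem at or above LIMIT percent used.
# (Hypothetical helper; column positions are an assumption.)
fs_over() {
    awk -v limit="$1" '
    # Only data rows have a percentage in column 6; the header and
    # wrapped fragments are skipped.
    NF >= 6 && $6 ~ /^[0-9]+%$/ {
        pct = $6
        sub(/%/, "", pct)
        if (pct + 0 >= limit + 0) print $1, $6
    }'
}

# Demo with two rows from this thread and a 40% threshold
fs_over 40 <<'EOF'
/ /dev/root 16715847 873390 15842457 6% 34860 4144108 1%
/stand /dev/boot 20000 9244 10756 47% 16 4992 1%
EOF
```

Run from cron with a 90 threshold and mailed to root, a check like this would leave a record of how usage grew before the next reboot.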
 
Thanks for the additional replies guys...
I'm going to try to remove some items from /tmp to get it shrunk a bit. It seems kind of weird that Scoadmin says the /tmp FS isn't mounted but I can still use it. UNIX noob here.
 
What is the output of the
Code:
 df -kvi
command?
Seems that you have to recreate the tmp division with divvy.
 
I do believe you are right...how do I go about doing that?
Run divvy, delete it, and recreate it?

# df -kvi
Mount Dir  Filesystem   blocks    used    free      %used  iused  ifree    %iused
/          /dev/root    16715847  873390  15842457  6%     34860  4144108  1%
/stand     /dev/boot    20000     9244    10756     47%    16     4992     1%
/users     /dev/users   4184901   300373  3884528   8%     782    1045450  1%
/images    /dev/images  4184901   193898  3991003   5%     275    1045957  1%
/temp      /dev/temp    4176900   131074  4045826   4%     5      1044227  1%
/spool     /dev/spool   28161945  883456  27278489  4%     5      7040483  1%
 
On other filesystems in earlier versions, /tmp, if it were a separate filesystem, would have a device name like /dev/whatever and would mount at /tmp under root. That mount point is a directory that is there whether the filesystem is mounted or not.
So your /tmp is probably just a directory under the root. And if you were only putting things there as temporary storage, you wouldn't know there was a problem.
I would probably try to find the mount point from your Emergency boot disk set and mount it while booted from the floppy to do the cleanout, but you can run fsck on it from the floppy first to get it in order.

This isn't to tell you how, just some explanation that you may be able to use. Good luck.

Ed Fair
Give the wrong symptoms, get the wrong solutions.
 
Do you really have two filesystems named tmp and temp of equal size (4176900K)?
The output of df shows that /dev/tmp is NOT mounted, so the actual data in /tmp is in the root FS.
If your tmp FS is on the same disk as the root one, simply run
Code:
 divvy
without args and create a new FS with the number corresponding to the tmp name, then quit and install.
Otherwise do a
Code:
 man divvy
and a
Code:
 man HW hd

Hope This Help
PH.
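
PH's reading of the df listing, that /tmp is absent and so not mounted, can be made into an explicit check. The sketch below is illustrative only: `is_mounted` is a hypothetical helper that looks for an exact match in the first column of df-style rows. The exact match matters here because /tmp and /temp differ by a single letter.

```shell
#!/bin/sh
# is_mounted MOUNTPOINT
# Reads df-style rows on stdin (mount point in column 1) and exits 0
# if MOUNTPOINT is listed, 1 otherwise. (Hypothetical helper.)
is_mounted() {
    awk -v mp="$1" '
    $1 == mp { found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Rows copied from the df -kvi output earlier in the thread
rows='/ /dev/root 16715847 873390 15842457 6% 34860 4144108 1%
/temp /dev/temp 4176900 131074 4045826 4% 5 1044227 1%'

if printf '%s\n' "$rows" | is_mounted /tmp; then
    echo "/tmp is a mounted filesystem"
else
    echo "/tmp is not mounted; files written there land on the root filesystem"
fi
```

On the live system the rows would come from `df -kvi` directly. With the listing from this thread, /temp is found and /tmp is not.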
 
PHV,
I don't want to disagree, but he evidently has a /tmp on the hard drive that isn't mounting because it is too full. Once he clears the error it should mount correctly. The problem is going to be cleaning it out.
Mount is reporting it, df isn't reporting it. So I would suggest creating it again with the start and end at the same places, and as a new filesystem so it formats.
Then reboot and see what happens.

Ed Fair
Give the wrong symptoms, get the wrong solutions.
 
This kind of rebooting can also be caused by a problem in the SMPS (switched-mode power supply). If the power-good line is not giving the correct output, the system reboots, depending on the BIOS setting for rebooting.

Check that also.

Besides, his root is only 6% used:

-----------------------------------------------------
Mount Dir  Filesystem  blocks    used    free      %used  iused  ifree    %iused
/          /dev/root   16715847  873390  15842457  6%     34860  4144108  1%
-----------------------------------------------------
So even if /dev/tmp is not mounting, would it make a difference? Any guesses?

[ponder]
--------------------
ur feedback is a welcome desire
 
Is this machine plugged into a UPS? I have had APC UPS units reboot like this without warning. I've seen it happen every two weeks when the UPS runs a self-check. The problem is the battery is dead and the UPS tries to run from the battery which causes the power to drop off momentarily.

In fact, I've had this happen so many times with APC units that I will not be buying their equipment again.

-Jeff
 
Thanks for all of the help guys...I believe the rebooting was in fact the UPS, as the site guys confirmed that the batteries are indeed bad. We are also going to repair the /temp as time permits.
Thanks again
 