SCO 5.0.5 and Fork failed msg 4

Status
Not open for further replies.

jasboy

IS-IT--Management
Jan 9, 2003
28
0
0
US
Hi,

I'm fairly new to Unix and could use some help.

Our SCO box runs our library software (Unicorn), which holds all our patron info, all the books and movies, etc. Basically, it's the lifeblood of our library. About a week ago we started getting this message:

Fork Failed: Command (mail)
Possibly running out of swap
Use /etc/swap to add or list
available space

System error was: Resource temporarily unavailable


After running /etc/swap -l I get this:

/dev/swap dev swaplo blocks free
1,41 0 409600 409600


This system has run for the most part error free since May of 2000.

I have SMTP turned off (due to relay problems a while ago), but still use POP to email daily reports to 4 people.

I guess it could be just what it says and I need to add more swap space (although I'm not 100% sure how to do it safely). Or maybe sectors have gone bad on the hard drive? Is there a scandisk-like tool for SCO?

Thanks for any help

Jasboy
 
How much RAM do you have?
Have you tried rebooting (init 6)?

Hope This Help
PH.
 
PHV,

We have rebooted a number of times.

Sorry I didn't post my specs earlier: P2 500 MHz, 256 MB RAM and plenty of hard drive space.


Thanks
 
Anyway, you have about 205 MB of swap for 256 MB of RAM, so the system will never dump on panic.
You may add RAM in the hope that the system will then never need the swap area.
Or you may add a swap file (see man swap, -a option).
Or you may reinstall OSR with a better divvy layout.

Hope This Help
PH.
 
PHV,

What do you mean by "system will never dump on panic"?

I just wonder why this just started? There is no way I want to reinstall, besides I couldn't even if I wanted to. That would mean we would have to shut down the library for a couple of days. Maybe the best option would be adding a swap file and hope the problem goes away.

This server is going to be replaced in a couple of months because Sirsi (our library software) no longer supports SCO. So we just need a couple more months of life from this server. But these errors end up shutting down some of our library services and we have to restart them. It's just becoming a pain.

Thanks for help,

jasboy
 
/dev/swap dev swaplo blocks free
1,41 0 409600 409600

This swap -l output indicates that none of your swap space on disk is used, so I don't think you have a shortage, and adding more swap would not make a difference.

Have you checked space on the filesystems, and whether they have enough inodes free, using df?

How many processes are running? Use ps -e | wc -l to find out.

Annihilannic.
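
For anyone wanting to repeat the swap check with their own numbers, here is a minimal sketch of the arithmetic. It assumes that swap -l reports 512-byte blocks (check man swap on your release), and swap_summary is a hypothetical helper name, not a SCO utility.

```shell
# swap_summary BLOCKS FREE
# Hypothetical helper: summarise the "blocks" and "free" columns
# from one line of `swap -l` output.
# Assumption: each block is 512 bytes.
swap_summary() {
    blocks=$1
    free=$2
    used=$((blocks - free))
    total_mb=$((blocks * 512 / 1024 / 1024))
    used_pct=$((used * 100 / blocks))
    echo "total=${total_mb}MB used_blocks=${used} used=${used_pct}%"
}
```

Called as swap_summary 409600 409600 with the figures posted above, it reports 0 used blocks, which is why more swap is unlikely to help here.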
 
Annihilannic,

Here are my results from ps and df. Also, df -v shows that we have plenty of space on the hard drive. Could adding more memory possibly help?


# df -i
Mount Dir  Filesystem     iused    ifree   itotal  %iused
/          /dev/root      63579   448421   512000     13%
/stand     /dev/boot         17    11503    11520      1%
/s         /dev/sirsi     39081  1459183  1498264      3%
/u         /dev/user          6   758778   758784      1%
/v         /dev/hyperion    594  1172910  1173504      1%

# ps -e | wc -l
156
#
 
What's the output of sar -r 5 10? (It'll take 50 seconds to generate...)

Annihilannic.
 
Here's the output.

# sar -r 5 10

SCO_SV spllibsv 3.2v5.0.5 i80386 10/08/2003

13:59:59 freemem freeswp availrmem availsmem (-r)
14:00:04 39029 4096000 60432 549194
14:00:09 39034 4096000 60426 549087
14:00:14 38867 4096000 60429 549035
14:00:19 38968 4096000 60429 549035
14:00:24 38979 4096000 60429 549035
14:00:29 38979 4096000 60429 549035
14:00:34 38981 4096000 60431 549067
14:00:39 38937 4096000 60431 549066
14:00:44 39005 4096000 60431 549066
14:00:49 39005 4096000 60431 549066

Average 38979 4096000 60430 549069


Thanks for the help!

jasboy
 
Hmm, I'm running out of ideas. Sounds like you have plenty of memory free too.

Is CPU usage fine too? sar 5 10 to see that.

How about ipcs?

Annihilannic.
 
Everything seems to look OK (from what I can tell, not knowing much about Unix). I just hope we can at least limp along until we get our new server.


# sar 5 10

SCO_SV spllibsv 3.2v5.0.5 i80386 10/08/2003

17:03:51 %usr %sys %wio %idle (-u)
17:03:56 0 1 0 99
17:04:01 0 1 1 98
17:04:06 1 0 14 85
17:04:11 1 1 9 89
17:04:16 0 0 0 100
17:04:21 0 0 1 99
17:04:26 1 2 33 64
17:04:31 6 4 85 5
17:04:36 13 11 37 39
17:04:41 12 3 18 67

Average 4 2 20 74
# ipcs
IPC status from /dev/kmem as of Wed Oct 8 17:09:12 2003
T ID KEY MODE OWNER GROUP
Message Queues:
q 500 0x000005dc -Rrw------- sirsi staff
q 501 0x000005dd -Rrw------- sirsi staff
q 502 0x00000000 -Rrw------- sirsi staff
q 22503 0x00000601 --rw------- sirsi staff
q 4 0x00000000 --rw------- sirsi staff
q 6005 0x00000000 -Rrw------- sirsi staff
q 6006 0x00000000 -Rrw------- sirsi staff
q 6007 0x00000000 -Rrw------- sirsi staff
q 6008 0x00000000 -Rrw------- sirsi staff
q 6009 0x00000000 -Rrw------- sirsi staff
q 6010 0x00000000 -Rrw------- sirsi staff
q 6011 0x00000000 -Rrw------- sirsi staff
q 9012 0x00000000 -Rrw------- sirsi staff
q 7013 0x00000000 -Rrw------- sirsi staff
q 6014 0x00000000 -Rrw------- sirsi staff
q 15015 0x00000000 -Rrw------- sirsi staff
q 6516 0x00000000 -Rrw------- sirsi staff
q 6517 0x00000000 -Rrw------- sirsi staff
q 6018 0x00000000 -Rrw------- sirsi staff
q 6019 0x00000000 -Rrw------- sirsi staff
q 6020 0x00000000 -Rrw------- sirsi staff
q 22021 0x0000064d --rw------- sirsi staff
q 17022 0x00000000 --rw------- sirsi staff
q 6523 0x00000000 -Rrw------- sirsi staff
q 6524 0x00000000 -Rrw------- sirsi staff
q 6025 0x00000000 -Rrw------- sirsi staff
q 6026 0x00000000 -Rrw------- sirsi staff
q 6027 0x00000000 -Rrw------- sirsi staff
q 6028 0x00000000 -Rrw------- sirsi staff
q 6029 0x00000000 -Rrw------- sirsi staff
q 10530 0x00000000 -Rrw------- sirsi staff
q 26031 0x00000000 -Rrw------- sirsi staff
q 20032 0x00000000 -Rrw------- sirsi staff
q 6033 0x00000600 --rw------- sirsi staff
q 17034 0x00000604 --rw------- sirsi staff
q 11535 0x00000603 --rw------- sirsi staff
q 27036 0x00000602 --rw------- sirsi staff
q 37 0x000005fe --rw------- sirsi staff
q 14038 0x00000632 --rw------- sirsi staff
q 11039 0x000005ff --rw------- sirsi staff
q 21540 0x0000064e --rw------- sirsi staff
q 31541 0x00000624 --rw------- sirsi staff
q 10542 0x00000000 --rw------- sirsi staff
q 7543 0x0000064f --rw------- sirsi staff
q 30544 0x0000062b --rw------- sirsi staff
q 21045 0x00000605 --rw------- sirsi staff
q 25046 0x00000000 --rw------- sirsi staff
q 22547 0x00000653 --rw------- sirsi staff
q 2048 0x00000651 --rw------- sirsi staff
q 27049 0x00000000 --rw------- sirsi staff
q 13050 0x00000660 --rw------- sirsi staff
q 10551 0x00000000 --rw------- sirsi staff
q 32552 0x00000000 --rw------- sirsi staff
q 10553 0x00000000 --rw------- sirsi staff
q 21554 0x00000000 --rw------- sirsi staff
q 15055 0x00000657 --rw------- sirsi staff
q 23056 0x00000000 --rw------- sirsi staff
q 17557 0x00000000 --rw------- sirsi staff
q 23558 0x00000000 --rw------- sirsi staff
q 6559 0x00000000 --rw------- sirsi staff
q 29560 0x00000000 --rw------- sirsi staff
q 9061 0x00000000 --rw------- sirsi staff
q 21062 0x00000000 --rw------- sirsi staff
q 32063 0x00000000 --rw------- sirsi staff
q 1565 0x00000000 -Rrw------- sirsi staff
q 28566 0x00000000 -Rrw------- sirsi staff
q 31067 0x000005e9 --rw------- sirsi staff
q 2068 0x00000000 -Rrw------- sirsi staff
q 6569 0x00000650 --rw------- sirsi staff
q 20070 0x000005e2 --rw------- sirsi staff
q 15071 0x00000000 -Rrw------- sirsi staff
q 23072 0x00000000 -Rrw------- sirsi staff
q 25073 0x00000000 -Rrw------- sirsi staff
q 29574 0x000005eb --rw------- sirsi staff
q 1575 0x00000656 --rw------- sirsi staff
q 11076 0x00000000 --rw------- sirsi staff
q 17077 0x00000661 --rw------- sirsi staff
q 29078 0x00000000 -Rrw------- sirsi staff
q 24579 0x00000662 --rw------- sirsi staff
q 21580 0x00000000 -Rrw------- sirsi staff
q 31081 0x00000658 --rw------- sirsi staff
q 22582 0x00000000 -Rrw------- sirsi staff
q 19083 0x00000663 --rw------- sirsi staff
q 29084 0x00000000 --rw------- sirsi staff
q 27085 0x00000000 -Rrw------- sirsi staff
q 18086 0x00000664 --rw------- sirsi staff
q 23087 0x00000000 -Rrw------- sirsi staff
q 3588 0x00000665 --rw------- sirsi staff
q 32089 0x00000000 --rw------- sirsi staff
q 18090 0x000005e1 --rw------- sirsi staff
q 20091 0x00000000 -Rrw------- sirsi staff
q 21592 0x00000655 --rw------- sirsi staff
Shared Memory:
m 0 0x000018e5 --rw-rw-rw- root sys
Semaphores:
s 0 0x002ac048 --ra------- root sys
s 1 0x000018e5 --ra-ra-ra- root sys
#

Thanks for the help!

jasboy
 
Could it be that the RAM has developed a problem? If you can reboot the system, run an extensive RAM check with a utility such as Norton.

The normal boot-time POST check is not extensive.

By the way, what is meant by "availrmem availsmem (-r)" in the sar -r output?

-------------------

ur feed back is a welcome desire
 
jasboy, there seem to be an awful lot of message queues there.

If you stop your library software, do they all go away?

You may want to monitor the number of these used through the day and see if peaks correspond with the times when you run into this problem. I'm not sure what the maximum number available is (probably the value of kernel parameter MSGMAP, which defaults to 512).

If restarting the library application fixes the problem you may consider adding a cron scheduled job to restart it every night.

Annihilannic.
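
To monitor the number of message queues through the day, something like the following rough sketch could be run from cron. count_msg_queues is a hypothetical helper name; it only assumes that each message queue line in the ipcs output starts with a "q" field, as in the listing posted above.

```shell
# count_msg_queues: read `ipcs` output on stdin and print how many
# message queue entries it lists (lines whose first field is "q").
count_msg_queues() {
    awk '$1 == "q" { n++ } END { print n + 0 }'
}
```

Usage would be along the lines of ipcs | count_msg_queues, with the result appended to a log file of your choosing together with the output of date, so that peaks can be matched against the times the fork errors appear.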
 
Well, this system gets used pretty much 24/7 so it's hard to do much testing (shutting off library software) on this system. I guess I will try to run some of those commands when we get those errors. Also, we are going to replace and upgrade our RAM, just to rule that out.


Thanks for all your help!


jasboy
 
You may have a RAM problem, but you don't need to add any.
You have more than 128 MB of your RAM available (39000 x 4K pages).

Q. Is everyone logged in as the same user (sirsi)?
(What is the value of NPROC in your /etc/conf/cf.d/stune file?)
Q. What changed "a week or so ago"?

Q. What are the values of SEMMNI SEMMNU SEMMSL SEMMNS?
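
The free memory estimate above can be reproduced with a one-line sketch. It assumes 4 KB pages, as stated in the post; pages_to_mb is a hypothetical helper name.

```shell
# pages_to_mb PAGES: convert a sar -r freemem figure (in pages)
# to whole megabytes, assuming 4 KB pages.
pages_to_mb() {
    echo $(( $1 * 4 / 1024 ))
}
```

Feeding it the freemem average from the sar -r output (pages_to_mb 38979) gives a figure comfortably above 128 MB, which supports the point that adding RAM is not needed.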
 
Here is the stune file.

TERM = (ansi)
Terminal type is ansi
# cat /etc/conf/cf.d/stune
NODE "spllibsv"
NSPTTYS 32
EVQUEUES 40
EVDEVS 48
NUMSP 256
NSTREAM 1152
NHINODE 1024
TTHOG 4096
NCLIST 512
NSTREVENT 2304
NUMTIM 304
NUMTRW 304
SECLUID 0
SECSTOPIO 1
SECCLEARID 1
MSGTQL 1500
MSGSSZ 64
MSGMNI 500
#


I'm not quite sure how to answer the question about everyone logging in as the same user. I guess I would have to assume they do. We have client-side software that we point at the IP address of the SCO server. The Sirsi software is installed on the SCO box, so when they log in they are actually logging into the Sirsi software (circ, reftech, etc.). There are only 3 users on the SCO box: root, sirsi, and mmdf. We are always logged in as sirsi.

Two different times this year (March 30, and August 1) we got this error:
eeE: Watchdog ReInitMemory 6 for board 0
WARNING: eeE: Allocb failure in ReInitMemory

After rebooting the machine the error went away and we haven't had a problem since, until now.

A week or so ago, Sirsi (the company) logged into our SCO box and installed the SIPP2 protocol (an upgrade from SIPP1 with added features for self checks, etc.), and we immediately started having these problems. So they uninstalled SIPP2 and we still have the problem. They claim it has nothing to do with the installation of SIPP2 and that it is a "system problem", so we're SOL. It may be a system problem and have nothing to do with SIPP2, but maybe that was the final straw?

I have no idea what "SEMMNI SEMMNU SEMMSL SEMMNS" are, but I will try to look them up in my book.

Again, thank you all for the help!

jasboy
 
Those watchdog reinit errors are being reported by the eeE driver, which is for the Intel EtherExpress network cards.

We had similar problems which were resolved by upgrading to version 5.0.5f of the driver.

They used to be here, but... SCO/Caldera's FTP site isn't what it used to be, I find.

ftp://ftp.caldera.com/pub/openserver5/drivers/OSR505/network/

Alternatively try Intel's site, you'll usually find more up-to-date drivers there anyway:


Annihilannic.
 
Try to increase the MAXUP tunable parameter:
Code:
scoadmin -> Hardware/Kernel Manager -> Tune Parameters...
7 User and group configuration
You have to relink the kernel after the change.

Hope This Help
PH.
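
Before raising MAXUP, it may be worth checking how close a single user actually gets to the per-user process limit. MAXUP caps the number of processes a non-root user can own, and since nearly everything here runs as sirsi, that user is the one to watch. A minimal sketch, assuming ps -ef prints a header line followed by one line per process with the user name in the first column (procs_per_user is a hypothetical helper name):

```shell
# procs_per_user: read `ps -ef` output on stdin and print
# "count user" for each user, busiest first.
# Assumption: line 1 is the header, field 1 is the user name.
procs_per_user() {
    awk 'NR > 1 { n[$1]++ } END { for (u in n) print n[u], u }' | sort -rn
}
```

Running ps -ef | procs_per_user at the time the "Fork failed" message appears and comparing the top count with the current MAXUP value should show whether that limit is the one being hit.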
 