server crashes between 3am and 5am every morning

Status
Not open for further replies.

linuxMaestro

Instructor
Jan 12, 2004
183
US
My server crashes between 3am and 5am every morning. The error log shows the following right before the crash:

Dec 5 04:43:06 server101 kernel:
Dec 5 04:43:06 server101 kernel: Free pages: 5496kB (3264kB HighMem)
Dec 5 04:43:06 server101 kernel: Active:23703 inactive:2330 dirty:4 writeback:5 unstable:0 free:1374 slab:5324 mapped:20824 pagetables:1491
Dec 5 04:43:06 server101 kernel: DMA free:240kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Dec 5 04:43:06 server101 kernel: protections[]: 8 476 540
Dec 5 04:43:06 server101 kernel: Normal free:1992kB min:936kB low:1872kB high:2808kB active:464kB inactive:464kB present:901120kB
Dec 5 04:43:06 server101 kernel: protections[]: 0 468 532
Dec 5 04:43:06 server101 kernel: HighMem free:3264kB min:128kB low:256kB high:384kB active:94348kB inactive:8856kB present:114624kB
Dec 5 04:43:06 server101 kernel: protections[]: 0 0 64
Dec 5 04:43:06 server101 kernel: DMA: 60*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 240kB
Dec 5 04:43:06 server101 kernel: Normal: 494*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1992kB
Dec 5 04:43:06 server101 kernel: HighMem: 380*4kB 134*8kB 28*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3264kB
Dec 5 04:43:06 server101 kernel: Swap cache: add 1037089, delete 1026852, find 824732/1012991, race 3+2429
Dec 5 04:43:06 server101 kernel: Out of Memory: Killed process 30651 (exim).
Dec 5 04:43:06 server101 kernel: oom-killer: gfp_mask=0xd0
Dec 5 04:43:06 server101 kernel: DMA per-cpu:
Dec 5 04:43:06 server101 kernel: cpu 0 hot: low 2, high 6, batch 1
Dec 5 04:43:06 server101 kernel: cpu 0 cold: low 0, high 2, batch 1
Dec 5 04:43:06 server101 kernel: cpu 1 hot: low 2, high 6, batch 1
Dec 5 04:43:06 server101 kernel: cpu 1 cold: low 0, high 2, batch 1
Dec 5 04:43:06 server101 kernel: Normal per-cpu:
Dec 5 04:43:06 server101 kernel: cpu 0 hot: low 32, high 96, batch 16
Dec 5 04:43:06 server101 kernel: cpu 0 cold: low 0, high 32, batch 16
Dec 5 04:43:06 server101 kernel: cpu 1 hot: low 32, high 96, batch 16
Dec 5 04:43:06 server101 kernel: cpu 1 cold: low 0, high 32, batch 16
Dec 5 04:43:06 server101 kernel: HighMem per-cpu:
Dec 5 04:43:06 server101 kernel: cpu 0 hot: low 12, high 36, batch 6
Dec 5 04:43:06 server101 kernel: cpu 0 cold: low 0, high 12, batch 6
Dec 5 04:43:06 server101 kernel: cpu 1 hot: low 12, high 36, batch 6
Dec 5 04:43:06 server101 kernel: cpu 1 cold: low 0, high 12, batch 6
Dec 5 04:43:06 server101 kernel:
Dec 5 04:43:06 server101 kernel: Free pages: 4312kB (2160kB HighMem)
Dec 5 04:43:06 server101 kernel: Active:25095 inactive:1164 dirty:0 writeback:32 unstable:0 free:1078 slab:5359 mapped:21698 pagetables:1523
Dec 5 04:43:06 server101 kernel: DMA free:240kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Dec 5 04:43:06 server101 kernel: protections[]: 8 476 540
Dec 5 04:43:06 server101 kernel: Normal free:1912kB min:936kB low:1872kB high:2808kB active:396kB inactive:524kB present:901120kB
Dec 5 04:43:06 server101 kernel: protections[]: 0 468 532
Dec 5 04:43:06 server101 kernel: HighMem free:2160kB min:128kB low:256kB high:384kB active:99984kB inactive:4132kB present:114624kB
Dec 5 04:43:06 server101 kernel: protections[]: 0 0 64
Dec 5 04:43:06 server101 kernel: DMA: 60*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 240kB
Dec 5 04:43:06 server101 kernel: Normal: 462*4kB 8*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1912kB
Dec 5 04:43:06 server101 kernel: HighMem: 392*4kB 44*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2160kB
Dec 5 04:43:06 server101 kernel: Swap cache: add 1037572, delete 1027816, find 824888/1013359, race 3+2429
Dec 5 04:43:06 server101 kernel: Out of Memory: Killed process 30672 (exim).

What could be causing this?

 
Cron starts running the scripts in /etc/cron.daily at around 4am. Start looking there.
 
How do I tell what time these run?
root@server101 [/etc/cron.daily]# ll
total 52
drwxr-xr-x 2 root root 4096 Nov 28 23:41 ./
drwxr-xr-x 52 root root 12288 Dec 5 15:04 ../
lrwxrwxrwx 1 root root 28 Oct 17 04:20 00-logwatch -> ../log.d/scripts/logwatch.pl*
-rwxr-xr-x 1 root root 418 Nov 18 15:49 00-makewhatis.cron*
-rwxr-xr-x 1 root root 276 Feb 15 2004 0anacron*
-rwxr-xr-x 1 root root 57 Nov 28 23:41 fw*
-rwxr-xr-x 1 root root 180 Feb 15 2004 logrotate*
-rwxr-xr-x 1 root root 1603 May 5 2004 prelink*
-rwxr-xr-x 1 root root 104 Apr 16 2004 rpm*
-rwxr-xr-x 1 root root 82 Apr 16 2004 slocate.cron*
-rwxr-xr-x 1 root root 193 Feb 15 2004 tmpwatch*
-rwxr-xr-x 1 root root 136 May 11 2004 yum.cron*
 
Run crontab -l as root.
What version of exim are you running, and is this a Red Hat ES/AS machine? Maybe a Dell?
 
I am running Fedora.
# crontab -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (- installed on Sun Dec 5 15:34:01 2004)
# (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $)
44 4 * * * /usr/local/cpanel/3rdparty/interchange/bin/expireall -r
* * * * * /root/pkill/master >/dev/null 2>&1
16 0 * * * /usr/local/bin/rkhunter -c --nocolors --cronjob --report-mode --createlogfile --skip-keypress --quiet

2,58 * * * * /usr/local/bandmin/bandmin
0 0 * * * /usr/local/bandmin/ipaddrmap
5 0 * * * /scripts/upcp
*/15 * * * * /usr/local/cpanel/whostmgr/bin/dnsqueue > /dev/null 2>&1
*/5 * * * * /usr/local/cpanel/bin/dcpumon >/dev/null 2>&1
0 6 * * * /scripts/exim_tidydb > /dev/null 2>&1
 
The first column is minutes, and so on.
I don't get the second line, though; I've never seen it before and just don't grok it.

You have a lot of jobs running here relatively frequently. The out-of-memory errors look like the result of exim, or exim-related jobs, being scheduled out and misbehaving, perhaps in conjunction with these many cron jobs, some of which are pretty CPU- and I/O-intensive. That is not a great mix with any kernel.
However, your box shouldn't just die. More memory and more swap should fix this IMHO, but a rethink of the jobs wouldn't hurt either.
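To put numbers on how little memory was left, the zone lines from the log at the top of the thread can be tallied. This is only a rough sketch in Python: the regular expressions are my own guess at the line layout based on the pasted log, and the names parse_zones and parse_kills are made up for illustration.

```python
import re

# Lines copied verbatim from the first report in the log above.
LOG = """\
Dec 5 04:43:06 server101 kernel: Free pages: 5496kB (3264kB HighMem)
Dec 5 04:43:06 server101 kernel: DMA free:240kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Dec 5 04:43:06 server101 kernel: Normal free:1992kB min:936kB low:1872kB high:2808kB active:464kB inactive:464kB present:901120kB
Dec 5 04:43:06 server101 kernel: HighMem free:3264kB min:128kB low:256kB high:384kB active:94348kB inactive:8856kB present:114624kB
Dec 5 04:43:06 server101 kernel: Out of Memory: Killed process 30651 (exim).
"""

# Assumed layout of a per-zone summary line and of an OOM kill line.
ZONE_RE = re.compile(
    r"(\w+) free:(\d+)kB min:(\d+)kB low:(\d+)kB high:(\d+)kB .*?present:(\d+)kB"
)
KILL_RE = re.compile(r"Out of Memory: Killed process (\d+) \((.+?)\)")


def parse_zones(text):
    """Return {zone: {"free", "min", "low", "high", "present"}} in kB."""
    keys = ("free", "min", "low", "high", "present")
    zones = {}
    for match in ZONE_RE.finditer(text):
        name, *numbers = match.groups()
        zones[name] = dict(zip(keys, map(int, numbers)))
    return zones


def parse_kills(text):
    """Return a list of (pid, process name) for each OOM kill line."""
    return [(int(pid), name) for pid, name in KILL_RE.findall(text)]


zones = parse_zones(LOG)
total_free = sum(z["free"] for z in zones.values())
total_present = sum(z["present"] for z in zones.values())

print("free kB:", total_free)        # 5496, matching the "Free pages" line
print("present kB:", total_present)  # 1032128, roughly 1 GB of RAM
print("free share: {:.2f}%".format(100 * total_free / total_present))
print("killed:", parse_kills(LOG))
```

The per-zone free values add up to the "Free pages" total the kernel printed, so the box had well under 1% of its RAM free when exim was killed.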
 
I think the second line is just an advisory in the sense that the recommended method of editing a crontab is to edit a copy then replace the existing with the copy.

# crontab -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (- installed on Sun Dec 5 15:34:01 2004)
# (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $)
44 4 * * * /usr/local/cpanel/3rdparty/interchange/bin/expireall -r
* * * * * /root/pkill/master >/dev/null 2>&1
16 0 * * * /usr/local/bin/rkhunter -c --nocolors --cronjob --report-mode --createlogfile --skip-keypress --quiet
Is this a real blank line? If so, delete it.
2,58 * * * * /usr/local/bandmin/bandmin
0 0 * * * /usr/local/bandmin/ipaddrmap
5 0 * * * /scripts/upcp
*/15 * * * * /usr/local/cpanel/whostmgr/bin/dnsqueue > /dev/null 2>&1
*/5 * * * * /usr/local/cpanel/bin/dcpumon >/dev/null 2>&1
0 6 * * * /scripts/exim_tidydb > /dev/null 2>&1

HTH.
 
"cat /etc/crontab" should show when the cron directories are run. Mine looks like this:

# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly

Yes, the "second line" is an advisory. If you're using personal crontabs, then the way to edit them is with "crontab -e".
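If you would rather not read the fields by eye, here is a small Python sketch that pulls the run-parts lines out of a system crontab and reports when each directory starts. It assumes the /etc/crontab layout quoted above (five time fields, a user, then the command); the function names are made up for illustration, and the sample text is the listing from this post, not necessarily what is on server101.

```python
# The run-parts section quoted in the post above.
SYSTEM_CRONTAB = """\
# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
"""


def run_parts_schedule(text):
    """Map each run-parts directory to its (minute, hour, dom, month, dow)."""
    schedule = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue  # e.g. SHELL=... or PATH=... settings
        minute, hour, dom, month, dow, _user, command = fields
        parts = command.split()
        if parts[0] == "run-parts":
            schedule[parts[-1]] = (minute, hour, dom, month, dow)
    return schedule


def clock(minute, hour):
    """Format hour:minute, zero-padding numbers and leaving wildcards alone."""
    def pad(value):
        return value.zfill(2) if value.isdigit() else value
    return pad(hour) + ":" + pad(minute)


for directory, (minute, hour, dom, month, dow) in run_parts_schedule(SYSTEM_CRONTAB).items():
    print(directory, "starts at", clock(minute, hour))
```

With the listing above, /etc/cron.daily comes out at 04:02, which is inside the window where the crashes happen.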
 
No, I was talking about the second job line, guys.
The advisory line is standard for crontab.
This:
Code:
* * * * * /root/pkill/master  >/dev/null 2>&1
I'm not sure when this will run.
 
The star is a wildcard meaning "any time". When you see:

* * * * *

this means any minute, any hour, any day of the month, any month, and any day of the week.

Since cron "wakes up" once per minute, that script will be run once per minute.

Invoke
man 5 crontab
from a command prompt for more information.
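To make the wildcard concrete, here is a rough Python sketch that expands the minute and hour fields and counts how often a job fires per day. It handles only the forms that appear in this thread (a star, a step such as */15, a list such as 2,58, a range, and plain numbers), and it assumes the day-of-month, month and day-of-week fields are all stars, as they are in the root crontab posted earlier. The function names are made up for illustration; this is not how crond itself is implemented.

```python
def expand(field, lo, hi):
    """Expand one crontab field into the sorted list of values it matches."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return sorted(values)


def runs_per_day(entry):
    """Count daily runs, assuming the last three time fields are all '*'."""
    minute, hour = entry.split()[:2]
    return len(expand(minute, 0, 59)) * len(expand(hour, 0, 23))


# Entries taken from the root crontab posted earlier in the thread.
ENTRIES = [
    "44 4 * * * /usr/local/cpanel/3rdparty/interchange/bin/expireall -r",
    "* * * * * /root/pkill/master >/dev/null 2>&1",
    "2,58 * * * * /usr/local/bandmin/bandmin",
    "*/15 * * * * /usr/local/cpanel/whostmgr/bin/dnsqueue > /dev/null 2>&1",
    "*/5 * * * * /usr/local/cpanel/bin/dcpumon >/dev/null 2>&1",
]

for entry in ENTRIES:
    command = entry.split()[5]
    print(runs_per_day(entry), "runs/day:", command)
```

The all-stars line works out to 60 minutes times 24 hours, so /root/pkill/master is started 1440 times a day.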



Want the best answers? Ask the best questions!

TANSTAAFL!!
 
Other than that pkill line (which I haven't seen before, but my guess is that it kills some process, and if it has trouble you will never know because of the redirection to /dev/null) and the "expireall" job, which seems to run just after your log snippet, the rest looks OK.

It sort of looks like a messy way of cleaning up dead processes.
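If you want to see whether that job is failing, one option is to log its output instead of throwing it away. Below is a minimal Python sketch of a wrapper that runs a command and appends the exit status and any output to a log file. The name run_and_log and the log location are made up for illustration, and the demo runs a stand-in child process because nobody here knows what /root/pkill/master actually does.

```python
import datetime
import subprocess
import sys
import tempfile


def run_and_log(command, log_path):
    """Run command and append its exit status and combined output to log_path."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so nothing is lost
        text=True,
    )
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a") as log:
        log.write(f"{stamp} exit={result.returncode} cmd={' '.join(command)}\n")
        if result.stdout:
            log.write(result.stdout.rstrip("\n") + "\n")
    return result.returncode


# Demo: a stand-in child process that complains on stderr and exits non-zero.
demo_log = tempfile.NamedTemporaryFile(suffix=".log", delete=False).name
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom\\n'); sys.exit(3)"]
status = run_and_log(failing, demo_log)

print("exit status:", status)
with open(demo_log) as log:
    print(log.read())
```

With something like this in place of the >/dev/null 2>&1 redirection, a job that runs every minute leaves a trail you can check against the crash times. Keep an eye on the log size, though, since it grows with every run.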

Also make sure that the cleaning staff isn't plugging their vacuum into your UPS at that time.
 
<facetious>
I've found that floor buffers are way better at browning out or tripping the breaker on an electrical circuit than vacuum cleaners.
</facetious>



 
cPanel is a notorious resource monger. I used to have resource issues all the time on RH boxes at a 'well known hosting provider' where I worked.
Thanks for the explanation of the cron line, all.

I wouldn't trust a default wakeup line like that, though. What if someone hacked the vixie cron code to gain finer granularity and made the wait interval less than 1 minute, or used a different crond that scheduled every second?
That would really suck.
 