Red Hat Linux mysterious reboots

jlaw10

Technical User
Jul 28, 2005
54
US
A client had a 3-day span during which their Red Hat server rebooted several times a day. We did not see any cron jobs, or anything in the alert logs, that would have caused the reboots. The system logs are attached. The client is still seeking the cause of the reboots even though we have not seen that behavior again in the past week or so.

[root@rssvnap01 ~]# uname -a
Linux rssvnap01.maximus.com 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@rssvnap01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
[root@rssvnap01 ~]# cat /proc/meminfo
MemTotal: 8175332 kB
MemFree: 6887408 kB
Buffers: 44288 kB
Cached: 711204 kB
SwapCached: 0 kB
Active: 702008 kB
Inactive: 501236 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 8175332 kB
LowFree: 6887408 kB
SwapTotal: 8388600 kB
SwapFree: 8388600 kB
Dirty: 388 kB
Writeback: 0 kB
AnonPages: 447760 kB
Mapped: 75652 kB
Slab: 42048 kB
PageTables: 15820 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 12476264 kB
Committed_AS: 963976 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 267136 kB
VmallocChunk: 34359470839 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
[root@rssvnap01 ~]#
---------------------------------



rssvnap01 last message repeated 11 times
Jul 26 10:55:13 rssvnap01 last message repeated 11 times
Jul 26 10:56:19 rssvnap01 last message repeated 11 times
Jul 26 10:57:25 rssvnap01 last message repeated 11 times
Jul 26 10:58:31 rssvnap01 last message repeated 11 times
Jul 26 10:59:37 rssvnap01 last message repeated 11 times
Jul 26 11:00:43 rssvnap01 last message repeated 11 times
Jul 26 11:01:49 rssvnap01 last message repeated 11 times
Jul 26 11:02:50 rssvnap01 last message repeated 10 times
Jul 26 11:03:56 rssvnap01 last message repeated 11 times
Jul 26 11:05:02 rssvnap01 last message repeated 11 times
Jul 26 11:06:08 rssvnap01 last message repeated 11 times
Jul 26 11:07:14 rssvnap01 last message repeated 11 times
Jul 26 11:08:15 rssvnap01 last message repeated 10 times
Jul 26 11:09:21 rssvnap01 last message repeated 11 times
Jul 26 11:10:27 rssvnap01 last message repeated 11 times
Jul 26 11:11:33 rssvnap01 last message repeated 11 times
Jul 26 11:12:39 rssvnap01 last message repeated 11 times
Jul 26 11:13:40 rssvnap01 last message repeated 10 times
Jul 26 11:14:46 rssvnap01 last message repeated 11 times
Jul 26 11:15:52 rssvnap01 last message repeated 11 times
Jul 26 11:16:58 rssvnap01 last message repeated 11 times
Jul 26 11:18:04 rssvnap01 last message repeated 11 times
Jul 26 11:19:05 rssvnap01 last message repeated 10 times
Jul 26 11:20:11 rssvnap01 last message repeated 11 times
Jul 26 11:21:17 rssvnap01 last message repeated 11 times
Jul 26 11:22:23 rssvnap01 last message repeated 11 times
Jul 26 11:23:29 rssvnap01 last message repeated 11 times
Jul 26 11:24:35 rssvnap01 last message repeated 11 times
Jul 26 11:25:41 rssvnap01 last message repeated 11 times
Jul 26 11:26:47 rssvnap01 last message repeated 11 times
Jul 26 11:27:48 rssvnap01 last message repeated 10 times
Jul 26 11:28:49 rssvnap01 last message repeated 10 times
Jul 26 11:29:55 rssvnap01 last message repeated 11 times
Jul 26 11:31:01 rssvnap01 last message repeated 11 times
Jul 26 11:32:07 rssvnap01 last message repeated 11 times
Jul 26 11:33:13 rssvnap01 last message repeated 11 times
Jul 26 11:34:19 rssvnap01 last message repeated 11 times
Jul 26 11:35:20 rssvnap01 last message repeated 10 times
Jul 26 11:36:26 rssvnap01 last message repeated 11 times
Jul 26 11:37:32 rssvnap01 last message repeated 11 times
Jul 26 11:38:33 rssvnap01 last message repeated 10 times
Jul 26 11:39:39 rssvnap01 last message repeated 11 times
Jul 26 11:40:45 rssvnap01 last message repeated 11 times
Jul 26 11:40:57 rssvnap01 last message repeated 2 times
Jul 26 11:40:59 rssvnap01 shutdown[7542]: shutting down for system reboot
Jul 26 11:41:03 rssvnap01 kernel: sd 0:0:0:0: timing out command, waited 5s
Jul 26 11:41:09 rssvnap01 kernel: sd 0:0:0:0: timing out command, waited 5s
Jul 26 11:41:08 rssvnap01 smartd[5684]: smartd received signal 15: Terminated
Jul 26 11:41:09 rssvnap01 smartd[5684]: smartd is exiting (exit status 0)
Jul 26 11:41:15 rssvnap01 kernel: sd 0:0:0:0: timing out command, waited 5s
Jul 26 11:41:16 rssvnap01 avahi-daemon[3572]: Got SIGTERM, quitting.
Jul 26 11:41:16 rssvnap01 avahi-daemon[3572]: Leaving mDNS multicast group on interface eth0.IPv4 with address 10.1.229.71.
Jul 26 11:41:21 rssvnap01 kernel: sd 0:0:0:0: timing out command, waited 5s
Jul 26 11:41:22 rssvnap01 rhnsd[3544]: Exiting
Jul 26 11:41:27 rssvnap01 kernel: sd 0:0:0:0: timing out command, waited 5s
Jul 26 11:41:39 rssvnap01 last message repeated 2 times
Jul 26 11:42:00 rssvnap01 kernel: VMware memory control driver unloaded
Jul 26 11:42:00 rssvnap01 kernel: Removing vmci device
Jul 26 11:42:01 rssvnap01 kernel: Resetting vmci device
Jul 26 11:42:01 rssvnap01 kernel: Unregistered vmci device.
Jul 26 11:42:01 rssvnap01 kernel: ACPI: PCI interrupt for device 0000:00:07.7 disabled
Jul 26 11:42:06 rssvnap01 xinetd[3391]: Exiting...
Jul 26 11:42:12 rssvnap01 ntpd[3408]: ntpd exiting on signal 15
Jul 26 11:42:13 rssvnap01 nm-system-settings: disconnected from the system bus, exiting.
Jul 26 11:42:13 rssvnap01 kernel: nm-system-setti[6332]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007fffc96d7618 error 14
Jul 26 11:42:20 rssvnap01 rpc.statd[2674]: Caught signal 15, un-registering and exiting.
Jul 26 11:42:21 rssvnap01 portmap[8089]: connect from 127.0.0.1 to unset(status): request from unprivileged port
Jul 26 11:42:23 rssvnap01 restorecond: terminated
Jul 26 11:42:24 rssvnap01 auditd[2539]: The audit daemon is exiting.
Jul 26 11:42:24 rssvnap01 kernel: audit(1280158944.124:817): audit_pid=0 old=2539 by auid=4294967295 subj=system_u:system_r:auditd_t:s0
Jul 26 11:42:24 rssvnap01 pcscd: pcscdaemon.c:572:signal_trap() Preparing for suicide
Jul 26 11:42:24 rssvnap01 pcscd: hotplug_libusb.c:376:HPRescanUsbBus() Hotplug stopped
Jul 26 11:42:25 rssvnap01 pcscd: readerfactory.c:1379:RF
------------------------

 
Would a bad battery or a power failure produce the "shutting down for system reboot" message in the logs?
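
One way to approach this question: the `shutdown[7542]` entry is written by the shutdown program itself, so it points to an orderly, software-initiated reboot. An abrupt loss of power would normally leave no such entry, because nothing gets the chance to log. Below is a minimal Python sketch for pulling those entries out of a log so the reboot times can be compared against user activity. The function name `find_shutdown_events` is made up for illustration, and the regular expression assumes the classic syslog line format seen in the excerpt above.

```python
import re

# Assumed line format (taken from the log excerpt in this thread):
#   Jul 26 11:40:59 rssvnap01 shutdown[7542]: shutting down for system reboot
SHUTDOWN_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"shutdown\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def find_shutdown_events(lines):
    """Return (timestamp, pid, message) for each line logged by shutdown."""
    events = []
    for line in lines:
        match = SHUTDOWN_RE.match(line.strip())
        if match:
            events.append(
                (match.group("stamp"), int(match.group("pid")), match.group("msg"))
            )
    return events


# Three lines copied from the excerpt above, used here as sample input.
SAMPLE = [
    "Jul 26 11:40:57 rssvnap01 last message repeated 2 times",
    "Jul 26 11:40:59 rssvnap01 shutdown[7542]: shutting down for system reboot",
    "Jul 26 11:41:03 rssvnap01 kernel: sd 0:0:0:0: timing out command, waited 5s",
]

for stamp, pid, msg in find_shutdown_events(SAMPLE):
    print(f"{stamp}  pid={pid}  {msg}")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `find_shutdown_events(open("/var/log/messages"))`. The sketch only finds when a shutdown was requested; it does not tell you which user or process requested it.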
 
The first thing I thought of too was the power supply. The power supply undoubtedly contains monitors that check the voltage levels and, if one of them is out of line, cause the master reset line to be asserted. Another possibility is a failing switch on the case that makes momentary contact and triggers a reset signal (again, likely routed through the power supply).

Another option might be a fan failure and overheating causing a shutdown signal. Since it is Linux, the shutdown signal may not get interpreted as the manufacturer intended. For example, if on the console I type 'shutdown now' the system will reboot, but if I type 'shutdown now -P' it will shut down.

In any event, I suspect a hardware malfunction. You might want to browse in the pc hardware, general discussion forum for some similar occurrences.
 
I actually just got more info stating that this Linux server is running on a VMware cluster of six Dell R905s.

No other servers are having an issue. There are about 150 VM client servers on the cluster. They also moved the Linux server from one VM cluster mate to another but the reboots continued.

When the users stopped using the system the reboots stopped. I also instructed them to increase the log/audit level to capture more info during future crashes.
 
Honestly, the logs don't show anything obvious.
Based on my experience, it could be anything from a bad motherboard to bad RAM to a bad power supply (most likely).

Even if you measure the power supply voltages while the system is up and running, you won't learn much, because the output may have ripple that only causes problems under load.

See if you can set up a "load monitor" that plots load levels against uptime, and then have a look at it.
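
A rough sketch of such a load monitor, in Python, might look like the following. It assumes a Linux system where `/proc/loadavg` and `/proc/uptime` are readable; the function names and the CSV layout are made up for illustration. Each row is flushed to disk right away so that the last samples before an unexpected reboot are not lost.

```python
import csv
import os
import time


def parse_sample(loadavg_text, uptime_text):
    """Turn the raw contents of /proc/loadavg and /proc/uptime into numbers.

    Returns (uptime_seconds, load_1min, load_5min, load_15min).
    """
    load1, load5, load15 = (float(v) for v in loadavg_text.split()[:3])
    uptime_seconds = float(uptime_text.split()[0])
    return uptime_seconds, load1, load5, load15


def log_samples(csv_path, interval_seconds=10, count=None):
    """Append one row per sample to csv_path; count=None means run forever."""
    taken = 0
    with open(csv_path, "a", newline="") as out:
        writer = csv.writer(out)
        while count is None or taken < count:
            with open("/proc/loadavg") as f:
                loadavg_text = f.read()
            with open("/proc/uptime") as f:
                uptime_text = f.read()
            row = [int(time.time()), *parse_sample(loadavg_text, uptime_text)]
            writer.writerow(row)
            out.flush()
            os.fsync(out.fileno())  # push the row to disk before a possible reboot
            taken += 1
            if count is None or taken < count:
                time.sleep(interval_seconds)
```

Calling `log_samples("/var/tmp/load.csv")` (a hypothetical path) from a background job would build up a file that can be plotted later. A drop in the uptime column marks each reboot, and the load columns just before it show whether the machine was busy at the time.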

Better yet, try to run a quick stress test on the machine and see if you can cause it to reboot. If you do (under heavy load) then I would bet my money on the P/S :D
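
If no dedicated stress tool is at hand, a crude CPU-only load can be generated with a few lines of Python. This is only a sketch under that assumption: the names `burn` and `stress` are made up here, and it exercises the processors only, not memory or disk.

```python
import multiprocessing
import time


def burn(seconds):
    """Busy-loop for roughly `seconds` and return the number of iterations."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def stress(seconds, workers=None):
    """Run one busy loop per CPU (or per `workers`) at the same time."""
    workers = workers or multiprocessing.cpu_count()
    with multiprocessing.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)


# Example use, placed under an `if __name__ == "__main__":` guard in a script:
#     stress(seconds=60)
```

If the machine reboots while this is running, that supports the load theory. If it survives, that rules out little, since the original reboots may depend on memory, disk, or something the users were doing.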

Just my two cents :D

Good luck!
 