Fedora 1 Intermittent Pauses

Status
Not open for further replies.

apostasy1

Technical User
Jan 15, 2005
17
US
I have Fedora Core 1 running on a server with the following specifications:

P3 900MHz
512 MB PC133 SDRAM
2 160GB Seagate drives in RAID-1 on a HPT370 RAID card

The above information probably won't help, but I thought I'd mention it anyways. This machine runs Samba and acts as a file server among a little over 20 users. At random, the server completely locks up for a period of 5 to 15 seconds. Nobody is able to browse Samba shares and all of the workstations running AutoCAD completely hang up. Oddly enough, I can still ping the server when this happens, and the ps -aux command doesn't reveal anything unusual - nor do the log files.

This does not happen to our two other servers, so I've ruled out the possibility that it's a switching problem. The server is receiving its IP address through DHCP (don't ask - I need to change it to static some day). I ordered some replacement memory and a new network card. I'm guessing that it's a hardware-related problem, but I can't say for certain.

Can anyone offer any insight into this problem?

Thanks in advance,
 
What kind of filesystem are you using? ext2? ext3? Other?

Cheers.

Chacal, Inc.
 
I read a thread (sorry, I can't find it) where a MySQL database had the same problem (intermittent pauses), exactly as you are experiencing... the solution was the type of buffering: writeback, ordered, journal. (I think the mode was writeback for the solution).

Read the following:


Hope this helps.

Chacal, Inc.
 
Our server isn't running MySQL, but I suppose that doesn't mean those solutions couldn't work. How would I go about changing the filesystem buffering type?

Another thing I just discovered: using the Gnome System Monitor utility, it is showing 498 of 503 MB of physical memory being used (9 MB of which is the System Monitor itself). Could this be a potential cause? I've been watching the process list to see if there's a service that might be temporarily using up all of the CPU cycles, but haven't seen anything yet. Every service is shown as using 0%, aside from the System Monitor which uses 2%.

Thanks again,
 
If your physical memory is all used up, these pauses could be from swapping or paging. The quickest fix might be to just add memory to the box!

512 MB is not a lot if a lot of people are hitting it.

Also maybe find processes that are using a lot of memory. If there are any that aren't needed, you might stop them (i.e. any unneeded daemons).

Hope this helps.
 
How would I go about changing the filesystem buffering type?

Read the pages I linked you.
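
As a rough sketch of where that setting lives: ext3 takes its journaling mode from the data= mount option (writeback, ordered or journal), which is normally set in the options column of /etc/fstab. The helper below only reports what is configured. The name ext3_data_mode is made up for illustration, and it assumes an entry without data= is using ordered, the usual ext3 default.

```shell
#!/bin/sh
# Hypothetical helper: print "<mountpoint> <data mode>" for every ext3 entry
# read from stdin in /etc/fstab format (device, mountpoint, type, options, ...).
ext3_data_mode() {
    awk '
        /^[ \t]*#/ { next }                  # skip comment lines
        $3 == "ext3" {
            mode = "ordered (default)"       # assumed when no data= option is listed
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^data=/) mode = substr(opts[i], 6)
            print $2, mode
        }
    '
}

# Usage: ext3_data_mode < /etc/fstab
```

To try writeback on a data partition you would add data=writeback to that entry's options. As far as I know ext3 will not switch data modes on a live remount, so plan for an unmount or a reboot.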

Regarding the memory issue, I agree with SamBones: if you run low on memory, swapping will cut your performance, and it gets worse if you have a slow hard disk.

Chacal, Inc.
 
Just removed the 512 MB that was in there and threw in 1 GB of Crucial PC133. I'll keep you guys updated.
 
I don't think that's the solution or problem, if no process needs more than 1%.
The system reports only a small amount free, but I guess there is a lot in the buffers, which might be used immediately, without swapping.

But gnome-system-monitor?
You're running X11/GNOME on a Samba server?

You could save some RAM there, and monitor the system with 'top' from the commandline.

 
Sounds to me like the hard drive bus is resetting...

Is it IDE or SCSI?
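
If it is IDE, the kernel log is the place to check the bus-reset theory. A minimal sketch, assuming the usual IDE driver messages (lost interrupt, DMA timeout, reset) would show up there. The helper name ide_trouble and the pattern list are my own choices and are not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: filter kernel log text on stdin for lines that often
# accompany IDE bus resets or DMA trouble. The pattern list is an assumption.
ide_trouble() {
    grep -i -E 'lost interrupt|dma timeout|dma_timer_expiry|ide[0-9]+: reset'
}

# Usage: dmesg | ide_trouble
#        ide_trouble < /var/log/messages
```

No output does not prove the bus is healthy, but hits that line up with the times of the pauses would be a strong hint.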
 
BuckWeet,

IDE. I'm using a FastTrak133 PCI RAID card.
 
The problem is still occurring, even after installing the additional memory.

I just took a look at the System Monitor again. I don't know how helpful this would be, but I have the following processes running:

gnome system monitor 8.8 MB
cupsd 1.9 MB
wmck-applet 8.3 MB
mapping daemon 660 K
mixer-applet2 6.8 MB
notification-area applet 6.1 MB
python 17.6 MB
pam-panel icon 3.8 MB
pam-timestamp check 500 K
eggcups 6.4 MB
magicdev 5.3 MB
nautilus 14.3 MB
gnome-panel 11.8 MB
gnome-settings daemon 6.5 MB
metacity 6.8 MB
bonobo activation server 2.5 MB
gconfd 17.2 MB
gdm-binary 2.5 MB
X 22.8 MB
gnome-session 8.7 MB
ssh-agent 632 K
mingetty 340 K
mingetty 340 K
mingetty 340 K
mingetty 340 K
mingetty 340 K
mingetty 340 K
dbus-daemon1 856 K
atd 568 K
nmbd 2.2 MB
smbd 2.5 MB
3.1 MB
3.6 MB
7.3 MB
3.7 MB
3.8 MB
3.4 MB
3.9 MB
3.9 MB
6.6 MB
8.3 MB
3.7 MB
3.8 MB
3.9 MB
8.1 MB
3.7 MB
3.8 MB
3.8 MB
3.3 MB
xfs 3.3 MB
crond 600 K
clientmqueue 2.3 MB
sendmail: accepting incoming connections 2.5 MB
xinetd 888 K
fam 1.1 MB
sshd 1.4 MB
apmd 432 K
rpc.statd 708 K
portmap 576 K
klogd 376 K
syslogd 572 K
dhclient 984 K
kjournald 0
khubd 0
kjournald 0
raid1d 0
raid1d 0
raid1d 0
mdrecoveryd 0
kupdated 0
kswapd 0
bdflush 0
ksoftirqd/0 0
kapmd 0
keventd 0
init[5] 420 K
 
I also have the following services designated to run at startup:

acpid
anacron
apmd
atd
autofs
crond
cups
gpm
iptables
irqbalance
isdn
kudzu
messagebus
microcode_ctl
netfs
network
nfslock
pcmcia
portmap
random
rawdevices
rhnsd
sendmail
sgi_fam
smartd
smb
sshd
syslog
xinetd

Right now, used memory is at 997 of 1009 MB
Swap is at 108K of 2.0 GB
 
[root@Fileserver1 root]# w
 14:44:43 up 19:57,  2 users,  load average: 1.71, 1.12, 0.50
USER     TTY      FROM              LOGIN@   IDLE    JCPU    PCPU   WHAT
root     :0       -                 Mon 6pm  ?       0.00s   0.53s  /usr/bin/gnome-
root     pts/1    192.168.1.107     2:44pm   0.00s   0.05s   0.02s  w

[root@Fileserver1 root]# free
             total       used       free     shared    buffers     cached
Mem:       1032732    1022904       9828          0      89736     675280
-/+ buffers/cache:     257888     774844
Swap:      2048184       1076    2047108
 
Code:
-/+ buffers/cache:     257888     774844

This shows over 700MB of real free RAM. You do not have a RAM exhaustion problem.
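
The arithmetic behind that line can be checked by hand: used minus buffers minus cached gives what programs really hold (1022904 - 89736 - 675280 = 257888 kB), and free plus buffers plus cached gives what is really available (9828 + 89736 + 675280 = 774844 kB). A small sketch that does the same from the Mem: row, assuming the older free layout shown in this thread. The helper name real_memory is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `free` output (old procps layout with a
# "-/+ buffers/cache" line) on stdin and recompute that line from the Mem: row.
real_memory() {
    awk '
        $1 == "Mem:" {
            used = $3; free = $4; buffers = $6; cached = $7
            printf "used by programs: %d kB\n", used - buffers - cached
            printf "available:        %d kB\n", free + buffers + cached
        }
    '
}

# Usage: free | real_memory
```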

My guess, given that you are using a hardware IDE RAID card and have a relatively large amount of RAM committed to cache (>650MB), is that you have one of the following:
1) A RAID driver problem
2) A slow RAID card
3) Extremely high usage
4) A RAID configuration that is inappropriate for your application (i.e. RAID 1, 0 or 5 used for the wrong workload).

Please share more about your RAID configuration.

D.E.R. Management - IT Project Management Consulting
 
thedaver,

I am using a Promise FastTrak133 RAID card. The two drives are manufactured by Seagate and have a capacity of 160 GB each. The RAID array is 1/mirroring.

The way I understand it, there are three mirrored partitions, each beginning with the prefix "md." The RAID card is utilizing whatever driver is included with Fedora Core 1.
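
The md prefix, together with the raid1d and mdrecoveryd kernel threads in the process list above, suggests these are Linux software RAID (md) arrays, in which case their state is listed in /proc/mdstat. A minimal sketch for spotting a mirror that has lost a member, assuming the usual [UU] / [U_] member map in that file. The helper name degraded_arrays is hypothetical, and a degraded mirror would not necessarily explain the pauses.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# name of every md array whose member map (e.g. [U_]) shows a missing disk.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/         { name = $1 }     # remember the current array
        /\[[U_]*_[U_]*\]/     { print name }    # an underscore marks a missing member
    '
}

# Usage: degraded_arrays < /proc/mdstat
```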

This problem seems to occur no matter what the load on the server is. I've had it hang up for several seconds when there were only two people in the office doing CAD work.

Another thing I should mention: I set this fileserver up in August of 2004, but these problems arose about two or three months ago (and have actually gotten worse since they first started).

I disabled the ACPI services this afternoon after reading about them causing random lockups on Fedora Core 1 installations. Still no difference, though. For the heck of it, I plan on replacing the network card sometime this week. I have a few new ones lying around and figure it's worth a try. I recall reading about a few people on Usenet with similar problems which turned out to be caused by a bad network card.

Thanks again,
 
Your "w" output shows a load average over 1.0 (1.71 over the last minute, and 1.12 over the last 5 minutes).

This is REALLY HIGH UTILIZATION for a box doing file services. You should spend some time watching the "top" command and keypress "P" once during that to sort the load by CPU hogs at the head of the list. You need to see what's killing your box.
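
A side note on reading those figures: the three numbers in the load average line from "w" are the 1-, 5- and 15-minute averages, and on Linux they also count processes stuck waiting on disk I/O, so a high load with idle CPUs can point at storage. A minimal sketch for logging the load, assuming the /proc/loadavg format of three averages followed by other fields. The helper name load_report and the HIGH marker are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/loadavg-style text on stdin, label the three
# averages, and append HIGH when the 1-minute value exceeds the limit ($1).
load_report() {
    limit=${1:-1.0}
    awk -v limit="$limit" '{
        printf "1min=%s 5min=%s 15min=%s", $1, $2, $3
        if ($1 + 0 > limit + 0) printf " HIGH"
        printf "\n"
    }'
}

# Usage: load_report 1.0 < /proc/loadavg
```

Run from a loop with a timestamp, this would give a record to compare against the times users report a hang.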

It's POSSIBLE that you have one or more rogue Windows machines trying to fight over some SMB service privilege (like Master Browser) and they're overloading your Samba app(s). You need to view "top" to see what the primary resource users are!

Only other comment I can loft is this.... Using >software< RAID and ext3 I have had servers perform in this manner. This is not a direct correlation to your situation, but ext3 was in play in both cases.

ext3 is fairly resource intensive (IMHO) relative to ext2 because of its journalling functionality. The bigger the file activity, the more (CPU) work ext3 has to do...

I'm wondering if your journals have gotten larger with your local activity and ext3 is starting to show load on your server as it attempts to deal with RAID cache and ext3 journalling...

Of course, I could be talking out my butt too, so please treat this with a fistful of salt.

Finally, to simplify diagnosing the problem, I'd start shutting down a lot of the stuff (read: garbage) that you're running on a file server... I'd consider killing off:

atd, anacron, gpm, portmap, netfs, apmd, cups, isdn, and nfslock.

I don't see those services adding value in what you have described so far, but please perform your own due diligence before taking such steps.
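
For what it's worth, on a Red Hat style system the usual tools for this are "service" (stop it now) and "chkconfig" (keep it from starting at boot). A minimal sketch that only prints the commands for the services listed above so they can be reviewed first. The helper name disable_plan is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands that would stop each
# named service now and keep it from starting at boot.
disable_plan() {
    for svc in "$@"; do
        echo "service $svc stop"
        echo "chkconfig $svc off"
    done
}

disable_plan atd anacron gpm portmap netfs apmd cups isdn nfslock
```

If the printed plan looks right, the same lines can be run by hand as root, one service at a time, so any side effect is easy to trace back.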

Take some time studying "top" with "P" and see if you can find a source for your ills.



D.E.R. Management - IT Project Management Consulting
 
thedaver,

I'm monitoring CPU usage right now with the "top" command as you suggested. All of the services appear to be normal, with CPU usage fluctuating from 0% to 1% or so. Service smbd, however, spiked to 39.7% for several seconds and then proceeded to drop back down to under 1%. Unfortunately, Samba is configured with the "force user=root" parameter (Not my idea, by the way). It is only showing one smbd service, and it is showing the user of this service as belonging to root.

I don't know what to do from here. Is that an unusually high CPU utilization for Samba? What would cause the Windows workstations to fight over SMB service privileges?

Thanks again,
 