PE6800 and PE2800 degraded performance


peterve (IS-IT--Management)
I have 2 brand new servers: a PE6800 and a PE2800.
They both have a PERC4e/Di RAID controller, and on both boxes the 6 x 146GB U320 disks are configured as 3 RAID1 sets (3 x 146GB effective size).
Both servers are running Red Hat Enterprise Linux 4 AS, with the latest kernel and packages.
The PE6800 has 4 processors and 16GB of RAM. The PE2800 has 2 processors and 4GB of RAM.

When writing large files (or a large amount of data) to the disks, I've noticed big delays.
First test: I created a couple of 1GB files, one file after the other.
In most cases it took somewhere between 30 seconds and 1 minute to create a file. After a certain number of files I hit a delay peak... for some reason it took more than 7 minutes to create a file. After that file, the speed went back to 3 minutes. (Same results on both machines.)

I've changed a couple of kernel parameters:
Code:
vm.dirty_expire_centisecs = 300
vm.dirty_writeback_centisecs = 60
vm.dirty_ratio = 10
After rebooting the server, it still took between 8 and 50 seconds to create a file, with a maximum peak of 3 minutes.
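For reference, the same settings can also be applied at runtime and made persistent; a minimal sketch, assuming the stock sysctl tooling on RHEL4:
Code:
# apply immediately (lost on reboot)
sysctl -w vm.dirty_expire_centisecs=300
sysctl -w vm.dirty_writeback_centisecs=60
sysctl -w vm.dirty_ratio=10

# or add the same three lines to /etc/sysctl.conf and reload:
sysctl -p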

I would expect a new server to be fast. 8 seconds would be more normal than 1 minute. 7 minutes or even 3 minutes is not acceptable for a fast server that is not running anything...
The speed remains the same, no matter what the destination volume is (/dev/sda, /dev/sdb and /dev/sdc gave the same results), and I'm seeing similar results on both machines.
What can I do to troubleshoot this, and what can I do to solve the issue?
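One thing that might help narrow it down is watching the disks and memory while a test file is being written; a rough sketch, assuming the sysstat package is installed for iostat:
Code:
# terminal 1: per-device utilisation and throughput, 2-second samples
iostat -x 2

# terminal 2: memory, swap and block-I/O overview
vmstat 2

# then run the file-creation test and check whether the slow periods
# coincide with heavy writeback (large 'bo' values in vmstat)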

An hdparm on all disks, on both servers, also produced the same results:
Code:
hdparm -T /dev/sda
/dev/sda:
 Timing cached reads:   3492 MB in  2.00 seconds = 1744.52 MB/sec

hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads:  154 MB in  3.02 seconds =  50.93 MB/sec
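Note that hdparm only exercises the read path. A rough write-throughput check (a sketch, including a final sync so cached writes don't flatter the number, and assuming the output path sits on the volume under test) would be something like:
Code:
# time writing 1GB of zeros, including the flush to disk
time sh -c 'dd if=/dev/zero of=/tmp/write.test bs=1M count=1024; sync'
rm -f /tmp/write.test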
thanks



 
I would expect a new server to be fast. 8 seconds would be more normal than 1 minute. 7 minutes or even 3 minutes is not acceptable for a fast server that is not running anything...

FWIW, I have CentOS 4.1 (RHEL4.1 AS) running on a PE2850 with a single CPU, 1GB RAM, and two 15K RPM disks in a RAID1 on a PERC4/di, and I ran a test similar to yours. I did not do any tuning for disk performance.

Test: write 1GB files composed of random data.
Method: 'dd if=/dev/urandom of=/tmp/rand.out count=2048000'
Iterations: 3
Results: 1) 3m27.108s 2) 3m21.381s 3) 3m28.448s

Conclusions: my results are in line with your "peak" times. Either the PERC4 has poor write performance, or your expectations are too high. I'm leaning toward the latter.
 
Additional data:

Code:
hdparm -Tt /dev/sda

/dev/sda:
 Timing cached reads:   3084 MB in  2.00 seconds = 1541.46 MB/sec
 Timing buffered disk reads:  210 MB in  3.00 seconds =  69.94 MB/sec

Additional testing:

Test: write 1GB files composed of non-random data.
Method: 'dd if=/dev/zero of=/tmp/rand.out count=2048000'
Iterations: 5
Results: 1) 19.609s 2) 20.811s 3) 22.574s 4) 22.256s 5) 21.691s

Conclusions: hmmm. What sort of data are you using as your test set?
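The big gap between the two runs is largely the cost of generating the data: /dev/urandom is produced by the CPU, and on this class of machine it is usually much slower than the array can write, so the urandom numbers mostly measure the random-number generator rather than the disks. A quick way to see that (a sketch) is to time the source on its own, with no disk involved:
Code:
# how fast can the box generate random data, writing to nowhere?
time dd if=/dev/urandom of=/dev/null bs=1M count=1024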
 
The data I'm testing with is basically a blank database extent, a sort of placeholder that gets filled with real data afterwards. I think you can compare that with creating a file and filling it with zeros.
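(As an aside: if only a placeholder of the right size were needed, the file wouldn't have to be written out in full; dd's seek option can create a sparse file almost instantly. A sketch, with the caveat that a sparse file reserves no real disk blocks, which a database may not accept, and the filename is just illustrative:)
Code:
# 1GB of real zeros: every block actually hits the disk
dd if=/dev/zero of=extent.dat bs=1M count=1024

# 1GB sparse placeholder: only metadata is written, returns immediately
dd if=/dev/zero of=extent.dat bs=1M count=0 seek=1024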

dd if=/dev/zero of=/tmp/rand.out count=2048000 on my system gave me the following results:

1) 4m5.688s
(waited for a couple of minutes)
2) 4m31.856s
immediately followed by test no. 3:
3) 0m16.504s
waited 1 minute
4) 0m21.482s
waited one minute
5) 0m16.380s
6) 0m24.323s
waited one minute
then ran
dd if=/dev/urandom of=rand.out count=2048000
7) 3m38.654s
waited one minute
8) 7m31.066s

See - weird performance, huh? 16 to 30 seconds would be fine; 4 to 5 minutes is not. The same run on an ordinary PC, with a normal IDE disk, returned 5m7.542s.

The server has 15K RPM U320 disks; it should go way faster than that.
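One thing that might explain the huge spread: dd returns as soon as the data is in the page cache, so the 16-24 second runs may simply be cached writes, while the 4-minute runs are probably stuck waiting for dirty pages from earlier files to be flushed to the array. Watching the kernel's dirty-page counter around a run would show that (a sketch):
Code:
# dirty (not-yet-flushed) page cache before and after a run
grep Dirty /proc/meminfo
time dd if=/dev/zero of=/tmp/rand.out count=2048000
grep Dirty /proc/meminfo
# a large Dirty value after the run means the write mostly went to RAM
# and the real flush to disk is still happening in the background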

What can I do?



 
See - weird performance, huh?

Definitely--there's a problem somewhere, but exactly what remains in question. Let's clarify a few things:

1. You mentioned the problem is happening on both machines. Does it happen on all three arrays, or just one?

2. Is hyperthreading enabled? If so, disable it (a few quick checks for 2-4 are sketched after this list).

3. Are these 64-bit Xeons? (I'm guessing yes.) If so, are you using the 64-bit kernel?

4. How big are the swap partitions? Are they on the same array that you are using to test? Does the problem still exist if you turn off swap?

5. top is your friend. What processes are using CPU and RAM while you're doing your testing? Does the profile change when you experience your delays?

6. Bring the system up single user, and attempt to recreate the problem. Still there?

7. Dell's Linux support is still spotty at best (they seem to have some PowerEdge techs who grok Linux, but the majority don't), so I wouldn't expect TOO much help from that end. OTOH, you're running RHEL4. Based on the machine specs (that quad-CPU box had to cost upwards of $25k) I'm guessing you're paying for it... and the ONLY reason to pay for RedHat is support. What does RedHat have to say about this?
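A few quick commands that would answer 2, 3, and 4 from a shell, without a reboot (a rough sketch, assuming a stock RHEL4 install):
Code:
# 2. count logical CPUs -- if it's double the number of physical CPUs
#    installed, hyperthreading is enabled
grep -c ^processor /proc/cpuinfo

# 3. architecture of the running kernel (i686 = 32-bit, x86_64 = 64-bit)
uname -m

# 4. swap layout and how much is actually in use
/sbin/swapon -s
free -m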
 
1. It happens on all three arrays.
2. Where can I find the hyperthreading setting?
3. They are regular Xeon processors; we are using the 32-bit kernel.
4. The swap partition is on the first array. It is 32GB.
Based on the output of sar -r, I can confirm that the machine is not swapping while I'm running the test.
5. I'll try this
6. I'll try this
7. I wanted to escalate things through Dell. We are paying for it, but I want to make sure Dell doesn't point at RedHat and RedHat doesn't point at Dell at the same time...


 
The hyperthreading setting is in the BIOS--probably under "CPU Settings" and the name of the setting itself may be called "logical processor."

Also, this probably isn't your problem, but 32GB of swap is overkill--the old rule of thumb of 2x your RAM doesn't apply when you get to large memory sizes. FWIW, Oracle recommends 8GB of swap on a machine with 16GB RAM running RHEL. Personally, I still think that's a bit much. Also, given that the maximum size of a swap partition is 2GB, you must have sixteen of them--that can't be good for performance.

In any case, even though your system doesn't appear to be using any swap, try testing with swap disabled (/sbin/swapoff -a)
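Something along these lines, then re-run the dd test (a sketch; swapoff needs any pages currently in swap to fit back into RAM, which shouldn't be a problem on a box that isn't swapping):
Code:
/sbin/swapoff -a     # disable all swap areas
/sbin/swapon -s      # should now show nothing in use
# ... re-run the write test ...
/sbin/swapon -a      # re-enable swap when done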

Understand about you wanting to take this up with Dell--finger pointing is always annoying to be in the middle of. Given that you have two servers from different model lines, with different configurations, exhibiting the same problem, though, I'm pretty sure you'll end up dealing with RH in the end. :)
 
Luckily this is a Dell system, and Dell has invested a lot of money in RH... Dell is going to be the single point of contact for me...
I'll try without swap, but I guess will have to take over for me...

thanks

 