
Perl always reads in 4K chunks and writes in 1K chunks... Loads of IO!


NeilFawcett

Programmer
Mar 19, 2004
I've been doing some analysis of some of my perl scripts...

Take this simple example reading/writing a 1 meg file (with about 3000 lines):-
Code:
#!/usr/bin/perl
$|=1;
print "Content-type:text/html;charset=ISO-8859-1\n\n";

# Read the whole file into an array, one line per element
open DF,"test.txt";
@test=<DF>;
close DF;

# Consolidate the lines into a single scalar
my $rec;
foreach(@test){$rec.=$_;}

# Write it back out with the low-level sys* calls
sysopen (DF,"test.txt",O_WRONLY | O_CREAT);
syswrite DF,$rec,length($rec);
close DF;

exit;

When I watch this running (XP Pro SP2) using File Monitor (by Sysinternals) I can see the read generates a new I/O operation for every 4K. Worse still, when writing, it generates an I/O operation for every 1K... In total this simple operation generates over 1500 I/O operations (in File Monitor).


I've tried the same thing running perl under Cygwin with File Monitor watching, and the same results are shown: 4K chunks are read, and 1K chunks are written.


Is there any way around this? Am I monitoring it correctly? You can see from my example that I've tried using the more exotic calls to try and stop this "buffering"...

Here's a link to an example output from File Monitor showing the 4K chunks being read in. Click Me (Remember to maximise the image size)


Note: This is all because my service provider informed me my I/O operations were getting a bit high, so I'm trying to reduce them by examining how efficient my code is and what it's up to...
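For what it's worth, one way to sidestep the 4K read granularity is to size the read yourself and pull the whole file in with a single sysread request. A minimal sketch, not from the thread itself; the short-read loop and error checks are my own assumptions about what a robust version needs:

Code:
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);

my $file = "test.txt";
my $size = -s $file;
die "Can't stat $file" unless defined $size;

sysopen(my $fh, $file, O_RDONLY) or die "Can't open $file: $!";
binmode $fh;

# Ask for the whole file in one request; loop in case of a short read
my $data = '';
while (length($data) < $size) {
    my $got = sysread($fh, $data, $size - length($data), length($data));
    die "sysread failed: $!" unless defined $got;
    last if $got == 0;    # unexpected EOF
}
close $fh;

Whether the OS satisfies that as a single transfer is up to the filesystem, but at least Perl issues one read request for the full size instead of one per 4K.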
 
It's probably because you are using syswrite. Why not try using print and open() instead of the sys* functions?
 
I started with the standard open and print... No difference at all (surprisingly)!

I thought sysopen and syswrite were supposed to be more efficient, but alas no!

NOTE: I missed a line out of my example above:-
line 2 = use Fcntl qw(:DEFAULT :flock);


My concern is that maybe on Unix these calls would be far more efficient, but on Windows (even under Cygwin) they're not?
 
Well, I just don't know. Hopefully someone else will.
 
This reduces the I/O count from 1600-odd down to about 600!

Code:
#!/usr/bin/perl
$|=1;
print "Content-type:text/html;charset=ISO-8859-1\n\n";

open DF,"test.txt";
@test=<DF>;
close DF;

my $rec;foreach(@test){$rec.=$_;}
open DF,"test.txt";
binmode DF;
print DF $rec;
close DF;

exit;

Note: If you do a "print DF @array" instead, that uses about 841 I/O operations instead of about 600!

All this gain is purely in the write code! Rather than writing in silly 1K chunks it now writes in far bigger chunks (as it should)!

Just two hurdles now!
1) Improving the read logic, which still reads in 4K chunks no matter what I do.
2) Avoiding the ($rec) variable used to consolidate the @test array. That obviously wastes a lot of memory and processor time. Is there a better way to consolidate that array than passing it through another variable? (See the join sketch just below.)
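On hurdle 2, a hedged aside: Perl's built-in join builds the string in a single pass, so the explicit foreach loop isn't needed (the array copy still exists, but the per-line loop overhead goes away):

Code:
# Equivalent to:  my $rec; foreach (@test) { $rec .= $_; }
my $rec = join '', @test;

Avoiding the array entirely means slurping the file whole, which the next reply touches on.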
 

You could try this:

Code:
undef $/;                # slurp mode: <DF> now returns the whole file at once
open DF,"test.txt";

open BF,">test1.txt";
print BF <DF>;           # one big read feeding one big print

$/="\n";                 # restore the input record separator
close BF;
close DF;

You are not storing the data in a variable.
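A slightly more defensive variant of the same slurp idea (lexical filehandles, local so $/ can't leak, and error checks; these additions are my own, not part of the reply above):

Code:
#!/usr/bin/perl
use strict;
use warnings;

open my $in,  '<', 'test.txt'  or die "Can't read test.txt: $!";
open my $out, '>', 'test1.txt' or die "Can't write test1.txt: $!";
binmode $in;
binmode $out;

{
    local $/;            # slurp mode, automatically restored at block end
    print $out <$in>;    # one big read feeding one big print
}

close $in;
close $out;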


I have a doubt about your script:
--------------------
open DF,"test.txt";
binmode DF;
print DF $rec;
-------------------
You are opening test.txt in read mode and then writing to it?

 
vjcyrano,

Yes, the open for the write should have been ">test.txt"! My bad!

Alas I need the data in an array so I can step through it and process it. Once I've processed it, I then want to write it out again.

Consolidating this array into a single variable, though, and writing that out (instead of printing the array) reduces the I/O operations a lot.

So I need it in an array between reading and writing, but ideally want to write it out as a single variable with as little processing and memory as possible. i.e. in my example, if the file had 5 meg of data, I'd have the array (@test) holding 5 meg and the variable ($rec) holding another 5 meg... Blah! Plus I would have stepped through X thousand lines transferring the data... More blah!
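One pattern that fits that constraint (a sketch only; process_line is a hypothetical stand-in for whatever the real per-line processing is): read line by line, transform each line as it arrives, and append to a single output scalar, so @test is never built and the data is held in full only once:

Code:
#!/usr/bin/perl
use strict;
use warnings;

sub process_line { my ($line) = @_; return $line }   # hypothetical placeholder

open my $in, '<', 'test.txt' or die "Can't read test.txt: $!";
my $rec = '';
while (my $line = <$in>) {
    $rec .= process_line($line);   # only $rec ever holds the whole file
}
close $in;

open my $out, '>', 'test.txt' or die "Can't write test.txt: $!";
binmode $out;
print $out $rec;                   # one large buffered write
close $out;

The reads still arrive at whatever granularity the buffered layer uses, but the memory cost drops from two full copies to one.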
 
Oh for the love of God!

When I test using binmode under Cygwin there is NO improvement at all! My testbed is a Windows XP system, but the scripts will run at my ISP, which is Unix, so I'm absolutely lost now as to what is a decent way to test whether I'm going to improve things...
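(An aside, assuming the ISP's Unix host has a syscall tracer installed: something like strace -c perl script.pl run there would count the actual read/write syscalls, measuring the target platform directly instead of inferring from File Monitor on Windows.)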

Grrrr!
 