
Perl always reads in 4K chunks and writes in 1K chunks... Loads of IO!


NeilFawcett

Programmer
Mar 19, 2004
I've been doing some analysis of some of my perl scripts...

Take this simple example reading/writing a 1 meg file (with about 3000 lines):-
Code:
#!/usr/bin/perl
$|=1;
print "Content-type:text/html;charset=ISO-8859-1\n\n";

# Read the whole file into an array, one line per element
open DF,"test.txt";
@test=<DF>;
close DF;

# Consolidate the lines into a single scalar
my $rec;
foreach(@test){$rec.=$_;}

# Write it back out with the low-level sys* calls
sysopen (DF,"test.txt",O_WRONLY | O_CREAT);
syswrite DF,$rec,length($rec);
close DF;

exit;

When I watch this running (XP Pro SP2) using File Monitor (by Sysinternals) I can see the read generates a new I/O operation for every 4K. Worse still, when writing, it generates an I/O operation for every 1K... In total this simple operation generates over 1500 I/O operations (in File Monitor).


I've tried the same thing running perl under Cygwin with File Monitor watching, and the same results are shown: 4K chunks are read, and 1K chunks are written.


Is there any way around this? Am I monitoring it correctly? You can see from my example that I've tried using the more exotic calls to try and stop this "buffering"...

Here's a link to an example output from File Monitor showing the 4K chunks being read in. Click Me (Remember to maximise the image size)


Note: This is all because my service provider informed me my I/O operations were getting a bit high, so I'm trying to reduce them by examining how efficient my code is and what it's up to...
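For what it's worth, one way to sidestep the 4K read granularity is to size the read yourself and pull the whole file in with a single sysread request. A minimal sketch, not from the thread itself; the short-read loop and error checks are my own assumptions about what a robust version needs:

Code:
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);

my $file = "test.txt";
my $size = -s $file;
die "Can't stat $file" unless defined $size;

sysopen(my $fh, $file, O_RDONLY) or die "Can't open $file: $!";
binmode $fh;

# Ask for the whole file in one request; loop in case of a short read
my $data = '';
while (length($data) < $size) {
    my $got = sysread($fh, $data, $size - length($data), length($data));
    die "sysread failed: $!" unless defined $got;
    last if $got == 0;    # unexpected EOF
}
close $fh;

Whether the OS satisfies that as a single transfer is up to the filesystem, but at least Perl issues one read request for the full size instead of one per 4K.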
 
It's probably because you are using syswrite. Why not try using print and open() instead of the sys* functions?
 
I started with the standard open and print... No difference at all (surprisingly)!

I thought sysopen and syswrite were supposed to be more efficient, but alas no!

NOTE: I missed a line out of my example above:-
line 2 = use Fcntl qw(:DEFAULT :flock);


My concern is that maybe on Unix these calls would be far more efficient, but on Windows (even under Cygwin) they're not?
 
Well, I just don't know. Hopefully someone else will.
 
This reduces the I/O count from 1600-odd down to about 600!

Code:
#!/usr/bin/perl
$|=1;
print "Content-type:text/html;charset=ISO-8859-1\n\n";

open DF,"test.txt";
@test=<DF>;
close DF;

my $rec;foreach(@test){$rec.=$_;}
open DF,"test.txt";
binmode DF;
print DF $rec;
close DF;

exit;

Note: If you do a "print DF @array" instead, that uses about 841 I/O operations instead of about 600!

All this gain is purely in the write code! Rather than writing in silly 1K chunks it now writes in far bigger chunks (as it should)!

Just two hurdles now!
1) Improving the read logic, which still reads in 4K chunks no matter what I do.
2) Avoiding the ($rec) variable used to consolidate the @test array. That obviously wastes a lot of memory and processor time. Is there a better way to consolidate that array than passing it through another variable? (See the join sketch just below.)
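On hurdle 2, a hedged aside: Perl's built-in join builds the string in a single pass, so the explicit foreach loop isn't needed (the array copy still exists, but the per-line loop overhead goes away):

Code:
# Equivalent to:  my $rec; foreach (@test) { $rec .= $_; }
my $rec = join '', @test;

Avoiding the array entirely means slurping the file whole, which the next reply touches on.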
 

You could try this:

Code:
undef $/;                # slurp mode: <DF> now returns the whole file at once
open DF,"test.txt";

open BF,">test1.txt";
print BF <DF>;           # one big read feeding one big print

$/="\n";                 # restore the input record separator
close BF;
close DF;

You are not storing the data in a variable.
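A slightly more defensive variant of the same slurp idea (lexical filehandles, local so $/ can't leak, and error checks; these additions are my own, not part of the reply above):

Code:
#!/usr/bin/perl
use strict;
use warnings;

open my $in,  '<', 'test.txt'  or die "Can't read test.txt: $!";
open my $out, '>', 'test1.txt' or die "Can't write test1.txt: $!";
binmode $in;
binmode $out;

{
    local $/;            # slurp mode, automatically restored at block end
    print $out <$in>;    # one big read feeding one big print
}

close $in;
close $out;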


I have a doubt about your script:
--------------------
open DF,"test.txt";
binmode DF;
print DF $rec;
-------------------
You are opening test.txt in read mode and then writing to it?

 
vjcyrano,

Yes, the open for the write should have been ">test.txt"! My bad!

Alas I need the data in an array so I can step through it and process it. Once I've processed it, I then want to write it out again.

Consolidating this array into a single variable, though, and writing that out (instead of printing the array) reduces the I/O operations a lot.

So I need it in an array between reading and writing, but ideally want to write it out as a single variable with as little processing and memory as possible. i.e. in my example, if the file had 5 meg of data, I'd have the array (@test) holding 5 meg and the variable ($rec) holding another 5 meg... Blah! Plus I would have stepped through X thousand lines transferring the data... More blah!
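One pattern that fits that constraint (a sketch only; process_line is a hypothetical stand-in for whatever the real per-line processing is): read line by line, transform each line as it arrives, and append to a single output scalar, so @test is never built and the data is held in full only once:

Code:
#!/usr/bin/perl
use strict;
use warnings;

sub process_line { my ($line) = @_; return $line }   # hypothetical placeholder

open my $in, '<', 'test.txt' or die "Can't read test.txt: $!";
my $rec = '';
while (my $line = <$in>) {
    $rec .= process_line($line);   # only $rec ever holds the whole file
}
close $in;

open my $out, '>', 'test.txt' or die "Can't write test.txt: $!";
binmode $out;
print $out $rec;                   # one large buffered write
close $out;

The reads still arrive at whatever granularity the buffered layer uses, but the memory cost drops from two full copies to one.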
 
Oh for the love of God!

When I test using binmode under Cygwin there is NO improvement at all! My testbed is a Windows XP system, but the scripts will run at my ISP, which is Unix, so I'm absolutely lost now as to what is a decent way to test whether I'm going to improve things...
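(An aside, assuming the ISP's Unix host has a syscall tracer installed: something like strace -c perl script.pl run there would count the actual read/write syscalls, measuring the target platform directly instead of inferring from File Monitor on Windows.)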

Grrrr!
 