Reading very large files in Perl

Status
Not open for further replies.

rsteffler

Programmer
Aug 2, 2001
US
I get core dumps when I use Perl to read very large files using either:
Code:
  while(<IN>) {   }
or
Code:
  perl -ne '  ' filename
I am trying to read files that are 300 MB or larger on a machine with 6 CPUs and 8 GB of memory. Disk space isn't a problem.

Is there some way to keep Perl from reading the file into memory, even though I'm only asking it to process one line at a time? Is it possible to flush the memory after so many lines to avoid a core dump?

My code is simple, so I don't understand why it continues to eat up memory until it core dumps.

Here's the code within the while loop:
Code:
  chomp;
  split(/\|/);
  $line=join("\|",$_[11],$_[12],$_[13],$_[0],$_[1],$_[2],$_[3],$_[4],$_[5],$_[6],$_[7],$_[8],$_[9],$_[10]);
  print "$line\n";

 
So your code looks like this?
[tt]
while(<IN>){
chomp;
split(/\|/);
$line=join("\|",$_[11],$_[12],$_[13],$_[0],$_[1],
$_[2],$_[3],$_[4],$_[5],$_[6],$_[7],$_[8],
$_[9],$_[10]);
print "$line\n";
}
[/tt]
That should be ok, I would have thought... As I understand it Perl *doesn't* read the whole file into memory when you process it like this.

"When you don't understand something, change it (a bit) and watch for what changes..."

The first thing that occurs to me is that it's a large file, so it's worthwhile trying to make it run just a little faster... So try getting rid of that $line variable assignment.
[tt]
while(<IN>){
chomp;
@ary = split(/\|/);
# the newline is printed outside join() so no trailing "|" is added
print join("|",$ary[11],$ary[12],$ary[13],$ary[0],
$ary[1],$ary[2],$ary[3],$ary[4],$ary[5],
$ary[6],$ary[7],$ary[8],$ary[9],$ary[10]),"\n";
}
[/tt]
Does that still cause you problems?

Mike
"Experience is the comb that Nature gives us after we are bald."

Is that a haiku?
I never could get the hang
of writing those things.
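
For comparison, here is a minimal sketch of the same field reordering written with an array slice and lexical variables. The helper name reorder_fields and the sample record are made up for illustration; it assumes every record has at least 14 pipe-delimited fields, as the code above does. The sample is read from an in-memory handle only so that the sketch runs on its own; in the real script the loop would read from IN.

```perl
use strict;
use warnings;

# reorder_fields is a hypothetical helper, not from the thread.
# It moves fields 11..13 to the front of a pipe-delimited record.
sub reorder_fields {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields, which split drops by default.
    my @f = split /\|/, $line, -1;
    return join '|', @f[11 .. 13, 0 .. 10];
}

# Made-up sample record with 14 fields, read through an in-memory handle.
my $sample = "a|b|c|d|e|f|g|h|i|j|k|l|m|n\n";
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
while (defined(my $line = <$in>)) {
    # $line is a lexical, so only one record is held at a time.
    print reorder_fields($line), "\n";   # prints l|m|n|a|b|c|d|e|f|g|h|i|j|k
}
close $in;
```

Nothing in this loop accumulates data between iterations, so memory use should stay flat if the file really is split into lines.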
 
Mike,

Thanks for the response. I've found it very common for Perl to cause core dumps when reading very large files.

I tried your code and had the same problems. I watched one version of the program and saw the panic:popstack errors flying down the screen.

I just can't figure out why it won't actually process one line at a time and "forget" the previous lines. Sadly, I think that I might have to do the work in AWK.

Robert
 
It certainly shouldn't be reading the whole file at once - unless of course you've set slurp mode by $/=undef; somewhere in your earlier code?

Then it would crash OK

I gather you're just swapping around the order of fields in the file yes? Are you piping this into another process? Is there any chance you can amend the program that processes them instead? That might be faster.

Just a thought.
Roger
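
To illustrate the slurp-mode point, here is a small sketch showing how local keeps an undefined $/ confined to one routine, so that a later while loop still reads line by line. The helper name read_whole and the sample strings are hypothetical; in-memory handles are used only to keep the example self-contained.

```perl
use strict;
use warnings;

# read_whole is a hypothetical helper that slurps on purpose.
sub read_whole {
    my ($fh) = @_;
    local $/;                 # undef only until this sub returns
    my $content = <$fh>;      # one read returns everything
    return $content;
}

my $header = "HEADER BLOCK\n";
my $body   = "one\ntwo\nthree\n";

open my $header_fh, '<', \$header or die "Cannot open in-memory header: $!";
my $whole = read_whole($header_fh);
close $header_fh;

# $/ is back to "\n" here, so this loop sees three separate lines.
open my $body_fh, '<', \$body or die "Cannot open in-memory body: $!";
my $lines = 0;
while (defined(my $line = <$body_fh>)) {
    $lines++;
}
close $body_fh;

print "slurped ", length($whole), " bytes, then read $lines separate lines\n";
```

A plain $/ = undef; earlier in a script, without local, would stay in effect and make every later read return the rest of the file in one piece.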
 
Well, I used a script to create a text file of approximately 300 MB. (It consisted of rows of the letters of the alphabet x 4, i.e. aaaa bbbb cccc dddd etc.) I then wrote a second script that opened the file and ran the above code with no problem. Well, it was slow, but it didn't core dump. What version of Perl are you using, and on what OS?
 
Robert,

I do this sort of thing all of the time and haven't seen the problems you describe.

It might well be worth checking that you are using the latest version of Perl and that it is built correctly on your system.

Mike
 
Buddy, I've got the same problem. I am actually just reading in two 65 MB files; it reads the first one OK, but the second one takes forever. It's simple code: @vec=<filehandle>;close(filehandle); I put this into a class and hence am using objects to avoid polluting the namespace. I just want to know how to make the read time the same for the two files. I am running ActiveState Perl on Windows XP Professional.
 
Code_VX, your code is more likely than Robert's to produce a core dump, since you're reading the whole file into memory. It's better to process it with a while(<>) loop. Is there any reason why you need to make an entire array?
 
Perhaps your input file is not pure text, and cannot be separated into lines successfully? Maybe replace [tt]<FILE>[/tt] with [tt]sysread[/tt]...
 
toolkit, I was thinking pretty much the same thing. If the file doesn't have LF or CRLF at the end of each line, it would definitely create a problem. It would probably also create a problem if the file had the wrong KIND of line endings for the system it was running on (i.e. CRLF on *nix or LF on Windows). I seem to recall that I've run into that particular problem before.

Tracy Dryden
tracy@bydisn.com

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard. [dragon]
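
One way to check the line-ending theory is to look at the raw bytes at the start of the file. The sketch below is a rough heuristic, not a definitive test: guess_line_ending and sniff_file are hypothetical helper names, and the 64 KB sample size is an arbitrary assumption.

```perl
use strict;
use warnings;

# guess_line_ending is a hypothetical helper that inspects a chunk of raw bytes
# and reports which line terminator is most common in it.
sub guess_line_ending {
    my ($chunk) = @_;
    my $crlf = () = $chunk =~ /\r\n/g;        # CR followed by LF
    my $cr   = () = $chunk =~ /\r(?!\n)/g;    # CR on its own
    my $lf   = () = $chunk =~ /(?<!\r)\n/g;   # LF on its own
    return 'none' unless $crlf || $cr || $lf;
    return 'CRLF' if $crlf >= $cr && $crlf >= $lf;
    return $cr > $lf ? 'CR' : 'LF';
}

# sniff_file is a hypothetical wrapper: it reads the first 64 KB in raw mode
# so that no newline translation happens before the bytes are inspected.
sub sniff_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    read($fh, my $chunk, 65536);
    close $fh;
    return guess_line_ending(defined $chunk ? $chunk : '');
}

print guess_line_ending("a|b\r\nc|d\r\n"), "\n";   # CRLF
print guess_line_ending("a|b\rc|d\r"), "\n";       # CR
print guess_line_ending("a|b\nc|d\n"), "\n";       # LF
```

If the answer is 'CR' or 'none', a while(<IN>) loop with the default separator would treat the whole file as a single record.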
 
Just a reminder that the input record separator is easily changed to anything you want, like this:

$/ = "\r"; # Set input record separator to carriage return
 
Just for debugging, try this:
print "Read a line here\n";
Put this in the readline loop.

-- A haiku by Jim

If the screen isn't flying with "Read a line here"
then you know the problem is perl's interpretation of
what a line is. Use raider's reminder to correct this
problem.

--Jim
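
A quieter variant of the same debugging idea is to count the records and track the longest one instead of printing on every line. This is only a sketch: record_stats is a hypothetical helper, and the CR-terminated sample string is made up to show what happens when the separator does not match the file.

```perl
use strict;
use warnings;

# record_stats is a hypothetical helper: it reads $text using $separator as
# the input record separator and returns (record count, longest record length).
sub record_stats {
    my ($text, $separator) = @_;
    local $/ = $separator;    # restored when the sub returns
    open my $fh, '<', \$text or die "Cannot open in-memory data: $!";
    my ($count, $longest) = (0, 0);
    while (defined(my $record = <$fh>)) {
        $count++;
        $longest = length($record) if length($record) > $longest;
    }
    close $fh;
    return ($count, $longest);
}

# Made-up sample with CR-only line endings.
my $cr_only = "a|b|c\rd|e|f\rg|h|i\r";

# With the default separator the whole sample is one 18-byte record.
printf "separator LF: %d record(s), longest %d bytes\n", record_stats($cr_only, "\n");
# With a carriage return as separator there are three 6-byte records.
printf "separator CR: %d record(s), longest %d bytes\n", record_stats($cr_only, "\r");
```

On the real file, one record whose length is close to the file size would confirm that Perl's idea of a line is the problem.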
 
*that's* not a haiku! LOL

Mike
 
Well, last night I decided to modify the code. I now read the file line by line instead of storing it in an array, and I pulled up Task Manager to watch the process for the first file. By the time execution was done with the first file, RAM usage was 270 MB. Then the class object was called in another file, and the process just hung there because it was too obese to move. Now I would like to know how to clean up. Is there a module that performs a cleanup as I switch between packages and classes? I am kind of new to Perl.



 