Readdir large number of files and mod each file

Status
Not open for further replies.
Jun 3, 2007
84
US
Hello Perl experts,

I am pretty new to Perl and am having a memory problem with my code. I have narrowed it down to the following section, which always seems to generate the out-of-memory error when running.

Code:
                opendir (TXT, "$processedDirPath") or die "Cannot open $processedDirPath directory $!\n";
                while ( (my $matchingFiles = readdir(TXT)) ) {
                    if ( $matchingFiles =~ /\.rtsd001$/ ) {
                        my @fileContents;
                        open(UNMODIFIED, "<$processedDirPath/$matchingFiles") or die "Cannot open $processedDirPath/$matchingFiles for reading $!\n";
                        open(MODIFIED, ">$processedDirPath/$matchingFiles.old") or die "Cannot open $processedDirPath/$matchingFiles.old for writing $!\n";
                        while (<UNMODIFIED>) {
                            /STRUC20/ and @fileContents=(), next or push @fileContents, $_;
                        }
                        print MODIFIED @fileContents;
                        close(UNMODIFIED);
                        close(MODIFIED);
                        system( "/bin/mv", "$processedDirPath/$matchingFiles.old", "$processedDirPath/$matchingFiles" ) == 0 or warn "Move command failed $!\n";
                    }
                }
                closedir TXT;

Now a little background. I want to process all files in the directory whose path is held in the $processedDirPath variable, and only work with files that end with "rtsd001".

What I am wondering is: what is the MOST memory-efficient/best way to read the files in a directory and then modify each file found? (Meaning I have to rewrite each file to remove all data before the STRUC20 line.)

The directory which I am running readdir against contains about 6-10 thousand files, each between under 1k and 40k in size.

So, working with this large number of files what would be the best way to do this?

Thanks for the help in advance.
 
Hi

I would do it like this, keeping only the currently processed line in memory:
Code:
opendir(TXT, "$processedDirPath") or die "Cannot open $processedDirPath directory $!\n";

while ( my $matchingFiles = readdir(TXT) ) {
  next unless $matchingFiles =~ /\.rtsd001$/;

  # flag: becomes 1 once a line matching /STRUC20/ has been read
  my $struc20seen=0;

  open(UNMODIFIED, "<$processedDirPath/$matchingFiles") or die "Cannot open $processedDirPath/$matchingFiles for reading $!\n";
  open(MODIFIED, ">$processedDirPath/$matchingFiles.old") or die "Cannot open $processedDirPath/$matchingFiles.old for writing $!\n";

  while (<UNMODIFIED>) {
    print MODIFIED if $struc20seen;
    $struc20seen=1 if /STRUC20/;
  }

  close(UNMODIFIED);
  close(MODIFIED);

  system( "/bin/mv", "$processedDirPath/$matchingFiles.old", "$processedDirPath/$matchingFiles" ) == 0 or warn "Move command failed $!\n";
}

closedir TXT;
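
For what it is worth, the same line-by-line idea can also be sketched with lexical filehandles, three-argument open and Perl's built-in rename instead of an external /bin/mv. This is only a sketch under assumptions: the sub names strip_before_marker and process_dir are made up for illustration, and leaving a file untouched when the marker never appears is an extra safety choice that is not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite one file so that only the lines after the
# first line matching $marker remain. Only one line is held in memory.
sub strip_before_marker {
    my ($path, $marker) = @_;
    my $tmp = "$path.old";

    open(my $in,  '<', $path) or die "Cannot open $path for reading: $!\n";
    open(my $out, '>', $tmp)  or die "Cannot open $tmp for writing: $!\n";

    my $seen = 0;
    while (my $line = <$in>) {
        print {$out} $line if $seen;     # copy only once the flag is set
        $seen = 1 if $line =~ $marker;   # the marker line itself is dropped
    }

    close($in);
    close($out) or die "Cannot close $tmp: $!\n";

    if ($seen) {
        # rename() replaces the original without starting another process.
        rename($tmp, $path) or warn "Cannot rename $tmp to $path: $!\n";
    }
    else {
        # Assumption: a file without the marker is left as it was.
        unlink($tmp) or warn "Cannot remove $tmp: $!\n";
    }
}

# Hypothetical wrapper: process every *.rtsd001 file in one directory,
# reading one directory entry at a time.
sub process_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir directory: $!\n";
    while (defined(my $name = readdir($dh))) {
        next unless $name =~ /\.rtsd001$/;
        strip_before_marker("$dir/$name", qr/STRUC20/);
    }
    closedir($dh);
}
```

It would be called as process_dir($processedDirPath). The rename call avoids launching one extra process per file, which may matter with several thousand files.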


Feherke.
 
Thanks for the quick reply. The only thing I forgot to mention is that all the files that end with the extension "rtsd001" will contain "STRUC20", so every single file will need to be written out without the lines prior to that.

So would the line below still be required?
Code:
$struc20seen=1 if /STRUC20/;

Thanks again for your help.
 
Hi

learingperl01 said:
So would the line below still be required?
Code:
$struc20seen=1 if /STRUC20/;
$struc20seen is a flag:
  • 0 means no line matching /STRUC20/ has been seen so far
  • 1 means at least one line matching /STRUC20/ has been seen
I am a bit confused by your confusion (-:, but one thing is sure: that line must be there.
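
As a small illustration of why the flag line is needed, here is a sketch that applies the same logic to an in-memory list. The sub name lines_after_marker and the sample lines are made up for this example, and the array is used only to keep the demonstration short; the real loop prints each line directly.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that follow the first line
# matching $marker, using the same flag logic as the file loop.
sub lines_after_marker {
    my ($marker, @lines) = @_;
    my $seen = 0;
    my @kept;
    for my $line (@lines) {
        push @kept, $line if $seen;      # keep only once the flag is set
        $seen = 1 if $line =~ $marker;   # without this line nothing is kept
    }
    return @kept;
}

# Made-up sample input.
my @sample = ("header line\n", "STRUC20\n", "needed data\n", "more needed data\n");
print lines_after_marker(qr/STRUC20/, @sample);
```

If the line that sets the flag is removed, the flag stays 0 for the whole file and every output file ends up empty, even when all input files contain the marker.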

Feherke.
 
Sorry for the confusion. Let me try to explain once more; hopefully it makes more sense.

All the files with names ending in .rtsd001 contain data which I have to reprocess, minus all the lines before the STRUC line.

for example: sample file.rtsd001 contents:

data data
blah blah
date time
STRUC
needed data
needed data
more needed data

So what I am doing is writing all the lines that follow the STRUC line into a new file.
Lines: "needed data, needed data, more needed data" (the @fileContents array contains these lines).

I think one of my problems is that I am pushing the contents of the file onto the @fileContents array and then writing it out.

I am not sure how to print directly to the MODIFIED file instead of saving/pushing to an array.

Code:
while (<UNMODIFIED>) {
    /STRUC20/ and @fileContents=(), next or push @fileContents, $_;
}
 
Hi

And that is what happens in my code: after $struc20seen is set to 1, the lines are printed:
Code:
                  print MODIFIED      $struc20seen=1
                  if $struc20seen;    if /STRUC20/;

data data         nope                nope
blah blah         nope                nope
date time         nope                nope
STRUC             nope                set to 1
needed data       line printed        nope
needed data       line printed        nope
more needed data  line printed        nope
By the way, is it "STRUC" or "STRUC20"?

Feherke.
 
Gotcha thanks I'll give that a go and see what happens. Thanks for the help again.
 