Split large data file based on content

Status
Not open for further replies.

MightyJayDog

Technical User
May 29, 2007
11
US
I am looking for suggestions on how to split a large (13 GB+) file into smaller, more manageable chunks based on content, using Perl. I am trying to keep it generic as to exactly what data or content to split on, and am just hoping that someone can point me in the right direction. I want to search through the file for a certain pattern (the invoice end point) and then, after a certain number of invoices or a certain size, find the end of the current invoice and split everything up to that point off into its own file. Any ideas? I am pretty new to Perl and am being thrown into it to figure out this problem. :)

Jason
 
brigmar - you rock - thanks a ton for all your help. After using your suggestions and changing the input file, here is what I came up with, and it works great. It basically looks for an "11" immediately after the first 67 characters of a line and splits the file into chunks of roughly 500 KB (a chunk runs past that size until the next matching line turns up, so it can be larger depending on the size of the last invoice). I will be making more modifications to this script and will post them as they develop. Thanks again!

Code:
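#!/usr/bin/perl
use strict;
use warnings;

my $chunksize  = 500 * 1024;   # target chunk size: 500 KB
my $filenumber = 0;
my $infile     = "infile.dat";
my $outsize    = 0;            # size of the current chunk so far

open(my $in, '<', $infile) or die "Can't open $infile: $!";
open(my $out, '>', "outfile_$filenumber.dat") or die "Can't open outfile_$filenumber.dat: $!";

while (my $line = <$in>)
{
        chomp $line;

        $outsize++;            # count the newline that chomp removed
        # Start a new chunk once the current one is over the limit and
        # this line has "11" right after its first 67 characters.
        if ($outsize > $chunksize and $line =~ /^.{67}11/)
        {
                close $out;
                $outsize = 0;
                $filenumber++;
                open($out, '>', "outfile_$filenumber.dat") or die "Can't open outfile_$filenumber.dat: $!";
        }

        print $out "$line\n";
        $outsize += length $line;
}
close $in;
close $out;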
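
The original question also asked for something more generic: a configurable pattern, and a split after either a certain size or a certain number of invoices. A minimal sketch of that idea follows. The sub name split_on_boundary and all of its parameters are hypothetical (they are not from the thread), and it assumes, like the script above, that a matching line marks the first line of a new invoice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): copy $infile into numbered
# chunk files named "<prefix>_<n>.dat". A new chunk starts at the first
# line matching $boundary once the current chunk has reached $max_bytes
# or holds $max_records boundary lines. A limit of 0 disables that check.
# Assumption: a matching line is the FIRST line of a record.
sub split_on_boundary {
    my ($infile, $prefix, $boundary, $max_bytes, $max_records) = @_;

    open(my $in, '<', $infile) or die "Can't open $infile: $!";

    my ($filenumber, $bytes, $records) = (0, 0, 0);
    my $outname = "${prefix}_$filenumber.dat";
    open(my $out, '>', $outname) or die "Can't open $outname: $!";

    while (my $line = <$in>) {
        my $is_boundary = $line =~ $boundary;
        my $full = ($max_bytes   && $bytes   >= $max_bytes)
                || ($max_records && $records >= $max_records);

        # Only split on a boundary line, so no record is cut in half.
        if ($is_boundary && $full) {
            close $out or die "Can't close $outname: $!";
            $filenumber++;
            ($bytes, $records) = (0, 0);
            $outname = "${prefix}_$filenumber.dat";
            open($out, '>', $outname) or die "Can't open $outname: $!";
        }

        print {$out} $line;
        # length() counts characters; that equals bytes here because the
        # handles have no encoding layer.
        $bytes += length $line;
        $records++ if $is_boundary;
    }

    close $in;
    close $out or die "Can't close $outname: $!";
    return $filenumber + 1;    # number of chunk files written
}
```

With these assumptions, split_on_boundary("infile.dat", "outfile", qr/^.{67}11/, 500 * 1024, 0) would behave much like the script above, while passing 0 for the size and, say, 1000 for the last argument would split on invoice count instead. Reading line by line keeps memory use flat, which matters for a 13 GB input.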
 