use perl to delimit and delete bad records in flat-file DB export

shotokan · Jul 12, 2009

I'm new to perl and need help building a pipe delimited text file with rows that contain the correct location of the delimiter.
I'm trying to modify a text file that has record elements delimited by spaces. But I don't want to replace all of the spaces in each row. Then I need to remove any rows that are missing data in the fields. This is to prepare a file for import into a database using an import utility.

Here's what I have so far, input file looks like this and has several hundred rows.

<code>
0211121253 Mike Dell ID065533
empty_field Bill Apple ID394033
0211121253 Leo De Vinci ID02014
0211121253 Joe Del La Marco ID653335
2033223222 John Michael Smith ID345234
...
</code>
<code>
#!/usr/bin/perl

$header = "ACCOUNT|FNAME|LNAME|ID\n";
@data =();
open INFILE, "< dat.txt"
or die "Cant open input file : $!";
# print $header;
while (<INFILE>) {
chomp;
push (@data, $_);
$_ =~ s/\s/\|/g;
print "$_\n";
}
close INFILE;
</code>

After processing, the output file should look like this:
<code>
ACCOUNT|FNAME|LNAME|ID # <-header added
0211121253|Mike|Dell|ID065533
0211121253|Leo|De Vinci|ID02014 # <-space left in place
0211121253|Joe|Del La Marco|ID653335 #same here
2033223222|John Michael|Smith|ID345234 # <-space left place
...
</code>
The script should also delete any rows with empty elements.

I'm not sure what would work best, the join, split, push, shift/unshift operators, or a regex that replaces the specified space with a pipe character? If using a regex, can I specify replacing the first space, after the ACCOUNT field, then the space after the FNAME field? However, the first or last names may have a space in them. The last record has a first name that has a space and the other records might have spaces in the last name field. I may be able to get a file(s) that have a list of the last or first names only, and use that as a reference for delimiting.

I code use regex to delimit the first space and the preceding space of the last field. Then use some way of pulling in the first and last name fields putting them in between the account and ID pipe delimited fields.

Unfortunately, I can't get the text file already delimited correctly, thus all this work, but I can get separate files that have only the individual fields in them (e.g, first_name.txt, last_name.txt, id.txt, etc). Then find away to build the dat.txt file with delimiters properly located.

Thank you in advanced for your help,
~Shotokan

Annihilannic · Jul 12, 2009

I'd get the separate files and use the Unix paste utility to stick 'em together, e.g. paste -d'|' accounts fnames lnames ids.

Annihilannic.

stevexff · Jul 13, 2009

Use a pattern to collect the data using capturing parentheses. This will give you the acount in $1, the ID in $3. $2 will contain all the names, so just split them on spaces into a temporaray array, then pop the first one off as the FNAME. Join any others on space to form the LNAME.

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object:erlDesignPatterns)[/small]

Annihilannic · Jul 13, 2009

stevexff said:
... then pop the first one off as the FNAME.

If you examine the sample data more closely you'll see it's not as simple as that, some of the first names have spaces in too.

Annihilannic.

stevexff · Jul 14, 2009

Hmm. But without some fairly complex rules, you aren't ever going to get around this...

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object:erlDesignPatterns)[/small]

KevinADC · Jul 14, 2009

Yea, the name requirement is a deal buster.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

use perl to delimit and delete bad records in flat-file DB export

shotokan

IS-IT--Management

Annihilannic

MIS

stevexff

Programmer

Annihilannic

MIS

stevexff

Programmer

KevinADC

Technical User

Similar threads

Part and Inventory Search

Sponsor