Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

use perl to delimit and delete bad records in flat-file DB export

Status
Not open for further replies.

shotokan

IS-IT--Management
Jul 13, 2003
1
0
0
US
I'm new to perl and need help building a pipe delimited text file with rows that contain the correct location of the delimiter.
I'm trying to modify a text file that has record elements delimited by spaces. But I don't want to replace all of the spaces in each row. Then I need to remove any rows that are missing data in the fields. This is to prepare a file for import into a database using an import utility.

Here's what I have so far, input file looks like this and has several hundred rows.

<code>
0211121253 Mike Dell ID065533
empty_field Bill Apple ID394033
0211121253 Leo De Vinci ID02014
0211121253 Joe Del La Marco ID653335
2033223222 John Michael Smith ID345234
...
</code>
<code>
#!/usr/bin/perl

$header = "ACCOUNT|FNAME|LNAME|ID\n";
@data =();
open INFILE, "< dat.txt"
or die "Cant open input file : $!";
# print $header;
while (<INFILE>) {
chomp;
push (@data, $_);
$_ =~ s/\s/\|/g;
print "$_\n";
}
close INFILE;
</code>

After processing, the output file should look like this:
<code>
ACCOUNT|FNAME|LNAME|ID # <-header added
0211121253|Mike|Dell|ID065533
0211121253|Leo|De Vinci|ID02014 # <-space left in place
0211121253|Joe|Del La Marco|ID653335 #same here
2033223222|John Michael|Smith|ID345234 # <-space left place
...
</code>
The script should also delete any rows with empty elements.

I'm not sure what would work best, the join, split, push, shift/unshift operators, or a regex that replaces the specified space with a pipe character? If using a regex, can I specify replacing the first space, after the ACCOUNT field, then the space after the FNAME field? However, the first or last names may have a space in them. The last record has a first name that has a space and the other records might have spaces in the last name field. I may be able to get a file(s) that have a list of the last or first names only, and use that as a reference for delimiting.

I code use regex to delimit the first space and the preceding space of the last field. Then use some way of pulling in the first and last name fields putting them in between the account and ID pipe delimited fields.

Unfortunately, I can't get the text file already delimited correctly, thus all this work, but I can get separate files that have only the individual fields in them (e.g, first_name.txt, last_name.txt, id.txt, etc). Then find away to build the dat.txt file with delimiters properly located.

Thank you in advanced for your help,
~Shotokan
 
I'd get the separate files and use the Unix paste utility to stick 'em together, e.g. paste -d'|' accounts fnames lnames ids.

Annihilannic.
 
Use a pattern to collect the data using capturing parentheses. This will give you the acount in $1, the ID in $3. $2 will contain all the names, so just split them on spaces into a temporaray array, then pop the first one off as the FNAME. Join any others on space to form the LNAME.

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
stevexff said:
... then pop the first one off as the FNAME.

If you examine the sample data more closely you'll see it's not as simple as that, some of the first names have spaces in too.

Annihilannic.
 
Hmm. But without some fairly complex rules, you aren't ever going to get around this...

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
Yea, the name requirement is a deal buster.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top