Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

perl or awk?? 1

Status
Not open for further replies.

scooter6

IS-IT--Management
Jul 10, 2001
44
US
Okay...here's the problem:

I have a simple fixed length text file...that looka a bit like this....

John Adams 1234 Main Street Anywhere TX

John William Adams 9876 Walnut Ave Some Town TX

The problem:

in order to import it into an existing database I have, I want to convert it to an ASCII comma delimmted file..with a bit of a twist....as far as the name, my database carries the name as three different fields...First Name Middle Init
and Last Name and carries City and state as two different fields as well....

in other words, the names I need exported as:

"John","","Adams"
"John","W","Adams"

and, the addresses need to be:

"Anywhere","TX"
"Some Town","TX"

etc

the problem is how do I do this manipulation since I can't really use white space as an indicator and not all names have a middle name or initial...and, some cities are two words instead of one, etc...

anyone have any ideas on whether awk or perl would handle this better? or any other text manipulation tools that would simply this?

I would do it manually in Access but I have a TON of records to do, etc....

any help would be appreciated...thanks

Scooter6


 
Perl could do the job I guess, you'll have to make some compromises though as it probably won't be possible to convert all of your records automatically, you'll be able to get close though.

Here's a start, shout up if you'd like more gaps filled in (it's late here now...)

my $name;
my $address;
while(<>){
if(/[0-9]/){ # find the house number, if there is one
$name = $`;
$address = $& . $';
@name_array = split(/\s+/,$name);
@address_array = split(/\s+/,$address);
# print comma delimited names and addresses
} else {
# do something else ...
}
}
Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.
 
Hmm tricky...

I think Mikes idea is your only option.

But then its downfall is that it relies on the address fields starting with a number, which lets face it isn't good.

In either case they can both be written in perl or awk.

Andy
++
 
Well, unfortunately, it's the problem I am faced with.
I'm playing with some Perl scripts, but it is indeed very tricky.....

Oh well, back to the drawing board...

If anyone comes up with an idea, please post. Thanks

Scooter6
 
Hi Scotter,
Question: Do you always have one white space between words?
for exapmle is it look like this
John W Adams add1
john Adans add1

or
John W Adams add1
john Adans add1

Sachin
 
it is fixed length records....so, for instance, positions 1-32 carry the name, 33-63 carry the address, 64-90 carry city/state, etc.

so, it would look like this

John W Adams 1234 main street anywhere tx
John Adams 9876 Elm St Apt 10 sometown tx
John Adams Jr 111 Broadway our city tx

hence, the problem...there is always a white space between text, just not necessarily the same number.....

for the above example, I need it to look like this:

&quot;John&quot;,&quot;W&quot;,&quot;Adams&quot;,&quot;1234 Main Street&quot;,&quot;anywhere&quot;,&quot;tx&quot;
&quot;John&quot;,,&quot;Adams&quot;,&quot;9876 Elm St&quot;,&quot;Apt 10&quot;,&quot;sometown&quot;,&quot;tx&quot;
&quot;John&quot;,,&quot;Adams Jr&quot;,&quot;111 Broadway&quot;,&quot;our city&quot;,&quot;tx&quot;

etc...etc


some names will have 3 positions, some addresses will have two (depending on the length..my database carries 25 char for address 1 and 25 for address 2...the text file carries address as 32 char...text file also contains city/state together as one field...I have it as two...so, some could be one white space for city, then state...or, if it's a two word city name, could be two white spaces, then state)

problem I have on my database side is there is entirely too much built on it as far as reports,letters,etc. to change the fields in the database to match...so, I'm trying to &quot;chop up the text file&quot; to my liking....a bigger than exprected chore as I'm finding out...

 
The fixed length records do make it a little easier. This allows you to split the line into a name, address, and location fields. I wrote a Perl script that should do the job for lines with no middle name and a city with multiple words. The assumptions I made are:

1) locations are 1-32(name), 33-63(address), 64-90(location)
2) name is &quot;first middle last&quot; or &quot;first last&quot;
3) location - last word is state

($name, $address, $citystate) = $data =~ m/^(.{32})(.{31})(.{27})/;
@Name = split &quot; &quot;, $name;
$first = $Name[0];
if ($#Name < 2) {
# No middle name
$middle = &quot;&quot;;
$last = $Name[1];
} else {
# Middle name
$middle = $Name[1];
$last = $Name[2];
}
($city, $state) = $citystate =~ m/^(.+)\s+(\w+)/;
($address) =~ s/\s{2,}//g; # Remove trailing white space
print &quot;$first,$middle,$last,$address,$city,$state\n&quot;;
 
okay....I assume the first line I need is

#!/usr/bin/perl


is that going to work with no &quot;foreach&quot; statement...

seems like you're missing the first part of the script...

I'll see what I can do with this.....thanks

 
I didn't know I was dealing with a Perl newbie :)

Below is a complete script. Just change the INPUT_FILENAME to the appropriate file name and it should work. Make sure the OUTPUT_FILENAME is different from the INPUT_FILENAME!

Good Luck

Code:
#!/usr/bin/perl

$input = &quot;INPUT_FILENAME&quot;;
$output = &quot;OUTPUT_FILENAME&quot;;
open(IN, &quot;$input&quot;) or die &quot;Can't open $input: $! \n&quot;;
open(OUT, &quot;>$output&quot;) or die &quot;Can't open $output: $! \n&quot;;
while(<IN>) {
    chomp($_);
    ($name, $address, $citystate) = $_ =~ m/^(.{32})(.{31})(.{27})/;
    print &quot;n=$name a=$address c=$citystate \n&quot;;
    ($name, $address, $citystate) = $_ =~ m/^(.{32})(.{31})(.{27})/;
    @Name = split &quot; &quot;, $name;
    $first = $Name[0];
    if ($#Name < 2) {
        # No middle name
        $middle = &quot;&quot;;
        $last = $Name[1];
    } else {
        # Middle name
        $middle = $Name[1];
        $last = $Name[2];
    }
    ($city, $state) = $citystate =~ m/^(.+)\s+(\w+)/;
    ($address) =~ s/\s{2,}//g; # Remove trailing white space
    print OUT &quot;$first,$middle,$last,$address,$city,$state\n&quot;;
}
close(IN);
close(OUT);
 
sorry...yes, I am a Perl newbie...sorry....

thanks for all the help...I'll work with that script a bit..
but, I think this has helped with the majority of my problem anyways, just not all of it yet....

there are other parts of text that I need to work into there, but they aren't that hard to cut up, I don't believe...thanks again
 
This may or may not help ...
I am frequently asked to export files from UNIX to various Windows applications, such as Act. One handy trick I use is vi, with the :%s/some string or character group/some other string or character/.
The substitution %s command I use to convert multiple spaces to single spaces, or ^I to commas, and other such character manipulations.
Good Luck
 
doh!!! &quot;fixed length&quot;!!! I missed that, sorry guys... Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.
 
well, I got it as close as I'm probably going to

thanks for all the help everyone...

scooter6
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top