Text file Parsing .....

Status
Not open for further replies.

nani123nani

Programmer
Dec 7, 2002
4
US
Hi guys,

I am very new to shell scripting.
I have a text file with 4 columns, and the columns are separated by spaces.

My text file looks like this:
------------------------------------------------------------
FirstName    LastName   PhoneNumber           Group
------------------------------------------------------------
David        Ked        345-234-1234          Dba
Edward       Nas        234.123.5673          WDS
Sdert        Fuji       91 23 45678           ABC
serty        asder      01 34 456 345678978   KLS
cderfg       Sddd       123 345               VBC
Asasasasas   QWER       2154-589-9656         asd
------------------------------------------------------------

I need to parse the file above and store each piece of information in a separate attribute.
I need to capture the first name, last name, phone number and group in separate fields. I tried the cut command, but because some of the phone numbers have spaces in them, I am not able to capture the phone numbers properly.
Can someone help me capture the phone numbers and the other information?

I appreciate your help!

Thanks in advance.
 
Maybe you can do it like this.
It is a slow way but very easy to understand.

It only works when the positions stay the same.

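while IFS= read -r xx
do
    # Quote "$xx" so the shell does not collapse the column spacing.
    a=`echo "$xx" | awk '{ print substr($0, 1, 48) }'`
    b=`echo "$xx" | awk '{ print substr($0, 49, 45) }'`
    c=`echo "$xx" | awk '{ print substr($0, 92, 53) }'`
    d=`echo "$xx" | awk '{ print substr($0, 145, 20) }'`
    # Quote the output too; an unquoted ";" would end the echo command.
    echo "$a;$b;$c;$d"
done < 4colfile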

Regards Gregor Gregor.Weertman@mailcity.com
 
Thanks for your tip.
But some lines in my text file look like this:
---------------------------------------------------------
David Ked 345-234-1234 Dba
Edward Nassw 234.123.5673 WDS
Sdert Fuji 91 23 45678 ABC
serty asder 01 34 456 345678978 KLS
---------------------------------------------------------
I am having difficulty parsing lines like the ones above.


 
Now I’m going to cheat.
I assume the phone number has only 4 fields at most.
The output uses “;” as the field separator.

awk '{
    printf("%s;", $1)
    sub($1, "")
    printf("%s;", $1)
    sub($1, "")
    printf("%s;", $NF)
    sub($NF, "")
    print $1, $2, $3, $4
}' 4colfile

Regards Gregor Gregor.Weertman@mailcity.com
 
How about this:

[tt]#!/usr/bin/ksh

IFS=';'
sed -e 's/\([A-Za-z ]*\) *\([0-9 .-]*\) *\([A-Za-z]*\)/\1;\2;\3/' 4colfile | while read LINE
do
set -- $LINE
echo $1
echo $2
echo $3
echo
done[/tt]

It's not very easy to read, but seems to do the job!

sed matches the sets of characters you are likely to find in names and phone numbers and separates them with semicolons. The shell IFS (Internal Field Separator) variable is set to a semicolon, so the set -- $LINE command splits the line into $1, $2, $3, etc.

Annihilannic.
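
For anyone unsure how the IFS and set -- combination behaves, here is a minimal sketch of just that step, separate from the sed part. The function name split_record is hypothetical and used only for illustration, and the record is assumed to be already joined with semicolons.

```shell
#!/bin/sh
# split_record is a hypothetical helper name used only for this illustration.
# It expects one argument: a record already joined with semicolons.
split_record() {
    old_ifs=$IFS
    IFS=';'
    set -f          # turn off globbing so a stray * in a field stays literal
    set -- $1       # unquoted on purpose: the shell splits the record on IFS
    set +f
    IFS=$old_ifs    # restore the previous separator
    printf 'Name: %s\nPhone: %s\nGroup: %s\n' "$1" "$2" "$3"
}

split_record 'David Ked;345-234-1234;Dba'
```

Saving and restoring IFS keeps the change from leaking into the rest of a script, which the one-line IFS=';' at the top of a script does not do.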
 
nawk -f customers.awk customers.txt

#------------------- customers.awk
BEGIN {
    OFS = ","
}
{
    phone = ""
    for (i = 3; i <= NF-1; i++)
        phone = phone $i ((i < NF-1) ? FS : "")
    printf("%s%s", $1, OFS);
    printf("%s%s", $2, OFS);
    printf("%s%s", phone, OFS);
    printf("%s\n", $NF);
}
vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
None of the above tips are working.
My text file looks like this:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
UserId FNAME LNAME Phone ORG
-----------------------------------
PZ44GK Gery Kissel 8-353-4149 ATV
SZM72N Greg Helzer 8-353-4167 ATV
CZHRZG Jeff Purdue 8-226-7511 EICC
FZW2WV Phillip Louey +61 3 9647 5520 HOLDENS
KZ20FH James Davies +61 3 9647 1420 HOLDENS
BZYB7C Andrew Brenz 8-(810) 236-0598 CLCD
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

With the text file above as input, I need to capture each user attribute separately: phone number, UserId, FNAME, etc.
I am able to parse the lines where the phone number is a single string without spaces, but I am not able to parse phone numbers that contain spaces.

Any suggestions?

Thanks in advance.



 
nani,

Can you guarantee that the last field (ORG) is always going to be a single word?

Mike

Want to get great answers to your Tek-Tips questions? Have a look at faq219-2884
 
Do I understand correctly? You may have more than one space between fields, but you may also have a single space between the parts of a phone number? This is how I do things like this: identify a character which DOESN'T appear in your file (I've used ']'), change every run of 2 or more spaces to that character, then use awk to separate the fields. (There are 3 spaces in the first part of the sed edit.)

cat textfile | sed 's/   */]/g' | awk -F] '{ printf "UserID: %s; FName: %s; LName: %s; Phone: %s; ORG: %s\n", $1, $2, $3, $4, $5 }'

HTH

(PS - I know it's a UUOC but I find this format easier to read)
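
A small sketch of the same idea that can be run on its own, assuming the columns really are separated by at least two spaces (the sample lines below are typed that way here; the thread does not show the original spacing). The function name parse_wide is hypothetical. The sed pattern uses the interval form, which matches the same thing as two spaces followed by " *".

```shell
#!/bin/sh
# parse_wide is a hypothetical helper name; it reads records on stdin.
parse_wide() {
    # Turn every run of two or more spaces into "]", then split on "]".
    sed 's/ \{2,\}/]/g' |
    awk -F']' '{ printf "UserID: %s; FName: %s; LName: %s; Phone: %s; ORG: %s\n", $1, $2, $3, $4, $5 }'
}

# Sample records with assumed two-space gaps between the columns.
printf '%s\n' \
    'FZW2WV  Phillip  Louey  +61 3 9647 5520  HOLDENS' \
    'PZ44GK  Gery  Kissel  8-353-4149  ATV' |
parse_wide
```

The single spaces inside the phone number survive because only runs of two or more spaces are replaced. If the real file has single spaces between columns, this approach will not separate them.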
 
BEGIN {
    OFS = ","
}
{
    phone = ""
    for (i = 4; i <= NF-1; i++)
        phone = phone $i ((i < NF-1) ? FS : "")
    printf("%s%s", $1, OFS);
    printf("%s%s", $2, OFS);
    printf("%s%s", $3, OFS);
    printf("%s%s", phone, OFS);
    printf("%s\n", $NF);
}
vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Yes, all the fields except the phone number are going to be a single word. I need to read the file line by line and capture each individual entity separately.
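
Given that guarantee, the line-by-line capture can also be sketched in plain shell, without sed or awk. This is only an illustration under the assumption that fields are separated by single spaces; parse_line and the "|" output separator are hypothetical choices, not something from the posts above.

```shell
#!/bin/sh
# parse_line is a hypothetical helper name used only for this sketch.
parse_line() {
    printf '%s\n' "$1" | {
        # The first three words go to id, fname and lname;
        # everything left over (phone plus ORG) goes to rest.
        read -r id fname lname rest
        org=${rest##* }     # last word: the text after the final space
        phone=${rest% *}    # what remains once the last word is removed
        printf '%s|%s|%s|%s|%s\n' "$id" "$fname" "$lname" "$phone" "$org"
    }
}

parse_line 'BZYB7C Andrew Brenz 8-(810) 236-0598 CLCD'
```

To process a whole file, the same read and the two parameter expansions can go inside a while read loop fed from the file.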
 
It's no surprise that none of the above tips worked when the format of the file changed!

Here's my solution again updated for the new format:

[tt]#!/usr/bin/ksh

IFS=';'
sed -e 's/^\([A-Za-z0-9]*\) \([A-Za-z ]*\) *\([0-9 ().+-]*\) *\([A-Za-z]*\)$/\1;\2;\3;\4/' sourcefile | while read LINE
do
set -- $LINE
echo "UserID: $1"
echo " Name: $2"
echo " Phone: $3"
echo " Org: $4"
echo
done[/tt]

I've tested it with your sample of the input, which I put in sourcefile, and it does appear to work. It now matches +, ( and ) in the phone number as well.
Annihilannic.
 