Text file Parsing .....

nani123nani · Dec 7, 2002

Hi guys,

I am very new to shell scripting.
I have a text file with 4 columns, and the columns are separated by spaces.

My text file looks like this:
------------------------------------------------------------
FirstName   LastName  PhoneNumber          Group
------------------------------------------------------------
David       Ked       345-234-1234         Dba
Edward      Nas       234.123.5673         WDS
Sdert       Fuji      91 23 45678          ABC
serty       asder     01 34 456 345678978  KLS
cderfg      Sddd      123 345              VBC
Asasasasas  QWER      2154-589-9656        asd
------------------------------------------------------------

I need to parse the file above and store each piece of information (first name, last name, phone number and group) in a separate field. I tried the cut command, but because some of the phone numbers contain spaces, I am not able to capture the phone numbers properly.
Can someone help me capture the phone numbers and the other fields?

I appreciate your help!

Thanks in advance.

gregor weertman · Dec 7, 2002

Maybe you can do it like this.
It is a slow way but very easy to understand.

It only works when the positions stay the same.

while IFS= read -r xx
do
# Quote "$xx" so the shell does not collapse the column spacing.
a=`echo "$xx"|awk '{ print substr( $0, 1, 48)}'`
b=`echo "$xx"|awk '{ print substr( $0, 49, 45)}'`
c=`echo "$xx"|awk '{ print substr( $0, 92, 53)}'`
d=`echo "$xx"|awk '{ print substr( $0, 145, 20)}'`
echo "$a;$b;$c;$d"
done < 4colfile

Regards Gregor Gregor.Weertman@mailcity.com
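
The loop above starts four awk processes for every input line. A possible single-pass variant, sketched here on the assumption that the columns really are fixed-width, does all the slicing in one awk call. The column widths (12, 12, 21) and the `split_fixed` name are invented for this example and are not taken from the real file.

```shell
#!/bin/sh
# Sketch: slice fixed-width columns in a single awk process.
# The column widths below are assumptions for this demo only.
split_fixed() {
    awk '
    # Strip leading and trailing spaces from one column.
    function trim(s) { gsub(/^ +| +$/, "", s); return s }
    {
        first = trim(substr($0, 1, 12))
        last  = trim(substr($0, 13, 12))
        phone = trim(substr($0, 25, 21))
        group = trim(substr($0, 46))
        print first ";" last ";" phone ";" group
    }'
}

# Build one padded demo line and split it.
printf '%-12s%-12s%-21s%s\n' Sdert Fuji '91 23 45678' ABC | split_fixed
```

Like the loop, this only works while the column positions stay the same.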

nani123nani · Dec 7, 2002

Thanks for your tip.
But some lines in my text file look like this:
---------------------------------------------------------
David Ked 345-234-1234 Dba
Edward Nassw 234.123.5673 WDS
Sdert Fuji 91 23 45678 ABC
serty asder 01 34 456 345678978 KLS
---------------------------------------------------------
I am having difficulty parsing lines like these.

gregor weertman · Dec 8, 2002

Now I’m going to cheat.
I assume the phone number has only 4 fields at most.
It makes the field separator “;”.

awk '{
printf( "%s;", $1)
sub( $1, "")

printf( "%s;", $1)
sub( $1, "")

printf( "%s;", $NF)
sub( $NF, "")

print $1, $2, $3, $4
}' 4colfile

Regards Gregor Gregor.Weertman@mailcity.com

Annihilannic · Dec 8, 2002

How about this:

[tt]#!/usr/bin/ksh

IFS=';'
sed -e 's/\([A-Za-z ]*\) *\([0-9 .-]*\) *\([A-Z]*\)/\1;\2;\3/' 4colfile | while read LINE
do
set -- $LINE
echo $1
echo $2
echo $3
echo
done[/tt]

It's not very easy to read, but seems to do the job!

sed matches the sets of characters you are likely to find in names and phone numbers and separates them with semicolons. The shell IFS (Internal Field Separator) variable is set to a semicolon, so the set -- $LINE command splits the line into $1, $2, $3, etc.
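
The splitting step can be tried on its own. This is only a small sketch: the sample record and the `split_semicolons` helper name are made up for illustration, and IFS is changed inside a subshell so the rest of a script keeps its default word splitting.

```shell
#!/bin/sh
# Sketch: split one semicolon-delimited record into positional parameters.
split_semicolons() (
    # The ( ... ) function body runs in a subshell, so the IFS change stays local.
    IFS=';'
    set -f            # disable globbing so characters like * are kept literally
    set -- $1         # unquoted on purpose: the shell splits on IFS here
    echo "fields: $#"
    echo "name:   $1"
    echo "phone:  $2"
    echo "group:  $3"
)

split_semicolons 'Sdert Fuji;91 23 45678;ABC'
```

Because only the semicolon is in IFS, the spaces inside the name and the phone number survive the split.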

Annihilannic.

vgersh99 · Dec 8, 2002

nawk -f customers.awk customers.txt

#------------------- customers.awk
BEGIN{
OFS=","
}
{
phone=""
for(i=3; i <= NF-1; i++)
phone=phone $i ((i < NF-1) ? FS : "")

printf("%s%s", $1, OFS);
printf("%s%s", $2, OFS);
printf("%s%s", phone, OFS);
printf("%s\n", $NF);
}
vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

nani123nani · Dec 16, 2002

None of the above tips are working.
My text file looks like this:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
UserId FNAME LNAME Phone ORG
-----------------------------------
PZ44GK Gery Kissel 8-353-4149 ATV
SZM72N Greg Helzer 8-353-4167 ATV
CZHRZG Jeff Purdue 8-226-7511 EICC
FZW2WV Phillip Louey +61 3 9647 5520 HOLDENS
KZ20FH James Davies +61 3 9647 1420 HOLDENS
BZYB7C Andrew Brenz 8-(810) 236-0598 CLCD
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Using the text file above as input, I need to capture each user's attributes separately: Phone number, UserId, FNAME, etc.
I am able to parse the few lines where the phone number is a single string without spaces. But I am not able to parse phone numbers that contain spaces.

Any suggestions?

Thanks in advance.

MikeLacey · Dec 17, 2002

nani,

Can you guarantee that the last field (ORG) is always going to be a single word?

Mike

Want to get great answers to your Tek-Tips questions? Have a look at faq219-2884

kasparov · Dec 17, 2002

Do I understand correctly? You may have more than one space between fields, but you may also have a single space between parts of a phone number? This is how I do things like this: you need to identify a character which DOESN'T appear in your file (I've used ']') and change all occurrences of 2 or more spaces to that character, then use awk to separate the fields. (There are 3 spaces in the first part of the sed edit.)

cat textfile | sed 's/   */]/g' | awk -F] '{ printf "UserID: %s; FName: %s; LName: %s; Phone: %s; ORG: %s\n", $1, $2, $3, $4, $5 }'

HTH

(PS - I know it's a UUOC but I find this format easier to read)
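
The same idea can be sketched without a placeholder character, because awk accepts a regular expression as the field separator. This assumes, as above, that the columns are separated by at least two spaces; the spacing of the sample line and the `split_wide` name are invented for the demo.

```shell
#!/bin/sh
# Sketch: treat runs of two or more spaces as the field separator,
# so a single space inside the phone number is left alone.
split_wide() {
    awk -F '  +' '{
        printf "UserID: %s; FName: %s; LName: %s; Phone: %s; ORG: %s\n", $1, $2, $3, $4, $5
    }'
}

printf '%s\n' 'FZW2WV  Phillip  Louey  +61 3 9647 5520  HOLDENS' | split_wide
```

If any two columns are separated by only one space, they end up merged into a single field, so this is only as reliable as the spacing in the file.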

vgersh99 · Dec 17, 2002

BEGIN{
OFS=","
}
{
phone=""
for(i=4; i <= NF-1; i++)
phone=phone $i ((i < NF-1) ? FS : "")

printf("%s%s", $1, OFS);
printf("%s%s", $2, OFS);
printf("%s%s", $3, OFS);
printf("%s%s", phone, OFS);
printf("%s\n", $NF);
}
vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

nani123nani · Dec 17, 2002

Yes, all the fields except the phone number are going to be a single word. I need to read the file line by line and capture each individual entity separately.

Annihilannic · Dec 17, 2002

It's no surprise that none of the above tips worked when the format of the file changed!

Here's my solution again updated for the new format:

[tt]#!/usr/bin/ksh

IFS=';'
sed -e 's/\([A-Za-z0-9]*\) *\([A-Za-z ]*\) *\([0-9 .()+-]*\) *\([A-Za-z]*\)/\1;\2;\3;\4/' sourcefile | while read LINE
do
set -- $LINE
echo "UserID: $1"
echo " Name: $2"
echo " Phone: $3"
echo " Org: $4"
echo
done[/tt]

I've tested it with your sample of the input, which I put in sourcefile, and it does appear to work. It now matches +, ( and ) in the phone number as well.
Annihilannic.
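
Given the confirmation earlier in the thread that every field except the phone number is a single word, a plain-shell alternative might look like the sketch below. It assumes a single space between the phone number and ORG, and it makes no attempt to skip the header or separator lines; the `parse_user` name and the `|` output delimiter are choices made for this example only.

```shell
#!/bin/sh
# Sketch: the first three words and the last word are single fields;
# everything in between is the phone number.
parse_user() {
    while read -r id fname lname rest; do
        org=${rest##* }      # text after the last space
        phone=${rest% *}     # text before the last space
        printf '%s|%s|%s|%s|%s\n' "$id" "$fname" "$lname" "$phone" "$org"
    done
}

parse_user <<'EOF'
PZ44GK Gery Kissel 8-353-4149 ATV
FZW2WV Phillip Louey +61 3 9647 5520 HOLDENS
BZYB7C Andrew Brenz 8-(810) 236-0598 CLCD
EOF
```

read puts the first three words into id, fname and lname and leaves the remainder of the line in rest, which the two parameter expansions then cut at its last space.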
