How to format the following file in UNIX

Status
Not open for further replies.

zubnus

Programmer
Apr 22, 2003
8
US
Hi Gurus,

I have a file with the following contents. Here is a small snapshot of it; the actual file may contain more than 1000 lines. I have also put a line number in front of each line for reference. Let's say the following file (lines 1-12) is named test.csv.


1.Report for states
2.Alabama
3.Michigan
4.Connecticut
5.MONTHLY REPORT - ALABAMA DIVISION
6.01,987,765,980, 345.00
7.44,56,675,380,450.00
8.MONTHLY REPORT - ALABAMA HQ
9.32,767,876,560,450.00
10.44,876,908,787,430.05
11.
12.END OF REPORT

I want to delete every line that doesn't begin with a number or 'MONTHLY REPORT'. Next, as you can see, line 5 has MONTHLY REPORT - ALABAMA DIVISION. I want to put AD (for ALABAMA DIVISION) at the start of lines 6 and 7, so line 6 should look like
AD,01,987,765,980, 345.00
and line 7 should look like
AD,44,56,675,380,450.00

Line 8 has ALABAMA HQ, so lines 9 and 10 should look like
AH,32,767,876,560,450.00
AH,44,876,908,787,430.05

Line 11 is blank, so I need to delete that line.
Line 12 should also be deleted.
In the end the file should look like the following:

AD,01,987,765,980, 345.00
AD,44,56,675,380,450.00
AH,32,767,876,560,450.00
AH,44,876,908,787,430.05


 
Try something like this (UNTESTED).

Put this in a sed command file (sed-cmd).
CUT OUT the numbers and tabs at the beginning
of each line; they are just for explanation:

----file begin
1 s/^[SpaceTab][SpaceTab]*//
2 /^$/d
3 /^MON.*ALABAMA.*DIVISION/,/^[A-Za-z]/{
/^[0-9]/s/^/99xxxx_xxxxAD,/
}
4 /^MON.*ALABAMA.*HQ/,/^[A-Za-z]/{
/^[0-9]/s/^/99xxxx_xxxxAH,/
}
5 /^MON.*ALABAMA.*CITY/,/^[A-Za-z]/{
/^[0-9]/s/^/99xxxx_xxxxCY,/
}
6 /^[A-Za-z]/d
7 s/^99xxxx_xxxx//
----file end
then:

sed -f sed-cmd yourfile | nl -ba > output

nl -ba numbers every line of the output; drop the nl step if you don't want line numbers.
^ is the caret (Shift-6 on a US-ASCII keyboard)
[] are square brackets
$ is the dollar sign
A-Za-z is any uppercase or lowercase letter, so /^[A-Za-z]/
matches every line that begins with a letter

line 1: SpaceTab is a space character followed by a tab character;
this strips leading blanks
line 2: deletes empty lines
line 3:
line 4:
line 5: between MON.*ALABAMA.*xxx and the next line beginning with a letter,
99xxxx_xxxxZZ, is prepended to each line that begins with a digit.
The marker keeps the AD,/AH, lines from being deleted in step 6;
make it a long string to be sure step 7 removes only the marker
line 6: all lines beginning with a letter are deleted
line 7: removes the prepended 99xxxx_xxxx
The nl command inserts a line number.

Your job is to define lines 3-5. Be careful here, this is
the core of the job. If you have a lot of different locations:
a) put lines 1-2 in sed-begin
b) put lines 3-5 in sed-loc[1,2,3,4,5,6,7....Z]
this is a big job, but you only have to do it ONCE :)
c) put lines 6-7 in sed-end
then

sed -f sed-begin yourfile | sed -f sed-loc1 | sed -f sed-loc2 | ... | sed -f sed-locZ | sed -f sed-end | nl -ba > outfile

Some older sed implementations limit a command file to about 200 commands, but Unix has pipes.

Sure, you can also do it using awk or perl; I personally would write a
C program using structs.
Hope it helps.
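
For anyone who wants to try the single-file sed approach directly, here is a minimal sketch built on the sample from the question. It is only an illustration under a few assumptions: the SpaceTab placeholder is written as the POSIX class [[:space:]], the header patterns are spelled out in full, the 99xxxx_xxxx marker is assumed never to occur in the real data, and only lines that begin with a digit receive the prefix so that the header lines are still removed by the later delete step. The file names test.csv, sed-cmd and output are just the ones used in this thread.

```shell
# Sample input from the question, without the explanatory line numbers.
cat > test.csv <<'EOF'
Report for states
Alabama
Michigan
Connecticut
MONTHLY REPORT - ALABAMA DIVISION
01,987,765,980, 345.00
44,56,675,380,450.00
MONTHLY REPORT - ALABAMA HQ
32,767,876,560,450.00
44,876,908,787,430.05

END OF REPORT
EOF

# sed command file: strip leading blanks, drop empty lines, tag the data
# lines of each section with a temporary marker plus the section code,
# delete everything that still starts with a letter, then remove the marker.
cat > sed-cmd <<'EOF'
s/^[[:space:]]*//
/^$/d
/^MONTHLY REPORT - ALABAMA DIVISION/,/^[A-Za-z]/{
/^[0-9]/s/^/99xxxx_xxxxAD,/
}
/^MONTHLY REPORT - ALABAMA HQ/,/^[A-Za-z]/{
/^[0-9]/s/^/99xxxx_xxxxAH,/
}
/^[A-Za-z]/d
s/^99xxxx_xxxx//
EOF

sed -f sed-cmd test.csv > output
cat output
```

With the sample above, output should contain the four AD,/AH, lines requested in the question. Each additional location still needs its own range block, which is why the split into several command files is suggested for long lists.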

 
awk would seem to be the obvious choice...
[tt]
awk '/MONTHLY REPORT/ {n=substr($4,1,1) substr($5,1,1)}
/^[0-9]/ {print n "," $0}' < file
[/tt]
Tested...
[tt]
AD,01,987,765,980, 345.00
AD,44,56,675,380,450.00
AH,32,767,876,560,450.00
AH,44,876,908,787,430.05
[/tt]
 
Ygor,

Thanks for the awk script. It works; the only problem is that a few lines in the text file begin with an address, like the following (this is just an example):

12 Eastborough side

The awk script drops all lines that begin with MONTHLY REPORT or any other letter, but it does not drop address lines like the one above (I believe because they begin with a number). Is there a way to delete these lines in your script?
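
One possible way to handle this, offered only as a sketch: tighten the pattern for data lines so that a line must start with a digit and contain nothing but digits, commas, periods and spaces. That is an assumption about the real file (no letters, signs or trailing carriage returns in the data lines), so check it against your data first. The file names test2.csv and output2 and the position of the address line in the sample are made up for illustration.

```shell
# Hypothetical sample: the data from the question plus one address line.
cat > test2.csv <<'EOF'
MONTHLY REPORT - ALABAMA DIVISION
12 Eastborough side
01,987,765,980, 345.00
44,56,675,380,450.00
MONTHLY REPORT - ALABAMA HQ
32,767,876,560,450.00
EOF

awk '
  # Header line: build the two-letter code from the initials of fields 4 and 5.
  /MONTHLY REPORT/ { n = substr($4, 1, 1) substr($5, 1, 1); next }
  # Data line: starts with a digit and holds only digits, commas, periods, spaces.
  /^[0-9][0-9,. ]*$/ { print n "," $0 }
' test2.csv > output2
cat output2
```

The address line contains letters, so it fails the stricter pattern and is dropped along with the header lines.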
 