
Can you use awk to search through a .doc file and extract data?

Status
Not open for further replies.

tcerv79

Programmer
Mar 26, 2009
16
US
I have many .doc files that all share the same formatting and the same hard-coded questions (Facility Code, File Name, File Date, Total Records). The questions are identical in every .doc. I am trying to pull the data that follows each specific question, as shown below:
----------------------------------------- *.DOC FILES
/* NEED THE "ABCD" */
Facility Code: ABCD
/* NEED THE "FILETYPE" */
File Name: 20090722.ABCD.FILETYPE.TXT
/* NEED THE "07/22/2009" */
File Date: 07/22/2009
...........
...........
...........
5. Do the records have the correct number of fields? Yes
Total lines 8
/* NEED THE "7" */
Total records: 7
Total failed format: 0
Total percent failed: 0
----------------------------------------------------EOF

Output to a .txt file should be:
ABCD|FILETYPE|07/22/2009|7
CDEF|FILE1234|08/01/2009|12
etc....

THANKS IN ADVANCE!
 


There may be a better way, but try this:
Code:
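awk -F': *' '
# Split on the colon and any spaces after it, so values carry no leading blank.
# A new Facility Code starts a record: print the previous one first, if any.
/Facility Code/ {if(w>0){print fc"|"fn"|"dt"|"tr}; w=1; fc=$2}
# The file type is the third dot-separated part of the file name.
/File Name/ {split($2,a,"."); fn=a[3]}
/File Date/ {dt=$2}
/Total records/ {tr=$2}
# Print the last record, provided at least one was found.
END {if(w>0) print fc"|"fn"|"dt"|"tr}
' MyFile.doc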

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb
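
One detail worth checking with a colon separator is whether the blank after the colon ends up in the extracted value. The sketch below is not from the replies in this thread; it just runs one sample line from the question through two separators so the difference is visible. The variable names (line, plain, trimmed) are only for illustration.

```shell
# Sample line taken from the question.
line='Facility Code: ABCD'

# With -F: the second field keeps the blank that follows the colon.
plain=$(printf '%s\n' "$line" | awk -F: '{print "[" $2 "]"}')

# With a regex separator (colon plus any spaces) the blank is consumed.
trimmed=$(printf '%s\n' "$line" | awk -F': *' '{print "[" $2 "]"}')

echo "$plain"     # [ ABCD]
echo "$trimmed"   # [ABCD]
```

The brackets are there only to make the leading blank easy to spot; they would not appear in the real output file.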
 
Not a huge improvement :) but personally I'd define OFS to make the code look a little neater:

Code:
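awk -F': *' -v OFS='|' '
# Split on the colon and any spaces after it, so values carry no leading blank.
# A new Facility Code starts a record: print the previous one first, if any.
/Facility Code/ {if(w>0){print fc,fn,dt,tr}; w=1; fc=$2}
# The file type is the third dot-separated part of the file name.
/File Name/ {split($2,a,"."); fn=a[3]}
/File Date/ {dt=$2}
/Total records/ {tr=$2}
# Print the last record, provided at least one was found.
END {if(w>0) print fc,fn,dt,tr}
' MyFile.doc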

Annihilannic.
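
Since the question mentions many .doc files and a single .txt result, here is a minimal sketch of one way to run the same idea over several files at once. It assumes the files are plain text despite the .doc extension, and that "Total records" always comes after the other three values. The two sample files, the scratch directory and the name summary.txt are made up for the demonstration; the second file's name is a guess built from the expected output in the question.

```shell
# Work in a scratch directory so the demo files do not touch real data.
dir=$(mktemp -d)

# Two made-up reports following the layout shown in the question.
cat > "$dir/a.doc" <<'EOF'
Facility Code: ABCD
File Name: 20090722.ABCD.FILETYPE.TXT
File Date: 07/22/2009
Total lines 8
Total records: 7
Total failed format: 0
EOF

cat > "$dir/b.doc" <<'EOF'
Facility Code: CDEF
File Name: 20090801.CDEF.FILE1234.TXT
File Date: 08/01/2009
Total lines 13
Total records: 12
Total failed format: 0
EOF

# Remember each value as it is seen and print one line per report when
# "Total records" shows up, since it is the last of the four values.
awk -F': *' -v OFS='|' '
/Facility Code/ {fc=$2}
/File Name/     {split($2,a,"."); fn=a[3]}
/File Date/     {dt=$2}
/Total records/ {print fc,fn,dt,$2}
' "$dir"/*.doc > "$dir/summary.txt"

cat "$dir/summary.txt"
```

Printing on "Total records" avoids the need for a flag variable or an END block, but a report that lacks one of the earlier lines would silently reuse the value from the previous file, so it may be worth resetting the variables when FNR==1 if the inputs are not fully uniform.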
 