
Extract data between repeating characters & pipe to unique text files


aawker (Technical User), May 11, 2011
I have a text file which contains data like this:

abc xyz

....variable data containing letters and numbers
spread over multiple lines......................
................................................
................................................
............................................... |

abc xyz

....variable data containing letters and numbers
spread over multiple lines......................
................................................
................................................
............................................... |

.
.
.




So, 'abc xyz' and | repeat more than once in the file and I need to extract the data that follows 'abc xyz' through to |.

I want to read the data file one 'section' (i.e. between the first 'abc xyz' and | set) at a time and send the data relative to that 'section' to a separate file.

Then read the second 'section' of data and send those contents to a different file, etc, etc.

By the way, I can change the | (my attempt at defining a record separator) to something else, I don't know, say, 'endoffile'.

I'm afraid that this task is beyond my awk skills....any help is greatly appreciated. Thanks!
 
Something like this?

Code:
awk '/abc xyz/ { i++ } { print >> "aawker."i }' inputfile

It puts each section into a file called aawker.n, incrementing n each time. You can omit the "|" separator from the data completely, since the script only acts on the "abc xyz" marker.
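To see the splitting in action, here is a minimal sketch with made-up sample data (the file names follow the code above; the redirection target is parenthesized for portability across awk implementations):

```shell
# Build a tiny input with two "abc xyz" sections (hypothetical data).
printf 'abc xyz\nline1\nline2\nabc xyz\nline3\n' > sample_input.txt

# i becomes 1 at the first marker and 2 at the second, so every line
# (marker included) is appended to the file for the current section.
awk '/abc xyz/ { i++ } { print >> ("aawker." i) }' sample_input.txt

# aawker.1 now holds the first section, aawker.2 the second.
```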

Annihilannic.
 
Wow! Thanks very much!

That is very close to what I am trying to do. By the way, I am trying to read up on the i++; I hadn't seen that before.

I have to admit to an error in my original question....the pattern of the data is somewhat different from what I originally described.

So, the 'abc xyz' is actually at the bottom of the 'section' of data I want to extract.

So, the data is laid out like this.....

variable data................
spread out over..............
multiple lines...............
.............................
.............................
.
.
.
abc xyz.....variable data....


So, my revised need is to extract everything in each 'section' where each section starts with variable data and ends with a line on which we find 'abc xyz' followed by some additional variable data.

I do need to capture the 'last' line of each 'section' that contains 'abc xyz' because the data on the last line that contains 'abc xyz' is pertinent to the data directly above it. Hope I am stating this clearly.

By the way, I ran your code, and, yes, it does work, no question. Two things, however,

(1) the files being generated are called aawker., aawker.1, aawker.2, etc

This is fine, except, I would like to have these files have the extension of txt (so they can be easily opened up in notepad, etc). I'm hoping that that is possible.

(2) Further, fyi, the code you provided puts the 'last' line that belongs to aawker. at the top of the next file, aawker.1, and so on. So, as I mentioned before, I'd like to keep the 'last' line containing 'abc xyz' with the 'section' of data it belongs with.

Once again many thanks for your input and sorry about the lengthy post!


 
Rearranging Annihilannic's code should do what you want.
Code:
{ print >> "aawker"(i+1)".txt" } /abc xyz/ { i++ }
The statement i++ is shorthand for i=i+1
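A quick demonstration of why the order matters: with the print block ahead of the pattern, each marker line is written before i is bumped, so it stays with the section it ends (the sample data below is made up):

```shell
# Hypothetical input: two sections, each ending in an "abc xyz" line.
printf 'data1\ndata2\nabc xyz tail1\ndata3\nabc xyz tail2\n' > sample2.txt

# print runs first on every line; i only increments afterwards,
# so each "abc xyz" line lands in the file of the section it closes.
awk '{ print >> ("aawker" (i+1) ".txt") } /abc xyz/ { i++ }' sample2.txt
```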

CaKiwi
 
Thank you! That worked just beautifully. I had worked out the details for renaming the files with a .txt extension but your code does the job all at the same time.

Again, thanks very much for the input!
 
Friends,

I have one final modification to work out......

I need to extract a string of characters from a specific group of contiguous fields on a specific line of each parsed file, and pass that extracted text to sendmail as the subject of an email sent with the parsed file as an attachment.

Here's the current script......

/usr/bin/awk '{print >> "/abc/def/ghi/parsedfilename"(i+1)".txt"} /demarcation text/ {i++}' /abc/def/sourcefilename.txt
cd /abc/def/ghi
for file in *.txt
do
uuencode "$file" "$file" | mail -v -s "Need to insert extracted text and place here as subject of email" aawker@awkfan.com
done

So, for example, the top of one of the parsed files looks like this.....

SHIPPING DETAIL NAME OF COMPANY INC.


XXXXX XXXXXXXXXXXXXX PO BOX 123456
ACCOUNTS PAYABLE TIMBUKTU HI 00001
1000 CORPORATE PARK DR
MAUI , HI 00000-



Account#: 6713 Dept:05 THIS IS THE TEXT I WANT TO USE
 
Try this, assuming you want all of the Account# line:

Code:
/usr/bin/awk '
    {print >> "/abc/def/ghi/parsedfilename"(i+1)".txt"}
    /^Account#/ {print >> "/abc/def/ghi/subjectfilename"(i+1)".txt"}
    /demarcation text/ {
        close("/abc/def/ghi/parsedfilename"(i+1)".txt")
        close("/abc/def/ghi/subjectfilename"(i+1)".txt")
        i++
    }
    
' /abc/def/sourcefilename.txt
cd /abc/def/ghi                                                                                                        
for file in parsedfilename*.txt                                                                                                             
do                                                           
    uuencode "$file" "$file" | mail -v -s "$(<subjectfilename${file#parsedfilename})" aawker@awkfan.com
done

I added the closes because awk can quickly run out of file handles in this situation if you are processing many files.
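The file-handle issue can be sketched like this: close() is a function call in awk, and invoking it once per finished section keeps the number of simultaneously open files constant no matter how many sections the input contains (file names below are placeholders):

```shell
# Hypothetical input: two sections, each ended by an "abc xyz" line.
printf 'a\nabc xyz\nb\nabc xyz\n' > sample3.txt

awk '
    { f = "section" (i+1) ".txt"; print >> f }
    /abc xyz/ {
        close(f)   # release the handle once this section is complete
        i++
    }
' sample3.txt
```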

Annihilannic.
 