
One File to be broken up into parts?

Status
Not open for further replies.

wellster34

Programmer
Sep 4, 2001
113
CA
Hi,

I have a data file like the following:

BEGIN 1ST FILE|
1|2|3|4|5
6|7|8|9|10
END 1ST FILE|
BEGIN 2ND FILE|
A|B|C|D|E
F|G|H|I|J
END 2ND FILE|
BEGIN 3RD FILE|
.....ETC....

Is there a way to extract the data for just the 1ST FILE, then the 2ND FILE, and so on?

I need to output the 1ST FILE to a file called 1.dat, the 2ND FILE to 2.dat, the 3RD FILE to 3.dat, and so on.

I understand I might have to run the same command with different criteria. I tried the following, but it did not work:

sed '/BEGIN 1ST FILE/d;/END 1ST FILE/q' main.dat > 1.dat

sed '/BEGIN 2ND FILE/d;/END 2ND FILE/q' main.dat > 2.dat

The 1.dat works!!! Yeah! But when 2.dat was created, it contained all the data from 1.dat too... :-(


I was curious if anyone had any ideas on how to resolve this? Any help is greatly appreciated!

Thanks for your time.
 
I would use awk like this:

Code:
awk '
        /BEGIN/ { FILE=$2 ;  sub("..$","",FILE) ; FILE=FILE ".dat" ; next }
        /END/ { close(FILE) ; next }
        { print >> FILE }
' inputfile

Annihilannic.
 
nawk -f well.awk main.dat

well.awk:
Code:
/^BEGIN/ { gsub(/[^0-9]/, "",$2); out=$2 ".dat"; next }
/^END/ { next }
{ print > out }

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
yeah, need to close the file!
Code:
/^BEGIN/ { gsub(/[^0-9]/, "",$2); out=$2 ".dat"; next }
/^END/ { close(out); next }
{ print > out }

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
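
For anyone who wants to try the awk answers above, here is a minimal sketch that rebuilds the sample from the question and runs a script in the same spirit against it. It assumes the input is called main.dat, as in the question. The rows in the 3RD FILE section are invented, since the original post elides them. Anchoring the patterns with ^ and skipping lines that appear before the first BEGIN are additions of mine, not part of the posted code.

```shell
# Recreate the sample input from the question. The 3RD FILE rows are
# invented for illustration; the original post elides them.
cat > main.dat <<'EOF'
BEGIN 1ST FILE|
1|2|3|4|5
6|7|8|9|10
END 1ST FILE|
BEGIN 2ND FILE|
A|B|C|D|E
F|G|H|I|J
END 2ND FILE|
BEGIN 3RD FILE|
x|y|z
END 3RD FILE|
EOF

awk '
    # Section header: keep only the digits of field 2 ("1ST" -> "1").
    /^BEGIN / { out = $2; gsub(/[^0-9]/, "", out); out = out ".dat"; next }
    # Section trailer: close the file so that long inputs do not run
    # into the open-file limit, then forget the name.
    /^END /   { close(out); out = ""; next }
    # Data line: ">" truncates the file when it is first opened in this
    # run and keeps writing to it until close() is called.
    out != "" { print > out }
' main.dat

cat 1.dat 2.dat 3.dat
```

One difference between the two posted styles is worth knowing: with ">>" the output files are appended to, so running the script a second time would duplicate the rows in 1.dat, 2.dat and so on, whereas ">" starts each file afresh on every run.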
 
Wow... fast replies!!

Thanks for your help. It works!!!
 
wellster34, here is the correct syntax for your sed command:
sed '1,/BEGIN 2ND FILE/d;/END 2ND FILE/q' main.dat > 2.dat

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Annihilannic,

I was curious if you could please explain the file name creation... When I run it, I was wondering where it gets the name. I tried with different names,

i.e.
BEGIN ACCOUNT|
END ACCOUNT|
BEGIN PCC|
END PCC|

The file names are P for the PCC and ACCOU for the ACCOUNT?
 
PHV,

So, if I have multiple files... I just keep increasing the numbers?

sed '2,/BEGIN 3RD FILE/d;/END 3RD FILE/q' main.dat > 3.dat

 
No, the 1 is a line number: always delete from line 1 through the BEGIN line:
sed '1,/BEGIN 3RD FILE/d;/END 3RD FILE/q' main.dat > 3.dat

Hope This Helps, PH.
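
Some background on the sed commands in this thread, as far as I can tell. The original attempt put 1.dat's data into 2.dat because /BEGIN 2ND FILE/d deletes only that single line, so everything before it is still printed. Deleting the range 1,/BEGIN 2ND FILE/ fixes that. Two details remain: q prints the END line before quitting, so that line ends up in the output file, and 1,/BEGIN 1ST FILE/d would not work for the first section when its BEGIN marker is on line 1, because sed only starts looking for the end of a range on the line after the one that opened it. The sketch below sidesteps both by selecting the BEGIN..END range and dropping the two marker lines. extract_section is a hypothetical helper name, and the 3RD FILE rows are invented.

```shell
# Sample input from the question; the 3RD FILE rows are invented.
cat > main.dat <<'EOF'
BEGIN 1ST FILE|
1|2|3|4|5
6|7|8|9|10
END 1ST FILE|
BEGIN 2ND FILE|
A|B|C|D|E
F|G|H|I|J
END 2ND FILE|
BEGIN 3RD FILE|
x|y|z
END 3RD FILE|
EOF

# extract_section LABEL FILE (hypothetical helper, not from the thread):
# print the lines strictly between "BEGIN LABEL" and "END LABEL".
# LABEL is pasted into a regex, so it must not contain "/" or other
# characters that are special to sed.
extract_section() {
    sed -n "/^BEGIN $1/,/^END $1/{
        /^BEGIN $1/d
        /^END $1/d
        p
    }" "$2"
}

# One pass over the input per section, numbering the outputs 1.dat, 2.dat, ...
n=1
for label in "1ST FILE" "2ND FILE" "3RD FILE"; do
    extract_section "$label" main.dat > "$n.dat"
    n=$((n + 1))
done
```

This reads main.dat once per section, which is fine for a handful of sections; for many sections a single-pass awk script is likely the better fit.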
 
wellster34 said:
I was curious if you could please explain the file name creation... When I run it, I was wondering where it gets the name. I tried with different names,

i.e.
BEGIN ACCOUNT|
END ACCOUNT|
BEGIN PCC|
END PCC|

The file names are P for the PCC and ACCOU for the ACCOUNT?

I have added comments to the first part of the code to explain:

Code:
# For every line that matches "BEGIN"
/BEGIN/ { 
    # Set the FILE variable to the value of the second field
    FILE=$2
    # Replace the last two characters of the filename
    # with nothing (to remove "ND" from 2ND, "RD" from
    # 3RD, etc.)
    sub("..$","",FILE)
    # Append ".dat" to the filename.
    FILE=FILE ".dat"
    # Skip to the next record
    next
}

Obviously that code may only apply to the original example you provided. You could probably just take out the "sub" part to create "ACCOUNT.dat", "PCC.dat", etc., if that's what you want.

Annihilannic.
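
A note on the named sections: with a header such as "BEGIN ACCOUNT|", awk's default whitespace splitting leaves the pipe attached to the second field ("ACCOUNT|"), so removing the sub call on its own would presumably produce "ACCOUNT|.dat". Below is a minimal sketch of one way to build the name from the whole label instead. The sample rows are invented, and replacing every non-alphanumeric character with "_" is my own choice, not something from the thread.

```shell
# Invented sample using the section names from the follow-up question.
cat > main.dat <<'EOF'
BEGIN ACCOUNT|
1001|alice
1002|bob
END ACCOUNT|
BEGIN PCC|
P1|north
END PCC|
EOF

awk '
    /^BEGIN / {
        name = $0
        sub(/^BEGIN /, "", name)          # drop the leading marker word
        sub(/[|]$/, "", name)             # drop the trailing pipe
        gsub(/[^A-Za-z0-9]/, "_", name)   # "1ST FILE" would become "1ST_FILE"
        out = name ".dat"
        next
    }
    /^END / { close(out); out = ""; next }
    out != "" { print > out }
' main.dat

ls ACCOUNT.dat PCC.dat
```

With the original sample this variant would write 1ST_FILE.dat, 2ND_FILE.dat and so on, so it suits named sections better than numbered ones.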
 
You could also use csplit to split files based on context...
Code:
csplit main.dat '/^BEGIN/' '{*}'
This creates a series of files xx00, xx01, xx02, etc. If you want to rename to 00.dat, 01.dat, 02.dat, etc...
Code:
for i in xx??; do mv "$i" "${i#xx}.dat"; done
==> 01.dat <==
BEGIN 1ST FILE|
1|2|3|4|5
6|7|8|9|10
END 1ST FILE|

==> 02.dat <==
BEGIN 2ND FILE|
A|B|C|D|E
F|G|H|I|J
END 2ND FILE|

==> 03.dat <==
BEGIN 3RD FILE|
.....ETC....
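
As the listing shows, the pieces written by csplit still contain the BEGIN and END marker lines, and the numbering starts at 01, presumably because the piece before the first BEGIN line is empty. If the goal is 1.dat, 2.dat, etc. without the markers, something like the following sketch might do. It assumes GNU csplit (as far as I know, -z and the '{*}' repeat count are GNU extensions) and a directory with no other xx?? files in it; the 3RD FILE rows are invented.

```shell
# Sample input from the question; the 3RD FILE rows are invented.
cat > main.dat <<'EOF'
BEGIN 1ST FILE|
1|2|3|4|5
6|7|8|9|10
END 1ST FILE|
BEGIN 2ND FILE|
A|B|C|D|E
F|G|H|I|J
END 2ND FILE|
BEGIN 3RD FILE|
x|y|z
END 3RD FILE|
EOF

# -s: do not print the piece sizes; -z: do not keep empty pieces.
csplit -s -z main.dat '/^BEGIN/' '{*}'

# Renumber the pieces from 1 and drop the marker lines on the way.
n=1
for piece in xx??; do
    grep -v -e '^BEGIN ' -e '^END ' "$piece" > "$n.dat"
    rm -- "$piece"
    n=$((n + 1))
done
```

The loop relies only on the sorted order of the xx?? names, not on which number csplit happens to start from.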
 