Formatting each file and then joining each file to make one large file 2

Status
Not open for further replies.

hill007

Technical User
Mar 9, 2004
60
US
Hi,

I have a large number of files. Each file needs to be formatted, and then all of the files, which are numbered sequentially, need to be merged into one large data file. For example, I have the following files:

File 1: file_001.dat

A B C D
1 2 3
P Q R S
5 6 7

File 2: file_002.dat
E F G H
1 R 6
5 U Y T
6 1 R

etc...

I want to first format the files to:

file 1: file_001.dat
A B C D 1 2 3
P Q R S 5 6 7

file 2: file_002.dat
E F G H 1 R 6
5 U Y T 6 1 R

etc...

Once I have formatted the files, I want to make one large file containing all of the formatted files:

the final file should look like:
A B C D 1 2 3
P Q R S 5 6 7
E F G H 1 R 6
5 U Y T 6 1 R
etc....

I have a script that formats each file individually, but I want to do a batch run in which all of the files are formatted in sequence by running the script once.

Here is the script for formatting an individual file:
{printf "%s",$0}
NR%2==0{printf "\n"}
END{if(NR%2)printf "\n"}

This script works and formats each file. Can someone modify it so that all of the files, which are numbered sequentially from file_001.dat, file_002.dat ... to file_547.dat, are formatted in one batch run?

Thanks for any help that you can provide.







 
something to start with:

Code:
FNR==NR { output="file_" sprintf("%.3d", ++fn) ".dat" }

{printf("%s",$0) >> output}
NR%2==0{printf("\n") >> output}
END{if(NR%2)printf("\n") >> output}

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
change

FNR==NR { output="file_" sprintf("%.3d", ++fn) ".dat" }

TO

FNR==1 { output="file_" sprintf("%.3d", ++fn) ".dat" }

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Hi vgersh99,

The code is not working. I am running the following command: awk95 -f format.awk

where format.awk is the code that you suggested above. It's hanging.

 
It's waiting for your input filenames. Try this.
awk95 -f format.awk file_[0-9][0-9][0-9].dat

This will make it take all files that begin with "file_", followed by 3 digits, followed by ".dat".
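
As a quick illustration of what that pattern accepts, here is a small sketch. It assumes a POSIX shell, which expands the pattern itself before awk ever runs; the scratch directory and the extra file names are made up for the demonstration.

```shell
# Hypothetical scratch directory with a mix of matching and non-matching names.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file_001.dat file_547.dat file_12.dat file_abc.dat newfile_001.dat

# The shell replaces the pattern with the sorted list of matching names.
matched=$(echo file_[0-9][0-9][0-9].dat)
echo "$matched"
```

Only names with exactly three digits between "file_" and ".dat" are picked up, and because the numbers are zero-padded, the alphabetical order the shell returns is also the numeric order.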
 
Hi mikevh,

When I try that command, it says it cannot find the files and reports source line number 4.
 
Please post your code, and if possible the exact error message.
 
C:\AWK95>awk95 -f mfile_format.awk file_[0-9][0-9][0-9].dat
awk95: can't open file file_[0-9][0-9][0-9].dat
source line number 4

here is the source code:

FNR==1 { output="file_" sprintf("%.3d", ++fn) ".dat" }
{printf("%s ",$0) >> output}
NR%2==0{printf("\n") >> output}
END{if(NR%2)printf("\n") >> output}
 
Maybe your awk can't handle globbing that file expression. I'm using gawk, and it works for me.

Let's try this:
awk95 'FNR==1{print FILENAME}' file_[0-9][0-9][0-9].dat | more

and this
awk95 'FNR==1{print FILENAME}' file_*.dat | more

These should list all your input file names. Does either of them work?

Looking at the script, I see another issue. It looks like this will put the output for each file at the end of the existing file, which I don't think is what you want. You want all your output in one big file, right? Try the 2 little scripts above, then we'll get back to this.
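
To make the single-file goal concrete, here is a minimal sketch of one way to do it. It assumes a POSIX shell and any standard awk, pairs lines using FNR (which restarts at 1 for every input file), and prints to standard output so that one redirect collects everything. The scratch directory, the prev and held variables, and the name all.dat are illustrative only.

```shell
# Hypothetical scratch directory holding the two sample files from the first post.
dir=$(mktemp -d)
printf 'A B C D\n1 2 3\nP Q R S\n5 6 7\n' > "$dir/file_001.dat"
printf 'E F G H\n1 R 6\n5 U Y T\n6 1 R\n' > "$dir/file_002.dat"

# Join each odd line with the even line that follows it.
# FNR is the line number within the current file, so a pair never
# straddles two files. Nothing is written back to the input files.
awk '
    FNR == 1 && held { print prev; held = 0 }     # previous file ended on an odd line
    FNR % 2          { prev = $0; held = 1; next } # odd line: hold it
                     { print prev, $0; held = 0 }  # even line: emit the pair
    END              { if (held) print prev }      # last file ended on an odd line
' "$dir"/file_[0-9][0-9][0-9].dat > "$dir/all.dat"

cat "$dir/all.dat"
```

Because the output name all.dat does not match the input pattern, it is safe to keep it in the same directory as the inputs.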



 

No, it does not work. Yes, I want all the output in one big file.

C:\AWK95>awk95 -f 'FNR==1{print FILENAME}' file_[0-9][0-9][0-9].dat | more
awk95: can't open file 'FNR==1{print
source line number 1 source file 'FNR==1{print
context is
>>> <<<


C:\AWK95>awk95 -f 'FNR==1{print FILENAME}' file_*.dat | more
awk95: can't open file 'FNR==1{print
source line number 1 source file 'FNR==1{print
context is
>>> <<<
 
Try those again without the -f. -f is used to specify the name of a program file. In those 2 things I asked you to try, there is no program file. Note that there's no -f in my post.
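
The difference between the two forms can be sketched as follows. This assumes a POSIX shell and a generic awk, and the file names.awk is a hypothetical program file created just for the comparison. Note that the single quotes are a POSIX shell convention; the Windows command prompt does not treat them as quote characters, so there the program text would most likely need double quotes instead.

```shell
# Hypothetical scratch directory with two small input files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'A B C D\n1 2 3\n' > file_001.dat
printf 'E F G H\n1 R 6\n' > file_002.dat

# The same one-line program, saved to a file for use with -f.
printf 'FNR == 1 { print FILENAME }\n' > names.awk

inline=$(awk 'FNR == 1 { print FILENAME }' file_*.dat)  # program text given directly
fromfile=$(awk -f names.awk file_*.dat)                 # program text read from names.awk

printf '%s\n' "$inline"
```

Both invocations list the same input file names; -f only changes where awk reads the program from.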
 

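BEGIN { ORS = "" }                  # print adds no newline on its own
{
    out = "new" FILENAME            # e.g. newfile_001.dat
    if (FNR % 2)                    # odd line of this file: start a row
        a = z = ""
    else                            # even line: join with a space, end the row
        { a = " "; z = "\n" }
    print a $0 z > out
}
END { if (FNR % 2) print "\n" > out }   # finish a dangling odd last line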


Save as adhoc.awk and then type:
awk95 -f adhoc.awk file*.dat

For file_001.dat, newfile_001.dat will be created;
for file_002.dat, newfile_002.dat will be created,
etc.

To join all output files into one large file
(if using DOS):
copy /b newfile*.dat all.dat
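
For anyone not on DOS, a rough equivalent of that copy step might look like this. It assumes a POSIX shell, and the two newfile_*.dat files below are made-up stand-ins for output that has already been formatted.

```shell
# Hypothetical scratch directory with two already-formatted files.
dir=$(mktemp -d)
printf 'A B C D 1 2 3\nP Q R S 5 6 7\n' > "$dir/newfile_001.dat"
printf 'E F G H 1 R 6\n5 U Y T 6 1 R\n' > "$dir/newfile_002.dat"

# The shell expands the pattern in sorted order, so the zero-padded
# numbers keep the files in sequence when they are concatenated.
cat "$dir"/newfile_[0-9][0-9][0-9].dat > "$dir/all.dat"

cat "$dir/all.dat"
```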
 
This will avoid your file-globbing problem, as it doesn't require you to specify files on the command line:
Code:
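BEGIN {
    for (i=1; i<=547; i++) {
        input = "file_" sprintf("%03d", i) ".dat"
        out = ""
        lineno = 0
        # "> 0" matters: getline returns -1 for an unreadable file,
        # which a bare while () would treat as true and loop forever
        while ((getline line < input) > 0) {
            out = (out != "") ? out OFS line : line
            if (++lineno % 2 == 0) {
                print out
                out = ""
            }
        }
        if (out != "") print out    # flush a dangling odd last line
        close(input)
    }
}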
I tried this with the 2 files you specified in your original post, and it worked fine. Output:
Code:
A B C D 1 2 3
P Q R S 5 6 7
E F G H 1 R 6
5 U Y T 6 1 R
This looks like what you asked for. Give it a try.
awk95 -f yourscriptname > youroutputname

HTH

 