annoying carriage return with paste and awk instructions

Status
Not open for further replies.

pathfinder

Technical User
Feb 14, 2008
5
ES
Hi everybody,

First post here. I am pinning all my hopes on this forum: I am running out of time to produce a file and cannot get any further, because something is wrong with my files.

I have used this command many times:
paste M1_CGK025_K1K2K3T0T1T2T3.dat M1_CGK025_ITSIM.dat | awk 'BEGIN{OFS=" "};NF==9 {for (i=1; i<=NF; i++) printf ("%s ", $i);printf("\n") }' > M1_CG_K025_KTITSIM.dat

The first file has 7 columns and the second file has 2 columns.
I wanted to paste both files into a new output file of 9 columns
(each file has 50 lines).

This works really well: the resulting file, M1_CG_K025_KTITSIM.dat (let's call it fileA), opens in MATLAB as a nice 50 * 9 matrix.

Now I have another file, fileB. I can also open it in MATLAB with a simple load command, and it has 50 * 8287 elements.

So I want to repeat the same paste and awk command to get a brand new fileC, which should ideally load in MATLAB as 50 * (8287+9).

So I adapted the previous command:
paste fileA fileB | awk 'BEGIN{OFS=" "};NF==8296 {for (i=1; i<=NF; i++) printf ("%s ", $i);printf("\n") }' > fileC.dat

It seems to work, but MATLAB refuses to LOAD the result, complaining that line 2 does not have the same number of columns as the previous line.
And MATLAB is actually right: if I look at the resulting file, the first line has 9 elements and the second line has 8287.
The third has 9, the fourth 8287.
And so on.

I'm really out of ideas and can't see where it goes wrong. I'm definitely not familiar with awk, sed or paste, but this worked until now. I have no clue and I'm in a huge hurry right now (3 days to create the whole set of files, and at this point it no longer does the job).

How can I sort this out?
I don't mind whether I have to rearrange fileC or whether there is a neat command that works directly from fileA and fileB, but please, it would be very kind if one of you could give me a hand.

Many thanks,

iorga

(files are too big for editors)
 
You may have stumbled on the cause of the problem in the title of your post. What operating system are you working on? What operating system did the input files come from? It's possible that one of the files you are using in the second case is a DOS/Windows format file, which has CR/LF line terminators. If you are doing this on Unix, which uses only LF line terminators, that may be causing your problem. Try using a dos2unix utility to convert the file to Unix format, or tr -d '\r' < windowsfile > unixfile if dos2unix is not available.
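To illustrate the effect, here is a minimal sketch using two made-up three-line files (a.dat and b.dat are hypothetical names, not the poster's files). The first is written with CRLF endings, so every pasted line ends up with a stray carriage return in the middle; stripping it with tr before pasting gives clean lines.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# a.dat: two columns, CRLF line endings (as a Windows program might write).
printf '1 2\r\n3 4\r\n5 6\r\n' > a.dat
# b.dat: one column, plain LF line endings.
printf '7\n8\n9\n' > b.dat

# Pasting the raw files leaves a CR in the middle of every output line.
paste a.dat b.dat > pasted_raw.dat
cr_raw=$(tr -cd '\r' < pasted_raw.dat | wc -c | tr -d ' ')

# Strip the CRs first, then paste again.
tr -d '\r' < a.dat > a.unix
paste a.unix b.dat > pasted_clean.dat
cr_clean=$(tr -cd '\r' < pasted_clean.dat | wc -c | tr -d ' ')

echo "CRs before: $cr_raw, after: $cr_clean"
```

Depending on the viewer, a carriage return in the middle of a line can make the file look as if each row were split in two, even though the newline count is unchanged.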

Failing that, you may be running into limitations with the version of awk, although the symptoms don't sound right for that... if you can tell us what operating system you are working on and what version of awk you're using that would help.

Annihilannic.
 
Thanks for your answer.
The OS I run awk and paste on is 64-bit Ubuntu Gutsy.
But the files were created in Fortran with Compaq Visual Studio, running on Windows.
All my files were generated in a Windows environment.
All the files were sent via sftp to a Linux machine.
There I created new files with paste and awk, and all of those files were OK.
Finally, after many manipulations on these files, I got file1 and file2, and did the same thing again.
But this time it did not work.

Is it still an OS problem?
I tried to convert one of the files, or both files, but I did not get correct results. To tell the truth, I had no idea which one I should convert; I tried both, only the first, only the second, with no definite result.
 
You should convert any of the files that are not in the Unix format. You can use the file command to find out whether they have been converted correctly, e.g.

$ file windowsfile
windowsfile: ASCII text, with CRLF line terminators
$ unix2dos < windowsfile > unixfile
ksh: unix2dos: not found
$ tr -d '\r' < windowsfile > unixfile
$ file unixfile
unixfile: ASCII text

As you can see, unix2dos was not available on my system (and dos2unix would be the right tool for this direction anyway), so I used the tr alternative.
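If you want to cross-check what file reports, one rough approach is to count the CR and LF bytes yourself. This is only a sketch on a hypothetical sample.dat; substitute your own file name.

```shell
tmp=$(mktemp -d)
# Hypothetical sample: two lines with CRLF endings.
printf '1 2\r\n3 4\r\n' > "$tmp/sample.dat"

# Count carriage-return and line-feed bytes separately.
n_cr=$(tr -cd '\r' < "$tmp/sample.dat" | wc -c | tr -d ' ')
n_lf=$(tr -cd '\n' < "$tmp/sample.dat" | wc -c | tr -d ' ')
echo "CR: $n_cr  LF: $n_lf"

# Dump the raw bytes of the first line; CR and LF show up as \r and \n.
head -n 1 "$tmp/sample.dat" | od -c
```

Equal non-zero counts suggest CRLF endings, zero CRs with some LFs suggests Unix format, and zero LFs means the whole file is a single unterminated line.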

I think the problem must be with the data files, because GNU awk doesn't seem to have any reported limit on the number of fields. I created a file containing 18000 fields per line and it processed that fine. I'm testing with GNU awk 3.1.3 on SLES9 (you can check your version with awk --version).
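A test along those lines could look like the sketch below. The file name wide.dat and the choice of two rows are assumptions for illustration; only the 18000-field width comes from the test described above.

```shell
tmp=$(mktemp -d)
# Build two lines of 18000 space-separated integers each.
awk 'BEGIN {
    for (r = 1; r <= 2; r++) {
        for (i = 1; i <= 18000; i++) printf "%d ", i
        print ""
    }
}' > "$tmp/wide.dat"

# Report the distinct field counts awk sees across all lines.
nf=$(awk '{ print NF }' "$tmp/wide.dat" | sort -u)
echo "fields per line: $nf"
```

If your awk truncated long lines or capped the field count, the reported number would come out lower than 18000.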


Annihilannic.
 
$ file fileA.dat

fileA.dat: ASCII text, with very long lines, with no line terminators

$ file fileB.dat
fileB.dat: ASCII text, with CR, LF line terminators


There lies the problem: you pinpointed it perfectly!
fileA is weird, isn't it?
Yet it's not a Windows file, because when I open it on Windows with a tool like UltraEdit, it asks me whether I want to convert it to DOS format.

 
Ha! UltraEdit... a great editor; shame about the additional licence fees required to upgrade to newer versions or I'd have kept using it. It's Notepad++ for me now...

fileA is weird; it sounds like a very long string of numbers all on the same line. If they are supposed to be separated into lines of 8287 fields, then that explains why paste isn't behaving as you expect.

How many lines of 8287 fields were you expecting to be in the file?
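One way to answer that is to count the newlines and the total number of values in the file, then divide the total by the expected row width. A minimal sketch on a hypothetical flat.dat holding 12 values and no newline:

```shell
tmp=$(mktemp -d)
# Hypothetical stand-in for the real file: 12 numbers, no newline at all.
printf '1 2 3 4 5 6 7 8 9 10 11 12' > "$tmp/flat.dat"

# Number of newline characters (0 means everything sits on one line).
lines=$(wc -l < "$tmp/flat.dat" | tr -d ' ')
# Total number of whitespace-separated values in the file.
values=$(awk '{ n += NF } END { print n + 0 }' "$tmp/flat.dat")
echo "newlines: $lines  values: $values"
```

If the total is not an exact multiple of the expected width, the file is probably not laid out the way you think.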

You could try separating it into records using something like this:

Code:
awk '{ for (i=1; i<=NF; i++) { printf "%s ", $i; if (!(i%8287)) { print "" } } } END { print "" }' fileA > fileA.fixed
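To try the idea on something small first, here is a sketch of the same re-splitting with the row width passed in as a variable. The width of 4, the variable name w and the flat.dat file are all made up for illustration, and this variant assumes the whole input is a single line.

```shell
tmp=$(mktemp -d)
# Hypothetical flat input: 12 values on one unterminated line.
printf '1 2 3 4 5 6 7 8 9 10 11 12' > "$tmp/flat.dat"

awk -v w=4 '{
    for (i = 1; i <= NF; i++) {
        # Print each value; end the row after every w-th one.
        printf "%s%s", $i, (i % w ? " " : "\n")
    }
    last = NF
}
# Terminate a final partial row, if there is one.
END { if (last % w) print "" }' "$tmp/flat.dat" > "$tmp/flat.fixed"

# Every row should now report the same field count.
awk '{ print NF }' "$tmp/flat.fixed" | sort | uniq -c
```

Unlike the one-liner, this version leaves no trailing space on each row and only adds a final newline when the last row is incomplete.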


Annihilannic.
 
?? Error using ==> load
Number of columns on line 2 of ASCII file pepito.new
must be the same as previous lines.

I tried what you said.
fileA.fixed works fine and loads in MATLAB the way I want.
However:
$ file fileA.fixed
fileA.fixed: ASCII text, with very long lines, with no line terminators

So when I paste, it's the same old story...
Thanks for your help.
I finally did the trick using MATLAB and a Perl script a guy gave me. Thanks Eric!

 