Problem creating tab delimited file

lmbylsma · Nov 16, 2003

I have a script that prints out a tab delimited file. However, when I open it in Unix it puts these ^M symbols after 2 of the columns. When I open it in Windows in Notepad they appear as rectangular boxes instead, and Excel interprets each one as "go to a new line", so my columns get all out of alignment.

Here is an example of the output I get as viewed from Unix:

1.37657^M 0 16.37657^M 16.37657
3.92090^M 0 19.26123^M 19.26123
10.37657^M 1 22.21257^M 22.21257
50.37657^M 1 214.23345^M 214.23345
159.56789^M 0 238.11100^M 238.11100
500.26123^M 0 287.04997^M 287.04997
6056.12612^M 1 301.06797^M 301.06797

Here is the portion of code that prints the information to the file:

open(OUT, "> $out_file");

for my $i (0..$#Array32) {
    print OUT "\n$Array1920[$i]\t$Array1314[$i]\t$Array32[$i]";
}

close(OUT);

Any ideas what the problem is? What baffles me is that it happens on some columns and not others. Also, when I used to print that first column as the last column it didn't have the ^M's, but when I moved it to the first position it started showing them. It doesn't make any sense to me.

-Lauren

chazoid · Nov 16, 2003

Are you using chomp() after reading in data from the input file? My guess is that the column with the ^M is at the end of the line in the data. ^M being a carriage return control code. Chomp will remove it - something like:

while (<INP>) {
    chomp;
    # etc...
}
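
A minimal sketch of why chomp on its own may not be enough, assuming the input file has Windows-style CRLF line endings and the script runs on Unix, where `$/` defaults to "\n". The helper name `strip_eol` is made up for illustration and does not appear anywhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every trailing CR and LF from a line.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line;
}

my $raw = "1.37657\r\n";    # a line as it might arrive from a CRLF file

my $chomped = $raw;
chomp $chomped;             # with $/ set to "\n", only the LF is removed

# The chomped copy is one character longer because the CR is still attached.
printf "after chomp:     %d characters\n", length $chomped;
printf "after strip_eol: %d characters\n", length strip_eol($raw);
```

If this is what is happening, a leftover CR would travel with the value into whichever column it is printed in, and an editor would show it as ^M wherever it is not directly followed by a newline.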

lmbylsma · Nov 17, 2003

Yes, I am using chomp. And if that were the problem, it wouldn't make sense that the ^M's appear for one of the columns when I print them out in one order and go away when I print them out in another order. Here is the full text of my code:

#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;

#my $datafile='matt6-03.txt'; #input filename
#open(DATA, $datafile) || die qq(Can't open "$datafile" for input...$!\n);

my $col2=99;
my @Array13;
my @Array14;
my @Array1314;
my @Array19;
my @Array20;
my @Array32;
my @Array1920;
my @Final;
my $x19=0;
my $y19=0;
my $x20=0;
my $y20=0;
my $x13=0;
my $y13=0;
my $x14=0;
my $y14=0;
my $diff=0;

while (<>) {

    chomp; #remove newline from $_

    if (/^"CHANNEL" "13"/) { #set $col2 depending on whether we've seen '"CHANNEL" "19"' or '"CHANNEL" "20"'
        $col2 = 4;
    } elsif (/^"CHANNEL" "14"/) { #this tells us which array to push to, and is also the data we want in the second column
        $col2 = 5;
    } elsif (/^"CHANNEL" "19"/) {
        $col2 = 1;
    } elsif (/^"CHANNEL" "20"/) {
        $col2 = 0;
    } elsif (/^"CHANNEL" "32"/) {
        $col2 = 3;
    }

    if (/^\d+\.\d+/) { #if we've seen proper numeric pattern, push $_ to appropriate array based on $col2

        if ($col2 == 1) {
            $y19 = $_;
            $diff = $y19 - $x19;

            if ($diff > 1.2) {
                push @Array19, "$_\t\t$col2";
            }
            $x19 = $_;

        } elsif ($col2 == 0) {
            $y20 = $_;
            $diff = $y20 - $x20;

            if ($diff > 1.2) {
                push @Array20, "$_\t\t$col2";
            }
            $x20 = $_;

        } elsif ($col2 == 3) {
            my @comp = split(/\t/, $_); #Remove the "????" and 0s
            $_ = "$comp[0]\t$comp[2]";
            push @Array32, "$_";

        } elsif ($col2 == 4) {
            $y13 = $_;
            $diff = $y13 - $x13;

            if ($diff > 1.2) {
                push @Array13, "$_";
            }
            $x13 = $_;

        } elsif ($col2 == 5) {
            $y14 = $_;
            $diff = $y14 - $x14;

            if ($diff > 1.2) {
                push @Array14, "$_";
            }
            $x14 = $_;

        }

    }
}

$"= "\n";

push (@Array1920, @Array19, @Array20); #Put @Array19 and @Array20 into @Final
push (@Array1314, @Array13, @Array14);

@Array1314 = sort { $a <=> $b} @Array1314; #sort @Array1314 numerically

my $len = 15;
my $i = 0;

@Array1920 = map { $Array1920[substr($_,$len)] } #sort @Final numerically
             sort map { sprintf("%${len}s",(split /\t/)[0]).$i++ }
             @Array1920;

for my $i (0..$#Array32) { #Add value in 32 only if corresponding value in 1314
    if ($Array32[$i]=$Array1314[$i]) {
        push @Final, "$Array32[$i]";
    }
}

#ask user for output filename
print "\nPlease enter the filename where you wish to store the results:\n";
my $out_file = <>;
chomp($out_file);

#print data to output file
open(OUT, "> $out_file")
    or die "Failed to open $out_file ... $!";

for my $i (0..$#Array32) {
    print OUT "\n$Array1920[$i]\t$Array1314[$i]\t$Array32[$i]";
}

close(OUT);

print "\nDone!";

chazoid · Nov 17, 2003

I don't have time at the moment to look over your code, but another thing to check is how you're transferring the input and output files between platforms. I don't work with Unix often, but my understanding is that if you're using FTP, you want to transfer in ASCII mode to properly translate the CR/LFs, since Windows uses x0D0A and Unix uses just x0A (I think).
If you have a hex editor, look at your input file to see what's at the end of each line.
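
Without a hex editor, a rough check of the line endings can be sketched in Perl itself. This is only an illustration: the helper name `count_line_endings` is hypothetical, and the file to inspect is taken from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: count CRLF pairs, bare LFs and bare CRs in a file.
sub count_line_endings {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode $fh;                          # raw bytes, no newline translation
    my $data = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    $data = '' unless defined $data;

    my $crlf = () = $data =~ /\r\n/g;     # CR immediately followed by LF
    my $cr   = () = $data =~ /\r/g;       # every CR
    my $lf   = () = $data =~ /\n/g;       # every LF

    return (crlf => $crlf, lf_only => $lf - $crlf, cr_only => $cr - $crlf);
}

if (@ARGV) {
    my %count = count_line_endings($ARGV[0]);
    print "$_: $count{$_}\n" for sort keys %count;
}
```

A non-zero `crlf` or `cr_only` count for the input file would suggest that carriage returns are already present before the script ever writes its output.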

lmbylsma · Nov 17, 2003

I'm running the program in Unix. After it generates the output.txt file, I open it with Pico and I see the ^M's. In Windows, after I transfer it, Excel interprets those as going to a new line. Transferring in ASCII makes no difference, since the ^M's are also there in Unix.

raklet · Nov 17, 2003

I don't know where they are coming from, but the problem is definitely related to newline characters. Unix only puts in a \n (linefeed) to indicate a newline. Windows, on the other hand, uses \r\n (carriage return and linefeed) to indicate a newline. Windows does not know how to interpret a single \n, so it prints out ^M instead. Notepad just shows the little block symbol.

lmbylsma · Nov 17, 2003

No, Unix is doing the ^M's, Windows is doing rectangular boxes. If Unix were interpreting it as a \n, then it should be going to a new line rather than printing the ^M.

chazoid · Nov 17, 2003

You didn't mention whether or not you've been transferring files in ascii or binary - Also, do you see the ^M's when you view the input file in pico?

from a couple google searches:

" I have some HTML files that have been written on a PC/Windows system. Before uploading these files to the virtual server, should I convert the text to UNIX text or leave it in another form? Basically, I am asking does my text need to be converted in order to run on the Virtual Server? "

When you upload the files with your FTP client, just be sure that you upload all HTML files and CGI source code in ASCII mode. Upload all binary files such as images and compiled CGI programs in BINARY mode. The FTP client will automatically convert the text when transferred in ASCII mode.

===================

NOTE : VI is a better editor for scripts because, unlike pico, it can see the embedded end-of-line markers (^M) that windows editors leave behind at the end of each line, that invariably cause various scripts and programs to interpret them as actions, causing havoc.

lmbylsma · Nov 17, 2003

I already mentioned this. I do see the ^M's in Unix when using Pico. I also see the ^M's in emacs. Transferring in ASCII to Windows doesn't help, as the ^M's are already there in Unix.

chazoid · Nov 17, 2003

What I'm getting at is that you may have mangled newlines in the input file which then end up in the output. Did the input file ever exist in Windows, or was it transferred back and forth at any point?
In Windows, if you chomp a string ending with x0D0D0A, it only removes the 0D0A. Something similar could be happening on the Unix box.
I suggest looking at everything in hex on both platforms to figure out what's happening.

raklet · Nov 17, 2003

>>No, Unix is doing the ^M's, Windows is doing rectangular boxes.

Oops. Got those backwards. Thanks for the heads up.

lmbylsma · Nov 17, 2003

I'm not sure on what platform the input files were generated; we received them from a colleague. I have no idea how to look at everything in hex. However, if the problem is that chomp only removes the 0D0A, how do I chomp so that it removes the entire x0D0D0A? Alternatively, how can I convert the input files to the appropriate format so there won't be this problem, if it is the case that they were transferred across platforms?
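
One possible way to do the conversion asked about here, sketched under the assumption that the input lines end in LF with one or more stray CRs in front of it (as in CRLF or CRCRLF). The helper name `convert_to_lf` and the idea of passing two file names on the command line are illustrative choices, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, dropping any CRs that sit at the
# end of a line, so every line ends in a single LF.
sub convert_to_lf {
    my ($in, $out) = @_;
    open(my $src, '<', $in)  or die "Can't read $in: $!";
    open(my $dst, '>', $out) or die "Can't write $out: $!";
    binmode $src;             # no newline translation on either handle
    binmode $dst;

    while (my $line = <$src>) {
        # Remove a run of CRs only when it is followed by the line's LF
        # or by the end of the data; CRs in the middle of a line stay.
        $line =~ s/\r+(?=\n?\z)//;
        print {$dst} $line;
    }

    close $src;
    close $dst or die "Can't finish writing $out: $!";
    return;
}

# Usage sketch: perl convert.pl input.txt output.txt
convert_to_lf(@ARGV) if @ARGV == 2;
```

The same substitution could instead be applied to `$_` right after the chomp in the main read loop, which would avoid creating a second copy of the data. A file that uses bare CRs as its only line separator would be read as a single line here and is not handled by this sketch.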

mactonio · Nov 18, 2003

Try changing your print statement from
print OUT "\n$Array1920[$i]\t$Array1314[$i]\t$Array32[$i]";
to
print OUT "\n",$Array1920[$i],"\t",$Array1314[$i],"\t",$Array32[$i];

Sometimes it's the simple stuff.

lmbylsma · Nov 18, 2003

Nope, that results in exactly the same thing. I still have the ^M's.

mactonio · Nov 18, 2003

Sorry, I can't tell why those are showing up. On another issue, look at this if statement:
for my $i (0..$#Array32) { #Add value in 32 only if corresponding value in 1314
    if ($Array32[$i]=$Array1314[$i]) {
        push @Final, "$Array32[$i]";
    }
}

You have '=' (assignment) instead of '==' (comparison) inside the if statement.
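
A small sketch of the difference, using made-up sample values rather than the real data. All array names here are hypothetical.

```perl
use strict;
use warnings;

my @left  = (16.37657, 19.26123);
my @right = (16.37657, 22.21257);

# Comparison: keeps only the indexes where both values are numerically equal.
my @equal = grep { $left[$_] == $right[$_] } 0 .. $#left;

# Assignment used as a condition: copies the right-hand value over the
# left-hand one, then tests the copied value for truth, so every non-zero
# number passes and the original left-hand data is overwritten.
my @copy     = @left;
my @assigned = grep { $copy[$_] = $right[$_] } 0 .. $#copy;

print "indexes passing ==: @equal\n";
print "indexes passing = : @assigned\n";
print "left-hand values after assignment: @copy\n";
```

In the posted script this would mean each @Array32 element is replaced by the matching @Array1314 element before the output loop runs.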

chazoid · Nov 18, 2003

I'm wondering if this has anything to do with it - $"="\n";
It's a special variable for 'list separator', but I don't really understand what its purpose is. Try commenting it out and see if it makes any difference.
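
For reference, a minimal sketch of what `$"` does, using made-up values: it is the string placed between elements when a whole array is interpolated into a double-quoted string. Interpolating single elements, as the print statement in the script does, does not involve it.

```perl
use strict;
use warnings;

my @cols = ('1.37657', '0', '16.37657');

my $default = "@cols";    # default separator is a single space

my ($joined, $single);
{
    local $" = "\n";                      # change the list separator in this block only
    $joined = "@cols";                    # whole array: elements now joined by newlines
    $single = "$cols[0]\t$cols[1]";       # single elements: separator is not used
}

print "default: $default\n";
print "joined:\n$joined\n";
print "single: $single\n";
```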

I'm pretty sure emacs will display hex, but if not, someone else (ircf) posted some code here recently which works very well for displaying the contents of a file in hex - try running it through this, then post some of the output here. (something containing 0d's and/or 0a's)

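open(my $fh, '<', 'matt6-03.txt') or die "Can't open matt6-03.txt: $!";
binmode $fh;                          # read raw bytes, no newline translation
my $data = do { local $/; <$fh> };    # slurp the whole file
close $fh;

my $hex  = unpack 'H*', $data;        # two hex digits per byte
my @list = $hex =~ /(..)/g;           # split into one entry per byte
print join(' ', @list), "\n";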
