Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Convert fixed format text output to CSV 2

Status
Not open for further replies.

pfildes

Programmer
Jun 17, 2004
54
GB
I have a fixed format text file which I need to convert to CSV. As this will need to be done repeatedly can you please tell me how I can do this via scripting?

My file currently looks something similar to this below;

[tt]Company Emp_ID Surname First_Name
1 123456 Bloggs Fred
1 123457 Smith John
[/tt]

I'd like for the output to be converted so that it looks similar to this;

[tt]Company,Emp_ID,Surname,First_Name
1,123456,Bloggs,Fred
1,123457,Smith,John
[/tt]

I understand AWK can do this, however I am not a whizz with this and would very much appreciate some help
 
One way:
Code:
sed 's!  *!,!g' /path/to/fixed >output.csv

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Hi

You forgot to mention which AWK implementation are you using. This solution is for [tt]gawk[/tt] :
Code:
awk 'NR==1{s=$0;while(match(s,/^[^ ]+ +/,a)){f=f RLENGTH" ";s=substr(s,RLENGTH+1)}FIELDWIDTHS=f 2010}{for(i=1;i<=NF;i++)printf"%s%s",gensub(/ +$/,"","",$i),i<NF?",":"\n"}'


Feherke.
 
Another way:

tr -s ' ' ',' < fixed.txt > comma.csv

The internet - allowing those who don't know what they're talking about to have their say.
 
Ken, not all [tt]tr[/tt]s allow squeezing and translations in one go...

In such cases:
[tt]tr -s ' ' <file.txt | tr ' ' ',' >file.csv[/tt]

HTH,

p5wizard
 
Hi

pfildes said:
I have a fixed format text file
While the OP explicitly mentioned "fixed format", I suppose that is the only sure thing.

Simple space to comma replacing will fail on column values containing space :

[tt]Company Emp_ID Surname First_Name
1 123456 Bloggs Fred[highlight] [/highlight]Jack
1 123457 Smith[highlight] [/highlight]&[highlight] [/highlight]Wesson John[/tt]

Of course, PHV's [tt]sed[/tt] code can be changed to replace only sequences of two or more spaces, but that is also based on the assumption that values will not contain two or more consecutive spaces.

So I would go with [tt]awk[/tt]. ( Not necessarily with [tt]gawk[/tt], my code can be easily modified to work with standard [tt]awk[/tt]. )

Feherke.
 
Good points both, thanks.

The internet - allowing those who don't know what they're talking about to have their say.
 
Well, a legacy awk way:
Code:
awk '
NR==1{n=NF;for(i=1;i<=n;++i){s[i]=index($0,$i);l[i-1]=s[i]-s[i-1]}}
{for(i=1;i<n;++i){x=substr($0,s[i],l[i]);sub(/ *$/,"",x);printf "%s,",x}print substr($0,s[n])}
' /path/to/fixed >output.csv

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
I implemented the most recent AWK code. That has worked for me and resolved my predicament. [thumbsup2]

Thanks to PHV & also to those who offered their advice.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top