Parsing lines of continuous characters by column

Flipster · Aug 23, 2002

I have a file consisting of lines that are 140 characters long. Fields are designated by columns. For example, columns 1-3 contain a transaction ID, column 4 is a record ID, columns 5-8 are a name ID, and so on.

I would like to either comma delimit or whitespace delimit this file so I can make it readable.

What is the best approach for this?

vgersh99 · Aug 23, 2002

Example lines, please.

vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+

Grant · Aug 23, 2002

Hi Flipster,

I think you mean to say that the fields are determined by character position -- not delimiters such as spaces, tabs or commas.

So a sample record might look like

11123333...

where '111' is the transaction id, '2' is the record id, '3333' is the name id, and so on. And your objective is to add delimiters between the fields.

Assuming you want comma-delimited fields, you could try the following approach:

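{
# substr(string, start, length): the third argument is a length, not an end column
tranid = substr($0,1,3);   # columns 1-3
recid = substr($0,4,1);    # column 4
nameid = substr($0,5,4);   # columns 5-8
...

printf("%s,%s,%s,...\n",tranid, recid, nameid...);
}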
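
Filled in for just the three fields named in the thread, the approach might look like the sketch below. The helper name `to_csv` and the sample record are made up for illustration, and the remaining fields of the 140-character record would each need their own `substr` call.

```shell
# to_csv is a hypothetical helper name; it reads fixed-width records
# from stdin (or from files given as arguments) and writes CSV.
to_csv() {
    awk '{
        tranid = substr($0, 1, 3)   # columns 1-3: transaction ID
        recid  = substr($0, 4, 1)   # column 4: record ID
        nameid = substr($0, 5, 4)   # columns 5-8: name ID
        printf("%s,%s,%s\n", tranid, recid, nameid)
    }' "$@"
}

# Made-up sample record matching the example above
printf '11123333\n' | to_csv
```

For the sample record this prints `111,2,3333`. Note that `substr` takes a start position and a length, so columns 5-8 become `substr($0, 5, 4)`.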

Hope this helps,
Grant

Flipster · Aug 23, 2002

I get this

>awk 'commadelimiter' test.log
awk: syntax error near line 1
awk: bailing out near line 1
==========================================================
Am I using the syntax correctly?

Here are the first couple of lines from commadelimiter.

{
transid =substr($0,1,3);
recordid =substr($0,4,1);
name =substr($0,5,4);

CaKiwi · Aug 23, 2002

You need

awk -f commadelimiter test.log

If you are on Solaris, use nawk instead of awk.

CaKiwi

CaKiwi · Aug 23, 2002

Correction: awk works fine for this program on Solaris (but nawk is also OK).

CaKiwi

Flipster · Aug 23, 2002

ROCK AND ROLL!!!

It works!!!

I love this site!!! Thank you all for the help!!!

Guest · Aug 23, 2002

Hi Flipster,

It looks like the problem is the way you executed your statement. Based on your reply, it looks like you typed

> awk 'commadelimiter' test.log

at the command line. Try typing

> awk -f commadelimiter test.log

instead.
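
The difference can be seen in a small sketch. Without `-f`, awk treats its first argument as the program text itself, so the word `commadelimiter` is parsed as awk code rather than opened as a file. The scratch directory, the sample record, and the `printf` line completing the script are assumptions for this demonstration, since only the first lines of the real script were posted.

```shell
workdir=$(mktemp -d)    # scratch directory for the demo

# The script file holds only awk code; its name matches the thread
cat > "$workdir/commadelimiter" <<'EOF'
{
    transid  = substr($0, 1, 3)
    recordid = substr($0, 4, 1)
    name     = substr($0, 5, 4)
    printf("%s,%s,%s\n", transid, recordid, name)
}
EOF

printf '11123333\n' > "$workdir/test.log"   # made-up sample record

# -f tells awk to read the program from the named file
awk -f "$workdir/commadelimiter" "$workdir/test.log"
```

With the sample record this prints `111,2,3333`.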

Or, if you are working in a Unix environment, you might consider adding a shebang line as the first line in your script to make your script directly executable. The shebang line will typically look like this:

#!/usr/bin/awk -f

You will have to set the permissions to make it executable, which you can do in a Unix environment as follows (execute permission is enough; 777 would also let every user modify the script):

> chmod +x commadelimiter

Afterwards, you should be able to execute the awk script directly using the following syntax (the leading ./ is needed unless the current directory is in your PATH):

> ./commadelimiter test.log

Hope that helps,
Have a nice weekend.
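
The shebang approach described above can be sketched end to end as follows. The location of awk varies between systems, so this sketch builds the shebang line from `command -v awk` instead of assuming `/usr/bin/awk`. The scratch directory, the one-line script body, and the sample record are assumptions for the demonstration.

```shell
workdir=$(mktemp -d)                 # scratch directory for the demo
script="$workdir/commadelimiter"

# Build the shebang from wherever awk lives on this system
printf '#!%s -f\n' "$(command -v awk)" > "$script"
cat >> "$script" <<'EOF'
{ printf("%s,%s,%s\n", substr($0, 1, 3), substr($0, 4, 1), substr($0, 5, 4)) }
EOF

chmod +x "$script"                   # add execute permission
printf '11123333\n' > "$workdir/test.log"   # made-up sample record

# Run the script directly; the kernel hands it to awk -f
"$script" "$workdir/test.log"
```

With the sample record this prints `111,2,3333`.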

saikrish · Sep 20, 2002

Though this is an 'AWK' forum, I thought I would describe two other ways of solving the problem posed by Flipster.

Approach 1:-
The 'cut' command can be used to split the input file into columns. Use 'cut' to extract the first column from every line of the file and store it in a separate file. Do the same for the other columns. This leaves 'n' files, one for each of the 'n' columns in the source file. Then use the 'paste' command to combine the files created with 'cut' into a single output file. 'paste' has a delimiter option ('-d') with which the columns can be delimited as desired (a comma, a tab, etc.) in the final output file.

sample code:-
cut -c1-3 <input file> > <col 1 file>
cut -c4 <input file> > <col 2 file> and so on.

paste -d, <col 1 file> <col 2 file> ... > <final o/p file>

Note, however, that some 'paste' implementations accept only 12 files at a time.
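
A minimal sketch of Approach 1 for the three fields named earlier in the thread follows. The scratch directory, file names, and sample records are made up for illustration.

```shell
workdir=$(mktemp -d)                                 # scratch directory for the demo
printf '11123333\n44456666\n' > "$workdir/input"     # made-up records

# One file per column range
cut -c1-3 "$workdir/input" > "$workdir/col1"
cut -c4   "$workdir/input" > "$workdir/col2"
cut -c5-8 "$workdir/input" > "$workdir/col3"

# -d, joins corresponding lines with a comma instead of the default tab
paste -d, "$workdir/col1" "$workdir/col2" "$workdir/col3" > "$workdir/output"
cat "$workdir/output"
```

For the two sample records the output file contains `111,2,3333` and `444,5,6666`.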

Approach 2:-
Exploit 'sed' using 'tagged expressions'. Since 'sed' supports only nine tags (\1 to \9), cut 'n' characters from the source file, where 'n' is the combined length of the first 9 columns, and direct them to a file. Repeat this for each further batch of 9 columns if there are more than 9 columns to parse out. Each such file then holds only the characters for its own batch of 9 columns (or fewer, in the last batch alone).
Take the first file, containing the characters for the first 9 columns, and pass it to 'sed' with the command below. Two shell variables, 'x' and 'y', hold the pieces of the pattern that repeat for every column. A '|' (pipe) is used as the delimiter between columns.

x='\(.\{'
y='\}\)'
sed "s/${x}3${y}${x}1${y}/\1|\2/" <cut file1> > <out file1>

All the <out files> generated by the 'sed' commands, one per batch of columns, can then be pasted together as described in Approach 1.
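
As a rough sketch of the tagged-expression idea, the command below handles the three fields named earlier in the thread in a single 'sed' call, applied directly to the record instead of to a pre-cut file. The helper name `delimit` and the sample record are made up for illustration.

```shell
# delimit is a hypothetical helper name; each \(.\{N\}\) group tags
# exactly N characters, and \1, \2, \3 refer back to those groups.
delimit() {
    sed 's/\(.\{3\}\)\(.\{1\}\)\(.\{4\}\)/\1|\2|\3/' "$@"
}

# Made-up sample record
printf '11123333\n' | delimit
```

For the sample record this prints `111|2|3333`. Any characters after the eighth are left untouched at the end of the line, which is why longer records need more groups or the batching described above.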

Hope the readers find the above two methods to be of use.
