adding field separator 3

omoo · Jul 12, 2005

Hi,

I have the following data in the format as shown(note: there are more than 1 blank spaces between each field and the spaces are not uniform, meaning there can be one blank field between field1 and field2 and 3 spaces between field3 and field4):
________________________________________
num1a num2a num3a num4a num5a
num1b num2b num4b
num2c num3c num4c num5c
num2d num3d num4d num5d
num1e num2e num3e num4e num5e
num1f num5f
________________________________________

i need to include ; as the field delimited and the required output is:
________________________________________
num1a;num2a;num3a;num4a;num5a
num1b;num2b;;num4b;
;num2c;num3c;num4c;num5c
;num2d;num3d;num4d;num5d
num1e;num2e;num3e;num4e;num5e
num1f;;;;num5f
________________________________________

I tried using
sed -e 's/ */;/g'
but it will take consecutive blank fields as one blank field. Can anyone help?

stefanwagner · Jul 12, 2005

is the fieldlenght fixed, like

Code:

123456 123456 123456
1      123    12
123    1      1

you may do a

Code:

sed 's/ {6}/;/g;s/ */;/g;'

seeking a job as java-programmer in Berlin:

http://home.arcor.de/hirnstrom/bewerbung

duncdude · Jul 12, 2005

omoo

that is a nasty data file

i reckon this data structure is so ambiguous it will be almost impossible to solve this with no problems

BUT - i would imagine it would be best to iterate through the file once to find a line that contains the most values - and then assume that this line (or lines) with the most matches are a full complement of values - and then work from that basis

it is important this is understood - i.e. we can easily look at your data and visually see that there 5 values max. But what if you had the following:-

Code:

num1a           num3a   num4a   num5a
num1b                   num4b
                num3c   num4c   num5c
                num3d   num4d   num5d
num1e           num3e           num5e
num1f                           num5f

... does column 2 exist - or not!?

having a structure that has a indefinite number of spaces between the columns makes this very prone to error

Kind Regards
Duncan

feherke · Jul 12, 2005

Hi

Get it to a "regular" format filling the missing fields with a spaceholder, let say [tt]\t[/tt], then is easyer:

Code:

sed -e 's/^ \{5\}/\t/;s/ \{8\}/ \t/g;s/ \+/;/g;s/\t//g' file

Feherke.

http://rootshell.be/~feherke/

duncdude · Jul 12, 2005

Hi Feherke

Please explain, exactly, what a 'field' is in the above example

Kind Regards
Duncan

omoo · Jul 12, 2005

some of the field lengths are fixed while some are not. this is the problematic part.

yes it is possible that there are no data in the entire field. and it is the difficult part that i am trying to solve.

i cannot get it to a "regular" format because i output file always cannot detect the absence of some of the fields.

the fields in the example numbers, alphabets, dates or blank. for exmaple, TM5362F0000D-NBZ5, 4500727101, CR4900.00, 24, 06/29/2005.

omoo · Jul 12, 2005

the number of fields in each line are the same, but some of them can be blanks

vgersh99 · Jul 12, 2005

omoo,
youforgot to mention [from the other forum] that the number of fields is always the same on every line.

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

duncdude · Jul 12, 2005

i think this would be alot easier if you posted a chunk of your data - a few lines preferably

Kind Regards
Duncan

feherke · Jul 12, 2005

Hi

Duncan wrote:

Please explain, exactly, what a 'field' is in the above example

Just as in [tt]awk[/tt]. My first idea was to use [tt]awk[/tt] and set the [tt]FIELDWIDTHS[/tt] variable. Maybe this is why for me was obvious to use the field word.

Feherke.

http://rootshell.be/~feherke/

futurelet · Jul 12, 2005

omoo said:
there are more than 1 blank spaces between each field and the spaces are not uniform, meaning there can be one blank field between field1 and field2 and 3 spaces between field3 and field4)

some of the field lengths are fixed while some are not. this is the problematic part.

the number of fields in each line are the same, but some of them can be blanks

You don't have a valid data file. The field-delimiting information is gone.
It's impossible to restore it.

omoo, I have a word. Some of its letters (I don't know how many) have
been removed. I am left with "at". What was the original word?
Hat? Catch? Watch? Attach? What? Attitude?

Consider this possible line from the file (for clarity, spaces have been replaced with "."):
[tt]......foo......[/tt]

Let's change the spaces that make up fields to x's:
It could be like this:
[tt]xxxx..foo.x.x.x[/tt]

which should be converted to
[tt];foo;;;[/tt]

Or it could be like this:
[tt]xx.x..foo..x..x[/tt]

which should be converted to
[tt];;foo;;[/tt]

Or it could be like this:
[tt]x.x.x.foo...xxx[/tt]

which should be converted to
[tt];;;foo;[/tt]

I wish that Tom Noddy could have thought before he created this so-called "data file".

SamBones · Jul 12, 2005

The following [tt]sed[/tt] command will work if the fields are all the same width...

Code:

sed 's/......../&;/g;s/ *//g' file.dat

Using the sample in the original post this gives...

Code:

num1a;num2a;num3a;num4a;num5a
num1b;num2b;;num4b;
;num2c;num3c;num4c;num5c
;num2d;num3d;num4d;num5d
num1e;num2e;num3e;num4e;num5e
num1f;;;;num5f

If the columns for each field are a fixed width, but different widths, you can use [tt]cut[/tt] and [tt]paste[/tt]. The following should work...

Code:

cut -b 1-8 file.dat > field.1
cut -b 9-16 file.dat > field.2
cut -b 17-24 file.dat > field.3
cut -b 25-32 file.dat > field.4
cut -b 33-40 file.dat > field.5
paste -d\; field.[1-5] | sed 's/ *//g'

This gives the same output. Just adjust the field widths for each [tt]cut[/tt].

Both of these even handle duncdude's completely missing column 2 example above, as long as there's a span of spaces the size of the missing field.

BUT, this all assumes the fields all have a fixed length. If they vary from line to line with no clear delimiter, there's no way to parse it.

Hope this helps.

omoo · Jul 13, 2005

Hi,

The method by SamBones might be working.

may I know how can I delete the trailing empty spaces after each field?

omoo · Jul 13, 2005

ok i realise the sample code given by SamBones already delete the trailing spaces! oops. thanks!

SamBones · Jul 13, 2005

The commands I gave don't leave any spaces at all. The [tt]field.*[/tt] files have spaces in them, but they're just temporary files for the [tt]paste[/tt] to use. The final output has no spaces.

Hope this helps.

feherke · Jul 13, 2005

Hi

As I mentioned above, my first idea was using [tt]awk[/tt] with fixed widths. Seeing SamBones solution using similar steps, I think that wasn't so bad, so I post it, as another alternative :

Code:

awk '
BEGIN { FIELDWIDTHS="8 8 8 8 8 8" }
{ s=$1; for (i=2;i<NF;i++) s=s ";" $i; print gensub(" ","","g",s) }
' file.dat

Feherke.

http://rootshell.be/~feherke/

omoo · Jul 13, 2005

yeah it works! thanks SamBones.

I have another problem but I think it is not as bad.

I have data in the following format:

Code:

data1
data2 data3    data4
data5 data6
data7 data8    data9
      data10

I require the final output to be:

Code:

data1
data1
data1
data1

i only require the 1st line but I need to replicate it for all rows with data. In this example, there are 4 rows of data(excluding data1), so I need to replicate it in 4 rows.

feherke · Jul 13, 2005

Hi

And how should be that data1 recognised ? Is it a fix keyword, or something variable ?

If data1 is a keyword :

Code:

sed -n '/data1$/{h;n};g;p'

If data1 is whatever is in the first line :

Code:

sed -n '1{h;n};g;p'

If data1 is in a line alone ( no separator and other data ) :

Code:

sed -n '/ /!{h;n};g;p'

Feherke.

http://rootshell.be/~feherke/

omoo · Jul 13, 2005

Hi,

what is h and n supposed to be?

data1 is whatever in the 1st line. There is only one data in the 1st line. I need to duplicate this 1st line according to the number of rows.

feherke · Jul 13, 2005

Hi

omoo wrote :

what is h and n supposed to be?

[tt]sed[/tt] commands, according to [tt]man[/tt] :
h - Copy pattern space to hold space
n - Read the next line of input into the pattern space
g - Copy hold space to pattern space
p - Print the current pattern space

I think the second code from my previous post is exactly what you need.

Feherke.

http://rootshell.be/~feherke/

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

adding field separator 3

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor