Multiple records in one row

goodmans · Feb 26, 2010

Hi Unix Gurus,

I have a file that contains only one row; there is no carriage return after each record. Columns are delimited by commas, and rows are generated with a fixed width, except for the header. Each row is always the same length, 50 in this example. There is no delimiter between rows. The [ ] characters in the example below are only there to show the layout; the actual data contains no "[" or "]".

[Header_(col1,col2,col2..,colN)_length_100][Row2_(col1,col2,col2..,colN)_Length_50][Row3_(col1,col2,col2..,colN)_Length_50][Row4_(col1,col2,col2..,colN)_Length_50][Row5_(col1,col2,col2..,colN)_Length_50][Row2_(col1,col2,col2..,colN)_Length_50][RowN_(col1,col2,col2..,colN)_Length_50]

I want to convert it into multiple rows.

Header_(col1,col2,col2..,colN)_length_100
Row2_(col1,col2,col2..,colN)_Length_50
Row3_(col1,col2,col2..,colN)_Length_50
Row4_(col1,col2,col2..,colN)_Length_50
Row5_(col1,col2,col2..,colN)_Length_50
Row2_(col1,col2,col2..,colN)_Length_50
RowN_(col1,col2,col2..,colN)_Length_50

How can I do this? Please suggest something.

Thanks
G

feherke · Feb 26, 2010

Hi

You said nothing about how the header is recognized, so I skip it. For the rest of the rows I would do one of these:

Bash:

while read -n 50 s; do echo "$s"; done < /input/file

Code:

sed 's/.\{50\}/&\n/g' /input/file

Tested with GNU sed.

Feherke.

http://free.rootshell.be/~feherke/

goodmans · Feb 26, 2010

Sorry

Thanks for the quick response.
The header length is 100. After the header, each row is recognized by its length of 50.

Thanks,
G

feherke · Feb 26, 2010

Hi

Then some minor modifications will be needed :

Bash:

( read -n 100 s; echo "$s"; while read -n 50 s; do echo "$s"; done; ) < /input/file

Code:

sed 's/.\{50\}/&\n/g;s/\(.\{50\}\)\n/\1/' /input/file

Feherke.

http://free.rootshell.be/~feherke/
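
To see why the second substitution restores the header, here is a small sketch on toy data: a 4-character "header" followed by 2-character rows, so the header is exactly twice the row width, as in the 100/50 case. The function name demo_sed_split and the sample string are made up for illustration, and the \n in the replacement is assumed to be handled the GNU sed way.

```shell
# Toy input: "AAAA" stands in for the header, "BB", "CC", "DD" for the rows.
demo_sed_split() {
    printf '%s' 'AAAABBCCDD' |
        # Step 1 breaks after every 2 characters, which also cuts the header in half.
        # Step 2 removes only the first newline, gluing the two header halves together.
        sed 's/.\{2\}/&\n/g;s/\(.\{2\}\)\n/\1/'
}

demo_sed_split
```

This trick only works when the header is exactly twice as long as a data row.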

goodmans · Feb 26, 2010

Sorry mate, I am not good at scripting.
When I tried this, it produced an empty out.txt.

sed 's/.\{49\}/&\n/g;s/$.\{28\}$\n/\1/' /test/in.txt > /test/out.txt

Thanks,
G

feherke · Feb 26, 2010

Hi

Well, probably your sed implementation is not the GNU one. Then which one is it?

Feherke.

http://free.rootshell.be/~feherke/

goodmans · Feb 26, 2010

I am on ksh. I get this error when I try the bash code, so obviously there is some difference, I guess.
./ebctoas.sh[5]: read: bad option(s)

I don't know why, but when I try the bash code I get the following error: read(s) invalid

My file looks like this, with 4 records in one row.

00,"AB11200801","JANUARY 2008 ",20080131,01542201,00000,00210,001,00000010202,00000,00210,002,00000000403,00000,00210,003,000000154

I want output

00,"RP11200801","JANUARY 2008 ",20080131,011112
01,00000,00210,001,000000102
02,00000,00210,002,000000004
03,00000,00210,003,000000154

Thanks mates.

feherke · Feb 26, 2010

Hi

goodmans said:
I am on ksh.

Then it is probably ksh 88. The -n option of read is available only since ksh 93.

Feherke.

http://free.rootshell.be/~feherke/
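
For a shell whose read has no -n option, one possible workaround is to leave the splitting to the standard dd and fold utilities. This is only a sketch under some assumptions: the helper name split_fixed is made up, the input is a regular file (so dd reads the whole header in one go), and the data is plain single-byte text.

```shell
# split_fixed HEADER_LEN ROW_LEN FILE
# Hypothetical helper: prints the header, then each fixed-width row on its own line.
split_fixed() {
    header_len=$1
    row_len=$2
    file=$3
    {
        # dd copies exactly the first HEADER_LEN bytes and leaves the
        # file offset right after them for the next command.
        dd bs="$header_len" count=1 2>/dev/null
        echo
        # fold -b breaks the remaining data every ROW_LEN bytes.
        fold -b -w "$row_len"
    } < "$file"
}
```

With the numbers from this thread it would be called as split_fixed 100 50 /input/file. If the file does not end with a newline, the last row is printed without one too.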

goodmans · Feb 26, 2010

So what is the solution for me now? I might have millions of records in one row, so I can't process it line by line either.

Regards
G

feherke · Feb 26, 2010

Hi

goodmans said:
I might have millions of records in one row.

Ouch! Then I would consider using a scripting language:

Code:

perl -pe 's/(.{50})/$1\n/g;s/(.{50})\n/$1/' /input/file

# or

ruby -pe '$_.gsub!(/(.{50})/,"\\1\n").sub!(/(.{50})\n/,"\\1")' /input/file

But those "millions of records" are way too many, so string functions may be faster than regular expression :

Code:

perl -ne 'print substr($_,0,100,"")."\n";print substr($_,0,50,"")."\n"while$_' /input/file

perl -ne 'print substr($_,0,50);foreach$i(1..length$_/50){print substr($_,$i*50,50)."\n"}' /input/file

# or

ruby -ne 'puts $_[0,100];$_[0,100]="";while !$_.empty? do puts $_[0,50];$_[0,50]="";end' /input/file
ruby -ne 'print $_[0,50];(1..$_.length/50).each{|i|puts $_[i*50,50]}' /input/file

Feherke.

http://free.rootshell.be/~feherke/

goodmans · Feb 27, 2010

Thanks, I will try it. Do I need any packages for Perl or Ruby, or does it work on ksh?

Regards,
G

goodmans · Feb 27, 2010

I have successfully tested the Perl and bash code. I am just waiting to test it on my work PC. As I have already tried the bash code there (which is not supported), I am waiting to try the Perl code.

Thank you very much guys.

Regards
G

feherke · Feb 28, 2010

Hi

goodmans said:
Do I need any packages for Perl or Ruby, or does it work on ksh?

Both Perl and Ruby are standalone scripting languages, so their interpreters must be installed. They are not standard system tools, so they may or may not be present. However, Perl is quite old and very popular, so the only systems I have seen without Perl were the single-floppy Linux distributions. Other than the interpreters themselves, nothing is needed, as those simple scripts do not use any modules.

Feherke.

http://free.rootshell.be/~feherke/

goodmans · Feb 28, 2010

Hi Feherke,

Thank you so much for helping me with this.

But as you said:
But those "millions of records" are way too many, so string functions may be faster than regular expression :

Is the Perl way quicker? Or is any other way recommended?

Sorry I am quite new to this scripting.

Regards
G

feherke · Feb 28, 2010

Hi

goodmans said:
Is the Perl way quicker?

Perl's regular expression engine usually performs well compared with other regular expression libraries. No idea which sed implementation uses which library, but one thing is sure: Perl uses its own built-in engine (PCRE is a separate library modeled on Perl's syntax).

goodmans said:
Or is any other way recommended?

The 3rd Perl snippet, which uses foreach and substr, should be the fastest of all those snippets.

Feherke.

http://free.rootshell.be/~feherke/

goodmans · Feb 28, 2010

Thats great thanks,

This is working for me.
perl -ne 'print substr($_,0,20,"")."\n";print substr($_,0,15,"")."\n"while$_' emp.data

But when I am using this
perl -ne 'print substr($_,0,20,"")."\n";print length$_/15;foreach$i(1..length$_/15){print substr($_,$i*15,15)."\n"}' emp.data

I am getting a problem: I get the header plus just 1 row, and that row is the second detail row.

If I just try print length$_/15, the output is just 1.

I am surprised.

Thanks
G

feherke · Feb 28, 2010

Hi

Oops. Operator precedence problem. Sorry.

Code:

perl -ne 'print substr($_,0,50);foreach$i(1..length($_)/50){print substr($_,$i*50,50)."\n"}' /input/file

Note that the code was based on your previous requirement, where the header row's length was twice the data row's length.

Feherke.

http://free.rootshell.be/~feherke/

goodmans · Feb 28, 2010

Sorry mate, I am really sorry, I said that just as an example. In my file it is different: the header is 61 and the row size is 23. So does the code work with changes?

Regards
G

goodmans · Feb 28, 2010

I mean when the header size is different from the row size, and not simply twice as long. Sorry, I know this has been a pain. I am very new to scripting. I am trying to change the increment values in the loop and trying different options. As I have no clue about this for loop, I am struggling.

Regards
G

goodmans · Feb 28, 2010

Hi feherke,

I got it working with a sample file after changing a few things. Thanks a lot. You rock, mate. I really appreciate your help.

perl -ne 'print substr($_,0,20)."\n";foreach$i(0..length($_)/15){print substr($_,($i*15)+20,15)."\n"}' emp.data

My SampleFile input file

empno,empname,empdobE1,ABC,01012001E2,BCA,01012001E3,BCC,01012001E4,BCD,01012001E5,BCE,01012001E6,ZCA,01012001

output
empno,empname,empdob
E1,ABC,01012001
E2,BCA,01012001
E3,BCC,01012001
E4,BCD,01012001
E5,BCE,01012001
E6,ZCA,01012001

Regards
G
