Multiple records in one row

Status
Not open for further replies.

goodmans

MIS
Apr 23, 2008
63
GB
Hi Unix Gurus,

I have a file that contains only one row; there is no carriage return between records. Columns are delimited by commas, and the records are fixed width, except for the header. In this example each record is always 50 characters long. There is no delimiter between records. The [] characters in the example below are just to show how it looks; there are no "[" or "]" in the actual data.

[Header_(col1,col2,col2..,colN)_length_100][Row2_(col1,col2,col2..,colN)_Length_50][Row3_(col1,col2,col2..,colN)_Length_50][Row4_(col1,col2,col2..,colN)_Length_50][Row5_(col1,col2,col2..,colN)_Length_50][Row2_(col1,col2,col2..,colN)_Length_50][RowN_(col1,col2,col2..,colN)_Length_50]

I want to convert it into multiple rows:

Header_(col1,col2,col2..,colN)_length_100
Row2_(col1,col2,col2..,colN)_Length_50
Row3_(col1,col2,col2..,colN)_Length_50
Row4_(col1,col2,col2..,colN)_Length_50
Row5_(col1,col2,col2..,colN)_Length_50
Row2_(col1,col2,col2..,colN)_Length_50
RowN_(col1,col2,col2..,colN)_Length_50

How can I do this? Please suggest something.

Thanks
G
 
Hi

You said nothing about how the header is recognized, so I skip it. For the rest of the rows I would do one of these:
Bash:
while read -n 50 s; do echo "$s"; done < /input/file
Code:
sed 's/.\{50\}/&\n/g' /input/file
Tested with GNU sed.

Feherke.
 
Sorry

Thanks for quick response.
The header length is 100. After the header, each row is 50 characters long.

Thanks,
G
 
Hi

Then some minor modifications will be needed:
Bash:
( read -n 100 s; echo "$s"; while read -n 50 s; do echo "$s"; done; ) < /input/file
Code:
sed 's/.\{50\}/&\n/g;s/\(.\{50\}\)\n/\1/' /input/file


Feherke.
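
Note that read -n is a bash extension rather than part of POSIX, so other shells may reject it. A minimal sketch of the same header-then-rows split using only dd and fold, assuming both utilities are available; the function name split_fixed and its two width arguments are made up for illustration:

```shell
# split_fixed HEADER_WIDTH ROW_WIDTH < input
# Hypothetical helper: prints the header on its own line, then one data row per line.
split_fixed() {
    header_width=$1
    row_width=$2
    # bs=1 makes dd read exactly header_width bytes and leave the rest on stdin.
    dd bs=1 count="$header_width" 2>/dev/null
    printf '\n'
    # fold -b inserts a line break after every row_width bytes of what remains.
    fold -b -w "$row_width"
}
```

For the file described above this would be called as split_fixed 100 50 < /input/file. Reading one byte at a time with dd is slow, but it is only done for the header; the rest goes straight through fold, so the long line never has to fit in a shell variable.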
 
Sorry mate, I am not very good at scripting.
When I tried this, it produced an empty out.txt.

sed 's/.\{49\}/&\n/g;s/\(.\{28\}\)\n/\1/' /test/in.txt > /test/out.txt

Thanks,
G
 
I am on ksh. I get this error when I try the bash code, so obviously there is some difference, I guess.
./ebctoas.sh[5]: read: bad option(s)

My file looks like this, with 4 records in one row:

00,"AB11200801","JANUARY 2008 ",20080131,01542201,00000,00210,001,00000010202,00000,00210,002,00000000403,00000,00210,003,000000154


I want this output:

00,"RP11200801","JANUARY 2008 ",20080131,011112
01,00000,00210,001,000000102
02,00000,00210,002,000000004
03,00000,00210,003,000000154

Thanks mates.

 
So what is the solution for me now? I might have millions of records in one row, so I can't use a line-by-line approach either.

Regards
G
 
Hi

goodmans said:
I might have millions of records in one row.
Ouch! Then I would think of using a scripting language:
Code:
perl -pe 's/(.{50})/\1\n/g;s/(.{50})\n/\1/' /input/file

[gray]# or[/gray]

ruby -pe '$_.gsub!(/(.{50})/,"\\1\n").sub!(/(.{50})\n/,"\\1")' /input/file
But those "millions of records" are way too many, so string functions may be faster than regular expressions:
Code:
perl -ne 'print substr($_,0,100,"")."\n";print substr($_,0,50,"")."\n"while$_' /input/file

perl -ne 'print substr($_,0,50);foreach$i(1..length$_/50){print substr($_,$i*50,50)."\n"}' /input/file

[gray]# or[/gray]

ruby -ne 'puts$_[0,100];$_[0,100]="";while !$_.empty?: puts$_[0,50];$_[0,50]="";end' /input/file
ruby -ne 'print$_[0,50];(1..$_.length/50).each{|i|puts$_[i*50,50]}' /input/file


Feherke.
 
Thanks, I will try it. Do I need any packages for this Perl or Ruby, or does it work on ksh?

Regards,
G
 
I have successfully tested the Perl and bash code. I am just waiting to test it on my work PC. As the bash code is not supported there (I already tried it), I am waiting to try the Perl code.

Thank you very much guys.

Regards
G
 
Hi

goodmans said:
Do I need any packages for this Perl or Ruby, or does it work on ksh?
Both Perl and Ruby are standalone scripting languages, so their interpreters must be installed. They are not standard system tools, so they may or may not be present. However, Perl is quite old and very popular, so the only systems I have seen without Perl were the single-floppy Linux distributions. Other than the interpreters themselves, nothing is needed, as those simple one-liners do not use any modules.

Feherke.
 
Hi Feherke,

Thank you so much for helping me with this.

But as you said:
But those "millions of records" are way too many, so string functions may be faster than regular expressions:

Is the Perl way quicker? Or is any other way recommended?

Sorry I am quite new to this scripting.

Regards
G
 
Hi

goodmans said:
Is the Perl way quicker?
Perl's regular expression engine usually performs better than other regular expression libraries. No idea which sed implementation uses which library, but one thing is sure: Perl uses its own built-in engine, the one PCRE is modelled on. ;)
goodmans said:
Or is any other way recommended?
The 3rd Perl code, which uses foreach and substr, should be the fastest of all those codes.

Feherke.
 
That's great, thanks.

This is working for me.
perl -ne 'print substr($_,0,20,"")."\n";print substr($_,0,15,"")."\n"while$_' emp.data

But when I use this:
perl -ne 'print substr($_,0,20,"")."\n";print length$_/15;foreach$i(1..length$_/15){print substr($_,$i*15,15)."\n"}' emp.data

I have a problem: I get the header plus just one row, and that is the second detail row.

If I just try perl 'print length$_/15' the output is just 1.

I am surprised.

Thanks
G
 
Hi

Oops. Operator precedence problem. Sorry.
Code:
perl -ne 'print substr($_,0,50);foreach$i(1..length($_)/50){print substr($_,$i*50,50)."\n"}' /input/file
Note that the code was based on your previous requirement, where the header row's length was twice the data row's length.

Feherke.
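
When a loop bound like this looks suspicious, it can help to work out independently how many data rows the file should contain. A rough sketch in plain shell, assuming the data contains no newlines other than an optional trailing one; row_count and its arguments are hypothetical names:

```shell
# row_count HEADER_WIDTH ROW_WIDTH < input
# Hypothetical helper: prints the number of complete data rows after the header.
row_count() {
    header_width=$1
    row_width=$2
    # Drop newlines first so that a trailing newline is not counted as data.
    total=$(tr -d '\n' | wc -c)
    body=$(( total - header_width ))
    echo $(( body / row_width ))
    leftover=$(( body % row_width ))
    # Leftover bytes mean one of the two widths is probably wrong.
    if [ "$leftover" -ne 0 ]; then
        echo "warning: $leftover bytes do not fill a complete row" >&2
    fi
}
```

If the number printed here differs from the number of rows a one-liner produces, or a warning appears, the widths passed in are worth re-checking before anything else.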
 
Sorry mate, I am really sorry. I gave those numbers just as an example, whereas in my file the header is 61 and the row size is 23. So does the code work with changes?

Regards
G
 
I mean the case where the header size is different from the row size, and not simply double it. Sorry, it's been a pain. I am very new to scripting. I am trying to change the increment values in the loop and trying different options, but as I have no clue about this for loop, I am struggling.

Regards
G
 
Hi feherke,

I got it working with a sample file after changing a few things. Thanks a lot. You rock, mate. I really appreciate your help.

perl -ne 'print substr($_,0,20)."\n";foreach$i(0..length($_)/15){print substr($_,($i*15)+20,15)."\n"}' emp.data


My sample input file:

empno,empname,empdobE1,ABC,01012001E2,BCA,01012001E3,BCC,01012001E4,BCD,01012001E5,BCE,01012001E6,ZCA,01012001

output
empno,empname,empdob
E1,ABC,01012001
E2,BCA,01012001
E3,BCC,01012001
E4,BCD,01012001
E5,BCE,01012001
E6,ZCA,01012001

Regards
G
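
For anyone landing here with different widths: the same substr loop can be written so that the header width and row width are parameters and the loop stops exactly at the end of the line. A sketch using awk, assuming an awk that accepts very long records (older implementations may not); split_awk and the variables h and w are illustrative names only:

```shell
# split_awk HEADER_WIDTH ROW_WIDTH < input
# Hypothetical helper: header on the first line, then one fixed-width row per line.
split_awk() {
    awk -v h="$1" -v w="$2" '{
        # Header: the first h characters of the line.
        print substr($0, 1, h)
        # Data rows: step through the rest of the line w characters at a time.
        for (i = h + 1; i <= length($0); i += w)
            print substr($0, i, w)
    }'
}
```

With the sample file above this would be run as split_awk 20 15 < emp.data. Because the loop condition compares against length($0), no empty rows should be printed after the last record.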
 