
Splitting File into smaller batches

Status
Not open for further replies.

venkatpavan

Programmer
Feb 18, 2006
42
SG
Hi,

I'm a beginner on UNIX, so please forgive my ignorance on this.

I'm trying to split a file into batches. The big file I want to split looks like this:
00
XX
XX
99
00
XX
XX
99
00
XX
XX
99
00
XX
XX
99

Each batch starts with a 00 record and ends with a 99 record. I want to split this big file into batches, so that each batch runs from 00 to 99. Can someone help me with this?

Thanks.....
 
It's very basic but, as a beginner, probably easier to understand than the csplit or awk variants.
Code:
#!/bin/ksh
COUNT=0

while read line
do
  echo "$line" >> FILE$COUNT
  if [ "$line" = 99 ]
  then
      (( COUNT += 1 ))
  fi
done < /input/file
This will split the file into FILE0, FILE1, FILE2 etc.
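For comparison, the csplit variant mentioned above could look something like this (a sketch assuming GNU csplit, whose '{*}' repeat count means "keep splitting at every match"; the sample file name is made up for the demo):

```shell
# Build a small sample in the same shape as the original file.
printf '00\nXX\nXX\n99\n00\nXX\nXX\n99\n' > bigfile

# Split before every line that is exactly "00".
# -z suppresses the zero-length piece before the first match.
csplit -z bigfile '/^00$/' '{*}'
```

Each 00..99 batch ends up in its own xx00, xx01, ... file.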

Ceci n'est pas une signature
Columb Healy
 
I think I messed up the question here; I'm sorry about that. I want to split the file into batches from 00 to 99, but 00 and 99 should be the first two characters of each record. In this file each record is around 1018 bytes, and records are separated by either a carriage return plus line feed or a line feed alone. Below is an example of the file; due to space constraints here I used dots (...).

0000000000009000000000090000400023726..............
0100000000009000000000090000400023726..............
0200000000009000000000090000400023726..............
0300000000009000000000090000400023726..............
0400000000009000000000090000400023726..............
999999999999999900009000000000090000400023726
0000000000009000000000090000400023726..............
0100000000009000000000090000400023726..............
0200000000009000000000090000400023726..............
0300000000009000000000090000400023726..............
0400000000009000000000090000400023726..............
999999999999999900009000000000090000400023726
0000000000009000000000090000400023726..............
0100000000009000000000090000400023726..............
0200000000009000000000090000400023726..............
0300000000009000000000090000400023726..............
0400000000009000000000090000400023726..............
999999999999999900009000000000090000400023726

Once again, I'm sorry about earlier.

Thanks.....
 
awk '/^00/{if(NR>1)close(f);f="file"++n}{print>f}' /path/to/bigfile
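Spelled out with comments and run against a made-up sample file, the one-liner does this (same logic, illustrative file names only):

```shell
# Sample input: two batches, each opening with a "00..." record
# and closing with a "99..." record.
printf '00a\n01b\n99z\n00c\n99y\n' > bigfile

awk '
  /^00/ {                 # a record starting with "00" opens a new batch
    if (NR > 1) close(f)  # close the finished batch file first, so we
                          # never run out of open file descriptors
    f = "file" ++n        # name the next piece file1, file2, ...
  }
  { print > f }           # every record goes to the current piece
' bigfile
```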

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
For my script
Code:
#!/bin/ksh
COUNT=0

while read line
do
  echo "$line" >> FILE$COUNT
  if egrep -q ^99 $line
  then
      (( COUNT += 1 ))
  fi
done < /input/file

Ceci n'est pas une signature
Columb Healy
 
Hi

Columb, are you sure about this ?
Code:
egrep -q ^99 $line
The line-by-line file appending you already used was slow, but now you have added an egrep call for every line, which is slower still. As written, egrep also treats $line as a file name rather than as text to search. (Note that for the time test I corrected that egrep misunderstanding.)
Code:
master # time my.sh
real    0m0.225s
user    0m0.030s
sys     0m0.190s

master # time columb.sh
real    0m11.450s
user    0m5.630s
sys     0m5.510s
I would rewrite it like this:
Code:
#!/usr/bin/ksh

COUNT=0
end=no

while test $end; do
  end=''
  while read line; do
    echo "$line"
    if [[ "$line" = 99* ]]; then
      (( COUNT ++ ))
      end=no
      break
    fi
  done > FILE$COUNT
done < /input/file
Tested with (pd)ksh.

Note: if there is an empty line at the end of the file, my script will output that too, in a file with the next sequence number.
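For a quick sanity check, the same loop can be exercised against a throwaway sample (the file names here are made up for the demo, with /input/file swapped for a local sample):

```shell
#!/bin/ksh
# Two batches: each ends with a record whose first two characters are "99".
printf '00a\n01b\n99z\n00c\n99y\n' > sample

COUNT=0
end=no
while test $end; do
  end=''
  while read line; do
    echo "$line"
    if [[ "$line" = 99* ]]; then
      (( COUNT ++ ))
      end=no
      break
    fi
  done > FILE$COUNT
done < sample
# FILE0 and FILE1 hold the two batches; a trailing empty FILE2 is
# created when the final read hits end-of-file.
```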

Feherke.
 
feherke

I was going for simple rather than fast but yes, your code is better. Thanks for the amends.

My reasoning for using a basic script is that venkatpavan is a beginner and will have to maintain the code. I'm always reluctant to provide awk scripts because I remember just how long it took me to get my head round the awk basics, and even today the lemur book still lives on my desk.

Ceci n'est pas une signature
Columb Healy
 
Thanks a lot, guys, it's working. You made it look a lot easier. One thing is sure: if we know UNIX, life is a lot easier in the programming world.

Once again I appreciate all your Help.
 
Hi

Columb said:
I remember just how long it took me to get my head round the awk basics
Well, I remember that for me it was a nice autumn day's afternoon...

Thank you Columb for sharing your experience. I will keep it in mind when posting in the future.

Feherke.
 