Chop the file into small pieces 2

Status
Not open for further replies.

babeo

Technical User
Mar 30, 2000
Hello,
I have a huge file that is a combination of many small files. The good thing is that each small file ends with a line of "#******END". I am not good at scripting; could someone help me, please? I know how to cut by row (grep) and by column (cut -d), but I don't know how to combine them. Is there a way to "unconcat" the file?

Thank you very much.
 
That does not work; it does not split the file the way I want. I only want to split each file at the word "END".
 
maybe 'csplit'? [as noted in the other forum]

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
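A minimal sketch of the csplit idea, assuming GNU csplit (the -z option and the {*} repeat count are GNU extensions) and assuming the marker is a whole line reading "#******END". The function name split_with_csplit and its two parameters are made up for illustration.

```shell
# split_with_csplit FILE PREFIX   (hypothetical helper name)
# Writes PREFIX00, PREFIX01, ... with each marker line kept at the
# end of its own piece. Requires GNU csplit.
split_with_csplit() {
    # -s         do not print the byte count of every piece
    # -z         delete empty pieces (here: the one after the final marker)
    # -f PREFIX  prefix for the output file names
    # /regex/+1  cut just after each line that matches the marker
    # {*}        repeat the pattern until the input runs out
    csplit -s -z -f "$2" "$1" '/^#\*\*\*\*\*\*END$/+1' '{*}'
}
```

Called as split_with_csplit bigfile piece_ it should leave piece_00, piece_01 and so on in the current directory. Any lines after the last marker end up in one final piece.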
 
You can adapt the following script to your needs. You only need to change the end-of-file condition; I am using a line count here:

#!/bin/ksh
NUMOFARGS=$#
if [ $NUMOFARGS -lt 2 ]
then
echo "Invalid usage"
echo "Usage: $0 <FileName> <NumberOfLines>"
exit 1
fi

INPUTFILENAME=$1
NUMBEROFLINES=$2
LINECOUNTER=0
FILECOUNT=1
FILENAME=${INPUTFILENAME}${FILECOUNT}

cat ${INPUTFILENAME} | while read -r LINE
do
LINECOUNTER=`expr $LINECOUNTER + 1`
echo $LINE >> $FILENAME
if [ "`expr $LINECOUNTER % $NUMBEROFLINES`" -eq "0" ]
then
FILECOUNT=`expr $FILECOUNT + 1`
FILENAME=${INPUTFILENAME}${FILECOUNT}
fi
done
 
Hi

Useless use of cat. And ksh is able to do arithmetic evaluation itself, so there is no need for expr.

Optimize it a bit. Not only will it run faster, your code will also look clearer.

Feherke.
 
Please tell me how.
 
In your ksh man page, pay attention to let and ((...)).

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
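As a rough illustration of that hint, the sketch below does the same increment twice: once through the external expr program and once with the shell's own arithmetic. It uses the $((...)) expansion form because ksh, bash and plain POSIX sh all accept it; let and the bare ((...)) command are the ksh forms of the same thing. Both function names are invented for this example.

```shell
# Slow way: every call starts a separate expr process.
next_count_expr() {
    expr "$1" + 1
}

# Built-in way: the shell evaluates the expression itself.
# In ksh the equivalent commands are:  let N=N+1   or   ((N+=1))
next_count_builtin() {
    echo $(( $1 + 1 ))
}
```

Both print the same number; the difference only shows up as run time when the call sits inside a long loop.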
 
Thanks PHV. I could use let for assigning a value to a variable. Can I do something about this line:

if [ "`expr $LINECOUNTER % $NUMBEROFLINES`" -eq "0" ]

Also, how can I avoid using cat?
 
[ $((LINECOUNTER % NUMBEROFLINES)) -eq 0 ] && ...

while read line
do
...
done < $INPUTFILENAME

Hope This Helps, PH.
 
Hi bansalhimanshu

Thanks for your code. I tried to rework it and apply the others' suggestions above, but something still does not work. This is the error I get:

"chop.ksh[11]: A: bad number"

Below is the code I reworked:

#!/bin/ksh

INPUTFILENAME=$1
LINECOUNTER="A"
FILECOUNT=1
FILENAME=${INPUTFILENAME}${FILECOUNT}

while read -r LINE
do
echo $LINE >> $FILENAME
if [ $LINECOUNTER -eq "END" ]
then
FILECOUNT=`expr $FILECOUNT + 1`
FILENAME=${INPUTFILENAME}${FILECOUNT}
fi
done < $INPUTFILENAME
 
-eq is looking for a numeric comparison. Try =
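To make the difference concrete, here is a small sketch with one string test and one numeric test. The helper names is_end_marker and is_multiple are hypothetical, and the marker text is taken from the original question.

```shell
# is_end_marker LINE: succeeds when LINE is exactly the marker text.
is_end_marker() {
    # = compares strings; the quotes keep empty or multi-word lines safe
    [ "$1" = "#******END" ]
}

# is_multiple COUNT N: succeeds when COUNT is a whole multiple of N.
is_multiple() {
    # -eq compares integers only; a non-numeric operand is an error
    [ $(( $1 % $2 )) -eq 0 ]
}
```

Used in a loop, this would read: if is_end_marker "$LINE"; then ... fi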
 
Also as suggested by PHV and feherke, these are the changes I made to the original script:

#!/bin/ksh
NUMOFARGS=$#
if [ $NUMOFARGS -lt 2 ]
then
echo "Invalid usage"
echo "Usage: $0 <FileName> <NumberOfLines>"
exit 1
fi

INPUTFILENAME=$1
NUMBEROFLINES=$2
LINECOUNTER=0
FILECOUNT=1
FILENAME=${INPUTFILENAME}${FILECOUNT}

while read -r LINE
do
let LINECOUNTER=$LINECOUNTER+1
echo $LINE >> $FILENAME
if [ $(($LINECOUNTER % $NUMBEROFLINES)) -eq "0" ]
then
FILECOUNT=`expr $FILECOUNT + 1`
FILENAME=${INPUTFILENAME}${FILECOUNT}
fi
done < $INPUTFILENAME


Of course, for a text comparison, change -eq to =.
 
Hi bansalhimanshu

OK, for a quick test I kept everything, just added a variable and changed the line:
if [ $(($LINECOUNTER % $NUMBEROFLINES)) -eq "0" ]

to
CHECKVALUE="A"
....
if [ $(($LINECOUNTER % $NUMBEROFLINES)) -eq "0" ] && [ $(($CHECKVALUE = "END")) ]

I get the result below:
- the file is just copied to a new file name (e.g. test becomes test1), and the content is the same (just a few MB smaller)

What did I do wrong?
 
The condition $(($CHECKVALUE = "END")) is probably never satisfied after FILENAME is first initialized, so the first file is created and no further ones are.

Check 1) that you are assigning the contents of each line to CHECKVALUE correctly, and 2) whether the appearance of "END" is meant to mark the start of a new file. Make sure END is on a line by itself, without any spaces, tabs, etc.
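One way to check the second point is to count how many lines are exactly the marker and how many merely contain it. This is only a sketch: marker_report is a made-up name, and the default marker is the "#******END" line from the original question rather than the bare word END.

```shell
# marker_report FILE [MARKER]   (hypothetical helper)
# Prints how many lines equal MARKER exactly and how many contain it.
marker_report() {
    file=$1
    marker=${2:-}
    [ -n "$marker" ] || marker='#******END'
    # -F: fixed string, so the * characters are not regex operators
    # -x: the whole line must match; -c: print a count of matching lines
    # grep exits non-zero when the count is 0, hence the || true
    exact=$(grep -c -x -F -e "$marker" -- "$file" || true)
    loose=$(grep -c -F -e "$marker" -- "$file" || true)
    echo "exact=$exact containing=$loose"
}
```

If the two counts differ, some marker lines carry extra characters such as leading or trailing blanks, and an exact string comparison will miss them.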

In fact, I think

if [ $(($LINECOUNTER % $NUMBEROFLINES)) -eq "0" ]

should be replaced with

if [ "$LINE" = "END" ]
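Putting that suggestion together with the original question, a complete splitter might look like the sketch below. It is written under assumptions: the function name split_on_marker and its variables are invented, the marker defaults to the "#******END" line from the first post, a line counts as a marker when it ends with that text, and the output names follow the thread's INPUTFILENAME1, INPUTFILENAME2, ... pattern.

```shell
# split_on_marker INPUT [MARKER]   (hypothetical helper)
# Writes INPUT1, INPUT2, ... and starts a new piece after every line
# that ends with MARKER. The marker line stays in the piece it closes.
split_on_marker() {
    input=$1
    marker=${2:-}
    [ -n "$marker" ] || marker='#******END'
    count=1
    out=${input}${count}
    fresh=1                 # 1 = next write must truncate the piece

    # IFS= and -r keep leading blanks and backslashes intact;
    # the || test also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]
    do
        if [ "$fresh" -eq 1 ]; then
            printf '%s\n' "$line" > "$out"
            fresh=0
        else
            printf '%s\n' "$line" >> "$out"
        fi
        case $line in
            *"$marker")     # quoted, so the * in the marker are literal
                count=$((count + 1))
                out=${input}${count}
                fresh=1
                ;;
        esac
    done < "$input"
}
```

Running split_on_marker bigfile should create bigfile1, bigfile2 and so on. No empty piece is created after the final marker, because a piece is only opened when there is a line to write into it.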
 
feherke,
I happened to compare the run time when using expr and when using let, and I found a huge difference with the following script:

#!/bin/ksh
#Usage $0 <startcount> <quantity> <incrementby>
NUMARGS=$#
if [ ${NUMARGS} -lt 3 ]
then
echo "Usage $0 <startcount> <quantity> <incrementby>"
exit 1
fi

STARTCOUNT=${1}
QUANTITY=${2}
INCREMENTBY=${3}

COUNTER=0
RUNNINGNUMBER=${STARTCOUNT}
while [ ${COUNTER} -lt ${QUANTITY} ]
do
echo ${RUNNINGNUMBER}
let RUNNINGNUMBER=${RUNNINGNUMBER}+${INCREMENTBY}
let COUNTER=${COUNTER}+1
done
exit 0


Output of "time" if I use expr instead of let:
real 0m37.82s
user 0m10.42s
sys 0m28.78s


Output of "time" if I use let:
real 0m0.18s
user 0m0.14s
sys 0m0.03s

This is a huge difference. Thanks for the very valuable tip.
 
Hi

Far less spectacular, but a bit more speed optimization is possible:
Code:
# the original code
master # time { C=0; I=1; R=0; while [ $C -lt 1000000 ]; do let R=$R+$I; let C=$C+1; done }
   30.20s real    30.13s user     0.02s system

# without the dollar ( $ ) sign
master # time { C=0; I=1; R=0; while [ $C -lt 1000000 ]; do let R=R+I; let C=C+1; done }
   28.27s real    28.20s user     0.00s system

# with C syntax
master # time { C=0; I=1; R=0; while ((C<1000000)); do ((R+=I)); ((C++)); done }
   21.53s real    21.32s user     0.02s system
YMMV. I used pdksh 5.2.14.

Feherke.
 
Thanks Feherke. This is very valuable information. I tried the same commands and got the following results (I list only the first and the last, consistent with yours):

time { C=0; I=1; R=0; while [ $C -lt 1000000 ]; do let R=$R+$I; let C=$C+1; done }
real 0m53.28s
user 0m53.22s
sys 0m0.00s

time { C=0; I=1; R=0; while ((C < 1000000)) do (( R+=I)); (( C+=1)); done }
real 0m18.65s
user 0m18.62s
sys 0m0.00s

On my machine the ((C++)) syntax didn't work. I don't know how to check the version of the shell on my machine. At most, I know from the $SHELL variable that it is ksh.
 
Hi

Hmm... All three should work, also with bash. I see only a syntax error: a missing semicolon ( ; ).
Code:
time { C=0; I=1; R=0; while ((C < 1000000)); do  (( R+=I));  (( C+=1)); done }

Feherke.
 