csplit on large file - 1500+ segments

Status
Not open for further replies.

gecko27

IS-IT--Management
May 29, 2003
33
US
I've gotten this script to do what I need, but the csplit command is limited to 99 iterations on AIX/ksh.

I have a large file that is a reprint of a bunch of invoices. I need to split the file into a separate file for each invoice and then rename each one using the invoice number from inside the file.

Here's what I have right now. I know it's not the prettiest...

csplit -s -k -finv -n4 /REPORTS/TEMPINV.TXT /PROF/+2 {99}

for FILE in $(ls inv*)
do
INV=$(head -20 $FILE| grep "[0-9][0-9]/[0-9][0-9]/[0-9][0-9]"|awk '{ print $1 }')
mv $FILE "$INV".txt
done

Unfortunately, AIX/ksh doesn't recognize {*} in csplit, so the max is {99}. Here's an example of the end of one invoice:

*** ORDER COMPLETED *** 0.00
COST 69.76
^M
TOTAL AMT DUE TOTAL PROF PROF %^M
105.60 29.86 29.97^L


And this is the line I'm using to get the invoice number. The first split has this at line 20; from there on it's line 19.

1179738-01 103 02/08/05 160 6380 02/11/05

I have to be able to keep the formatting exactly the same in the output file so it can be reprinted on the preprinted form if necessary. I was trying to read the file line by line and cat/echo/etc. to the output file, but they all seemed to left-justify and remove spaces from each line.
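
On the spacing problem: a plain `read` strips leading blanks, and an unquoted `echo $line` collapses the rest. A minimal sketch of a loop that copies lines unchanged, assuming a POSIX-style ksh. The /tmp path and the sample lines are made up for the demo.

```shell
# Scratch directory and sample data are hypothetical, for illustration only.
tmp=/tmp/readdemo.$$
mkdir -p "$tmp" || exit 1
printf '   1179738-01   103\n      TOTAL   105.60\n' > "$tmp/in.txt"

# IFS= keeps leading/trailing blanks, -r keeps backslashes literal,
# and the quoted "$line" stops the shell from collapsing inner spaces.
while IFS= read -r line
do
    printf '%s\n' "$line"
done < "$tmp/in.txt" > "$tmp/out.txt"

# cmp is silent when the two files match byte for byte.
cmp "$tmp/in.txt" "$tmp/out.txt" && echo "identical"
```

This only shows that the loop itself can preserve layout. For a file with 1500+ invoices a shell read loop will be slow, so it is probably better kept for small jobs.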

Does anyone know of a way to get the same function as csplit w/o a limit?

Thanks
 
Hi,
this sounds like a job for AWK. I stole this from the AWK forum so don't give me credit for this.

You will have to figure out when to switch files if an invoice isn't always exactly 20 lines.

Also, you should be able to make the file name based upon whatever is in the $1 of the first line of the invoice, given your description above.

Code:
BEGIN { rows = 20 }

NR%rows == 1 { makefilename() }
{ print >filename }

function makefilename()
{ if (filename)
    close(filename)
  n = int( NR / rows ) + 1
  filename = "output" sprintf("%04d",n) ".dat"
}
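
If the invoices are not a fixed number of lines, one possible variation is to switch files on a marker instead of on a row count. The sketch below is only that, under these assumptions: every invoice ends with a line containing a form feed (the ^L in the sample above), and the invoice number is the first field that looks like 1179738-01. The file name sample.txt, the scratch directory, and the function name writeout are made up for the demo.

```shell
# Scratch directory and sample data are hypothetical.
work=/tmp/invsplit.$$
mkdir -p "$work" && cd "$work" || exit 1

# Two tiny made-up invoices, each ending in a form feed (\f = ^L).
printf '  HEADER A\n1179738-01  103  02/08/05\n   105.60  29.86  29.97\f\n' >  sample.txt
printf '  HEADER B\n1179739-01  104  02/09/05\n    50.00  10.00  20.00\f\n' >> sample.txt

awk '
# Remember the first invoice number seen in the current invoice.
inv == "" && $1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]$/ { inv = $1 }

# Buffer every line unchanged ($0 keeps the original spacing).
{ buf[++n] = $0 }

# A form feed ends the invoice: write the buffer out and reset.
/\f/ { writeout() }

# Trailing lines with no final form feed still get written.
END { if (n > 0) writeout() }

function writeout(    i, out) {
    # Fall back to a numbered name when no invoice number was found.
    if (inv == "") inv = "unknown" (++missing)
    out = inv ".txt"
    for (i = 1; i <= n; i++) print buf[i] > out
    close(out)
    n = 0; inv = ""
}
' sample.txt

ls
```

This makes one pass with no 99-file limit, and close() keeps awk from running out of open files. It holds one invoice in memory at a time, which should be fine for invoice-sized chunks.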
 
After some other issues came up in the original file, I got this working. I'll post the code so maybe it'll help someone else with the looping fun that I had to get through.

Code:
#!/usr/bin/ksh

#set variables
#################
#invoice number mask
INV_NO="[0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]"

#set # of inv to split
SPLITS=99

#input file
INFILE=TEMPINV.TXT

#say that I'm working
clear
echo "I'm thinking about it."

#change to working directory
cd /tmp/inv

#remove null characters
tr '\000' ' ' < "$INFILE" > OUTFILE
mv OUTFILE "$INFILE"

#find number of invoices in file
COUNT=$(grep -c -e "THIS IS YOUR" < "$INFILE")
echo "$COUNT invoices to print."

#find out how many loops to do
LOOPS=`expr "$COUNT" / "$SPLITS"`
REMAIN=`expr "$COUNT" % "$SPLITS"`
if (( $REMAIN > 0 ))
then
	LOOPS=`expr "$LOOPS" + 1`
fi

# print # of passes to go
echo "pass(es) to go:  \c"

#loop thru file for invoices
while (( $LOOPS > 0 ))
do
	#loop counter
	if (( $LOOPS < 10 ))
	then
		echo  "\b\b $LOOPS\c"
	else
		echo "\b\b$LOOPS\c"
	fi

	#check for number of SPLITS
	if (( $COUNT < $SPLITS ))
	then
		SPLITS=$(grep -c -e "THIS IS YOUR" < "$INFILE")
		#subtract 2 else it tries to read past the end of the file
		SPLITS=$(( SPLITS - 2 ))
	else
		#decrement the counter by the last split
		#add 1 since it starts counting at 0
		COUNT=$(( $COUNT - ($SPLITS + 1) ))
	fi


	#split file after 'TOTAL' line
	csplit -s -k -finv -n3 "$INFILE" /"THIS IS YOUR"/+10 {"$SPLITS"} 
		

	#move last file to INFILE
	if [ -f inv100 ]
	then
		mv inv100 "$INFILE"
	fi


	#read invoice number and rename files
	for FILE in $(ls inv*) 
	do
		INV=`sed -n '19,22p' "$FILE" | grep -e "$INV_NO" | awk '{ print $1 }' | cut -c 1-10`
		mv "$FILE" "$INV".txt       
	done

	#decrement loop counter
	LOOPS=$(( $LOOPS - 1 ))

done

exit 0
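
The pass count in the script above is a ceiling division done with three expr calls. A minimal sketch of the same arithmetic in one step, assuming a ksh that supports $(( )) (the script already uses it elsewhere). The count of 1500 is just an illustrative figure, not taken from the real file.

```shell
# Hypothetical figures: 1500 invoices, 99 splits per csplit pass.
COUNT=1500
SPLITS=99

# Adding SPLITS - 1 before dividing rounds any remainder up to one more pass.
LOOPS=$(( (COUNT + SPLITS - 1) / SPLITS ))

echo "$LOOPS"    # prints 16
```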
 