Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

MODIFICATION of SUMMATION script

Status
Not open for further replies.

ramneim

Programmer
Aug 26, 2012
1
hi guys,

i have a script that basically just sums up the values of 2 particular columns in a file, grouped by the columns specified as well

The script is quite long, but it's basically just repetitive.. just for each condition.

This script only accepts one type of input file.

#!/usr/bin/sh
###############################################################################
# #
# Description: #
# This Shell Script sums the 2 columns as grouped by the the columns #
# specified #
# #
###############################################################################
FILENAME=$1
DELIMITER=$2
FIRST_COL=$3
SECOND_COL=$4
SALESDATE_COL=$5
STOREID=$6
UPC=$7
GTIN=$8
PROMOID=$9
echo ""
echo ".:Summation Tool:."

for FILE in ${FILENAME}
do
gzip -t ${FILE} 2>/dev/null

if [ $? -eq 1 ];
then
comm=cat
else
comm=gzcat
fi

if [ $# -eq 5 ]; then
$comm $FILE | awk -v col=$FIRST_COL -v col2=$SECOND_COL -v col3=$SALESDATE_COL -v sourcefile=$FILE -F "$DELIMITER" '
{
if(NR!=1)
{
if($col3!="" && $col!="" && $col2!="")
{
salesdate[$col3] = $col3
v[$col3] += $col
d[$col3] += $col2
}
}
}
END{
printf("\n%s%s%s%s%s%s\n","sales_date","|","sum(POS_QTY)","|","sum(POS_AMT)|","<source_file>")
for (i in v)
{
if(salesdate!=1)
{
printf("%s%s%d%s%10.4f%s%s\n",salesdate,"|",v,"|",d,"|",sourcefile)
}
}
}'

elif [ $# -eq 6 ]; then
$comm $FILE | awk -v col=$FIRST_COL -v col2=$SECOND_COL -v col3=$SALESDATE_COL -v col4=$STOREID -v sourcefile=$FILE -F "$DELIMITER" '
{
if(NR!=1)
{
if($col3!="" && $col!="" && $col2!="")
{
salesdate[$col3$col4] = $col3 "|" $col4
v[$col3$col4] += $col
d[$col3$col4] += $col2
}
}
}
END{
printf("\n%s%s%s%s%s%s\n","sales_date|cust_id","|","sum(POS_QTY)","|","sum(POS_AMT)|","<source_file> ")
for (i in v)
{
if(salesdate!=1)
{
printf("%s%s%d%s%10.4f%s%s\n",salesdate,"|",v,"|",d,"|",sourcefile)
}
}
}'

.
. (same pattern for [ $# -eq 7 ] up to [ $# -eq 9 ]
.

elif [ $# -gt 10 ]; then
echo "Too many parameters passed. Please pass only the file name, delimiter, the number columns for POS_QTY and POS_AMT, and number of fields to GROUP BY."
elif [ $# -lt 5 ]; then
echo "Not enough parameters passed. Please pass the file name, delimiter, the number columns for POS_QTY and POS_AMT, and number of fields to GROUP BY."
else
echo "No parameters passed. Please pass the file name, delimiter, the number columns for POS_QTY and POS_AMT, and number of fields to GROUP BY."
fi
done


Usage of the script:

sh <script file name> <file or files> <delimiter> <column # to sum1> <column # to sum2> <column #s for grouping>

so basically sample execution would look like this

$ sh getsums.sh ft-gnct-3398-cd-2012-07-07-140112.txt.gz " " 22 23 1 2 4

.:Summation Tool:.

sales_date|cust_id|UPC|sum(POS_QTY)|sum(POS_AMT)|<source_file>
2012-07-05|PL_000000000034014|PL_2003007476012|75| 75.9200|ft-gnct-3398-cd-2012-07-07-140112.txt.gz
2012-07-03|PL_000000000034012|PL_2003000322606|19| 19.5000|ft-gnct-3398-cd-2012-07-07-140112.txt.gz
2012-07-01|PL_000000000034010|PL_2003005081201|38| 38.9800|ft-gnct-3398-cd-2012-07-07-140112.txt.gz


THE MODIFICATION NEEDED:
1. there is an additional input file (2nd file)
- first, there should be a code added to check whether there is a 2ND file
- if there is no 2nd file, just execute the first original script as is
- if there is a 2nd file
- first column needed from this file needs to be
stored to an array to be compared with UPC=$7 which is
coming from the first file
- all those values that match between UPC=$7 and first column
of 2nd file should be "excluded in summation"
(entire line for that match)
- SO, basically, the array from first column of 2nd file needs to
be looped to UPC=$7, to compare each line if there's a match
- those that will match will be stored to a separate new array
- this new array contains the list of values that will be
excluded during summation


if you guys can come up/suggest codes for this, that would be awesome.
i would really appreciate it.
i would also like suggestions, and guide me through the whole process of building it, im quite new to this stuff, so i basically don't know anything,
but i'm willing to learn.

i hope you can all teach me and help me solve this and come up with a
correctly modified script.

Thanks a lot guys! hope to hear from you guys soon.
i really need this script done as soon as possible.

Thanks a lot.
 
The first thing I'd do is try and get rid of that horrible repetitive code; you should be able to handle the additional columns in just one version of the awk script.

Do you have some sample input data?

Also, where do you propose to include the second file name in the command line... I can see the command line syntax becoming very confusing.

Annihilannic
[small]tgmlify - code syntax highlighting for your tek-tips posts[/small]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top