Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Need to automate "incremental" SQL Loader process

Status
Not open for further replies.

jlaw10

Technical User
Jul 28, 2005
54
US
I have a quick task to setup a job to load data directly into database from flat files. I was going to use SQL Loader.

Attached are the two sample data files. 1st priority is with the log.par file. I need to set up a process where both of these get loaded into the database on a continuous basis (also included the two Unix scripts which currently generate these log files). Need to somehow feed the data directly into the database (without modifying the application binaries), that would also be helpful.

Challenge:

+ New records are appended at end every hour
+ Need to load incremental updates to the database without re-entering already existing/duplicate data

Note: There are currently no tables in the database to field data so I must create the table.

### Exist Script #1

#!/bin/ksh

# PATH and LIBRARY variables
LD_LIBRARY_PATH=/usr/lib:/usr/ucblib:
export LD_LIBRARY_PATH

PATH=/sbin:/usr/5bin:/usr/sbin:/usr/ucb:/usr/ccs/bin:/usr/local/bin
export PATH

BINPATH="$HOME/base"
PROCPATH="$HOME/process"
DATAPATH="$HOME/data"
ETMSFILT="etmsfilter"

PATH=$PATH:$BINPATH
export PATH

cd $PROCPATH

#######################
# temp set
BACK_DATE=`/usr/bin/date +m%d%H%M%S`

SDATE=$1
ACCESSNAME=$2
RAWPATH=$3
if [ -z "$1" ] ; then
echo usage $0 date_yyyymmddhh name_list orig_data_path
exit 1
fi
ACCESSNAME=${ACCESSNAME:=$HOME/process/name}
NAME=$ACCESSNAME
RAWPATH=${RAWPATH:=$HOME/orig}

sleep 2
ps -ef | grep $ETMSFILT | grep -v grep > /dev/null
if [ $? -eq 0 ] ; then
ps -ef | grep $ETMSFILT | read f1 pid other
kill $pid
fi

ps -ef | grep $ETMSFILT | grep -v grep > /dev/null
if [ $? -eq 1 ] ; then
$BINPATH/$ETMSFILT -Fdata.pipe -T$SDATE -N9000 -D$DATAPATH -Llog.count >>$NAME 2>log.filterr &
fi

zcat $RAWPATH/orig.${SDATE}* > $PROCPATH/data.pipe
sleep 10

ps -ef | grep $ETMSFILT | grep -v grep > /dev/null
if [ $? -eq 0 ] ; then
ps -ef | grep $ETMSFILT | grep -v grep | read f1 pid other
kill $pid
fi

#### Exist Script #2

pswd=$1
BIN5="$HOME/base"
DATA_PATH="$HOME/process"
ORACLE_SID="etmsb1"
export BIN5 DATA_PATH ORACLE_SID

ORACLE_HOME=/opt/oracle/product/10.2.0
LD_LIBRARY_PATH=/usr/openwin/lib:/opt/SUNWmotif/lib:/usr/lib:/usr/ucblib:$ORACLE_HOME/lib
PATH=/sbin:/usr/5bin:/usr/sbin:/usr/ucb:/usr/ccs/bin:/usr/local/bin:/$ORACLE_HOME/bin
export LD_LIBRARY_PATH ORACLE_HOME PATH

CURRENT=${DATA_PATH}/.current_date
PARSER=etmspar
export CURRENT PARSER

#set -x
cd $DATA_PATH
if [ -f $CURRENT ] ; then
cat $CURRENT
fi
## PAR ##
ps -ef | grep $PARSER | grep -v grep > /dev/null
if [ $? -eq 1 ] ; then
${BIN5}/${PARSER} ${DATA_PATH}/name dummy $ORACLE_SID $pswd $CURRENT 150 >>${DATA_PATH}/log.par 2>${DATA_PATH}/log.parerr &
fi
#set +x
 
You may find it easier to use external tables as you will probably need to load to a staging table anyway.

 
Ok good idea. I think I can use external loader and load data using SQL Loader. Any recommendation on how to avoid loading duplicate data since the new data will just be appended to the flat file?
 
Was thinking of using the minus operation to identify new rows.
 
You could use minus or you could outer join the staging table to the load table e.g.

select decode(t1.id, null, 'N', 'Y') as target_exists,
s1.*
from target_table t1,
source_table s1
where s1.id = ti.id(+)

Another option if you also want to update rows might the MERGE command.

 
Also the source flat file has the following data. What is the best way to add the primary key ID field to the external table in this case? Also, do I need to add a comma to separate the fields?

data.2011082020_15 20110820223605 00:00:19 9008 474.105
data.2011082020_16 20110820223624 00:00:18 8988 499.333
data.2011082020_17 20110820223642 00:00:19 8587 451.947
data.2011082020_18 20110820223701 00:00:18 8038 446.556
data.2011082020_19 20110820223719 00:00:19 8560 450.526
data.2011082020_20 20110820223738 00:00:21 9046 430.762
 
You can't add anything to an external table, it is after all, a file.

You could move the data into a staging table and modify it there.

Regards

T
 
I meant that the external table has not been created yet. So do I need the unique field added to the flat file prior to setting up the external table?
 
When you define the parameters of the external table, you can define virtually any character to delimit your columns. Of course, you will want to use something traditional such as a comma or a <Tab> character.

Since Oracle's support for external tables appeared, I have used that method over direct use of SQL*Loader -- it is so much easier and cleaner.

As far as your protecting against duplicate data, yes, you can do set manipulation (e.g., MINUS) to bring into your staging table just the new rows. Another thought is to use your front-end application to forward only the new records to your staging step that processes the external table flat file.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
“People may forget what you say, but they will never forget how you made them feel.
 
Why do you need to add a unique key to the external table? There must already be some combination of keys in the table which allows you to compare it with the target table or the process won't work. If you just need to add a unique surrogate primary key when populating the target table, you can do this when inserting to it using a sequence.

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top