finding duplicate files


rajeshtektips

Programmer
Nov 7, 2005
Hi,

I am new to shell scripting and urgently need to write a shell script to find out whether an incoming file already exists in a directory.

The requirement is...

Daily we receive files in an Incoming folder. After a file is loaded into the database, it is moved to a Backup folder.

The problem is that sometimes we receive a file which already exists in the Backup folder, but under a different name.

We don't want to process those duplicate files.

I need to write a script to find out whether or not the incoming file already exists in the Backup folder.

Can someone please help me write the script?

Thanks
raj
 
olded,

Okay... now I understand what the UUOC was.
Thank you
( English is not my first language :)...it is French )
 
Hi friends,

Thanks for all your support...

In this code we are comparing the incoming file's size with the sizes of the existing files.

If the incoming file's size is 123 and one of the existing files' sizes is 1234, it still reports the file as a duplicate.

How should I handle this?

Thanks raj
 
It would help if you included the actual code you would like us to fix.

Annihilannic.
 
here is the code..

#!/bin/sh
#--- Set the environment
COMPMAN_TOP=/u61/compman; export COMPMAN_TOP
SCRIPT_DIR=$COMPMAN_TOP/scripts; export SCRIPT_DIR
DATA_DIR=$COMPMAN_TOP/incoming; export DATA_DIR
LOG_DIR=$COMPMAN_TOP/log; export LOG_DIR
BACKUP_DIR=$COMPMAN_TOP/backup; export BACKUP_DIR
#--------------------------------------------------------------
#--- Check for the presence of Duplicate files

cd $BACKUP_DIR
sum * > sumlist
cd $DATA_DIR
NEWSUM=`sum $DATA_DIR/*.dat|awk '{ print $1 }'`
FOUND=`grep $NEWSUM $BACKUP_DIR/sumlist`
if [ "$FOUND" = '' ]
then
# (process this file)
echo 'NewFile'
else
echo 'Duplicate'
fi
 
Replace this:
FOUND=`grep $NEWSUM $BACKUP_DIR/sumlist`
with this:
FOUND=`grep "^$NEWSUM " $BACKUP_DIR/sumlist`
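
For example, with a made-up sumlist entry whose checksum happens to be 1234, the unanchored grep reports a false duplicate for a new file whose checksum is 123:

# sumlist contains:   1234 56 somefile.dat
NEWSUM=123
grep "$NEWSUM" sumlist       # matches "1234 56 somefile.dat" -- false duplicate
grep "^$NEWSUM " sumlist     # no match, 123 is only part of 1234

The leading ^ anchors the match to the start of the line, and the trailing space stops a shorter checksum from matching the front of a longer one.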

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Ah, it's the same code as above... in which case they are not file sizes you are comparing, but checksums.

Also, I refer you back to olded's earlier post explaining that you can't compare the sums of the entire directory in one step; you will have to loop through each individual file.

Changing the grep to grep -w would fix the problem where it matches partial strings.
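
Putting the two together, something along these lines might work (untested sketch, reusing the directory variables from your script and assuming the incoming files are all *.dat):

#!/bin/sh
# Untested sketch: check each incoming .dat file individually against the backup checksums.
COMPMAN_TOP=/u61/compman
DATA_DIR=$COMPMAN_TOP/incoming
BACKUP_DIR=$COMPMAN_TOP/backup

# Build the checksum list of everything already in the backup directory.
( cd $BACKUP_DIR && sum * > sumlist )

for FILE in $DATA_DIR/*.dat
do
    # First field of sum output is the checksum of this one file.
    NEWSUM=`sum "$FILE" | awk '{ print $1 }'`
    # -w matches the checksum as a whole word, so 123 no longer matches 1234.
    if grep -w "$NEWSUM" $BACKUP_DIR/sumlist > /dev/null
    then
        echo "Duplicate: $FILE"
    else
        echo "NewFile: $FILE"
        # (process this file)
    fi
done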

What flavour of Unix are you using?

Annihilannic.
 