finding duplicate files


rajeshtektips

Programmer
Nov 7, 2005
Hi,

I am new to shell scripting and urgently need to write a shell script to find out whether an incoming file already exists in a directory.

The requirement is...

Daily we receive files in an Incoming folder. After a file is loaded into the database, it is moved to a Backup folder.

The problem is that sometimes we receive a file which already exists in the Backup folder, but under a different name.

We don't want to process those duplicate files.

I need to write a script to find out whether or not the incoming file already exists in the Backup folder.

Can someone please help me write the script?

Thanks
raj
 
olded,

Okay... now I understand what the UUOC was.
Thank you
( English is not my first language :)...it is French )
 
Hi friends,

Thanks for all your support...

In this code we are comparing the incoming file's size with the sizes of the existing files.

If the incoming file's size is 123 and one of the existing files' sizes is 1234, it still reports the file as a duplicate.

How should I handle this?

Thanks raj
 
It would help if you included the actual code you would like us to fix.

Annihilannic.
 
here is the code..

#!/bin/sh
#--- Set the environment
COMPMAN_TOP=/u61/compman; export COMPMAN_TOP
SCRIPT_DIR=$COMPMAN_TOP/scripts; export SCRIPT_DIR
DATA_DIR=$COMPMAN_TOP/incoming; export DATA_DIR
LOG_DIR=$COMPMAN_TOP/log; export LOG_DIR
BACKUP_DIR=$COMPMAN_TOP/backup; export BACKUP_DIR
#--------------------------------------------------------------
#--- Check for the presence of Duplicate files

cd $BACKUP_DIR
sum * > sumlist
cd $DATA_DIR
NEWSUM=`sum $DATA_DIR/*.dat|awk '{ print $1 }'`
FOUND=`grep $NEWSUM $BACKUP_DIR/sumlist`
if [ "$FOUND" = '' ]
then
# (process this file)
echo 'NewFile'
else
echo 'Duplicate'
fi
 
Replace this:
FOUND=`grep $NEWSUM $BACKUP_DIR/sumlist`
with this:
FOUND=`grep "^$NEWSUM " $BACKUP_DIR/sumlist`
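
For example, with a made-up sumlist entry whose checksum happens to be 1234, the unanchored grep reports a false duplicate for a new file whose checksum is 123:

# sumlist contains:   1234 56 somefile.dat
NEWSUM=123
grep "$NEWSUM" sumlist       # matches "1234 56 somefile.dat" -- false duplicate
grep "^$NEWSUM " sumlist     # no match, 123 is only part of 1234

The leading ^ anchors the match to the start of the line, and the trailing space stops a shorter checksum from matching the front of a longer one.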

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Ah, it's the same code as above... in which case they are not file sizes you are comparing, but checksums.

Also, I refer you back to olded's earlier post explaining that you can't compare the sums of the entire directory in one step; you will have to loop through each individual file.

Changing the grep to grep -w would fix the problem where it matches partial strings.
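
Putting the two together, something along these lines might work (untested sketch, reusing the directory variables from your script and assuming the incoming files are all *.dat):

#!/bin/sh
# Untested sketch: check each incoming .dat file individually against the backup checksums.
COMPMAN_TOP=/u61/compman
DATA_DIR=$COMPMAN_TOP/incoming
BACKUP_DIR=$COMPMAN_TOP/backup

# Build the checksum list of everything already in the backup directory.
( cd $BACKUP_DIR && sum * > sumlist )

for FILE in $DATA_DIR/*.dat
do
    # First field of sum output is the checksum of this one file.
    NEWSUM=`sum "$FILE" | awk '{ print $1 }'`
    # -w matches the checksum as a whole word, so 123 no longer matches 1234.
    if grep -w "$NEWSUM" $BACKUP_DIR/sumlist > /dev/null
    then
        echo "Duplicate: $FILE"
    else
        echo "NewFile: $FILE"
        # (process this file)
    fi
done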

What flavour of Unix are you using?

Annihilannic.
 