A fun script I need thoughts/ideas about writing...

Status
Not open for further replies.

mountainbiker

Programmer
Aug 21, 2002
122
GB
I've got a Unix web file system with LOADS of duplicate files. Don't ask why :) These specific files have a unique filename format, e.g.,

/dir1/dir2/.../dirX/2002/08/24/20020824-0800-restOfFilename.

I would like to make ONE copy of the file to a new unique location, e.g.,

/dir1/dir2/2002/08/24/20020824-0800-restOfFilename.

At all the other locations I would like to delete the physical file and replace it with a Unix symbolic link (ln -s) to the new file. For example, I might have 6 locations with symbolic links to one location instead of 6 physical files. (Thus, freeing disk space and keeping web page hyperlinks functioning without problem.)

The web file system has almost 800MB of files. Each file may be duplicated up to about 10 times. There will be filenames that are not in the unique format (e.g., index.html, homepage.html, logo.gif) which can be ignored (at least for now, unless the contents, creation date, etc. are the same).
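
A minimal sketch of the "ignore everything that is not in the unique format" rule. The function name is_dated_name is hypothetical, and the pattern (eight digits, a dash, four digits, a dash, then the rest) is an assumption read off the single example above; adjust it if real names differ.

```shell
#!/bin/sh
# Succeed when a file name looks like YYYYMMDD-HHMM-rest.
# The exact pattern is an assumption based on the example in the post.
is_dated_name() {
    case "$1" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]-?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Names such as index.html fall through to the "ignore" branch.
for name in 20020824-0800-restOfFilename index.html logo.gif
do
    if is_dated_name "$name"
    then echo "process $name"
    else echo "ignore  $name"
    fi
done
```

Using a case pattern keeps the check inside the shell, so no extra process is started for each of the many files.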

Thoughts/ideas/help?
 
Mountainbiker:

How do you know which object should be a link, and which should be the file?

Regards,

Ed
 
I need all the physical files under dirX/... to be replaced with symbolic links pointing to the physical file under /dir1/dir2/... The latter file has to be created from one of the instances under dirX/... before that instance is replaced with a symbolic link.

For example,
STEP 1:
The physical files (which are all the same) are located at
/data/articles/subject/a/2002/08/25/20020825-1200-Filename1
/data/articles/subject/b/2002/08/25/20020825-1200-Filename1
/data/articles/subject/v/2002/08/25/20020825-1200-Filename1
/data/articles/subject/x/2002/08/25/20020825-1200-Filename1
/data/articles/subject/z/2002/08/25/20020825-1200-Filename1

STEP 2:
The (first occurrence of the) physical file
/data/articles/subject/a/2002/08/25/20020825-1200-Filename1
gets copied to
/data/articles/2002/08/25/20020825-1200-Filename1

STEP 3:
The physical files in step 1 are replaced with symbolic links to /data/articles/2002/08/25/20020825-1200-Filename1
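
The shared location in step 2 can be derived from the file name alone, since the name starts with the date. A rough sketch, assuming every processed name begins with YYYYMMDD followed by a dash; the function name shared_path is made up for illustration.

```shell
#!/bin/sh
# Build the shared location for a dated file name.
# $1 = root directory, $2 = file name or full path.
# Assumes the name starts with YYYYMMDD-, as in the examples above.
shared_path() {
    root=$1
    name=${2##*/}       # strip any leading directories
    ymd=${name%%-*}     # 20020825
    year=${ymd%????}    # 2002
    rest=${ymd#????}    # 0825
    month=${rest%??}    # 08
    day=${rest#??}      # 25
    echo "$root/$year/$month/$day/$name"
}

shared_path /data/articles /data/articles/subject/a/2002/08/25/20020825-1200-Filename1
# prints /data/articles/2002/08/25/20020825-1200-Filename1
```

Taking the date from the name rather than from the directory means the subject directory (a, b, v, ...) never has to be parsed.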

 
# Hi:

# Obviously, not tested. This is what I've interpreted from
# your post. Use at your own risk, and please check thoroughly:

#!/bin/ksh

destdir=/data/articles/2002/08/25
# STEP 1:
# find all the copies of the file
find /data/articles -type f -print |
while IFS= read -r file
do
# get the file name eliminating the path
f_name=$(basename "$file")
destfile=$destdir/$f_name
# skip the shared copy itself, or it would become a link to itself
[ "$file" = "$destfile" ] && continue
# STEP 2:
if [ ! -r "$destfile" ]
then # do the initial copy if file doesn't exist
cp "$file" "$destfile" || continue
fi
# STEP 3:
rm "$file" # remove the original file
ln -s "$destfile" "$file" # and create the link
done
 
The above script looks conceptually correct. It could be optimised by moving the file (mv) rather than copying the file (cp).

Mountainbiker,
Are the other files (index.html etc.) in a different directory structure? According to the above script, if you have two index.html files in two different subdirectories you will end up losing one of them.


amit
crazy_indian@lycos.com

to bug is human to debug divine
 
Amit:

Incidentally, the cp command in my script above must not be commented out; without the copy there is nothing for the links to point at.

>It could be optimised by moving the file (mv) rather than copying the file (cp)

A reasonable optimisation provided you don't have any problem mv'ing across filesystems (where mv has to copy and delete rather than just rename). You'd change the code thusly:

# STEP 2:
if [ ! -r "$destfile" ]
then
# first occurrence: move it to the shared location
mv "$file" "$destfile" || continue
else
# STEP 3:
rm "$file" # remove the original file
fi
ln -s "$destfile" "$file" # and create the link
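
If the cross-filesystem case is a worry, it can be checked up front. The sketch below compares the mount point that POSIX "df -P" reports for two existing paths. The helper name same_fs is hypothetical, and the column parsing assumes neither the device name nor the mount point contains spaces.

```shell
#!/bin/sh
# Succeed when both existing paths are reported under the same mount
# point by "df -P" (sixth column of the second output line).
# Assumption: device names and mount points contain no spaces.
same_fs() {
    a=$(df -P "$1" | awk 'NR==2 {print $6}')
    b=$(df -P "$2" | awk 'NR==2 {print $6}')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Compare a source directory with the destination directory.
if same_fs . .
then echo "same filesystem: mv is a plain rename"
else echo "different filesystems: mv will copy, then delete"
fi
```

In the loop this would be called with the directory of the file and the destination directory, since the destination file does not exist yet when the decision is made.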

>According to the above script if you have two index.html in two different
> subdirectories you will end up losing one of the files.

I'd say your observation is correct.
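
One possible guard against that loss, sketched under the assumption that a byte-for-byte comparison with cmp is an acceptable test of "the same file". The function name link_if_identical and the scratch directory layout are made up for the demonstration.

```shell
#!/bin/sh
# Replace "$1" with a symbolic link to "$2" only when both files have
# identical contents; otherwise leave "$1" alone and report it.
link_if_identical() {
    file=$1
    master=$2
    [ "$file" = "$master" ] && return 0   # never link the shared copy to itself
    if cmp -s "$file" "$master"
    then
        rm "$file" && ln -s "$master" "$file"
    else
        echo "kept (differs from $master): $file" >&2
        return 1
    fi
}

# Demonstration in a scratch directory.
tmp=$(mktemp -d)
mkdir "$tmp/a" "$tmp/b" "$tmp/shared"
echo "same"  > "$tmp/shared/index.html"
echo "same"  > "$tmp/a/index.html"
echo "other" > "$tmp/b/index.html"
link_if_identical "$tmp/a/index.html" "$tmp/shared/index.html" || :   # becomes a link
link_if_identical "$tmp/b/index.html" "$tmp/shared/index.html" || :   # stays a file
ls -l "$tmp/a" "$tmp/b"
```

In the script, this would take the place of the unconditional rm and ln -s, so two different index.html files never collapse into one.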

Regards,


Ed
 