Finding 5000 oldest files from a directory

Status
Not open for further replies.

balajipn

Programmer
Mar 30, 2004
65
IN
Hi,
I have a directory /logo/jpg/CustArtRepository that contains around a million files. My requirement is to find the oldest 5000 files in this directory and its subdirectories and move them to an archive directory. I am using the following command.

find /logo/jpg/CustArtRepository -type f -name "*" -exec ls -ltr {} \; 2>/dev/null

However, the above command takes a very long time since there are nearly 10000 subdirectories under it. I would like the search to end as soon as it has found the first 5000 files.

Is this possible? Can someone help me?


Thanks,
Balaji.

 
Maybe something like this is what you're looking for?

To list files/folders by ctime, oldest first:
Code:
ls -1cr
List only the first 5000 (the oldest):
Code:
ls -1cr | head -n5000
Pipe it to awk to prepend rm -rf to each name:
Code:
ls -1cr | head -n5000 | awk '{ print "rm -rf",$0 }'
Check the output, and if it looks OK, then pipe it to the shell:
Code:
ls -1cr | head -n5000 | awk '{ print "rm -rf",$0 }' | sh

BTW, I take no responsibility for lost files ;-)

HTH
 
You might not want to do "rm -rf". The "ls -1cr" will also list directory names, so the "rm -rf" will delete entire directory trees, not just individual files.

This might do it...
Code:
ls -1cr | head -n5000 | awk '{ print "rm -f",$0 }' | sh
It will fail on directories, but that might be what you want.


 
This would be far more efficient due to fewer invocations of rm:

Code:
ls -1cr | head -n5000 | xargs rm -f

Annihilannic.
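
For anyone curious about the difference in invocation count, here is a small sketch. It puts echo in front of the real command, so nothing is deleted. count_invocations is a made-up helper name, not something from the posts above.

```shell
# count_invocations: reads file names on stdin (one per line) and reports
# how many times xargs would have started "rm -f".
# "echo" stands in for the real command, so this is a dry run only.
count_invocations() {
    # each echo invocation prints exactly one line, so counting lines
    # counts invocations
    xargs echo rm -f | wc -l | tr -d ' '
}

# Example: printf '%s\n' a b c | count_invocations
```

With the awk-to-sh approach each name becomes its own rm command, whereas xargs packs as many names as fit onto one command line.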
 
Hi,

Thanks for your reply.

However, the directory in question (/logo/jpg/CustArtRepository) has nearly 10000 subdirectories in it. The files are not directly under /logo/jpg/CustArtRepository.

Thanks,
Balaji.
 
OK, maybe this then:
Code:
ls -lR | grep '^-' | sort -k 6,7 | head -n5000 | awk '{ print $NF }' | xargs rm -f
ls -lR lists all files and folders recursively.
grep keeps only regular files (lines starting with a -).
sort sorts the output on the date and time fields.
head keeps the first 5000 lines after sorting.
awk prints the last field (the filename), and finally:
xargs does the actual deleting.

 
sort doesn't automatically know the collating sequence of dates and times, does it?

Annihilannic.
 
I've got a script to delete files based on date and time, unfortunately my ftp server is down at the moment. As soon as I get access I'll post the script.

Mike
 
Let's not forget that his original post asked that the files be MOVED, not RE-MOVED.
Slightly different end result.
 
Hi,

ls -lR | grep '^-' | sort -k 6,7 | head -n5000 | awk '{ print $NF }'

The above command did not work as my ls -lR output format is different.

See my sample output.

-rw-rw-r-- 1 logoview samba 452134 Apr 02 2007 0728935_CA002_300.jpg

My AIX date format is MMM DD CCYY.

Also, as motoslide mentioned, my intention is not to REMOVE the files. I just want to move them to an archive directory.

Thanks,
Balaji.
 
Sorry for mistaking the "move" bit for remove.
If AIX supports it then ls could sort the files...
Code:
ls -lctR | grep '^-' | tail -n5000

HTH
 
The sort would work if your version of ls outputs the time as 'yyyy-mm-dd HH:MM', although many (like the OP's) output it as 'mon d [HH:MM| yyyy]' (time is displayed if the file is < 6 months old, otherwise year is displayed). Of course, even then you still have to have the full output of the ls command before you can sort everything.

If you want something dependable that runs fast, you might want to code something up in C.

The program would keep an internal list of 5000 files. As it reads each file, it would check whether the file is newer than the newest in the list. If not, it inserts the file (in order) and pops off the newest (the first 5000 files you hit would all go into the list to initialize it). Once you're done, go through your list and move the files. If the archive directory is on the same filesystem, this is as easy as link() and unlink(); if not, you essentially have to copy and delete. Or you can just output the filenames and pipe the output to "xargs mv".

This brings up another question, though. You have several subdirectories under the main one. Do you want to maintain the directory tree under the archive directory or just put all the files in one directory?
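
The bounded-list idea does not strictly need C. Here is a rough sketch in shell and awk, under some assumptions: the input lines look like "epoch-seconds path" (how you produce them depends on your system), keep_oldest is a hypothetical name, and it replaces the newest kept entry and rescans rather than doing an ordered insert, so it is a simplified variant of the approach described above.

```shell
# keep_oldest N: reads "epoch path" lines on stdin and prints the N oldest,
# oldest first. Only N entries are held in memory at any time.
# (Hypothetical helper; the input format is an assumption.)
keep_oldest() {
    awk -v n="$1" '
    {
        t = $1 + 0
        path = substr($0, index($0, " ") + 1)   # everything after first space
        if (count < n) {
            # still initialising: the first N files all go into the list
            count++
            times[count] = t
            paths[count] = path
            if (count == 1 || t > times[newest]) newest = count
        } else if (t < times[newest]) {
            # older than the newest kept entry: overwrite that entry
            times[newest] = t
            paths[newest] = path
            # rescan the list to find which entry is now the newest
            newest = 1
            for (i = 2; i <= n; i++)
                if (times[i] > times[newest]) newest = i
        }
    }
    END { for (i = 1; i <= count; i++) print times[i], paths[i] }
    ' | sort -n
}
```

The rescan costs up to N comparisons per replacement, which should be tolerable for N = 5000 but is slower than keeping the list ordered.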
 
Having ls do the sort won't work. It only does it on a per-directory basis, not the entire recursive listing.
 
I want to maintain the directory structure under the archive directory.

Thanks,
Balaji.
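
If the tree has to be kept, something along these lines might do once you have a list of paths. It is only a sketch: archive_files and both directory arguments are placeholder names, it assumes one path per line with no newlines inside file names, and it has not been tried on AIX.

```shell
# archive_files SRC_ROOT DEST_ROOT: reads one path per line on stdin and
# moves each file to the same relative location under DEST_ROOT.
# (Hypothetical helper; paths outside SRC_ROOT are skipped.)
archive_files() {
    src_root=$1
    dest_root=$2
    while IFS= read -r path; do
        case $path in
            "$src_root"/*) ;;      # inside the source tree: carry on
            *) continue ;;         # anything else is ignored
        esac
        rel=${path#"$src_root"/}   # path relative to the source root
        case $rel in
            */*) mkdir -p "$dest_root/${rel%/*}" ;;   # recreate the subdirectory
            *)   mkdir -p "$dest_root" ;;
        esac
        mv "$path" "$dest_root/$rel"
    done
}

# Example (the archive path here is made up):
#   archive_files /logo/jpg/CustArtRepository /some/archive/dir < /tmp/oldest.txt
```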
 
Try this:

Code:
find /logo/jpg/CustArtRepository -type f | xargs ls -l | awk '$8 !~ /:/' | sort -r -k 8,8 -k 6,6M -k 7,7 | tail -5000

sort is actually aware of how to sort months, but you have to tell it specifically when to use it.

It's still going to take a while because it needs to get the modification dates of all the files to perform the sort.

Note that it ignores any files with a : in the time/year field, presuming that they are too recent, and they would make sorting more complicated.

Annihilannic.
 
If speed is the issue, and if older files don't "appear" suddenly, you might want to think of building a database of times and paths via a cron job or something similar. This can be a recursive C program, with logic much like the guts of "cp -R" or "chown -R". I would use the simple mktime-style timestamp, which is a seconds offset from Jan 1 1970 GMT, so the format of the sort is not an issue.

Then, when you need to, you can pare off the first 5000 whose mod date has not changed and move those. If you run out, rebuild the database.

I know of at least one HSM that uses this exact method to pick files to migrate.

eugene
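
A shell-only sketch of the index idea, for what it's worth. It assumes GNU find, because -printf with %T@ (mtime in epoch seconds) is a GNU extension and the stock AIX find may well not have it. build_index and oldest_from_index are invented names, and the check that a file's mod date has not changed since indexing is left out.

```shell
# build_index DIR INDEX_FILE: write "epoch path" lines, oldest first.
# Assumes GNU find (-printf is not available in every find).
# Keep INDEX_FILE outside DIR so the index does not list itself.
build_index() {
    find "$1" -type f -printf '%T@ %p\n' | sort -n > "$2"
}

# oldest_from_index INDEX_FILE COUNT: print the paths of the COUNT oldest
# entries, dropping the timestamp column.
oldest_from_index() {
    head -n "$2" "$1" | cut -d' ' -f2-
}

# Example: run build_index from cron, then when needed:
#   oldest_from_index /tmp/art.idx 5000
```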
 
Hi,

I tried the following command on my AIX system.

find /logo/jpg/CustArtRepository -type f -mtime +365 -name "*.jpg" | xargs ls -l | awk '$8 !~ /:/' | sort -r -k 8,8 -k 6,6M -k 7,7 > /tmp/jpg.txt1

Unfortunately, I received the following error message.
============================================================
Usage: sort [-Abcdfimnru] [-T Directory] [-t Character] [-o File][-y[Kilobytes]] [-z Recordsize] [-k Keydefinition]... [[+Position1][-Position2]]... [File]...

xargs: 0402-057 The ls command was not found or could not be run.
============================================================

It looks like the sort on our system does not support the M modifier in "-k 6,6M".

I really don't want to code a C program for this, as it complicates things (approval, promotion process, etc.) in our organization.

In the worst case scenario, I am going to do this.

1. Run the following command.
find /logo/jpg/CustArtRepository -type f -mtime +365 -name "*.jpg" | xargs ls -l > /tmp/jpg1.txt
2. Grep for " 1990 " through " 2004 " to pick out the files by year and archive them.

I know this is not a clean way, but it is a safe way of achieving what I want.

Thanks,
Balaji.


 
If that's basically what you want to achieve, why not:

Code:
$ touch -t 200501010000 /tmp/2005
$ find /logo/jpg/CustArtRepository -type f ! -newer /tmp/2005 -name "*.jpg" | xargs ls -l > /tmp/jpg1.txt
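
The reference-file trick above can be wrapped up roughly like this. It is a sketch only: files_not_newer_than is a made-up name, and the temporary file location is an assumption.

```shell
# files_not_newer_than DIR STAMP: list regular files under DIR whose
# modification time is not later than STAMP.
# STAMP uses the touch -t format CCYYMMDDhhmm, e.g. 200501010000.
files_not_newer_than() {
    ref=${TMPDIR:-/tmp}/cutoff.$$      # throwaway reference file
    touch -t "$2" "$ref" || return 1   # give it the cutoff time
    find "$1" -type f ! -newer "$ref"
    rm -f "$ref"
}

# Example:
#   files_not_newer_than /logo/jpg/CustArtRepository 200501010000
```

Because find compares timestamps directly, this avoids parsing ls output altogether, though it selects by cutoff date rather than by a fixed count.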

Alternatively you could use awk to replace the months with sortable values, e.g.

Code:
find /logo/jpg/CustArtRepository -type f | xargs /usr/bin/ls -l | awk '
        BEGIN {
                m["Jan"]=1
                m["Feb"]=2
                m["Mar"]=3
                m["Apr"]=4
                m["May"]=5
                m["Jun"]=6
                m["Jul"]=7
                m["Aug"]=8
                m["Sep"]=9
                m["Oct"]=10
                m["Nov"]=11
                m["Dec"]=12
        }
        $8 !~ /:/ { $6=m[$6]; print }
' | sort -rn -k 8,8 -k 6,6 -k 7,7 | tail -5000

Incidentally, there was a potential error in my previous solution: I've added an 'n' to ensure the days of the month are sorted numerically. Adding /usr/bin should also help xargs find ls (even on AIX! :) )

Code:
find /logo/jpg/CustArtRepository -type f | xargs /usr/bin/ls -l | awk '$8 !~ /:/' | sort -rn -k 8,8 -k 6,6 -k 7,7M | tail -5000

Annihilannic.
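
A possible variation on the awk month mapping, offered as a sketch: build a single YYYYMMDD key so that a plain sort is enough and only the key and name are printed. ls_to_sortable is a hypothetical name. It assumes the ls -l layout shown earlier in the thread (month, day, year in fields 6 to 8) and English month abbreviations, and runs of spaces inside a file name would be collapsed to one.

```shell
# ls_to_sortable: reads "ls -l" lines on stdin and prints "YYYYMMDD name"
# for entries whose 8th field is a year (entries showing a time are skipped,
# as in the pipeline above).
ls_to_sortable() {
    awk '
    BEGIN {
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= 12; i++) m[names[i]] = i
    }
    $8 !~ /:/ {
        key = sprintf("%04d%02d%02d", $8, m[$6], $7)
        # rebuild the name from field 9 onward so embedded spaces survive
        name = $9
        for (i = 10; i <= NF; i++) name = name " " $i
        print key, name
    }'
}

# Example:
#   find /logo/jpg/CustArtRepository -type f | xargs /usr/bin/ls -l |
#       ls_to_sortable | sort | head -n 5000
```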
 
Can the locate database be accessed in order to query it for the oldest files within the directory structure?
 
Hi Annihilannic,

Your awk solution worked like a charm. A big thanks for your help on this.

Thanks,
Balaji.
 