Counting files by date - filling unused dates with 0's

Status
Not open for further replies.

Me4President

Technical User
Sep 30, 2009
7
NL
Hello,

I am trying to count the number of files created per day with a given extension. I am searching a folder recursively.

What I have at the moment counts the number of files per day, but does not include those days where there are no files.

Code:
#!/usr/bin/env python

import os, glob, time
print '-'*60
# root + ext only matches files one folder level below the search folder
root = raw_input("Folder to search:\n") + '/*'
#print root
ext = '/*.' + raw_input("File type to filter:\n")
file_out = open(raw_input('Fileout name : \n'),"w")
t_start = time.time()
print '-'*26 + 'Running' + '-'*27 # approx 20 seconds / 1000 files
date_file_list = []
for folder in glob.glob(root):
    for file in glob.glob(folder + ext):
        stats = os.stat(file)
        # stats[8] is st_mtime, the last-modification time
        date_file_tuple = time.localtime(stats[8]), file
        date_file_list.append(date_file_tuple)
        # daylist is rebuilt from scratch for every file found
        daylist = []
        for file in date_file_list:
            days = time.strftime("%d/%m/%y", file[0])
            daylist += [days]
d = {}
from sets import Set
# Only days that have at least one file appear in the output
for i in Set(daylist):
    d[i] = daylist.count(i)
    file_out.write('%s \t%s \n' % (i,d[i]))
print 'Total no. of %s files = %d\n%.2f seconds runtime\n ' % (ext, len(daylist),time.time()-t_start)
print '='*60

It is probably not as efficient as it should be - any tips on how to improve the loops would be very helpful :D

I import the output file into Excel to create a chart of the number of files created per day.

Does anyone know a good way of including those days where there were no files created?

Any help greatly appreciated!
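
On the loop question, here is a minimal sketch of one possible tidy-up, not the only way to do it. It assumes os.walk for the recursive search and collections.Counter for the per-day tally, so each file is visited once and no list is rebuilt inside the loop. The function name count_files_by_day and its parameters pattern and timestamp_of are made up for illustration. The default timestamp is the modification time, which is what stats[8] returns in the script above; pass os.path.getctime instead if that suits your platform better. It is written to run on Python 2.7 or 3.

```python
import collections
import datetime
import fnmatch
import os


def count_files_by_day(root, pattern="*.txt", timestamp_of=os.path.getmtime):
    """Count files under root (searched recursively) per calendar day.

    pattern is a shell-style filename pattern such as "*.txt".
    timestamp_of is any function mapping a path to a POSIX timestamp.
    """
    counts = collections.Counter()
    # os.walk visits root and every folder below it
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            # Convert the timestamp to a local calendar date
            day = datetime.date.fromtimestamp(timestamp_of(path))
            counts[day] += 1
    return counts
```

The result maps datetime.date objects to counts, so iterating over sorted(counts.items()) gives the days in date order. Days without files are simply absent from the result at this stage.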
 
Here's how I would go about it:
I'd glob (as you do) a list of the files.
Then I'd use getctime to build a second list holding the creation time of each of those files.
Then I'd make a dictionary of {ctime: filename} pairs.
Then I'd sort the list of ctimes.
Code:
import glob, os, time
startd='e:/python/test/'
dlst=glob.glob(startd+'*.*')
tlst=[os.path.getctime(f) for f in dlst]
d=dict(zip(tlst,dlst))
tlst.sort()
day0=time.strftime('%d',time.localtime(tlst[0]))
dayf=time.strftime('%d',time.localtime(tlst[-1]))
Now I'd create another list of the days I wanted. The simplest way is to work in POSIX time (seconds since the epoch), as returned by getctime, remembering that 1 day = 86400 seconds.
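
A rough sketch of that last step, assuming the first and last timestamps come from the sorted list (tlst[0] and tlst[-1]). The helper name day_labels and its parameters to_struct and fmt are hypothetical. It steps forward 86400 seconds at a time and formats each step as a date string. One caveat: in local time a day is not always 86400 seconds long, so around a daylight-saving change this kind of stepping can skip a date; arithmetic on datetime.date objects avoids that.

```python
import time

SECONDS_PER_DAY = 86400  # 1 day = 86400 seconds


def day_labels(first_ts, last_ts, to_struct=time.localtime, fmt="%d/%m/%y"):
    """Return one formatted label per day from first_ts to last_ts inclusive."""
    labels = []
    ts = first_ts
    while True:
        label = time.strftime(fmt, to_struct(ts))
        # Skip a label that repeats the previous one
        if not labels or labels[-1] != label:
            labels.append(label)
        if ts >= last_ts:
            break
        # Step one day, but never past the last timestamp, so that
        # the final day is always included
        ts = min(ts + SECONDS_PER_DAY, last_ts)
    return labels
```

Calling day_labels(tlst[0], tlst[-1]) would then give the full list of days, including those on which no file was created.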

_________________
Bob Rashkin
 
Thanks a lot for your help, it has taught me a lot! By the way, I am also a geophysicist, with just over a year of experience.

I have got the script doing what I want, using your method to create a list. I didn't need to know each file, just the frequency per day, so I just made a list of the dates.

Code:
#!/usr/bin/env python

import glob, os, datetime
from datetime import timedelta

startd='C:\\Computer\\MyPython\\'
file_out = open(raw_input('Fileout name : \n'),"w")

flst=glob.glob(startd+'*.*')
# One date per file, taken from os.path.getctime
dlst=[datetime.date.fromtimestamp(os.path.getctime(f)) for f in flst]
dlst.sort()
start_date=dlst[0]
end_date=dlst[-1]
print start_date, end_date
# Every calendar day from start_date to end_date; the +1 keeps end_date in the list
daylst=[start_date+timedelta(n) for n in range((end_date - start_date).days + 1)]
print 'start=%s, end=%s' %(daylst[0], daylst[-1])
d={}
for i in set(daylst):
    # count() returns 0 for days on which no file was created
    d[i]=dlst.count(i)
    print i, d[i]
    file_out.write('%s \t%d \n' % (i,d[i]))
file_out.close()

Many thanks!
Now I can't decide whether to sort the list by date in the script or simply in Excel, where I will make a histogram of it...
Is there an easy way of sorting the daylst by date?

 
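daylst.sort() will sort the list in place. Note, though, that your loop iterates over set(daylst), and a set has no guaranteed order, so loop over the sorted list itself (for i in daylst:) to write the days out in date order.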
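
Sorting and zero-filling can also be handled together. The sketch below assumes the per-day counts are already in a dictionary keyed by datetime.date; the function name fill_missing_days is hypothetical. It walks from the earliest to the latest date and looks each day up with a default of 0, so the result comes out in date order with the empty days included.

```python
import datetime


def fill_missing_days(counts):
    """Return (date, count) pairs for every day between the earliest and
    latest key of counts, in date order, with 0 for days not present."""
    if not counts:
        return []
    first, last = min(counts), max(counts)
    filled = []
    # +1 so that the last day itself is included
    for n in range((last - first).days + 1):
        day = first + datetime.timedelta(days=n)
        filled.append((day, counts.get(day, 0)))
    return filled
```

Each pair can then be written out as one tab-separated line, ready to chart in Excel without any further sorting.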

_________________
Bob Rashkin
 