
Filer class has a bug or I'm missing something

Status
Not open for further replies.

dkean4

Feb 15, 2015
I have been massaging this code all weekend and need some help.

Code:
oFiler = createobject('Filer.FileUtil')
oFiler.SearchPath = "F:\"
oFiler.FileExpression = "*.*"
oFiler.SubFolder = 0
oFiler.Find(0)

for ix = 1 to oFiler.Files.Count
  ? oFiler.Files(m.ix).Path, oFiler.Files(m.ix).Name
endfor

Works perfectly! And it is quick. But it only searches the directory specified in oFiler.SearchPath, in this case the root of drive "F:\". It stays in that one folder because oFiler.SubFolder is set to 0. Works flawlessly.

So, let's change up oFiler.Subfolder to "1" and try it again:

Code:
oFiler = createobject('Filer.FileUtil')
oFiler.SearchPath = "F:\"
oFiler.FileExpression = "*.*"
oFiler.SubFolder = 1
oFiler.Find(0)

for ix = 1 to oFiler.Files.Count
  ? oFiler.Files(m.ix).Path, oFiler.Files(m.ix).Name
endfor

Well, now when I run it (I tried it on two different Windows 7 machines), it appears to lock up. But it is not locked up. It waits for the user to press ESC and then runs the FOR loop, and the loop starts as soon as the key is pressed. The speed of processing the entire list is impressive. For me that is a winner! I could not ask for more.

The only problem is: why does it turn the cursor into a 'Processing' icon and require the user to press the ESCAPE key? What is it doing? What am I missing? Is there another switch that needs to be set so that it continues when the SubFolder switch is set to 1? I am baffled! Many people refer to this short script on the web, and I assume it must work for them.

TIA

Dennis


 
Thank you for pioneering with this form of recursion.
I starred your routines because I hope I can now replace my (extensive) ADIR() recursions with your excellent routines. Based on your results (so far), Dennis, Olaf, Mike, Atlopes, may I ask you:

1) Do you now recommend replacing ADIR() with Filer ... for recursing windows sub-directories (to conditionally 'get' files)?
2) Can you possibly estimate 'about' how much faster filer might be (vs. Adir()) for recursion (per se or specifically) ... considering your own computer specs.

Utmost thanks and blessings, in advance. Utmost apologies if I am out-of-order.
Philip
 
Philip,

I haven't worked on this as extensively as Dennis, Olaf or Atlopes. I can only give you the result of my own very limited testing.

The Filer code, as shown in the above posts, took longer for me than my own recursive ADIR() code. But it wasn't a like-for-like comparison. In my ADIR() testing, I was searching for a specific file, one file among about 100,000. Once the file was found, I displayed its name and path, and then stopped.

With the Filer code, we are getting the file names and paths of every file, putting them in a collection, and then looping through the collection, displaying the name of each of the files. The actual process of displaying the names probably takes much longer than getting the names in the first place. To do a fair test, it would be better to loop through the collection, displaying only the path and name of the target file, and then stopping.

In other words, if you are trying to decide which method to use, you really need to do your own tests. I think it's likely that the Filer solution would be faster, but you need to test that for yourself.
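A minimal sketch of the like-for-like timing described above, assuming only the Filer.FileUtil members already used in this thread (SearchPath, FileExpression, SubFolder, Find, Files). The target file name is hypothetical. Only the Find() call is timed, and the loop stops at the first match instead of printing every file:

```foxpro
LOCAL loFiler, loItem, lnStart, lnFindSecs, lcTarget
lcTarget = "invoice_2015.dbf"              && hypothetical file name, replace with your own
loFiler = CREATEOBJECT("Filer.FileUtil")
loFiler.SearchPath = GETDIR()
loFiler.FileExpression = "*.*"
loFiler.SubFolder = 1                       && may wait for ESC on large trees, as reported in this thread

lnStart = SECONDS()
loFiler.Find(0)
lnFindSecs = SECONDS() - lnStart            && time for the search only, no display

? "Find() took", lnFindSecs, "seconds for", loFiler.Files.Count, "items"
FOR EACH loItem IN loFiler.Files
	IF UPPER(loItem.Name) == UPPER(lcTarget)
		? loItem.Path + loItem.Name         && show the target only, then stop
		EXIT
	ENDIF
ENDFOR
```

Run it right after a reboot as well as on a warm machine, since file system caching can change the numbers a lot.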

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

 
Mike, you could also let Filer look for a single file instead of *.*. The comparison still wouldn't be fair, since Filer will not stop at the first file with that name. There may be an option to do so, though, similar to the option to recurse or not recurse. Displaying all the files should obviously not count towards the time; you're only interested in how long it takes to get a result.

I prefer convention over configuration over needing to search for a file, but that's about a better conception of an application rather than the core question of which search would be faster, if you need one.

I can do the test later. I'd guess Filer is faster, and I know SYS(2000) is faster than ADIR(); I think we had a discussion about these already. I don't know where Scripting.FileSystemObject would rank among these four options. The indexing service could answer almost immediately if it indexed all drives and folders, which it doesn't, even if users don't turn that service off.
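For anyone wanting a baseline to time Filer against, here is a minimal sketch of a recursive ADIR() count. CountFilesAdir is a hypothetical helper name and "C:\Temp" a hypothetical start folder; neither comes from this thread. Note that ADIR() called this way skips hidden and system items, so its count may differ from what Filer reports.

```foxpro
LOCAL lnStart, lnFound
lnStart = SECONDS()
lnFound = CountFilesAdir("C:\Temp", "*.bmp")     && hypothetical start folder
? lnFound, "files in", SECONDS() - lnStart, "seconds"

* Hypothetical helper: counts files matching tcMask in tcPath and all folders below it
FUNCTION CountFilesAdir
	LPARAMETERS tcPath, tcMask
	LOCAL ARRAY laFiles[1], laDirs[1]
	LOCAL lnTotal, lnDirs, lnI, lcName
	tcPath = ADDBS(m.tcPath)
	* Without the "D" flag ADIR() returns files only
	lnTotal = ADIR(laFiles, m.tcPath + m.tcMask)
	* With the "D" flag it returns folders as well; column 5 holds the attributes
	lnDirs = ADIR(laDirs, m.tcPath + "*.*", "D")
	FOR lnI = 1 TO m.lnDirs
		lcName = laDirs[m.lnI, 1]
		IF "D" $ laDirs[m.lnI, 5] AND NOT (m.lcName == "." OR m.lcName == "..")
			* The self call is what makes this a true recursion
			lnTotal = m.lnTotal + CountFilesAdir(m.tcPath + m.lcName, m.tcMask)
		ENDIF
	ENDFOR
	RETURN m.lnTotal
ENDFUNC
```

The == comparison is deliberate: with SET EXACT OFF a plain = would treat every name starting with a dot as ".".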

Bye, Olaf.



 
mydocpro,

You are fine, my friend. Your questions are on point, but I was not able to answer them until I did more tests.

You asked:
mydocpro said:
1) Do you now recommend replacing ADIR() with Filer ... for recursing windows sub-directories (to conditionally 'get' files)?
2) Can you possibly estimate 'about' how much faster filer might be (vs. Adir()) for recursion (per se or specifically) ... considering your own computer specs.

I spent the last two days testing FILER and could not reply to you sooner. In answer to your 1st question, it depends on what you will be searching for. If you search for something like *.*, it generates a huge load on the Filer.Files object and somehow Filer can't figure out when it has completed, so it takes a nap and leaves a sign at the door that says "I'm Busy Processing". A knock on the door with an ESC key will wake it up and it will continue to completion. Knocking too early will interrupt it before it has completed.

It turns out that if your search criteria return more than X files, Filer dozes off and you have to wake it up. One example on my PC is "*.jpg". I have lots of those, and though it finds them all, it takes a nap. If I look for "*.bmp", it takes 31 seconds to touch all the files on my drive C:\ and I get all the *.bmp files returned.

The tough question is "What is the value of X?" I have not discovered a precise answer, and it likely varies with each PC's configuration (main memory etc.). On my PC it finds 100,000 directory items without locking up, so the number is likely much larger.

To answer your question #2, the speed is great and certainly better than ADIR(). I played with ADIR() 3 years ago and my recursion takes more time, but if you stop the search at the first file found it might be faster than FILER. If you start the search at a drive root like C:\, the search will touch all the files, and that is just the price you have to pay the piper. My drive C:\ takes 31-32 seconds and it has 350 gigabytes of data on it. Also, on my SSD PC yesterday, I consistently got nearly 80,000 items/second with Filer, and that makes me smile. You will likely get different delays, but run a test and see how long it takes you to find files with your ADIR(). Then do tests with the script I supplied below. With Filer, the more data you have on the disk, the longer it takes if you start at the root. If you select a deeper path it will likely improve radically. Have fun, my friend... And let me know how it turns out.

And this is the stripped down version of what I will be using:

Code:
CLEAR
SET ESCAPE ON 
LOCAL Filer
LOCAL FileFound
LOCAL LastTenMinutes AS Datetime
m.LastTenMinutes = DATETIME() - 600
m.Filer = CREATEOBJECT("Filer.FileUtil")
m.Filer.SearchPath = GETDIR()
m.Filer.FileExpression = "*.jpg"
m.Filer.SubFolder = 1
Filer.Find(0)

i=0
@ 30,10 say Filer.Files.Count && This prints the number of files found (number X)
FOR EACH m.FileFound IN m.Filer.Files
	i = i + 1 
	@ 10,10 say STR(i)+" files | "+FileFound.Path + m.FileFound.Name
ENDFOR




Simplicity is the extreme degree of sophistication.
Leonardo da Vinci
 
Hi Dennis, interesting result of your observation. Let me test my c: Drive, which should be the most problematic, but also quite fast, being an SSD, too.

I still think it's not the mere number of files causing Filer to go into a nap, there has to be some loopback making it not finish, like a junction or symlink in some subfolder pointing back to where the tree traversing already was, causing an endless loop.

But indeed, if I wait long enough, also about 30 seconds or longer, I always need to press a key and get about 1.3 million files for the system drive. I can wait much longer and the count does not rise any more.

So we/you found a weirdness in Filer. That in itself is an unfortunate bug with no real workaround. You can't see from outside when Filer.Files.Count is stagnating and the Find call should be terminated. Maybe with a timer? Nope, I just tried that: timer events don't occur while Filer does its Find, not even with AutoYield. Since Find is just one call into the COM object, AutoYield would only take effect after Filer.Find(0) finishes.

I don't see anything further, no property or method or parameterization, which could make filer stop:
Bye, Olaf.
 
Olaf,

You pretty much grasped the idea of what is going on. Like I said in my last post, it seems to be related to Filer.Files.Count. When that exceeds some number X, it jams up. So, to get around the problem I tried a two-stage approach, which works pretty well, except that it jams up too when oFiler.Files.Count on an individual subdirectory branch exceeds X items. But we have been inadvertently, or foolishly, testing Filer under the worst of circumstances, using "*.*", which returns all the files. And to me it is a miracle that it returns anything, especially EVERY file after we press ESC. It is an amazing feat.

Give it a try and see how it does with you. Of course, my assumptions could still be wrong, but not that far, though. I am pleased with the results.




Code:
**  Two Stage search  using FILER  (createobject('Filer.FileUtil'))	
**  Stage 1 is guaranteed to succeed here because					
**    oFiler.SubFolder is set to 0. That never locks up for me		


CLEAR
_screen.fontsize = 10
_screen.fontName = "Arial"
_screen.fontBold = .F.
_screen.ForeColor = RGB(0,0,0)
_screen.BackColor = RGB(255,255,255)
_screen.Picture = ""

oFiler = createobject('Filer.FileUtil')
oFiler.SearchPath = "C:\"   &&  My C:\ drive always locks up... 				
							&&   so we search it with oFiler.SubFolder = 0		
oFiler.FileExpression = "*.*"
oFiler.SubFolder = 0  		&&  oFiler.SubFolder set to zero always succeeds.	
? oFiler.find(0)			&&  This is the actual search 						
SET ESCAPE ON

I = 1

*  Use CURSOR in stage 2, to iterate the subdirectories found in stage 1 
IF USED("mainSubDirs")
	USE 
ENDIF 
CREATE CURSOR mainSubDirs(sub_dir C(50))
SELECT mainSubDirs

FOR EACH FileFound IN oFiler.Files
	IF FileFound.size = 0 AND LEFT(FileFound.Name,1)<>"."
		APPEND BLANK 
		REPLACE sub_dir WITH FileFound.Name  &&  Fill up the cursor 
		? FileFound.Name
		I = I + 1 
	ELSE 
		? STR(i)+"|"+FileFound.Path + m.FileFound.Name  &&  These files are found in stage 1 
	ENDIF 
NEXT 


**  Stage two does the recursive process to list all items   

SearchPathAbove = oFiler.SearchPath

SCAN 
	oFiler.SearchPath = ADDBS(ADDBS(SearchPathAbove)+ALLTRIM(sub_dir))
	oFiler.FileExpression = "*.bmp"
	oFiler.SubFolder = 1
	? oFiler.find(0), oFiler.SearchPath, oFiler.Files.count
	I = 1
	J = 1
	FOR EACH FileFound IN oFiler.Files

		IF LEFT(FileFound.Name,1)<>"."
			IF FileFound.size = 0 
				*  This is the slowest process in the recursion
				*  Typically you would just look for a specific file.
				*  You do that in oFiler.FileExpression = "<file name>"
				*  To test the speed eliminate all the print commands (?) 
				? STR(i)+"|"+FileFound.Path + m.FileFound.Name  &&  These are found subDirs 
			ELSE 
				? STR(i)+"|"+FileFound.Path + m.FileFound.Name  &&  These are found files 
			ENDIF 
		ENDIF 
		i = i + 1 
		IF J>45
			CLEAR
			J = 1
		ENDIF 
		j = j + 1 
	NEXT 
ENDSCAN 

*!*	SELECT mainSubDirs
*!*	USE 
*!*	RELEASE ALL

Regards,

Dennis

 
By the way, Olaf,

The two stage method jams up a lot less than the one we used before...


Dennis

 
Olaf,

While all this is fresh in my mind, compare the jam ups with the following search types "*.*", "*.jpg", "*.bmp" and "*.tif"

That should give you a comparison to rationalize why it jams up on a high count of files. Use the original method. And if you have a high number of all of those, try something like "*.bat".

Then use the Two Stage method and you will be able to compare and see why I suspect the high count of files. All the search types with lower file counts will go through. If it were some end-of-list boundary failing, the jam would stay on the same branches. But it does not. I can now navigate through the branches which locked up with the original method. And the only common denominator is the number of files being returned.

Of course, the solution to that is to become a tapeworm and follow down the gauntlet to a lower count directory... ha ha ha... Just want you to know that I did that too. And the result is that the jam up disappears... You can test that with my Two Stage script.

Regards


Dennis





 
Thank you Dennis, Mike, and Olaf. Apologies for late response (work)

After pondering Olaf's URL-site and Dennis' excellent work-around, I launched Dennis' code (SSD, i7, 250GB drive, VFP9, print on) with these 'command prompt' results:

1) About 19 seconds on 1st trial searching for *.bmp files.
2) About 7 seconds on 2nd and 3rd trial (same *.bmp search).
3) About 20 seconds on 1st and 2nd trial for *.jpg files (which are about 400x more numerous on my PC, spread out into thousands of (client) (sub) directories).

Conclusions:
The slower initial recursion (#1) I cannot explain ... perhaps it is due to slow initial read-writes and/or caching by VFP and/or Windows. I do not know.
Otherwise, my results seem to 'concur with' what all of you have thoughtfully stated or inferred.
My hope: Dennis' recursion engine seems ideal for swiftly finding the "needle-file in the haystack (multitudes of Windows directories)" and for use as a replacement for ADIR() ... but further tests on my part are needed.
Olaf's URL-site seems to provide other properties to search besides those found in ADIR()'s arrays (iirc) ... like "LastAccessTime" and "LastWriteTime"

Utmost thanks and blessings,
Philip
 
Slower initial performance surely is about the caching of the file system info you read. You have to take the first run as the end-user time rather than any later calls. The first runs are more to the point, because the normal state of a computer is not having all file system directory info cached. To make a fair comparison of anything, you'd therefore always need to time the first run of any code. Otherwise whatever you test as the second or third alternative always wins by profiting from caching, whether OS-side, VFP-side, or even hardware caching. That's a very general rule about benchmarking performance.

>Olaf's URL-site seems to provide other properties to search besides those found in ADIR()s arrays (iirc)

File times are also in the ADIR()-generated arrays, but only last modified. So, yes, Filer has more file time properties, but there are tons of ways to determine all info on files. FoxPro has no function returning file creation time, that's correct, you always have to use something else, but when did you ever need a file creation time in code? The most interesting file time is that of last modification. Even for the use case of determining server time by creating a file on a server-side file share: at that moment the last modification time is equal to the creation time.
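To make the point about ADIR() and last-modified times concrete, here is a small sketch. The file path is hypothetical; the array columns (1 name, 2 size, 3 date last modified, 4 time last modified, 5 attributes) and FDATE() are standard VFP.

```foxpro
LOCAL ARRAY laInfo[1]
LOCAL lcFile
lcFile = "C:\Temp\example.txt"     && hypothetical path, replace with an existing file
IF ADIR(laInfo, m.lcFile) = 1
	? laInfo[1, 1], laInfo[1, 2]   && file name and size in bytes
	? laInfo[1, 3], laInfo[1, 4]   && date and time last modified
	? FDATE(m.lcFile, 1)           && last modification as one DateTime value
ENDIF
```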

Bye, Olaf.
 
mydocpro,

I'm glad you found it useful. Last week, while I was testing it, the algorithm evolved. I applied it within a form and also made a version which acts as a function: it requires parameters and returns the time taken, the number of files found and the listing of the files. The parameters are the file type, the starting directory and a list of paths which I do not want searched. Why waste time searching huge directory paths that I know for sure do not contain the files I'm searching for? Sometimes I send an empty string and it searches everything, and other times I send a list of all the biggies like Documents and Settings, Program Files, Program Files (x86), ProgramData, Users etc... With that, most of my results are now under 3 seconds.

My next goal is to get it under 0.5 seconds across all the paths. I think that it is possible. I tested it manually and it got it done in 0.8 seconds. But to be sure that I am not delusional, I will not post that until I check my sanity thoroughly. I embarrassed myself once already.

This has been a pain in my neck ever since the 1 Gigabyte drives came out in the 80's.


So, here is my FUNCTION version of it for you mydocpro. This is the one I use with my automation app, currently. And I added a new section to the code which includes files found in the starting directory. I don't know if you noticed that the files you search for are not collected from the starting point directory in the two stage I posted earlier. Evolution, you know... ha ha.. This will do it right...

Code:
**  Two Stage search  using FILER  (createobject('Filer.FileUtil'))	
FUNCTION File_Search 
	LPARAMETERS file_type, startPath, firbiddenPaths
	LOCAL Result

	startTime = SECONDS()
	SET MEMOWIDTH TO 8192
	SET ESCAPE ON

	oFiler = createobject('Filer.FileUtil')
	IF VARTYPE(startPath) # "C" OR EMPTY(startPath)
		oFiler.SearchPath = GETDIR()   	&&  No start path passed: let the user pick one
	ELSE 
		oFiler.SearchPath = startPath
	ENDIF 
	*  Recursing from a drive root (my C:\) locks up, so stage 1 searches with oFiler.SubFolder = 0
	oFiler.SubFolder = 0  			&&  oFiler.SubFolder set to zero always succeeds.	
	*ThisForm.PF.ActivePage = 2
	*Thisform.prompting.Visible = .T.

	**  Use CURSOR in stage 2, to iterate the subdirectories found in stage 1 	
	**																			
	oFiler.FileExpression = "*.*"
	oFiler.find(0)			&&  This is the actual search without the subdirectory switch						
	IF USED("mainSubDirs")
		USE 
	ENDIF 
	CREATE CURSOR  mainSubDirs(sub_dir C(50)) 
	SELECT mainSubDirs
	FOR EACH FileFound IN oFiler.Files
		IF FileFound.size = 0 AND LEFT(FileFound.Name,1)<>"."
			APPEND BLANK 
			REPLACE sub_dir WITH FileFound.Name  &&  Fill up the cursor  
		ELSE
			*? STR(i)+"|"+FileFound.Path + m.FileFound.Name  &&  These files are found in stage 1 
			*ThisForm.PF.Page2.Edit1.Value = ThisForm.PF.Page2.Edit1.Value + CHR(13) + FileFound.Path + m.FileFound.Name
		ENDIF 
	NEXT 

	**  Fetch all the files in the root path before digging into subdirectories 
	**	New section to add files in the starting directory																		
	oFiler.FileExpression = ALLTRIM(File_Type)
	oFiler.find(0)					&&  This is the actual search 						
	I = 0
	*ThisForm.PF.Page2.Edit1.Value = ""
	Result = ""
	FOR EACH FileFound IN oFiler.Files
		IF FileFound.size = 0 
			*? STR(i)+"|"+FileFound.Path + m.FileFound.Name  &&  These are found subDirs 
		ELSE 
			Result = Result + CHR(13) + FileFound.Path + m.FileFound.Name
			I = I + 1 
		ENDIF 
	NEXT 

	**  Stage 2  does the recursive process to list all items   
	**															
	FOR II = 1 TO MEMLINES(firbiddenPaths)
		btn_x = ALLTRIM(MLINE(firbiddenPaths,II))
		LOCATE FOR sub_dir = btn_x
		IF FOUND()
			DELETE 
		ENDIF 
	NEXT 

	SearchPathAbove = oFiler.SearchPath
	SCAN FOR NOT DELETED()   &&  Skip the excluded folders even when SET DELETED is OFF 
		oFiler.SearchPath = ADDBS(ADDBS(SearchPathAbove)+ALLTRIM(sub_dir))
		oFiler.FileExpression = ALLTRIM(File_Type)
		oFiler.SubFolder = 1
		oFiler.find(0)
		
		* oFiler.SearchPath, oFiler.Files.count
		FOR EACH FileFound IN oFiler.Files
			IF LEFT(FileFound.Name,1)<>"."
				IF FileFound.size = 0 
					*? STR(i)+"|"+FileFound.Path + m.FileFound.Name  &&  These are found subDirs 
				ELSE 
					Result = Result + CHR(13) + FileFound.Path + m.FileFound.Name
					I = I + 1 
				ENDIF 
			ENDIF 
		NEXT 
	ENDSCAN 

	endTime = SECONDS()

	Result = "Processed in : "+STR(EndTime-startTime, 8, 2)+" Seconds"+CHR(13)+Result   &&  Keep two decimals, plain STR() rounds to whole seconds 
	Result = "       Found : "+STR(I)+" Files" + CHR(13) + Result
	SELECT mainSubDirs
	USE 
	
	RETURN Result 
ENDFUNC
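A short usage sketch for the function above, assuming it is in scope (same PRG or SET PROCEDURE). The folder names and the output file name are only examples. The list of paths to skip is passed as one string with a folder name per line, because the function reads it with MEMLINES() and MLINE():

```foxpro
LOCAL lcSkip, lcReport
* One top-level folder name per line, separated by CHR(13)
lcSkip = "Windows" + CHR(13) + ;
	"Program Files (x86)" + CHR(13) + ;
	"ProgramData"
lcReport = File_Search("*.bmp", "C:\", m.lcSkip)
* The first two lines of the report hold the count and the elapsed time
? MLINE(m.lcReport, 1)
? MLINE(m.lcReport, 2)
* Hypothetical output file for the full listing
STRTOFILE(m.lcReport, "file_search_result.txt")
```

One thing to watch: the function matches folder names with a plain = comparison, which under the default SET EXACT OFF is a prefix match, so "Program Files" could also hit "Program Files (x86)".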

Cheers,



 
Your recursion only goes one subfolder level down, doesn't it? Real recursion of directories in the end will have a self call somewhere in the code, and I don't see any code line in File_Search calling File_Search.

Bye, Olaf.
 
Olaf,

The recursion is internal to Filer when you set the switch.
Code:
oFiler.SubFolder = 1
That is why I turned to this method. No one so far seems to have taken advantage of this feature in this way. That's what my fascination and clamoring is about! Simple and beautiful.



Dennis

 
OK, now I see you using SubFolder=1 there, but how will you then avoid processing unwanted folders that are not in the first list? The list of unwanted folders can then only contain first-level folders, can't it? That's what I'd see as the imperfection of this. It might be sufficient. It might even be sufficient to avoid recursing %WINDIR%. But I assume (untested) that other large drives with lots of files also make Filer hang. Overall it makes more sense to work from a whitelist of folders to search in, but that's opinion.

Bye, Olaf.
 
Olaf,

It is possible and it will require a little more dabbling. You can take over the recursion and do all the subdirectories with oFiler.SubFolder = 0, but an average of 3 seconds is way better than the Object Filer offered in form style, which I was struggling with. To do the equivalent it took often more than 3.5 minutes. 60 times faster is good, my friend.

And if you are looking for a specific file, the 3 seconds drop to an average of 1 second. I tested it manually and will implement that next.


 
Olaf,

You can decide to do the first stage to a depth of 5 levels and then switch over to 2nd stage recursion for the rest and it will give you more flexibility. But the deeper you go into blocking search the more you have to handle on the front end. However, it is quite possible to do it dynamically, where you decide up front how deep you want to perform the recursion. And that should not be that hard. But there is a break even point. Personally I like it on the 1st level and will try it on level 2 just for kicks.
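A rough sketch of that idea, under stated assumptions: FilerWalk is a hypothetical helper (not part of Filer and not the code posted above), the skip list is a comma-separated string of folder names, and folders are recognized with DIRECTORY() instead of the Size = 0 test. The two-parameter form of DIRECTORY() needs VFP 9. It walks the first tnDepth levels itself with SubFolder = 0, skipping unwanted folders at each of those levels, and lets Filer recurse below that. I have not verified that this avoids the ESC wait on every drive.

```foxpro
LOCAL loHits
loHits = CREATEOBJECT("Collection")
FilerWalk("C:\", "*.bmp", 2, "Windows,Program Files", loHits)   && example values
? loHits.Count, "files found"

* Hypothetical helper: collects full paths of matching files into toHits
FUNCTION FilerWalk
	LPARAMETERS tcPath, tcMask, tnDepth, tcSkip, toHits
	LOCAL loFiler, loItem, loDirs, lcFull, lcDir
	loFiler = CREATEOBJECT("Filer.FileUtil")
	loFiler.SearchPath = ADDBS(m.tcPath)
	loFiler.FileExpression = m.tcMask
	* At the chosen depth hand the rest of the tree to Filer's own recursion
	loFiler.SubFolder = IIF(m.tnDepth > 0, 0, 1)
	loFiler.Find(0)
	FOR EACH loItem IN loFiler.Files
		lcFull = ADDBS(loItem.Path) + loItem.Name
		IF NOT DIRECTORY(m.lcFull, 1)          && keep files only
			toHits.Add(m.lcFull)
		ENDIF
	ENDFOR
	IF m.tnDepth <= 0
		RETURN
	ENDIF
	* List the subfolders of this level (Filer is still in SubFolder = 0 mode here)
	loFiler.FileExpression = "*.*"
	loFiler.Find(0)
	loDirs = CREATEOBJECT("Collection")
	FOR EACH loItem IN loFiler.Files
		lcFull = ADDBS(loItem.Path) + loItem.Name
		IF LEFT(loItem.Name, 1) <> "." AND DIRECTORY(m.lcFull, 1)
			loDirs.Add(m.lcFull)
		ENDIF
	ENDFOR
	FOR EACH lcDir IN loDirs
		* Skip unwanted folder names at any level we walk ourselves
		IF NOT ("," + UPPER(JUSTFNAME(m.lcDir)) + "," $ "," + UPPER(m.tcSkip) + ",")
			FilerWalk(m.lcDir, m.tcMask, m.tnDepth - 1, m.tcSkip, m.toHits)
		ENDIF
	ENDFOR
ENDFUNC
```

The DIRECTORY() call per item costs extra disk access, so this trades some speed for control over which branches get searched. That is the break-even point mentioned above.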


Dennis


 
My ideas are not only about performance. The problem of finding a file is first of all a problem of tidiness. Why does anyone need to search for anything? Offering an interface that asks the user which start folder to search in at least educates users to have the minimal discipline of creating a system of folders with a clear rule for where things go.

Granted, there might be situations where you don't have control and don't know where certain files go, but most often those are situations only the related application has to know about.

Even a picture organizer that doesn't know which folders on a computer contain pictures should let this be a configuration item rather than make a full search. You find lots of images belonging to the system or to applications that you don't want to organize with such software; most of them would just clutter the database you build up with nonsense. Just an example. A user trying to manage his photos should know the one, perhaps two or three places on the HDD where they are stored.

Of course that's not the only use case for file finders. I do remember situations where I had to search the whole computer for certain things, but often enough I then didn't find anything. If we talk about source code management, you should have working directories and repositories, for example. Whatever we talk about, I can't think of a situation where something is scattered all over the system.

Bye, Olaf.
 
Small enhancements:
Code:
IF USED("mainSubDirs")
	USE 
ENDIF 
CREATE CURSOR mainSubDirs(sub_dir C(50)) 
SELECT mainSubDirs
Instead:
Code:
USE IN SELECT("mainSubDirs")
CREATE CURSOR mainSubDirs(sub_dir C(50))
When created, CURSOR is already selected, and
Code:
APPEND BLANK 
REPLACE sub_dir WITH FileFound.Name  &&  Fill up the cursor
Should be:
Code:
INSERT INTO mainSubDirs (sub_dir) VALUES (FileFound.Name)
 
Olaf,

I try to be very orderly and keep my photos, videos etc. in specific zones. But I also have many PCs and my management system has evolved over the years. I have dozens of drives. I have hundreds of VFP dev projects. I do not mix up my projects in one common directory. As a consequence, photos, PRGs, DBFs, VCXs and SCXs are sprawled out in many places on my VFP drive, like "G:\". Occasionally I start a new project and copy everything from a similar project, and as you can see, files get sprawled all over the place. My file management efforts always fail in part when I try to do a task in a hurry.

So, while I hear you, the reality is that this tool is great for helping to relocate files, duplicate them, design a manager which can manage and organize much faster than flipping through two Explorer windows+directories to move, copy or manage files. Mostly, however, I need my Automation Tools to be able to perform many of these tasks on their own, without me looking at the files once. It is the primary reason I created this tool. I cannot thank you guys enough for reminding me of the existence of FILER. I played with it many years ago and nearly forgot its existence. Today I was working on the "Search for String in Files". Works great. It turns out to be a terrific tool to intersect with various criteria and find what you are looking for in a river of Gigabytes.

Your contributions to this forum (meaning Mike, you, Atlopes and others) are so appreciated, Olaf. And my love for VFP is growing back. Is there anything one cannot do with VFP??? NADA! RIEN! NISHTA!

Cheers to my good friends,


Dennis


 
drdolittle,

Thanks for the enhancements. It is very appreciated. I am not someone who endorses the "m.Variable" format and some of the other pedantics, but yours are good tips which I always appreciate. My VFP is so rusty... It has been a while...

Regards,

Dennis

 