
Performance problem 4

Status
Not open for further replies.

jamesrav

Programmer
Sep 17, 2017
29
MX
I have about 90,000 dbf files in a single folder. They contain from 0 to 100 records, but generally 5-20. I have a .prg file that cycles thru all the tables and does some processing based on their contents. The problem/puzzle is that sometimes this .prg can go thru all 90,000 in about 1 minute, and other times it takes 15 minutes, with no change to the .prg at all. I've checked Task Manager, and the CPU is not occupied by other processes when the .prg runs slowly. It just seems that VFP in some situations 'knows' about the tables and can move thru them quickly, and in other cases it is seeing the data for the first time. I should add that I cycle thru the 90,000 multiple times with different criteria: when it's operating well, the 1-minute time is consistent, and when it's slow, it's 15 minutes every time thru and never improves. So is this a VFP issue or Windows in general? I've tried adding some Sys() commands at the start (based on the Help file), but it does not improve things.
 
Mike Yearwood, you're right: to be fair, inserting the file names Sys(2000) returns would add to its time.

When you process all DBFs of a directory, you won't need that, though.

It's easy enough to give these times for comparison:
Sys(2000) filling an array or cursor is fast in any case: about a second.

AdirEx fails with more than 65000 files. If I do
Code:
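#INCLUDE vfp2c.h
* vfp2c32.fll must be loaded; ADirEx errors out past 65000 files
u=Seconds()
v=0
ON ERROR v=Seconds()
nFiles2 = ADirEx("adirex",'*.dbf',0,ADIREX_DEST_ARRAY)
ON ERROR
? IIF(v=0, Seconds(), v)-u  && time until the error, or total time if none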
I get about 3.5 seconds for the first 65000 files.
The FLL seems to set a limit that actually no longer exists in VFP9. The array has 65000 rows and 7 columns, though. Previous versions only allowed 65000 elements in total, so only about 9300 files (65000 / 7), if I remember correctly.
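The row cap comes from the FLL's array interface, not from listing files as such. Purely as an illustration, here is a minimal POSIX C sketch (not vfp2c32 code; `NameList`, `list_by_ext` and the other names are hypothetical) that collects matching file names into a buffer whose capacity doubles as needed, so the number of entries is bounded by memory rather than by a fixed row count. The extension test is case-insensitive to mimic how Windows matches `*.dbf`:

```c
#define _XOPEN_SOURCE 700   /* expose strdup, strcasecmp, mkdtemp on POSIX */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Growable list of names; capacity doubles, so there is no fixed row cap. */
typedef struct {
    char  **names;
    size_t  count;
    size_t  cap;
} NameList;

/* Case-insensitive suffix test, since Windows matches "*.dbf" and "*.DBF" alike. */
static int has_ext(const char *name, const char *ext)
{
    size_t n = strlen(name), e = strlen(ext);
    return n >= e && strcasecmp(name + n - e, ext) == 0;
}

/* Append a copy of name, doubling the capacity when full. Returns 0 or -1. */
static int push_name(NameList *list, const char *name)
{
    if (list->count == list->cap) {
        size_t ncap = list->cap ? list->cap * 2 : 64;
        char **p = realloc(list->names, ncap * sizeof *p);
        if (!p)
            return -1;
        list->names = p;
        list->cap = ncap;
    }
    if (!(list->names[list->count] = strdup(name)))
        return -1;
    list->count++;
    return 0;
}

/* Append every entry of dir whose name ends in ext; returns the new count or -1. */
static long list_by_ext(const char *dir, const char *ext, NameList *out)
{
    DIR *d = opendir(dir);
    struct dirent *ent;
    if (!d)
        return -1;
    while ((ent = readdir(d)) != NULL) {
        if (has_ext(ent->d_name, ext) && push_name(out, ent->d_name) != 0) {
            closedir(d);
            return -1;
        }
    }
    closedir(d);
    return (long)out->count;
}

/* Release all names and reset the list to empty. */
static void free_names(NameList *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->names[i]);
    free(list->names);
    list->names = NULL;
    list->count = list->cap = 0;
}
```

Pointed at the 90,000-file folder it would simply keep growing; the trade-off is an occasional `realloc` instead of a preallocated, capped result.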

I don't know where or how you had bad experiences with Sys(2000), but I used it quite often when I had to do directory work and the file name was all I needed. If, to be fair, you also want the other file information that ADir or AdirEx return, I'd rather go fully with the WinAPI, as that's where you'll end up getting all the other file info anyway.

To be clear, jamesrav: this part is only about getting the file list, not even opening any DBF yet. ADir is really only usable for normal directories, not for data repositories like this one. All this actually raises another question: are all these files necessary? Couldn't a process accumulate the data as it arrives? That points back to Mike Yearwood's notification idea.


Chriss
 
There are a couple of ways I can approach this. I haven't heard from Christian Ehlscheid yet. I'd kind of like to get his input on how he'd like to have it done.

I'll go ahead and implement a stack-based approach for recursion due to the way he has it coded.
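For anyone unfamiliar with the idea, a stack-based approach replaces recursive calls with an explicit list of folders still to be scanned. Below is a minimal POSIX C sketch of that technique, assuming `opendir`/`readdir`/`lstat` rather than the Win32 calls vfp2c32 actually uses; `walk_tree`, `file_visitor` and `count_file` are hypothetical names, and this is not Rick's implementation. It reports only regular files, each with a path built from the starting folder, so empty folders produce no entries:

```c
#define _XOPEN_SOURCE 700   /* expose strdup, lstat, mkdtemp on POSIX */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Called once per regular file with its path; return nonzero to stop the walk. */
typedef int (*file_visitor)(const char *path, void *ctx);

/*
 * Visit every regular file under root. Pending folders are kept on an explicit,
 * heap-allocated stack instead of recursive calls. Paths are root + "/" + ...,
 * so they are absolute when root is. Folders themselves are never reported.
 * Returns 0 when done, 1 if the visitor stopped early, -1 on allocation failure.
 */
static int walk_tree(const char *root, file_visitor visit, void *ctx)
{
    size_t cap = 16, top = 0;
    char **stack = malloc(cap * sizeof *stack);
    int rc = 0;

    if (!stack || !(stack[top++] = strdup(root))) {
        free(stack);
        return -1;
    }
    while (top > 0 && rc == 0) {
        char *dir = stack[--top];               /* pop the next folder */
        DIR *d = opendir(dir);                  /* unreadable folders are skipped */
        struct dirent *ent;

        while (d && rc == 0 && (ent = readdir(d)) != NULL) {
            if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
                continue;
            size_t len = strlen(dir) + strlen(ent->d_name) + 2;
            char *path = malloc(len);
            struct stat st;
            if (!path) {
                rc = -1;
                break;
            }
            snprintf(path, len, "%s/%s", dir, ent->d_name);
            if (lstat(path, &st) != 0) {        /* vanished or inaccessible */
                free(path);
                continue;
            }
            if (S_ISDIR(st.st_mode)) {          /* push subfolder, scan it later */
                if (top == cap) {
                    char **p = realloc(stack, 2 * cap * sizeof *p);
                    if (!p) {
                        free(path);
                        rc = -1;
                        break;
                    }
                    stack = p;
                    cap *= 2;
                }
                stack[top++] = path;            /* stack now owns path */
            } else {
                if (S_ISREG(st.st_mode) && visit(path, ctx) != 0)
                    rc = 1;
                free(path);
            }
        }
        if (d)
            closedir(d);
        free(dir);
    }
    while (top > 0)                             /* free folders left after an early stop */
        free(stack[--top]);
    free(stack);
    return rc;
}

/* Example visitor: counts files; ctx must point to a size_t. */
static int count_file(const char *path, void *ctx)
{
    (void)path;
    ++*(size_t *)ctx;
    return 0;
}
```

Because the pending folders live on the heap, a very deep tree costs memory rather than call-stack depth. The traversal is depth-first, but the order within a folder is whatever `readdir` returns.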

--
Rick C. Hodgin
 
I have it implemented. Christian E. wrote it in Visual Studio 2003, so I left it in that version. I can recompile with a more modern version of Visual Studio if required.

The function is called ADirREx(). It works identically to ADirEx(), but recurses into subfolders and returns the FULLPATH() of each entry. It does not return separate entries for the folders it traverses; folders appear only as part of the paths of the files found in them. A folder that contains no files is therefore not included. Separate folder entries and empty folders can be added if needed.

Please see attached. The ZIP contains the FLL plus the source code I used for this change (in case something needs tweaking and I get hit by a bus).

--
Rick C. Hodgin
 
 https://files.engineering.com/getfile.aspx?folder=e857c5fb-402c-4b9b-b750-696a88099b73&file=vfp2c32__with_adirrex.zip
Rick, nice.

While you're at it, can you find where AdirEx limits results to 65000 rows? And can you remove that limit?

And, of course, it would be nice to have this in the main branch of Christian Ehlscheid's vfp2c32 project, but you can always clone it and start your own fork. The VFPX shared software license governs that and more; you'll find Christian pointing to it in the source code if you search his repository for "license". As parts of it are taken from others, GNU and other licenses also apply to those parts, so look through all the search results for the details.


Chriss
 
Will look at the 65K row constraint.

Also, there's a bug in my ADirREx() function. It will only return FULLPATH() to an array. Will fix that today.

--
Rick C. Hodgin
 
What exactly is the 65K constraint? Is it a limit on the number of rows in an array? I see there is a limit of 65K on the number of array objects, but I'm not sure what that means. According to the VFP Help, the "normal" limit is 2 GB.

Could someone perhaps clarify this.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
In Christian's code, in vfp2ccppapi.h, line 1959, it says:
Code:
assert(nRows + m_Loc.l_sub1 <= VFP_MAX_ARRAY_ROWS); // LCK only supports array's up to 65000 rows

The m_Loc is a Locator structure provided by VFP in pro_ext.h, line 399:
Code:
// A reference to a database or memory variable
typedef struct {
    char        l_type;
    SHORT       l_where;        /* Database number or -1 for memory     */
    USHORT      l_NTI,          /* Variable name table offset           */
                l_offset,       /* Index into database                  */
                l_subs,         /* # subscripts specified 0 <= x <= 2   */
                l_sub1, l_sub2; /* subscript integral values            */      // these are not changed to 4 bytes so FLL compatibility is maintaines
} Locator;

It uses USHORT fields, which are 16-bit unsigned values. The 65,000 limit is below the 16-bit maximum of 65,535, so it may just be an arbitrary cutoff inside VFP, since VFP's system capacity limits mention 65K.
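To illustrate why a 16-bit field matters: an `unsigned short` (the type behind `USHORT`) wraps around modulo 65,536, so a row number like 70,000 would silently become 4,464 rather than fail. A small C sketch; `as_subscript` and `rows_fit` are hypothetical helpers, and the 65000 value is assumed from the assert's comment rather than read from vfp2c32:

```c
#include <limits.h>

#if USHRT_MAX != 65535
#error "this sketch assumes a 16-bit unsigned short, like the Windows USHORT"
#endif

/* Assumed value: the assert's comment says the LCK supports up to 65000 rows. */
#define VFP_MAX_ARRAY_ROWS 65000UL

/* Store a row number the way a 16-bit USHORT field would: values above
   65535 wrap around modulo 65536 instead of raising an error. */
static unsigned short as_subscript(unsigned long row)
{
    return (unsigned short)row;
}

/* The same check as the vfp2c32 assert: do nrows more rows still fit
   after the current first subscript? */
static int rows_fit(unsigned long current_sub, unsigned long nrows)
{
    return nrows + current_sub <= VFP_MAX_ARRAY_ROWS;
}
```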

--
Rick C. Hodgin
 
I see, this is in the LCK pro_ext.h

So in the VFP9 runtime it was changed to 4 bytes, but the pro_ext.h declaration wasn't changed, to stay compatible with earlier FLLs. That means you could change it to 4 bytes, because you don't need to maintain compatibility with earlier FLLs. I wonder if that's enough, though, because we only have the compiled libraries ocxapi.lib and winapims.lib, not their sources.

Edit: I think you could still get longer arrays, up to 65000*65000 elements, because in the end it's not the two subscripts that matter but l_sub1*columns+l_sub2.
That means if you give the array 14 columns instead of 7, we could get there, and as a last step the array just needs to be redimensioned to 7 columns, doubling the rows.
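The reshaping trick works because VFP stores array elements row by row, and redimensioning keeps that linear order, so each 14-column row becomes two consecutive 7-column rows. A sketch of the index arithmetic in C, 1-based like VFP (`element_no` mirrors what AELEMENT() returns; both function names are hypothetical):

```c
/* 1-based, row-major position of (row, col) in an array with ncols columns,
   the number VFP's AELEMENT() reports. */
static unsigned long element_no(unsigned long row, unsigned long col,
                                unsigned long ncols)
{
    return (row - 1) * ncols + col;
}

/* Inverse: 1-based (row, col) of element number pos when the same elements
   are viewed with ncols columns, i.e. after redimensioning to ncols columns. */
static void subscripts(unsigned long pos, unsigned long ncols,
                       unsigned long *row, unsigned long *col)
{
    *row = (pos - 1) / ncols + 1;
    *col = (pos - 1) % ncols + 1;
}
```

For example, element (1, 8) of the 14-column array is element 8, which becomes (2, 1) after redimensioning to 7 columns, and the last element of a 65000 x 14 array (element 910000) lands at (130000, 7).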

Chriss
 
You can overcome the limitation in the current design by populating a cursor or using a callback. Those won't suffer from a sizing cap.

--
Rick C Hodgin
 
That's true. Just for measuring the AdirEx performance, it would be a less useful change than adding recursion. But it's good to keep in mind that the struct for (array) variables in the LCK is kept backward compatible, while within VFP9 it differs. That would also make binary files created by SAVE TO incompatible with earlier versions. I guess they are.

Chriss
 
I've contacted Christian about what code changes to make. I didn't want to break any existing functionality. When he gets back to me I'll tweak it.

Until then, ADirREx() lives. :)

--
Rick C. Hodgin
 
Fullpath fix?

If I work with ADIR and CD into a path, then use *.* or *.pdf or whatever file skeleton, the code using the result array expects just the file names without a path. I see it's a nice solution for recursive use, but it isn't needed for every use. I would add a mode parameter that works with bit flags: 1 = recurse, 2 = full path, so 3 is recurse with full path and 0 is standard ADIR behaviour. 1 = recurse without full path should obviously return relative paths. I didn't look at how you implemented it, so sorry if that or a similar solution is already in place, but that would be my idea of extending it.
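A minimal C sketch of how such a bit-flag mode could decide what name to report for each file. The flag names `DIR_RECURSE` and `DIR_FULLPATH` and both functions are hypothetical, following the values in the proposal rather than anything in vfp2c32, and `/` stands in for the Windows `\` separator:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mode bits, as proposed: they combine, and 0 is plain ADIR. */
enum {
    DIR_RECURSE  = 1,   /* descend into subfolders */
    DIR_FULLPATH = 2    /* report full paths instead of relative ones */
};

/* A recursive walker would consult this before scanning subfolders. */
static int should_descend(int flags)
{
    return (flags & DIR_RECURSE) != 0;
}

/*
 * Build the name reported for one file.
 *   root    - folder the search started in
 *   relpath - the file's folder relative to root ("" for root itself)
 *   name    - the file name
 * Without DIR_FULLPATH, files found by recursion keep their relative folder,
 * so names from different subfolders stay distinguishable.
 */
static void entry_name(char *buf, size_t size, int flags,
                       const char *root, const char *relpath, const char *name)
{
    const char *sep = strlen(relpath) > 0 ? "/" : "";
    if (flags & DIR_FULLPATH)
        snprintf(buf, size, "%s/%s%s%s", root, relpath, sep, name);
    else
        snprintf(buf, size, "%s%s%s", relpath, sep, name);
}
```

With flags 0 and a file in the start folder this yields just the file name, matching today's ADIR behaviour, so existing callers would be unaffected.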



Chriss
 
Rick C. Hodgin said:
The function is called ADirREx(). It works identically to ADirEx(), but will recurse folders and returns the FULLPATH() of each entry.

It only uses FULLPATH() when you use the recursive version.

I don't know if Christian Ehlscheid is still around or not, but before I make any changes to his existing function I will wait to hear from him. This recursive version is separate and standalone and conveys the recursive nature in its function name. That is sufficient for now.

In addition, the source code has been provided, so Christian or anyone else can tweak it as needed. See the full project forked on GitHub.

--
Rick C. Hodgin
 
Rick, I'm not against your approach of a separate new function, but I think Mike has a point, one people realize whenever they have to pick AT, ATC, AT_C, or ATCC for different scenarios. You want to keep backward compatibility, but an additional parameter here could be optional and default to 0, so no harm is done, and it would still be a fork Christian could merge into the main branch or not.

Chriss
 
It's simple: VFPx is not my code. When / if I hear from Christian and get with him I'll change it to be whatever you guys want in conjunction with his guidance on how he wants his project extended / modified.

In the meantime, the code exists. The changes I made use the same internal engine for both functions. I added an intercept to pass in the extra parameter. With minimal effort the code I wrote can be tweaked to combine all functionality into a single ADirEx() function.

If it's such an issue, take the code and tweak it to be that way.

I would not appreciate someone coming into my project and redefining things I've set up to now be some other way without me knowing about it. How long has VFPx existed? I'm not going to jump in as some Johnny-come-lately and mess with Christian's long-established project. If he's okay with me tweaking ADirEx(), I'll do it.

I place a greater emphasis on his POTENTIAL objections than I do on your wants, wishes, and desires, because he's the owner of the code base.

I'm sorry if that offends you, but the reason I readily provided the source code was so someone could take my offering and tweak it to meet their needs. I honestly expect Christian will not accept my pull request, but instead will probably rewrite what I did in some other way for his project that addresses his needs. And that's how it should be. He's the owner. It's his castle.

--
Rick C. Hodgin
 
I'm not offended, I don't think anybody is, and I hope you're not offended either that you got wishes instead of the cheers you might have expected.

I think what you did is good. You can do what we want without offending Christian; it would still be in his hands to merge your fork or keep it separate.

I know how it feels when you do someone a favor and then get asked for yet another one, or even two different ones. Yes, I think I can tweak it as I'd like, and I thank you for offering the forked version on GitHub, too.

I just wanted to point out that you don't actually influence Christian's design decisions, and that the general VFPX license allows these forks. You can take all your arguments and just turn them around: the decision would likely be to extend AdirEx's features rather than have a separate function name. And he can still turn it into a new function name of his choosing, if he wants to. It's nice to be that cautious, and you gave your reasons, but given how software version control works, I'd argue it's better to offer an inactive project owner a fork in the way he would likely want to integrate it. Many words about a little work, I know.

As you're surely annoyed, leave it as it is now; I can live with it.

Chriss

PS: I even like the pun you made with AdirRex being the king of all directory functions.
 
Mike, now Rick surely won't fulfill your wish, but alas, you still have AdirRex.

myearwood said:
Get your ego out of the picture.
Actually, Rick should get his ego in here and do the modification with more self-esteem, given that he has already recreated FoxPro itself. But that doesn't matter to you, I know...

Chriss
 
I defend his right to leave it as it is, even if he did it wrong. But as I know your stance, I know that you don't appreciate that. If I were Rick, I would now simply make the fork private, so you'd be faced with getting nothing instead of what he's given.

Chriss
 
Attached is a version which removes ADirREx() and extends ADirEx() with three new values for nFlags. Add these constants to the vfp2c.h file near the other ADIREX_ constants:

Code:
#DEFINE ADIREX_DEST_ARRAY_RECURSE       0x80
#DEFINE ADIREX_DEST_CURSOR_RECURSE      0x100
#DEFINE ADIREX_DEST_CALLBACK_RECURSE    0x200

When using those, it will recurse. When using the originals, it will behave like normal.

Here's the code change on GitHub.
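Because each of the new constants is a distinct power of two, a caller or the FLL can test for "any recursive mode" with a single mask. A C sketch of that check; `ADIREX_RECURSE_MASK`, `is_single_bit` and `wants_recursion` are hypothetical and not part of vfp2c.h, and this says nothing about how the posted change dispatches internally:

```c
/* Values as posted for vfp2c.h. */
#define ADIREX_DEST_ARRAY_RECURSE    0x80
#define ADIREX_DEST_CURSOR_RECURSE   0x100
#define ADIREX_DEST_CALLBACK_RECURSE 0x200

/* Hypothetical helper mask, not part of vfp2c.h: any recursive destination. */
#define ADIREX_RECURSE_MASK (ADIREX_DEST_ARRAY_RECURSE | \
                             ADIREX_DEST_CURSOR_RECURSE | \
                             ADIREX_DEST_CALLBACK_RECURSE)

/* True if v has exactly one bit set. */
static int is_single_bit(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* True if any recursive destination bit is present in flags. */
static int wants_recursion(unsigned flags)
{
    return (flags & ADIREX_RECURSE_MASK) != 0;
}
```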

--
Rick C. Hodgin
 
 https://files.engineering.com/getfile.aspx?folder=dfe94339-28da-4fcd-af2c-b568ea347a0d&file=vfp2c32__with_adirex__recursion.zip