
Buffering 2

Status
Not open for further replies.

GuitarSlayer (Programmer), GB
Dec 16, 2004
Hi, I wonder if anyone has any advice they can give me on the following scenario:

Firstly, I am using Visual FoxPro 9, and the task is that I am comparing two data files which have around 276,000 records each; the DBFs are about 50 MB each and the CDXs are 14 MB each.

Table 2 will have the latest data.
Table 1 will be the customers version.

To simplify things, I just scan table 2 and, inside that scan, try to locate the current record in table 1, then update table 1 accordingly.

At the start of the process I have buffering turned on (buffering mode 5, optimistic table buffering) for table 1, and I TABLEUPDATE() at the end of the process, not after every add, change or delete!

My problem is that the process is very slow, and yes, I have checked the tags I am using and done a SYS(3054), etc.
(The process is so slow that I back out, so I never get to the TABLEUPDATE() stage!)

However, when I turn buffering off and write directly to the table (same code), it sails through! I'll give you the timings I am getting: with buffering on, it takes 2 hours just to process 33,000 records out of 276,000; with buffering off, it takes 36 minutes to do the lot! I am confused.
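Roughly, the process looks like this (a simplified sketch; the table layout and the cust_id and amount names are made up for illustration, not my real schema):

```foxpro
* Compare table2 (latest data) against table1 (customer's copy),
* with table1 under optimistic table buffering (mode 5).
SET MULTILOCKS ON               && required for table buffering
USE table1 IN 0 ORDER cust_id   && tag on the common key assumed
USE table2 IN 0

SELECT table1
= CURSORSETPROP("Buffering", 5)

SELECT table2
SCAN
    SELECT table1
    LOCATE FOR table1.cust_id = table2.cust_id
    IF FOUND()
        REPLACE table1.amount WITH table2.amount
    ELSE
        INSERT INTO table1 (cust_id, amount) ;
            VALUES (table2.cust_id, table2.amount)
    ENDIF
    SELECT table2
ENDSCAN

SELECT table1
IF NOT TABLEUPDATE(.T., .T.)    && commit the whole buffer at the end
    = TABLEREVERT(.T.)
ENDIF
```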



 
Hi GuitarSlayer,

The behaviour you are seeing is exactly what I would expect. In fact, I had the same problem myself recently.

When you think about it, the behaviour isn't surprising. With table buffering, VFP has to store every updated record in the buffer. Then, whenever it does the next update, it has to search the buffer to see if the updated record is already there. The more updates you do before you commit the buffer, the longer each one will take.

Basically, this is the price you pay for the ability to revert the updates.

You might get better performance if you use a row-buffering mode, and explicitly call TABLEUPDATE() separately for each (updated) record. On the other hand, the extra overhead of the TABLEUPDATE() might more than cancel the benefit. You'll have to try it and see.
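Something along these lines (a sketch only, with made-up table and field names; you would need to adapt it to your schema):

```foxpro
* Row buffering (mode 3): commit each row as soon as it changes,
* so the buffer never grows beyond one record.
SET MULTILOCKS ON
SELECT table1
= CURSORSETPROP("Buffering", 3)

SELECT table2
SCAN
    SELECT table1
    IF INDEXSEEK(table2.cust_id, .T.)   && cust_id tag assumed current order
        REPLACE table1.amount WITH table2.amount
        IF NOT TABLEUPDATE(0, .T.)      && commit just the current row
            = TABLEREVERT(.F.)          && revert just the current row
        ENDIF
    ENDIF
    SELECT table2
ENDSCAN
```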

When I had this problem, I tried writing the updates to an intermediate cursor, then, at the point I would have committed the buffer, I scanned the cursor, writing all the data back to the physical table (which was not buffered). However, this was just as slow.

The only other solution I can think of is not to use buffering at all. Ask yourself if the ability to revert the updates is more important than the performance. After all, this is a batch process, so it's not as if you need to give the user an undo mechanism.
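That is, something like this unbuffered version (again just a sketch with made-up names):

```foxpro
* Buffering mode 1 (the default): every REPLACE writes straight
* to the table. No revert is possible, but no buffer to search.
SELECT table1
= CURSORSETPROP("Buffering", 1)

SELECT table2
SCAN
    IF INDEXSEEK(table2.cust_id, .T., "table1")
        REPLACE table1.amount WITH table2.amount IN table1
    ENDIF
ENDSCAN
```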

Mike


__________________________________
Mike Lewis (Edinburgh, Scotland)

My Visual FoxPro site: www.ml-consult.co.uk
 
Mike, thanx for your comments. I had already made my mind up, prior to enquiring, that I would leave buffering off, but I just wondered whether there was some trick out there somewhere. Again, thanx very much.
 
The reason it's slow is that you are doing a LOCATE (I am assuming) rather than an INDEXSEEK. LOCATE will start at the top and, if the record is not found, will go to the end of the table. That's what is slowing the process down.
=SYS(3050, 1, <<half the memory on your machine>>)  && foreground memory buffer
SELECT table2
SCAN
    cindexkey = table2.indexkey
    SELECT table1
    IF INDEXSEEK(cindexkey)            && test for a match without moving the pointer
        =INDEXSEEK(cindexkey, .T.)     && found: now move the pointer to it
        **** do something
    ENDIF
    SELECT table2
ENDSCAN
=TABLEUPDATE(1, .F., "table1")

This process for 500,000 records takes about 10 minutes, if not less.
INDEXSEEK, if the record is not found, will not move the record pointer (i.e. go to EOF); the .T. flag moves the pointer to the found record. Without table buffering the process will speed up by about 3-5 minutes. In a program there is no need for buffering.
 
Imaginecorp,

The reason it's slow is that you are doing a LOCATE (I am assuming) rather than an INDEXSEEK. LOCATE will start at the top and, if the record is not found, will go to the end of the table.

I disagree. LOCATE only behaves that way if it cannot be Rushmore-optimised. If the expression matches the index tags (which GuitarSlayer says was the case), LOCATE will be approximately as fast as SEEK or INDEXSEEK.
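For example, you can confirm whether a given LOCATE is optimised before blaming it for the slowdown (cust_id and the filter value here are just illustrations):

```foxpro
= SYS(3054, 1)                  && switch Rushmore reporting on
SELECT table1
LOCATE FOR cust_id = "X1234"
* VFP now reports the optimisation level for table1: "full" when
* a matching tag exists, "none" when it must read sequentially
* to EOF -- which is the genuinely slow case.
= SYS(3054, 0)                  && switch reporting off again
```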

Also, GuitarSlayer pointed out that, by removing the buffering, the problem was solved.

In a program there is no need for buffering.

That's rather a sweeping statement, don't you think?

Mike

 
Mike: with or without Rushmore, do you agree that if the record is not found, LOCATE or SEEK will move the pointer to the end of file (EOF())? As opposed to an INDEXSEEK. So every time a record is not found, the whole file has to be read, and this slows the process down.
As you are aware, I am sure, Rushmore will only kick in for optimizable expressions, and it requires applicable indexes. If not, it will build temporary ones. That also contributes to the slowdown.
In a program, as opposed to a form, where there is no user input and thus the programmer is in total control, there is absolutely no need to buffer tables. Unless the programmer is worried about server timeouts, in which case BEGIN and END TRANSACTION are the candidates.
276,000 records - 36 minutes is too long.
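For example (a sketch only; transactions require the tables to belong to a database container, and the names and the llAllOk validation flag here are made up):

```foxpro
* Unbuffered writes, but wrapped in a transaction so the whole
* batch can still be rolled back on failure.
BEGIN TRANSACTION
SELECT table2
SCAN
    IF INDEXSEEK(table2.cust_id, .T., "table1")
        REPLACE table1.amount WITH table2.amount IN table1
    ENDIF
ENDSCAN
IF llAllOk              && some end-of-batch validation, assumed
    END TRANSACTION     && make the writes permanent
ELSE
    ROLLBACK            && discard the whole batch
ENDIF
```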
 
Imaginecorp,

do you agree that if the record is not found, LOCATE or SEEK will move the pointer to the end of file (EOF())? As opposed to an INDEXSEEK. So every time a record is not found, the whole file has to be read.

No, I don't see that. I agree that the record pointer goes to end of file, but that certainly doesn't require the whole file to be read (though I agree that it will happen with a non-optimised LOCATE, which is not the case here).

Rushmore will only kick in for optimizable expressions and requires applicable indexes.

But the whole point of GuitarSlayer's question was that he did have applicable indexes.

In a program as opposed to a Form, where there is no user input, thus the programmer is in total control, there is absolutely no need to buffer tables.

I see now. You are talking about a program as opposed to a form. That's a bit clearer. (I would have said a batch process as opposed to an interactive one.)

In that case, I tend to agree that there might be an argument for not buffering within a batch process. In fact, I am specifying a program today that includes a very lengthy batch process. I am specifically stipulating that buffering will not be used, for exactly the reasons that I suggested for GuitarSlayer.

However, it is often easier to use the same generic routines regardless of whether a given operation is batch or interactive. In my own class library, I have a data manager that does everything I need to do when updating any kind of data -- including checking for multi-user conflicts, logging the update to an audit trail, and a lot more. The data manager assumes that the tables are buffered. Although I might get a bit more performance if I didn't use it in a batch process, I would have to balance that against the extra cost of performing those functions by some other means.

Mike



 
Buffering in your network card can also create similar problems.
 
Cricket,

Buffering in your network card can also create similar problems.

That's interesting. Can you explain what you mean? I didn't know it was possible to buffer a network card.

Mike


 
I should have been more precise. The buffering is done by the network client, and it should be switched off for maximum data integrity, but new gigabit cards do have significant buffering capability.
 