
Indexing large DBF file > 3500 meg

Mar 20, 2006
When some of my DBFs grow over 350 meg, with 3 to 4 million records, I see problems creating indexes for them. Has anyone encountered this type of anomalous behavior with large DBFs and their indexes? I am running Clipper 5.2e in an NT environment. Generally everything works fine, but I see various index issues with the largest of the files I create.

Thanks
 
There's a rather crucial piece of info missing here: what RDD are you using?
a) DO NOT, repeat, DO NOT use DBFNTX for database sizes like this.
b) If possible, use the DBFCDX, SIx, or Comix drivers, with CDX indexes or their native index types (a minimal sketch of the switch follows below).

If speed is a problem, switch to xHarbour, using their superb-speed DBFCDX implementation.
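
For reference, moving a Clipper 5.2 program to DBFCDX mostly comes down to requesting the driver and naming it (or making it the default) when the table is opened. A minimal sketch, where the table and field names are placeholders, not anything from your app:

   REQUEST DBFCDX                         && link the DBFCDX driver in
   RDDSETDEFAULT( "DBFCDX" )              && make it the default RDD

   USE bigtable NEW                       && now opens via DBFCDX
   INDEX ON DTOS( ddate ) + id TAG main   && tag lives in BIGTABLE.CDX

Link the DBFCDX library as your linker setup requires.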

HTH
TonHu
 
TonHu, thanks for the reply; the RDD is DBFNTX. Nothing is missing per se: we process in over 500 subdirectories, and I am creating some global processes that pull all of the data into one set of DBF tables. We have trouble creating NTXs for the biggest table, and I was looking for a workaround. In the short term the data is going to get pushed to a web app, but I still need to run various reports against the combined tables. I'll try switching the RDD to DBFCDX as we create these tables and access the data for the reports.
What I see is that the index operation either runs through or it hangs; if it runs through, the index is functional.

Is there an upper limit on size or record count for the DBFNTX RDD, or is this just something that can occur at some point as the table grows? Also, does DBFCDX have size or record-count constraints? One last thought: does xHarbour have any advantages over Clipper as a straight reporting tool, piping data to Excel, stuff like that?

Thanks,
 
Hi, JCreamerII

I agree completely with TonHu.

However, to answer your question: the problem is not with the absolute number of records, but rather seems to be due to memory/tempfile bugs in DBFNTX index creation. It depends on how complex the indexes are and how long the keys are, rather than on the actual number of records.

As quick workarounds you can try:

- Making the index creation a standalone compiled module that does nothing else but create the indexes, to reduce the code footprint and allow maximum memory for index work space (a sketch follows after this list)

- Using the E0 option in the SET CLIPPER environment variable, because otherwise the problem is worse.
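
A bare-bones standalone indexer might look like the sketch below; the table name and key are placeholders, and the SET CLIPPER line goes in the batch file that launches it:

   SET CLIPPER=F252;E0

   * MAKENTX.PRG - does nothing but rebuild the index
   FUNCTION Main()
      USE bigtable EXCLUSIVE NEW
      INDEX ON DTOS( ddate ) + id TO bigtable
      USE
   RETURN NIL

Compiling and linking this on its own means no other code competes with the indexing for memory.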

xHarbour is better at NTX indexing, and much quicker, so at least you will find out your indexes are no good that much quicker (grin), but CDX indexes are more reliable and also reduce overhead in both environments.

I have also found xHarbour to be at least 10 times quicker in import and export operations with delimited files (APPEND FROM and COPY TO); in some cases closer to 50 times faster (converting tabs to commas, then importing). Both commands are sketched below.
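
For anyone who hasn't used them, those are the standard commands in both Clipper and xHarbour; the file names here are just illustrations:

   USE bigtable NEW
   COPY TO export.txt DELIMITED       && write comma-delimited text
   APPEND FROM import.txt DELIMITED   && pull delimited text back in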

Jock
 
Jock, thanks for the reply. I had already gone to a standalone index routine, and that didn't help. The index key was a compound of a DTOS() date field and a 10-char ID field. I switched to an alternate 6-char ID and got it to work, but it isn't as clean programming-wise (roughly as sketched below). I do think it is a function of records * key size, because the index functions fine on a more limited number of records. I like the sound of the CDX indexes, so I think I will switch to that RDD and see if that allows us to continue to function as is. I haven't looked at xHarbour yet; we're moving to a web-based platform in the future, and I have a couple more years left with this legacy DBF structure before I can migrate this app to the web-based solution.
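
Concretely, the difference between the failing key and the working one would look something like this (field names are hypothetical; DTOS() always yields 8 characters):

   INDEX ON DTOS( ddate ) + id10 TO bigtable   && 18-byte key: hung on ~4M records
   INDEX ON DTOS( ddate ) + id6  TO bigtable   && 14-byte key: completed fine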

I am just trying to get a better understanding as to why some of this stuff reacts the way it does. It's just harder to plan an attack on a project if I don't know how the program is going to react when completed.

Thanks to you & TonHu for your thoughts,
 
Hm, the vanilla Clipper 5.x DBFNTX RDD is, in one word, CRAP. (I've said this often before.)
This goes for quality, performance, and stability. With file sizes like these, do not be surprised if, after deleting one record, most of your data seems to have vanished. It's actually still there, but there is a ^Z (0x1A) in the 'deleted' byte that exists in every record of the DBF. Fix it to a '*' and the record is still deleted, but the rest of the file is accessible again :D
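
If you ever need to patch that byte by hand, it is the first byte of each record; a low-level sketch, assuming the table is offline and that the file name and nRec (the damaged record number) are placeholders:

   // flip the deleted-flag of record nRec from ^Z back to '*'
   LOCAL nH, cBuf := SPACE( 32 ), nHdr, nLen
   nH := FOPEN( "bigtable.dbf", 2 )             // FO_READWRITE
   FREAD( nH, @cBuf, 32 )                       // fixed part of the DBF header
   nHdr := BIN2I( SUBSTR( cBuf,  9, 2 ) )       // header length, bytes 9-10
   nLen := BIN2I( SUBSTR( cBuf, 11, 2 ) )       // record length, bytes 11-12
   FSEEK( nH, nHdr + ( nRec - 1 ) * nLen, 0 )   // first byte of record = flag
   FWRITE( nH, "*", 1 )
   FCLOSE( nH )

Do this with no RDD holding the file open, or you invite worse corruption.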
The 'SET CLIPPER=F252;E0' environment setting is required for DBFNTX to even produce a correctly working index file. (Even CA confirmed this as being the only working 'solution' to the corruption problem.)

As said before: you are far better off, even with Clipper, switching to DBFCDX.
And as Jock already confirmed: need speed? Go the xHarbour way!

HTH
TonHu
 
TonHu, thanks for the specifics. There is zero volatility in the large files I mentioned. They are aggregate files pulled nightly from a number of subdirectories, and then the DBFs are passed to another process using VB and SQL to parse the data to a web app. I do intend to switch to the DBFCDX RDD ASAP, for the reasons you mentioned and to allow me to sleep at night.

Thanks again
 
I'm having similar problems with a history index. It's only 600,000 records. It's written in Clipper Summer '87 and has been working for the last 10 years. It's on a 2003 server with Citrix - three offices in three cities and about 25 users. When you look at the history screen, on about 100 clients the history is gone, but it will reappear when the file is reindexed. At times it will leave off part of a client's history. I noticed TonHu mentioned that the Clipper setting in the autoexec.bat file should be SET CLIPPER=F252;E0. In my autoexec.bat file I have
SET CLIPPER=S1
SET CLIPPER=F96;R16:V10
Do I need to add the F252;E0 as a separate line or on the end of one I already have? Does it work for Summer '87?
Also, do I need to put in the COMMIT command? I have never done this, as in most cases I lock the file, append, unlock, and close the databases. Thanks for your help.
 
The Clipper environment setting should be combined into one line, not two separate lines, as the last would replace the first (see the example below). But the E0 setting doesn't apply to S'87 AFAIK, because it has a much older NTX driver which isn't replaceable to start with. It is known to lose indexes from time to time, but I guess the OPLOCKS problem, from running on a Windows server, is the real problem you are seeing. Search this forum for OPLOCK and you should be able to find the solution. Note that the advised settings must be applied to both the server _and_ the client PCs, and they have different parameters!
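
By way of illustration, combining your two lines into one would give (note the colon in R16:V10 looks like it should be a semicolon):

   SET CLIPPER=S1;F96;R16;V10

The F252;E0 options belong to the 5.x driver and, per the above, wouldn't help Summer '87. As for oplocks, the registry values commonly cited for this (see Microsoft KB 296264; verify before applying) are:

   Server:  HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters
            EnableOplocks (REG_DWORD) = 0
   Client:  HKLM\SYSTEM\CurrentControlSet\Services\MRXSmb\Parameters
            OplocksDisabled (REG_DWORD) = 1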

HTH
TonHu
 
emucode, I had a similar issue; it ended up being a Windows issue, server and desktop. Windows wasn't writing every record back to disk. If you looked at the DBF in dBASE it didn't show any of the extra records, but if you looked at the file from Excel you could see the data. The network guys fixed the problem; I'm not certain what they went through to fix it. We keep this app on an isolated server that doesn't get updated, so that I can keep it working 24/7. From time to time these Clipper apps can react differently to changes in the desktop or network environments. And Summer '87 is a twenty-five-year-old database with a twenty-five-year-old language.
 
JCreamerII,

That's the OPLOCK problem I tried to describe ;-)

HTH
TonHu
 