
pack a dbf in shared mode 2


Chris Miller
Programmer
Oct 28, 2020
Well, at least mostly in shared mode.

An idea for doing PACK without exclusive access is to fill the gaps left by deleted rows by moving undeleted rows into them.
Eventually all deleted records are at the end of the DBF and the file can be truncated. That final step still needs write access - exclusive access - to adjust the reccount in the DBF header, write an EOF byte and truncate the file. But these operations only need a short period of exclusive access.
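
To make the truncation target concrete, here's a worked size example with made-up numbers (header and record sizes vary per table):
Code:
* hypothetical DBF: 296-byte header, 50-byte records, 90 undeleted rows remain
? 296 + 90*50 + 1  && 4797 bytes: header + records + trailing EOF byte (0x1A)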

Anyway, here is the first draft of the idea, which uses SCATTER/GATHER to move whole records:
Code:
Lparameters tcDBF

If Pcount()=0
   tcDBF = Dbf()
Endif

Local lcDeleted, lcAlias
lcDeleted = Set("Deleted")
lcAlias = Alias()

Local lcSuffix, lnFirstDeleted, lnLastUndeleted
lcSuffix = Sys(2015)

Local lcAliasIsDeleted, lcAliasUnDeleted, ('loRecord'+lcSuffix)
lcAliasIsDeleted = 'Deleted'+lcSuffix
lcAliasUnDeleted = 'UnDeleted'+lcSuffix

Set Deleted Off

* Use the passed in DBF file twice for two record pointers on deleted and undeleted records
Use (tcDBF) In Select(lcAliasIsDeleted) Shared Again Alias (lcAliasIsDeleted)
Use (tcDBF) In Select(lcAliasUnDeleted) Shared Again Alias (lcAliasUnDeleted)
tcDBF = Dbf(lcAliasIsDeleted) && normalize to the full path of the now-open DBF for the later FOPEN

Erase (Getenv("TEMP")+"\IsDeleted.idx")
Erase (Getenv("TEMP")+"\UnDeleted.idx")

Select (lcAliasIsDeleted)
Index On Recno() To (Getenv("TEMP")+"\IsDeleted.idx") For Deleted() Additive
Locate
lnFirstDeleted = Recno()

Select (lcAliasUnDeleted)
Index On Reccount() - Recno() To (Getenv("TEMP")+"\UnDeleted.idx") For Not Deleted() Additive
Locate
lnLastUndeleted = Iif(Eof(), 0, Recno()) && 0 if the DBF has no undeleted rows at all

* While the first deleted record is before the last undeleted record, move that undeleted record to fill the gap:
Do While lnFirstDeleted<lnLastUndeleted And ;
      Not Deleted(lcAliasUnDeleted) And Deleted(lcAliasIsDeleted) And ;
      Not Eof(lcAliasUnDeleted) And Not Eof(lcAliasIsDeleted)

   * copy undeleted row with scatter/gather
   Scatter Name ('loRecord'+lcSuffix) Memo
   Select (lcAliasIsDeleted)
   Recall && causes next first deleted row to move to the top of IsDeleted.idx
   Gather Name ('loRecord'+lcSuffix) Memo
   Locate && positions on the row that's the first deleted row now.
   lnFirstDeleted = Recno()

   Select (lcAliasUnDeleted)
   Delete && causes next last undeleted row to move to the top of UnDeleted.idx
   Locate && position on the row, that's the last undeleted row now.
   lnLastUndeleted = Recno()
Enddo
Set Order To 0
Set Index To
Set Deleted &lcDeleted

* post process: truncate the file to remove all deleted records.
* target file size = header + remaining records + one trailing EOF byte (0x1A)
Local lnFile, lnLastByte
lnLastByte = Header()+lnLastUndeleted*Recsize()+1


Use In Select(lcAliasIsDeleted)
Use In Select(lcAliasUnDeleted)
Select 0

Erase (Getenv("TEMP")+"\IsDeleted.idx")
Erase (Getenv("TEMP")+"\UnDeleted.idx")

lnFile = Fopen(tcDBF,1) && buffered write access
If lnFile<0
   * ? "can't truncate DBF"
Else
   * change reccount in DBF header (a 32-bit little-endian integer at offset 4):
   Fseek(lnFile, 4)
   Fwrite(lnFile, Chr(Bitand(lnLastUndeleted,0xff)),1)
   lnLastUndeleted = Bitrshift(lnLastUndeleted,8)
   Fwrite(lnFile, Chr(Bitand(lnLastUndeleted,0xff)),1)
   lnLastUndeleted = Bitrshift(lnLastUndeleted,8)
   Fwrite(lnFile, Chr(Bitand(lnLastUndeleted,0xff)),1)
   lnLastUndeleted = Bitrshift(lnLastUndeleted,8)
   Fwrite(lnFile, Chr(Bitand(lnLastUndeleted,0xff)),1)
   * change the byte after the last undeleted record to EOF (0x1A);
   * Fseek() offsets are zero-based, so that byte sits at lnLastByte-1
   Fseek(lnFile, lnLastByte - 1)
   Fwrite(lnFile, Chr(0x1A),1)
   * truncate file
   Fchsize(lnFile, lnLastByte)
   Fclose(lnFile)
Endif

If Not lcAlias==""
   Select (lcAlias)
Endif

This is programmed with the least side effects in mind: the default path isn't changed to TEMP (which would make some code shorter), the currently selected workarea and the DELETED mode are restored, and the temporarily needed indexes are not generated in the CDX of the DBF but as separate IDX files, which are eventually erased from TEMP.

There are some limitations:
1. With a primary or candidate index, copying records will violate the uniqueness of the index (see the sketch after this list). Big bummer, as every table should have a PK index.
2. The record copying doesn't work with autoinc fields. Another big bummer, as autoincs are quite often used for IDs. This could be avoided by using other PK types, but limitation 1 still applies.
3. SCATTER/GATHER writes a copy of the memo field values: the target record won't simply point to the same offset in the FPT as the source record; instead the FPT changes the same way a REPLACE or UPDATE would, in the worst case causing memo bloat.
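
A minimal sketch of limitation 1, using a throwaway table (all names are made up for illustration):
Code:
* two live records, candidate key on iId
Create Table demo Free (iId I, cData C(10))
Index On iId Tag pk Candidate
Insert Into demo Values (1, "one")
Insert Into demo Values (2, "two")

* copy record 2's values over record 1, as the gap filling would:
Go 2
Scatter Name loRecord Memo
Go 1
* the next line fails with "Uniqueness of index PK is violated",
* because iId = 2 still exists in the source record:
Gather Name loRecord Memo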

Number 1 is the only real problem: 2 and 3 merely restrict the use of autoinc, getting rid of memo bloat can be done afterwards with PACK MEMO, and side effects on CDXes are less often problematic, though they could indicate a REINDEX. On the positive side: in a DBF with very few deleted rows, rearranging the records is much faster than PACK doing a complete dbf/fpt/cdx rewrite.
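
For reference, a hedged sketch of that follow-up maintenance, to be run whenever exclusive access happens to be available:
Code:
* assumes tcDBF as above and that exclusive access is possible right now
Use (tcDBF) Exclusive
Pack Memo && compacts the FPT without touching the records
Reindex   && only if CDX side effects actually became a problem
Use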

Limitations 2 and 3 could be overcome by not using SCATTER/GATHER, but instead using the low-level file operations FREAD and FWRITE to copy records. That leaves the CDX and FPT as they are. But it comes with another, even nastier problem: the CDX tags will still point to the old record numbers, so the indexes become wrong and need a REINDEX.
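
A minimal sketch of that variant, assuming the table is also open in the current workarea (so Header() and Recsize() can be read) and that the FOPEN succeeds; the helper and its parameters are made up:
Code:
* copy record tnSource over record tnTarget byte-for-byte; the first
* byte of a record is the deletion flag, so copying an undeleted row
* into a deleted slot also un-deletes that slot
Lparameters tcDBF, tnSource, tnTarget
Local lnFile, lcRecord
lnFile = Fopen(tcDBF, 12) && read/write, unbuffered
If lnFile >= 0
   Fseek(lnFile, Header() + (tnSource-1)*Recsize())
   lcRecord = Fread(lnFile, Recsize())
   Fseek(lnFile, Header() + (tnTarget-1)*Recsize())
   Fwrite(lnFile, lcRecord)
   Fclose(lnFile)
Endif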

For the sake of completeness: this method of moving undeleted rows from the end to the front of the DBF causes physical reordering. If that plays a role, rethink your programming; tables should be treated as sets of data, and physical order shouldn't play a role. And last but not least: if the code fails to get a file handle for writing, the result can still be considered an optimized table, as SEEKs and LOCATEs start from the top, so after shared packing even without truncation they should be faster again. The net undeleted data is the same with or without truncation.

And by the way: if you RECALL and BLANK DEFAULT AUTOINC instead of doing APPEND BLANK or INSERT, you can recycle the space of deleted rows, too. In that solution autoincs are also no problem. But you only get a DBF free of deleted rows in the long run, once the space of all deleted rows has finally been reused.

Chriss
 
Chriss--

Thank you for the idea and code. Another idea is to use 'record recycling': before you do an insert into the table, first do a SEEK() or LOCATE for a deleted record. If found, RECALL the record and use GATHER to repopulate the record values. If not found, do an INSERT. I think this would be 'safer' than trying to move the records, since another user might be accessing the record that you are moving in shared mode.

Greg
 
Hi Greg,

thanks.

I mention record recycling in my last sentence, too. Instead of GATHER to repopulate, you should BLANK DEFAULT AUTOINC the recalled record.
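
As a minimal sketch of that recycling step (assuming the target table is selected and SET DELETED is OFF, otherwise the LOCATE can't see deleted rows):
Code:
Locate For Deleted()
If Found()
   Recall
   * reset all fields to their defaults, including a fresh autoinc value
   Blank Default Autoinc
Else
   Append Blank
Endif
* ...then REPLACE or GATHER the actual values as usual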

The scenario that someone might be working on a record you shift away to fill the gap of a deleted row should be covered by the primary key, which makes his TABLEUPDATE() update the correct record even though it shifted. If record buffering is off, even more so: I'd also shift all changes done so far. I can imagine some other cases aren't covered, but imagine using it as the finishing move of saving changes: if those involve deleting some records, recycling has to wait for new records to fill the new gaps, while with this you already have an optimized table again. All in all it might not be that necessary and practical, but I remember occasions where PACK caused problems, and this is just a concept demo of doing PACK without file recreation.

The variation without SCATTER/GATHER will need exclusive access again, so that'll be a safe and potentially faster PACK, though it'll require a REINDEX, but on the other hand no PACK MEMO.

Chriss
 
Mike,

yes, that's an idea, thanks. On the one hand I would like this to work generally in the end, and the next approach I'm working on uses FREAD/FWRITE instead of SCATTER/GATHER. Then index violations don't become a problem at all; I'll just need to also update the index leaf nodes with the correct record numbers.

On the other hand, filtered indexes would also be a problem for developers like you who are bound to VFP6. I know VFP9 made some changes to the optimal usage of indexes on DELETED(), including indexes filtered on DELETED() or NOT DELETED(), but IIRC not generally, only for some operations like optimizing COUNT. I'll give it a go to see what it does, but even in the ideal case it would require VFP9, just to be able to use the alternative PACK without negative side effects on Rushmore.

By the way, another thought about your hint at multi-user problems: in the end these operations are like any other shared-mode operations. What's really new is the way a primary key moves to another recno. I might be wrong about how TABLEUPDATE() uses the PK; the cursor properties for the primary key I thought of only apply to views, not to buffered access to DBFs, and then there also still is unbuffered DBF access. I'll not push the envelope for a shared PACK and will concentrate on a fast PACK in exclusive access, but without file recreation.

Chriss
 
Mike Yearwood said:
The primary key is not used for Rushmore
When you have a normal index without a filter, that's your choice. So you use a filtered primary index only to care for uniqueness and a normal one for Rushmore.
Okay. There are further index types like candidate. Maybe that makes RECALL the vulnerable point instead of SCATTER.
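
That split as a hedged sketch, with made-up field and tag names; I use a candidate tag here, which accepts a FOR filter:
Code:
* uniqueness enforced only among non-deleted rows:
Index On iId Tag pk_live For Not Deleted() Candidate
* unfiltered twin tag, fully usable by Rushmore:
Index On iId Tag iId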

See my newer thread; I have another solution there that also circumvents the read-only autoinc problem.


Chriss
 
Yes, uniqueness is the problem to circumvent, and I understand how the filtering helps with that. Still, I've used another solution.

Chriss
 
The REINDEX is necessary in the case of my second alternative, not here. You're trying to fix a problem that's already gone by moving in a totally different direction.
If you mean I'd just need to stick with this and apply your recommendations to make it whole, I can only say I'd not provide a shared PACK that has prerequisites. I thought I had already said that.


Chriss
 
If you'd finally read up, you'd know that I announced working on the CDXes without a REINDEX. There's a new thread.
Also, I was talking about prerequisites for the DBF to be packed; there are none now. An alternative would ideally be an alternative for any DBF, not just for special cases.

Chriss
 
Mike Yearwood said:
I am not interested in

Then, in the sense of what you wrote yourself and what Mike Lewis suggested: if you're not interested, perhaps just ignore it?

Calvin Hsia has given us the necessary toolset to understand CDX tags, and it can be used to modify them, too, especially as I'm only touching the leaf node recno values. I thought you were all up for anything that can improve performance, but then that's that. It's simply your choice to ignore all this, or to take it and continue along your own ideas.

You also ignore the concerns I try to address besides the bad performance of recreating the file triple (in the most general case) instead of manipulating the existing files: the complications that arose since Vista, which may have ceased to exist by now, but which I think can still bite you, even considering the usual recommendations of excluding the dbf/cdx/fpt file extensions and/or data directories from virus scans.

Chriss
 