
Index doesn't match data

Status
Not open for further replies.

MgtHargreaves

Programmer
May 12, 2006
33
GB
I think it is fair to say that in this situation, the expected fix would be to recreate the index. However, we have a situation where the index is actually correct but the data is wrong. It came to light because the customer ran a routine which did a PACK, which recreated the index. Because of the duplicate records, the index was deleted.

Looking in the data (not sorted), we have 3 records with identical information in a field which is the primary key :
[screenshot: Capture4_s99jx8.png]

These 3 records are duplicated - every single field has exactly the same information.

Reading the data using the primary key index, we get :
[screenshot: Capture5_cdezab.png]


Is it possible to find the actual value of the field in the index ?

Thank you
Margaret
 
Hi Mike

Yes, that is what we did. However, these duplicate records have overwritten 2 other records, and I need to know what was in those records.

Margaret
 
I don't think you can recover the data from a deleted record once a table has been packed.

Short of a back-up.

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.
 
I have a backup of the data from BEFORE this all went wrong. So, this backup has the duplicate records - I just need to know what was in the index which is pointing to these duplicate records.

Margaret
 
So, are you saying that the index might still contain the reg numbers of the two overwritten records? And, if so, you could use those values to figure out which records were overwritten?

I don't know any way of finding that out within VFP. The best I can suggest is that you try to hack the CDX file itself.

Open the CDX in a hex editor (I use HxD from Maël Hörz, but there are others). Search for the index expression (in this case, it might be [tt]Reg_No[/tt] or perhaps [tt]UPPER(Reg_No)[/tt]). Immediately after that string, you should see a lot of binary zeroes, followed by the actual reg numbers.

If possible, copy those numbers out of the hex editor and place them in a text file (HxD lets you do this). It should then be possible to look for the two reg numbers that are missing from the actual data. You could do that by getting the data from the text file into a cursor (within VFP), and using [tt]SELECT ... NOT IN (...[/tt] to find the values in question.
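The comparison step Mike describes might look something like this in VFP. This is only a sketch: it assumes the reg numbers have been saved from the hex editor into a file [tt]keys.txt[/tt], one value per line, and the table name [tt]vehicles.dbf[/tt] and field width are placeholders.

```foxpro
* Sketch only: keys.txt (one reg number per line) and the table name
* "vehicles" are assumptions - substitute your own names and field size.
CREATE CURSOR csrKeys (reg_no C(10))
APPEND FROM keys.txt TYPE SDF

* Find values that are in the extracted index keys but not in the data
SELECT reg_no FROM csrKeys ;
    WHERE reg_no NOT IN (SELECT reg_no FROM vehicles) ;
    INTO CURSOR csrMissing
BROWSE  && these should be the two overwritten reg numbers
```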

I know this sounds a bit complicated, but I can't think of any other solution. Give it a try and come back if you get stuck.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Hi Mike

Yes, I believe the index does contain the reg numbers of the two overwritten records, simply because of this (reading the file using the index) :
Capture9_yi185a.png

and the fact that I have identified which 2 records are missing using other data supplied by the customer.

What I was hoping to do was put some code into the start-up routine of the system, which could check the value in the reg_no field in the data against the reg_no field in the index. Then, if they are not the same, clearly we have a problem. This problem has only happened 4 times in 20 years, and for just 2 of our 160 customers - so not a big issue. However, the customer is not happy, and we are trying to come up with a way of identifying when this happens.

I had thought of looking in the CDX file using a hex editor, but hoped there might be a simpler way to do it in code.

Thank you

Margaret
 
What I was hoping to do was put some code into the start-up routine of the system, which could check the value in the reg_no field in the data against the reg_no field in the index.

No, I don't know any way of doing that.

How about saving a snapshot of the reg numbers in a DBF: just the one field, in a free table. You would update it every time you insert or delete a record in the main table, or change the reg no field. Then, at start up, you can look for values in this table that are not in the main DBF, thus indicating that the problem exists.

In effect, what you would be doing here is creating a shadow copy of the index, but in the form of a DBF, which is much easier to search and compare.
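The start-up check against such a shadow table could be as simple as this sketch. The table and field names ([tt]regshadow.dbf[/tt], [tt]vehicles[/tt], [tt]reg_no[/tt]) are placeholders, and it assumes the shadow is kept up to date by your insert/delete/update code as described above.

```foxpro
* Sketch only: regshadow.dbf is a hypothetical one-field free table that
* your application updates alongside the main table.
SELECT s.reg_no FROM regshadow s ;
    WHERE s.reg_no NOT IN (SELECT reg_no FROM vehicles) ;
    INTO CURSOR csrSuspect
IF _TALLY > 0
    * These values were saved at some point but are now missing from
    * the data - a sign that records have been overwritten or lost.
    MESSAGEBOX("Possible corruption: " + TRANSFORM(_TALLY) + ;
               " reg number(s) missing from the data")
ENDIF
```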

Just a thought.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Calvin Hsia once wrote an article about (re)creating data from an index.
But in an index on reg_no you'll only find the reg_no; other fields would need to be recovered from other indexes.

If you have backups, then you likely also have earlier backups, I would identify changes between backups until you find the ones related to the reg_nos that were overwritten.

The screenshot you posted shows a duplicate value. That's in the data. Does SEEK point to a record with a different value than the value sought? Then you'd have a case where the index still knows an old reg_no and points to a wrong record number (because that's what all index nodes - primary, candidate, unique or regular - point to in the end).

Your customer has no reason to be "unhappy", because backups are the state of the art for recovering from such data losses, and if he's not willing to do a restore because that puts data back a few days, then he's not appreciating what you can achieve with a restore plus the current data combined.

I can't even find Calvin Hsia's blog article; he may have purged a lot of the VFP-related ones, I don't know. But it was merely meant as education about the index structure, not as an instrument to read data from an index.

What you can do without knowing the index structure is seek guessed values and position to the record number the index knows. But in a packed table that's not telling you this record is overwritten; it tells you this record number once had that reg_no, but the record number it had was from before the pack. You'd need the combination of older data and an index that's in sync with that DBF. If that still pointed to a record with a different value than you seek, then this is an indicator of a broken index, or more likely an index that's intact but out of sync for some reason.

Is there any use of an unofficial ODBC driver? Or a driver only valid for old DBF formats? If the customer uses such a thing, that could be causing such index outliers, and that's what has to stop.

Chriss
 
Hi Chris

Just knowing the Reg_no (from the index) would be enough, because the customer has paper records to find out the rest of the information. That is the reason I was interested to know whether there was some 'easy' way of getting that information from the index in a routine which could be run frequently.

Unfortunately this issue happened in August 2021, but has only just come to light because a routine was run which does a pack - thereby recreating the index.

I am fairly certain - because of the records before and after these 'duplicates' (they are added in identifiable batches), that the problem happened when the records were being written to the file, rather than at a different time.

Margaret
 
the problem happened when the records were being written to the file

If that's the case, how should the index have the right reg_nos if they were never stored? If you want to recover info from the index, it needs to have been in there in the first place, with the data changed later without the index being updated - only then could the data have survived in the index. There's no logical way the index would have a different value than the DBF otherwise: a value always has to enter the DBF before it is added as an index node. And differences could only happen when the files are not kept in sync for some reason (like network quirks).

If you're lucky, someone can remember Calvin Hsia's blog article. Otherwise, you could SEEK all values between 21KF02077 and 21KF02080 and see whether that gives a FOUND() at all - if so, such a value once existed in the DBF. I see some ranges that could be checked like that. It doesn't matter if the found record does not have the sought value; the FOUND() indicates this value was in the index and thus once in the data.
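A minimal sketch of that SEEK loop, using the example range from this thread. The table name [tt]vehicles[/tt] and tag name [tt]reg_no[/tt] are placeholders for whatever your application actually uses.

```foxpro
* Sketch only: table/tag names are assumptions, the range matches the
* example reg numbers discussed in this thread.
USE vehicles ORDER TAG reg_no
FOR lnNo = 2077 TO 2080
    lcReg = "21KF0" + PADL(lnNo, 4, "0")   && e.g. "21KF02077"
    IF SEEK(lcReg)
        IF ALLTRIM(vehicles.reg_no) == lcReg
            ? lcReg + " exists in the data"
        ELSE
            * FOUND() but the record holds a different value:
            * the index node survived, the data was overwritten.
            ? lcReg + " is in the index only (record " + ;
              TRANSFORM(RECNO()) + ")"
        ENDIF
    ENDIF
ENDFOR
```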

But again, you can only hope to see such things if you have them in a backup of the index, and if you have that, you also have a backup of the data, don't you? It's much easier to see what data you had in earlier backups.

If I understand you right and you're saying the correct reg_nos never even once entered the data, then there is nothing that could reconstruct them. How could there be knowledge of things that were never saved to a DBF and its index?

Chriss
 
Margaret,

One thing we do is to bring our tables into a container and use update, insert, and delete triggers. These will log the changes to each field to a file we call system_logger.dbf.

It slows down data operations, but if you're live and people are doing record entry through an input form, it won't be that noticeable. In addition, we have it call stored procedures which test for the existence of a logical memory variable with a name like "glBypassSystemLogger", and if it's found and is set to .t. we skip the logging process, otherwise we log.

In this way, our system_logger.dbf file slowly grows, and keeps a record of the old value, the new value, the date and time changed, who changed it, and if there's an internal process number for whatever is running presently it logs that too.

Depending on the system it might be overkill to have something like that. But, our systems process financial data and we keep full logs of everything, including user actions on forms, etc. It's saved us many times in reconstructing how something happened, or why some data was the way it was, etc.
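A bare-bones sketch of the trigger arrangement Rick describes, assuming the tables live in a DBC. The names ([tt]vehicles[/tt], [tt]system_logger[/tt], [tt]LogChange[/tt]) and the logger's field list are placeholders; a real implementation would log every changed field, as Rick says.

```foxpro
* Sketch only: table, procedure and field names are assumptions.
* With the database open, attach a stored procedure to each trigger:
CREATE TRIGGER ON vehicles FOR INSERT AS LogChange("I")
CREATE TRIGGER ON vehicles FOR UPDATE AS LogChange("U")
CREATE TRIGGER ON vehicles FOR DELETE AS LogChange("D")

* In the DBC's stored procedures:
PROCEDURE LogChange(tcAction)
    * Honour the bypass flag Rick mentions
    IF TYPE("glBypassSystemLogger") = "L" AND glBypassSystemLogger
        RETURN .T.
    ENDIF
    * OLDVAL() returns the pre-edit value when buffering is enabled
    INSERT INTO system_logger ;
        (action, old_val, new_val, chg_when, chg_who) ;
        VALUES (tcAction, OLDVAL("reg_no", "vehicles"), ;
                vehicles.reg_no, DATETIME(), SYS(0))
    RETURN .T.   && .T. allows the triggering operation to proceed
ENDPROC
```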

--
Rick C. Hodgin
 
That's very good advice, Rick.

And happy to see you back, by the way.

So Margaret, if you want to be safer against such data corruption or loss, don't look for ways of retrieving data from indexes or tools for fixing corrupt tables. I know there is always pressure to have the most recent data.

But in this case you're saying the data itself already has some age, it just recently got corrupted.

What Rick suggests is also known by the name of an audit trail, and there are ready-to-use stored procedures for it. But it requires a DBC, as only a DBC lets you react to any kind of change - whether made manually in a BROWSE by you the developer or an admin, or by any code run from the Command Window. Triggers and stored procedures are the only mechanism that covers all of those, and only a DBC offers them.

I'm sure right now you're still more interested in getting back data from what you have. I'm still awaiting the answer to what you think you can find in the index but not in the DBF. Taking your title into account, you have already realised the index doesn't match the data - so are records missing from it? Then you surely won't find more in the index than you find in the DBFs or backups. Or do you find a record by seeking a value, and the found record contains another value? If that is the mismatch, then you have an outdated index node, which points out that the value you seek was at that record number in an older version of the DBF, e.g. before the PACK.

Let me also point out how to do backups in a way that lets you sort out errors that don't show up immediately, or only after a few days. It's actually the common-sense way to do backups: the grandfather-father-son rotation scheme.
I don't know what you're doing, but I've seen too many people only keeping backups for the last week, and that's asking for disaster.

Chriss
 
Thank you everyone for your input.

The facts:
[ul]
[li]The problem happened 25th Aug but only came to light last week, because the customer ran a routine which did a 'pack'[/li]
[li]The data was written to the database, but then overwritten - both these events happened on the 25th Aug[/li]
[li]Doing a SEEK on the 5 possible missing records does show that 3 don't exist, but 21KF02079 and 21KF02080 (the 2 I had previously identified as the missing records) point to the 2 'duplicate' records - proving that the data was in the database, but has been corrupted.[/li]
[/ul]

We have specialist vertical market software used by 160 customers - some multi-user but most single-user. This problem has only happened 4 times in the last 20 years, with 2 customers (both multi-user, and 3 of the occurrences by the same customer). I believe that the issue happened when 2 users were inputting at the same time. Clearly multiple users inputting at the same time is a common occurrence, so something else must have happened as well - as Chris said in an earlier post, 'like a network quirk'.

Backups are carried out by the customer. We recommend daily backups, but can't force people to do them. In any case - it is unlikely that if backups were carried out daily, the backup from over 3 months ago would still be available.

The reason for this post was to find out if there was a way (in code) to pick up the value of Reg_no from the Index, so that we could put some checks into the system to identify when problems happened immediately, rather than 3 months later. This would then also give us the value of reg_no for the missing records - so that the customer could reinput the data for those missing records. Mike has already indicated that this is not possible. However, I do like Rick's suggestion of an audit trail, and we will look to implement that.

Thank you again for all your input.

Margaret
 
The reason for this post was to find out if there was a way (in code) to pick up the value of Reg_no from the Index ... Mike has already indicated that this is not possible.

Well, to be exact, I said that VFP itself does not provide a way to do that. But it would still be open to you to hack the CDX. Not on an ad hoc basis as I first thought, but rather to write a generic program that would go through a CDX, extracting every value for a given index expression (or tag).

That would meet your needs, but of course you may well decide it was not worth the effort.

However, I do like Rick's suggestion of an audit trail,

So do I. This is what I was suggesting in my previous post in this thread. But Rick improves on the idea by suggesting the use of triggers, which makes good sense. Margaret, I would suggest that you seriously consider this idea.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Thanks for that feedback, Margaret,

that's much more thorough than what we usually get.

Regarding backups, I showed you how a rotation scheme should work. You wouldn't keep every day from the last 3 months; instead you'd have, say, one day from each of the last 12 months, then one day from each of the last 12 weeks, and every day of the last week.

The possibility of an overwrite by parallel editing/entering of data only exists when you don't have a primary index (or a candidate index, for old/free DBFs), so adding one would be another way to prevent this from happening in the first place.

And finally, you have understood how seeking a missing value and finding it in the index can point out that the value was once in there. At the same time, it's a rare occurrence and not a good way of finding such overwrites: most overwrites will also wipe out the evidence from the CDX, because when the index is correctly updated along with the DBF, the overwritten reg_no value is also removed from the index. It would obviously be bad if an index pointed to any record that has or once had some value; the index only reflects current values, not past ones.

A check you can do at any time is determining gaps in reg_no and seeking their values to see if they are in the index. But I don't think it's worth it, unless you're really having a special interest of finding such index errors. Doing an audit trail is the much better option.

The point is: an index seek of missing values is likely not to reveal all overwritten data. That raises the question of why parallel data entry could cause such a thing in the first place.

If you bind to tables directly and create a new record with APPEND BLANK, you can be sure only that one user has a record pointer to that record number; another user doing an APPEND BLANK adds a further blank record and points to that, so they can't overwrite each other. If you work with local or even remote views, SPT or a CursorAdapter, you don't have that guarantee - but a solution using INSERT to add the newly entered record would also not overwrite existing records; INSERTs work even when done massively in parallel.

You might not be able to do multiple APPEND BLANKs at the same time, as that introduces duplicate (blank) key values. Which is a reason to have automatic values for columns that don't allow duplicates - and that points back to primary/candidate indexes, plus a self-determined value like an autoinc id, so multiple users can APPEND BLANK before they enter the reg_no, and then get warned on saving if the reg_no is already in the data under another autoinc id.
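The contrast Chriss draws can be sketched as follows. The table and field names are placeholders, and [tt]lcNewReg[/tt] stands for whatever value the user has just entered.

```foxpro
* Sketch only: "vehicles", "reg_no" and lcNewReg are assumptions.

* APPEND BLANK: each user gets a private pointer to a fresh record,
* but the reg_no arrives in a second step via REPLACE - a candidate
* or primary tag is what catches a duplicate at that point.
APPEND BLANK
REPLACE reg_no WITH lcNewReg

* INSERT: the row arrives complete in one step, so a primary or
* candidate tag on reg_no can reject a duplicate immediately,
* even with many users inserting in parallel.
INSERT INTO vehicles (reg_no) VALUES (lcNewReg)
```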



Chriss
 