
Index Advice


Scott24x7

Programmer
Jul 12, 2001
2,826
JP
Hi all,
I was thinking about best practice for index creation. I remember some big index "gotchas" from my Fox 2.6 days, and I suspect some of them have gone away in the later releases.
I'm using VFP9 with all the patches up through and including Sedna.

I noticed in some of my earlier VFP apps still floating around that, for the "primary ID" of any major table, I used Character data types holding numeric values. The plus side, as I recall, is that it is then very easy to make compound keys for references into tables, like clientid+contactid (where each is a 7-digit number, so as an index key they look like 10000071000067, with 1000007 being the client ID and 1000067 the contact ID).
The upshot is that if you just give it the client ID for something like a DO WHILE, you could make this stupid fast by first doing a SEEK on the client ID and then following with DO WHILE CLIENT.CLIENTID = CONTACT.CLIENTID <blah blah foo foo>.
What was nice about this, as I recall: NO FILTER CONDITIONS.
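A rough sketch of that old pattern (table, tag, and field names are invented for illustration):

```foxpro
* CLIENT and CONTACT tables with CHAR(7) key fields (illustrative)
SELECT contact
SET ORDER TO TAG clicon      && built as: INDEX ON clientid + contactid TAG clicon
SEEK client.clientid         && lands on the first contact for this client
DO WHILE contact.clientid = client.clientid AND NOT EOF()
    * ... work with the current contact record ...
    SKIP
ENDDO
```

Because the index is ordered on clientid first, the SEEK positions on the first matching child and the DO WHILE walks only the contiguous matches, with no filter condition at all.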
But... does that matter now in VFP?
I had started using INT values for my primary/candidate key fields. But now I find that if I try to do a compound index on them, it doesn't work, because clientid+contactid is now arithmetic: 1000007+1000067 = 2000074 instead of "10000071000067". On the other hand, I can still do a DO WHILE CLIENTID = CONTACT.CLIENTID, BUT now I lose having them in order. The other thing I recall being VERY nice was that I could index on CLIENTID+LASTNAME, which let me keep a CLIENTID relationship set with the parent, BUT also get the ORDER I wanted on the table without having to use a separate index.
Probably some of you are groaning and thinking what a horrible idea... I'm OK with that, but I'm looking for the "modern technique" then.
My other thought was to convert the INT values at index time, with something like ALLTRIM(STR(CLIENTID))+ALLTRIM(STR(CONTACTID)) as the compound index expression. The annoyance there is that you then have to convert all the attempted seeks on those values to STR values as well before you go looking for them: SEEK ALLTRIM(STR(CLIENTID)), for example.
Both seem... messy to me. So I feel I am probably missing the elegant solution here.
Since my enterprise app is starting to take shape, before it gets too far down the track on one philosophy, I'm very anxious to hear what those of you who have been working in this for yonks now would suggest.


Best Regards,
Scott
ATS, CDCE, CTIA, CTDC

"Everything should be made as simple as possible, and no simpler."[hammer]
 
I think the consensus would be to use numeric.

Personally, in VFP, I use character strings - I don't know why; just habit.

The only gotchas for VFP indexes are file size (max 2GB), and index key length which is 200 chars if I remember correctly.



Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are not good for you.
 
Scott, here is my EUR 0,02 worth:

Every one of my tables has an autoinc integer primary key, always named ID. I can't tell you how much hassle this saves me. That's partly because I never have to think: is this table's PK CustID or CustNum or CustomerNumber or what? It is always simply ID. It also helps with designing data-related classes. For example, I have a data-aware form that takes a table name and an ID as parameters. The form then displays the specified record (whose ID is passed) in the specified table. I don't have to tell it the name of the ID field. I know this doesn't sound very exciting, but it has helped me out many times.
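A minimal sketch of what such a generic record-display call might look like (the form name, alias, and parameter order are all invented for illustration; this is not Mike's actual class):

```foxpro
* Caller: show customer record 1234 in a generic record form
DO FORM ShowRecord WITH "customer", 1234

* In ShowRecord's Init event (sketch):
LPARAMETERS tcTable, tnID
USE (tcTable) IN 0 AGAIN ALIAS recview   && open the requested table
SELECT recview
LOCATE FOR id = tnID                     && works because every PK is named ID
```

The point is that the form never needs to be told the PK field name; the naming convention carries that knowledge.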

Regarding compound keys, like ClientID + ContactID - I used to use these a lot, but nowadays I rarely do. I can't give you a good reason for that. Simply, I find it easier to achieve the same goal with a join. Similarly with your DO WHILE example ... well, I tend to do this sort of thing less and less now. I usually find I can achieve the same thing with some basic SQL.

One other small piece of advice: If you do use character keys, keep them short. For example, if you are indexing on last name (because you want reports, etc. to be in last name order), keep in mind that an 8-character key is considerably more efficient than a 16-character key. Now, you're going to tell me that there are a large number of names in your table that exceed eight characters. What I'm saying is that you should weigh the inefficiencies of a longer key against the small imperfections in sort order that will arise from truncating the name in this way.
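As a sketch of the trade-off Mike describes (field and tag names invented):

```foxpro
* Shorter key: order by only the first eight characters of the name
INDEX ON LEFT(UPPER(lastname), 8) TAG name8
* Names that agree in their first eight characters tie and fall back
* to record order there; usually an acceptable imperfection in a
* report sort, bought at a much smaller index key size.
```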

Just a few thoughts. Feel free to disagree.

Mike



__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I am with Mike on naming primary key fields so they are easy to remember. However, I name my primary key fields "k_" plus whatever the name of the table is. Like Mike, I then never need to remember what a primary key is for any given table. In addition, I avoid any collisions when I need a secondary key in a child table pointing back to the parent table as I just name the secondary key in the child table the same name as the primary key in the parent table.

As far as field type is concerned, all of my keys are character keys. I prefer them rather than numeric keys. To me, for my uses, character keys are easier to use. But, as Mike noted, one must be careful with the length as there can be a noticeable time delay when they are too long.

Like Mike I seldom use compound keys. It seems to be much easier for me to use SQL whenever I need to extract information from tables. And, if I remember correctly, when I did use compound keys I had issues with sluggish performance.

mmerlinn


Poor people do not hire employees. If you soak the rich, who are you going to work for?

"We've found by experience that people who are careless and sloppy writers are usually also careless and sloppy at thinking and coding. Answering questions for careless and sloppy thinkers is not rewarding." - Eric Raymond
 
Mike & Griff,
Thanks. It's funny, but I've been back and forth on it as well. In my "new" application, I had created all primary key indexes as Int (AutoInc). I totally get the advantage of this from my previous work both with VFP and with Fox 2.x. One thing I remember from my 2.x days is that fewer indexes is good (i.e. it is better to have one index on CustomerID+CustLastName than to have two: CustomerID and CustomerLastName, or CustomerID and CustomerID+CustomerLastName). Like Mike, I always use ID as well, but with the table name, because I always know what table I'm using, even in a tabbed form (I set a property on each page called "ActiveDBF", so I never have to worry; I just macro-sub in "ActiveDBF.ActiveDBF+'ID'", you get the idea), and Bob's your uncle.

I also always convert all indexes on character values to UPPER() and then use UPPER() versions to seek against.
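That is, something along these lines (field and tag names invented):

```foxpro
* Case-insensitive index and matching seek
INDEX ON UPPER(lastname) TAG ulastname
SET ORDER TO TAG ulastname
SEEK UPPER("smith")      && finds "Smith", "SMITH", "smith", ...
```

The seek expression must be transformed the same way as the index expression, or the lookup silently misses.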

But... a lot of these issues have been made better/obsolete by VFP. Some of the improvements I know exist, but I haven't the experience to implement them, so I'm being avoidant as I try to keep my initial development cycle short: until I at least have a working "version 1" we have nothing, and a "broken" or "slightly inefficient" application is better than no application. We can focus on those issues later.
I also remember my experience with FoxPro on a 386 machine, where I was able to make some ridiculously fast systems that contained millions of records, so the notion of "the old way" given current hardware (I think my current PC is somewhere around 17,000x faster than the box I used in 1994, and that's not an exaggeration) is probably OK when dealing with slower indexes. I hated having to have routines for generating the next ID values (which are messy with optimistic buffering, and draconian with pessimistic buffering, as I recall). One of the key ways of coping was updating only a single index on a table, even if it needed to be 3 compound fields deep (though the first was ALWAYS unique; the others were "attached" for sort-order simplicity elsewhere). When you start getting millions of records, updating 2 or more indexes gets ugly.

So another question then: do you use VFP's DataEnvironment to manage relationships? Or do you do it in your forms, where you have more control?


Best Regards,
Scott
 
Some things are slowly coming back to me, so bear with me, as I think I just had an "ahhh" moment.
For example, I have several grids on the main tab, which have the purpose in life of showing what the child/grand-child/great-grand-child (and may get deeper) relationships are and their contents for easy navigation. (Perhaps that now sounds counter-intuitive, but it makes for quick record location of some very complex relationships, sometimes Many:Many.)

So instead of using a ControlSource for the grid, are you doing something like, in the Init, "SELECT <field1, field2, field3> FROM <sourcetable> WHERE <someID = some value>" in the RecordSource, with the RecordSourceType property set to 4?

(Is RECNO() one of the items selected, so you know how to "find" the record when it is picked? Or do you select all the primary key elements you need to find it, so a double-click can take you to that record?)
Then the grid is just there to be filled, and at the same time you could populate some field with alternating .F. and .T. values for dynamic grid back-color control? (Sorry, I'm jamming about 3 or 4 posts together now, but most of you have been following my groanings on this for a few days. :)

Best Regards,
Scott
 
Three simple answers:

1. Integer fields can be used in compound indexes via BINTOC(intID). Code has to be adapted, of course.
2. You can still SEEK the first ID and SCAN REST; I don't know why you think this would not go in the same order as before! BINTOC(intfield) without any of the further parameters converts correctly for index usage: an int becomes 4 chars, so the key doesn't even get longer.
3. You can still make use of partial index expressions for relations.

You have some erroneous memories, too: ALLTRIM(STR(ID1))+ALLTRIM(STR(ID2)) results in "123" both for ID1=1, ID2=23 and for ID1=12, ID2=3; you can't accept that collision of the same index expression value for different ID combinations. You would rather do only STR(ID1)+STR(ID2), making use of STR()'s padding to 10 chars and keeping the single IDs separated, as with the solution storing numbers in char(7) fields, which also automatically pads them. It doesn't matter whether the padding is left or right. In fact, VFP7 had errors in indexes on STR(). Again, BINTOC() comes to the rescue.
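Putting the BINTOC() points together, a sketch with invented table, tag, and field names:

```foxpro
* Compound index over two integer keys via BINTOC()
SELECT contact
INDEX ON BINTOC(clientid) + BINTOC(contactid) TAG clicon
* Each integer becomes a fixed 4-byte string, so the pair is 8 bytes
* total and the two components can never run into each other.

* Seek all contacts for one client, then walk the matches in key order:
SET ORDER TO TAG clicon
SEEK BINTOC(client.clientid)
SCAN REST WHILE clientid = client.clientid
    * ... process the matching contact record ...
ENDSCAN
```

This keeps the fixed-width, collision-free property of the old char(7) keys while the stored fields stay genuine integers.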

The general rules: a) keep index expressions to at most 120 chars when you want a collation other than MACHINE; for Rushmore, use MACHINE-collation indexes, which can go up to 240 chars. b) In both cases, let each field of a compound index have a fixed position, so PAD or use fixed-length conversions. ALLTRIM() is not for INDEX expressions, unless you PAD the ALLTRIM result.

The new Integer (AutoInc) type has some positive and negative effects; e.g. it's a read-only field, which makes trouble when APPENDing.
I am using GUIDs, also in conjunction with MS SQL Server and MySQL. They take more bytes, obviously, but today that doesn't matter much. It's less easy to filter known IDs, but that's the only disadvantage.

Bye, Olaf





 
Regarding making use of indexes for relations, I'm with Mike and Merlin: SQL has become so versatile, and Rushmore can use so many indexes to optimize a query, that it no longer pays to use the scheme of SEEK first record and DO WHILE or SCAN REST. You already had a grid-relations question, and that solution is still fine. I am more often forced to use treeviews in applications. I dislike the development effort they need: going from cursors to populating treeview nodes, and keeping treeview context in sync with updatable cursors, takes much more work than the simpler binding of grids and other native VFP controls. That's my varied mileage.

I already said that even in parent/child/grandchild grid relations I see a disadvantage in the strong coupling relations give you. You don't have exact control over when the relations act on which record pointers, so it's too easy to mess up what's displayed. And the other way around: a change of record pointers in child and deeper-level data is a rippling effect not necessarily reflected in the UI without a REFRESH, and then the REFRESH may cause a further rippling effect through the relations.

I don't miss the comfort of grid relations compared with manually requerying data in BeforeRowColChange/AfterRowColChange. And no, I don't use the SQL-query RecordSourceType; that doesn't work well. Grid reconstruction is solved by having a read-write cursor that you ZAP and repopulate with the currently needed data. That way the grid is never unbound, and reconstruction is prevented. Grids are not controls for editing data, only for display, so you never need to bind anything updatable or buffered; you bind queried data, indexed for easy sorting. That takes much of the programming effort I complained about for treeviews, but for me, full control over what is queried when and where, and how independent the record pointer movements can be, is the advantage that outweighs the disadvantages.
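A sketch of that ZAP-and-repopulate approach (cursor, field, and alias names invented; assumes SET SAFETY OFF so ZAP doesn't prompt):

```foxpro
* One-time setup (e.g. in the form's Load): a read-write cursor the
* grid stays bound to for its whole life.
CREATE CURSOR gridsource (contactid I, lastname C(30), phone C(20))
* Grid.RecordSourceType = 1 (Alias), Grid.RecordSource = "gridsource"

* On each parent-row change: refill without ever unbinding the grid
SET SAFETY OFF
ZAP IN gridsource
INSERT INTO gridsource ;
    SELECT contactid, lastname, phone FROM contact ;
    WHERE contact.clientid = client.clientid
```

Because the alias never disappears, the grid keeps its column layout and avoids the reconstruction that a changing RecordSource would trigger.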

Last but not least: once you start using more SQL than xBase data access, you gain better portability, and you also begin thinking of UI other than related grids to represent the data. Treeviews have their advantages, e.g. seeing child data of several parents at once, while related grids only show one child group at any time.

Bye, Olaf
 
Olaf,
Thanks for cluing me in to BINTOC's advantage. I'm reworking the relationships on my tables now, partly in response to making my grids more usable. To that point, I think I see the excellent point you are making about using a cursor whose only purpose in life is to display small sets of data in your grid. It also makes it easy to incorporate that grid display property, because then you don't care; you just show grid data.

I think I'm getting closer now, but still not quite there.
So now I create the grid with RecordSourceType set to 4.
And then I add a SELECT statement in the Refresh Method of the Grid?


Best Regards,
Scott
 
No, you run the query in code and use RecordSourceType 1 (Alias). Using a query as the RecordSource has the problem of regenerating the cursor and causing grid reconstruction; it's only suitable for a one-time initialisation.

Bye, Olaf.
 
Hi Olaf,

Your "general rule a) keep index expressions as short as 120 chars": please consider narrowing this down to 99, which is by itself already a huge number.
If you go beyond 99 you may be bitten one day, due to "For a standalone index (.idx) file, the length of an index key must be between 1 and 100 characters."
Better to be on the safe side, don't you think?
Rgds,
Jockey(2)
 
AH... I see the wisdom in it now. I just got one of them working in my main form, so now I need to fix up a few others. I see as well that I need to change how I manage the double-click: I want it to take me to another tab with that record on it, so I use that RecordID to search against the table, and the record should appear on that tab when it opens.


Best Regards,
Scott
 
The maximum index key length is 220 characters, just shy of the maximum path length, which is a pity



Regards

Griff
 
I have no intention of getting close to that length.
For the moment, I'm combining at most 3 ID fields, all of which are INT; if I use Olaf's suggestion of BINTOC, that keeps them all at a maximum (in fact, exact) length of 12 characters. In one other case I have an ID+name field, but that is still way below the limit.

As mentioned, previous index experience tells me we want indexes as short and as few as possible from a "write" perspective, especially once our tables start to get big.

Breaking away from persistent relationships is now making my use of indexes more effective.

Best Regards,
Scott
 
Indexes of 100 characters? 128 characters? 220 characters?

These are much too long. They will be hugely inefficient. The time it takes to traverse your index, or to insert a new key, increases rapidly with the length of the key. With a large file, and an index key more than about 16 characters, you will notice a drop in performance both in updating and retrieving records.

And why would you need such a huge key? It's hard to imagine a practical application that needs them.

Integer keys are by far the most efficient. But if you do need character keys, keep them as short as possible. (By the way, this isn't a VFP thing; it applies generally to indexed database tables.)

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I never said 100+ characters. I said 12 and 54...


Best Regards,
Scott
 
I have an application that uses 200 character keys.

It processes all the files in a folder structure, so in practice the 220 limit is actually a pain, because the max path length is 254 (I think).

So if I want to find an entry in the table, I seek the first 200 characters and skip through all the entries that match... until I find the exact match (if it exists). As very few filenames exceed 200 characters in real life, it is a pretty good compromise.
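A sketch of that compromise (table, tag, and field names invented; assumes the full path is stored alongside the truncated key):

```foxpro
* Index on the first 200 characters of the path only
SELECT paths
SET ORDER TO TAG pathkey      && built as: INDEX ON LEFT(fullpath, 200) TAG pathkey

* Position on the first candidate, then narrow to the exact path
SEEK LEFT(m.lcSearchPath, 200)
LOCATE REST FOR fullpath == m.lcSearchPath ;
    WHILE LEFT(fullpath, 200) == LEFT(m.lcSearchPath, 200)
* FOUND() is .T. only when the complete path matched
```

The SEEK does the fast index work; the LOCATE only walks the (usually tiny) group of entries sharing the same 200-character prefix.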



Regards

Griff
 
Griff,
I feel your pain there.
I have many project directory structures that hit that limitation all the time. I didn't realize VFP's limit was 200... that's an oddly arbitrary number (i.e. not a power of 2, like 256).

Best Regards,
Scott
 
It is a bit odd.
I *think* there are two critical limits: the key itself is limited to 220, but there is probably a limit on the length of the expression as well.



Regards

Griff
 
It's either limited by memory, or by the roughly 64K length I recall for an expression, so that shouldn't be an issue.

The one limitation I really hate is the 10-character limit on index TAG names. What a uselessly small length. I don't know why they did that.

Best Regards,
Scott
 