impact of expressions in composite indexes

steve4king · Aug 5, 2013

Maintenance on an old program:
No candidate keys, but the candidate key for this file would be:
tranid char(7) + tranline number(4,0)

I need to add an index to speed a routine that looks for missing rows in a mirror table.

Currently the semi applicable index is:
tranid + DTOS(date)

Just adding an index to "tranline" sped up the process 10x.. but I think there is room for more speed.

I'm curious what the impact to writes(or something else I might be overlooking) might be if I were to add an index as such:
tranid + alltrim(str(line,4,0))

Unfortunately, unlike a true SQL database, I can't just index on both without a cast.. right?

-Stephen

JRB-Bldr · Aug 5, 2013

I need to add an index to speed a routine that looks for missing rows in a mirror table.

Depending on what you want to do when you find the missing records...

Code:

SELECT * ;
   FROM OrigTable ;
   WHERE tranid + str(line,4) NOT IN ;
     (SELECT tranid + str(line,4) AS FindExpr FROM MirrorTable) ;
   INTO CURSOR MissingRecords READWRITE

SELECT MissingRecords
< do whatever >

Good Luck,
JRB-Bldr

Olaf Doschke · Aug 5, 2013

You can index on an expression, as long as the expression is valid, eg you can't index a c(7) + n(4) field, but you can index on tranid+str(tranline,4). You need to cast fields to one type to be able to concatenate, in that sense you need to cast, cast types, if you meant that: Yes. In other cases you could also cast to an overall numeric value, eg you can index on n1*100+n2, if n2 never is higher than 99 and the overall n1*100+n2 does not leave the value range of double float.

You can index on such an expression and the final cdx is the same as a cdx on a single field would be, if that single field would contain the expression values, in short, as if you could have computed fields in VFP (in the way they are n SQL Server) and index on that field instead of an expresssion. There's a tiny hit on the index creation of course, as the expression is evaluated. But otherwise it's not part of the index tree itself, the index nodes are on the final expression values. The expression of course is stored as such to add or change index nodes via evaluating it and of course also for rushmore to find out, if where clauses or joins can be optimised with the index, but the cdx itself is working exactly the same as an index on a single field. As said you can think of it as adding a computed field to the dbf, and indexing on that, but without storing this additional field in the dbf itself.

In regard to SQL Server (a bit off topic):
In reverse, it's quite impossible to do this in SQL Server, but you can index on a list of fields, which is easier and that has it's pros and cons. More pros than cons, I think.

For example once you have an index in VFP on tranid+str(tranline,4), and you have WHERE tranid+str(tranline,4) = .... this is rushmore optimised in VFP, no matter if the length of .... is 11 chars (as the length of the total index expression is) or less. I don't know if this (or a similar T-SQL WHERE clause tranid+convert(tranline...) ) would be optimised in SQL Server, it might or might not make use of the compound index. I'd have to try and show plan to see what's done. I guess it's not using that index, then. But you wouldn't write such a where clause for T-SQL queries, the query strategy for SQL Server rather is to keep your where clauses separate on the separate fields and THEN Sql Server can STIL make use of compound indexes, even if there is no compound expression, so you would use WHERE tranid=... and tranline = .... and SQL Server would make use of a compound index, while you could only optimise this in VFP with two indexes on the single fields and that therefore needs two turns in rushmore optimisation.

I have to admit I didn't dive so deep into query optimisation in SQL Server, but I know indexing on expressions is a VFP exclusive, and it's not only a good one, it makes things including index usage for optimisation more complex than the simpler form of compound indexes other databases offer.

Bye, Olaf.

Olaf Doschke · Aug 5, 2013

once you have the index you can

Code:

SET ORDER TO indextag IN mirrortable
SELECT origtable
SET RELATION TO tranid+str(tranline,4) INTO mirrortable
BROWSE FOR EOF("mirrortable")

Bye, Olaf.

steve4king · Aug 5, 2013

I'm not sure your suggestion really answers my question, as your query might be pretty slow without the index I suggested in my original post.

What I've got is nearly fast enough, and I like giving the user feedback as it goes(if it has the potential to take more than 2 seconds).
The chances of something being missing is pretty rare.
I'm not sure switching the work to a single query rather than in-line is ideal.

Here's what I'm doing on another table that completes in less than a quarter of a second because it has a simple index:

Code:

SCAN
   lnRecNo = RECNO('ctem')
   WAIT WINDOW "Synchronizing: " + PADL(ALLTRIM(STR(lnRecNo,6,0)),7,' ') + " / " + PADR(lcRecCount,7,' ') NOWAIT 
		
   IF ctem.tranid = "SomeStr" OR ctem.invoice = "SomeOtherStr" &&Fairly rare, so better to be slowish here than add additional indexes
      SELECT ctra
      LOCATE FOR invoice = ctem.tranid AND custID = ctem.custID AND date = ctem.date AND time = ctem.time AND total = ctem.total
      lbFound = FOUND()			
   ELSE 
      SELECT ctra
      lbFound = INDEXSEEK(ctem.invoice) &&indexseek rather than seek or locate because I don't need to move the record pointer, so this should be the fastest option.
   ENDIF 
 
   IF !lbFound
      SELECT ctem
      SCATTER NAME oCash
      
      SELECT ctra
      APPEND BLANK 
      GATHER NAME oCash
      RELEASE oCash
      
      lnFixCount = lnFixCount + 1
   ENDIF 
ENDSCAN

ctra will potentially have several million records, while ctem will likely have less than 500.

-Stephen

steve4king · Aug 5, 2013

Sorry, my response was a little slow,(referring to your first) you had already elaborated by the time I posted my response.
Thanks!

-Stephen

Olaf Doschke · Aug 5, 2013

As a side note use INSERT FROM NAME object:

Code:

INSERT INTO ctra FROM Name oCash
* instead of
*	SELECT ctra
*	APPEND BLANK 
*	GATHER NAME oCash
*	RELEASE oCash

There is no need to RELEASE oCash every time, this just takes time. Do this after ENDSCAN.

There still is nothing against using an index expression for the index you use with INDEXSEEK. SQL will be faster than single inserts anyway, if you index. I'd not use the NOT IN subselect, but an INSERT FROM SELECT:

Code:

Create Cursor origdata (tranid C(7), tranline N(4))
Insert Into origdata Values("abc",1)
Insert Into origdata Values("abc",2)

* first mirror operation
Select * From origdata Into Cursor mirrordata Readwrite
Index On tranid+Str(tranline,4) Tag xprime

Insert Into origdata Values("abc",3)
Insert Into origdata Values("xyz",1)

INSERT INTO mirrordata ;
 Select od.* from origdata od ;
 left join mirrordata md ;
 on od.tranid+Str(od.tranline,4) = md.tranid+Str(md.tranline,4);
 where md.tranid is null && where no join was found

Select mirrordata
browse

I also tested JRB-Bldr's query and it's also optimising the join of intemediate results (main and subselect) via the index.

Bye, Olaf.

craigber · Aug 6, 2013

Stepping back a bit.

How so you know exactly which line is slow? Have you tested this? I ask because the LOCATE will be slow unless each column in the LOCATE is indexed individually. Just adding an index on each one could give you what you need.

My recommendation is to use the Coverage Profiler and find out where the slow down is really happening.

One more thing that I didn't see explained. Your suggestion for an index is tranid + alltrim(str(line,4,0)). This would result in a variable length index, which has bad side effects in VFP and should not be used.

Craig Berntson
MCSD, Visual C# MVP,

http://www.craigberntson.com/blog

Olaf Doschke · Aug 6, 2013

Good point about performance logging and index expression length. The simple answer to the question about compound indexes nevertheless sinmply is, you can use them and stay with your INDEXSEEK.

Here's another solution to merge new data into a mirror table:

1. Assume the new data (updated or inserted) is in a cursor curNew eg generated from SELECT * From origdata INTO CURSOR...
2. Assume the dest table is Mirrordata.dbf
3. Assume a simmple primary key field ID.

Code:

Update Mirrordata FROM curNew SET &lcUpdateFieldlist WHERE Mirrordata.ID = curNew.ID
Delete curNew From Mirrordata Where Mirrordata.ID = curNew.ID
* Now all remaining records in curNew are new and can be appended:
Select Mirrordata
Append From Dbf("curNew")

The harder part is what I left out, generating lcUpdateFieldlist, it's not merely a list of fields, but the part of UPDATE SQL setting all fields, eg "field1 = curNew.field1, field2= curNew.field2" etc. etc. I construct this string from AFIELDS() or SET("FIELDS") and then use macro substitution here.

If you're not interested in a merge but only in appending new data, you can also skip the UPDATE and simply DELETE all rows in curNew, which are not new, then APPEND, so in that case the merge is simpler:

Code:

Delete curNew From Mirrordata Where Mirrordata.ID = curNew.ID
Select Mirrordata
Append From Dbf("curNew")

And instead of Mirrordata.ID = curNew.ID you can also reuse curNew.tranid+Str(curNew.tranline,4) = MirrorData.tranid+Str(MirrorData.tranline,4) to make use of the "VFP compound" index expression.

As your table containing new data is much shorter than the destination table, that would always take low time. In the worst case you only scan through the shorter curNew table fully and need to decide which records to delete before appending what remains.

And if your curNew is a table you want unchanged, you can simply query data from there into a cursor, for example all records with a insert/update datetime value higher than the last mirroring time. You can also mirror deletions, if you SET DELETED OFF before querying ID, DELETED() from the original data table and apply all these IDs for a DELETE-SQL on the mirror table.

Well, but that's for the more general case of real mirroring and merging of tables.

Bye, Olaf.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

impact of expressions in composite indexes

steve4king

IS-IT--Management

JRB-Bldr

Programmer

Olaf Doschke

Programmer

Olaf Doschke

Programmer

steve4king

IS-IT--Management

steve4king

IS-IT--Management

Olaf Doschke

Programmer

craigber

Programmer

Olaf Doschke

Programmer

Similar threads

Part and Inventory Search

Sponsor