
Help speeding up an SQL query over a network


Luiz Eduh

Technical User
Hello all,

Every day I'm learning and getting better at SQL, and with the help of some folks here at the forum I have made good improvements.
I just need some help: the code below works, but it takes a while to get the data from two tables that sit on a server on the network. Could you please glance at my code and let me know if there's a better way to rewrite it to improve the time it takes to get the data?

These are my tables:
Table1 fields:
findcode, itemnum, skunum, descript, shipqty, ordqty, price, trannum, transferrable

Table2 fields:
pur_type, trannum, pur_date

SQL:
SELECT table1.findcode, table1.itemnum, table1.skunum, table1.descript,;
SUM(IIF(pur_type=LOWER('i'), table1.shipqty*table1.price, 0.00)) as Price,;
SUM(IIF(pur_type=LOWER('i'), table1.ordqty, 0)) as Qty_ordered,;
SUM(IIF(pur_type=LOWER('i'), table1.shipqty, 0)) as Qty_Shipped,;
SUM(IIF(pur_type=LOWER('i'), table1.ordqty-table1.shipqty, 0)) as short_qty,;
SUM(IIF(pur_type=LOWER('i'), table1.price*(table1.ordqty-table1.shipqty), 0.00)) as Loss,; 
SUM(IIF(pur_type=LOWER('c'), table1.shipqty, 0)) as cmQty,;
Sum(IIF(table2.pur_type=lower('c'), table1.shipqty*table1.price, 0.00)) as credit;
from z:\foxqp\table1.dbf;
inner JOIN z:\foxqp\table2.dbf;
ON table2.trannum = table1.trannum;
group BY table1.findcode, table1.itemnum, table1.descript, table1.skunum;
WHERE table2.pur_date >= DATE(2016,08,01) AND table2.pur_date <= DATE(2016,08,31);
AND table2.pur_type in('i','c') AND (table1.findcode is NOT null AND table1.findcode <> ' ');
AND table1.transferrable = 'Y';
INTO CURSOR temp5

Thank you
 
The first thing that stands out for me is your use of LOWER(). It has to be called six times for each record, which will slow things down some. Maybe not a lot, but it could. Since the literals are already lower case, use == instead. As in:
pur_type == 'i' instead of pur_type = LOWER('i')
The double equal sign will only be true if it's an exact match.
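
For instance, from the Command Window (a quick sketch):

Code:
SET EXACT OFF    && the default setting
? "in" = "i"     && .T. - with = the comparison stops at the end of the shorter right-hand string
? "in" == "i"    && .F. - == demands an exact match
? "i " == "i"    && .F. - trailing spaces count with ==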


-Dave Summers-
[cheers]
Even more Fox stuff at:
 
Thank you DSummZZZ,

I made those changes and will test to see if there's a difference. Now let me ask you, since I'm new at this: I have been reading that creating indexes can improve this too. I have no idea what an index is or how to even create one. Do you think I need one to speed up the query? If I were to create indexes for my query above, which columns should be indexed?
 
Indeed, indexes are the thing to know about for improving query performance. The whole concept of Rushmore optimization is about picking indexes to accelerate queries, but they have to exist first. You know what you query, and thus you know what to index. Rushmore only seldom creates temporary indexes itself, typically for very complex queries with subqueries where it pays to index an intermediate result, and such indexes are never made permanent. Rushmore does not work for you in the sense of automatically creating the indexes your queries need.

What sticks out here is that you want to sum a month's worth of data, so put an index on the pur_date column if you don't have one: USE thetable, then INDEX ON pur_date TAG pur_date. This only needs to be done once; the index is maintained automatically from that point forward. At some point it may be a good idea to reindex as a maintenance task, but you don't index before each query - that would slow things down. Indexes are part of the table, defined together with the table itself; they are as permanent as its fields. And like fields, they may change or be added to as new versions of your software need new queries. I've seen people indexing right before each query, and that misses the concept of indexes - it is rather like recreating the table and appending all data to it before every query. So, once more stressed: indexes are created once only; in that moment the INDEX ON command scans all rows and builds a binary tree structure in the CDX file. That tree is then updated automatically whenever you INSERT/APPEND/UPDATE/REPLACE, so there is no need to ever run INDEX ON again.
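
To make that concrete, here is a minimal sketch of that one-time setup, assuming the DBFs are free tables on the share and that you can open them exclusively for a moment (the tag name is just a suggestion):

Code:
* Run this once, e.g. from the Command Window - not in the application and not before each query
USE z:\foxqp\table2.dbf EXCLUSIVE
INDEX ON pur_date TAG pur_date   && builds/extends table2.cdx; kept in sync automatically from now on
USE

From then on, any query filtering on pur_date can be Rushmore-optimized by that tag.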

Bye, Olaf.
 
Since the characters are already lower case, use == instead. As in:
pur_type == 'i' instead of pur_type=LOWER('i')
Double equal sign will only be true if it's an exact match.

I think Dave might have inadvertently given the impression that the double-equals operator compares strings while ignoring case. This is not so. It is not true that[tt] x == y[/tt] is the same as [tt]x = LOWER(y)[/tt]. What the double-equals operator does is to compare strings including trailing spaces.

However, in this case it's perfectly true that you should remove the LOWER()s, since the characters you are testing against are already lower-case (or maybe you need to put the LOWER()s around pur_type?). But, as Dave rightly says, it won't make a huge difference. Focus instead on indexes.
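
A quick illustration of both points (a sketch for the Command Window):

Code:
? "I" = "i"           && .F. - neither = nor == ignores case
? LOWER("I") == "i"   && .T. - LOWER() the field side if its case is not guaranteed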

Mike



__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Another index that makes sense, in both table1 and table2, is on trannum, as that is the join condition.

An index on transferrable might also help, but that depends on how many Ys and Ns are in there. If only a clear minority of the data has transferrable = 'Y', it pays. Any index that can limit the data sharply, e.g. filter it down to less than roughly 10%, pays a lot. If the field only ever holds 'Y' or 'N', it is a candidate for a logical field with .T. or .F.; that simplifies the condition to the field itself, as it is already either .T. or .F. and doesn't need to be compared with 'Y' or 'N'. You can also index logical fields with a binary index.

Indexes on key fields used in joins always pay, but indexes on data that is already narrowed down to small portions by the main indexes are seldom helpful. Generally speaking: it doesn't pay to index every column; that bloats the CDX file and slows down updates and inserts, so you need a balance. Conditions that have to be checked without an index mean a full scan, but once the data is already limited that doesn't cost much. When the pur_date index shrinks the rows to read from millions down to below 1000, traversing another index to find 400 of those 800 records can take more time than simply reading 400 unneeded rows and discarding them afterwards, because reading indexes also has a cost. You will find the advice to create every index Rushmore could use to "fully" optimize a query, but that ignores the cost of using the indexes; it assumes reading them takes next to no time.

So finally, when designing indexes, ask how much an index can help shrink the portion of data the query has to read. Indexes on key values always help because each join looks up a single value in the joined table, which is fast via an index and slow via a full scan. If you have date fields in all tables, the join lookup only needs to cover the portion of data within the date range, but not every query limits by a date range. Simple cases to think of are queries on a hierarchy like order/orderdetail/product/inventory or product/orderdetail/order/customer: whenever you start with just one row and have cascading lookups into child and grandchild tables, you are looking up key values.
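
A sketch of the corresponding one-time commands, again assuming free tables and momentary exclusive access (tag names are only suggestions):

Code:
USE z:\foxqp\table1.dbf EXCLUSIVE
INDEX ON trannum TAG trannum          && supports the join from table1's side
INDEX ON transferrable TAG transfer   && only worth it if 'Y' rows are a clear minority
USE

USE z:\foxqp\table2.dbf EXCLUSIVE
INDEX ON trannum TAG trannum          && supports the join from table2's side
USE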

Bye, Olaf.
 
Hi,
Since [highlight #FCE94F]pur_type[/highlight] is a field of table2, and since you have [highlight #FCE94F]AND table2.pur_type in('i','c')[/highlight] in your WHERE clause, you may want to shorten your SQL statement as below - without being too invasive regarding the structure of your tables.

Code:
SELECT table1.findcode, table1.itemnum, table1.skunum, table1.descript,;
SUM(IIF(pur_type='i', table1.shipqty*table1.price, 0.00)) as Price,;
SUM(IIF(pur_type='i', table1.ordqty, 0)) as Qty_ordered,;
SUM(IIF(pur_type='i', table1.shipqty, 0)) as Qty_Shipped,;
SUM(IIF(pur_type='i', table1.ordqty-table1.shipqty, 0)) as short_qty,;
SUM(IIF(pur_type='i', table1.price*(table1.ordqty-table1.shipqty), 0.00)) as Loss,; 
SUM(IIF(pur_type='c', table1.shipqty, 0)) as cmQty,;
Sum(IIF(pur_type='c', table1.shipqty*table1.price, 0.00)) as credit;
from z:\foxqp\table1.dbf;
inner JOIN z:\foxqp\table2.dbf;
ON table2.trannum = table1.trannum;
group BY table1.findcode, table1.itemnum, table1.descript, table1.skunum;
WHERE BETWEEN(table2.pur_date,DATE(2016,08,01),DATE(2016,08,31));
AND table2.pur_type IN('i','c') ;
AND NOT (ISNULL(table1.findcode) or EMPTY(table1.findcode));
AND table1.transferrable = 'Y';
INTO CURSOR temp5

Btw, you might want to consider changing the field "transferrable" from character to logical - unless you have good reasons not to do so.
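
Just as an illustration, here is one possible way to introduce a logical field without touching the existing column right away - a sketch only; the field/tag name "is_xfer" is an assumption, and you would back up the DBF first:

Code:
USE z:\foxqp\table1.dbf EXCLUSIVE
ALTER TABLE table1 ADD COLUMN is_xfer L
REPLACE ALL is_xfer WITH UPPER(transferrable) = "Y"   && carry the Y/N flag over as .T./.F.
INDEX ON is_xfer TAG is_xfer BINARY                   && logical fields allow a compact binary index
USE

Queries would then filter on is_xfer instead of transferrable = 'Y', and the old column could be dropped once everything has been switched over.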

hth

MK
 
Even shorter: since the field names are unique across the two tables, the alias prefixes can be dropped.

Code:
SELECT findcode, itemnum, skunum, descript,;
SUM(IIF(pur_type='i', shipqty * price, 0.00)) as Price, ;
SUM(IIF(pur_type='i', ordqty, 0)) as Qty_ordered, ;
SUM(IIF(pur_type='i', shipqty, 0)) as Qty_Shipped, ;
SUM(IIF(pur_type='i', ordqty - shipqty, 0)) as short_qty, ;
SUM(IIF(pur_type='i', price * (ordqty - shipqty), 0.00)) as Loss, ; 
SUM(IIF(pur_type='c', shipqty, 0)) as cmQty, ;
SUM(IIF(pur_type='c', shipqty * price, 0.00)) as credit ;
from z:\foxqp\table1.dbf ;
inner JOIN z:\foxqp\table2.dbf ;
ON table2.trannum = table1.trannum ;
group BY findcode, itemnum, descript, skunum ;
WHERE BETWEEN(pur_date, DATE(2016,08,01), DATE(2016,08,31)) ;
AND pur_type IN('i','c') ;
AND NOT (ISNULL(findcode) OR EMPTY(findcode)) ;
AND transferrable = 'Y' ;
INTO CURSOR temp5

hth
MK
 
Using VFP functions instead of SQL syntax (e.g. ISNULL(field) vs. field IS NULL) is actually NOT recommended. It makes no difference performance-wise, and it makes the query less portable to other back ends.
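
So the condition from the original post is actually the more portable spelling (a sketch of the two forms side by side):

SQL:
AND NOT (ISNULL(findcode) OR EMPTY(findcode))   && VFP-only functions
AND findcode IS NOT NULL AND findcode <> ' '    && plain SQL, also understood by other back ends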

Bye, Olaf.
 
Thank you all for taking the time to instruct me on this.
 
It seems that the fastest way I can get this to run is to copy the tables into a local temp folder, run the query, and then delete the tables. Is this a good practice? I would need to put this in a .prg, something like this:
Code:
STORE '"z:\foxqp\table1.dbf"' TO cSourceFile1
STORE '"c:\tempfolder"' TO cTargetFile1
COPY FILE (cSourceFile1) TO (cTargetFile1)

STORE '"z:\foxqp\table2.dbf"' TO cSourceFile2
STORE '"c:\tempfolder"' TO cTargetFile2
COPY FILE (cSourceFile2) TO (cTargetFile2)

Then run the query:

SQL:
SELECT findcode, itemnum, skunum, descript,;
SUM(IIF(pur_type='i', shipqty * price, 0.00)) as Price, ;
SUM(IIF(pur_type='i', ordqty, 0)) as Qty_ordered, ;
SUM(IIF(pur_type='i', shipqty, 0)) as Qty_Shipped, ;
SUM(IIF(pur_type='i', ordqty - shipqty, 0)) as short_qty, ;
SUM(IIF(pur_type='i', price * (ordqty - shipqty), 0.00)) as Loss, ; 
SUM(IIF(pur_type='c', shipqty, 0)) as cmQty, ;
SUM(IIF(pur_type='c', shipqty * price, 0.00)) as credit ;
from c:\tempfolder\table1.dbf ;
inner JOIN c:\tempfolder\table2.dbf ;
ON table2.trannum = table1.trannum ;
group BY findcode, itemnum, descript, skunum ;
WHERE BETWEEN(pur_date, DATE(2016,08,01), DATE(2016,08,31)) ;
AND pur_type IN('i','c') ;
AND NOT (ISNULL(findcode) OR EMPTY(findcode)) ;
AND transferrable = 'Y' ;
INTO CURSOR temp5

What else do you think I would need in the .prg? Any ideas will be greatly appreciated.
Thank you
 
Copying the data over always carries the base cost of copying all the data, so it can never beat creating indexes to optimize the query. No, that's not the way to go. Just imagine the start-up time you'd have once a table grows to 100 MB or even 1 GB. If your query only processes part of the data, it won't read as much and will therefore be faster. If that were not the case, centralising tables on a network share would be a broken feature of VFP, a poor compromise, but it isn't. Learn indexing and Rushmore optimization and index your data.

Bye, Olaf.
 
Hi,
If the query is slow, you might also (in addition to what Olaf said) want to check your net speed.
hth
MK
 
Thank you,

Just one final question... sorry.
I'm using SYS(3054,1) to see if there's any optimization and I'm getting the following messages:
Rushmore optimization level for table table1: none.
Rushmore optimization level for table table2: none.

Is this a hint as to why the query takes so long to get the data out of the tables?
 
Yes. No index is being used. Rushmore optimization is based on indexes, and when you just copy the DBF files you don't have any indexes.
It seems the point is finally sinking in; I've been talking about indexing for three days now. You have to have indexes to get the best query performance. A faster local drive also helps, but the big saving - not reading large chunks of the DBF at all - only comes with indexes.
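
Once the tags suggested above exist (pur_date and trannum on table2, trannum on table1), you can check again - a sketch, and the exact wording of the messages may differ:

Code:
SYS(3054, 1)   && display the Rushmore optimization level in the output window
* ... run the SELECT ... INTO CURSOR temp5 from above here ...
SYS(3054, 0)   && switch the display off again

The report should then change from "none" to "partial" or "full" for both tables.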

Bye, Olaf.
 
Here's a small demo of what Rushmore does:

Code:
Close Tables All

Create Cursor crsTest (dDate Date)
For n=-14 to 14 
   Insert into crsTest Values (Date()+n)
EndFor

? "without index"
Sys(3054,2)
Select * From crsTest Where LogRecord() AND dDate Between Date()-7 and Date() into Cursor crsResult1

?
? "with index"
Select crsTest
Index On dDate TAG xDate
Select * From crsTest Where LogRecord() AND dDate Between Date()-7 and Date() into Cursor crsResult2
?

Procedure LogRecord
    ?? Transform(Recno()) + ","   && print the record number of every row the query visits
    Return .T.                    && a UDF returns .T. by default; made explicit so the WHERE stays true
EndProc
It's an academic case of only 29 records, but it shows that the same query without an index processes/reads all rows, while the (partially) optimized query only reads record numbers 8-15.
Both results are the same and both queries run fast, as it's a very small table (cursor) anyway, but the index lets VFP determine which rows to read.
If you look closely you also see recno 1 being read in both cases; that happens during an initialisation phase of the query where the LogRecord procedure is already called, although no data is processed and the record pointer simply sits at row 1. Don't ask why, and even less why the query on the non-indexed data shows recno()=1 three times and the indexed one only twice.

The LogRecord procedure is called for every processed row. It is placed first on purpose, as VFP would not call it once the BETWEEN clause has already evaluated to false; putting LogRecord first shows exactly which records the query looks at. With the index, rows 2-7 and 16-29 are not printed: Rushmore determined that only rows 8-15 are of interest before reading the DBF even started. So only with indexes can you read less and limit the network traffic for the DBF and FPT to what is actually needed. There is a cost in reading the index file (CDX) - there's no way to make that visible here - but only part of a tag is read, never the full CDX file. For big DBFs the index tags help enormously to address only the record numbers you really need, and the cost of (partially) reading the CDX is far less than what is saved. Especially if you query one month's data from a DBF holding years of data, you read less than 10%, if not less than 1%, of the DBF file.

Bye, Olaf.
 
Just for the sake of a clear understanding: you don't CREATE CURSOR or CREATE TABLE and you don't INDEX in your application code; the application code will merely contain the query. Indexes have to be created and maintained together with the database, its tables and its fields. Like fields, indexes are permanent - they are not a temporary part of the table. If for whatever reason you get the idea of putting [tt]Index On dDate TAG xDate[/tt] in your application code before executing the query, you a) will get an error about the index tag already existing the second time it runs, and b) even if you ignore that, you would waste time recreating the already existing index over and over.

This is quite clear once you get used to indexes, but I mention it because I've seen it too many times: people delete tags or SET SAFETY OFF and create indexes in application code. Even when they are happy with it, they don't realise how much time they waste. I already mentioned it in my earlier posts, but it is important enough to repeat: indexing every time you query cancels the performance gain, because building an index reads through the whole table - an index is not something built only for the current row. So keep INDEX ON commands out of your application code, just as you keep CREATE TABLE and other database definition code separate from the application, if you have such code at all and don't simply distribute the data files with your EXE.

My demo code also only does the indexing once, but what may not be obvious is that from then on every further query on that cursor involving dDate values is optimized by that index, even if the data changes. I chose cursors simply because I was too lazy to write code that removes DBF files after the demo runs. So imagine my CREATE CURSOR is really a CREATE TABLE; in a real application that table is a deployed DBF file, and the INDEX ON would go right after CREATE TABLE, even before inserting the initial data - I only placed it later to demonstrate the query without and with the index. If you are an older FoxPro user you may know IDX index files. Those are not as comfortable: they get out of sync with the DBF if the index is not SET while the data is modified. The same goes for secondary CDX files whose file stem differs from the table's stem (the name without extension). But once table1.dbf has a table1.cdx, that CDX is kept in sync by the VFP runtime with any operation on the DBF, just to be very clear. Of course, if you manipulate the DBF from anything other than the VFP runtime, it can change without the CDX being updated - the synchronisation mechanism is not in the files themselves, it is in VFP's INSERT/UPDATE/DELETE SQL commands, in its INSERT/APPEND/REPLACE/DELETE xBase commands, and in everything else that modifies DBFs, even BROWSE.

Since you set up indexes once, just like the table and its fields, an easy way to create them is the Table Designer. Simple indexes are created by choosing an ascending or descending index in the Fields tab; more complex indexes on expressions, and the primary/candidate/unique/binary index types, can be created in the Indexes tab. As it is a one-time job, it is perfectly fine to do this manually/visually. You can also see there which indexes you already have.

And don't be afraid of missing something or doing it wrong. Since indexes are built from the DBF data, they can always be rebuilt from scratch, and it's better to have some indexes than none at all. There is plenty of room for trial and error without risking any data loss: indexes live in a separate file, so creating them cannot corrupt the DBF.

One final word: is it ever okay to use INDEX ON in code? Obviously yes, in a script that creates (or recreates) the database. And last but not least, the only way to get an indexed cursor is to INDEX ON it after a query INTO CURSOR or a CREATE CURSOR - there is no table designer at runtime, let alone a cursor designer. So INDEX ON has its place in code that creates a database and in code that indexes cursors, including view cursors; that alone justifies the command's existence, yet it is still worth warning against using it on DBFs that are already in production use. You can also make use of INDEX ON in an upgrade - it is something you could put into an update.exe that runs once before a new version of your EXE goes live.
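
A sketch of what such a one-time setup or update step could look like, assuming free tables and exclusive access; TAGNO() returns 0 when a tag does not yet exist, so the script can safely be run more than once:

Code:
USE z:\foxqp\table2.dbf EXCLUSIVE
IF TAGNO("pur_date") = 0
   INDEX ON pur_date TAG pur_date   && create the tag only if it is missing
ENDIF
USE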

...if you will have a new version of the application at all. It is very probable, though: after indexing it often pays to adjust existing WHERE clauses and JOIN conditions so they actually make use of the indexes. Wherever you can define simple indexes on fields, do so instead of indexing expressions. Notice that an index on a date field will optimize =, <, > comparisons or a BETWEEN range clause, but not YEAR(date) = x - this is where knowledge of the finer points of how Rushmore works becomes important. It is simpler for Rushmore to optimize date BETWEEN DATE(x,1,1) AND DATE(x,12,31) than YEAR(date) = x, unless you INDEX ON YEAR(date); but indexing that expression makes the index less versatile for the many other queries on dates or date ranges. Just to get you started on such fine differences: these little things make a big difference in how writing your queries and designing your indexes together give you the best compromise - optimizing all queries with the fewest indexes needed.
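
For instance, with only a tag on pur_date in place, the first form below cannot use it while the second can (a sketch of two equivalent filters):

SQL:
WHERE YEAR(table2.pur_date) = 2016 AND MONTH(table2.pur_date) = 8   && not optimizable by a tag on pur_date
WHERE table2.pur_date BETWEEN DATE(2016,8,1) AND DATE(2016,8,31)    && optimizable by the same tag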

Bye, Olaf.
 
Taken from Tamar's second article: even if you have indexes, a differing code page or collation can cause the SQL engine not to use them. As already said, though, if you are only copying DBFs over, it's a fairly sure sign you don't have any CDXes at all, so I don't think that is your problem - you simply don't have indexes yet. Rushmore does not optimize queries in any way other than by using indexes. Other optimizations, such as not evaluating the rest of an AND-linked condition once the first part is false, or not evaluating the rest of an OR-linked condition once the first part is true, are ordinary code optimization, not query/Rushmore optimization. Network speed, file sizes, disk speed, RAID - all of these contribute too, of course, but they are not gains from Rushmore. So get started with indexing to get the most out of SQL. Note that Rushmore will use multiple index tags per table, sequentially and automatically, while SET ORDER/SEEK, SET RANGE, SCAN WHILE, indexes used for RELATIONs and other xBase constructs only use one index per table and hardly let you write code that switches indexes. Queries gain a lot from being able to use an index tag for each partial WHERE condition and for each join condition, including deciding whether to use a primary key or foreign key index based on which table has fewer rows to process (not just by RECCOUNT(), but considering the optimizations already applied up to that join).

Take it as a given: it pays to index your data and to stick with SQL.

One final thought about copying files first: why not copy the code to the file server instead? Running the code on the file server of course makes the data files local. With a VFP license you can install it on multiple computers, including server side - just make sure it's only usable by you. Alternatively, you can install an EXE built with the VFP runtimes on the file server and let the server run it, for example via the Task Scheduler, as an always-running "background" task, or by working on the server via Remote Desktop (mstsc.exe). And even queries on local data files profit from Rushmore optimization. So get started. Don't expect rearranging the query to give you much gain: as said, ordering conditions by their likely hit/miss rate depending on OR/AND linking helps, but it only slightly accelerates the processing of each row, and the order of tables can help too; the biggest gain is limiting the rows read by using indexes. Rows not processed at all are what makes a query run faster.

Bye, Olaf.
 