Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Slow query when joining 2 tables

Status
Not open for further replies.

Dennis76

Technical User
Jul 17, 2016
6
NL
Hello,

I have 2 tables: invoices (1M records) and invoice_details (5M records)

I want to select all invoices which containe artiklenr "1234":
SELECT customernr from invoice_details d left join invoices i on i.invoicenr = d.invoicenr where artnr = 1234

This query takes about 3 seconds on a remote computer to fetch 3000 results.
There is an index on invoicenr at both tables. Also artnr is indexed.

If I query only invoice_details (SELECT * from invoice_details where artnr = 1234) then it takes about 0.1 sec.
I tried different joins (left,right, etc...) but nothing helped.

I also copied both tables to a mysql database server and did exact the same query. There it takes about 0.0003s for the results.

Maybe someone has an idea how to speed things up.

Dennis
 
Merging the two servers for DBC/ActiveVFP and MySQL/PHP would obviously also be a solution and going for a 10Gbit line between the two at least would most probably also lower the query time.

Bye, Olaf.
 
I experimented with my code on an external drive, no LAN, but it is even slower. A copy of one 1.1 GB ISO image file took 14.7 seconds, of which I compute a bandwidth of about 0.6 Gbit/s (1.1 GByte*8/14.7s is about 0.6Gbit)

I restarted VFP after creating the test data and just ran the second part of the code from CLEAR onwards, so this queried without cached data and atlopes test query taking 0.062 seconds and the query on the extra table took 0.032 seconds. This points out some fundamental network problem, if your 1GbE is even slower. Net bandwidth can be slower, indeed, not only if the server is busy, but the backbone of your LAN may be the bottleneck. The measurement suggests you could even go faster with a NAS attached to hold the DBFs, you couldnt run activeVFP on that, though.

I may repeat the query experiment after a restart, to also ensure no OS caching is improving the query time, but currently no time for a restart.

Bye. Olaf.
 
OK, the final truth in regard of the initial query speed without any caching in effect, neither OS nor VFP. Working on the external drive data after a notebook restart:

1M invoices
5.5M invoice details

Querying [tt]SELECT customernr FROM invoices INNER JOIN invoice_details ON invoices.invoicenr = invoice_details.invoicenr WHERE artnr = ?m.Artnr INTO CURSOR curTest[/tt] takes:
2.995 seconds

Querying [tt]SELECT customernr FROM article_customers WHERE artnr+BinToC(customernr) = ?m.Artnr INTO CURSOR curTest2[/tt] takes:
1.566 seconds

So an extra table and the additional code needed to maintain its data only gives you a factor of 2.


The results have about 6000 rows, as in your case. They differ, in that there is an overlap of about 500 double customers when not using the article_customers extra table, but the speed difference mainly is because of the aggregated nature of the article_customers table. It's double fast (or half slow), not just 10% difference, so it may be helpful, but maybe not enough.

The slowness of the LAN or in my case external drive query can be reproduced and the only way I see making this faster is via doing the query with an ActiveVFP script on local data and then hope for fast data transfer of the result. As initially said that's the benefit of a servers execution of queries on data stored locally, they get the result fast acting on local drives without LAN bottleneck, they only really profit, if the last step to serve this result data via LAN does not take longer than the time saved by acting locally. One big help is being able to start serving the result before finished with it, as the forwarding from server via ODBC to the caller can do. The simplistic Foxpro "server" I sketched above via [tt]&tcQuery INTO TABLE (tcResultFile)[/tt] can indeed build the result DBF file while querying, but you can only get at the file from the other side, when the query finished. The idea obviously is to pass in a tcResultFile file name, that means a local location for the caller, but specify the name in a way it is accessible from the DBC server, eg as URN file name \\mysqlserver\dbcresults\result.dbf.

Ok, enough about that.

I finally copied the external drive data to an internal drive and did the last thing to test, the speed of the queries with data being "local", on internal local drives. This is much better then, of course - back to where it was when data was cached: 0.073 seconds vs 0.029 seconds.

And just by the way, atlopes query to determine the most often artnr has to go through all data, it takes a few seconds before the really interesting part happens, you can get rid of that time in each test run by using the artnr determined by the first execution and creation of curArtnr. That's also kind of using cached data, manually cached.

Bye, Olaf.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top