Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Please help optimize 1

Status
Not open for further replies.

foxup

Programmer
Dec 14, 2010
328
CA
Hi All,

Here's the issue. I have 2 DBF's. DBF #1 (c:\UNION\BELL_LOC\SOURCE) has 160,000 recs & DBF #2 (c:\UNION\ALLSTREAM_LOC\SOURCE) has 124,000 recs. I use a simple command:

SELECT numberbill FROM c:\UNION\BELL_LOC\SOURCE WHERE ALLTRIM(numberbill) IN (SELECT ALLTRIM(external_i) FROM c:\UNION\ALLSTREAM_LOC\SOURCE) INTO TABLE tmp

which results the recs that are the same in both. The problem is that it is excruciatingly slow.

Can you please help me speed this up?

Any help please?

Thanks,
FOXUP!
 
So, to translate it into English, you want all the number bills from Bell_Loc that are also in AllStream_loc. Is that right?

In that case, a simple inner join might be faster than your sub-select. Something like this:

Code:
SELECT NumberBill FROM c:\UNION\BELL_LOC\SOURCE bl;
  JOIN  c:\UNION\ALLSTREAM_LOC\SOURCE ar ;
  ON bl.NumberBill = ar.NumberBill ;
  INTO TABLE tmp

However, you should test both versions with your actual data to see which is faster, as there are several factors involved.

Having said that, the most important thing you should do is to make sure you have an index on NumberBill, especially in AllStream_Loc. Doing that could give you a huge improvement, and should be the first thing you check.

I would add that I'm not a fan of specifying the full path to the tables within the SELECT. This is not a performance issue, but it does affect readability and portability. I would prefer to USE each of the two tables before doing the SELECT, giving each of them an alias, and then using that alias in your FROM and JOIN clauses:

Code:
USE c:\UNION\BELL_LOC\SOURCE IN 0 ALIAS Bell
USE c:\UNION\ALLSTREAM_LOC\SOURCE ALIAS AllStream

SELECT NumberBill FROM Bell bl;
  JOIN AllStream ar ;
  ON bl.NumberBill = ar.NumberBill ;
  INTO TABLE tmp

I think you'll agree that's more readable.

Mike




__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Hi Mike,

Pure awesomeness!! The speed (and readability) is Night and Day !!! I can;t get over the difference. It's astronomical.

Many many thanks. Great Job sir !! :) Star for you.


Thanks,
FOXUP!

 
FWIW, I think the improvement is actually from getting rid of ALLTRIM(), not from changing to a join.

Tamar
 
Taking Mike's suggestion about assigning an ALIAS to your tables before calling the SELECTs a step further, I do the following.

At the beginning of a program I programmatically drill down through all directories that contain my tables and assign every directory path and every table path to a global variable which never changes during program execution. Then I use that global variable wherever I can. This accomplishes three major things 1) all tables become semi-portable without ever changing a single line of code, 2) I never need to remember a path, just the table name or directory name, and 3) everything is much easier to read and understand.

For example I have global variables like:

Code:
gpFamInfo = "C:\Program\Tables\Main\FamInfo.DBF"
gpMain = "C:\Program\Tables\Main.DBF"
gpWinInfo = "C:\Program\Tables\System\WinInfo.DBF"
gpFDC4A = "C:\Program\Tables\Webpage\Ford\FDC4A.DBF"

gdMain = "C:\Program\Tables\Main\"
gdSystem = "C:\Program\Tables\System\"
gdWebpage = "C:\Program\Tables\Webpage\"
gdFord = "C:\Program\Tables\Webpage\Ford\"

So now I write code like

Code:
SELECT(gpFamInfo)

rather than using an ALIAS. If I need a second instance of a table, I just do

Code:
USE gpFamInfo IN 0 ALIAS Whatever

and use that ALIAS as needed.

The criteria I use to make this work are:
1) All tables must exist SOMEWHERE in a SINGLE tree, "Tables" in the above example (location within the tree does not matter)
2) There can never be two or more directories with the same names anywhere within the tree
3) There can never be two or more tables with the same names anywhere within the tree
4) All directory and table names are limited to 8 characters in length (first 10 characters of global variables must be unique)

I started doing this over a year ago because I was tired of getting "random" runtime crashes because a table could not be found even though the alias calling the table was correct and working correctly over 99% of the time. In over 17 years I never found an incorrect alias that would cause the crashes and I never lost any data because of these "random" crashes. Once I started using global variables containing the complete path instead of aliases, the crashes totally disappeared. Go figure. I have no idea whether the change significantly affects program speed.

Another upside is that now if I don't like how the tables are arranged within the tree, I can move them to any directory or any subdirectory (no matter how deep), and never need to change a single line of code in the program. And I can add or subtract directories as needed within the tree without ever needing to change a line of program code.

One downside is that directory names and table names are limited to an 8 character maximum due to the 10 character uniqueness rule for variables. Another downside is that ALL directory names and ALL table names within the tree are assigned to a global variable even if those variables are never used, although with today's computers I don't see that as a big issue. The third downside is that if perchance I want to use a different directory tree name (e.g. "DBFS" instead of "Tables" above) or even move the whole tree to a different directory or drive I need to change TWO lines of code at the start of every relevant program - Wow big deal!

For reference, I am using FP2.6 and VFP3.0 on MACs.

mmerlinn


Poor people do not hire employees. If you soak the rich, who are you going to work for?

"We've found by experience that people who are careless and sloppy writers are usually also careless and sloppy at thinking and coding. Answering questions for careless and sloppy thinkers is not rewarding." - Eric Raymond
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top