Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to optimize this query

Status
Not open for further replies.

mackey333

Technical User
May 10, 2001
563
US
I have two tables with about 2.5 million rows each. One has address information and the other has name information. I've indexed the LOWER() value of the last name column and also the zip code column:

CREATE INDEX last_name_idx ON providers(LOWER(Last_Name));
CREATE INDEX zip_idx ON provider_addresses(zip);

I need to perform a query similar to this (the zip code list is all of the zip codes within x amount of miles):

Code:
SELECT id FROM (SELECT p.id, p.last_name FROM provider_addresses pa INNER JOIN providers p ON p.id = pa.id WHERE pa.practice_zip IN ('19130','19188','19121','19196','19173','19162','19187','19093','19197','19193','19194','19192','19110','19190','19191','19183','19092',
'19195','19184','19102','19176','19178','19099','19170','19123','19171','19255','19105','19104','19103','19109','19107','19175','19122',
'19244','19182','19132','19177','19181','19106','19172','19133','19146','19147','19131','19125','19129','19139','19101','08101','19145',
'19160','19148','19143','19140','08102','19134','08103','19027','19151','19144','19179','19137','19004','19142','08105','08104','19141',
'19066','19127','19124','19050','19185','08110','19112','19082','19120','19128','19096','19072','19023','08107','19138','08030','19126',
'19153','08109','19064','19108','08108','19079','19150','19012','19083','19003','19119','08063','19149','08106','08059','19135','19018',
'19032','08093','19036','19035','19041','19111','08099','19026','19095','19118','19161','08031','19098','08035','08002','19074','19043',
'08033','08076','08065','19070','19152','08077','19076','19444','19113','08086','08052','19010','19136','08007','19029','08097','19038',
'19031','19046','19033','08078','08034','08096','19008','19075','08066','08057','19078','08045','19428','19429','19081','19085','08029',
'08049','19001','19115','19462','19088','08027','19114','19094','08083','08061','08020','19022','19034','08090','19086','19091','19065',
'19025','19087','08003','19080','08051','19089','19090','08084','19006','19073','19154','19405','19489','19009','19015','08075','08056',
'19019','19116','19424','19016','19063','19155','19013','08054','19409','19422','19488','08043','08021','19487','08014','19333','08032',
'08026','19401','19406','19044','19040','19483','08053','19028','19037','08010','19052','19002','19437','19039','19485','19014','19484',
'08080','19020','19404','08046','19312','19053','08081','08091','08073','19403','19021','08071','19017','08085','19436','08074','08062',
'08012','19061','18974','18966','19301','19477','19455','19454','19407','19397','19399','19482','19496','18991','19486','19481','19408',
'19342','08036','19415','19339','19490','19319','08025','19456','08016','18976','08028','19703','19340','19007','19331','18954','19446',
'18936','08060','19048','08009','08048','19049','19373','19056','19420','19395','19355','19423','08067','18929','08039','19426','08055',
'19047','19057','19810','19887','18915','19432','08004','19453','19443','19493','19494','19495','19430','19474','19058','08312','19055',
'18956','08518','18914','19809','19440','19460','19030','18925','18943','19382','18980','19803','19451','18940','19381','08343','19380',
'19388','18932','19383','08018','19897','19054','08554','08069','18912','08098','19438','18922','19896','08089','18946','19802','19450',
'19067','18964','19441','08011','08322','19366','19732','19317','19442','19473','18927','19894','19886','19710','19468'))
WHERE LOWER(LAST_NAME) LIKE LOWER('smi%')

My thought process for designing the query this way is that the list returned by the inner query is only about 45,000 rows which should speed up the WHERE clause of the outer query. However, the problem I am having is that this query takes too long (about 25-35 seconds). Is there any way that I can speed up the execution?




-Greg :-Q

flaga.gif
 
Here is the execution plan:

Code:
Execution Plan
----------------------------------------------------------
   0	  SELECT STATEMENT Optimizer=ALL_ROWS (Cost=21 Card=6 Bytes=144)

   1	0   NESTED LOOPS (Cost=21 Card=6 Bytes=144)
   2	1     TABLE ACCESS (BY INDEX ROWID) OF 'PROVIDERS' (TABLE) (Cost=9 Card=6 Bytes=78)

   3	2	INDEX (RANGE SCAN) OF 'LAST_NAME_IDX' (INDEX) (Cost=3 Card=6)

   4	1     TABLE ACCESS (BY INDEX ROWID) OF 'PROVIDER_ADDRESSES' (TABLE) (Cost=2 Card=1 Bytes=11)

   5	4	INDEX (UNIQUE SCAN) OF 'SYS_C0032076' (INDEX (UNIQUE)) (Cost=1 Card=1)

-Greg :-Q

flaga.gif
 
It should perform faster if you formulate your query this way:
Code:
SELECT id
  FROM (SELECT p.id, p.last_name
          FROM provider_addresses pa
         INNER JOIN providers p ON p.id = pa.id
               WHERE exists (select 'Mackey'
                               from zip_table z
                              where pa.practice_zip = z.zip)
       )
 WHERE LOWER(LAST_NAME) LIKE LOWER('smi%');
The above implies your having a ZIP_TABLE populated with the candidate ZIP Codes.

Let us know your findings.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm. The cost will be your freedoms and your liberty.”
 
Thanks for the quick answer, however it is impossible for me to have the candidate zip codes in their own table because the list is generated on demand in PHP.

For example:

-The user enters a radius of 50 miles from zip code 19130
-PHP selects the zip codes within 50 miles based on lat/lon coordinates
-The results are used to create the string of acceptable zip codes in the list above.

Also, if I search by city, state instead of zip codes it takes a similar amount of time. I am wondering if there is a better way to index the tables or something. The execution plan shows that indexes are being used so I don't understand what is causing the delay.

-Greg :-Q

flaga.gif
 
I have an very similar situation for a site I'm building. My solution (that I haven't implemented yet) Was to denormalize my locations table. Assuming a max radius of 100 miles, I was going to store zip1, zip2, distance in a new table.
Code:
select l1.zip as zip1, l2.zip as zip2,
(((ACOS( SIN( l1.latitude * pi() /180 )
* SIN( l2.latitude * pi() /180 ) + COS( l1.latitude * pi() /180 ) * COS( l2.lati
tude * pi() /180 ) * COS( (
l2.longitude * pi() /180
) - ( l1.longitude * pi() /180 ) ) )
) * 3963.1)) as distance
from locations l1, locations l2
where
(((ACOS( SIN( l1.latitude * pi() /180 )
* SIN( l2.latitude * pi() /180 ) + COS( l1.latitude * pi() /180 ) * COS( l2.lati
tude * pi() /180 ) * COS( (
l2.longitude * pi() /180
) - ( l1.longitude * pi() /180 ) ) )
) * 3963.1)) < 100;

Then Santa's solution is appropriate
Code:
SELECT id
  FROM (SELECT p.id, p.last_name
          FROM provider_addresses pa
         INNER JOIN providers p ON p.id = pa.id
               WHERE exists (select 'Mackey'
                               from zip_table z
                              where pa.practice_zip = z.zip1
                                and z.zip2 = 19130
                                and z.distance <= 50)
       )
 WHERE LOWER(LAST_NAME) LIKE LOWER('smi%');

-----------------------------------------
I cannot be bought. Find leasing information at
 
That sounds like a good idea for a static location, however in my situation the input zip is variable and could be anywhere in the country.

-Greg :-Q

flaga.gif
 
Is not the PHP dynamically accessing the qualified ZIPs from a database somewhere? If so, why not access that directly with the SELECT that I proposed, combined with the proper distance algorithm?

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
“Beware of those that seek to protect you from harm. The cost will be your freedoms and your liberty.”
 
My solution can be applied for all zips. It will be a very large table, but if you have the space its probably better than calculating distance every time. I haven't completed the new table yet, so I don't have any idea of actual performance gain. I do know that right now, searching by a state and then city (that has multiple zip codes) the search can take up to 40 seconds on my server. I am nearly positive it will be at least ten times faster after I save the results of the distance formula in a table, although that table will probably have over five million rows.
Here is an actual query (MySQL) from the site I'm working on.
Code:
select sql_calc_found_rows userid, username, city,state, workdistance, description, distance from(SELECT u.userid, u.username, u.city, u.state, min(u.workdistance) workdistance, u.description, min(ifnull(truncate(((ACOS( SIN( l1.latitude * pi() /180 ) * SIN( l2.latitude * pi() /180 ) + COS( l1.latitude * pi() /180 ) * COS( l2.latitude * pi() /180 ) * COS( ( l2.longitude * pi() /180 ) - ( l1.longitude * pi() /180 ) ) ) ) * 3963.1),5),0)) distance, u.userlevel FROM `locations` l1, `locations` l2, `users` u WHERE 1=1 AND l1.zip in (select zip from locations where state = 'PA' and city='Philadelphia' ) and l2.zip = u.zip and u.userlevel > 0 and ifnull(((ACOS( SIN( l1.latitude * pi() /180 ) * SIN( l2.latitude * pi() /180 ) + COS( l1.latitude * pi() /180 ) * COS( l2.latitude * pi() /180 ) * COS( ( l2.longitude * pi() /180 ) - ( l1.longitude * pi() /180 ) ) ) ) * 3963.1),0) <= u.workdistance AND lastlogindate > (date_sub(curdate(),INTERVAL 90 DAY)) GROUP BY u.userid, u.username, u.city, u.state, u.description, u.workdistance, u.userlevel) a order by userlevel desc, distance limit 0,10
The result page tells me
Code:
This page took 27.9313 seconds to load.




-----------------------------------------
I cannot be bought. Find leasing information at
 
There is only one tool for the job - the right one.

I quote:-

Oracle Spatial forms a separately-licensed option component of the Oracle Database. Oracle Spatial aids users in managing geographic and location-data in a native type within an Oracle database, potentially supporting a wide range of applications — from automated mapping/facilities-management and geographic information systems (GIS), to wireless location services and location-enabled e-business.

As of Oracle version 8i (8.1.7), potential users can elect to license Oracle Spatial with Oracle Enterprise Edition. Oracle Personal Edition bundles Oracle Spatial for no extra charge.

Oracle Spatial does not run with Oracle Standard Edition, with Oracle Standard Edition One or with Oracle Express Edition (XE) although a version of Spatial, with reduced functionality, named Locator is provided at no additional license fee with these editions.

Might this be of any use chaps?

Regards

T
 
@jaxtell

Sorry, I miss read what you were saying the first time around but your solution seems to make sense. I'll give it a try and see what I come up with.

@thargtheslayer

Thanks for the info, unfortunately I don't have the budget available to add that component to our Oracle license.

-Greg :-Q

flaga.gif
 
Just updating this - The query which is building the zip code distance table (I limited it to 25 miles for testing), is still running. I'll report back when I have results.

-Greg :-Q

flaga.gif
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top