Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations biv343 on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Looking for better solution than CURSOR 2

Status
Not open for further replies.

Qik3Coder

Programmer
Jan 4, 2006
1,487
US
Workup:
I have a (table) function
getZips(@ZipCode, @RangeToCheckInMiles)
that returns a list of zip codes within x miles of a zip code that you pass to the function.
I have a temp table that has two zip codes in it.

Problem:
I need to update a flag column if Zip1 is close to zip2.
I tried joining on the table function, but won't work, because i need to pass a value from table 1 to the function.

Here is what i have so far:
Code:
Select idCol, zip1, zip2, 'N' as CloseToEachOther
Into #tmp
From table1 t1
join table2 t2 on
 1=1

DECLARE @idCol int
DECLARE @zip1 varchar(10)
DECLARE @zip2 varchar(10)
DECLARE compareZips CURSOR for 
	SELECT idCol, zip1, zip2
	FROM #tmp tmp
OPEN CompareZips
FETCH NEXT FROM CompareZips
	INTO @idCol, @zip1, @zip2
WHILE @@FETCH_STATUS = 0
 BEGIN
	Update t
	set t.CloseToEachOther = 'Y'
	FROM #tmp t
	Join dbo.GetZips(@zip1, 100) zips on
	t.idCol = @idCol
	and left(@zip1, 5) = zips.Zip1
	and left(@zip2,  5) = zips.Zip2
FETCH NEXT FROM CompareStudentZipToDLeader
	INTO @idCol, @zip1, @zip2
 END
CLOSE CompareZips
DEALLOCATE CompareZips

select * from #tmp






-Sometimes the answer to your question is the hack that works
 
Take a look here: thread183-1197121

If that doesn't solve your problem, let me know. I should also add that I have a new CalculateDistance function that returns the same results, but is faster than the one shown in that thread.

You'll have to ignore the fact that it looks really ugly. I was shooting for speed, not readability.

Code:
[COLOR=blue]ALTER[/color]    [COLOR=#FF00FF]Function[/color] [dbo].[CalculateDistance](@Longitude1 [COLOR=blue]Float[/color], @Latitude1 [COLOR=blue]Float[/color], @Longitude2 [COLOR=blue]Float[/color], @Latitude2 [COLOR=blue]Float[/color])
Returns [COLOR=blue]Float[/color]
[COLOR=blue]AS[/color]
[COLOR=blue]Begin[/color]
	[COLOR=blue]Return[/color] [COLOR=#FF00FF]Sqrt[/color]([COLOR=#FF00FF]Power[/color](@Longitude2 - @Longitude1, 2) * [COLOR=#FF00FF]Power[/color]([COLOR=#FF00FF]Cos[/color]((@Latitude1 + @Latitude2) / 114.591559026164646), 2) * 12391523883.7963953387686623629457364 + [COLOR=#FF00FF]Power[/color](@Latitude2 - @Latitude1,2) * 12346214104.563361) / 1609.344
[COLOR=blue]End[/color]

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
YOur distance calculate, when run against my table with 40k rows returns almost identical times as my distance calculate.
Sidenote, it's fun to accidentally cartesian join a table with 40k rows...

I can't eliminate the hawaii vs boston comparision, because that is sometimes the case. I may be able to optimize it with the inline function instead though. I'll post in a lil bit with what i have.

Thanks gmastros


-Sometimes the answer to your question is the hack that works
 
Just out of curiosity... would you be willing to post your distance calculation function? I'm interested to see it.



-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
yeah it's probably not a secret...

Code:
-- CONSTANTS
DECLARE @Radius  DECIMAL(8,4)
DECLARE @Deg_62_5m DECIMAL(8,4)
DECLARE @MaxDeviation DECIMAL(8,4)

SET @Radius  = 3981.25
-- APPROX RADIUS IN KILOMETERS 6370, Miles 3981.25

/*
Distance is calculated using the
HAVERSINE FORMULA GCD: Great Circle Distance.
c = 2 * asin( sqrt(( sin(( lat1 - lat2 )/2) )^2 + cos( lat1 ) * cos( lat2 ) * ( sin(( lon1 - lon2 )/2) )^2 ) )
*/
Create Function [dbo].[TEST_CalculateDistance2](@Longitude1 Float, @Latitude1 Float, @Longitude2 Float, @Latitude2 Float)
Returns Float
AS
Begin
    Return 
3981.25 * 2 *
   	ASIN (
	 CASE
    	  WHEN SQRT(POWER((SIN((RADIANS(@Latitude2) - RADIANS(@Latitude1)) / 2)), 2) + COS(RADIANS(@Latitude1)) * COS(RADIANS(@Latitude2)) *
	   POWER((SIN((RADIANS(@Longitude2) - RADIANS(@Longitude1)) / 2)), 2)) > 1 THEN 1
     	  ELSE SQRT(POWER((SIN((RADIANS(@Latitude2) - RADIANS(@Latitude1)) / 2)), 2) + COS(RADIANS(@Latitude1)) * COS(RADIANS(@Latitude2)) *
	   POWER((SIN((RADIANS(@Longitude2) - RADIANS(@Longitude1)) / 2)), 2))
    	 END
	) 
End

by the way, i'm on 20 mins and rolling with that cartesian join...I estimated 1.7 billion rows (how's that for load testing. I did yours first


Code:
Create proc Test_PostCompare1 as
Select [dbo].[TEST_CalculateDistance1](p2.Longitude, p2.Latitude, P1.Longitude, P1.Latitude) as Distance, P1.* from PostalCode p1 
join PostalCode P2 on 1=1
go

-Sometimes the answer to your question is the hack that works
 
Thank you for the inline function suggestion.

it's interesting, when actually running the queries, it says that both functions use the same resources.

The overall query takes approx 8 secs, which i can tune from here.

-Sometimes the answer to your question is the hack that works
 
Wait a minute. Your query went from 20 minutes to 8 seconds? Am I reading this right?

Which function are you using? Yours or mine?

I know that when I was optimizing mine for speed, a couple things made a big difference.

The RADIANS function is slow.
The PI function is slow.

It's better to hard code those constants. Well... Radians is not a constant, but you get the idea. Radians does nothing more than Multiply by PI and divide by 180. SO, the following all return the same value.

Code:
Select RADIANS(45.0), 
       45.0 * PI() / 180, 
       45.0 * 0.017453292519943295

The third one, where you multiply by a constant, will be faster.

Something else to consider... the less calculations you do, the faster the formula will be. So, things like this....

Code:
RADIANS(@Longitude2) - RADIANS(@Longitude1)

can be reduced to....

@Longitude2 * PI / 180 - @Longitude1 * PI / 180

Which can be further reduced to:

(@Longitude2-@Longitude1) * PI / 180

And finally, the constant....

Code:
(@Longitude2-@Longitude1) * 0.017453292519943295

You will never notice the difference for a small handful (think only a couple thousand calculations), but the more calculations you do, the more you will notice a difference.

Make sense?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
oh, mine didn't go from 20 mins to 8 secs, i actually hadn't let the cursor finish. I was(still am) running the cartesian join on a 40k row table, calculating the distance from every zip code in the US to every other zip code in the US. It is at 2hrs now.

-Sometimes the answer to your question is the hack that works
 
Why are you doing a cartesian join? The distance from Zip1 to Zip2 is the same as the distance from zip2 to zip1. Also... the distance from zip1 to zip1 will always be 0, right?

Code:
Select [dbo].[TEST_CalculateDistance1](p2.Longitude, p2.Latitude, P1.Longitude, P1.Latitude) as Distance, P1.* from PostalCode p1
join PostalCode P2 [!]on P1.Zip < P2.Zip[/!]

Of course, this will require you to be a little smarter about your query, making sure that Zip1 is less than Zip2 when doing your lookups.

My zipcode table has approximately 43,000 records in it. When I run....

Code:
Select Count(*) From ZipCode A Inner Join ZipCode B On A.ZipCode < B.ZipCode

I get [!]926,156,237[/!]. Are you really sure you want to do (almost) a billion calculations? Actually, for you it will be twice that many because of your join.


-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
It's not for keeps...it's friday. No one else is using the database, and it's running while i work on other things. Just curious how long it will take.

-Sometimes the answer to your question is the hack that works
 
17 hrs...

-Sometimes the answer to your question is the hack that works
 
George,

By the way, I looked at that other thread and found some optimizations (but only about 5% improvement) for that distance function, namely, collapse calculations, make the whole thing based on miles instead of meters with a conversion, and simply embed once-used calculations instead of declaring variables for them.

In analyzing the function I found it interesting that it does try for top accuracy by using two different values of the circumference of the earth, one for lat and one for long. So I used two values, too, based on this info.

Qik3Coder,

Doing a calculation on the cartesian product could take a looooong time. Here's my method. I used George's old sql challenge for development purposes, but you could easily adapt it to your own data:

Code:
-- manage indexes a bit
alter table StateCapitals alter column State VarChar(20) not null
alter table statecapitals add constraint pk_statecapitals primary key clustered (state)
alter table StateCapitals add box1 int, box2 int
create nonclustered index ix_statecapitals_boxes on statecapitals (box1, box2)
-- your table should have appropriate indexes. Experimentation is in order to see what indexes the optimizer actually uses. Don't add indexes that the optimizer won't use.

declare @milesapart int
set @milesapart = 100

update statecapitals
set
	box1 = floor((latitude + 90) * 34.5275275 / @milesapart) + floor((longitude + 180) * 34.585486 / @milesapart) * 1000,
	box2 = floor((latitude + 90 + (@milesapart / 69.055055)) * 34.52752750 / @milesapart) + floor((longitude + 180 + (@milesapart / 69.170972)) * 34.585486 / @milesapart) * 1000

select FromCapital, ToCapital, Distance = Convert(int, Distance) / 5 * 5
from (
	select
		FromCapital = s1.capital + ', ' + s1.state,
		ToCapital = s2.capital + ', ' + s2.state,
		Distance = dbo.fnCalculateDistance(s1.longitude, s1.latitude, s2.longitude, s2.latitude)
	from
		StateCapitals s1
		inner join StateCapitals s2 on s1.state <> s2.state
	where
		s1.box1 = s2.box1 or s1.box2 = s2.box2
	) x
where distance <= @milesapart
order by
	FromCapital, ToCapital, Distance
And here's that function:

Code:
Create Function fnCalculateDistance(@Longitude1 Float, @Latitude1 Float, @Longitude2 Float, @Latitude2 Float)
Returns Float
AS
Begin
	Declare @DeltaXMiles Float
	Declare @DeltaYMiles Float

	Set @DeltaXMiles = Abs(@Longitude2 - @Longitude1) * Cos((@Latitude1 + @Latitude2) * 0.00872664625) * 69.170972
	Set @DeltaYMiles = Abs(@Latitude2 - @Latitude1) * 69.055055
	Return Sqrt(@DeltaXMiles * @DeltaXMiles + @DeltaYMiles * @DeltaYMiles)
End
Rationale: first add some work columns to preload a lot of distance calculation without using a complex distance function. These calculations are only needed once per row (n times) instead of once per cross-joined row (n*n times). Essentially, make two grid overlays of side length @milesapart * 2 that are offset from each other by @milesapart x and @milesaparty y, and give them a number. Points that are within @milesapart miles of each other will have one of these two values the same. We are basically filtering out points that are more than @milesapart * 2 * sqrt(2) apart with a simple equality expression.

Note: It's possible I made a mistake somewhere. I was about to post and realized that the value I was using for the second half of each box calculation was based on the distance between longitude lines at north american latitudes. So that's fixed, but I could have made a mistake somewhere else. But on a huge data set, it should be easy to confirm that my method returns the right results by a simple rowcount comparison to another query.

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.

The best part about anything that has cheese is the cheese.[/color]
 
My personal goal for a query is 100 milli-seconds. Of course, I can't always accomplish it, but clearly.... you've got some optimizing to do. [lol]



-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Hey look, George posted an optimized function at the top of this thread.

ESquared <-- dork

Code:
ALTER Function [dbo].[CalculateDistance](@Longitude1 Float, @Latitude1 Float, @Longitude2 Float, @Latitude2 Float)
Returns Float
AS
Begin
	Return Sqrt(Power((@Longitude2 - @Longitude1) * Cos((@Latitude1 + @Latitude2) / 114.591559026164646), 2) * 69.1693 + Power(@Latitude2 - @Latitude1,2) * 69.0427)
End
Okay so here's your function with one power() and the final division removed. lol

Is one power() really faster than one declared variable?
 
Thanks ESquared for reducing the formula even further.



-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Hey, blew it again! George pointed out where my formula was wrong. I mistakenly took the square root of the constants inside the Sqrt expression (I thought they needed to be similar to my function above) but was forgetting that in the function I originally posted, the values were getting squared in the process. So the constants for George's optimized function have to be near squares of the one from my originally proposed function.

Code:
CREATE Function [dbo].[CalculateDistanceGeorge_ErikReduced](@Longitude1 Float, @Latitude1 Float, @Longitude2 Float, @Latitude2 Float)
Returns Float
AS
Begin
    Return Sqrt(Power(@Longitude2 - @Longitude1, 2) * Power(Cos((@Latitude1 + @Latitude2) / 114.591559026164646), 2) * 4784.394119164060144943884 + Power(@Latitude2 - @Latitude1,2) * 4766.899915599103900266195)
End
 
Yikes! The Power() function crept back in.

Trying again:

Code:
CREATE Function [dbo].[CalculateDistanceGeorgeErik](@Longitude1 Float, @Latitude1 Float, @Longitude2 Float, @Latitude2 Float)
Returns Float
AS
Begin
    Return Sqrt(Power((@Longitude2 - @Longitude1) * Cos((@Latitude1 + @Latitude2) / 114.591559026164646), 2) * 4784.394119164060144943884 + Power(@Latitude2 - @Latitude1,2) * 4766.899915599103900266195)
End
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top