Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SQL Server Puzzle #5: Points Within Polygon

Status
Not open for further replies.

gmmastros

Programmer
Feb 15, 2005
14,901
0
36
US
For my application, I need to create reports that show which points are contained within a polygon.

The coordinate system for this example is Latitude/Longitude. To define the polygon, there is a table MapcountiesPoints. In this table, there is a FeatureId (to identify the county), a point number field to identify the order in which points are put together, a latitude and longitude to identify the location.

Code:
CREATE TABLE [MapCountiesPoints] (
	[FeatureId] [int] NOT NULL ,
	[PointNumber] [int] NOT NULL ,
	[Latitude] [decimal](8, 5) NOT NULL ,
	[Longitude] [decimal](8, 5) NOT NULL ,
	CONSTRAINT [PK_MapCountiesPoints] PRIMARY KEY  CLUSTERED 
	(
		[FeatureId],
		[PointNumber]
	)  ON [PRIMARY] 
) ON [PRIMARY]

Sample data looks like this....

Code:
FeatureId   PointNumber Latitude   Longitude  
----------- ----------- ---------- ---------- 
6           1           40.22638   -75.60358
6           2           40.22615   -75.60386
6           3           40.22580   -75.60429
6           4           40.22574   -75.60436
6           5           40.22562   -75.60452
6           6           40.22544   -75.60476
6           7           40.22538   -75.60488
6           8           40.22534   -75.60493

A polygon is made up from a series of lines. To visualize the boundary, imagine a line drawn from point 1 to point 2, then point 2 to point 3, then point 3 to point 4, and so on. To close the boundary, the last point is connected to the first point.

Download the county data

There is another table, Location that identifies points we want to test. In this table, there is a unique identifier LocationId, and Latitude & longitude to identify the location.

Code:
Create TABLE [Location] (
	[LocationId] [int] NOT NULL ,
	[Longitude] [decimal](8, 5) NULL ,
	[Latitude] [decimal](8, 5) NULL ,
	CONSTRAINT [PK_Location_LocationId] PRIMARY KEY  CLUSTERED 
	(
		[LocationId]
	)  ON [PRIMARY] 
) ON [PRIMARY]

Download the location data

Here's the task....

Write a stored procedure that
1. Accepts FeatureId as an argument
2. Returns a list of LocationId's that are contained within the polygon identifed in the MapCountiesPoints table.

This functionality already works within my application, so I will humbly exclude myself from this puzzle.

Remember, fastest execution time wins. If you have any questions, just ask.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
1. Latitude/Longitude values are degrees with minutes/seconds converted to decimal (deg + min/60 + sec/3600)?

2. Earth is flat (one turtle, four elephants) or rounded?

3. Polygon can be of any shape - convex, concave?

Btw. very nicely presented problem...

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
1. Yes. Lat/Lons are converted to decimal.

2. You can assume that the Earth is flat. I'm not sure that it matters since we are really trying to determine what is inside a boundary (with well defined vertices). A point within a boundary will still be within the boundary if it is projected on to a flat surface or remains on an ellipsoid surface (like the Earth).

3. Polygon can be any shape? Well, yeah, sorta. The vertices of the polygon MUST be on the earth because there is no elevation component.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
OK, tnx.

Can ya tell me correct number of locations for @FeatureID=6 (with provided sample data)? Just for quick check, I don't have enough time ATM to perform full testing...

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
I'm fairly sure that there should be 3,722 locations contained within the polygon.

On my computer, it took 5 minutes 22 seconds to return the LocationId's.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
*** snort *** I got 3,754

You would be surprised with exec time though :)

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
I just ran this through some VB code, and came up with 3,736 points within the polygon.

It could be a rounding issue, I suppose.


-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Humph... interesting. Although I worked with graphical algorithms long time ago, fast SQL implementation is real challenge.

One question: in your code that returns 3722 points, polygon's boundaries are inclusive (when point falls over vertex this counts as "inside")?

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
I'm checking now. Hold on... This is a little slow.

By the way, for those that may be interested, here's a hint on how to determine if a point is within a polygon.


vongrunt, I'm looking in to the inclusive business like you suggested. I'll post back when I figure this out.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
> By the way, for those that may be interested, here's a hint on how to determine if a point is within a polygon.

Yup, horizontal ray-casting + counting number of intersections + handling exceptions is what I'm trying to do. I'm pretty sure we can shrink exec time down to low teens (seconds).

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
My code uses a vertical ray-casting approach, which should give the same results.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
I am getting 3854.
Are there 1464 MapPoints and 4166 Locations?

 
> I am getting 3854.

This is definitely wrong because number of points inside bounding box:

Code:
declare @minX decimal(8, 5), @maxX decimal(8, 5), @minY decimal(8, 5), @maxY decimal(8, 5)

select @minX = min(Longitude), @maxX = max(Longitude), @minY = min(Latitude), @maxY = max(Latitude)
from MapCountiesPoints
where FeatureID = 6

select count(*) 
from location 
where latitude between @miny and @maxy 
	and longitude between @minx and @maxx
... is smaller.

> Are there 1464 MapPoints and 4166 Locations?

Yup.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
vongrunt

Can you please send me your list of LocationId's that are inside the polygon? I'd like to see why our counts are different.

0x676D6D617374726F73406F72626974736F6674776172652E6E6574

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
I figured that one out after I posted. I did the algorithm wrong.

Ok now im down to 3754 in 13.36
 
> Can you please send me your list of LocationId's that are inside the polygon? I'd like to see why our counts are different.

Let's just say my results are probably... not reliable. Yesterday I got 3754. Today - a little bit more (3785). And weird thing, all "maybe's" were eliminated before intersection checks...

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
Hm... got 3754 again.
Code:
create procedure usp_getLocationsInPolygon( @FeatureID int )
as
set nocount on

-- calculate polygon's bounding box
declare @minX decimal(8, 5), @minY decimal(8, 5), @maxX decimal(8, 5), @maxY decimal(8, 5)
select @minX = min(Longitude), @minY = min(Latitude), @maxX = max(Longitude), @maxY = max(Latitude)
from MapCountiesPoints
where FeatureID = @FeatureID

-- locations outside bounding box cannot be within polygon so...
select LocationID, Longitude as Ax, Latitude as Ay
into #PointCandidates
from Location
where (Longitude between @minX and @maxX)
	and (Latitude between @minY and @maxY)

-- for each location A find lines CD that maybe intersect with horizontally cast ray (down the +x axis). 
select PC.LocationID, PC.Ax, PC.Ay, L.*,  case when PC.Ay in (Cy, Dy) then 0 else 1 end as iSect
into #LineCandidates
from #PointCandidates PC
inner join 
(	select C.Longitude as Cx, C.Latitude as Cy, D.Longitude as Dx, D.Latitude as Dy, C.PointNumber as CNum, D.PointNumber as DNum
	from MapCountiesPoints C
	inner join MapCountiesPoints D on C.FeatureID=D.FeatureID and C.PointNumber=D.PointNumber-1
	where C.FeatureID = @FeatureID
) L
on PC.Ay between L.Cy and L.Dy
where (PC.Ax <= L.Cx or PC.Ax <= L.Dx)
union all
select PC.LocationID, PC.Ax, PC.Ay, L.*,  case when PC.Ay in (Cy, Dy) then 0 else 1 end as iSect
from #PointCandidates PC
inner join 
(	select C.Longitude as Cx, C.Latitude as Cy, D.Longitude as Dx, D.Latitude as Dy, C.PointNumber as CNum, D.PointNumber as DNum
	from MapCountiesPoints C
	inner join MapCountiesPoints D on C.FeatureID=D.FeatureID and C.PointNumber=D.PointNumber-1
	where C.FeatureID = @FeatureID
) L
on PC.Ay between L.Dy and L.Cy
where (PC.Ax <= L.Cx or PC.Ax <= L.Dx)

-- we don't need point candidates anymore
drop table #PointCandidates

-- skip horizontal lines
update LC
set DNum = 
( 	select top 1 CNum 
	from #LineCandidates LC2
	where LC.LocationID=LC2.LocationID 
		and LC.Dy <> LC2.Dy 
		and LC.CNum < LC2.CNum 
	order by LC2.CNum
)
from #LineCandidates LC
where LC.iSect = 0 and LC.Dy = LC.Ay

delete LC
from #LineCandidates LC
where LC.Cy = LC.Dy

-- eliminate maybe's (Ax < any{Cx, Dx}) with simplified r-s intersection formula. Division by zero cannot happen. :)
delete from #LineCandidates
where (Ax < Cx or Ax < Dx)
	and not
	(	((Ay-Cy)*(Dx-Cx)-(Ax-Cx)*(Dy-Cy))/((180-Ax)*(Dy-Cy)) between 0 and 1
		and (Ay-Cy)/(Dy-Cy) between 0 and 1 
	)

-- for points crossed by horizontal ray, determine whether they count or not
update V1 set iSect = 1
from #LineCandidates V1
inner join #LineCandidates V2 on v1.LocationID = V2.LocationID and V1.DNum = V2.CNum
where V1.iSect = 0
	and sign(V1.Dy-V1.Cy) = sign(V2.Dy-V2.Cy)

-- locations with odd number of intersections are within a polygon
select LocationID, Ax, Ay
from #LineCandidates
group by LocationID, Ax, Ay 
having sum(iSect) % 2 = 1

drop table #LineCandidates

go
I'm not sure this is correct count, so plz. test results against yours - then we'll see what to do next.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
I'm beginning to lose faith in data!

>> Hm... got 3754 again.

When I run your SP, I get 3724
When I run my sp, I get 3722

Your sp returns 32 records that mine does not.
My sp returns 30 records that yours does not.

I'm digging a little deeper to try and figure out why there is a difference.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Something must be weird with the data at the office. Not sure what it is yet. At home, using vongrunt's SP, I get the 3754 records.

I did figure out what the problem was with my SP. Using the horizontal ray casting method, you need to count the number of times a point intersects a line but only to the right OR the left of the point.

So, the general process is to determine which points lie within the lines vertically, then count the number of lines that intersect horizontally. If the number of intersections is odd ( Num % 2 = 1) then the point is within the polygon.

My problem was that I was using BETWEEN to determine if a point fell within a line vertically. The problem occurs when the Latitude of a point matches the latitude of a line. Using between will cause the point to align itself with both lines, causing the number of lines to be 1 greater than it should be. Which, of course, would cause an odd number to become even, so the algorithm determines that it is NOT inside the polygon eventhough it really is.

I really shouldn't ramble like this at 2 am....

Of course, I'm re-writing my SP. I suspect that this problem does not exist for my customers because their MapCountiesPoints table has float as their data type so this odd behaviour is VERY unlikely to have occurred. I decided to post the puzzle with 5 digits of precision because for the Continental U.S., the 5th digit of precision represents approximately 3 feet.


-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
> The problem occurs when the Latitude of a point matches the latitude of a line.

True. This is the trickiest part about ray casting. When ray falls across vertices, individual intersection check is inconclusive.

There is one nicely generalized approach explained in "Algorithms" book by Robert Sedgewick. My sproc is an attempt to implement that in a set-based fashion... but I won't guarantee everything is OK there. It would be nice to make artificial sample data covering all possible scenarios and worst cases... I'll probably do it over weekend.

Btw. with sample data you provided, after some 6-7 code revisions exec time settled down at 0.772s on my AMD64/3200 box (SQL2005).

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top