{smcl}
{* *! version 1.0.10 15aug2015}{...}
{cmd:help geoinpoly}
{hline}
{title:Title}
{phang}
{cmd:geoinpoly} {hline 2} Match geographic locations to shapefile polygons
{title:Syntax}
{p 8 14 2}
{cmd:geoinpoly} {it:latitude} {it:longitude} {ifin}
{cmd:using} {help filename:{it:coordinates_file}}
[{cmd:,}
{opt u:nique}
{opt ring}
{opt inside}
{opt noproj:ection}]
{title:Description}
{pstd}
{cmd:geoinpoly} returns, in a variable named {it:_ID},
the identifier of a polygon from
{help filename:{it:coordinates_file}}
that spatially overlays a point
defined by {it:latitude} and {it:longitude}.
If there is more than one overlaying polygon,
the observation is duplicated as needed so that all
matching polygon identifiers can be stored in {it:_ID}.
The {opt u:nique} option can be used to limit each point to
a single matching polygon, in which case the lowest identifier
in terms of sort order is returned in {it:_ID}.
{pstd}
The {help filename:{it:coordinates_file}} must follow the format
used by
{cmd:shp2dta}
(from SSC, {it:{stata ssc install shp2dta:click to install}})
when converting shapefiles to Stata datasets.
It must contain the following 3 variables: {it:_ID} (polygon identifier),
{it:_X} (longitude), and {it:_Y} (latitude).
{pstd}
All coordinates ({it:latitude} {it:longitude} and {it:_Y} {it:_X})
must be
based on the same
geographic coordinate system.
Latitudes range from -90 to 90
and longitudes range from -180 to 180.
{cmd:geoinpoly} assumes that polygon segments represent
lines of constant bearing (also referred to as rhumb lines or loxodromes).
To precisely calculate the latitude at which a
polygon segment intersects the meridian,
{cmd:geoinpoly} transforms all geographic coordinates to ({it:x-coor}, {it:y-coor}) using an
ellipsoid Mercator projection.
This projection has the property of representing lines of
constant bearing as straight lines on the
projected plane.
{pstd}
If {opt noproj:ection} is specified, {cmd:geoinpoly} skips the
Mercator projection step and uses
{it:latitude} {it:longitude} and {it:_Y} {it:_X}
without any further transformation.
{pstd}
By default, {cmd:geoinpoly} will return a match if
the point falls either inside or on the polygon.
With the {opt inside} option, a match on a vertex
or on a side is excluded.
With the {opt ring} option, a polygon is a match
only if the point is located on a vertex or exactly
on a side.
{title:Rules for polygons}
{pstd}
Polygons with different
identifiers ({it:_ID}) may be adjacent or even overlap
other polygons.
{pstd}
A polygon may consist of one or more rings.
A ring starts with an observation where _X and _Y are missing
followed by a sequence of at least four points that form
a closed loop.
So rings are also polygons but they all have the same
identifier value in {it:_ID} (e.g. Hawaii is described
as a single polygon that contains several
rings, one for each island).
{pstd}
A ring cannot self-intersect and multiple rings within the same polygon cannot
intersect either.
A ring may be completely inside another ring of the same polygon, in which
case the inside ring describes a hole in the polygon.
A polygon's rings
may touch each others at vertices but not along
segments.
{pstd}
See {browse "https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf":ESRI's technical specifications}
for more information about shapefile polygons.
{title:Algorithm}
{pstd}
{cmd:geoinpoly} uses a
{browse "http://en.wikipedia.org/wiki/Point_in_polygon#Ray_casting_algorithm":ray casting algorithm}
to determine if a point falls inside a polygon.
The point's longitude determines the ray used (i.e. meridian or line of longitude
that passes by the point's location).
Starting from both the North Pole and South Pole, {cmd:geoinpoly} counts, by polygon, the number of
line segments crossed before reaching the latitude of the point.
An odd count indicates that the point is in the polygon; an even count
indicates that the point is located outside the polygon.
If results differ when counting from opposite poles, {cmd:geoinpoly}
will stop and display which polygon triggered the error.
{pstd}
A point is considered on the ring if it matches a vertex
or if it is located exactly at the point where the ray intersects
the segment. For vertical segments, the slope is infinite so
{cmd:geoinpoly} uses the latitude of the segment's vertices
to determine if the point is on the segment.
{pstd}
{cmd:geoinpoly} incorporates the same divide and conquer strategy
used in {cmd:geonear} (from SSC, {it:{stata ssc install geonear:click to install}})
to significantly reduce the number of polygon segments
that need to be counted.
The points in memory are initially sorted by longitude and
recursively split in two until the number of points
falls below a preset cutoff and triggers the recursion's base case. At each step,
polygon segments are discarded if they are outside the x-bounds
of points at that stage.
When a base case is reached, the number of
points and line segments is very small.
The y-coordinate where the point's longitude line
intersects polygon line segments is then calculated and
the _ID of matching polygon(s) is returned.
{title:Examples}
{pstd}
The examples below require that you first install {cmd:geo2xy} and its
U.S. state boundaries shapefile datasets. To install {cmd:geo2xy} and
download its datasets to the current directory:
{cmd:.} {stata ssc install geo2xy}
{cmd:.} {stata `"net get geo2xy, from("http://fmwww.bc.edu/repec/bocode/g")"'}
{title:Example 1: Random points over the Midwestern U.S. States}
{pstd}
The following example creates a set of points randomly distributed over the Midwestern U.S.
We use {cmd:geoinpoly} to match each point to a state polygon. We then
merge with the shapefile's database and get the shape's attributes.
{cmd}set seed 42134123
clear
set obs 10000
gen double _Y = 40 + 8.5 * runiform()
gen double _X = -90 + 9 * runiform()
geoinpoly _Y _X using "geo2xy_us_coor.dta"
tab _ID, missing
merge m:1 _ID using "geo2xy_us_data.dta", keep(master match) nogen
tab NAME{txt}
{it:({stata geoinpoly_examples ex1:click to run})}
{pstd}
Now let's produce a map to visualize the results. We use {cmd:geo2xy}
to project latitude and longitude to (x,y) and highlight
points in Michigan differently. Do the same for points that fall into the
Great lakes or in Canada.
{cmd}levelsof _ID, clean sep(",") // polygon identifiers that have matched
local states `r(levels)'
rename _ID _IDmatched
append using "geo2xy_us_coor.dta"
drop if !inlist(_ID , `states') & !mi(_ID)
geo2xy _Y _X, gen(ycoor xcoor) proj(albers)
scatter ycoor xcoor if !mi(_IDmatched), msymbol(point) mcolor(green) || ///
scatter ycoor xcoor if !mi(_IDmatched) & NAME == "Michigan", msymbol(point) mcolor(cranberry) || ///
scatter ycoor xcoor if mi(_IDmatched) & mi(_ID), msymbol(point) mcolor(sandb) || ///
line ycoor xcoor if !mi(_ID) , lwidth(thin) lcolor(gray) cmissing(n) ///
ylabel(minmax) yscale(off) ///
xlabel(minmax) xscale(off) ///
aspectratio(`=r(aspect)') legend(off){txt}
{it:({stata geoinpoly_examples ex1b:click to run})}
{title:Example 2: The Four Corners}
{pstd}
The following example illustrates what happens when points are
located on a polygon ring (i.e. on a vertex or exactly on a segment).
We create a set of points around the exact location of the Four Corners,
a point that is at the intersection of 4 U.S. States.
{cmd}set seed 34124
clear
set obs 50
gen pointid = _n
gen double lat = 36.99908399999999631
gen double lon = -109.0452229999999929
replace lat = lat + runiform() - .5 if _n > 1
replace lon = lon + runiform() - .5 if _n > 1
geoinpoly lat lon using "geo2xy_us_coor.dta"
bysort pointid (_ID): gen N = _N
tab _ID N, missing{txt}
{it:({stata geoinpoly_examples ex2a:click to run})}
{pstd}
Now let's continue with the points in memory but redo using the {opt u:nique},
{opt ring}, and {opt inside} option. Note that {opt u:nique} and {opt ring}
are combined to avoid multiple matches for the Four Corner point.
{cmd}drop _ID
bysort pointid: keep if _n == 1
geoinpoly lat lon using "geo2xy_us_coor.dta", unique
rename _ID _IDunique
geoinpoly lat lon using "geo2xy_us_coor.dta", inside
rename _ID _IDinside
geoinpoly lat lon using "geo2xy_us_coor.dta", ring unique
rename _ID _IDringuniq
list in 1/5{txt}
{it:({stata geoinpoly_examples ex2b:click to run})}
{pstd}
This is what happens if the {opt u:nique} is omitted:
{cmd}geoinpoly lat lon using "geo2xy_us_coor.dta", ring
list in 1/10{txt}
{it:({stata geoinpoly_examples ex2c:click to run})}
{title:Example 3: Overlapping polygons}
{pstd}
The {help filename:{it:coordinates_file}} may contain overlapping
polygons. You can even mix and match polygons from different
shapefiles as long as you don't
get the identifiers mixed-up. In this example, we use {cmd:geocircles}
(from SSC, {it:{stata ssc install geocircles:click to install}})
to create polygons around the University of Michigan and
Ohio State University and append these polygons to the U.S. state boundaries
polygons.
{cmd}geocircles 42.265864 -83.748694 150, data("data.dta") coor("coor_MI.dta") replace
geocircles 40.012308 -83.027586 150, data("data.dta") coor("coor_OH.dta") replace
use "coor_MI.dta", clear
replace _ID = -1
append using "coor_OH.dta"
replace _ID = -2 if _ID != -1
append using "geo2xy_us_coor.dta"
save "states_circles.dta", replace{txt}
{it:({stata geoinpoly_examples ex3a:click to run})}
{pstd}
Next we create points over an area that covers the region and
find matching polygons and note how many polygons matched.
{cmd}set seed 5234523
clear
set obs 10000
gen double _Y = 38 + 8 * runiform()
gen double _X = -90 + 9 * runiform()
geoinpoly _Y _X using "states_circles.dta"
bysort _Y _X: gen N_ID = _N
tab _ID N_ID, missing
merge m:1 _ID using "geo2xy_us_data.dta", keep(master match) keepusing(NAME) nogen
tab NAME{txt}
{it:({stata geoinpoly_examples ex3b:click to run})}
{pstd}
Finally, we create a map that highlights points in the circles
and points common to both circles. We drop state polygons that do not
overlap the circles.
{cmd}levelsof _ID if N_ID > 1, clean sep(",")
local states `r(levels)'
rename _ID _IDmatched
append using "states_circles.dta"
drop if !inlist(_ID , `states') & !mi(_ID)
geo2xy _Y _X, gen(ycoor xcoor) proj(albers)
scatter ycoor xcoor if _IDmatched > 0 & !mi(_IDmatched), msymbol(point) mcolor(gs15) || ///
scatter ycoor xcoor if N_ID == 2 & NAME == "Michigan", msymbol(point) mcolor("blue") || ///
scatter ycoor xcoor if N_ID == 2 & NAME == "Ohio", msymbol(point) mcolor("red") || ///
scatter ycoor xcoor if N_ID == 3, msymbol(point) mcolor(green) || ///
line ycoor xcoor if !mi(_ID) & _ID > 0, lwidth(thin) lcolor(gray) cmissing(n) || ///
line ycoor xcoor if !mi(_ID) & _ID < 0, lwidth(vthin) lcolor(gray) cmissing(n) ///
ylabel(minmax) yscale(off) ///
xlabel(minmax) xscale(off) ///
aspectratio(`=r(aspect)') legend(off){txt}
{it:({stata geoinpoly_examples ex3c:click to run})}
{title:References and acknowledgements}
{pstd}
The U.S. state boundaries datasets distributed with
{cmd:geo2xy} and used in the examples above
were generated by {cmd:shp2dta}
using
the {it:2013 Cartographic Boundary File, State for United States, 1:500,000}
shapefile distributed by the U.S. Census Bureau and available from
{it:{browse "http://www.census.gov/geo/maps-data/data/cbf/cbf_state.html":Cartographic Boundary Shapefiles - States}}.
The xml in the shapefile includes the statement: "These products are
free to use in a product or publication,
however acknowledgement must be given to the U.S. Census Bureau as the source."
{pstd}
ESRI, {browse "http://support.esri.com/en/knowledgebase/whitepapers/download/fileid/282":ESRI Shapefile Technical Description, An ESRI White Paper - July 1998}.
{pstd}
The ray casting algorithm
is explained in the {browse "http://en.wikipedia.org/wiki/Point_in_polygon#Ray_casting_algorithm":Point in polygon}
Wikipedia page.
{pstd}
Darel Rex Finley,
{browse "http://alienryderflex.com/polygon/":Point-In-Polygon Algorithm - Determining Whether A Point Is Inside A Complex Polygon}.
{pstd}
Eric Haines,
{browse "http://erich.realtimerendering.com/ptinpoly/":Point in Polygon Strategies}.
{title:Author}
{pstd}Robert Picard{p_end}
{pstd}picard@netbox.com{p_end}
{marker also}{...}
{title:Also see}
{psee}
Stata: {help cross}
{p_end}
{psee}
SSC:
{stata "ssc des geo2xy":geo2xy},
{stata "ssc des geocircles":geocircles}, {stata "ssc des geodist":geodist}, {stata "ssc des geonear":geonear},
{stata "ssc des shp2dta":shp2dta}, {stata "ssc des spmap":spmap}, {stata "ssc des mergepoly":mergepoly}
{p_end}
{psee}
Others: {browse "http://www.statalist.org/forums/forum/general-stata-discussion/general/194666-shapefiles-which-points-latitude-longitude-are-within-which-polygon?p=194934#post194934":Maurizio Pisati's -point2poly- on Statalist}
{p_end}