{smcl}
{* 3feb2009}{...}
{cmd:help spkde}{right:Version 1.0.0}
{hline}
{title:Title}
{p 4 13 2}
{hi:spkde} {hline 2} Kernel estimation of density and intensity functions
for two-dimensional spatial point patterns {p_end}
{marker syntax}{title:Syntax}
{p 8 15 2}
{cmd:spkde} [{varlist}] {ifin} {help using}
{help spkde##section04:{it:gridpoints}}
[{cmd:,}
{help spkde##options1:{it:options}}]
{synoptset 35 tabbed}{...}
{marker options1}{synopthdr:{help spkde##options2:options}{col 41}}
{synoptline}
{marker main}{syntab: Main}
{p2coldent :* {opt x:coord(xvar)}}variable containing the x-coordinate
of data points {p_end}
{p2coldent :* {opt y:coord(yvar)}}variable containing the y-coordinate
of data points {p_end}
{synopt :{opt k:ernel(kf)}}kernel function, where {it:kf} is one of the
following: {cmdab:qu:artic} (default), {cmdab:un:iform},
{cmdab:no:rmal}, {cmdab:ne:gexp}, {cmdab:tr:iangular},
{cmdab:ep:anechnikov} {p_end}
{synopt :{opt trunc:ated(t)}}truncation parameter for kernel functions
{cmd:normal} and {cmd:negexp}, where {it:t} is a positive number {p_end}
{p2coldent :* {opt b:andwidth(method)}}method for setting the value of the
kernel bandwidth, where {it:method} is one of the following: {cmdab:fbw},
{cmdab:ndp}, {cmdab:mix:ed} {p_end}
{marker estpar}{syntab: Estimation parameters}
{synopt :{cmdab:fbw(ad}{it:q}|{it:b}{cmd:)}}fixed kernel bandwidth, where
{it:q} is a positive integer and {it:b} is a positive number {p_end}
{synopt :{opt ndp(d)}}minimum (weighted) number of data points to be used for
kernel estimation at each grid point, where {it:d} is a positive number
{p_end}
{synopt :{opt ndpw(ndpwvar)}}weight data points by variable {it:ndpwvar}
when searching for the minimum number of data points to be used for
kernel estimation {p_end}
{synopt :{opt edge:correction}}apply approximate edge correction {p_end}
{syntab: Reporting}
{synopt :{opt d:ots}}display job progression dots {p_end}
{synopt :{opt noverb:ose}}suppress display of job progression {p_end}
{syntab: Saving results}
{p2coldent :* {cmdab:sav:ing(}{help spkde##section05:{it:kernest}} [{cmd:, replace}]{cmd:)}}save
results to Stata dataset {it:kernest} {p_end}
{synoptline}
{p 4 6 2}* Required option {p_end}
{marker desc}{title:Description}
{pstd} {cmd:spkde} implements a variety of kernel estimators of both the
probability density function and the intensity function of
two-dimensional spatial point patterns. {p_end}
{pstd} A two-dimensional spatial point pattern {bf:S} can be defined as a
set of {it:data points} {bf:s}_{it:i} ({it:i} = 1, ..., {it:n})
located in a two-dimensional study region {it:R} at coordinates
({it:s_i}1, {it:s_i}2). Each data point {bf:s}_{it:i} represents
the location in {it:R} of one or more “objects” of some kind: people,
events, sites, buildings, plants, cases of a disease, etc. {p_end}
{pstd} In the analysis of spatial point patterns we are often interested
in determining whether the distribution of the objects of interest
within {it:R} exhibits some form of clustering, as opposed to being
random. To explore the possibility of clustering, it may be useful
to describe the spatial point pattern of interest by means of its
{it:probability density function} {it:p}({bf:s}) and/or its
{it:intensity function} {it:l}({bf:s}). {p_end}
{pstd} The probability density function {it:p}({bf:s}) defines the probability
density of observing an object at location {bf:s} in {it:R}. In turn, the
intensity function {it:l}({bf:s}) defines the expected number of objects
per unit area at location {bf:s} in {it:R}. Therefore, {it:p}({bf:s})
and {it:l}({bf:s}) differ only by a constant of proportionality (Bailey
and Gatrell 1995; Waller and Gotway 2004). {p_end}
{pstd} Both the probability density function {it:p}({bf:s}) and the intensity
function {it:l}({bf:s}) of a given two-dimensional spatial point pattern
can be easily estimated by means of nonparametric estimators, e.g.,
kernel estimators. {it:Kernel estimators} are used to generate a
spatially smooth estimate of {it:p}({bf:s}) and/or {it:l}({bf:s}) at a
fine grid of points {bf:s}_{it:g} ({it:g} = 1, ..., {it:G}) covering
the study region {it:R} (Bailey and Gatrell 1995; Waller and Gotway
2004). {p_end}
{pstd} {cmd:spkde} computes kernel estimates of both {it:p}({bf:s}) and
{it:l}({bf:s}) at a grid of points generated by the user-written
Stata program {help spgrid}. Specifically, for each grid point
{bf:s}_{it:g}, {cmd:spkde} computes first the kernel estimate of
the intensity {it:l}({bf:s}_{it:g}), and then the kernel estimate
of the density {it:p}({bf:s}_{it:g}). {p_end}
{marker eq1}{pstd} The intensity {it:l}({bf:s}_{it:g}) at each grid point
{bf:s}_{it:g} is estimated as follows: {p_end}
{space 12}^{space 10}{it:c}{space 6}{it:n}{space 5}
{space 4}(1){space 5}{it:l}({bf:s}_{it:g}) = {hline 5} · SUM {it:k}({it:d_ig} / {it:h_g}) · {it:y_i}
{space 22}{it:A_g}{space 4}{it:i}=1
{pstd} where {it:k}(·) is the kernel function {hline 1} usually a unimodal
symmetrical bivariate probability density function; {it:d_ig} is the
Euclidean distance between data point {bf:s}_{it:i} and grid point
{bf:s}_{it:g}; {it:h_g} is the kernel bandwidth {hline 1} i.e., the
radius of the kernel function {hline 1} at grid point {bf:s}_{it:g};
{it:y_i} is the number of objects located at data point {bf:s}_{it:i};
{it:A_g} is the area of the subregion of {it:R} over which the kernel
function is evaluated at grid point {bf:s}_{it:g}, possibly corrected
for edge effects; and {it:c} is a constant of proportionality. {p_end}
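{pstd} As a minimal sketch of {help spkde##eq1:Equation 1}, the arithmetic
for a single grid point can be reproduced by hand. The example below is
illustrative only and does not call {cmd:spkde}: it assumes a grid point
at the origin, three hypothetical data points, a quartic kernel with
{it:h_g} = 10, no edge correction, and {it:c} = 1 (an assumption made
purely for illustration). The names {bf:nobj}, {bf:kw}, and {bf:hg} are
invented for this example. {p_end}
{cmd}
. * three hypothetical data points at (0,0), (3,4), (6,8)
. clear
. set obs 3
. generate double x = 3*(_n - 1)
. generate double y = 4*(_n - 1)
. * nobj plays the role of y_i: number of objects at each data point
. generate double nobj = cond(_n == 1, 2, cond(_n == 2, 1, 5))
. scalar hg = 10
. * z = d_ig / h_g, with the grid point at the origin
. generate double z = sqrt(x^2 + y^2) / hg
. * quartic kernel weight, then weight times number of objects
. generate double k = cond(z < 1, (1 - z^2)^2, 0)
. generate double kw = k * nobj
. quietly summarize kw
. display "weighted sum = " r(sum)
. display "lambda (c=1) = " r(sum) / (3.1416 * hg^2)
{txt}
{pstd} Here the third data point lies exactly at distance {it:h_g} from the
grid point and therefore receives zero weight, so the weighted sum is
1·2 + 0.5625·1 + 0·5 = 2.5625. {p_end}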
{pstd} In turn, the density {it:p}({bf:s}_{it:g}) at each grid point
{bf:s}_{it:g} is estimated as follows: {p_end}
{space 24}^
{space 12}^{space 11}{it:l}({bf:s}_{it:g})
{marker eq2}{space 4}(2){space 5}{it:p}({bf:s}_{it:g}) = {hline 12}
{space 23}{it:G}{space 2}^
{space 22}SUM {it:l}({bf:s}_{it:j})
{space 22}{it:j}=1
{pstd} The estimates of {it:l}({bf:s}_{it:g}) and {it:p}({bf:s}_{it:g}),
along with several other auxiliary variables, are saved to Stata
dataset {help spkde##section05:{it:kernest}} and can be visualized
using the user-written Stata program {help spmap}. {p_end}
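{pstd} The normalization in {help spkde##eq2:Equation 2} can be sketched on
a toy dataset. The example below does not use {cmd:spkde} output: it
assumes four hypothetical grid points whose intensity estimates are
stored in a made-up variable {bf:lambda}, and divides each value by the
sum over all grid points. {p_end}
{cmd}
. clear
. set obs 4
. * hypothetical intensity estimates at four grid points
. generate double lambda = _n
. * sum of the intensities over all grid points
. egen double total = total(lambda)
. * density = intensity divided by the sum, as in Equation 2
. generate double p = lambda / total
. list lambda p, noobs
{txt}
{pstd} By construction, the resulting values of {bf:p} sum to one over the
grid. {p_end}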
{marker section01}{title:Kernel function}
{pstd} As shown by {help spkde##eq1:Equation 1} above, for each grid point
{bf:s}_{it:g}, kernel estimators compute a weighted sum of the objects
making up the spatial point pattern of interest. The weight
{it:k}({it:d_ig} / {it:h_g}) applied to each object is a function of
the ratio between the object's distance from {bf:s}_{it:g} ({it:d_ig})
and the value of the kernel bandwidth {it:h_g}. In turn, the way
weights depend on {it:d_ig} and {it:h_g} is determined by the form of
the kernel function {it:k}(·) used in the analysis. {p_end}
{pstd} {cmd:spkde} offers six basic types of kernel
function: uniform, normal, negative exponential, quartic (the default),
triangular, and Epanechnikov. Moreover, by specifying
option {opt truncated(t)}, it is possible to use a truncated version
of the normal and negative exponential kernel functions. {p_end}
{pstd} Let {it:z} = {it:d_ig} / {it:h_g} denote the argument of the kernel
function, and {it:t}>0 denote a truncation parameter specified with
option {opt truncated(t)}. Then, the kernel functions made available
by {cmd:spkde} can be described as follows: {p_end}
{col 9}{hline 71}
{col 9}Name {col 44}Formula
{col 9}{hline 71}
{col 9}Uniform {col 44}{it:k}({it:z}) = 1 {col 64}if {it:z} < 1
{col 44}{it:k}({it:z}) = 0 {col 64}otherwise
{col 9}Normal {col 44}{it:k}({it:z}) = exp(-{it:z}²/2)
{col 9}Truncated normal {col 44}{it:k}({it:z}) = exp(-{it:z}²/2){col 64}if {it:d_ig} < {it:h_g}·{it:t}
{col 44}{it:k}({it:z}) = 0 {col 64}otherwise
{col 9}Negative exponential {col 44}{it:k}({it:z}) = exp(-3{it:z})
{col 9}Truncated negative exponential{col 44}{it:k}({it:z}) = exp(-3{it:z}){col 64}if {it:d_ig} < {it:h_g}·{it:t}
{col 44}{it:k}({it:z}) = 0 {col 64}otherwise
{col 9}Quartic {col 44}{it:k}({it:z}) = (1-{it:z}²)² {col 64}if {it:z} < 1
{col 44}{it:k}({it:z}) = 0 {col 64}otherwise
{col 9}Triangular {col 44}{it:k}({it:z}) = (1-{it:z}) {col 64}if {it:z} < 1
{col 44}{it:k}({it:z}) = 0 {col 64}otherwise
{col 9}Epanechnikov {col 44}{it:k}({it:z}) = (1-{it:z}²) {col 64}if {it:z} < 1
{col 44}{it:k}({it:z}) = 0 {col 64}otherwise
{col 9}{hline 71}
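{pstd} To see how the formulas in the table behave, they can be evaluated
at a few values of {it:z}. The following is a sketch that transcribes
the table into {cmd:generate} statements; it does not call {cmd:spkde},
and the variable names and the truncation value {it:t} = 1 are chosen
only for illustration. Since {it:z} = {it:d_ig} / {it:h_g}, the
truncation condition {it:d_ig} < {it:h_g}·{it:t} is written as
{it:z} < {it:t}. {p_end}
{cmd}
. clear
. set obs 6
. * z takes the values 0, 0.25, 0.5, 0.75, 1, 1.25
. generate double z = (_n - 1) / 4
. scalar tpar = 1
. generate double k_uniform    = cond(z < 1, 1, 0)
. generate double k_quartic    = cond(z < 1, (1 - z^2)^2, 0)
. generate double k_triangular = cond(z < 1, 1 - z, 0)
. generate double k_epan       = cond(z < 1, 1 - z^2, 0)
. generate double k_normal     = exp(-z^2 / 2)
. generate double k_tnormal    = cond(z < tpar, exp(-z^2 / 2), 0)
. generate double k_negexp     = exp(-3 * z)
. generate double k_tnegexp    = cond(z < tpar, exp(-3 * z), 0)
. list, noobs clean
{txt}
{pstd} The bounded and truncated kernels drop to zero at their cutoff,
whereas the untruncated normal and negative exponential kernels remain
positive for every {it:z}. {p_end}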
{marker section02}{title:Kernel bandwidth}
{pstd} While the precise form of the kernel function has a moderate influence
on the estimates of {it:p}({bf:s}) and {it:l}({bf:s}), the kernel
bandwidth plays a major role in kernel estimation, since it determines
the degree of smoothing with which {it:p}({bf:s}) and {it:l}({bf:s})
are estimated: large bandwidths may result in oversmoothing, whereas small
bandwidths may retain too much local detail and exhibit spikes at
isolated event locations (Bailey and Gatrell 1995; Waller and Gotway
2004). {p_end}
{pstd} {cmd:spkde} offers three methods for setting the value
of the kernel bandwidth {it:h_g} at each grid point {bf:s}_{it:g}: fixed
bandwidth, minimum (weighted) number of data points, and mixed. {p_end}
{pstd} The {it:fixed bandwidth} method sets {it:h_g} = {it:h} at all grid
points, where {it:h} is a positive number expressed in the same unit
of measurement (miles, kilometers, meters, pixels, etc.) as the grid
and data point coordinates. {p_end}
{pstd} The {it:minimum (weighted) number of data points} method sets
{it:h_g} = {it:r_g}({it:ndp}), where {it:r_g}({it:ndp}) is the radius
of the smallest circle centered on {bf:s}_{it:g} that circumscribes at
least {it:ndp} (weighted) data points. The quantity {it:ndp} can
express either an unweighted or a weighted number of data points. In
the latter case, the weight associated with each data point must be
stored in variable {it:ndpwvar} and specified with option
{opt ndpw(ndpwvar)}. For discussion and illustration of this
adaptive method and its applications see, e.g., Bailey and Gatrell
(1995), Brunsdon (1995), and Talbot {it:et al.} (2000). {p_end}
{pstd} The {it:mixed} method is a combination of the other two
methods. Specifically, it sets {it:h_g} = {it:h} if the circle centered
on {bf:s}_{it:g} and having radius {it:h} circumscribes at least
{it:ndp} (weighted) data points, and {it:h_g} = {it:r_g}({it:ndp})
otherwise. {p_end}
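{pstd} The three rules can be illustrated for a single grid point. The
sketch below covers only the unweighted case and does not reproduce
the internal algorithm of {cmd:spkde}: it assumes five hypothetical
data points at distances 10, 20, ..., 50 from the grid point,
{it:ndp} = 3, and {it:h} = 25. The names {bf:ndpmin}, {bf:hfix}, and
{bf:rg} are invented for the example. {p_end}
{cmd}
. clear
. set obs 5
. * hypothetical distances between one grid point and five data points
. generate double d = 10 * _n
. sort d
. scalar ndpmin = 3
. scalar hfix = 25
. * r_g(ndp): distance of the ndp-th nearest data point
. scalar rg = d[ndpmin]
. display "fixed bandwidth: h_g = " hfix
. display "ndp method:      h_g = " rg
. * mixed: keep h if its circle holds at least ndp points, else use r_g
. display "mixed method:    h_g = " max(hfix, rg)
{txt}
{pstd} In this example the circle of radius 25 contains only two data
points, so the mixed method falls back on {it:r_g}({it:ndp}) = 30. In the
unweighted case the mixed rule is equivalent to taking the larger of
{it:h} and {it:r_g}({it:ndp}). {p_end}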
{marker section03}{title:Edge correction}
{pstd} For bounded and truncated {help spkde##section01:kernel functions}, the
nominal value of the area over which the kernel function is evaluated at
each grid point {bf:s}_{it:g} (see {help spkde##eq1:Equation 1} above)
equals the area of the kernel window, i.e., of the circle centered on
{bf:s}_{it:g} and having radius {it:h_g} (for bounded functions) or
{it:h_g·t} (for truncated functions, {it:t} being the truncation
parameter). Formally: {it:A_g} = 3.1416 · {it:h_g}² for bounded
functions, and {it:A_g}{space 1}= 3.1416 · ({it:h_g·t})² for truncated
functions. {p_end}
{pstd} For grid points located near the edges of the study region, a greater
or lesser portion of their kernel window may lie outside the study
region, so that the densities/intensities computed with the
standard formula at such grid points may be underestimated
(Bailey and Gatrell 1995). {p_end}
{pstd} Although an accurate assessment of the impact of edge effects and
the development of proper corrections remain an open area of
research (Waller and Gotway 2004), some adjustments are
possible. {cmd:spkde} implements an approximate edge correction
that consists in rescaling the area of the kernel window at each
grid point {bf:s}_{it:g} by a factor {it:ec_g} that approximately
equals the proportion of the kernel window lying within the study
region. After this correction, {it:A_g} = 3.1416 · {it:h_g}² · {it:ec_g}
for bounded functions, and {it:A_g} = 3.1416 · ({it:h_g·t})² · {it:ec_g}
for truncated functions. {p_end}
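{pstd} As a small worked illustration of the correction, consider a bounded
kernel with {it:h_g} = 100 at a grid point whose kernel window lies half
inside the study region. The value {it:ec_g} = 0.5 is assumed here for
illustration; it is not computed the way {cmd:spkde} computes it. {p_end}
{cmd}
. scalar hg = 100
. scalar ecg = 0.5
. display "A_g without correction = " 3.1416 * hg^2
. display "A_g with correction    = " 3.1416 * hg^2 * ecg
{txt}
{pstd} Because {it:A_g} appears in the denominator of
{help spkde##eq1:Equation 1}, halving it doubles the intensity estimate
at that grid point, all else being equal. {p_end}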
{pstd} In {cmd:spkde}, the approximate edge correction described above can
be applied to the estimation of {it:l}({bf:s}) and {it:p}({bf:s})
only when a bounded or truncated kernel function is used. This
includes the uniform, quartic, triangular, Epanechnikov, truncated
normal, and truncated negative exponential functions. {p_end}
{marker section04}{title:Input datasets}
{pstd} {cmd:spkde} makes use of two input datasets: {it:datapoints} and
{it:gridpoints}. {p_end}
{pstd} {it:datapoints} is the dataset that resides in memory when {cmd:spkde}
is run. It consists of {it:n} observations, one for each data point
making up the spatial point pattern of interest. At the minimum, it
must include {help spkde##main:{it:xvar}}, a numeric variable that
contains the x-coordinate of data points, and
{help spkde##main:{it:yvar}}, a numeric variable that contains the
y-coordinate of data points. When data points “host” more than one
object each, of {it:J}>0 different types, {cmd:spkde} must be run
specifying a {help spkde##syntax:{it:varlist}} made of {it:J} numeric
variables, each containing the number of objects of a given type
located at each data point; in this case, dataset {it:datapoints} must
also include the {it:J} variables specified in {it:varlist}. Finally,
dataset {it:datapoints} can optionally include
{help spkde##estpar:{it:ndpwvar}}, a numeric variable sometimes
used to set the value of the {help spkde##section02:kernel bandwidth}.
{p_end}
{pstd} {it:gridpoints} is the {help spkde##syntax:using} dataset invoked by
{cmd:spkde} at run time. It is generated by the user-written Stata
program {help spgrid} and contains the spatial grid used by {cmd:spkde}
to compute the kernel estimates of {it:p}({bf:s}_{it:g}) and
{it:l}({bf:s}_{it:g}). {p_end}
{marker section05}{title:Output dataset}
{pstd} {cmd:spkde} routinely generates {it:kernest}, a Stata dataset that
consists of {it:G} observations {hline 1} one for each grid point
{bf:s}_{it:g} {hline 1} and includes the following variables: {p_end}
{phang2}{space 1}o{space 2}{bf:spgrid_id} is a numeric variable that uniquely
identifies the cells making up the grid. {p_end}
{phang2}{space 1}o{space 2}{bf:spgrid_xcoord} is a numeric variable that
contains the x-coordinate of each grid point.
{p_end}
{phang2}{space 1}o{space 2}{bf:spgrid_ycoord} is a numeric variable that
contains the y-coordinate of each grid point.
{p_end}
{phang2}{space 1}o{space 2}{bf:bandwidth} is a numeric variable that
contains the values of {it:h_g} (see
{help spkde##eq1:above}). {p_end}
{phang2}{space 1}o{space 2}{bf:ndp} is a numeric variable that contains
the number of data points used for kernel
estimation at each grid point. {p_end}
{phang2}{space 1}o{space 2}{bf:A} is a numeric variable that contains
the values of {it:A_g} (see
{help spkde##eq1:above}). {p_end}
{pstd} If no {help spkde##syntax:{it:varlist}} is specified, three additional
variables are included in dataset {it:kernest}: {p_end}
{phang2}{space 1}o{space 2}{bf:c} is a numeric variable that contains the
value of {it:c} (see {help spkde##eq1:above}).
{p_end}
{phang2}{space 1}o{space 2}{bf:lambda} is a numeric variable that contains
the kernel estimate of {it:l}({bf:s}_{it:g})
(see {help spkde##eq1:above}). {p_end}
{phang2}{space 1}o{space 2}{bf:p} is a numeric variable that contains
the kernel estimate of {it:p}({bf:s}_{it:g})
(see {help spkde##eq2:above}). {p_end}
{pstd} On the other hand, if a {help spkde##syntax:{it:varlist}} is specified,
for each variable {it:varname} in {it:varlist} three additional
variables are included in dataset {it:kernest}: {p_end}
{phang2}{space 1}o{space 2}{it:varname}{bf:_c} is a numeric variable that
contains the value of {it:c} for objects of type
{it:varname}. {p_end}
{phang2}{space 1}o{space 2}{it:varname}{bf:_lambda} is a numeric variable
that contains the kernel estimate of
{it:l}({bf:s}_{it:g}) for objects of type
{it:varname}. {p_end}
{phang2}{space 1}o{space 2}{it:varname}{bf:_p} is a numeric variable
that contains the kernel estimate of
{it:p}({bf:s}_{it:g}) for objects of type
{it:varname}. {p_end}
{pstd} When option {opt ndpw(ndpwvar)} is specified, an additional
variable is included in dataset {it:kernest}: {p_end}
{phang2}{space 1}o{space 2}{bf:wndp} is a numeric variable that contains
the weighted number of data points used for kernel
estimation at each grid point. {p_end}
{pstd} Finally, when option {opt edgecorrection} is specified, an additional
variable is included in dataset {it:kernest}: {p_end}
{phang2}{space 1}o{space 2}{bf:edgecorrect} is a numeric variable that
contains the edge correction factor {it:ec_g}
(see {help spkde##section03:above}). {p_end}
{marker section06}{title:Visualization of kernel estimates}
{pstd} The kernel estimates of {it:l}({bf:s}_{it:g}) and {it:p}({bf:s}_{it:g})
generated by {cmd:spkde} can be visualized using the user-written Stata
program {help spmap}. {p_end}
{pstd} For this purpose, two datasets are needed: {p_end}
{phang2}{space 1}o{space 2}{it:kernest} is the output dataset routinely
generated by {cmd:spkde} (see section
{help spkde##section05:Output dataset}
above). {p_end}
{phang2}{space 1}o{space 2}{it:gridcells} is one of the two output datasets
generated by the user-written Stata program
{help spgrid##section04:spgrid}. {p_end}
{pstd} To visualize the kernel estimates of interest, {help spmap} must be run
with {it:kernest} as the {help spmap##sd_master:{it:master} dataset},
and {it:gridcells} as the {help spmap##sd_basemap:{it:basemap} dataset}.
{p_end}
{marker section07}{title:Alternative applications}
{pstd} Although {cmd:spkde} has been designed to carry out kernel estimation
of density and intensity functions for two-dimensional spatial point
patterns, it can also be used to estimate the joint probability
density function {it:p}({it:x},{it:y}) of any pair of quantitative
variables {it:X} and {it:Y} (see section
{help spkde##ex2:Examples 2 {hline 1} Alternative applications} below).
{p_end}
{marker options2}{title:Options}
{dlgtab:Main}
{phang}
{opt xcoord(xvar)} specifies the name of the variable containing the
x-coordinate of each data point {bf:s}_{it:i}. {p_end}
{phang}
{opt ycoord(yvar)} specifies the name of the variable containing the
y-coordinate of each data point {bf:s}_{it:i}. {p_end}
{phang}
{opt kernel(kf)} specifies the basic type of kernel function
to be used in the estimation of {it:p}({bf:s}) and
{it:l}({bf:s}) (for details, see section
{help spkde##section01:Kernel function} above). {p_end}
{phang2}{cmd:kernel(quartic)} is the default and requests that the
quartic kernel function be used. {p_end}
{phang2}{cmd:kernel(uniform)} requests that the uniform kernel function
be used. {p_end}
{phang2}{cmd:kernel(normal)} requests that the normal kernel function
be used. {p_end}
{phang2}{cmd:kernel(negexp)} requests that the negative exponential kernel
function be used. {p_end}
{phang2}{cmd:kernel(triangular)} requests that the triangular kernel function
be used. {p_end}
{phang2}{cmd:kernel(epanechnikov)} requests that the Epanechnikov kernel
function be used. {p_end}
{phang}
{opt truncated(t)} applies only when option {cmd:kernel(normal)} or
{cmd:kernel(negexp)} is specified. It requests that, for each grid
point {bf:s}_{it:g}, the selected kernel function be evaluated only
within a distance {it:h_g·t} from {bf:s}_{it:g}, where {it:t} is
a positive number. {p_end}
{phang}
{opt bandwidth(method)} specifies the method to be used for setting the value
of the kernel bandwidth {it:h_g} at each grid point {bf:s}_{it:g} (for
details, see section {help spkde##section02:Kernel bandwidth} above).
{p_end}
{phang2}{cmd:bandwidth(fbw)} requests that the {bf:f}ixed {bf:b}and{bf:w}idth
method be used. {p_end}
{phang2}{cmd:bandwidth(ndp)} requests that the minimum (weighted) {bf:n}umber
of {bf:d}ata {bf:p}oints method be used. {p_end}
{phang2}{cmd:bandwidth(mixed)} requests that the mixed method be used. {p_end}
{dlgtab:Estimation parameters}
{phang}
{cmd:fbw(ad}{it:q}|{it:b}{cmd:)} sets the value of {it:h} as
defined in section {help spkde##section02:Kernel bandwidth} above. {p_end}
{phang2}{cmd:fbw(ad}{it:q}{cmd:)} sets {it:h} = {it:adq}, where
{it:adq} equals the average distance between each data point
{bf:s}_{it:i} and its {it:q} nearest data points. {p_end}
{phang2}{opt fbw(b)} sets {it:h} = {it:b}. {p_end}
{phang}
{opt ndp(d)} sets the value of {it:ndp} as defined in section
{help spkde##section02:Kernel bandwidth} above, namely
{it:ndp} = {it:d}. {p_end}
{phang}
{opt ndpw(ndpwvar)} specifies that {it:ndp} denotes a weighted number of
data points and the weights are stored in variable {it:ndpwvar}.
{p_end}
{phang}
{cmd:edgecorrection} requests that an approximate edge correction
be applied to the estimation of {it:p}({bf:s}) and
{it:l}({bf:s}) (for details, see section
{help spkde##section03:Edge correction} above). {p_end}
{dlgtab:Reporting}
{phang}
{cmd:dots} requests that job progression dots be displayed. {p_end}
{phang}
{cmd:noverbose} requests that the display of every indicator of job
progression be suppressed. {p_end}
{dlgtab:Saving results}
{phang}
{cmdab:sav:ing(}{it:kernest} [{cmd:, replace}]{cmd:)} requests that the
kernel estimates of {it:p}({bf:s}_{it:g}) and {it:l}({bf:s}_{it:g})
({it:g} = 1, ..., {it:G}), along with a set of auxiliary variables, be
saved to dataset {help spkde##section05:{it:kernest}}. If specified,
suboption {cmd:replace} requests that dataset
{help spkde##section05:{it:kernest}} be overwritten if already existing.
{p_end}
{marker ex1}{title:Examples 1 {hline 1} Standard applications}
{cmd}
. spgrid using "Italy-OutlineCoordinates.dta", ///
resolution(w10) unit(kilometers) ///
cells("GridCells.dta") ///
points("GridPoints.dta") ///
replace compress dots
. use "Italy-DataPoints.dta", clear
. spkde using "GridPoints.dta", ///
xcoord(xcoord) ycoord(ycoord) ///
bandwidth(fbw) fbw(100) dots ///
saving("Kde.dta", replace)
. use "Kde.dta", clear
. spmap lambda using "GridCells.dta", ///
id(spgrid_id) clnum(20) ///
fcolor(Rainbow) ocolor(none ..) ///
legend(off) ///
point(data("Italy-DataPoints.dta") ///
x(xcoord) y(ycoord))
. use "Italy-DataPoints.dta", clear
. spkde using "GridPoints.dta", ///
xcoord(xcoord) ycoord(ycoord) ///
bandwidth(fbw) fbw(100) dots ///
edgecorrect ///
saving("Kde.dta", replace)
. use "Kde.dta", clear
. spmap lambda using "GridCells.dta", ///
id(spgrid_id) clnum(20) ///
fcolor(Rainbow) ocolor(none ..) ///
legend(off) ///
point(data("Italy-DataPoints.dta") ///
x(xcoord) y(ycoord))
. use "Italy-DataPoints.dta", clear
. spkde using "GridPoints.dta", ///
xcoord(xcoord) ycoord(ycoord) ///
kernel(normal) ///
bandwidth(fbw) fbw(ad5) dots ///
saving("Kde.dta", replace)
. use "Kde.dta", clear
. spmap lambda using "GridCells.dta", ///
id(spgrid_id) clnum(20) ///
fcolor(Rainbow) ocolor(none ..) ///
legend(off) ///
point(data("Italy-DataPoints.dta") ///
x(xcoord) y(ycoord))
. use "Italy-DataPoints.dta", clear
. spkde using "GridPoints.dta", ///
xcoord(xcoord) ycoord(ycoord) ///
bandwidth(ndp) ndp(4) dots ///
saving("Kde.dta", replace)
. use "Kde.dta", clear
. spmap lambda using "GridCells.dta", ///
id(spgrid_id) clnum(20) ///
fcolor(Rainbow) ocolor(none ..) ///
legend(off) ///
point(data("Italy-DataPoints.dta") ///
x(xcoord) y(ycoord))
. use "Italy-DataPoints.dta", clear
. spkde dcvd95 pop95 using "GridPoints.dta", ///
xcoord(xcoord) ycoord(ycoord) ///
bandwidth(fbw) fbw(100) dots ///
saving("Kde.dta", replace)
. use "Kde.dta", clear
. generate ratio = dcvd95_lambda / pop95_lambda * 1000
. spmap ratio using "GridCells.dta", ///
id(spgrid_id) clnum(20) ///
fcolor(Rainbow) ocolor(none ..) ///
legend(off)
{txt}
{marker ex2}{title:Examples 2 {hline 1} Alternative applications}
{pstd} As mentioned {help spkde##section07:above}, {cmd:spkde} can also be
used to estimate the joint probability density function of any
pair of quantitative variables (for an alternative, see Stata
program {help kdens2}, written by Christopher F. Baum and available
from the Boston College SSC Archive). {help spmap} can then be used to
generate the corresponding density plot. For this purpose, it is
advisable to use {help mylabels}, a Stata program written
by Nicholas J. Cox and available from the Boston College SSC Archive. {p_end}
{pstd} As an example, let us estimate and plot the bivariate probability
density function for two of the variables included in the {bf:auto}
dataset: {bf:mpg} and {bf:price}. This can be done in four steps as
follows: {p_end}
{pstd} 1. Normalize variables to the range [0,1] {p_end}
{cmd}
. sysuse "auto.dta", clear
. summarize price mpg
. clonevar x = mpg
. clonevar y = price
. replace x = (x-0) / (50-0)
. replace y = (y-0) / (20000-0)
. mylabels 0(10)50, myscale((@-0) / (50-0)) local(XLAB)
. mylabels 0(5000)20000, myscale((@-0) / (20000-0)) local(YLAB)
. keep x y
. save "xy.dta", replace
{txt}
{pstd} 2. Generate a 100x100 grid {p_end}
{cmd}
. spgrid, shape(hexagonal) xdim(100) ///
xrange(0 1) yrange(0 1) ///
dots replace ///
cells("2D-GridCells.dta") ///
points("2D-GridPoints.dta")
{txt}
{pstd} 3. Estimate the bivariate probability density function {p_end}
{cmd}
. spkde using "2D-GridPoints.dta", ///
xcoord(x) ycoord(y) ///
bandwidth(fbw) fbw(0.1) dots ///
saving("2D-Kde.dta", replace)
{txt}
{pstd} 4. Display the density plot {p_end}
{cmd}
. use "2D-Kde.dta", clear
. recode lambda (.=0)
. spmap lambda using "2D-GridCells.dta", ///
id(spgrid_id) clnum(20) fcolor(Rainbow) ///
ocolor(none ..) legend(off) ///
point(data("xy.dta") x(x) y(y)) ///
freestyle aspectratio(1) ///
xtitle(" " "Mileage (mpg)") ///
xlab(`XLAB') ///
ytitle("Price" " ") ///
ylab(`YLAB', angle(0))
{txt}
{title:Acknowledgments}
{p 4 4 2} I wish to thank Nick Cox for helpful suggestions.
{title:Author}
{p 4} Maurizio Pisati {p_end}
{p 4} Department of Sociology and Social Research {p_end}
{p 4} University of Milano Bicocca - Italy {p_end}
{p 4} {browse "mailto:maurizio.pisati@unimib.it":maurizio.pisati@unimib.it}
{title:References}
{p 4 8 2}Bailey, T.C. and A.C. Gatrell. 1995. {it:Interactive Spatial Data}
{it:Analysis}. Harlow: Longman. {p_end}
{p 4 8 2}Brunsdon, C. 1995. Estimating Probability Surfaces for Geographical
Point Data: An Adaptive Kernel
Algorithm. {it:Computers & Geosciences} 21: 877{c -}894. {p_end}
{p 4 8 2}Talbot, T.O., Kulldorff, M., Forand, S.P. and
V.B. Haley. 2000. Evaluation of Spatial Filters To Create Smoothed
Maps of Health Data. {it:Statistics in Medicine} 19: 2399{c -}2408.
{p_end}
{p 4 8 2}Waller, L.A. and C.A. Gotway. 2004. {it:Applied Spatial Statistics}
{it:for Public Health Data}. Hoboken NJ: Wiley. {p_end}
{title:Also see}
{psee}
Online: {helpb spgrid} (if installed), {helpb spmap} (if installed),
{helpb mylabels} (if installed)
{p_end}