{smcl} {* 3feb2009}{...} {cmd:help spkde}{right:Version 1.0.0} {hline} {title:Title} {p 4 13 2} {hi:spkde} {hline 2} Kernel estimation of density and intensity functions for two-dimensional spatial point patterns {p_end} {marker syntax}{title:Syntax} {p 8 15 2} {cmd:spkde} [{varlist}] {ifin} {help using} {help spkde##section04:{it:gridpoints}} [{cmd:,} {help spkde##options1:{it:options}}] {synoptset 35 tabbed}{...} {marker options1}{synopthdr:{help spkde##options2:options}{col 41}} {synoptline} {marker main}{syntab: Main} {p2coldent :* {opt x:coord(xvar)}}variable containing the x-coordinate of data points {p_end} {p2coldent :* {opt y:coord(yvar)}}variable containing the y-coordinate of data points {p_end} {synopt :{opt k:ernel(kf)}}kernel function, where {it:kf} is one of the following: {cmdab:qu:artic} (default), {cmdab:un:iform}, {cmdab:no:rmal}, {cmdab:ne:gexp}, {cmdab:tr:iangular}, {cmdab:ep:anechnikov} {p_end} {synopt :{opt trunc:ated(t)}}truncation parameter for kernel functions {cmd:normal} and {cmd:negexp}, where {it:t} is a positive number {p_end} {p2coldent :* {opt b:andwidth(method)}}method for setting the value of the kernel bandwidth, where {it:method} is one of the following: {cmdab:fbw}, {cmdab:ndp}, {cmdab:mix:ed} {p_end} {marker estpar}{syntab: Estimation parameters} {synopt :{cmdab:fbw(ad}{it:q}|{it:b}{cmd:)}}fixed kernel bandwidth, where {it:q} is positive integer and {it:b} is a positive number {p_end} {synopt :{opt ndp(d)}}minimum (weighted) number of data points to be used for kernel estimation at each grid point, where {it:d} is a positive number {p_end} {synopt :{opt ndpw(ndpwvar)}}weight data points by variable {it:ndpwvar} when searching for the minimum number of data points to be used for kernel estimation {p_end} {synopt :{opt edge:correction}}apply approximate edge correction {p_end} {syntab: Reporting} {synopt :{opt d:ots}}display job progression dots {p_end} {synopt :{opt noverb:ose}}suppress display of job progression {p_end} {syntab: Saving results} {p2coldent :* {cmdab:sav:ing(}{help spkde##section05:{it:kernest}} [{cmd:, replace}]{cmd:)}}save results to Stata dataset {it:kernest} {p_end} {synoptline} {p 4 6 2}* Required option {p_end} {marker desc}{title:Description} {pstd} {cmd:spkde} implements a variety of kernel estimators of both the probability density function and the intensity function of two-dimensional spatial point patterns. {p_end} {pstd} A two-dimensional spatial point pattern {bf:S} can be defined as a set of {it:data points} {bf:s}_{it:i} ({it:i} = 1, ..., {it:n}) located in a two-dimensional study region {it:R} at coordinates ({it:s_i}1, {it:s_i}2). Each data point {bf:s}_{it:i} represents the location in {it:R} of one or more “objects” of some kind: people, events, sites, buildings, plants, cases of a disease, etc. {p_end} {pstd} In the analysis of spatial point patterns we are often interested in determining whether the distribution of the objects of interest within {it:R} exhibits some form of clustering, as opposed to being random. To explore the possibility of clustering, it may be useful to describe the spatial point pattern of interest by means of its {it:probability density function} {it:p}({bf:s}) and/or its {it:intensity function} {it:l}({bf:s}). {p_end} {pstd} The probability density function {it:p}({bf:s}) defines the probability of observing an object at location {bf:s} in {it:R}. In turn, the intensity function {it:l}({bf:s}) defines the expected number of objects per unit area at location {bf:s} in {it:R}. Therefore, {it:p}({bf:s}) and {it:l}({bf:s}) differ only by a constant of proportionality (Bailey and Gatrell 1995; Waller and Gotway 2004). {p_end} {pstd} Both the probability density function {it:p}({bf:s}) and the intensity function {it:l}({bf:s}) of a given two-dimensional spatial point pattern can be easily estimated by means of nonparametric estimators, e.g., kernel estimators. {it:Kernel estimators} are used to generate a spatially smooth estimate of {it:p}({bf:s}) and/or {it:l}({bf:s}) at a fine grid of points {bf:s}_{it:g} ({it:g} = 1, ..., {it:G}) covering the study region {it:R} (Bailey and Gatrell 1995; Waller and Gotway 2004). {p_end} {pstd} {cmd:spkde} computes kernel estimates of both {it:p}({bf:s}) and {it:l}({bf:s}) at a grid of points generated by the user-written Stata program {help spgrid}. Expressly, for each grid point {bf:s}_{it:g}, {cmd:spkde} computes first the kernel estimate of the intensity {it:l}({bf:s}_{it:g}), and then the kernel estimate of the density {it:p}({bf:s}_{it:g}). {p_end} {marker eq1}{pstd} The intensity {it:l}({bf:s}_{it:g}) at each grid point {bf:s}_{it:g} is estimated as follows: {p_end} {space 12}^{space 10}{it:c}{space 6}{it:n}{space 5} {space 4}(1){space 5}{it:l}({bf:s}_{it:g}) = {hline 5} · SUM {it:k}({it:d_ig} / {it:h_g}) · {it:y_i} {space 22}{it:A_g}{space 4}{it:i}=1 {pstd} where {it:k}(·) is the kernel function {hline 1} usually a unimodal symmetrical bivariate probability density function; {it:d_ig} is the Euclidean distance between data point {bf:s}_{it:i} and grid point {bf:s}_{it:g}; {it:h_g} is the kernel bandwidth {hline 1} i.e., the radius of the kernel function {hline 1} at grid point {bf:s}_{it:g}; {it:y_i} is the number of objects located at data point {bf:s}_{it:i}; {it:A_g} is the area of the subregion of {it:R} over which the kernel function is evaluated at grid point {bf:s}_{it:g}, possibly corrected for edge effects; and {it:c} is a constant of proportionality. {p_end} {pstd} In turn, the density {it:p}({bf:s}_{it:g}) at each grid point {bf:s}_{it:g} is estimated as follows: {p_end} {space 24}^ {space 12}^{space 11}{it:l}({bf:s}_{it:g}) {marker eq2}{space 4}(2){space 5}{it:p}({bf:s}_{it:g}) = {hline 12} {space 23}{it:G}{space 2}^ {space 22}SUM {it:l}({bf:s}_{it:j}) {space 22}{it:j}=1 {pstd} The estimates of {it:l}({bf:s}_{it:g}) and {it:p}({bf:s}_{it:g}), along with several other auxiliary variables, are saved to Stata dataset {help spkde##section05:{it:kernest}} and can be visualized using the user-written Stata program {help spmap}. {p_end} {marker section01}{title:Kernel function} {pstd} As shown by {help spkde##eq1:Equation 1} above, for each grid point {bf:s}_{it:g}, kernel estimators compute a weighted sum of the objects making up the spatial point pattern of interest. The weight {it:k}({it:d_ig} / {it:h_g}) applied to each object is a function of the ratio between the object's distance from {bf:s}_{it:g} ({it:d_ig}) and the value of the kernel bandwidth {it:h_g}. In turn, the way weights depend on {it:d_ig} and {it:h_g} is determined by the form of the kernel function {it:k}(·) used in the analysis. {p_end} {pstd} {cmd:spkde} allows to choose among six basic types of kernel function: uniform, normal, negative exponential, quartic (the default), triangular, and Epanechnikov. Moreover, through specification of option {opt truncated(t)}, it is possible to use a truncated version of the normal and negative exponential kernel functions. {pstd} Let {it:z} = {it:d_ig} / {it:h_g} denote the argument of the kernel function, and {it:t}>0 denote a truncation parameter specified with option {opt truncated(t)}. Then, the kernel functions made available by {cmd:spkde} can be described as follows: {p_end} {col 9}{hline 71} {col 9}Name {col 44}Formula {col 9}{hline 71} {col 9}Uniform {col 44}{it:k}({it:z}) = 1 {col 64}if {it:z} < 1 {col 44}{it:k}({it:z}) = 0 {col 64}otherwise {col 9}Normal {col 44}{it:k}({it:z}) = exp(-{it:z}²/2) {col 9}Truncated normal {col 44}{it:k}({it:z}) = exp(-{it:z}²/2){col 64}if {it:d_ig} < {it:h_g}·{it:t} {col 44}{it:k}({it:z}) = 0 {col 64}otherwise {col 9}Negative exponential {col 44}{it:k}({it:z}) = exp(-3{it:z}) {col 9}Truncated negative exponential{col 44}{it:k}({it:z}) = exp(-3{it:z}){col 64}if {it:d_ig} < {it:h_g}·{it:t} {col 44}{it:k}({it:z}) = 0 {col 64}otherwise {col 9}Quartic {col 44}{it:k}({it:z}) = (1-{it:z}²)² {col 64}if {it:z} < 1 {col 44}{it:k}({it:z}) = 0 {col 64}otherwise {col 9}Triangular {col 44}{it:k}({it:z}) = (1-{it:z}) {col 64}if {it:z} < 1 {col 44}{it:k}({it:z}) = 0 {col 64}otherwise {col 9}Epanechnikov {col 44}{it:k}({it:z}) = (1-{it:z}²) {col 64}if {it:z} < 1 {col 44}{it:k}({it:z}) = 0 {col 64}otherwise {col 9}{hline 71} {marker section02}{title:Kernel bandwidth} {pstd} While the precise form of the kernel function has a moderate influence on the estimates of {it:p}({bf:s}) and {it:l}({bf:s}), the kernel bandwidth plays a major role in kernel estimation, since it determines the degree of smoothing with which {it:p}({bf:s}) and {it:l}({bf:s}) are estimated: large bandwidths may result in oversmoothing, small bandwidths may retain too much local deatail and exhibit spikes at isolated event locations (Bailey and Gatrell 1995; Waller and Gotway 2004). {p_end} {pstd} {cmd:spkde} allows to choose among three methods for setting the value of the kernel bandwidth {it:h_g} at each grid point {bf:s}_{it:g}: fixed bandwidth, minimum (weighted) number of data points, and mixed. {p_end} {pstd} The {it:fixed bandwidth} method sets {it:h_g} = {it:h} at all grid points, where {it:h} is a positive number expressed in the same unit of measurement (miles, kilometers, meters, pixels, etc.) as the grid and data points coordinates. {p_end} {pstd} The {it:minimum (weighted) number of data points} method sets {it:h_g} = {it:r_g}({it:ndp}), where {it:r_g}({it:ndp}) is the radius of the smallest circle centered on {bf:s}_{it:g} that circumscribes at least {it:ndp} (weighted) data points. The quantity {it:ndp} can express either an unweighted or a weighted number of data points. In the latter case, the weight associated with each data point must be stored in variable {it:ndpwvar} and specified with option {opt ndpw(ndpwvar)}. For discussion and illustration of this adaptive method and its applications see, e.g., Bailey and Gatrell (1995), Brundson (1995), Talbot {it:et al.} (2000). {p_end} {pstd} The {it:mixed} method is a combination of the other two methods. Expressly, it sets {it:h_g} = {it:h} if the circle centered on {bf:s}_{it:g} and having radius {it:h} circumscribes at least {it:ndp} (weighted) data points, and {it:h_g} = {it:r_g}({it:ndp}) otherwise. {p_end} {marker section03}{title:Edge correction} {pstd} For bounded and truncated {help spkde##section01:kernel functions}, the nominal value of the area over which the kernel function is evaluated at each grid point {bf:s}_{it:g} (see {help spkde##eq1:Equation 1} above) equals the area of the kernel window, i.e., of the circle centered on {bf:s}_{it:g} and having radius {it:h_g} (for bounded functions) or {it:h_g·t} (for truncated functions, {it:t} being the truncation parameter). Formally: {it:A_g} = 3.1416 · {it:h_g}² for bounded functions, and {it:A_g}{space 1}= 3.1416 · ({it:h_g·t})² for truncated functions. {p_end} {pstd} For grid points located near the edges of the study region, a greater or lesser portion of their kernel window may lie outside the study region, so that the densities/intensities computed with the standard formula at such grid points may result underestimated (Bailey and Gatrell 1995). {p_end} {pstd} Although an accurate assessment of the impact of edge effects and the development of proper corrections remain an open area of research (Waller and Gotway 2004), some adjustments are possible. {cmd:spkde} implements an approximate edge correction that consists in rescaling the area of the kernel window at each grid point {bf:s}_{it:g} by a factor {it:ec_g} that approximately equals the proportion of the kernel window lying within the study region. After this correction, {it:A_g} = 3.1416 · {it:h_g}² · {it:ec_g} for bounded functions, and {it:A_g} = 3.1416 · ({it:h_g·t})² · {it:ec_g} for truncated functions. {p_end} {pstd} In {cmd:spkde}, the approximate edge correction described above can be applied to the estimation of {it:l}({bf:s}) and {it:p}({bf:s}) only when a bounded or truncated kernel function is used. This includes the uniform, quartic, triangular, Epanechnicov, truncated normal, and truncated negative exponential functions. {p_end} {marker section04}{title:Input datasets} {pstd} {cmd:spkde} makes use of two input datasets: {it:datapoints} and {it:gridpoints}. {p_end} {pstd} {it:datapoints} is the dataset that resides in memory when {cmd:spkde} is run. It consists of {it:n} observations, one for each data point making up the spatial point pattern of interest. At the minimum, it must include {help spkde##main:{it:xvar}}, a numeric variable that contains the x-coordinate of data points, and {help spkde##main:{it:yvar}}, a numeric variable that contains the y-coordinate of data points. When each data point “hosts” more than one object of {it:J}>0 different types, then {cmd:spkde} must be run specifying a {help spkde##syntax:{it:varlist}} made of {it:J} numeric variables, each containing the number of objects of a given type located at each data point; in this case, dataset {it:datapoints} must also include the {it:J} variables specified in {it:varlist}. Finally, dataset {it:datapoints} can optionally include {help spkde##estpar:{it:ndpwvar}}, a numeric variable sometimes used to set the value of the {help spkde##section02:kernel bandwidth}. {p_end} {pstd} {it:gridpoints} is the {help spkde##syntax:using} dataset invoked by {cmd:spkde} at run time. It is generated by the user-written Stata program {help spgrid} and contains the spatial grid used by {cmd:spkde} to compute the kernel estimates of {it:p}({bf:s}_{it:g}) and {it:l}({bf:s}_{it:g}). {p_end} {marker section05}{title:Output dataset} {pstd} {cmd:spkde} routinely generates {it:kernest}, a Stata dataset that consists of {it:G} observations {hline 1} one for each grid point {bf:s}_{it:g} {hline 1} and includes the following variables: {p_end} {phang2}{space 1}o{space 2}{bf:spgrid_id} is a numeric variable that uniquely identifies the cells making up the grid. {p_end} {phang2}{space 1}o{space 2}{bf:spgrid_xcoord} is a numeric variable that contains the x-coordinate of each grid point. {p_end} {phang2}{space 1}o{space 2}{bf:spgrid_ycoord} is a numeric variable that contains the y-coordinate of each grid point. {p_end} {phang2}{space 1}o{space 2}{bf:bandwidth} is a numeric variable that contains the values of {it:h_g} (see {help spkde##eq1:above}). {p_end} {phang2}{space 1}o{space 2}{bf:ndp} is a numeric variable that contains the number of data points used for kernel estimation at each grid point. {p_end} {phang2}{space 1}o{space 2}{bf:A} is a numeric variable that contains the values of {it:A_g} (see {help spkde##eq1:above}). {p_end} {pstd} If no {help spkde##syntax:{it:varlist}} is specified, three additional variables are included in dataset {it:kernest}: {p_end} {phang2}{space 1}o{space 2}{bf:c} is a numeric variable that contains the value of {it:c} (see {help spkde##eq1:above}). {p_end} {phang2}{space 1}o{space 2}{bf:lambda} is a numeric variable that contains the kernel estimate of {it:l}({bf:s}_{it:g}) (see {help spkde##eq1:above}). {p_end} {phang2}{space 1}o{space 2}{bf:p} is a numeric variable that contains the kernel estimate of {it:p}({bf:s}_{it:g}) (see {help spkde##eq2:above}). {p_end} {pstd} On the other hand, if a {help spkde##syntax:{it:varlist}} is specified, for each variable {it:varname} in {it:varlist} three additional variables are included in dataset {it:kernest}: {p_end} {phang2}{space 1}o{space 2}{it:varname}{bf:_c} is a numeric variable that contains the value of {it:c} for objects of type {it:varname}. {p_end} {phang2}{space 1}o{space 2}{it:varname}{bf:_lambda} is a numeric variable that contains the kernel estimate of {it:l}({bf:s}_{it:g}) for objects of type {it:varname}. {p_end} {phang2}{space 1}o{space 2}{it:varname}{bf:_p} is a numeric variable that contains the kernel estimate of {it:p}({bf:s}_{it:g}) for objects of type {it:varname}. {p_end} {pstd} When option {opt ndpw(ndpwvar)} is specified, an additional variable is included in dataset {it:kernest}: {p_end} {phang2}{space 1}o{space 2}{bf:wndp} is a numeric variable that contains the weighted number of data points used for kernel estimation at each grid point. {p_end} {pstd} Finally, when option {opt edgecorrection} is specified, an additional variable is included in dataset {it:kernest}: {p_end} {phang2}{space 1}o{space 2}{bf:edgecorrect} is a numeric variable that contains the edge correction factor {it:ec_g} (see {help spkde##section03:above}). {p_end} {marker section06}{title:Visualization of kernel estimates} {pstd} The kernel estimates of {it:l}({bf:s}_{it:g}) and {it:p}({bf:s}_{it:g}) generated by {cmd:spkde} can be visualized using the user-written Stata program {help spmap}. {p_end} {pstd} To this purpose, two datasets are needed: {p_end} {phang2}{space 1}o{space 2}{it:kernest} is the output dataset routinely generated by {cmd:spkde} (see section {help spkde##section05:Output dataset} above). {p_end} {phang2}{space 1}o{space 2}{it:gridcells} is one of the two output datasets generated by the user-written Stata program {help spgrid##section04:spgrid}. {p_end} {pstd} To visualize the kernel estimates of interest, {help spmap} must be run with {it:kernest} as the {help spmap##sd_master:{it:master} dataset}, and {it:gridcells} as the {help spmap##sd_basemap:{it:basemap} dataset}. {p_end} {marker section07}{title:Alternative applications} {pstd} Although {cmd:spkde} has been designed to carry out kernel estimation of density and intensity functions for two-dimensional spatial point patterns, it can be used also for estimating the joint probability density function {it:p}({it:x},{it:y}) of any pair of quantitative variables {it:X} and {it:Y} (see section {help spkde##ex2:Examples 2 {hline 1} Alternative applications} below). {p_end} {marker options2}{title:Options} {dlgtab:Main} {phang} {opt xcoord(xvar)} specifies the name of the variable containing the x-coordinate of each data point {bf:s}_{it:i}. {p_end} {phang} {opt ycoord(yvar)} specifies the name of the variable containing the y-coordinate of each data point {bf:s}_{it:i}. {p_end} {phang} {opt kernel(kf)} specifies the basic type of kernel function to be used in the estimation of {it:p}({bf:s}) and {it:l}({bf:s}) (for details, see section {help spkde##section01:Kernel function} above). {p_end} {phang2}{cmd:kernel(quartic)} is the default and requests that the quartic kernel function be used. {p_end} {phang2}{cmd:kernel(uniform)} requests that the uniform kernel function be used. {p_end} {phang2}{cmd:kernel(normal)} requests that the normal kernel function be used. {p_end} {phang2}{cmd:kernel(negexp)} requests that the negative exponential kernel function be used. {p_end} {phang2}{cmd:kernel(triangular)} requests that the triangular kernel function be used. {p_end} {phang2}{cmd:kernel(epanechnikov)} requests that the Epanechnikov kernel function be used. {p_end} {phang} {opt truncated(t)} applies only when option {cmd:kernel(normal)} or {cmd:kernel(negexp)} is specified. It requests that, for each grid point {bf:s}_{it:g}, the selected kernel function be evaluated only within a distance {it:h_g·t} from {bf:s}_{it:g}, where {it:t} is a positive number. {p_end} {phang} {opt bandwidth(method)} specifies the method to be used for setting the value of the kernel bandwidth {it:h_g} at each grid point {bf:s}_{it:g} (for details, see section {help spkde##section02:Kernel bandwidth} above). {p_end} {phang2}{cmd:bandwidth(fbw)} requests that the {bf:f}ixed {bf:b}and{bf:w}idth method be used. {p_end} {phang2}{cmd:bandwidth(ndp)} requests that the minimum (weighted) {bf:n}umber of {bf:d}ata {bf:p}oints method be used. {p_end} {phang2}{cmd:bandwidth(mixed)} requests that the mixed method be used. {p_end} {dlgtab:Estimation parameters} {phang} {cmd:fbw(ad}{it:q}|{it:b}{cmd:)} sets the value of {it:h} as defined in section {help spkde##section02:Kernel bandwidth} above. {p_end} {phang2}{cmd:fbw(ad}{it:q}{cmd:)} sets {it:h} = {it:adq}, where {it:adq} equals the average distance between each data point {bf:s}_{it:i} and its {it:q} nearest data points. {p_end} {phang2}{opt fbw(b)} sets {it:h} = {it:b}. {p_end} {phang} {opt ndp(d)} sets the value of {it:ndp} as defined in section {help spkde##section02:Kernel bandwidth} above, namely {it:ndp} = {it:d}. {p_end} {phang} {opt ndpw(ndpwvar)} specifies that {it:ndp} denotes a weighted number of data points and the weights are stored in variable {it:ndpwvar}. {p_end} {phang} {cmd:edgecorrection} requests that an approximate edge correction be applied to the estimation of {it:p}({bf:s}) and {it:l}({bf:s}) (for details, see section {help spkde##section03:Edge correction} above). {p_end} {dlgtab:Reporting} {phang} {cmd:dots} requests that job progression dots be displayed. {p_end} {phang} {cmd:noverbose} requests that the display of every indicator of job progression be suppressed. {p_end} {dlgtab:Saving results} {phang} {cmdab:sav:ing(}{it:kernest} [{cmd:, replace}]{cmd:)}} requests that the kernel estimates of {it:p}({bf:s}_{it:g}) and {it:l}({bf:s}_{it:g}) ({it:g} = 1, ..., {it:G}), along with a set of auxiliary variables, be saved to dataset {help spkde##section05:{it:kernest}}. If specified, suboption {cmd:replace} requests that dataset {help spkde##section05:{it:kernest}} be overwritten if already existing. {p_end} {marker ex1}{title:Examples 1 {hline 1} Standard applications} {cmd} . spgrid using "Italy-OutlineCoordinates.dta", /// resolution(w10) unit(kilometers) /// cells("GridCells.dta") /// points("GridPoints.dta") /// replace compress dots . use "Italy-DataPoints.dta", clear . spkde using "GridPoints.dta", /// xcoord(xcoord) ycoord(ycoord) /// bandwidth(fbw) fbw(100) dots /// saving("Kde.dta", replace) . use "Kde.dta", clear . spmap lambda using "GridCells.dta", /// id(spgrid_id) clnum(20) /// fcolor(Rainbow) ocolor(none ..) /// legend(off) /// point(data("Italy-DataPoints.dta") /// x(xcoord) y(ycoord)) . use "Italy-DataPoints.dta", clear . spkde using "GridPoints.dta", /// xcoord(xcoord) ycoord(ycoord) /// bandwidth(fbw) fbw(100) dots /// edgecorrect /// saving("Kde.dta", replace) . use "Kde.dta", clear . spmap lambda using "GridCells.dta", /// id(spgrid_id) clnum(20) /// fcolor(Rainbow) ocolor(none ..) /// legend(off) /// point(data("Italy-DataPoints.dta") /// x(xcoord) y(ycoord)) . use "Italy-DataPoints.dta", clear . spkde using "GridPoints.dta", /// xcoord(xcoord) ycoord(ycoord) /// kernel(normal) /// bandwidth(fbw) fbw(ad5) dots /// saving("Kde.dta", replace) . use "Kde.dta", clear . spmap lambda using "GridCells.dta", /// id(spgrid_id) clnum(20) /// fcolor(Rainbow) ocolor(none ..) /// legend(off) /// point(data("Italy-DataPoints.dta") /// x(xcoord) y(ycoord)) . use "Italy-DataPoints.dta", clear . spkde using "GridPoints.dta", /// xcoord(xcoord) ycoord(ycoord) /// bandwidth(ndp) ndp(4) dots /// saving("Kde.dta", replace) . use "Kde.dta", clear . spmap lambda using "GridCells.dta", /// id(spgrid_id) clnum(20) /// fcolor(Rainbow) ocolor(none ..) /// legend(off) /// point(data("Italy-DataPoints.dta") /// x(xcoord) y(ycoord)) . use "Italy-DataPoints.dta", clear . spkde dcvd95 pop95 using "GridPoints.dta", /// xcoord(xcoord) ycoord(ycoord) /// bandwidth(fbw) fbw(100) dots /// saving("Kde.dta", replace) . use "Kde.dta", clear . generate ratio = dcvd95_lambda / pop95_lambda * 1000 . spmap ratio using "GridCells.dta", /// id(spgrid_id) clnum(20) /// fcolor(Rainbow) ocolor(none ..) /// legend(off) {txt} {marker ex2}{title:Examples 2 {hline 1} Alternative applications} {pstd} As mentioned {help spkde##section07:above}, {cmd:spkde} can be used also for estimating the joint probability density function of any pair of quantitative variables (for an alternative, see Stata program {help kdens2}, written by Christopher F. Baum and available from the Boston SSC Archive). {help spmap} can then be used to generate the corresponding density plot. To this purpose, it is advised to make use of {help mylabels}, a Stata program written by Nicholas J. Cox and available from the Boston SSC Archive. {p_end} {pstd} As an example, let us estimate and plot the bivariate probability density function for two of the variables included in the {bf:auto} dataset: {bf:mpg} and {bf:price}. This can be done in four steps as follows: {p_end} {pstd} 1. Normalize variables in the range [0,1] {p_end} {cmd} . sysuse "auto.dta", clear . summarize price mpg . clonevar x = mpg . clonevar y = price . replace x = (x-0) / (50-0) . replace y = (y-0) / (20000-0) . mylabels 0(10)50, myscale((@-0) / (50-0)) local(XLAB) . mylabels 0(5000)20000, myscale((@-0) / (20000-0)) local(YLAB) . keep x y . save "xy.dta", replace {txt} {pstd} 2. Generate a 100x100 grid {p_end} {cmd} . spgrid, shape(hexagonal) xdim(100) /// xrange(0 1) yrange(0 1) /// dots replace /// cells("2D-GridCells.dta") /// points("2D-GridPoints.dta") {txt} {pstd} 3. Estimate the bivariate probability density function {p_end} {cmd} . spkde using "2D-GridPoints.dta", /// xcoord(x) ycoord(y) /// bandwidth(fbw) fbw(0.1) dots /// saving("2D-Kde.dta", replace) {txt} {pstd} 4. Display the density plot {p_end} {cmd} . use "2D-Kde.dta", clear . recode lambda (.=0) . spmap lambda using "2D-GridCells.dta", /// id(spgrid_id) clnum(20) fcolor(Rainbow) /// ocolor(none ..) legend(off) /// point(data("xy.dta") x(x) y(y)) /// freestyle aspectratio(1) /// xtitle(" " "Mileage (mpg)") /// xlab(`XLAB') /// ytitle("Price" " ") /// ylab(`YLAB', angle(0)) {txt} {title:Acknowledgments} {p 4 4 2} I wish to thank Nick Cox for helpful suggestions. {title:Author} {p 4} Maurizio Pisati {p_end} {p 4} Department of Sociology and Social Research {p_end} {p 4} University of Milano Bicocca - Italy {p_end} {p 4} {browse "mailto:maurizio.pisati@unimib.it":maurizio.pisati@unimib.it} {title:References} {p 4 8 2}Bailey, T.C. and A.C. Gatrell. 1995. {it:Interactive Spatial Data} {it:Analysis}. Harlow: Longman. {p_end} {p 4 8 2}Brundson, C. 1995. Estimating Probability Surfaces for Geographical Point Data: An Adaptive Kernel Algorithm.{it: Computers & Geosciences} 21: 877{c -}894. {p_end} {p 4 8 2}Talbot, T.O., Kulldorff, M., Forand, S.P. and V.B. Haley. 2007. Evaluation of Spatial Filters To Create Smoothed Maps of Health Data. {it:Statistics in Medicine} 19: 2399{c -}2408. {p_end} {p 4 8 2}Waller, L.A. and C.A. Gotway. 2004. {it:Applied Spatial Statistics} {it:for Public Health Data}. Hoboken NJ: Wiley. {p_end} {title:Also see} {psee} Online: {helpb spgrid} (if installed), {helpb spmap} (if installed), {helpb mylabels} (if installed) {p_end}