help mstat, mtest
-------------------------------------------------------------------------------

Title

M statistics -- 2 samples M statistics for spatial distribution analysis

Syntax

2 samples M statistics

mstat , x(varname) y(varname) g(varname) [options graphic_options]

2 samples M statistics and Monte Carlo test

mtest , x(varname) y(varname) g(varname) [options graphic_options]

Options for mstat

options            Description
-------------------------------------------------------------------------
Main
* x(varname)       x-coordinates
* y(varname)       y-coordinates
* g(varname)       group dummy variable
  bins(#)          choose the number of bins to be used
  chi2             return the upper-tail p-value of the asymptotic chi2 distribution
  scatter          plot the spatial distribution of the two groups
  density          plot the kernel density of the interpoint distances
  graphic_options  manipulate graphic output
-------------------------------------------------------------------------
* Required option

Options for mtest

options            Description
-------------------------------------------------------------------------
Main
* x(varname)       x-coordinates
* y(varname)       y-coordinates
* g(varname)       group dummy variable
  bins(#)          choose the number of bins to be used
  iter(#)          choose the number of Monte Carlo permutations in the test
  level(#)         choose the level of the p-value confidence interval to be reported
  scatter          plot the spatial distribution of the two groups
  density          plot the kernel density of the interpoint distances
  graphic_options  manipulate graphic output
-------------------------------------------------------------------------
* Required option

Description

The dataset is required to have the following structure:

x-coord    y-coord    group
--------------------------------------------------
23.4       45.8       0
26.4       41.3       1
71.3       39.2       1
55.0       42.8       0
...        ...        ...
--------------------------------------------------

mstat computes the observed two-sample M statistic using k bins (via option bins(k)).

mtest computes the observed two-sample M statistic using k bins (via option bins(k)) and runs a Monte Carlo test that permutes the group variable (option g(varname)) n times (via option iter(n)). It returns the upper-tail p-value = #(M>=M0)/n, where M0 is the observed statistic and M is the statistic computed on a permuted sample, together with the corresponding l% confidence interval (via option level(l)).
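The Monte Carlo p-value described above can be illustrated outside Stata. The following Python sketch is not the mtest implementation: the function permutation_pvalue, the helper toy_stat, and the data are hypothetical, and the toy statistic (absolute difference in mean x-coordinate) merely stands in for the M statistic. It only shows how #(M>=M0)/n is obtained by permuting the group labels while the locations stay fixed.

```python
import random

def permutation_pvalue(stat, labels, n_iter=100, seed=12345):
    """Upper-tail Monte Carlo p-value: #(M >= M0) / n_iter.

    `stat` is any function mapping a list of group labels to a number.
    Hypothetical helper, not part of mstat/mtest.
    """
    rng = random.Random(seed)
    observed = stat(labels)            # M0, computed on the original labels
    shuffled = list(labels)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)          # permute group labels only
        if stat(shuffled) >= observed:
            count += 1
    return observed, count, count / n_iter

# Made-up data: x-coordinates and a 0/1 group indicator.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
g = [0, 0, 0, 1, 1, 1]

def toy_stat(labels):
    # Stand-in statistic (NOT the M statistic): |mean x of group 1 - mean x of group 0|.
    x0 = [xi for xi, gi in zip(x, labels) if gi == 0]
    x1 = [xi for xi, gi in zip(x, labels) if gi == 1]
    return abs(sum(x1) / len(x1) - sum(x0) / len(x0))

m0, c, p = permutation_pvalue(toy_stat, g, n_iter=100)
print(m0, c, p)
```

Because the count includes ties (>=), a permutation that reproduces the observed grouping contributes to the p-value.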

Required options for mstat and mtest

x(x-coord)  variable containing the x-coordinate of the location
y(y-coord)  variable containing the y-coordinate of the location
g(group)    binary variable (0-1) indicating the group

Options for mstat

bins(#) selects the number of bins. The number must be a positive integer no larger than the number of distances in the dataset, that is, (number of observations choose 2). The default is bins(20). For the theoretical implications, refer to Forsberg, L., Bonetti, M. and Pagano, M. 2009.

chi2 displays and returns the p-value of the asymptotic chi2 test (upper tail).

scatter generates a scatter plot of the two groups.

density generates a kernel density plot of the interpoint distance distribution for the two groups.
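The upper bound on bins() stated above is easy to check by hand. The Python sketch below is illustrative only: check_bins is a hypothetical helper, not part of mstat, and simply encodes the documented rule that the number of bins must be a positive integer no larger than (number of observations choose 2).

```python
from math import comb

def check_bins(bins, n_obs):
    """Validate a bins() value against the documented constraint.

    Hypothetical helper: returns the number of interpoint distances,
    comb(n_obs, 2), or raises ValueError if `bins` is not admissible.
    """
    n_dist = comb(n_obs, 2)            # number of pairwise distances
    if not isinstance(bins, int) or bins < 1:
        raise ValueError("bins must be a positive integer")
    if bins > n_dist:
        raise ValueError(f"bins={bins} exceeds the {n_dist} available distances")
    return n_dist

# With 7 observations there are 21 distances, so the default of 20 bins is admissible.
print(check_bins(20, 7))
```

With 6 observations there are only 15 distances, so the same call with n_obs=6 raises ValueError.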

Options for mtest

bins(#) selects the number of bins. The number must be a positive integer no larger than the number of distances in the dataset, that is, (number of observations choose 2). The default is bins(20). For the theoretical implications, refer to Forsberg, L., Bonetti, M. and Pagano, M. 2009.

iter(#) sets the number of random permutations of the group variable (option g(varname)) to be performed for the Monte Carlo test. The default is iter(100).

level(#) sets the confidence level for the p-value's confidence interval. The default is level(95).

scatter generates a scatter plot of the two groups.

density generates a kernel density plot of the interpoint distance distribution for the two groups.

Graphic Options

The graphic options manipulate the graphic output of the commands. They take effect only if the corresponding option (scatter or density) is specified.

Options when option scatter is specified

scolor0(colorstyle)      set the color for the marker of group 0.
scolor1(colorstyle)      set the color for the marker of group 1.
smarker0(marker symbol)  set the symbol for the marker of group 0.
smarker1(marker symbol)  set the symbol for the marker of group 1.
ssize0(marker size)      set the size for the marker of group 0.
ssize1(marker size)      set the size for the marker of group 1.
slabel0(string)          input the label for group 0 in the legend, default: "Group 0".
slabel1(string)          input the label for group 1 in the legend, default: "Group 1".
stitle(string)           specifies the title for the scatter, default: "Spatial Distribution of the two groups".
sytitle(string)          specifies the title for the y axis, default is the name of the variable in option y(y-coord).
sxtitle(string)          specifies the title for the x axis, default is the name of the variable in option x(x-coord).

Options when option density is specified

dcolor0(colorstyle)            set the color for the line of the density of group 0.
dcolor1(colorstyle)            set the color for the line of the density of group 1.
dpattern0(line pattern style)  set the pattern style for the line of the density of group 0.
dpattern1(line pattern style)  set the pattern style for the line of the density of group 1.
dwidth0(line width style)      set the width for the line of the density of group 0.
dwidth1(line width style)      set the width for the line of the density of group 1.
dlabel0(string)                input the label for group 0 in the legend, default: "Group 0".
dlabel1(string)                input the label for group 1 in the legend, default: "Group 1".
dtitle(string)                 specifies the title for the Kernel density, default: "IDD Kernel Densities".

Saved results

mstat saves the following in r()

Scalars
  r(M)     observed M statistic
  r(p)     chi-squared p-value (if option chi2 is specified)

Matrices
  r(difF)  difference between the ECDFs in the two groups
  r(Sinv)  generalized inverse of the covariance matrix of r(difF)
  r(d)     cutoffs of the equiprobable bins
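To give a feel for what r(d) and r(difF) contain, the Python sketch below computes within-group interpoint distances, equiprobable bin cutoffs, and the difference between the two groups' ECDFs at those cutoffs. It rests on assumptions this help file does not confirm: the cutoffs are taken as quantiles of the pooled within-group distances, and the difference is taken as group 0 minus group 1. All function names are hypothetical, and the covariance step behind r(Sinv) and r(M) is omitted.

```python
from bisect import bisect_right
from itertools import combinations
from math import dist

def interpoint_distances(points):
    """Sorted Euclidean distances between all pairs of points in one group."""
    return sorted(dist(p, q) for p, q in combinations(points, 2))

def ecdf(sorted_values, t):
    """Fraction of values <= t."""
    return bisect_right(sorted_values, t) / len(sorted_values)

def ecdf_difference(points0, points1, bins):
    d0 = interpoint_distances(points0)
    d1 = interpoint_distances(points1)
    pooled = sorted(d0 + d1)
    n = len(pooled)
    # ASSUMPTION: cutoffs are the k/bins quantiles (k = 1..bins-1) of the pooled distances.
    cutoffs = [pooled[(k * n + bins - 1) // bins - 1] for k in range(1, bins)]
    # ASSUMPTION: sign convention is group 0 minus group 1.
    diff = [ecdf(d0, c) - ecdf(d1, c) for c in cutoffs]
    return cutoffs, diff

# Made-up data: a tight cluster (group 0) and a spread-out one (group 1).
group0 = [(0, 0), (1, 0), (0, 1)]
group1 = [(0, 0), (3, 0), (0, 4)]
cutoffs, diff = ecdf_difference(group0, group1, bins=2)
print(cutoffs, diff)
```

In this toy example every group 0 distance lies at or below the single cutoff and every group 1 distance lies above it, so the ECDF difference at the cutoff is as large as it can be.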

mtest saves the following in r()

Scalars
  r(N)     sample size

Matrices
  r(M)     observed M statistic
  r(c)     count when M>=M(obs) is true
  r(p)     observed empirical p-value
  r(se)    standard error of empirical p-value
  r(ci)    exact binomial confidence interval of observed p-value
  r(reps)  number of nonmissing results
  r(d)     cutoffs of the equiprobable bins
  r(Sinv)  generalized inverse of the covariance matrix
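The relationship between r(c), r(reps), r(p), r(se) and r(ci) can be sketched as follows. This Python example assumes, without confirmation from the source, that the standard error is the usual binomial one, sqrt(p(1-p)/reps), and that the "exact binomial" interval is the Clopper-Pearson interval. The function names are hypothetical and the code is not the mtest implementation.

```python
from math import comb, sqrt

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _root_increasing(func, lo=0.0, hi=1.0, iters=100):
    """Bisection for the root of a function that is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pvalue_summary(c, reps, level=95):
    """Empirical p-value, its standard error, and a Clopper-Pearson interval (assumed)."""
    alpha = 1 - level / 100
    p = c / reps
    se = sqrt(p * (1 - p) / reps)      # ASSUMPTION: binomial standard error
    # Lower limit solves P(X >= c) = alpha/2; upper limit solves P(X <= c) = alpha/2.
    lower = 0.0 if c == 0 else _root_increasing(
        lambda q: (1 - binom_cdf(c - 1, reps, q)) - alpha / 2)
    upper = 1.0 if c == reps else _root_increasing(
        lambda q: alpha / 2 - binom_cdf(c, reps, q))
    return p, se, (lower, upper)

# Example: 10 of 100 permuted statistics were at least as large as the observed one.
p, se, ci = pvalue_summary(10, 100, level=95)
print(p, se, ci)
```

When c is 0 the lower limit is 0 and the upper limit has the closed form 1 - (alpha/2)^(1/reps), which gives a quick sanity check on the bisection.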

Author

Pietro Tebaldi
Department of Biostatistics, Harvard School of Public Health
Bocconi University, Milan - Italy
pietro.tebaldi@studbocconi.it

References

Bonetti, M., and Pagano, M. 2005. The interpoint distance distribution as a descriptor of point patterns, with an application to spatial disease clustering. Stat Med 24(5):753-773.

Forsberg, L., Bonetti, M., and Pagano, M. 2009. The choice of the number of bins for the M statistic. CSDA 53(10):3640-3649.

Manjourides, J., and Pagano, M. 2010. An interpoint distance based test for the difference between two spatial distribution. Submitted.

Ozonoff, A., Jeffery, C., Manjourides, J., White, L.F., and Pagano, M. 2007. Effect of spatial resolution on cluster detection: a simulation study. Int J Health Geogr 52(6).