help mata mm_bs() -------------------------------------------------------------------------------

Title

mm_bs() -- Bootstrap estimation

Syntax

bs = mm_bs(f, X [, w, reps, d, nodots, strata, cluster, stat, ...])

bs = mm_bs2(f, X [, w, reps, d, nodots, strata, cluster, stat, ...])

where

f: pointer scalar containing address of function to be bootstrapped, i.e. f = &functionname() X: real matrix containing data (rows are observations, columns variables) w: real colvector containing weights reps: real scalar specifying number of replications (default: 50) d: real scalar specifying reduction of bootstrap sample size (default: 0) nodots: real scalar indicating that replication dots be suppressed strata: real colvector containing (sorted) strata ID variable cluster: real colvector containing (sorted) cluster ID variable stat: real matrix containing the results of f using the original data, i.e., the "observed" value of f ...: up to 10 optional arguments to pass through to f

real matrix mm_bs_report(bs [, what, level, mse, jk])

where

what: string vector containing statistics to be reported, where the available statistics are: "b" or "theta" ("observed" value), "mean" (bootstrap mean), "bias" (bootstrap mean - observed value), "v" (variance-covariance matrix), "se" (standard error; the default), "ci" or "n" (normal-approximation confidence interval), "basic" (basic confidence interval), "p" (percentile confidence interval), "bc" (bias-corrected confidence interval), "bca" (bias-corrected and accelerated confidence interval), "t" (percentile-t confidence interval) level: real scalar containing the confidence level for confidence intervals (default is 95 or as set by set level) mse: real scalar indicating that the mean squared errors formula be used jk: struct mm_jkstats containing results from mm_jk() (required if what contains "bca")

bs is a variable used for communication between mm_bs() and mm_bs_report(). If you declare bs, declare it to be transmorphic.

Description

mm_bs(f, X, w) applies function f to bootstrap samples of the data X (and weights w) and returns the results as a structure. To be precise, f is a pointer to a function, i.e. f = &functionname(), e.g. f = &mean() (see [M-2] ftof). mm_bs() expects function f to return a real rowvector of parameter estimates to be bootstrapped or, optionally, a real matrix containing parameter estimates in the first row and associated standard errors in the second row (the standard errors are required for percentile-t confidence intervals; see Remarks below). Furthermore, function f must take the data as the first argument and weights as the second argument.

Note that the weights w are not relevant for the bootstrap resampling process. That is, mm_bs() always draws simple (i.e. equal probability) random samples (with replacement), no matter whether weights are specified or not. However, weights are passed through to the internal calls of function f. Omit w, or specify w as 1 to obtain unweighted results. w=1 is passed to function f if w is omitted.

reps specifies the number of desired bootstrap replicates. The default is 50, which is too low for most applications. The default sample size for the single bootstrap samples is the number of observations in the data (or number of clusters, if cluster is specified). However, d>0 causes the default bootstrap sample size to be reduced by d (within each stratum if strata is specified). For example, specify d=1 to produce bootstrap samples containing only N-1 observations. Specify, d=0 to not change the default sample size.

nodots!=0 indicates that replication dots be suppressed. By default, a single dot character is displayed for each successful replication and a single red 'x' is displayed for each unsuccessful replication. A replication is considered unsuccessful if the replication result contains one or more missing values. mm_bs() only returns results from successful replications.

strata and cluster may be used to specify a strata ID variable and a cluster ID variable. mm_bs() will then draw stratified samples of clusters. Note that mm_bs() does not sort the data: A new stratum begins each time strata changes from one row to the next, a new cluster begins each time cluster (or strata) changes from one row to the next. Omit strata or specify strata=. if the sample is unstratified; omit cluster or specify cluster=. if the sample does not contain clusters.

By default, mm_bs() first applies f to the original data to obtain the "observed" value of f given X and w. Alternatively, the "observed" value may be provided as stat, where stat is a real matrix containing point estimates in the first row and, optionally, associated standard errors in the second row. Omit stat or specify stat=. if you do not want to provide the "observed" value.

mm_bs2() is an alternative version of mm_bs(). Instead of physically sampling the data, mm_bs2() implements bootstrap estimation by multiplying w by the number of times an observation belongs to the bootstrap sample. mm_bs2() requires less memory than mm_bs() but it is slower and cannot be used in all situations (i.e. only if, for the statistic in question, multiplying w is a correct way to represent multiple drawn observations).

The results produced by mm_bs() and mm_bs2() depend on the the initial value of random-number seed. Use set seed or uniformseed() to set the seed.

mm_bs_report() is used to analyze the bootstrap replications computed by mm_bs() or mm_bs2(). It returns a matrix of statistics such as bootstrap means, bootstrap standard errors, or various versions of bootstrap confidence intervals (see the what argument above). Multiple statistics are arranged beneath one another in the specified order. For example, mm_bs_report(bs, ("b","se","ci")) will return the observed values in the first row, the standard errors in the second row, and the lower and upper bounds of the normal-approximation confidence intervals in the third and forth row.

level specifies the confidence level, as a percentage, for confidence intervals. The default is level=95 or as set by set level.

mse!=0 indicates that variances and standard errors be computed using deviations of the replicates from the "observed" value. By default, variances and standard errors are computed using deviations from the average of the replicates.

jk provides jackknife statistics as returned by mm_jk(). jk is required for bias-corrected and accelerated confidence intervals ("bca"). Omit jk or specify jk=. if no bias-corrected and accelerated confidence intervals are computed.

Remarks

Remarks are presented under the headings

Introduction BCa confidence intervals Percentile-t confidence intervals Methods and formulas

Introduction

The following example illustrates the basic usage of mm_bs() and mm_bs_report():

: x = uniform(75,2) : B = mm_bs(&mean(), x, 1, 200) Bootstrap replications (200) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .................................................. 100 .................................................. 150 .................................................. 200

: mm_bs_report(B, ("b", "se")) 1 2 +-----------------------------+ 1 | .5059552237 .4677170853 | 2 | .0307029664 .0333863864 | +-----------------------------+

: mm_bs_report(B, "ci") 1 2 +-----------------------------+ 1 | .4454103082 .4018805821 | 2 | .5665001392 .5335535885 | +-----------------------------+

mm_bs() first produces 200 bootstrap replicates of the means of the two variables contained in x. mm_bs_report() then reports the "observed" values, i.e. the means of the two variables in x (first row) and the bootstrap standard errors of the means (second row). The second call of mm_bs_report() displays the 95% normal-approximation confidence intervals for the two means (lower bound in first row, upper bound in second row).

BCa confidence intervals

Bias-corrected and accelerated confidence intervals require the user to provide jackknife replicates of the parameter estimates. Use the mm_jk() function to compute these statistics. Example:

: J = mm_jk(&mean(), x, 1) Jackknife replications (75) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .........................

: mm_bs_report(B, "bca", 95, 0, J) 1 2 +-----------------------------+ 1 | .4468480011 .4088091557 | 2 | .5566050638 .5391249383 | +-----------------------------+

Percentile-t confidence intervals

The formula for the (1-alpha) percentile-t confidence interval is

[ b - t(1-alpha/2) * se, b - t(alpha/2) * se ]

where b is the original parameter estimate, se is an estimate of the standard error of b and t(p) is the p-quantile of the asymptotically pivotal statistic

t_i = (b_i - b)/se_i, i = 1, ..., B

where the i indicates the bootstrap replicates.

To enable the computation of percentile-t confidence intervals, function f provided to mm_bs() must return standard error estimates along with the point estimates of the parameters (point estimates in the first row, standard errors in the second row). The following example illustrates the procedure using the textbook formula for the standard error of the mean:

: real matrix meanse(x, w) > { > if (w!=1) _error(3498, "w must be 1") > return(mean(x, 1) \ > sqrt(diagonal(variance(x, 1))'/rows(x))) > }

: meanse(x,1) 1 2 +-----------------------------+ 1 | .5059552237 .4677170853 | 2 | .0326460175 .0347510918 | +-----------------------------+

: B = mm_bs(&meanse(), x, 1, 200) Bootstrap replications (200) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .................................................. 100 .................................................. 150 .................................................. 200

: mm_bs_report(B, "t") 1 2 +-----------------------------+ 1 | .4442775319 .406048357 | 2 | .5645887108 .537843043 | +-----------------------------+

An alternative approach may be to compute the standard errors by the bootstrap, i.e. to perform bootstrap-within-bootstrap or nested bootstrap. Example:

: real matrix meanbsse(x, w) > { > if (w!=1) _error(3498, "w must be 1") > return(mean(x, 1) \ > mm_bs_report(mm_bs(x,1,&mean(),50,0,1),"se")) > }

: meanbsse(x,1) 1 2 +-----------------------------+ 1 | .5059552237 .4677170853 | 2 | .0294400659 .0297055941 | +-----------------------------+

: B = mm_bs(&meanbsse(), x, 1, 200) Bootstrap replications (200) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .................................................. 100 .................................................. 150 .................................................. 200

: mm_bs_report(B, "t") 1 2 +-----------------------------+ 1 | .438978656 .4096034266 | 2 | .5833692536 .552163646 | +-----------------------------+

Methods and formulas

The formulas for the percentile-t confidence intervals can be found above. Also see, e.g., Poi (2004).

The basic confidence interval is defined as

[ 2*b - q(1-alpha/2), 2*b - q(alpha/2) ]

where b is the original parameter estimate and q(p) is the p-quantile of the bootstrap distribution of b (see, e.g., Davison and Hinkley 1997).

For all other formulas see [R] bootstrap. Note that, other than indicated in [R] bootstrap, 1/k is used instead of 1/(k-1) in the mean squared errors formula (the same is true for official Stata's bootstrap).

Conformability

mm_bs(f, X, w, reps, d, nodots, strata, cluster, stat, ...), mm_bs2(f, X, w, reps, d, nodots, strata, cluster, stat, ...): f: 1 x 1 X: n x k w: n x 1 or 1 x 1 reps: 1 x 1 d: 1 x 1 nodots: 1 x 1 strata: n x 1 or strata=. cluster: n x 1 or cluster=. stat: m x p, m>0, or stat=. ...: (depending on f) result: struct mm_bsstats

(*f)(X, w, ...): X: n x k w: n x 1 or 1 x 1 ...: (depending on f) result: m x p, m>0

mm_bs_report(bs, what, level, mse, jk): bs: struct mm_bsstats what: s x 1 or 1 x s level: 1 x 1 mse: 1 x 1 jk: struct mm_jkstats or jk=. result: r x p

Diagnostics

mm_bs() and mm_bs2() cannot be used with built-in functions (use wrappers).

mm_bs_report() will abort with error if what contains "bca" and no jackknife estimates are provided.

mm_bs_report() with what containing "t" will abort with error if applied to bootstrap replicates that do not contain information on standard errors.

Source code

mm_bs.mata

References

Davison, A.C., Hinkley, D.V. (1997). Bootstrap Methods and Their Application. Cambridge University Press.

Poi, B.P. (2004). From the help desk: Some bootstrapping techniques. Stata Journal 4(3):312-328.

Author

Ben Jann, ETH Zurich, jann@soz.gess.ethz.ch

Also see

Online: help for bootstrap, mm_jk(), moremata