help mata mm_bs()-------------------------------------------------------------------------------

Title

mm_bs() -- Bootstrap estimation

Syntax

bs=mm_bs(f,X[,w,reps,d,nodots,strata,cluster,stat,...])

bs=mm_bs2(f,X[,w,reps,d,nodots,strata,cluster,stat,...])where

f:pointer scalarcontaining address of function to be bootstrapped, i.e.f=&functionname()X:real matrixcontaining data (rows are observations, columns variables)w:real colvectorcontaining weightsreps:real scalarspecifying number of replications (default: 50)d:real scalarspecifying reduction of bootstrap sample size (default: 0)nodots:real scalarindicating that replication dots be suppressedstrata:real colvectorcontaining (sorted) strata ID variablecluster:real colvectorcontaining (sorted) cluster ID variablestat:real matrixcontaining the results offusing the original data, i.e., the "observed" value off...: up to 10 optional arguments to pass through tof

real matrixmm_bs_report(bs[,what,level,mse,jk])where

what:string vectorcontaining statistics to be reported, where the available statistics are:"b"or"theta"("observed" value),"mean"(bootstrap mean),"bias"(bootstrap mean - observed value),"v"(variance-covariance matrix),"se"(standard error; the default),"ci"or"n"(normal-approximation confidence interval),"basic"(basic confidence interval),"p"(percentile confidence interval),"bc"(bias-corrected confidence interval),"bca"(bias-corrected and accelerated confidence interval),"t"(percentile-t confidence interval)level:real scalarcontaining the confidence level for confidence intervals (default is 95 or as set byset level)mse:real scalarindicating that the mean squared errors formula be usedjk:struct mm_jkstatscontaining results frommm_jk()(required ifwhatcontains"bca")

bsis a variable used for communication betweenmm_bs()andmm_bs_report(). If you declarebs, declare it to betransmorphic.

Description

mm_bs(f,X,w)applies functionfto bootstrap samples of the dataX(and weightsw) and returns the results as a structure. To be precise,fis a pointer to a function, i.e.f=&functionname(), e.g.f=&mean()(see[M-2] ftof).mm_bs()expects functionfto return areal rowvectorof parameter estimates to be bootstrapped or, optionally, areal matrixcontaining parameter estimates in the first row and associated standard errors in the second row (the standard errors are required for percentile-t confidence intervals; see Remarks below). Furthermore, functionfmust take the data as the first argument and weights as the second argument.Note that the weights

ware not relevant for the bootstrap resampling process. That is,mm_bs()always draws simple (i.e. equal probability) random samples (with replacement), no matter whether weights are specified or not. However, weights are passed through to the internal calls of functionf. Omitw, or specifywas 1 to obtain unweighted results.w=1 is passed to functionfifwis omitted.

repsspecifies the number of desired bootstrap replicates. The default is 50, which is too low for most applications. The default sample size for the single bootstrap samples is the number of observations in the data (or number of clusters, ifclusteris specified). However,d>0 causes the default bootstrap sample size to be reduced byd(within each stratum ifstratais specified). For example, specifyd=1 to produce bootstrap samples containing only N-1 observations. Specify,d=0 to not change the default sample size.

nodots!=0 indicates that replication dots be suppressed. By default, a single dot character is displayed for each successful replication and a single red 'x' is displayed for each unsuccessful replication. A replication is considered unsuccessful if the replication result contains one or more missing values.mm_bs()only returns results from successful replications.

strataandclustermay be used to specify a strata ID variable and a cluster ID variable.mm_bs()will then draw stratified samples of clusters. Note thatmm_bs()does not sort the data: A new stratum begins each timestratachanges from one row to the next, a new cluster begins each timecluster(orstrata) changes from one row to the next. Omitstrataor specifystrata=. if the sample is unstratified; omitclusteror specifycluster=. if the sample does not contain clusters.By default,

mm_bs()first appliesfto the original data to obtain the "observed" value offgivenXandw. Alternatively, the "observed" value may be provided asstat, wherestatis areal matrixcontaining point estimates in the first row and, optionally, associated standard errors in the second row. Omitstator specifystat=. if you do not want to provide the "observed" value.

mm_bs2()is an alternative version ofmm_bs(). Instead of physically sampling the data,mm_bs2()implements bootstrap estimation by multiplyingwby the number of times an observation belongs to the bootstrap sample.mm_bs2()requires less memory thanmm_bs()but it is slower and cannot be used in all situations (i.e. only if, for the statistic in question, multiplyingwis a correct way to represent multiple drawn observations).The results produced by

mm_bs()andmm_bs2()depend on the the initial value of random-number seed. Useset seedoruniformseed()to set the seed.

mm_bs_report()is used to analyze the bootstrap replications computed bymm_bs()ormm_bs2(). It returns a matrix of statistics such as bootstrap means, bootstrap standard errors, or various versions of bootstrap confidence intervals (see thewhatargument above). Multiple statistics are arranged beneath one another in the specified order. For example,mm_bs_report(bs, ("b","se","ci"))will return the observed values in the first row, the standard errors in the second row, and the lower and upper bounds of the normal-approximation confidence intervals in the third and forth row.

levelspecifies the confidence level, as a percentage, for confidence intervals. The default islevel=95 or as set byset level.

mse!=0 indicates that variances and standard errors be computed using deviations of the replicates from the "observed" value. By default, variances and standard errors are computed using deviations from the average of the replicates.

jkprovides jackknife statistics as returned bymm_jk().jkis required for bias-corrected and accelerated confidence intervals ("bca"). Omitjkor specifyjk=. if no bias-corrected and accelerated confidence intervals are computed.

RemarksRemarks are presented under the headings

IntroductionBCa confidence intervalsPercentile-t confidence intervalsMethods and formulas

The following example illustrates the basic usage of

mm_bs()andmm_bs_report():: x = uniform(75,2) : B = mm_bs(&mean(), x, 1, 200) Bootstrap replications (200) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .................................................. 100 .................................................. 150 .................................................. 200

: mm_bs_report(B, ("b", "se")) 1 2 +-----------------------------+ 1 | .5059552237 .4677170853 | 2 | .0307029664 .0333863864 | +-----------------------------+

: mm_bs_report(B, "ci") 1 2 +-----------------------------+ 1 | .4454103082 .4018805821 | 2 | .5665001392 .5335535885 | +-----------------------------+

mm_bs()first produces 200 bootstrap replicates of the means of the two variables contained inx.mm_bs_report()then reports the "observed" values, i.e. the means of the two variables inx(first row) and the bootstrap standard errors of the means (second row). The second call ofmm_bs_report()displays the 95% normal-approximation confidence intervals for the two means (lower bound in first row, upper bound in second row).

Bias-corrected and accelerated confidence intervals require the user to provide jackknife replicates of the parameter estimates. Use the

mm_jk()function to compute these statistics. Example:: J = mm_jk(&mean(), x, 1) Jackknife replications (75) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .........................

: mm_bs_report(B, "bca", 95, 0, J) 1 2 +-----------------------------+ 1 | .4468480011 .4088091557 | 2 | .5566050638 .5391249383 | +-----------------------------+

Percentile-t confidence intervalsThe formula for the (1-alpha) percentile-t confidence interval is

[ b - t(1-alpha/2) * se, b - t(alpha/2) * se ]

where b is the original parameter estimate, se is an estimate of the standard error of b and t(

p) is thep-quantile of the asymptotically pivotal statistict_i = (b_i - b)/se_i, i = 1, ..., B

where the i indicates the bootstrap replicates.

To enable the computation of percentile-t confidence intervals, function

fprovided tomm_bs()must return standard error estimates along with the point estimates of the parameters (point estimates in the first row, standard errors in the second row). The following example illustrates the procedure using the textbook formula for the standard error of the mean:: real matrix meanse(x, w) > { > if (w!=1) _error(3498, "w must be 1") > return(mean(x, 1) \ > sqrt(diagonal(variance(x, 1))'/rows(x))) > }

: meanse(x,1) 1 2 +-----------------------------+ 1 | .5059552237 .4677170853 | 2 | .0326460175 .0347510918 | +-----------------------------+

: B = mm_bs(&meanse(), x, 1, 200) Bootstrap replications (200) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .................................................. 100 .................................................. 150 .................................................. 200

: mm_bs_report(B, "t") 1 2 +-----------------------------+ 1 | .4442775319 .406048357 | 2 | .5645887108 .537843043 | +-----------------------------+

An alternative approach may be to compute the standard errors by the bootstrap, i.e. to perform bootstrap-within-bootstrap or nested bootstrap. Example:

: real matrix meanbsse(x, w) > { > if (w!=1) _error(3498, "w must be 1") > return(mean(x, 1) \ > mm_bs_report(mm_bs(x,1,&mean(),50,0,1),"se")) > }

: meanbsse(x,1) 1 2 +-----------------------------+ 1 | .5059552237 .4677170853 | 2 | .0294400659 .0297055941 | +-----------------------------+

: B = mm_bs(&meanbsse(), x, 1, 200) Bootstrap replications (200) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .................................................. 100 .................................................. 150 .................................................. 200

: mm_bs_report(B, "t") 1 2 +-----------------------------+ 1 | .438978656 .4096034266 | 2 | .5833692536 .552163646 | +-----------------------------+

The formulas for the percentile-t confidence intervals can be found above. Also see, e.g., Poi (2004).

The basic confidence interval is defined as

[ 2*b - q(1-alpha/2), 2*b - q(alpha/2) ]

where b is the original parameter estimate and q(

p) is thep-quantile of the bootstrap distribution of b (see, e.g., Davison and Hinkley 1997).For all other formulas see

[R] bootstrap. Note that, other than indicated in[R] bootstrap, 1/k is used instead of 1/(k-1) in the mean squared errors formula (the same is true for official Stata'sbootstrap).

Conformability

mm_bs(f,X,w,reps,d,nodots,strata,cluster,stat,...),mm_bs2(f,X,w,reps,d,nodots,strata,cluster,stat,...):f: 1x1X:n x kw:n x1 or 1x1reps: 1x1d: 1x1nodots: 1x1strata:n x1 orstrata=.cluster:n x1 orcluster=.stat:m x p, m>0, orstat=....: (depending onf)result:struct mm_bsstats

(*f)(X,w,...):X:n x kw:n x1 or 1x1...: (depending onf)result:m x p, m>0

mm_bs_report(bs,what,level,mse,jk):bs:struct mm_bsstatswhat:s x1 or 1x slevel: 1x1mse: 1x1jk:struct mm_jkstatsorjk=.result:r x p

Diagnostics

mm_bs()andmm_bs2()cannot be used with built-in functions (use wrappers).

mm_bs_report()will abort with error ifwhatcontains"bca"and no jackknife estimates are provided.

mm_bs_report()withwhatcontaining"t"will abort with error if applied to bootstrap replicates that do not contain information on standard errors.

Source codemm_bs.mata

ReferencesDavison, A.C., Hinkley, D.V. (1997). Bootstrap Methods and Their Application. Cambridge University Press.

Poi, B.P. (2004). From the help desk: Some bootstrapping techniques. Stata Journal 4(3):312-328.

AuthorBen Jann, ETH Zurich, jann@soz.gess.ethz.ch

Also seeOnline: help for

bootstrap,mm_jk(),moremata