/***
{hline}
help for {hi:exbsample}{right:P. Van Kerm (September 2022)}
{hline}

Title
=====

__exbsample__ {hline 2} Exchangeably weighted (or Bayesian) bootstraps

Syntax
------

> __exbsample__ _#_ [_if_] [_in_] [_weight_] [using _filename_] [, _options_]

_#_ is the desired number of bootstrap replicates.

| _option_ | _Description_ |
|:-------------------------------------------|:-----------------------------------------------------------|
| stub(_name_) | prefix of bootstrap weight variables generated |
| **d**istribution(poisson _or_ exponential) | choice of bootstrap weight distribution |
| norescale | disable scaling of weights to unit mean |
| **bal**ance(_#_) | request balancing of bootstrap weights (in _#_ iterations) |
| seed(_#_) | set random-number seed to _#_ |
| **str**ata(_varlist_) | variables identifying strata |
| **cl**uster(_varlist_) | variables identifying clusters |
| **svy**settings | read strata and cluster identifiers from __svyset__ |
| **id**vars(_varlist_) | variables uniquely identifying bootstrapped units in new frame or data file |
| **fr**ame(_name_ [, **link**varname(_varname_) replace nofrlink]) | |
| | save bootstrap weight variables in a separate frame _name_ and link it to the current frame using variable _varname_ (unless _nofrlink_ is specified) |
| replace | replace frame _name_ or file _filename_ or variables _stub*_ if they exist |
| nodots | do not display dots |

__fweight__, __pweight__ or __iweight__ are allowed.

Description
-----------

__exbsample__ generates bootstrap replication weights for implementation of exchangeably weighted bootstrap schemes, also known as the Bayesian bootstrap. It can be used as an alternative to __bsample__.

Exchangeably weighted bootstrap schemes (or weighted, or exchangeable bootstraps) are alternatives to the traditional non-parametric (paired) bootstrap. Standard bootstrap replications involve generating bootstrap samples of size _N_ by drawing with replacement from the original data.
Such a bootstrap resample can be seen as a frequency weighted version of the original data, with integer weights representing the number of times each observation is drawn in a resample. (See the __weight__ option of Stata's bootstrap drawing command __bsample__.) Exchangeably weighted bootstrap schemes can be seen as extensions of this representation: bootstrap resamples are created by generating replication weights directly from appropriate distribution functions. See Praestgaard and Wellner (1993) for details. This technique is also known as the Bayesian bootstrap (Rubin, 1981).

__exbsample__ generates weights based on draws from a Poisson distribution or from an exponential distribution (both with unit mean). Drawing from the Poisson distribution generates integer weights 0, 1, 2, ... the distribution of which approximates the multinomial distribution that standard resampling weights effectively follow. Drawing from the exponential distribution generates strictly positive, non-integer weights. Draws from the exponential distribution can be seen as continuous (smoothed) versions of the Poisson draws. The advantage of exponential draws is the absence of zero weights: all observations from the original data are kept in the bootstrap resamples, albeit with possibly small weights. This can have practical computational advantages. In both cases, replication weights are, by default, scaled to sum to the sample size _N_.

Once replication weight variables are generated, they can be used by __svy bootstrap__ for bootstrap inference. (__svyset__ __,bsrweight(...)__ needs to be set accordingly.) Also see J. Pitblado's __bs4rw__.

Stratified and/or clustered sampling is handled by specifying strata and cluster identifiers (as in {cmd:bsample}); samples of clusters are `drawn' independently across strata -- observations from the same cluster all have the same weight and weights sum to the number of clusters.
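
As an illustration of the weight construction described above, the following is a simplified sketch for a single replicate on unstratified, unclustered data. It is not the code path of __exbsample__ itself: the simulated dataset, the seed and the variable names _w_exp_ and _w_pois_ are hypothetical. It relies only on Stata's `rexponential()` and `rpoisson()` functions, which draw with unit mean when called with argument 1.

```stata
* Illustrative sketch only: w_exp and w_pois are hypothetical names
clear
set obs 1000
set seed 12345

* exponential draws with unit mean: strictly positive, non-integer
generate double w_exp = rexponential(1)

* Poisson draws with unit mean: integers 0, 1, 2, ...
generate double w_pois = rpoisson(1)

* rescale each set of weights to unit mean, so that they sum to _N
foreach v of varlist w_exp w_pois {
    summarize `v', meanonly
    quietly replace `v' = `v' / r(mean)
}

summarize w_exp w_pois
```

After rescaling, each weight variable has mean one and therefore sums to the number of observations. The exponential weights are all strictly positive, whereas Poisson weights can be zero.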

Observations that do not meet the optional _if_ and _in_ criteria are excluded from the bootstrap replications.

If an __fweight__, __pweight__ or __iweight__ is given, the Poisson or exponential bootstrap replication weights are multiplied by the weight expression.

The replication weight variables generated are added to the data in memory by default. They can alternatively be saved in a separate file if __using__ _filename_ is specified or in a separate frame with the option __frame__.

Options
-------

{phang} {opth stub(name)} determines the name of the bootstrap weight variables generated. Replication weight variables are named {it:name1}, {it:name2}, etc. Default is _bootvar1_, _bootvar2_, etc.

{phang} {opth distribution(name)} selects the bootstrap weight distribution; name is __exponential__ (the default) or __poisson__.

{phang} {opt norescale} disables scaling of replication weight variables to sum to the number of observations (or clusters).

{phang} {opth balance(#)} requests balancing of weights across all replications. Standard bootstrap balancing ensures that each observation in the data is drawn the same number of times in the overall set of resamples. Balancing is implemented here by scaling resampling weights `horizontally' (i.e., across replications for each observation) so that they sum to the number of bootstrap replications. To obtain both balancing (horizontal) and scaling (vertical), the two scaling steps are iterated _#_ times. (Default is 0 which implies _no_ balancing.)

{phang} {opt seed(#)} sets the random number generator seed to _#_ prior to generating replication weight draws.

{phang} {opth strata(varlist)} specifies the variables identifying strata. If {opt strata()} is specified, bootstrap replication weights are scaled to sum to the number of clusters in each stratum.

{phang} {opth cluster(varlist)} specifies the variables identifying resampling clusters (primary sampling units).
If {opt cluster()} is specified, one replication weight is drawn per cluster and is shared across all observations in the cluster.

{phang} {opt svysettings} requests that strata and cluster information is read from the settings of the dataset, as determined by __svyset__.

{phang} {opth idvars(varlist)} identifies variables that uniquely identify the bootstrapped units. This is required when replication weights are stored in a separate frame or data file: the variables in __idvars__ are saved alongside the replication weights to allow matching to the dataset in memory.

{phang} {opth frame(name)} requests that bootstrap replication weight variables are stored in a new, separate frame named _name_ (and not in the current frame in memory). A frame linkage is created to the current frame unless the _nofrlink_ sub-option is specified. The link variable is given in {cmd:linkvarname(}{it:varname}{cmd:)} (BOOTSTRAPLINK by default).

{phang} {opt replace} requests that frame _name_ or file _filename_ or variables _stubX_ are replaced if they already exist.

{phang} {opt nodots} disables display of dots.

Examples
--------

Generate simple replication weights from exponential distribution:

        . sysuse auto
        . exbsample 499 , stub(rw)
        . summarize rw1 rw2 rw499
        . svyset , bsrweight(rw1-rw499)
        . svy bootstrap : regress price trunk i.foreign

Select Poisson weights and save weights in separate dataset:

        . sysuse auto
        . exbsample 499 using replications-weights.dta , stub(rw) distribution(poisson) idvars(make)

Select Poisson weights, disable weight scaling and save weights in separate frame:

        . sysuse auto
        . exbsample 499 , stub(rw) distribution(poisson) norescale frame(replications , link(bootvarlink)) idvars(make)
        . frget rw1, from(bootvarlink)
        . regress price trunk i.foreign [iw=rw1]
        . frame change replications
        . summarize rw1 rw2 rw499

See Van Kerm (2022) for more examples.

Citation suggestion
-------------------

Van Kerm, P. (2022).
exbsample {c -} Stata module for exchangeably weighted (or Bayesian) bootstraps, Statistical Software Components, Boston College Department of Economics.

Also see
--------

{psee}
Online: {manhelp bsample R}, {helpb rhsbsample} (if installed), {helpb gsample} (if installed), {helpb bsweights} (if installed), {helpb bs4rw} (if installed)
{p_end}

Author
------

Philippe Van Kerm
Luxembourg Institute of Socio-Economic Research and University of Luxembourg

References
----------

Praestgaard, J. and Wellner, J. A. (1993), Exchangeably weighted bootstraps of the general empirical process, The Annals of Probability 21(4), 2053–2086.

Rubin, D. B. (1981), The Bayesian bootstrap, The Annals of Statistics 9(1), 130–134.

Van Kerm, P. (2022). [Exchangeably weighted bootstrap schemes](http://ideas.repec.org/p/boc/usug22/). 2022 London Stata Users Group meeting, September 8-9 2022, University College London.

- - -

This help file was dynamically produced by [MarkDoc Literate Programming package](http://www.haghish.com/markdoc/)
***/

** To build help file: mini exbsample.ado , export(sthlp) replace

*! v1.0.0, 2022-09-08, Philippe Van Kerm, Exchangeably Weighted Bootstrap
pr def exbsample , sortpreserve sclass
    version 9.2
    syntax [anything(name=N)] [if] [in] [fw pw iw] [using/] [, ///
        /// bootstrap choice
        Distribution(string) ///
        norescale ///
        BALance(integer 0) ///
        /// unique id, strata and cluster
        CLuster(varlist) ///
        STRata(varlist) ///
        SVYsettings ///
        /// storage options
        stub(name) ///
        IDvars(varlist) ///
        FRame(string) ///
        replace ///
        /// misc
        seed(integer -1) ///
        nodots ///
        ]

    // 1. --- PARSING AND INITIALIZATION ----

    // check non-empty file
    if (_N==0) {
        di as error "No observations."
        exit 198
    }

    // number of replications (defaults to 1 if not given)
    if (`"`N'"'=="") loc N 1
    confirm integer number `N'

    // draw type
    if (!inlist("`distribution'","exponential","poisson","")) {
        di as error "name must be exponential or poisson in option distribution(name)"
        exit 198
    }
    if ("`distribution'"=="") loc distribution exponential

    // scaling
    if ("`rescale'"=="") loc adjust 1
    else loc adjust 0

    // nodots
    if ("`dots'"!="") loc quidots quietly

    // mark observations
    marksample touse
    qui markout `touse' `strata' `cluster' , strok

    // read svy settings if -svysettings- specified:
    if ("`svysettings'" != "") {
        if ("`: char _dta[_svy_version]'" == "") {
            di as error "svy settings not available"
            exit 198
        }
        if ("`cluster'" != "") {
            di as error "svysettings and cluster() options are mutually exclusive"
            exit 198
        }
        if ("`strata'" != "") {
            di as error "svysettings and strata() options are mutually exclusive"
            exit 198
        }
        local strata : char _dta[_svy_strata1]
        local cluster : char _dta[_svy_su1]
    }

    // set strata and cluster if unspecified
    if ("`strata'"=="") {
        tempvar strata
        qui gen byte `strata' = 1 if `touse'
    }
    if (`"`cluster'"' == "") {
        tempvar cluster
        qui gen double `cluster' = _n if `touse'
    }

    // `base' weight
    tempvar baseweight
    if (`"`weight'"'!="") {
        qui gen double `baseweight' `exp'
        qui replace `baseweight' = `baseweight' * `touse'
    }
    else {
        qui gen byte `baseweight' = `touse'
    }

    // check stub and set default
    if ("`stub'"=="") {
        di as text "No stub specified. Default is bootvar."
        local stub bootvar
    }

    // check that ID is passed if using or frame are used
    if ( ( (`"`using'"'!="") | (`"`frame'"'!="") ) & ("`idvars'"=="") ) {
        di as error "Option idvars() is required if the replication weights are saved outside current dataset or frame."
        exit 198
    }

    // check existing using file
    if ((`"`using'"'!="") & ("`replace'"=="")) confirm new file `using'

    // parse frame and check existing frame
    _parse_frame `frame'
    if (`"`frame'"'!="") {
        if ("`replace'"=="") confirm new frame `frame'
        else cap frame drop `frame'
    }

    // check existing variables
    if ((`"`using'"'=="") & ("`frame'"=="")) {
        if ("`replace'"=="") {
            forvalues i = 1/`N' {
                confirm new variable `stub'`i'
            }
        }
        else {
            forvalues i = 1/`N' {
                cap drop `stub'`i'
            }
        }
    }

    // 2. --- GETTING FRAMES OR FILENAME IN SHAPE ----

    // prepare frame
    if ("`frame'"!="") {
        // create and move into new frame
        qui frame pwf
        loc currentframe `r(currentframe)'
        //frame create `frame'
        qui frame put `idvars' `strata' `cluster' `baseweight' `touse' if `touse' , into(`frame')
        qui frame change `frame'
    }
    else {
        if ("`using'"!="") {
            // prepare empty file
            preserve
            qui keep if `touse'
            keep `idvars' `strata' `cluster' `baseweight' `touse'
        }
    }

    // 3. --- GENERATE REPLICATION WEIGHTS ----

    // create replication variables
    if (`seed'>-1) set seed `seed'
    tempvar onecobs mndraw
    sort `touse' `strata' `cluster'
    // tag one observation per cluster: draws are made once per cluster
    qui egen `onecobs' = tag(`touse' `strata' `cluster')
    forvalues i = 1/`N' {
        `quidots' di "." _c
        qui gen double `stub'`i' = r`distribution'(1) if `onecobs' & `touse'
        if (`adjust'==1) {
            // scale to unit mean within stratum
            qui by `touse' `strata' : egen `mndraw' = mean(`stub'`i') if `onecobs'
            qui replace `stub'`i' = `stub'`i'/`mndraw'
            drop `mndraw'
        }
    }

    // balancing iterations (if balance>0 is specified)
    tempvar rowmn
    forvalues biter=1/`balance' {
        qui egen `rowmn' = rowmean(`stub'*) if `onecobs'
        forvalues i = 1/`N' {
            qui replace `stub'`i' = `stub'`i'/`rowmn'
            if (`adjust'==1) {
                qui by `touse' `strata' : egen `mndraw' = mean(`stub'`i') if `onecobs'
                qui replace `stub'`i' = `stub'`i'/`mndraw'
                drop `mndraw'
            }
        }
        drop `rowmn'
    }

    // copy values across clusters and multiply base weights
    forvalues i = 1/`N' {
        // copy to all in cluster
        qui bys `touse' `strata' `cluster' (`onecobs') : replace `stub'`i' = `stub'`i'[_N]
        // multiply base weights
        qui replace `stub'`i' = `baseweight' * `stub'`i'
    }

    // 4. --- CLOSING ----

    // wrap up frames and saved datasets
    if (`"`frame'`using'"'!="") {
        keep `idvars' `stub'1-`stub'`N'
    }
    if (`"`using'"'!="") {
        save `"`using'"' , `replace'
        if ("`frame'"=="") restore
    }
    if ("`frame'"!="") {
        qui frame change `currentframe'
        if ("`frlink'"=="") qui frlink m:1 `idvars' , frame(`frame') generate(`linkvarname')
    }

    sreturn local N = `N'
end

pr def _parse_frame
    syntax [name] [ , LINKvarname(name) REPLACE noFRLINK ]
    if ("`namelist'"!="") {
        if ("`frlink'"=="") {
            if ("`linkvarname'"=="") loc linkvarname BOOTSTRAPLINK
            if ("`replace'"=="") confirm new var `linkvarname'
            else cap drop `linkvarname'
        }
        c_local frame `namelist'
        c_local linkvarname `linkvarname'
        c_local frlink `frlink'
    }
end

exit
Philippe Van Kerm
Luxembourg Institute of Socio-Economic Research and University of Luxembourg