{smcl}
{* 1.1.0 created 2013-11-14}{...}
{* 1.0.0 created 2013-09-09}{...}
{* 0.0.1 created 2009-02-20}{...}
{hline}
help for {hi:rhsbsample}{right:P. Van Kerm (September 2013, February 2009)}
{hline}

{title:Title}

{pstd}{hi:rhsbsample} {hline 2} Repeated half-sample bootstrap sampling

{title:Syntax}

{p 8 17 2}
{cmdab:rhsbsample}
[{cmd:if} {it:exp}]
[{cmd:in} {it:range}]
[{cmd:,} {it:options}]

{synoptset 20 tabbed}
{synopthdr}
{synoptline}
{syntab:Main}
{synopt :{opth str:ata(varlist)}}variables identifying strata{p_end}
{synopt :{opth cl:uster(varlist)}}variables identifying resampling clusters{p_end}
{synopt :{opth id:cluster(newvar)}}create new cluster ID variable{p_end}
{synopt :{opth w:eight(varname)}}replace {it:varname} with frequency weights{p_end}
{synopt :{opt svy:settings}}read strata and cluster identifiers from {cmd:svyset}{p_end}
{synoptline}

{title:Description}

{pstd}
{cmd:rhsbsample} is a variant of the {cmd:bsample} command. It draws random
samples from the data in memory using the 'repeated half-sample bootstrap'
proposed in Saigo, Shao and Sitter (2001). It is particularly suitable for
bootstrapping complex survey data. This modified bootstrap procedure has the
advantage of being valid irrespective of the number of primary sampling units
per stratum and of the type of estimator under analysis. It is also easy to
apply because it avoids the rescaling of weights required by other bootstrap
procedures for survey data. See Shao (2003) for an accessible discussion of
bootstrapping sample surveys and Kolenikov (2010) for a detailed description of
Stata's capabilities.

{pstd}
When the original sample size N (the number of primary sampling units (PSUs))
in a stratum is even, Saigo et al.'s (2001) repeated half-sample bootstrap
consists of drawing a sample of size N/2 {it:without} replacement. Each
sampled PSU is then duplicated so the bootstrap sample has size N.
When N is odd, the procedure is modified as follows:

{phang2}
Either (i) a sample of size (N-1)/2 is drawn without replacement and each
sampled PSU is duplicated to achieve a sample size N-1. One additional PSU is
then drawn at random from the units already selected to reach a sample size N;

{phang2}
Or (ii) a sample of size (N-1)/2 + 1 is drawn without replacement and each
sampled PSU is duplicated to achieve a sample size N+1. One PSU is then dropped
at random to reach a sample size N.

{pstd}
Method (i) is used with probability 1/4 and method (ii) is used with
probability 3/4 when forming replicate bootstrap samples.

{pstd}
Stratified multistage sampling is handled by specifying strata and cluster
identifiers (as in {cmd:bsample}); samples of clusters are drawn independently
across strata using the repeated half-sample procedure.

{pstd}
Observations that do not meet the optional {it:{help if}} and {it:{help in}}
criteria are dropped (not sampled).

{title:Options}

{pstd}
Options are as in {help bsample}, except for {opt svysettings}; see
{bf:[R] bsample}.

{dlgtab:Main}

{phang}
{opth strata(varlist)} specifies the variables identifying strata. If
{opt strata()} is specified, bootstrap samples are selected within each
stratum.

{phang}
{opth cluster(varlist)} specifies the variables identifying resampling
clusters (primary sampling units). If {opt cluster()} is specified, the sample
drawn during each replication is a bootstrap sample of clusters.

{phang}
{opth idcluster(newvar)} creates a new variable containing a unique identifier
for each resampled cluster.

{phang}
{opth weight(varname)} specifies a variable in which the sampling frequencies
will be placed. {it:varname} must be an existing variable, which will be
replaced. After {cmd:rhsbsample}, {it:varname} can be used as an
{opt fweight} in any Stata command that accepts {opt fweight}s. This option
cannot be combined with {opt idcluster()}.
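
{pmore}
As a minimal sketch of this option (the variable name {cmd:fw} is an
illustrative choice, not something the command requires), a single replicate
can be stored as frequencies and passed to commands that accept
{opt fweight}s. Assuming the bundled {cmd:auto} data, the tabulated
frequencies should add up to the original stratum sizes:

{pmore}{cmd:. sysuse auto, clear}{p_end}
{pmore}{cmd:. generate fw = .}{p_end}
{pmore}{cmd:. rhsbsample , strata(foreign) weight(fw)}{p_end}
{pmore}{cmd:. tabulate foreign [fweight=fw]}{p_end}
{pmore}{cmd:. summarize price [fweight=fw]}{p_end}
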
{phang}
{opt svysettings} requests that strata and cluster information be read from
the survey settings of the dataset, as declared by {cmd:svyset}.

{pmore}
By default, {cmd:rhsbsample} replaces the data in memory with the sampled
observations. If {opt weight()} is specified, only {it:varname} is changed:
it records the number of times each observation appears in the bootstrap
sample.

{title:Examples}

{phang}{cmd:. sysuse auto}{p_end}
{phang}{cmd:. rhsbsample}{p_end}

{phang}{cmd:. sysuse auto}{p_end}
{phang}{cmd:. rhsbsample if !foreign}{p_end}

{phang}{cmd:. sysuse auto}{p_end}
{phang}{cmd:. rhsbsample , strata(foreign)}{p_end}

{phang}{cmd:. sysuse auto}{p_end}
{phang}{cmd:. forvalues i=1/500 {c -(}}{p_end}
{phang}{cmd:. qui gen brw`i' = .}{p_end}
{phang}{cmd:. rhsbsample , strata(foreign) weight(brw`i')}{p_end}
{phang}{cmd:. {c )-}}{p_end}
{phang}{cmd:. svyset , strata(foreign) bsrweight(brw*)}{p_end}
{phang}{cmd:. svy bootstrap : regress price trunk turn}{p_end}

{pstd}
See {browse "http://ideas.repec.org/p/boc/usug13/10.html":Van Kerm (2013)} for
more elaborate usage examples.

{title:References}

{phang}Kolenikov, S. (2010). Resampling variance estimation for complex survey
data. {it:Stata Journal}, 10(2): 165{c -}199.

{phang}Saigo, H., Shao, J. and Sitter, R.R. (2001). A Repeated Half-Sample
Bootstrap and Balanced Repeated Replications for Randomly Imputed Data.
{it:Survey Methodology}, 27(2): 189{c -}196.

{phang}Shao, J. (2003). Impact of the Bootstrap on Sample Surveys.
{it:Statistical Science}, 18(2): 191{c -}196.

{phang}Van Kerm, P. (2013).
{browse "http://ideas.repec.org/p/boc/usug13/10.html":Repeated half-sample bootstrap resampling}.
2013 London Stata Users Group meeting, September 12{c -}13, 2013, Cass
Business School, London.

{title:Author}

{pstd}Philippe Van Kerm, CEPS/INSTEAD, Luxembourg, philippe.vankerm@ceps.lu

{title:Citation suggestion}

{phang}
Van Kerm, P. (2013).
rhsbsample {c -} Stata module for repeated half-sample bootstrap resampling,
{browse "http://ideas.repec.org/c/boc/bocode/s457697.html":Statistical Software Component S457697},
Boston College Department of Economics.

{title:Acknowledgments}

{pstd}
This work was part of the MeDIM and InWIn projects supported by the Luxembourg
Fonds National de la Recherche (contracts FNR/06/15/08 and C10/LM/785657) and
by core funding for CEPS/INSTEAD by the Ministry of Higher Education and
Research of Luxembourg.

{title:Also see}

{psee}
Online: {manhelp bsample R}, {helpb gsample} (if installed), {helpb bsweights} (if installed)
{p_end}