{smcl}
{* *! version 1.1.0 10Aug2013}{...}
{* *! version 1.0.0 30Jul2013}{...}
{cmd:help svysampsi}
{hline}
{title:Title}
{p2colset 5 18 18 2}{...}
{p2col :{hi:svysampsi} {hline 2}}Sample size for surveys with a dichotomous outcome variable {p_end}
{p2colreset}{...}
{title:Syntax}
{pstd} {cmd:svysampsi} {it:#population}
[{cmd:,} {cmdab:p:roportion}{cmd:(}{it:#}{cmd:)}
{cmdab:moe:}{cmd:(}{it:#}{cmd:)}
{cmdab:lev:el}{cmd:(}{it:#}{cmd:)}
{cmdab:resp:onse}{cmd:(}{it:#}{cmd:)} ]
{p 8 14 2}{it:#population} is the size of the target population.{p_end}
{marker options}{...}
{title:Options}
{dlgtab:Main}
{phang}
{opt p:roportion(#)} specifies the proportion of the sample
with the expected outcome. The default is {cmd:prop(0.50)}, and values must be
between 0 and 1.0.
{phang}
{opt moe(#)} specifies the margin of error, as a percent. The default is {cmd:moe(5.0)}, and values must be
between 0 and 100.
{phang}
{opt resp:onse(#)} specifies the expected response rate. When {cmd:response} is
specified, {cmd:svysampi} provides an adjusted sample size estimate. Values must be
between 0 and 1.0.
{phang}
{opt lev:el(#)} specifies the confidence level, as a percentage, for
confidence intervals. The default is {cmd:level(95)} or as set by
{helpb set level}.
{synoptline}
{p2colreset}{...}
{title:Description}
{pstd}
{cmd:svysampsi} estimates the finite population corrected sample size
for a simple random survey in which the primary variable under study is
dichotomous. {cmd:svysampsi} is an immediate command; see {help immed}
for more on immediate commands. {p_end}
{synoptline}
{p2colreset}{...}
{title:Remarks}
{pstd}
There are many situations in which the primary variable being measured
in a survey is dichotomous or binary, and thus aggregates to a
proportion (e.g., the proportion of smokers in a population, the
proportion of voters that view a candidate favorably, etc.). In
estimating the sample size needed for the survey, the researcher must
consider three criteria:
{phang}1. The {it:proportion} of the population expected to respond
positively to the question. Higher (or lower) proportions indicate
greater homogeneity in the population on this attribute, whereas a
proportion of 0.50 indicates the greatest amount of variability (50/50
split). When an a priori assumption cannot be made about a population's
expected level of the attribute, researchers typically set the
{it:proportion} at 0.50 to derive a conservative sample size estimate.
{p_end}
{phang}2. The {it:margin of error} (or sampling error) represents the
range in which the true value of the population is expected to lie.
Thus, with 50% of a hypothetical survey sample indicating that they
smoke, and a 5% {it:margin of error}, we would estimate that between 45%
and 55% of the overall population are smokers. Other things being equal,
larger margins of error produce smaller sample size estimates.
{phang}3. The {it:confidence level} is used in conjunction with the
{it:margin of error}. For example, assuming a normally distributed
variable and a 95% confidence level, we would expect that 95 out of 100
randomly drawn samples will elicit a true population proportion that is
within the range of the {it:proportion} +/- the {it:margin of error}.
Other things being equal, lower confidence levels produce lower sample
size estimates.
{pstd}
In addition, the researcher may want to oversample from the population
to account for non-response. In {cmd:svysampsi}, when the expected
response rate is specified, an additional "over-sample" size estimate is
provided. {p_end}
{pstd}
Lastly, {cmd:svysampsi} estimates the sample size assuming a simple
random sample design is being utilized. If a more complex design is
planned, such as stratified random sampling, sample sizes at the level
of strata should be estimated. An example is provided below for how
{cmd:svysampsi} can be modified for this purpose.{p_end}
{title:Examples}
{pstd}
Example 1: Assumes default settings; prop=0.50, level=95.0, moe=5.0
{cmd}{...}
. svysampsi 10000
{txt}{...}
{pstd}
Example 2: Specifies all options, including an expected response rate of 65%
{cmd}{...}
. svysampsi 10000, prop(0.80) moe(3.0) lev(99) resp(0.65)
{txt}{...}
{pstd}
Example 3: A routine for estimating sample sizes by strata, and then
summing them to provide an overall value. Here we specify 3 strata with
population sizes of 390, 121 and 42, with respective proportions of 0.5,
0.2, and 0.3. We set the moe=3.0, and level=95
{cmd}{...}
scalar ss_all = 0
forval k = 1/3 {
local i : word `k' of 390 121 42
local j : word `k' of 0.5 0.2 0.3
svysampsi `i', p(`j') lev(95) moe(3.0)
scalar ss_all = ss_all + r(adjss)
}
di scalar(ss_all)
di as text "Estimated total required sample size:"
di as text " n = " as result scalar(ss_all)
{txt}{...}
{pstd} Or for a somewhat different look...
{cmd}{...}
tempname resmat
forvalues k = 1/3 {
local i : word `k' of 390 121 42
local j : word `k' of 0.5 0.2 0.3
svysampsi `i', p(`j') lev(95) moe(3.0)
matrix `resmat' = nullmat(`resmat') \ r(adjss)
local names `"`names' `"`i'"'"'
}
mat colnames `resmat' = "Sample Size"
mat rownames `resmat' = `names'
matlist `resmat' , row("Strata Size")
{txt}{...}
{marker results}{...}
{title:Stored results}
{pstd}
{cmd:svysampsi} stores the following in {cmd:r()}:
{synoptset 20 tabbed}{...}
{p2col 5 15 19 2: Scalars}{p_end}
{synopt:{cmd:r(pop)}} user-entered population size {p_end}
{synopt:{cmd:r(prop)}} user-entered proportion {p_end}
{synopt:{cmd:r(moe)}} user-entered margin of error {p_end}
{synopt:{cmd:r(resp)}} user-entered response rate {p_end}
{synopt:{cmd:r(ss)}} unadjusted sample size {p_end}
{synopt:{cmd:r(adjss)}} finite population corrected sample size {p_end}
{synopt:{cmd:r(resp_adjss)}} response-rate adjusted sample size {p_end}
{p2colreset}{...}
{title:References}
{p 4 8 2}
Lohr, Sharon L. 2010. {it:Sampling: Design and Analysis, 2nd Ed.} Boston: Cengage Learning.{p_end}
{p 4 8 2}
Cochran, William G. 1977. {it:Sampling Techniques, 3nd Ed.} New York: John Wiley and Sons, Inc.{p_end}
{p 4 8 2}
Sudman, Seymour. 1976. {it:Applied Sampling.} New York: Academic Press.{p_end}
{p 4 8 2}
Kish, Leslie. 1965. {it:Survey Sampling.} New York: John Wiley and Sons, Inc.{p_end}
{marker citation}
{title:Citation of {cmd:svysampsi}}
{p 4 8 2}{cmd:svysampsi} is not an official Stata command. It is a free contribution
to the research community, like a paper. Please cite it as such: {p_end}
{p 4 4 2}
Linden, Ariel (2013). svysampsi: Stata module for estimating sample size for surveys with a dichotomous outcome variable.
{browse "http://www.lindenconsulting.org":http://www.lindenconsulting.org} {p_end}
{title:Author}
{p 4 8 2} Ariel Linden{p_end}
{p 4 8 2} President, Linden Consulting Group, LLC{p_end}
{p 4 8 2} Ann Arbor, MI, USA{p_end}
{p 4 8 2}{browse "mailto:alinden@lindenconsulting.org":alinden@lindenconsulting.org}{p_end}
{p 4 8 2}{browse "http://www.lindenconsulting.org"}{p_end}
{title:Acknowledgments}
{p 4 4 2} I would like to thank Nicholas J. Cox for providing helpful guidance in writing the code used in Example 3, as
well as providing a review of the overall program code and help file.
{title:Also see}
{p 4 8 2} Manual: {bf:[R] sampsi,} {bf:[D] sample}{p_end}
{p 4 8 2} Online: {helpb sampsi,} {helpb sample}{p_end}