-------------------------------------------------------------------------------
help for samplepps          Stephen P. Jenkins (June 2005, help revised May 2008)
-------------------------------------------------------------------------------

Draw random sample, proportional to size, of n cases

    samplepps newvar [if exp] [in range] , ncases(integer) size(sizevar) [withrepl]

Description

samplepps draws a random sample of ncases observations from the current data set, with selection probabilities proportional to size (`pps'). The default is to select cases without replacement; optionally, cases may be selected with replacement.

If sampling is without replacement, the variable newvar is equal to 1 for selected cases and 0 for non-selected cases. The program returns an error if the number of cases to be selected is greater than the number of valid observations, or if any observation has sizevar/(SUM_i sizevar) >= 1/ncases.

If sampling is with replacement, the variable newvar is equal to a positive integer for selected cases (the number of times the case has been selected) and 0 for non-selected cases.

For both types of sampling, newvar is missing if sizevar is missing.

If you are serious about drawing random samples, you must first set the random number seed; see generate.
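
The without-replacement error condition can be checked before calling samplepps. The following is a minimal sketch, assuming the condition refers to each case's share of the total of sizevar; the data set, the variable names n_pupils and share, and the sample size of 20 are all hypothetical.

```stata
* Hypothetical data: 200 schools with between 500 and 1000 pupils each
clear
set seed 2008
set obs 200
generate n_pupils = 500 + floor(501 * runiform())

* Each school's share of the total size
quietly summarize n_pupils, meanonly
generate double share = n_pupils / r(sum)

* Count schools whose share is too large for 20 cases without replacement
local n = 20
count if share >= 1/`n' & !missing(share)
```

A count of zero suggests that no single case is large enough to trigger the error for that value of ncases. A non-zero count suggests lowering ncases or sampling with replacement.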

Methods for sampling with probabilities proportional to size are discussed by Lohr (1999). See also Levy and Lemeshow (1991, chapter 11) and Som (1973, chapter 5), who focus on the with-replacement case. The algorithm used by samplepps for the with-replacement case is the standard `cumulative method'. For the without-replacement case, I used an algorithm described by Jean-Yves Pip Courbois (formerly at the University of Washington), originally due to Madow (1949). For more details, see Brewer and Hanif (1983) and Cochran (1977, p. 265), who cites Hartley and Rao (1962) and Madow (1949).
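
The two ideas named above can be illustrated on a toy data set. This is a sketch of the textbook cumulative-total method and of systematic pps selection with a random start, not a copy of the code inside samplepps, which may differ in detail. All variable and scalar names (size, cum, hits, cumpi, pick, total, u, start) are hypothetical.

```stata
* Hypothetical data: 5 cases with sizes 1, 2, ..., 5
clear
set seed 123517
set obs 5
generate size = _n
generate double cum = sum(size)
scalar total = cum[_N]

* Cumulative method, with replacement: draw 3 cases
generate hits = 0
forvalues k = 1/3 {
    * a uniform point between 0 and the total size
    scalar u = scalar(total) * runiform()
    * the point falls in exactly one case's interval (cum - size, cum]
    quietly replace hits = hits + 1 if cum >= scalar(u) & cum - size < scalar(u)
}

* Systematic pps selection, without replacement: draw 2 cases
local n = 2
* cumulated inclusion probabilities n * size / total
generate double cumpi = `n' * cum / scalar(total)
scalar start = runiform()
* a case is picked if one of the points start, start + 1, ... lies in its interval
generate byte pick = ceil(cumpi - scalar(start)) - ceil(cond(_n == 1, 0, cumpi[_n-1]) - scalar(start))

list size cum hits cumpi pick
```

In the first part, hits sums to the number of draws and a case may be hit more than once. In the second part, each case's interval is shorter than 1 because every inclusion probability is below 1, so pick is 0 or 1 and sums to the requested sample size.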

Options

ncases(integer) specifies the number of observations to be selected.

size(sizevar) specifies the name of the existing variable summarizing `size'.

withrepl specifies selection with replacement. (If the option is specified, a given observation may be selected more than once.)

Saved results

r(ncases) is the integer ncases.

r(nobs) is the number of valid observations at risk of being sampled.

r(sizevar) contains the name sizevar.

r(withrepl) = 1 if the with-replacement option was specified.

r(sample) contains the name newvar.
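
The saved results can be inspected straight after a call, before any other r-class command overwrites them. The sketch below assumes samplepps is already installed; the data set and the names n_pupils and pick are hypothetical.

```stata
* Hypothetical data: 200 schools with between 500 and 1000 pupils each
clear
set seed 123517
set obs 200
generate n_pupils = 500 + floor(501 * runiform())

* Select 20 schools without replacement (assumes samplepps is installed)
samplepps pick, size(n_pupils) ncases(20)

* Show everything left behind in r(), then use two of the results
return list
display "selected `r(ncases)' of `r(nobs)' valid observations"

* count is itself r-class, so it replaces the results saved by samplepps
count if pick == 1
```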

Examples

. // select a sample of schools with selection probabilities depending on # pupils per school

. use schools.dta, clear

. set seed 123517

. samplepps pick1, size(n_pupils) n(100)

. samplepps pick2, size(n_pupils) n(50) withrepl

Acknowledgements

Program written with support of ESRC grant number RES-000-22-0995 ("Social segregation in UK schools: benchmarking with international comparisons"). For helpful discussions, I thank project colleagues John Micklewright and Sylke Schnepf, and also Philippe Van Kerm. Steven Samuels drew my attention to the references by Cochran, Hartley and Rao, and Madow. Ben Jann drew my attention to the Brewer and Hanif reference.

Author

Stephen P. Jenkins, ISER, University of Essex, U.K. <stephenj@essex.ac.uk>

References

Brewer, K. R. W. and Muhammad Hanif. 1983. Sampling with Unequal Probabilities. New York: Springer.

Cochran, William G. 1977. Sampling Techniques, 3rd edition. New York: Wiley.

Madow, William G. 1949. On the theory of systematic sampling. II. Annals of Mathematical Statistics, 19: 535-545.

Hartley, H. O. and J. N. K. Rao. 1962. Sampling with unequal probabilities and without replacement. Annals of Mathematical Statistics, 33: 350-374.

Levy, Paul S. and Stanley Lemeshow. 1991. Sampling of Populations: Methods and Applications, 2nd edition. New York: John Wiley and Sons.

Lohr, Sharon L. 1999. Sampling: Design and Analysis. Pacific Grove CA: Duxbury Press.

Som, Ranjan K. 1973. Practical Sampling Techniques, 2nd edition, revised and expanded. New York: Marcel Dekker.

Also see

Manual:  [S-Z] sample
On-line: help for sample, and gsample if installed.