sdlim -- Rescaled standard deviations for limited variables
Syntax 1: Rescaled standard deviations for many variables
sdlim varlist [if] [in] [weight] , limits(# #) [ simulate(# #) keep ]
Syntax 2: Rescaled standard deviations by groups
sdlim varname [if] [in] [weight] , by(varname) [ limits(# #) simulate(# #) keep ]
aweights and fweights are allowed; see weight.
by is allowed; see by.
sdlim rescales the standard deviation of a variable so that the result is the raw standard deviation expressed as a proportion of the maximum standard deviation that is possible for the given mean.
The theoretical maximum of the standard deviation of a variable X with mean mean(X) is
max(SD) = sqrt((min(X) - mean(X)) * (mean(X)-max(X)) * N/(N-1))
Here min(X) and max(X) are the theoretical limits of X, and N is the number of observations. The formula assumes that the minimum and the maximum of X are known and fixed. This is the case for variables measured with a rating scale, for example.
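The formula can be checked by hand with official Stata commands. The following lines are only a sketch, not the code of sdlim itself: they assume the auto data, the limits 1 and 5 for rep78, and unweighted data; the scalar names maxsd and rescaled are made up for illustration.

```stata
* Sketch: maximum and rescaled standard deviation of rep78, computed by hand
sysuse auto, clear
quietly summarize rep78

* Copy the results of summarize into locals before doing anything else
local m  = r(mean)
local sd = r(sd)
local n  = r(N)

* max(SD) = sqrt((min(X) - mean(X)) * (mean(X) - max(X)) * N/(N-1))
* with the theoretical limits min(X) = 1 and max(X) = 5
scalar maxsd = sqrt((1 - `m') * (`m' - 5) * `n' / (`n' - 1))

* Raw standard deviation as a proportion of the maximum
scalar rescaled = `sd' / maxsd

display "maximum SD:  " maxsd
display "rescaled SD: " rescaled
```

If sdlim works as described above, the second value should correspond to the result of sdlim rep78, limits(1 5).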
sdlim has two syntaxes. Syntax 1 is used to rescale the standard deviations of several variables. In this case the option limits() must be specified. Syntax 2 is used to rescale the standard deviation of one variable for different groups. In this case the option by() is required, and only a single variable, not a varlist, may be specified after the command.
limits(# #) is used to set the theoretical limits of the variable(s) for which the standard deviation should be rescaled. Limits are set by two integer numbers. The first number is the theoretical minimum of the variable; the second number is the theoretical maximum. Results are omitted for variables that contain values outside their theoretical limits. Note that the option is required if the command is given without option by() (i.e. for Syntax 1). For Syntax 2, limits(# #) defaults to the minimum and maximum value of varname over all by-groups.
by(varname) is used to compare rescaled standard deviations between groups defined by the categories of varname. With option by(), the standard deviation of only one variable can be rescaled.
simulate(# #) uses a simulation to rescale the standard deviation of a variable measured with a limited rating scale. The simulation assumes a latent variable with the mean of the observed variable and a given standard deviation. It further assumes that all values of the latent variable above the upper limit are set to the highest value of the observed variable, and all values below the lower limit are set to the lowest value. Inside the parentheses the option requires two numbers. The first number is the number of observations used in the simulation; the second number is the standard deviation of the latent variable.
keep is used to keep the results of sdlim in memory as a Stata data set (i.e. a resultsset).
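The assumptions behind simulate(# #) can be illustrated with a few lines of a do-file. This is a sketch only and not the code of sdlim: it assumes that the latent variable is normally distributed, which the description above does not state, and it uses the values 1000 and 2 and the limits 1 and 9 from the examples below. The variable names latent and censored and the seed are made up for illustration, and the sketch uses the overall mean of rep78 rather than by-group means.

```stata
* Sketch: a latent variable censored at the limits of a rating scale
sysuse auto, clear
quietly summarize rep78
local m = r(mean)                 // mean of the observed variable

clear
set seed 12345                    // arbitrary seed, for reproducibility
set obs 1000                      // first number of simulate(): observations

* ASSUMPTION: normal latent variable; second number of simulate() is its SD
generate latent = rnormal(`m', 2)

* Values beyond the limits 1 and 9 are set to the limits
generate censored = min(max(latent, 1), 9)

* Compare the standard deviations of the latent and the censored variable
summarize latent censored
```

The standard deviation of censored cannot exceed that of latent, because values beyond the limits are pulled back to them. How sdlim uses the simulated values for the rescaling itself is not shown here.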
. sysuse auto
. sdlim rep78, l(1 5)
. sdlim rep78, by(for)
. sdlim rep78, by(for) l(1 9)
. sdlim rep78, by(for) l(1 9) sim(1000 2)
sdlim does not save returned results. However, with option keep the results are kept in memory as a resultsset.
The formula for the maximum standard deviation is published in Kalmijn, Wim and Ruut Veenhoven, 2005: Measuring inequality of happiness in nations. In search for proper statistics. Journal of Happiness Studies 6, 357-396 (Special issue on "Inequality of Happiness in nations").
The two methods for rescaling of standard deviations are discussed by Delhey, Jan and Kohler, Ulrich: Is Happiness Inequality Immune to Income Inequality? New Evidence through Instrument-Effect-Corrected Standard Deviations. This article is currently under review in Social Science Research.
Ulrich Kohler, WZB firstname.lastname@example.org
Also see Manual: [R] summarize