```-------------------------------------------------------------------------------
help for perturb                                                 John Hendrickx
-------------------------------------------------------------------------------

perturb

perturb : any_stata_command , poptions(options) [command_options]

Where poptions can contain the following options:

pvars(varlist) prange(numlist) uniform pfactors(varlist)
ulist(numlist) distlist(numlist) assoc(string)
statistics(string) format(string) save(string)
niter(integer 100) misclass(numlist) verbose

Description

perturb is a tool for assessing ill-conditioning, i.e. the impact of small
random changes (perturbations) to variables on parameter estimates. It is an
alternative to collinearity diagnostics such as vif, collin, coldiag, coldiag2.
perturb works with any model, not just linear regression, and is suitable for
models with categorical variables, interactions, or non-linear transformations
of the independent variables.

perturb works by adding a small random "perturbation" value to selected
independent variables, then re-estimating the model. This process is repeated
niter times, after which a summary of the mean, standard deviation, minimum
and maximum of the parameter estimates is displayed. If collinearity is a
serious problem in the data, then the estimates will be unstable and vary
strongly.
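
The loop described above can be sketched by hand. This is only an illustration
of the idea, not perturb's actual code: the auto dataset, the model, the seed,
the perturbation size (standard deviation 50) and the names weight_p, b_weight,
b_length and b_cons are all assumptions made for the example.

```stata
* Sketch only: a hand-rolled perturb-and-re-estimate loop.
sysuse auto, clear
set seed 12345
local niter 100
matrix B = J(`niter', 3, .)                   // one row of coefficients per iteration
matrix colnames B = b_weight b_length b_cons
generate double weight_p = weight             // working copy; the original stays intact
forvalues i = 1/`niter' {
    quietly replace weight_p = weight + rnormal(0, 50)   // add a small random perturbation
    quietly regress price weight_p length
    matrix B[`i', 1] = e(b)                   // store this iteration's estimates in row i
}
drop weight_p
preserve
clear
quietly svmat double B, names(col)            // turn the coefficient matrix into variables
tabstat b_weight b_length b_cons, statistics(mean sd min max) format(%8.3f)
restore
```

If the standard deviations are large relative to the means, the estimates are
sensitive to small changes in the data.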

perturb can be used with categorical variables. Categorical variables are
reclassified according to a table of reclassification probabilities. There
could for example be a 95% probability that each case is recoded to the same
category, otherwise it is assigned to one of the others. Reclassification
probabilities can be specified in the pcnttabs option. These are adjusted such
that the expected frequencies of the reclassified variable are the same as the
original and an appropriate pattern of association is imposed between the
original and the reclassified variable. See reclass for further details.
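
A naive version of such a reclassification can be sketched as follows. It keeps
each case in its category with probability 0.95 and otherwise picks one of the
other categories at random. It does not apply the adjustments described above,
so it is not what reclass does; the auto dataset and the variable names rep_p
and other are assumptions made for the example.

```stata
* Sketch only: unadjusted reclassification of a 5-category variable.
sysuse auto, clear
set seed 1
generate byte rep_p = rep78                           // start from the original category
generate byte other = 1 + floor(4*runiform())         // candidate in 1..4
replace other = other + 1 if other >= rep78           // skip the case's own category
replace rep_p = other if runiform() > 0.95 & !missing(rep78)
drop other
tabulate rep78 rep_p                                  // original (rows) by reclassified (columns)
```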

If a model contains interactions or nonlinear transformations, then
perturbations are only added to the main effects/untransformed variables.
perturb shows how the perturbations indirectly affect estimates of the derived
terms, whereas other collinearity diagnostics basically treat interactions and
transformations as separate independent variables.

Options

pvars Contains a list of variables to be perturbed. Random values are added to
the variable, after which the model is re-estimated.

prange Contains a list of values determining the magnitude of perturbations.
There should be as many prange values as pvars variables.

uniform By default, the random perturbations are drawn from a normal
distribution N(0,x), where x is the prange value corresponding with the
pvars variable in question. If the option uniform is specified, then the
random perturbations are drawn from a uniform distribution U(-x/2,x/2).

pfactors Contains a list of categorical variables to be perturbed.

pcnttabs Contains a list of values corresponding with each entry in pfactors.
List elements can be numbers, row or column matrices or square matrices. If
matrices are specified, their dimensions should correspond with the number
of categories of the pfactor in question.

If a number is specified, its value should be between 0 and 100. The number
should indicate the percentage of cases that will be reclassified to the
same category. Note that this value is only used to derive initial
reclassification probabilities and that the adjusted values will be
somewhat different; see reclass for details.

A row or column matrix can also be specified with different values for each
category of the pfactor entry. These values must be between 0 and 100 and
indicate the probability of reclassification to the same category for each
category. See reclass for further details.

If a square matrix is specified, it should specify initial reclassification
probabilities with the original variable in the rows and the reclassified
variable in the columns. Values need not add to 100 over the columns; this
is handled by reclass. A square matrix is taken to indicate that the
pfactor entry is an ordered variable.

adjust By default, the reclassification probabilities are adjusted so that
the expected frequencies of the reclassified variable are the same as those
of the original when the pcnttabs option is used. Use noadjust to suppress
this and use the percentages specified in the pcnttabs option unmodified.

bestmod By default, an appropriate pattern of association is imposed between
the original and the reclassified variable when the pcnttabs option is used.
Use nobestmod to avoid this. The reclassification probabilities will be
adjusted to make the expected frequencies of the reclassified variable
equal to those of the original, but they will otherwise be close
approximations of the values specified in the pcnttabs option.

qlist Contains values for the multiplicative q parameter corresponding with
each entry in pfactors. See reclass for further details.

ulist Contains values for multiplicative u corresponding with each entry in
pfactors. See reclass for further details.

distlist Contains values for dist corresponding with each entry in pfactors.
See reclass for further details.

assoc For users familiar with loglinear mobility models. Defines association
patterns corresponding with each entry in pfactors. Each entry should
refer to a program in which the variable paras is defined in terms of the
variables orig and dest to produce a loglinear pattern of association. If
assoc is defined, qlist and ulist are ignored.

statistics Specify summary statistics to be produced by tabstat. See the
corresponding option in tabstat for valid values. The default is mean sd
min max.

format A valid format for specifying results of tabstat and reclass. The
default is %8.3f.

save Specify a valid filename to save the coefficients as a dataset for further
analysis.

misclass Maintained for compatibility with version 1. Translated by reclass into

niter Indicates the number of times to re-estimate the model.  Default is 100.

verbose Used to print debugging information.
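
As an illustration of how the continuous-variable options combine, the calls
below follow the syntax diagram at the top of this file. They assume perturb is
installed; the auto dataset, the model and the prange values are arbitrary
choices made for the example.

```stata
* Sketch only: assumes perturb is installed.
sysuse auto, clear

* Default: normal perturbations, one prange value per pvars variable
perturb: regress price weight length, poptions(pvars(weight length) prange(50 2))

* Uniform perturbations, 200 iterations, explicit statistics and display format
perturb: regress price weight length, ///
    poptions(pvars(weight length) prange(50 2) uniform niter(200) ///
             statistics(mean sd min max) format(%9.4f))
```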

Transformations

Transformations are specified as global variables $ptrans1, $ptrans2, ..., $ptransn.
These global variables specify one variable as a function of others using a
syntax suitable for a replace statement. For example:

global ptrans1 "exp2=exp^2"
#delimit ;
perturb: reg ses fses*eyr educyr*eyr fses*exp educyr*exp exp2,
poptions(pvars(eyr exp) prange(5 5)) beta;
#delimit cr
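
Because the global holds text suitable for a replace statement, one plausible
reading is that the derived term is recomputed from the perturbed variable
roughly as sketched below. Whether perturb does exactly this internally is an
assumption; the toy data and the fixed shift of 0.5 (standing in for a random
perturbation) are made up for the example.

```stata
* Sketch only: how a ptrans global expands inside a replace statement.
clear
set obs 5
generate double exp = _n
generate double exp2 = exp^2
global ptrans1 "exp2=exp^2"
replace exp = exp + 0.5          // stand-in for a random perturbation
replace $ptrans1                 // expands to: replace exp2=exp^2
list exp exp2
```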

Categorical variables

In a perturbation analysis, categorical variables are reclassified with a high
probability for each case to remain in the same category.  The easiest way to
do this is to specify an initial table of reclassification probabilities using
the pcnttabs option. These initial probabilities will be adjusted so that there
is an appropriate pattern of association between the original and the
reclassified variable and that the expected frequency distribution of the
reclassified variable is identical to that of the original. See reclass for
further details.

Dummy variables for the categorical variables can be created using the built-in
xi command or with xi3 or desmat, available from the SSC archive. The defcon
and desrep options will be recognized if desmat is used. For example:

char eyr[pzat] dir
#delimit ;
mat p=(96, 4, 1, 0\
4,91, 4, 1\
1, 4,91, 4\
0, 1, 4,96);
perturb: desmat: reg ses fegp6 expc eyr, defcon(dev) desrep(all)
poptions(pvars(eyr) prange(2.5) pfac(expc fegp6) pcnt(p 96)
save("tstdat") replace ) ;

The same example using xi3:

perturb: xi3: reg ses e.fegp6 e.expc eyr,
poptions(pvars(eyr) prange(2.5) pfac(expc fegp6) pcnt(p 96));

In these examples, the matrix p contains initial reclassification probabilities
for the variable expc. For the variable fegp6, the initial probability of
reclassification to the same category is 96% for all categories.

Remarks

perturb saves the coefficients for each iteration in a matrix. On completion,
the matrix is transformed to data and summarized to show the mean, standard
deviation, minimum and maximum of the parameter estimates for the perturbed
variables. perturb restores the data to its original state before exiting, but
the estimates for each iteration are saved in the result r(perturb). The
summary statistics are saved as r(StatTot). Optionally, the dataset of
coefficients can be saved for subsequent analysis. Note that perturb modifies
the output of tabstat and prints variable labels instead of variable names. The
results using the saved dataset will not have this feature.
perturb can be used with estimation procedures other than regress. On the other
hand, collinearity is a property of the independent variables alone: it results
from extreme (multiple) correlation among them. Collinearity could therefore
also be diagnosed by running regress with an arbitrary dependent variable and
then applying perturb, vif and/or collin. This will certainly be faster, since
maximum likelihood procedures require iterative solutions whereas OLS
regression does not. It is possible, though, that ML procedures are more
sensitive to collinearity, in which case perturb with the actual model would be
the preferred solution.
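
The shortcut with an arbitrary dependent variable can be sketched as follows.
Variance inflation factors depend only on the independent variables, so the
random outcome does not affect them. The auto dataset, the seed and the
variable name y_arbitrary are assumptions made for the example.

```stata
* Sketch only: collinearity diagnostics with an arbitrary dependent variable.
sysuse auto, clear
set seed 2024
generate double y_arbitrary = rnormal()       // arbitrary outcome, unrelated to the regressors
quietly regress y_arbitrary weight length displacement
estat vif

* With perturb installed, the same linear model could be perturbed instead:
* perturb: regress y_arbitrary weight length displacement, ///
*     poptions(pvars(weight length) prange(50 2))
```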

Saved results

r(perturb)
A matrix of coefficients (columns) over the iterations (rows)

r(StatTot)
A matrix of summary statistics produced by tabstat.
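
A short sketch of picking up the saved results after a run. It assumes perturb
is installed; the auto dataset, the model and the matrix names P and S are
arbitrary choices made for the example.

```stata
* Sketch only: assumes perturb is installed.
sysuse auto, clear
perturb: regress price weight length, ///
    poptions(pvars(weight length) prange(50 2) niter(50))
matrix P = r(perturb)            // copy r() results before another command overwrites them
matrix S = r(StatTot)
display "iterations stored: " rowsof(P)
matrix list S, format(%8.3f)
```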

References

Belsley, D.A. (1991).  Conditioning diagnostics, collinearity and weak data in
regression.  New York: John Wiley & Sons.

http://www.xs4all.nl/~jhckx/perturb/