{smcl}
{.-}
help for {cmd:perturb} {right: {browse "mailto:John_Hendrickx@yahoo.com":John Hendrickx}}
{.-}
{title:perturb}
{p 8 27}
{cmd:perturb} : {it:any_stata_command} ,
{cmdab:popt:ions(}{it:options}{cmd:)}
[{it:command_options}]
{p}
Where {cmd:poptions} can contain the following options:
{p 8 27}
{cmdab:pv:ars(}{it:varlist}{cmd:)}
{cmdab:pr:ange(}{it:numlist}{cmd:)}
{cmdab:u:niform}
{cmdab:pf:actors(}{it:varlist}{cmd:)}
{cmdab:pc:nttabs(}{it:string}{cmd:)}
{cmdab:ad:just}
{cmdab:b:estmod}
{cmdab:q:list(}{it:numlist}{cmd:)}
{cmdab:u:list(}{it:numlist}{cmd:)}
{cmdab:d:istlist(}{it:numlist}{cmd:)}
{cmdab:a:ssoc(}{it:string}{cmd:)}
{cmdab:s:tatitics(}{it:string}{cmd:)}
{cmdab:f:ormat(}{it:string}{cmd:)}
{cmd:save(}{it:string}{cmd:)}
{cmdab:n:iter(}{input:integer} 100{cmd:)}
{cmdab:m:isclass(}{it:numlist}{cmd:)}
{cmdab:v:erbose}
{title:Description}
{p}
{cmd:perturb} is a tool for assessing ill-conditioning, i.e. the
impact of small random changes (perturbations) to variables on
parameter estimates. It is an alternative to collinearity diagnostics
such as
{help vif}, {help collin}, {help coldiag}, {help coldiag2}.
{cmd:perturb} works with any model, not just linear regression and is
suitable for models with categorical variables, interactions, or
non-linear transformations of the independent variables.
{p}
{cmd:perturb} works by adding a small random "perturbation" value to
selected independent variables, then re-estimating the model. This
process is repeated {it:niter} times, after which a summary of the
means, standard deviation, minimum and maximum of the parameter
estimates is displayed. If collinearity is a serious problem in the
data, then the estimates will be unstable and vary strongly.
{p}
{cmd:perturb} can be used with categorical variables. Categorical
variables are reclassified according to a table of reclassification
probabilities. There could for example be a 95% probability that each
case is recoded to the same category, otherwise it is assigned to one
of the others. Reclassification probabilities can be specified in the
{cmd:pcnttabs} option. These are adjusted such that the expected
frequencies of the reclassified variable are the same as the original
and an {it:appropriate} pattern of association is imposed between the
original and the reclassified variable. See {help reclass} for further
details.
{p}
If a model contains interaction or nonlinear transformation then
perturbations are only added to the main effects/untransformed
variables. {cmd:perturb} shows how the perturbations indirectly affect
estimates of the derived terms whereas other collinearity diagnostics
basicly treat interactions and transformations as separate independent
variables.
{title:Options}
{p 0 4}
{cmd:pvars} Contains a list of variables to be perturbed. Random
values are added to the variable, after which the model is
re-estimated.
{p 0 4}
{cmd:prange} Contains a list of values determining the magnitude of
perturbations. There should be as many {it:prange} values as
{it:pvars} variables.
{p 0 4}
{cmd:uniform} By default, the random perturbations are drawn from a
normal distribution N(0,{it:x}), where {it:x} is the {it:prange} value
corresponding with the {it:pvars} variable in question. If the option
{cmd:uniform} is specified, then the random perturbations are drawn
from a uniform distribution U(-{it:x}/2,{it:x}/2) instead.
{p 0 4}
{cmd:pfactors} Contains a list of categorical variables to be
perturbed.
{p 0 4}
{cmd:pcnttabs} Contains a list of values corresponding with each entry
in {cmd:pfactors}. List elements can be numbers, row or column
matrices or square matrices. If matrices are specified, their
dimensions should correspond with the number of categories of the
{it:pfactor} in question.
{p 4 4}
If a number is specified, its value should be between 0 and 100. The
number should indicate the percentage of cases that will be
reclassified to the same category. Note that this value is only used
to derive initial reclassification probabilities and that the adjusted
values will be somewhat different; see {help reclass} for details.
{p 4 4}
A row or column matrix can also be specified with different values for
each category of the {it:pfactor} entry. These values must be between
0 and 100 and indicate the probability of reclassification to the same
category for each category. See {help reclass} for further details.
{p 4 4}
If a square matrix is specified, it should specify initial
reclassification probabilities with the original variable in the rows
and the reclassified variable in the columns. Values need not add to
100 over the columns, this is handled by {help reclass}. A square
matrix is taken to indicate that the {it:pfactor} entry is an ordered
variable.
{p 0 4}
{cmd:adjust} By default, the reclassification probabilities are
adjusted such that the expected frequencies of the reclassified
variable are the same as those of the original when the {cmd:pcnttabs}
option is used. Use {cmd:noadjust} to suppress this and use the
percentages specified in the {cmd:pcnttabs} option unmodified.
{cmd:noadjust} implies {cmd:nobestmod}.
{p 0 4}
{cmd:bestmod} By default, an appropriate pattern of association is
imposed between the original and the reclassified variable when the
{cmd:pcnttab} option is used. Use {cmd:nobestmod} to avoid this. The
reclassification probabilities will be adjusted to make the expected
frequencies of the reclassified variable equal to those of the
original but they will otherwise be close approximations of the values
specified in the {cmd:pcnttab} option.
{p 0 4}
{cmd:qlist} Contains values for the multiplicative {cmd:q} parameter
corresponding with each entry in {cmd:pfactors}. See {help reclass}
for further details.
{p 0 4}
{cmd:ulist} Contains values for multiplicative {cmd:u} corresponding
with each entry in {cmd:pfactors}. See {help reclass} for further
details.
{p 0 4}
{cmd:distlist} Contains values for {cmd:dist} corresponding with each
entry in {cmd:pfactors}. See {help reclass} for further details.
{p 0 4}
{cmd:assoc} For users familiar with loglinear mobility models. Defines
association patterns corresponding with each entry in {it:pfactors}.
Each entry should refer to a {cmd:program} in which the variable
{cmd:paras} is defined in terms of the variables {cmd:orig} and
{cmd:dest} to produce a loglinear pattern of associaton. If
{cmd:assoc} is defined, {cmd:qlist} and {cmd:ulist} are ignored.
{p 0 4}
{cmd:statistics} Specify summary statistics to be produced by
{help tabstat}. See the corresponding option in {cmd:tabstat} for
valid values. The default is {hi:mean st min max}.
{p 0 4}
{cmd:format} A valid format for specifying results of {help tabstat}
and {help reclass}. The default is %8.3f.
{p 0 4}
{cmd:save} Specify a valid filename to save the coefficients as a
dataset for further analysis
{p 0 4}
{cmd:misclass} maintained for compatability with version 1.Translated
by {cmd:reclass} into
{cmd:pcnttab(100-}{it:misclass}{cmd:) noadjust}.
{p 0 4}
{cmd:niter} Indicates the number of times to re-estimate the model.
Default is 100.
{p 0 4}
{cmd:verbose} Used to print debugging information.
{title:Transformations}
{p}
Transformations are specified as global variables {input:$ptrans1},
{input:$ptrans2}, {input:$ptrans}{it:n}. These global variables
specify one variable as a function of others using a syntax suitable
for a {input:replace} statement. For example:
{input:global ptrans1 "exp2=exp^2"}
{input:#delimit ;}
{input:perturb: reg ses fses*eyr educyr*eyr fses*exp educyr*exp exp2,}
{input:poptions(pvars(eyr exp) prange(5 5)) beta;}
{input:#delimit cr}
{title:Categorical variables}
{p}
In a perturbation analysis, categorical variables are reclassified
with a high probability for each case to remain in the same category.
The easiest way to do this is to specify an initial table of
reclassification probabilities using the {cmd:pcnttabs} option. These
initial probabilities will be adjusted so that there is an appropriate
pattern of association between the original and the reclassified
variable and that the expected frequency distribution of the
reclassified variable is identical to that of the original. See
{help reclass} for further details.
{p}
Dummy variables for the categorical variables can be created using the
builtin {help xi} command or with {help xi3} or {help desmat},
available from the {help ssc} archives. The {cmd:defcon} and
{cmd:desrep} options will be recognized if {cmd:desmat} is used. For
example:
{input:char eyr[pzat] dir }
{input:#delimit ; }
{input:mat p=(96, 4, 1, 0\ }
{input: 4,91, 4, 1\ }
{input: 1, 4,91, 4\ }
{input: 0, 1, 4,96); }
{input:perturb: desmat: reg ses fegp6 expc eyr, defcon(dev) desrep(all)}
{input: poptions(pvars(eyr) prange(2.5) pfac(expc fegp6) pcnt(p 96)}
{input: save("tstdat") replace ) ; }
{p}
The same example using {cmd:xi3}:
{input:perturb: xi3: reg ses e.fegp6 e.expc eyr, }
{input: poptions(pvars(eyr) prange(2.5) pfac(expc fegp6) pcnt(p 96));}
{p}
In these examples, the matrix {cmd:p} contains initial
reclassification probabilities for the variable {cmd:expc}. For the
variable {cmd:fegp6}, the initial probability of reclassification to
the same category is 96% for all categories.
{title:Remarks}
{p}
{cmd:perturb} saves the coefficients for each interation in a matrix.
On completion, the matrix is transformed to data and summarized to
show the mean, standard deviation, minimum and maximum of the
parameter estimates for the {it:perturbed} variables. {cmd:perturb}
restores the data to its original state before exiting but the
estimates for each iteration are saved in the result
{result:r(perturb)}. The summary statistics are saved as
{result:r(StatTot)}. Optionally, the dataset of coefficients can saved
for subsequent analysis. Note that {cmd:perturb} modifies the output
of {cmd:tabstat} and prints variable labels instead of variable names.
The results using the saved dataset will not have this feature
{p}
{cmd:perturb} can be used with estimation procedures other than
{help regress}. On the other hand, collinearity is a result of extreme
(multiple) correlation among independent variables. Collinearity could
therefore be diagnosed by running {cmd:regress} with an arbitrary
dependent variable to use {cmd:perturb}, {help vif} and/or
{help collin} to assess collinearity. This will certainly be a faster
solution since maximum likelihood procedures require iterative
solutions whereas ols regression does not. It is possible though that
ML procedures are more sensitive to collinearity, in which case
{cmd:perturb} would be the preferred solution.
{title:Saved results}
{p 0 4}
{cmd:r(perturb)}
{break}A matrix of coefficients (columns) over the iterations (rows)
{p 0 4}
{cmd:r(StatTot)}
{break}A matrix of summary statistics produced by {help tabstat}.
{title:References}
{p 0 4}
Belsley, D.A. (1991).
{it:Conditioning diagnostics, collinearity and weak data in regression}.
New York: John Wiley & Sons.
{p 0 4}
{browse "http://www.xs4all.nl/~jhckx/perturb/":http://www.xs4all.nl/~jhckx/perturb/}
Direct comments to: {browse "mailto:John_Hendrickx@yahoo.com":John Hendrickx}
{p}
{cmd:perturb} is available at
{browse "http://ideas.uqam.ca/ideas/data/bocbocode.html":SSC-IDEAS}.
Use {help ssc} {cmd:install perturb} to obtain the latest version.
{p}
{net search collin:collin}, {net search coldiag:coldiag}, and
{net search coldiag2:coldiag2}
are also available from SSC. Click on a name to install or use
{cmd:ssc install}
{title:Also see}
{p 0 21}
On-line: help for
{help vif}, {help collin}, {help coldiag}, {help coldiag2},
{help reclass}
{p_end}