perturb
perturb : any_stata_command , poptions(options) [command_options]
Where poptions can contain the following options:
pvars(varlist) prange(numlist) uniform pfactors(varlist) pcnttabs(string) adjust bestmod qlist(numlist) ulist(numlist) distlist(numlist) assoc(string) statistics(string) format(string) save(string) niter(integer 100) misclass(numlist) verbose
Description
perturb is a tool for assessing ill-conditioning, i.e. the impact on parameter estimates of small random changes (perturbations) to the variables. It is an alternative to collinearity diagnostics such as vif, collin, coldiag, and coldiag2. perturb works with any model, not just linear regression, and is suitable for models with categorical variables, interactions, or nonlinear transformations of the independent variables.
perturb works by adding small random values (perturbations) to selected independent variables and then re-estimating the model. This is repeated niter times, after which the mean, standard deviation, minimum, and maximum of the parameter estimates across iterations are displayed. If collinearity is a serious problem in the data, the estimates will be unstable and vary strongly from one iteration to the next.
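The loop described above can be sketched as follows. This is a minimal Python illustration, not perturb's Stata implementation: it assumes a linear model with two continuous predictors, reads the prange value as the standard deviation of the normal noise, and uses hypothetical function names (ols_two_predictors, perturb_sketch) and made-up data. It compares the spread of one coefficient when the predictors are nearly collinear versus unrelated.

```python
import random
import statistics


def ols_two_predictors(y, x1, x2):
    """OLS for y = b0 + b1*x1 + b2*x2, solved from the centered normal equations."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (my - b1 * m1 - b2 * m2, b1, b2)


def perturb_sketch(y, x1, x2, prange=(1.0, 1.0), niter=100, seed=1):
    """Re-estimate the model niter times with N(0, prange) noise added to x1 and x2.

    Returns one (b0, b1, b2) tuple per iteration; the input lists are not modified.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(niter):
        p1 = [v + rng.gauss(0, prange[0]) for v in x1]  # perturbed copy of x1
        p2 = [v + rng.gauss(0, prange[1]) for v in x2]  # perturbed copy of x2
        draws.append(ols_two_predictors(y, p1, p2))
    return draws


# Hypothetical data: x2 is either a near-copy of x1 (collinear) or unrelated to it.
gen = random.Random(0)
x1 = [float(i) for i in range(50)]
x2_collinear = [v + gen.gauss(0, 0.5) for v in x1]
x2_independent = [float(v) for v in gen.sample(range(50), 50)]
y_col = [1 + a + b + gen.gauss(0, 1) for a, b in zip(x1, x2_collinear)]
y_ind = [1 + a + b + gen.gauss(0, 1) for a, b in zip(x1, x2_independent)]

# Spread of b1 over the iterations in each scenario.
sd_collinear = statistics.stdev(d[1] for d in perturb_sketch(y_col, x1, x2_collinear))
sd_independent = statistics.stdev(d[1] for d in perturb_sketch(y_ind, x1, x2_independent))
print(f"sd of b1: collinear={sd_collinear:.3f}, independent={sd_independent:.3f}")
```

With nearly collinear predictors, the same amount of noise shifts b1 far more from one iteration to the next, which is the instability perturb is designed to reveal.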
perturb can also be used with categorical variables, which are reclassified according to a table of reclassification probabilities. For example, each case could have a 95% probability of being recoded to the same category and otherwise be assigned to one of the other categories. Reclassification probabilities can be specified in the pcnttabs option. They are adjusted so that the expected frequencies of the reclassified variable equal those of the original and an appropriate pattern of association is imposed between the original and the reclassified variable. See reclass for further details.
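The basic reclassification idea can be sketched in Python as below. This is a simplified illustration under stated assumptions: it applies only the initial probability of staying in the same category and spreads the remainder evenly over the other categories. The adjustment of expected frequencies and the association pattern that reclass performs are not modelled, and reclassify_sketch is a hypothetical name.

```python
import random


def reclassify_sketch(values, categories, keep=0.95, seed=1):
    """Keep each case's category with probability `keep`; otherwise move it to one
    of the other categories, chosen uniformly at random.

    Only the *initial* probabilities are applied; reclass's adjustments are not.
    """
    rng = random.Random(seed)
    out = []
    for v in values:
        if rng.random() < keep:
            out.append(v)  # stays in its original category
        else:
            others = [c for c in categories if c != v]
            out.append(rng.choice(others))  # recoded to a different category
    return out


cases = [1, 2, 3, 4] * 250  # hypothetical 4-category variable, 1000 cases
recoded = reclassify_sketch(cases, categories=[1, 2, 3, 4], keep=0.95)
share_kept = sum(a == b for a, b in zip(cases, recoded)) / len(cases)
print(f"share kept in the same category: {share_kept:.3f}")
```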
If a model contains interactions or nonlinear transformations, perturbations are added only to the main effects (the untransformed variables). perturb thus shows how the perturbations indirectly affect the estimates of the derived terms, whereas other collinearity diagnostics basically treat interactions and transformations as separate independent variables.
Options
pvars Contains a list of variables to be perturbed. Random values are added to each of these variables, after which the model is re-estimated.
prange Contains a list of values determining the magnitude of perturbations. There should be as many prange values as pvars variables.
uniform By default, the random perturbations are drawn from a normal distribution N(0,x), where x is the prange value corresponding with the pvars variable in question. If the option uniform is specified, then the random perturbations are drawn from a uniform distribution U(-x/2,x/2) instead.
pfactors Contains a list of categorical variables to be perturbed.
pcnttabs Contains a list of values corresponding with each entry in pfactors. List elements can be numbers, row or column matrices or square matrices. If matrices are specified, their dimensions should correspond with the number of categories of the pfactor in question.
If a number is specified, its value should be between 0 and 100. The number should indicate the percentage of cases that will be reclassified to the same category. Note that this value is only used to derive initial reclassification probabilities and that the adjusted values will be somewhat different; see reclass for details.
A row or column matrix can also be specified with different values for each category of the pfactor entry. These values must be between 0 and 100 and indicate the probability of reclassification to the same category for each category. See reclass for further details.
If a square matrix is specified, it should contain initial reclassification probabilities, with the original variable in the rows and the reclassified variable in the columns. The values need not sum to 100 across the columns; reclass handles this. A square matrix is taken to indicate that the pfactor entry is an ordered variable.
adjust By default, the reclassification probabilities are adjusted such that the expected frequencies of the reclassified variable are the same as those of the original when the pcnttabs option is used. Use noadjust to suppress this and use the percentages specified in the pcnttabs option unmodified. noadjust implies nobestmod.
bestmod By default, an appropriate pattern of association is imposed between the original and the reclassified variable when the pcnttabs option is used. Use nobestmod to avoid this. The reclassification probabilities will still be adjusted so that the expected frequencies of the reclassified variable equal those of the original, but they will otherwise closely approximate the values specified in the pcnttabs option.
qlist Contains values for the multiplicative q parameter corresponding with each entry in pfactors. See reclass for further details.
ulist Contains values for the multiplicative u parameter corresponding with each entry in pfactors. See reclass for further details.
distlist Contains values for dist corresponding with each entry in pfactors. See reclass for further details.
assoc For users familiar with loglinear mobility models. Defines the association pattern for each entry in pfactors. Each entry should refer to a program in which the variable paras is defined in terms of the variables orig and dest to produce a loglinear pattern of association. If assoc is specified, qlist and ulist are ignored.
statistics Specifies the summary statistics to be produced by tabstat. See the corresponding option in tabstat for valid values. The default is mean sd min max.
format A valid display format for the results of tabstat and reclass. The default is %8.3f.
save Specifies a filename under which the coefficients are saved as a dataset for further analysis.
misclass Maintained for compatibility with version 1. Translated by reclass into pcnttab(100-misclass) noadjust.
niter Indicates the number of times to re-estimate the model. Default is 100.
verbose Used to print debugging information.
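The uniform option described above changes both the shape and the spread of the perturbations. The Python sketch below assumes that N(0,x) means mean 0 and standard deviation x, which this help text does not state explicitly; draw_perturbations is a hypothetical name. Under that reading, U(-x/2,x/2) has standard deviation x/sqrt(12), so the same prange value gives noticeably smaller perturbations with uniform.

```python
import math
import random
import statistics


def draw_perturbations(n, x, uniform=False, seed=1):
    """Draw n perturbation values for a variable whose prange value is x.

    Assumption: N(0, x) is read as mean 0 and standard deviation x.
    """
    rng = random.Random(seed)
    if uniform:
        return [rng.uniform(-x / 2, x / 2) for _ in range(n)]  # U(-x/2, x/2)
    return [rng.gauss(0, x) for _ in range(n)]  # N(0, x), default


x = 2.5  # e.g. prange(2.5)
normal_sd = statistics.stdev(draw_perturbations(100_000, x))
uniform_sd = statistics.stdev(draw_perturbations(100_000, x, uniform=True))
print(f"normal sd ~ {normal_sd:.3f}, uniform sd ~ {uniform_sd:.3f} "
      f"(theory {x / math.sqrt(12):.3f})")
```

If this reading holds, prange(2.5) with uniform yields perturbations with a standard deviation of about 0.72, roughly 3.5 times smaller than the normal default.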
Transformations
Transformations are specified as global macros $ptrans1, $ptrans2, ..., $ptransn. Each macro defines one variable as a function of others, using syntax suitable for a replace statement. For example:
global ptrans1 "exp2=exp^2"
#delimit ;
perturb: reg ses fses*eyr educyr*eyr fses*exp educyr*exp exp2,
    poptions(pvars(eyr exp) prange(5 5)) beta;
#delimit cr
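The effect of a transformation such as exp2=exp^2 can be sketched in Python: within each iteration, noise is added only to the main effect, and the derived term is then rebuilt from the perturbed values rather than receiving noise of its own. This is an illustration of the idea under that assumption, not perturb's code; perturb_with_transform and the data are hypothetical.

```python
import random


def perturb_with_transform(exp, prange, seed=1):
    """Perturb the main effect `exp`, then rebuild exp2 from it,
    mirroring global ptrans1 "exp2=exp^2"."""
    rng = random.Random(seed)
    p_exp = [v + rng.gauss(0, prange) for v in exp]  # noise on the main effect only
    p_exp2 = [v ** 2 for v in p_exp]  # derived term recomputed, never perturbed directly
    return p_exp, p_exp2


exp = [float(v) for v in range(1, 31)]  # hypothetical years of experience
p_exp, p_exp2 = perturb_with_transform(exp, prange=5)
```

Because exp2 always equals the square of the perturbed exp, any instability in the exp2 coefficient reflects how noise in exp propagates through the transformation.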
Categorical variables
In a perturbation analysis, categorical variables are reclassified so that each case has a high probability of remaining in the same category. The easiest way to do this is to specify an initial table of reclassification probabilities using the pcnttabs option. These initial probabilities are adjusted so that there is an appropriate pattern of association between the original and the reclassified variable and so that the expected frequency distribution of the reclassified variable is identical to that of the original. See reclass for further details.
Dummy variables for the categorical variables can be created using the built-in xi command or with xi3 or desmat, both available from the SSC archive. The defcon and desrep options are recognized if desmat is used. For example:
char eyr[pzat] dir
#delimit ;
mat p=(96, 4, 1, 0 \
        4,91, 4, 1 \
        1, 4,91, 4 \
        0, 1, 4,96);
perturb: desmat: reg ses fegp6 expc eyr, defcon(dev) desrep(all)
    poptions(pvars(eyr) prange(2.5) pfac(expc fegp6) pcnt(p 96)
    save("tstdat") replace);
The same example using xi3:
perturb: xi3: reg ses e.fegp6 e.expc eyr,
    poptions(pvars(eyr) prange(2.5) pfac(expc fegp6) pcnt(p 96));
#delimit cr
In these examples, the matrix p contains initial reclassification probabilities for the variable expc. For the variable fegp6, the initial probability of reclassification to the same category is 96% for all categories.
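The two preparatory steps applied to a square pcnttabs matrix can be sketched in Python: scaling each row so it sums to 1 (the values need not sum to 100), and adjusting the probabilities so that the expected frequencies of the reclassified variable match the original ones. The exact procedure reclass uses is documented in reclass and is not reproduced here. As a swapped-in illustration, the sketch uses iterative proportional fitting (IPF) and ignores the association pattern imposed by bestmod. The function names and the category frequencies are hypothetical.

```python
def row_normalize(p):
    """Scale each row so it sums to 1 (pcnttabs values need not sum to 100)."""
    return [[v / sum(row) for v in row] for row in p]


def match_margins(p, freq, iters=200):
    """IPF: start from expected counts freq[i] * p[i][j] and rescale until both
    row and column totals equal freq. Returns adjusted row probabilities."""
    k = len(freq)
    t = [[freq[i] * p[i][j] for j in range(k)] for i in range(k)]
    for _ in range(iters):
        for j in range(k):  # column step: reclassified frequencies -> original ones
            col = sum(t[i][j] for i in range(k))
            for i in range(k):
                t[i][j] *= freq[j] / col
        for i in range(k):  # row step: keep the original frequencies in the rows
            row = sum(t[i])
            t[i] = [v * freq[i] / row for v in t[i]]
    return [[v / freq[i] for v in t[i]] for i in range(k)]


# Matrix p from the example above; the category frequencies are hypothetical.
p = [[96, 4, 1, 0], [4, 91, 4, 1], [1, 4, 91, 4], [0, 1, 4, 96]]
freq = [100.0, 300.0, 400.0, 200.0]
adj = match_margins(row_normalize(p), freq)
expected = [sum(freq[i] * adj[i][j] for i in range(4)) for j in range(4)]
print([round(v, 1) for v in expected])  # should be close to freq
```

Zero entries in p stay zero under this scaling, so a case in the lowest category can still never be reclassified to the highest one.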
Remarks
perturb saves the coefficients for each iteration in a matrix. On completion, the matrix is converted to data and summarized to show the mean, standard deviation, minimum, and maximum of the parameter estimates for the perturbed variables. perturb restores the data to their original state before exiting, but the estimates for each iteration are saved in r(perturb) and the summary statistics in r(StatTot). Optionally, the dataset of coefficients can be saved for subsequent analysis. Note that perturb modifies the output of tabstat to print variable labels instead of variable names; results obtained from the saved dataset will not have this feature.
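The summary step can be sketched in Python for a coefficient matrix laid out like r(perturb), with iterations in the rows and coefficients in the columns. This mirrors the default statistics and the %8.3f format only as an illustration; summarize, the coefficient names, and the values are hypothetical.

```python
import statistics


def summarize(coef_matrix, names):
    """Column-wise mean, sd, min and max of a rows=iterations, cols=coefficients matrix."""
    out = {}
    for j, name in enumerate(names):
        col = [row[j] for row in coef_matrix]  # one coefficient across all iterations
        out[name] = {
            "mean": statistics.fmean(col),
            "sd": statistics.stdev(col),
            "min": min(col),
            "max": max(col),
        }
    return out


# Hypothetical coefficients from four iterations of a two-coefficient model.
draws = [[0.50, 1.10], [0.52, 0.90], [0.49, 1.30], [0.51, 0.70]]
for name, s in summarize(draws, ["eyr", "_cons"]).items():
    print(f"{name:>6}: " + "  ".join(f"{k}={v:8.3f}" for k, v in s.items()))
```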
perturb can be used with estimation procedures other than regress. However, collinearity results from extreme (multiple) correlation among the independent variables and does not depend on the dependent variable. It can therefore also be diagnosed by running regress with an arbitrary dependent variable and then using perturb, vif, and/or collin. This is certainly faster, since maximum likelihood procedures require iterative solutions whereas OLS regression does not. It is possible, though, that ML procedures are more sensitive to collinearity, in which case running perturb with the actual estimation command would be preferable.
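The point that collinearity depends only on the independent variables can be illustrated with the standard variance inflation factor, VIF = 1/(1 - R^2), where R^2 comes from regressing one predictor on the others. The Python sketch below covers the two-predictor case, in which R^2 is the squared correlation r^2; it is the textbook formula, not the code of vif or collin, and the function names are hypothetical. No dependent variable appears anywhere.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for either predictor in a model with exactly two predictors.

    With one other predictor, R^2 from regressing x1 on x2 equals r^2, so
    VIF = 1 / (1 - r^2). The dependent variable never enters the calculation.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


print(vif_two_predictors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # about 2.5
```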
Saved results
r(perturb) A matrix of coefficients (columns) over the iterations (rows)
r(StatTot) A matrix of summary statistics produced by tabstat.
http://www.xs4all.nl/~jhckx/perturb/
Direct comments to: John Hendrickx
perturb is available at SSC-IDEAS. Use ssc install perturb to obtain the latest version.
collin, coldiag, and coldiag2 are also available from SSC; use ssc install followed by the package name to install them.
Also see On-line: help for vif, collin, coldiag, coldiag2, reclass