-------------------------------------------------------------------------------
help for perturb                                                 John Hendrickx
-------------------------------------------------------------------------------

perturb

perturb: any_stata_command, poptions(options) [command_options]

where poptions can contain the following options:

    pvars(varlist) prange(numlist) uniform
    pfactors(varlist) pcnttabs(string) adjust bestmod
    qlist(numlist) ulist(numlist) distlist(numlist) assoc(string)
    statistics(string) format(string) save(string) niter(integer 100)
    misclass(numlist) verbose

Description

perturb is a tool for assessing ill-conditioning, i.e. the impact of small random changes (perturbations) to variables on parameter estimates. It is an alternative to collinearity diagnostics such as vif, collin, coldiag and coldiag2. perturb works with any model, not just linear regression, and is suitable for models with categorical variables, interactions, or non-linear transformations of the independent variables.

perturb works by adding a small random "perturbation" value to selected independent variables and then re-estimating the model. This process is repeated niter times. Afterwards, perturb displays a summary of the means, standard deviations, minima and maxima of the parameter estimates. If collinearity is a serious problem in the data, the estimates will be unstable and will vary strongly from one iteration to the next.
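The loop described above can be sketched outside Stata. The following minimal Python sketch uses only the standard library and applies the idea to OLS:

1. Add random noise to the chosen predictors.
2. Re-estimate the model.
3. Repeat niter times.
4. Report the mean, SD, minimum and maximum of each coefficient.

It is not perturb's own implementation. The function names are hypothetical, and treating the prange value as the standard deviation of the normal perturbations is an assumption.

```python
import random
import statistics


def ols(y, X):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows that already includes an intercept column.
    Solved by Gaussian elimination with partial pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


def perturb_ols(y, xs, pvars, prange, niter=100, seed=1):
    """Hypothetical perturbation loop for OLS.

    xs maps variable names to lists of values; pvars are the names to perturb
    and prange their magnitudes, used here as the SD of N(0, x) (assumption).
    Returns the coefficient names and one row of coefficients per iteration.
    """
    rng = random.Random(seed)
    names = ["_cons"] + list(xs)
    rows = []
    for _ in range(niter):
        cols = {}
        for name, values in xs.items():
            if name in pvars:
                sd = prange[pvars.index(name)]
                cols[name] = [v + rng.gauss(0, sd) for v in values]
            else:
                cols[name] = list(values)
        X = [[1.0] + [cols[n][i] for n in xs] for i in range(len(y))]
        rows.append(ols(y, X))
    return names, rows


def summarize(names, rows):
    """Mean, SD, min and max of each coefficient over the iterations."""
    result = {}
    for j, name in enumerate(names):
        col = [r[j] for r in rows]
        result[name] = (statistics.mean(col), statistics.stdev(col), min(col), max(col))
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    x1 = [rng.gauss(0, 1) for _ in range(200)]
    x2 = [v + rng.gauss(0, 0.05) for v in x1]  # nearly collinear with x1
    y = [1 + a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    names, rows = perturb_ols(y, {"x1": x1, "x2": x2}, ["x1", "x2"], [0.1, 0.1])
    for name, (mean, sd, lo, hi) in summarize(names, rows).items():
        print(f"{name:6s} {mean:8.3f} {sd:8.3f} {lo:8.3f} {hi:8.3f}")
```

With nearly collinear x1 and x2, the SD and range of their coefficients across iterations should be large relative to the perturbation size. That instability is what perturb is meant to reveal.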

perturb can be used with categorical variables. Categorical variables are reclassified according to a table of reclassification probabilities. For example, there could be a 95% probability that each case is assigned to its original category; otherwise it is assigned to one of the other categories. Reclassification probabilities can be specified in the pcnttabs option. These probabilities are adjusted in two ways:

    - the expected frequencies of the reclassified variable are made the same as those of the original;
    - an appropriate pattern of association is imposed between the original and the reclassified variable.

See reclass for further details.

If a model contains interactions or nonlinear transformations, perturbations are added only to the main effects/untransformed variables.

perturb shows how the perturbations indirectly affect the estimates of the derived terms, whereas other collinearity diagnostics basically treat interactions and transformations as separate independent variables.

Options

pvars contains a list of variables to be perturbed. Random values are added to each variable, after which the model is re-estimated.

prange contains a list of values determining the magnitude of the perturbations. There should be as many prange values as pvars variables.

uniform: by default, the random perturbations are drawn from a normal distribution N(0,x), where x is the prange value corresponding with the pvars variable in question. If uniform is specified, the random perturbations are drawn from a uniform distribution U(-x/2,x/2) instead.
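The two distributions can be compared with a short Python sketch. Reading the x in N(0,x) as a standard deviation is an assumption; the help text does not say whether x is the SD or the variance. draw_perturbation is a hypothetical name.

```python
import math
import random
import statistics


def draw_perturbation(rng, x, uniform=False):
    """One random perturbation for a pvars variable whose prange value is x.

    Default: N(0, x), reading x as the standard deviation (an assumption).
    uniform=True: U(-x/2, x/2), an interval of total width x.
    """
    if uniform:
        return rng.uniform(-x / 2, x / 2)
    return rng.gauss(0, x)


if __name__ == "__main__":
    rng = random.Random(42)
    x = 2.0
    normal = [draw_perturbation(rng, x) for _ in range(20000)]
    unif = [draw_perturbation(rng, x, uniform=True) for _ in range(20000)]
    # A U(-x/2, x/2) variable has SD x / sqrt(12), roughly 0.29 * x.
    print("normal SD:", round(statistics.stdev(normal), 3), "expected", x)
    print("uniform SD:", round(statistics.stdev(unif), 3),
          "expected", round(x / math.sqrt(12), 3))
```

Under this reading, the same prange value gives uniform perturbations with less than a third of the spread of normal ones. prange values may therefore need to be larger when uniform is specified.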

pfactors contains a list of categorical variables to be perturbed.

pcnttabs contains a list of values corresponding with each entry in pfactors. List elements can be numbers, row or column matrices, or square matrices. If matrices are specified, their dimensions should correspond with the number of categories of the pfactors variable in question.

If a number is specified, its value should be between 0 and 100. It indicates the percentage of cases that will be reclassified to the same category. Note that this value is only used to derive initial reclassification probabilities; the adjusted values will be somewhat different. See reclass for details.

A row or column matrix can also be specified, with a different value for each category of the pfactors entry. These values must be between 0 and 100 and indicate, for each category, the probability of reclassification to the same category. See reclass for further details.

If a square matrix is specified, it should contain initial reclassification probabilities, with the original variable in the rows and the reclassified variable in the columns. The values need not sum to 100 across the columns; reclass handles this. A square matrix is taken to indicate that the pfactors entry is an ordered variable.
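The three forms of pcnttabs entry can be illustrated with a small Python sketch. It turns an entry into initial, row-normalized reclassification probabilities and then draws a reclassified category for each case. The sketch leaves out the adjustment that reclass performs afterwards.

The function names are hypothetical, and categories are coded 0 to ncat-1. Splitting the remaining percentage equally over the other categories is an assumption made for illustration; reclass may distribute it differently.

```python
import random


def initial_probs(spec, ncat):
    """Initial reclassification probabilities from one pcnttabs entry.

    spec may be a number (same-category percentage for every category),
    a list of ncat numbers (one percentage per category), or an ncat x ncat
    list of lists (original categories in rows, reclassified in columns).
    For the first two forms the remainder is split equally over the other
    categories (an assumption). Each row is rescaled to sum to 1.
    """
    if isinstance(spec, (int, float)):
        spec = [spec] * ncat
    if not isinstance(spec[0], (list, tuple)):
        spec = [[pct if i == j else (100 - pct) / (ncat - 1) for j in range(ncat)]
                for i, pct in enumerate(spec)]
    if len(spec) != ncat or any(len(row) != ncat for row in spec):
        raise ValueError("matrix dimensions must match the number of categories")
    return [[v / sum(row) for v in row] for row in spec]


def reclassify(values, probs, rng):
    """Draw a reclassified category (coded 0..ncat-1) for each case."""
    cats = range(len(probs))
    return [rng.choices(cats, weights=probs[v])[0] for v in values]


if __name__ == "__main__":
    p = [[96, 4, 1, 0], [4, 91, 4, 1], [1, 4, 91, 4], [0, 1, 4, 96]]
    probs = initial_probs(p, 4)
    original = [0, 1, 2, 3] * 250
    new = reclassify(original, probs, random.Random(7))
    same = sum(a == b for a, b in zip(original, new)) / len(original)
    print(f"share kept in the same category: {same:.3f}")
```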

adjust: by default, when the pcnttabs option is used, the reclassification probabilities are adjusted so that the expected frequencies of the reclassified variable are the same as those of the original. Specify noadjust to suppress this and use the percentages specified in pcnttabs unmodified. noadjust implies nobestmod.

bestmod: by default, when the pcnttabs option is used, an appropriate pattern of association is imposed between the original and the reclassified variable. Specify nobestmod to avoid this. The reclassification probabilities will still be adjusted to make the expected frequencies of the reclassified variable equal to those of the original, but they will otherwise closely approximate the values specified in pcnttabs.
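The frequency constraint described under adjust can be checked and imposed with a short Python sketch.

- expected_freqs computes the expected frequencies of the reclassified variable, e_j = sum_i f_i p_ij.
- ipf_adjust rescales the probabilities so that those expected frequencies match the original ones. It uses iterative proportional fitting (IPF), a standard technique substituted here for illustration.

This is not necessarily how reclass performs the adjustment. It also makes no attempt to impose the association pattern that bestmod refers to. All names are hypothetical, and every category is assumed to have a positive frequency.

```python
def expected_freqs(freqs, probs):
    """Expected frequencies of the reclassified variable: e_j = sum_i f_i * p_ij."""
    n = len(freqs)
    return [sum(freqs[i] * probs[i][j] for i in range(n)) for j in range(n)]


def ipf_adjust(freqs, probs, max_iter=1000, tol=1e-10):
    """Rescale reclassification probabilities so expected frequencies equal freqs.

    Works on the joint table t_ij = f_i * p_ij and alternately scales rows
    (so each row i sums to f_i, i.e. its probabilities sum to 1) and columns
    (so each column j sums to f_j, the original frequency of category j).
    """
    n = len(freqs)
    t = [[freqs[i] * probs[i][j] for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        for i in range(n):  # match row margins
            s = sum(t[i])
            t[i] = [v * freqs[i] / s for v in t[i]]
        cols = [sum(t[i][j] for i in range(n)) for j in range(n)]
        if max(abs(c - f) for c, f in zip(cols, freqs)) < tol:
            break
        for i in range(n):  # match column margins
            t[i] = [t[i][j] * freqs[j] / cols[j] for j in range(n)]
    # Convert the joint table back to row-conditional probabilities.
    return [[v / sum(row) for v in row] for row in t]


if __name__ == "__main__":
    freqs = [50, 30, 20]
    probs = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
    print("before:", [round(e, 3) for e in expected_freqs(freqs, probs)])
    adjusted = ipf_adjust(freqs, probs)
    print("after: ", [round(e, 3) for e in expected_freqs(freqs, adjusted)])
```

IPF only rescales rows and columns of the joint table, so the odds ratios of the initial table are preserved. The adjusted probabilities therefore stay close to the pattern that was originally specified.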

qlist contains values for the multiplicative q parameter corresponding with each entry in pfactors. See reclass for further details.

ulist contains values for the multiplicative u parameter corresponding with each entry in pfactors. See reclass for further details.

distlist contains values for the dist parameter corresponding with each entry in pfactors. See reclass for further details.

assoc is for users familiar with loglinear mobility models. It defines the association pattern for each entry in pfactors. Each entry should refer to a program in which the variable paras is defined in terms of the variables orig and dest to produce a loglinear pattern of association. If assoc is specified, qlist and ulist are ignored.
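As a rough illustration of a loglinear association pattern, the Python sketch below takes a function of orig and dest, playing the role of the paras variable, and turns it into row-conditional probabilities proportional to exp(scale * paras). How reclass actually combines paras with the marginal frequencies is documented in reclass, not here. distance_pattern, loglinear_probs and scale are hypothetical.

```python
import math


def distance_pattern(orig, dest):
    """Hypothetical association pattern: weaker the further apart orig and dest are."""
    return -abs(orig - dest)


def loglinear_probs(ncat, paras, scale=1.0):
    """Row-conditional probabilities p[orig][dest] proportional to
    exp(scale * paras(orig, dest)), with categories coded 0..ncat-1.
    Marginal effects are left out; they would have to be fitted separately.
    """
    probs = []
    for orig in range(ncat):
        weights = [math.exp(scale * paras(orig, dest)) for dest in range(ncat)]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs


if __name__ == "__main__":
    for row in loglinear_probs(4, distance_pattern, scale=2.0):
        print([round(p, 3) for p in row])
```

A larger scale concentrates more probability on the diagonal, i.e. fewer cases change category.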

statistics specifies the summary statistics to be produced by tabstat. See the corresponding option in tabstat for valid values. The default is mean sd min max.

format specifies a valid format for displaying the results of tabstat and reclass. The default is %8.3f.

save specifies a valid filename under which the coefficients are saved as a dataset for further analysis.

misclass is maintained for compatibility with version 1. It is translated by reclass into pcnttab(100-misclass) noadjust.

niter specifies the number of times to re-estimate the model. The default is 100.

verbose prints debugging information.

Transformations

Transformations are specified as global macros $ptrans1, $ptrans2, ..., $ptransn. These global macros specify one variable as a function of others, using a syntax suitable for a replace statement. For example:

    global ptrans1 "exp2=exp^2"
    #delimit ;
    perturb: reg ses fses*eyr educyr*eyr fses*exp educyr*exp exp2,
        poptions(pvars(eyr exp) prange(5 5)) beta;
    #delimit cr
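The effect of a $ptrans transformation can be sketched in Python. The untransformed variables are perturbed first, and the derived terms are then rebuilt from the perturbed values, so a term such as exp2 is perturbed only indirectly. The function names and the expXeyr interaction are hypothetical. The order in which perturb applies the transformations (after perturbing, before re-estimating) is an assumption.

```python
import random


def apply_transformations(data, transforms):
    """Rebuild derived variables, like successive replace statements.

    data maps names to equal-length lists; transforms is an ordered list of
    (target, function) pairs, where the function receives one observation
    as a dict of name -> value.
    """
    n = len(next(iter(data.values())))
    for target, fn in transforms:
        data[target] = [fn({k: v[i] for k, v in data.items()}) for i in range(n)]
    return data


def perturb_with_transforms(data, pvars, prange, transforms, rng):
    """Perturb only the untransformed variables, then recompute derived terms."""
    out = {name: list(values) for name, values in data.items()}
    for name, sd in zip(pvars, prange):
        out[name] = [v + rng.gauss(0, sd) for v in out[name]]
    return apply_transformations(out, transforms)


if __name__ == "__main__":
    data = {"exp": [1.0, 5.0, 12.0], "eyr": [10.0, 12.0, 16.0]}
    transforms = [
        ("exp2", lambda r: r["exp"] ** 2),           # like: global ptrans1 "exp2=exp^2"
        ("expXeyr", lambda r: r["exp"] * r["eyr"]),  # hypothetical interaction term
    ]
    data = apply_transformations(data, transforms)   # unperturbed derived terms
    new = perturb_with_transforms(data, ["eyr", "exp"], [5, 5], transforms,
                                  random.Random(3))
    print(new)
```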

Categorical variables

In a perturbation analysis, categorical variables are reclassified with a high probability for each case to remain in the same category. The easiest way to do this is to specify an initial table of reclassification probabilities using the pcnttabs option. These initial probabilities will be adjusted so that:

    - there is an appropriate pattern of association between the original and the reclassified variable;
    - the expected frequency distribution of the reclassified variable is identical to that of the original.

See reclass for further details.

Dummy variables for the categorical variables can be created using the built-in xi command, or with xi3 or desmat, both available from the SSC archive. The defcon and desrep options will be recognized if desmat is used. For example:

    char eyr[pzat] dir
    #delimit ;
    mat p=(96, 4, 1, 0\
            4,91, 4, 1\
            1, 4,91, 4\
            0, 1, 4,96);
    perturb: desmat: reg ses fegp6 expc eyr, defcon(dev) desrep(all)
        poptions(pvars(eyr) prange(2.5) pfac(expc fegp6) pcnt(p 96)
        save("tstdat") replace) ;

The same example using xi3:

    perturb: xi3: reg ses e.fegp6 e.expc eyr,
        poptions(pvars(eyr) prange(2.5) pfac(expc fegp6) pcnt(p 96)) ;

In these examples, the matrix p contains initial reclassification probabilities for the variable expc. For the variable fegp6, the initial probability of reclassification to the same category is 96% for all categories.

Remarks

perturb saves the coefficients for each iteration in a matrix. On completion, the matrix is converted to data and summarized to show the mean, standard deviation, minimum and maximum of the parameter estimates for the perturbed variables. perturb restores the data to its original state before exiting, but the estimates for each iteration are saved in r(perturb). The summary statistics are saved in r(StatTot). Optionally, the dataset of coefficients can be saved for subsequent analysis. Note that perturb modifies the output of tabstat to print variable labels instead of variable names; results produced from the saved dataset will not have this feature.

perturb can be used with estimation procedures other than regress. However, collinearity is the result of extreme (multiple) correlation among the independent variables. It could therefore also be diagnosed by running regress with an arbitrary dependent variable and then using perturb, vif and/or collin. This will certainly be faster, since maximum likelihood procedures require iterative solutions whereas OLS regression does not. It is possible, though, that ML procedures are more sensitive to collinearity, in which case perturb would be the preferred solution.
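The variance inflation factor illustrates why collinearity depends only on the independent variables. It is defined as VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The Python sketch below covers only the two-predictor case, in which R_j^2 is the squared Pearson correlation of the two predictors; no dependent variable appears anywhere. It is not Stata's vif command, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    print(vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5]))
```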

Saved results

r(perturb)     a matrix of coefficients (columns) over the iterations (rows)

r(StatTot)     a matrix of summary statistics produced by tabstat

References

Belsley, D.A. (1991). Conditioning diagnostics: collinearity and weak data in regression. New York: John Wiley & Sons.

http://www.xs4all.nl/~jhckx/perturb/

Direct comments to: John Hendrickx

perturb is available at SSC-IDEAS. Use ssc install perturb to obtain the latest version. collin, coldiag and coldiag2 are also available from SSC and can be installed in the same way with ssc install.

Also see

On-line: help for vif, collin, coldiag, coldiag2, reclass