Title

qic -- QIC criterion for model selection in GEE analyses

Syntax

qic depvar [indepvars] [if] [in] [, options]

options               Description
-------------------------------------------------------------------------
Model
  i(varname_i)        use varname_i as the panel ID variable
  t(varname_t)        use varname_t as the time variable
  family(family)      distribution of depvar
  link(link)          link function

Model 2
  exposure(varname)   include ln(varname) in model with coefficient constrained to 1
  offset(varname)     include varname in model with coefficient constrained to 1
  noconstant          suppress constant term
  force               estimate even if observations unequally spaced in time

Correlation
  corr(correlation)   within-group correlation structure

SE/Robust
  robust              synonym for vce(robust)
  nmp                 use divisor N-P instead of the default N
  rgf                 multiply the robust variance estimate by (N-1)/(N-P)
  scale(x2)           set scale parameter to Pearson chi-squared statistic
  scale(dev)          set scale parameter to deviance divided by degrees of freedom
  scale(phi)          do not rescale the variance
  scale(#)            set scale parameter to #

Reporting
  level(#)            set confidence level; default is level(95)
  eform               report exponentiated coefficients

Opt options
  optimize_options    control the optimization process; seldom used

  nodisplay           suppress display of header and coefficients
-------------------------------------------------------------------------

depvar and indepvars may contain time-series operators; see tsvarlist.
iweights, fweights, and pweights are allowed; see weight. Weights must be constant within panel.

family                Description
-------------------------------------------------------------------------
gaussian              Gaussian (normal); family(normal) is a synonym
igaussian             inverse Gaussian
binomial              Bernoulli/binomial (k=1)
poisson               Poisson
nbinomial             negative binomial (k=1)
gamma                 gamma
-------------------------------------------------------------------------

link                  Description
-------------------------------------------------------------------------
identity              identity; y=y
log                   log; ln(y)
logit                 logit; ln{y/(1-y)}, natural log of the odds
probit                probit; inverse of the Gaussian (normal) cumulative distribution function
cloglog               cloglog; ln{-ln(1-y)}
power [#]             power; y^k with k=#; #=1 if not specified
opower [#]            odds power; [{y/(1-y)}^k - 1]/k with k=#; #=1 if not specified
nbinomial             negative binomial
reciprocal            reciprocal; 1/y
-------------------------------------------------------------------------

correlation           Description
-------------------------------------------------------------------------
exchangeable          exchangeable
independent           independent
unstructured          unstructured
fixed matname         user-specified
ar #                  autoregressive of order #
stationary #          stationary of order #
nonstationary #       nonstationary of order #
-------------------------------------------------------------------------

Description

qic calculates the QIC and QIC_u criteria for model selection in GEE, which extend the widely used AIC criterion of ordinary regression (Pan 2001). It supports all of the families listed above (Gaussian, inverse Gaussian, Bernoulli/binomial, Poisson, negative binomial, and gamma), all link functions and working correlation structures, and all SE/Robust options except the vce option, as available in Stata 9.0. It also calculates the trace of the matrix O^{-1}V, where O is the variance estimate under the independent correlation structure and V is the variance estimate under the specified working correlation structure in GEE. When the trace is close to the number of parameters p, QIC_u is a good approximation to QIC.
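The relationship between the two criteria can be written out as follows. This is a sketch following Pan (2001), using the O, V, and p of the paragraph above; the symbols Q (the quasi-likelihood evaluated under the independence model I) and \hat\beta(R) (the GEE estimate under working correlation R) come from Pan's paper and do not appear elsewhere in this help file.

```latex
\begin{align*}
\mathrm{QIC}   &= -2\,Q\bigl(\hat\beta(R);\, I\bigr) + 2\,\operatorname{trace}\bigl(O^{-1} V\bigr), \\
\mathrm{QIC}_u &= -2\,Q\bigl(\hat\beta(R);\, I\bigr) + 2p .
\end{align*}
```

The two expressions differ only in the penalty term, so QIC_u approaches QIC exactly when trace(O^{-1}V) is close to p.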

Options

        +-------+
    ----+ Model +------------------------------------------------------------

i(varname_i), t(varname_t); see estimation options.

qic does not need to know t() for the corr(independent) and corr(exchangeable) correlation structures. Whether you specify t() makes no difference in these two cases.

family(family) specifies the distribution of depvar; family(gaussian) is the default.

link(link) specifies the link function; the default is the canonical link for the family() specified.

        +---------+
    ----+ Model 2 +----------------------------------------------------------

exposure(varname) and offset(varname) are different ways of specifying the same thing. exposure() specifies a variable that reflects the amount of exposure over which the depvar events were observed for each observation; ln(varname) with its coefficient constrained to be 1 is entered into the regression equation. offset() specifies a variable that is to be entered directly into the log-link function with its coefficient constrained to be 1; thus, exposure is assumed to be e^varname. If you were fitting a Poisson regression model, family(poisson) link(log), for instance, you would account for exposure time by specifying offset() containing the log of exposure time.

noconstant specifies that the linear predictor has no intercept term, thus forcing it through the origin on the scale defined by the link function.

force specifies that estimation be forced even though t() is not equally spaced. This is relevant only for correlation structures that require knowledge of t() and that require observations be equally spaced.

        +-------------+
    ----+ Correlation +------------------------------------------------------

corr(correlation); see estimation options.

        +-----------+
    ----+ SE/Robust +--------------------------------------------------------

robust specifies that the Huber/White/sandwich estimator of variance is to be used in place of the default GLS variance estimator. This produces valid standard errors even if the correlations within group are not as hypothesized by the specified correlation structure. It does, however, require that the model correctly specifies the mean. As such, the resulting standard errors are labeled "semi-robust" instead of "robust". Note that although there is no cluster() option, results are as if there were a cluster() option and you specified clustering on i().

nmp; see estimation options.

rgf specifies that the robust variance estimate is multiplied by (N-1)/(N-P), where N = # of observations, and P = # of coefficients estimated. This option can be used only with family(gaussian) when robust is either specified or implied by the use of pweights. Using this option implies that the robust variance estimate is not invariant to the scale of any weights used.

scale(x2|dev|#|phi) overrides the default scale parameter of scale(1); see estimation options.

        +-----------+
    ----+ Reporting +--------------------------------------------------------

level(#); see estimation options.

eform displays the exponentiated coefficients and corresponding standard errors and confidence intervals as described in maximize. For family(binomial) link(logit) (i.e., logistic regression), exponentiation results in odds ratios; for family(poisson) link(log) (i.e., Poisson regression), exponentiated coefficients are incidence-rate ratios.

        +-------------+
    ----+ Opt options +------------------------------------------------------

optimize_options control the iterative optimization process. These options are seldom used.

iterate(#) specifies the maximum number of iterations. When the number of iterations equals #, the optimization stops and presents the current results, even if the convergence tolerance has not been reached. The default value of iterate() is 100.

tolerance(#) specifies the tolerance for the coefficient vector. When the relative change in the coefficient vector from one iteration to the next is less than or equal to #, the optimization process is stopped. tolerance(1e-6) is the default.

nolog suppresses the display of the iteration log.

trace specifies that the current estimates should be printed at each iteration.

nodisplay suppresses the display of the header and coefficients.

Examples 1

    use http://www.stata-press.com/data/r9/nlswork2, clear
    iis id
    qic ln_w grade age if race == 2
    qic ln_w grade age, t(year) corr(uns) scale(dev) force nolog nodis trace
    qic ln_w grade age, t(year) corr(exc) force

Examples 2

    use http://www.stata-press.com/data/r9/union, clear
    iis idcode
    tis year
    qic union age grade not_smsa south if black == 1, fam(bin)
    qic union age grade not_smsa south, fam(bin) link(probit) corr(uns) force tol(1e-8) iter(20)
    qic union age grade not_smsa south, fam(bin) link(cloglog) corr(ar) force scale(x2)

Examples 3

    use http://www.stata-press.com/data/r9/ships, clear
    egen wave = group(yr_con yr_op)
    iis ship
    tis wave
    qic accident op_75_79 co_65_69 co_70_74 co_75_79 if wave <= 6, fam(poi) corr(exc) ex(service)
    qic accident op_75_79 co_65_69 co_70_74 co_75_79, fam(poi) corr(sta) ex(service) force tol(1e-10) scale(dev)
    qic accident op_75_79 co_65_69 co_70_74 co_75_79, fam(poi) corr(exc) ex(service) force nodis
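The Options section describes exposure() and offset() as two ways of specifying the same thing. A minimal sketch of that equivalence on the ships data follows. The variable name lnservice is hypothetical and introduced only for illustration, and iis follows the Stata 9 usage of the other examples. Under these assumptions the two qic calls should report the same coefficients and criteria, although this has not been checked against actual output.

```stata
* Sketch only: lnservice is a hypothetical helper variable, not part of ships.dta
use http://www.stata-press.com/data/r9/ships, clear
iis ship

* exposure() takes the raw exposure variable; ln(service) enters with coefficient 1
qic accident op_75_79 co_65_69 co_70_74 co_75_79, fam(poi) corr(exc) ex(service)

* offset() takes a variable that is already on the log scale
generate double lnservice = ln(service)
qic accident op_75_79 co_65_69 co_70_74 co_75_79, fam(poi) corr(exc) offset(lnservice)
```

t() is left unset here because, as noted under the Model options, corr(exchangeable) does not need it.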

Examples 4

    use http://www.stata-press.com/data/r9/airacc, clear
    iis airline
    tis time
    qic i_cnt inprog if airline <= 15, fam(nb 2) corr(exc) exposure(pmiles)
    qic i_cnt inprog, fam(nb 2) corr(sta) exposure(pmiles) force tol(1e-8) nodis
    qic i_cnt inprog, fam(nb 2) corr(uns) exposure(pmiles) force scale(x2)
    qic i_cnt inprog, fam(gam) corr(sta) exposure(pmiles) force scale(dev)
    qic i_cnt inprog, fam(ig) corr(uns) exposure(pmiles) force

References

Cui J. QIC program and model selection in GEE analyses. Stata Journal 2007; 7:209-220.

Cui J and Qian G. Selection of working correlation structure and best model in GEE analyses of longitudinal data. Communications in Statistics, Simulation and Computation 2007; 36:987-996.

Cui J and Feng L. Correlation structure and model selection for negative binomial distribution in GEE. Communications in Statistics, Simulation and Computation 2008 (in press).

Pan W. Akaike's information criterion in generalized estimating equations. Biometrics 2001; 57:120-125.

Author

James Cui, WHO Collaborating Centre for Obesity Prevention, Deakin University.

Email: jisheng.cui@deakin.edu.au

