help confa                                                       (SJ9-3: st001)


confa -- Confirmatory factor analysis


confa factorspec [factorspec ...] [if] [in] [weight] [, options]

factorspec is (factorname: varlist)

    options                        Description
    -------------------------------------------------------------------------
    Model
      correlated(corrspec [corrspec ...])
                                   correlated measurement errors; see below
      unitvar(factorlist|_all)     set variance of the factor(s) to 1
      free                         do not impose any constraints by default;
                                     seldom used
      constraint(numlist)          user-supplied constraints; must be used
                                     with free
      missing                      full-information maximum-likelihood
                                     estimation with missing data
      usenames                     alternative coefficient labeling

    Variance estimation
      vce(vcetype)                 vcetype may be robust, cluster clustvar,
                                     oim, opg, or sbentler

    Reporting
      level(#)                     set confidence level; default is level(95)

    Other
      svy                          respect survey settings
      from(ones|2sls|ivreg|smart|ml_init_args)
                                   control the starting values
      loglevel(#)                  specify the details of output; programmers
                                     only
      ml_options                   maximization options
    -------------------------------------------------------------------------


confa estimates single-level confirmatory factor analysis (CFA) models. In a CFA model, each of the variables is assumed to be an indicator of underlying unobserved factor(s) with a linear dependence between the factors and observed variables:

y_i = m_i + l_i1 f_1 + ... + l_iK f_K + e_i

where y_i is the ith variable in the varlist, m_i is its mean, l_ik is its loading on the kth latent factor, f_k is the kth latent factor (k = 1,...,K), and e_i is the measurement error. Thus the specification (factorname: varlist) is interpreted as follows: the latent factor f_k is given the name factorname (for display purposes only); the variables specified in the varlist have their loadings, l_ik, estimated; and all other observed variables in the model have fixed loadings, l_ik = 0.
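In matrix form, the model above can be sketched as follows. The symbols \mu (the vector of means m_i) and \Lambda (the matrix of loadings l_ik) are notation assumed here for illustration; \Phi and \Theta are the factor and measurement-error covariance matrices referred to in the options below, and \Sigma is the implied covariance matrix of the observed variables. The sketch also assumes, as is standard in CFA, that factors and measurement errors are uncorrelated. The second display shows the loading pattern implied by the hypothetical specification (vis: x1 x2 x3) (text: x4 x5 x6) under the default identification.

```latex
% Matrix form of the measurement model and its implied covariance matrix
y = \mu + \Lambda f + e, \qquad
\operatorname{Cov}(f) = \Phi, \quad
\operatorname{Cov}(e) = \Theta, \quad
\operatorname{Cov}(f, e) = 0
\;\Longrightarrow\;
\Sigma = \operatorname{Cov}(y) = \Lambda \Phi \Lambda^{\top} + \Theta

% Loading pattern for (vis: x1 x2 x3) (text: x4 x5 x6):
% the first loading of each factor is fixed at 1, cross-loadings are fixed at 0
\Lambda =
\begin{pmatrix}
1      & 0      \\
l_{21} & 0      \\
l_{31} & 0      \\
0      & 1      \\
0      & l_{52} \\
0      & l_{62}
\end{pmatrix}
```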

The model is estimated by the maximum likelihood procedure; see ml.

As with all latent variable models, a number of identifying assumptions need to be made about the latent variables f_k. They are assumed to have mean zero, and the scale of each factor is determined by the first variable in its varlist (i.e., the loading of the first variable listed in each factorspec is set to 1). Alternatively, identification can be achieved by setting the variance of the latent variable to 1 (with option unitvar()). More sophisticated identification conditions can be achieved by specifying the free option and then providing the necessary constraints in the constraint() option.

Please cite this package as Kolenikov (2009). See full bibliographic details in References below.


        +-------+
    ----+ Model +------------------------------------------------------------

correlated(corrspec [corrspec ...]) specifies the correlated measurement errors e_k and e_j. Here corrspec is of the form [(]varname_k:varname_j[)], where varname_k and varname_j are observed variables in the model; that is, they must appear in at least one factorspec statement. If only one correlation is specified, the optional parentheses shown above may be omitted. There should be no space between the colon and varname_j.

unitvar(factorlist|_all) specifies the factors (from those named in factorspec) that will be identified by setting their variances to 1. The keyword _all can be used to specify that all the factors have their variances set to 1 (and hence the matrix Phi can be interpreted as a correlation matrix).

free frees up all the parameters in the model (making it underidentified). It is then the user's responsibility to provide identification constraints and adjust the degrees of freedom of the tests. This option is seldom used.

constraint(numlist) can be used to supply additional constraints. There are no checks implemented for redundant or conflicting constraints, so in some rare cases, the degrees of freedom may be incorrect. It might be wise to run the model with the free and iterate(0) options and then look at the names in the output of matrix list e(b) to find out the specific names of the parameters.
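The workflow suggested above can be sketched as follows. This is only an outline under stated assumptions: the variables x1-x6 are taken from the examples further down, and the constraint definitions are left as commented placeholders because the parameter names must be read from the e(b) listing and are not reproduced here.

```stata
* Step 1: set up the fully free model without iterating,
* then list the parameter names that confa assigned
confa (vis: x1 x2 x3) (text: x4 x5 x6), free iterate(0)
matrix list e(b)

* Step 2: define identification constraints with -constraint define-,
* using the names displayed in Step 1 (placeholders, not actual names):
*     constraint define 1 <parameter name from e(b)> = 1
*     constraint define 2 <parameter name from e(b)> = 1

* Step 3: refit with the constraints supplied
*     confa (vis: x1 x2 x3) (text: x4 x5 x6), free constraint(1 2)
```

Because no checks are made for redundant or conflicting constraints, it is worth verifying the reported degrees of freedom by hand after Step 3.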

missing requests full-information maximum-likelihood estimation with missing data. By default, estimation proceeds by listwise deletion.

usenames requests that the parameters be labeled with the names of the variables and factors rather than with numeric values (indices of the corresponding matrices). It is a technical detail that does not affect the estimation procedure in any way, but it is helpful when working with several models simultaneously, tabulating the estimation results, and transferring the starting values between models.

        +---------------------+
    ----+ Variance estimation +----------------------------------------------

vce(vcetype) specifies the estimator of the variance-covariance matrix. Common estimators are supported: vce(oim), the observed information matrix (the default); vce(opg), the outer product of the gradients; vce(robust), the sandwich estimator; and vce(cluster clustvar), the clustered sandwich estimator with clustering on clustvar. The aliases of the last two (the robust and cluster(clustvar) options) are supported as well. See vce_option.

An additional estimator specific to structural equation modeling is the Satorra-Bentler estimator (Satorra and Bentler 1994). It is requested by vce(sbentler) or vce(satorrabentler). When this option is specified, additional Satorra-Bentler scaled and adjusted goodness-of-fit statistics are computed and presented in the output.

        +-----------+
    ----+ Reporting +--------------------------------------------------------

level(#) changes the confidence level for confidence-interval reporting. See estimation options.

        +-------+
    ----+ Other +------------------------------------------------------------

svy instructs confa to respect the complex survey design, if one is specified. See svyset.
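A minimal sketch of survey-design estimation follows. It assumes the dataset contains a primary sampling unit identifier, a stratum identifier, and a sampling weight; the variable names psu, stratum, and wgt are hypothetical and should be replaced with those in your data.

```stata
* Declare the survey design
* (psu, stratum, and wgt are hypothetical variable names)
svyset psu [pweight = wgt], strata(stratum)

* Ask confa to respect the design declared above
confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(smart) svy
```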

from(ones|2sls|ivreg|smart|ml_init_args) provides the choice of starting values for the maximization procedure. The ml command's internal default is to set all parameters to zero, which leads to a noninvertible matrix, Sigma, and ml has to make many changes to those initial values to find anything feasible. Moreover, this initial search procedure sometimes leads to a domain where the likelihood is nonconcave, and optimization might fail there.

ones sets all the parameters to values of one except for covariance parameters (off-diagonal values of the Phi and Theta matrices), which are set to 0.5. This might be a reasonable choice for data with variances of observed variables close to 1 and positive covariances (no inverted scales).

2sls or ivreg requests that the initial parameters for the freely estimated loadings be set to the two-stage least-squares instrumental-variable estimates of Bollen (1996). This requires the model to be identified by scaling indicators (i.e., setting one of the loadings to 1) and to have at least three indicators for each latent variable. The instruments used are all other indicators of the same factor. No checks of their validity are performed, and no search for other instruments is made.

smart provides an alternative set of starting values that is often reasonable (e.g., assuming that the reliability of observed variables is 0.5).

Other specifications of starting values, ml_init_args, should follow the format of ml init. These typically take the form of a list of starting values, from(# # ... #, copy), or a matrix of starting values, from(matname, [copy|skip]). See [R] ml.
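As an illustration of the matrix form, the estimates from one model can be passed as starting values to a larger one. This is a sketch under stated assumptions: b0 is an arbitrary matrix name chosen here, usenames is used in both fits so that parameters can be matched by name, and skip is intended to let ml init ignore entries of b0 that have no counterpart in the new model.

```stata
* Fit a baseline model with named parameters and keep its estimates
confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(smart) usenames
matrix b0 = e(b)

* Start the extended model (one correlated error added) from those estimates
confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), ///
    from(b0, skip) usenames corr(x7:x8)
```

Parameters that appear only in the second model (here, the covariance between the errors of x7 and x8) are not in b0 and so start from whatever default applies.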

loglevel(#) specifies the details of output about different stages of model setup and estimation, and is likely of interest only to programmers. Higher numbers imply more output.

For other options, see maximize.

Saved results

Aside from the standard estimation results, confa also performs the overall goodness-of-fit test, with results saved in e(lr_u), e(df_u), and e(p_u) for the test statistic, its degrees of freedom, and the resulting p-value, respectively. A test against the independence model is also provided, with its results saved in the ereturn results with the indep suffix. Here, under the null hypothesis, the covariance matrix is assumed to be diagonal.

When sbentler is specified, Satorra-Bentler standard errors are computed and posted as e(V), with intermediate matrices saved in e(SBU), e(SBV), e(SBGamma), and e(SBDelta). Also, a number of corrected overall fit test statistics are reported and saved: T scaled (ereturn results with the Tsc suffix) and T adjusted (ereturn results with the Tadj suffix) from Satorra and Bentler (1994), and T double bar (ereturn results with the T2 suffix) from Yuan and Bentler (1997). Scalars e(SBc) and e(SBd) are the scaling constants, with the latter also being the approximate degrees of freedom of the adjusted chi-squared test.
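The saved results named above can be inspected after estimation along the following lines. This is a sketch only: it uses the Holzinger-Swineford variables from the examples below, and the cross-check assumes that the overall test is referred to a chi-squared distribution with e(df_u) degrees of freedom.

```stata
* Fit with Satorra-Bentler standard errors and corrected test statistics
confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(smart) vce(sbentler)

* List everything saved in e()
ereturn list

* Overall goodness-of-fit test: statistic, degrees of freedom, p-value
display "LR chi2 = " e(lr_u) ", df = " e(df_u) ", p = " e(p_u)

* Cross-check the p-value against the chi-squared upper tail
display chi2tail(e(df_u), e(lr_u))

* Satorra-Bentler scaling constants
display "c = " e(SBc) ", d = " e(SBd)
```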


confa relies on listutil for some parsing tasks. If it is not installed with your Stata, confa will try to install it from SSC (ssc install listutil). If installation is unsuccessful, confa will issue an error message and stop.

In large models, confa may be restricted by Stata limits of 244 characters in the string expression. The user might want to rename their variables and give them shorter names.


Holzinger-Swineford data
    . use

Basic model with different starting values
    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(ones)
    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(iv)
    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(smart)

Robust and Satorra-Bentler standard errors
    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(iv) vce(sbentler)
    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(iv) robust

Correlated measurement errors
    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(iv) corr( x7:x8 )

An alternative identification
    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(ones) unitvar(_all) corr(x7:x8)

Missing data
    . forvalues k=1/9 {
    .     gen y`k' = cond( uniform()<0.0`k', ., x`k')
    . }
    . confa (vis: y1 y2 y3) (text: y4 y5 y6) (math: y7 y8 y9), from(iv)
    . confa (vis: y1 y2 y3) (text: y4 y5 y6) (math: y7 y8 y9), from(iv) missing difficult


References

Bollen, K. A. 1996. An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika 61: 109-121.

Kolenikov, S. 2009. Confirmatory factor analysis using confa. Stata Journal 9(3): 329-373.

Satorra, A., and P. M. Bentler. 1994. Corrections to test statistics and standard errors in covariance structure analysis. In Latent Variables Analysis, ed. A. von Eye and C. C. Clogg, 399-419. Thousand Oaks, CA: Sage.

Yuan, K.-H., and P. M. Bentler. 1997. Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association 92: 767-774.


Stanislav Kolenikov
Department of Statistics
University of Missouri
Columbia, MO

Also see

Article: Stata Journal, volume 9, number 3: st0001

Online: factor, bollenstine, confa postestimation (if installed), gllamm (if installed).