help confa                                                      (SJ9-3: st001)
-------------------------------------------------------------------------------

Title

confa -- Confirmatory factor analysis

Syntax

confa factorspec [factorspec ...] [if] [in] [weight] [, options]

factorspec is (factorname: varlist)

options                                     Description
-------------------------------------------------------------------------
Model
  correlated(corrspec [corrspec ...])       correlated measurement errors; see below
  unitvar(factorlist|_all)                  set variance of the factor(s) to 1
  free                                      do not impose any constraints by default; seldom used
  constraint(numlist)                       user-supplied constraints; must be used with free
  missing                                   full-information maximum-likelihood estimation with missing data
  usenames                                  alternative coefficient labeling

Variance estimation
  vce(vcetype)                              vcetype may be robust, cluster clustvar, oim, opg, or sbentler

Reporting
  level(#)                                  set confidence level; default is level(95)

Other
  svy                                       respect survey settings
  from(ones|2sls|ivreg|smart|ml_init_args)  control the starting values
  loglevel(#)                               specify the details of output; programmers only
  ml_options                                maximization options
-------------------------------------------------------------------------

Description

confa estimates single-level confirmatory factor analysis (CFA) models. In a CFA model, each of the variables is assumed to be an indicator of underlying unobserved factor(s) with a linear dependence between the factors and observed variables:

    y_i = m_i + l_i1 f_1 + ... + l_iK f_K + e_i

where y_i is the ith variable in the varlist, m_i is its mean, l_ik are the latent variable loading(s), f_k is the kth latent factor (k = 1,...,K), and e_i is the measurement error. Thus the specification (factorname: varlist) is interpreted as follows: the latent factor f_k is given factorname (for display purposes only); the variables specified in the varlist have their loadings, l_ik, estimated; and all other observed variables in the model have fixed loadings, l_ik = 0.

The model is estimated by the maximum likelihood procedure; see ml.

As with all latent variable models, a number of identifying assumptions need to be made about the latent variables f_k. They are assumed to have mean zero, and their scales are determined by the first variable in the varlist (i.e., l_1k is set to equal 1 for all k). Alternatively, identification can be achieved by setting the variance of the latent variable to 1 (with option unitvar()). More sophisticated identification conditions can be achieved by specifying the free option and then providing the necessary constraints in the constraint() option.

Please cite this package as Kolenikov (2009). See full bibliographic details in References below.
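In matrix form, the measurement equation in this Description implies a mean and covariance structure for the observed variables. The sketch below writes it out under the standard CFA assumptions, which are not stated explicitly in this help file, that the measurement errors have mean zero and are uncorrelated with the factors. Lambda denotes the matrix of loadings l_ik; Phi, Theta, and Sigma are the matrices referred to under Options.

```latex
\begin{aligned}
y &= m + \Lambda f + e,
  \qquad \mathrm{E}(f) = 0,\ \mathrm{E}(e) = 0,\ \mathrm{Cov}(f, e) = 0 \\
\mathrm{E}(y) &= m \\
\Sigma = \mathrm{Var}(y) &= \Lambda \Phi \Lambda' + \Theta,
  \qquad \Phi = \mathrm{Var}(f),\ \Theta = \mathrm{Var}(e)
\end{aligned}
```

Under the default identification, the loading of the first indicator of each factor is fixed at 1. With unitvar(), the corresponding diagonal element of Phi is fixed at 1 instead.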

Options

    +-------+
----+ Model +------------------------------------------------------------

correlated(corrspec [corrspec ...]) specifies the correlated measurement errors e_i and e_j. Here corrspec is of the form [(]varname_k:varname_j[)] where varname_k and varname_j are some of the observed variables in the model; that is, they must appear in at least one factorspec statement. If there is only one correlation specified, the optional parentheses shown above may be omitted. There should be no space between the colon and varname_j.

unitvar(factorlist|_all) specifies the factors (from those named in factorspec) that will be identified by setting their variances to 1. The keyword _all can be used to specify that all the factors have their variances set to 1 (and hence the matrix Phi can be interpreted as a correlation matrix).

free frees up all the parameters in the model (making it underidentified). It is then the user's responsibility to provide identification constraints and adjust the degrees of freedom of the tests. This option is seldom used.

constraint(numlist) can be used to supply additional constraints. There are no checks implemented for redundant or conflicting constraints, so in some rare cases, the degrees of freedom may be incorrect. To find the exact names of the parameters, it might be wise to run the model with the free and iterate(0) options and then look at the names in the output of matrix list e(b).
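The workflow suggested for constraint() might look like the following sketch. It assumes the Holzinger-Swineford data used in the Examples. The constraint line is left as a comment because the parameter name in it is a placeholder, not a name confa is known to produce; it must be replaced with a name copied from the listing.

```stata
* Load the example data used elsewhere in this help file
use http://web.missouri.edu/~kolenikovs/stata/hs-cfa.dta, clear

* Set up the fully free model without iterating, only to expose parameter names
confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), free iterate(0)

* The column names of e(b) are the names to use in constraint definitions
matrix list e(b)

* Placeholder only: substitute a real name from the listing before running
* constraint define 1 <name copied from e(b)> = 1
```

Enough constraints to identify the model would then be passed together, in the form free constraint(numlist).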

missing requests full-information maximum-likelihood estimation with missing data. By default, estimation proceeds by listwise deletion.

usenames requests that the parameters be labeled with the names of the variables and factors rather than with numeric values (indices of the corresponding matrices). It is a technical detail that does not affect the estimation procedure in any way, but it is helpful when working with several models simultaneously, tabulating the estimation results, and transferring the starting values between models.

    +---------------------+
----+ Variance estimation +----------------------------------------------

vce(vcetype) specifies different estimators of the variance-covariance matrix. Common estimators (vce(oim), observed information matrix, the default; vce(robust), sandwich information matrix; vce(cluster clustvar), clustered sandwich estimator with clustering on clustvar) are supported, along with their aliases (the robust and cluster(clustvar) options). See vce_option.

An additional estimator specific to structural equation modeling is the Satorra-Bentler estimator (Satorra and Bentler 1994). It is requested by vce(sbentler) or vce(satorrabentler). When this option is specified, additional Satorra-Bentler scaled and adjusted goodness-of-fit statistics are computed and presented in the output.

    +-----------+
----+ Reporting +--------------------------------------------------------

level(#) changes the confidence level for confidence-interval reporting. See estimation options.

    +-------+
----+ Other +------------------------------------------------------------

svy instructs confa to respect the complex survey design, if one is specified. See svyset.

from(ones|2sls|ivreg|smart|ml_init_args) provides the choice of starting values for the maximization procedure. The ml command's internal default is to set all parameters to zero, which leads to a noninvertible matrix, Sigma, so ml has to make many changes to those initial values to find anything feasible. Moreover, this initial search procedure sometimes leads to a domain where the likelihood is nonconcave, and optimization might fail there.

ones sets all the parameters to values of one except for covariance parameters (off-diagonal values of the Phi and Theta matrices), which are set to 0.5. This might be a reasonable choice for data with variances of observed variables close to 1 and positive covariances (no inverted scales).

2sls or ivreg requests that the initial parameters for the freely estimated loadings be set to the two-stage least-squares instrumental-variable estimates of Bollen (1996). This requires the model to be identified by scaling indicators (i.e., setting one of the loadings to 1) and to have at least three indicators for each latent variable. The instruments used are all other indicators of the same factor. No checks for their validity or searches for other instruments are performed.

smart provides an alternative set of starting values that is often reasonable (e.g., assuming that the reliability of observed variables is 0.5).

Other specifications of starting values, ml_init_args, should follow the format of ml init. Those typically include the list of starting values of the form from(# # ... #, copy) or a matrix of starting values from(matname, [copy|skip]). See [R] ml.
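To see where the 2sls starting values come from, consider the vis factor of the Holzinger-Swineford example with x1 as the scaling indicator. Substituting f = x1 - m_1 - e_1 into the equation for x2 gives x2 = const + l*x1 + (e_2 - l*e_1), so x1 is correlated with the error term, and the remaining indicator x3 can serve as an instrument for it. The sketch below computes that estimate with official Stata's ivregress. It illustrates the idea of Bollen (1996) under the assumption of uncorrelated measurement errors, and is not necessarily how confa computes the values internally.

```stata
* Load the example data used elsewhere in this help file
use http://web.missouri.edu/~kolenikovs/stata/hs-cfa.dta, clear

* x2 = const + l*x1 + (e_2 - l*e_1): x1 is correlated with the error term,
* so instrument it with x3, the other indicator of the same factor
ivregress 2sls x2 (x1 = x3)

* The coefficient on x1 is the 2SLS estimate of the loading of x2 on vis
display "2SLS estimate of the x2 loading: " _b[x1]
```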

loglevel(#) specifies the details of output about different stages of model setup and estimation, and is likely of interest only to programmers. Higher numbers imply more output.

For other options, see maximize.

Saved results

Aside from the standard estimation results, confa also performs the overall goodness-of-fit test with results saved in e(lr_u), e(df_u), and e(p_u) for the test statistic, its degrees of freedom, and the resulting p-value. A test against the independence model is also provided, with ereturn results carrying the indep suffix. Here, under the null hypothesis, the covariance matrix is assumed diagonal.

When sbentler is specified, Satorra-Bentler standard errors are computed and posted as e(V), with intermediate matrices saved in e(SBU), e(SBV), e(SBGamma), and e(SBDelta). Also, a number of corrected overall fit test statistics are reported and saved: T scaled (ereturn results with the Tsc suffix), T adjusted (ereturn results with the Tadj suffix), and T double bar from Yuan and Bentler (1997) (ereturn results with the T2 suffix). Scalars e(SBc) and e(SBd) are the scaling constants, with the latter also being the approximate degrees of freedom of the chi-squared test from Satorra and Bentler (1994).
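The saved goodness-of-fit results can be inspected after estimation as in the following sketch, which uses the Holzinger-Swineford model from the Examples. Assuming e(p_u) is the usual upper-tail chi-squared probability, the second display should agree with it; that equality is an assumption here, not something this help file states.

```stata
* Fit the example model
use http://web.missouri.edu/~kolenikovs/stata/hs-cfa.dta, clear
confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(iv)

* Overall goodness-of-fit test saved by confa
display "LR chi2 = " e(lr_u) ",  df = " e(df_u) ",  p = " e(p_u)

* Upper-tail chi-squared probability for the same statistic and df
display "chi2tail = " chi2tail(e(df_u), e(lr_u))

* Full list of saved results, including those with the indep suffix
ereturn list
```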

Remarks

confa relies on listutil for some parsing tasks. If it is not installed with your Stata, confa will try to install it from SSC (ssc install listutil). If installation is unsuccessful, confa will issue an error message and stop.

In large models, confa may be restricted by Stata's limit of 244 characters in a string expression. In that case, the user might want to rename the variables to give them shorter names.

Examples

Holzinger-Swineford data

    . use http://web.missouri.edu/~kolenikovs/stata/hs-cfa.dta

Basic model with different starting values

    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(ones)
    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(iv)
    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(smart)

Robust and Satorra-Bentler standard errors

    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(iv) vce(sbentler)
    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(iv) robust

Correlated measurement errors

    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(iv) corr( x7:x8 )

An alternative identification

    . confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(ones) unitvar(_all) corr(x7:x8)

Missing data

    . forvalues k=1/9 {
    .     gen y`k' = cond( uniform()<0.0`k', ., x`k')
    . }
    . confa (vis: y1 y2 y3) (text: y4 y5 y6) (math: y7 y8 y9), from(iv)
    . confa (vis: y1 y2 y3) (text: y4 y5 y6) (math: y7 y8 y9), from(iv) missing difficult
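The from(matname, skip) form described under Options can also carry estimates from one model over to a related one, which is the use case mentioned for usenames. The following is a sketch only: the matrix name b0 is arbitrary, and whether the saved names line up with the second model's parameters has not been verified here.

```stata
use http://web.missouri.edu/~kolenikovs/stata/hs-cfa.dta, clear

* Fit the base model with name-based labels and keep its coefficient vector
confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(iv) usenames
matrix b0 = e(b)

* Refit with a correlated measurement error, starting from the saved values;
* skip tells ml init to ignore any names in b0 not found in the new model
confa (vis: x1 x2 x3) (text: x4 x5 x6) (math: x7 x8 x9), from(b0, skip) usenames corr(x7:x8)
```

Parameters of the second model that have no counterpart in b0 keep whatever initial value ml assigns them.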

References

Bollen, K. A. 1996. An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika 61: 109-121.

Kolenikov, S. 2009. Confirmatory factor analysis using confa. Stata Journal 9(3): 329-373.

Satorra, A., and P. M. Bentler. 1994. Corrections to test statistics and standard errors in covariance structure analysis. In Latent Variables Analysis, ed. A. von Eye and C. C. Clogg, 399-419. Thousand Oaks, CA: Sage.

Yuan, K.-H., and P. M. Bentler. 1997. Mean and covariance structure analysis: Theoretical and practical improvements. Journal of the American Statistical Association 92: 767-774.

Author

Stanislav Kolenikov
Department of Statistics
University of Missouri
Columbia, MO
kolenikovs@missouri.edu

Also see

Article: Stata Journal, volume 9, number 3: st0001

Online: factor, bollenstine, confa postestimation (if installed), gllamm (if installed)