{smcl}
{* 15May2011}{...}
{* 30Aug2009}{...}
{* 24Mar2009}{...}
{* 24sep2006}{...}
{hline}
help for {hi:propcnsreg}
{hline}
{title:Fitting a measurement model with causal indicators}
{p 8 17 2}
{cmd:propcnsreg}
{depvar}
[{indepvars}]
{ifin}
{weight}
{cmd:,}
{opt con:strained(varlist_c)}
{opt lambda(varlist_l)}
[
{opt stand:ardized}
{opt lcons}
{opt unit(varname)}
{opt mimic}
{opt logit}
{cmdab:r:obust}
{opt cl:uster(clustervar)}
{opt l:evel(#)}
{opt or}
{it:{help propcnsreg##em_maximize_options:em_maximize_options}}
{it:{help propcnsreg##maximize_options:maximize_options}}
]
{p 4 4 2}{cmd:by} {it:...} {cmd::} may be used with {cmd:propcnsreg}; see help
{help by}.
{p 4 4 2}{cmd:fweight}s, {cmd:pweight}s, {cmd:aweight}s, and {cmd:iweight}s are
allowed; see help {help weights}.
{title:Description}
{p 4 4 2} {cmd:propcnsreg} combines information from several observed
variables into a single latent variable and estimates the effect of this latent
variable on the dependent variable. {cmd:propcnsreg} assumes that the observed
variables influence the latent variable. A common alternative assumption is
that the latent variable influences the observed variables. For example,
{help factor:factor analysis} is based on this alternative assumption. To
distinguish between these two situations some authors, following Bollen (1984)
and Bollen and Lennox (1991), call the observed variables "effect indicators"
when they are influenced by the latent variable, while they call the observed
variables "causal indicators" when they influence the latent variable.
Distinguishing between these two is important as they require very different
strategies for recovering the latent variable. In a basic (exploratory) factor
analysis, which is a model for effect indicators, one assumes that the only
thing that the observed variables have in common is the latent variable, so any
correlation between the observed variables must be due to the latent variable,
and it is this correlation that is used to recover the latent variable. In
{cmd:propcnsreg}, which estimates models for causal indicators, we assume that the
latent variable is a weighted sum of the observed variables (and optionally an
error term), and the weights are estimated such that they are optimal for
predicting the dependent variable.
{p 4 4 2} Models for dealing with causal indicators come in roughly three
flavors: a model with "sheaf coefficients" (Heise 1972), a model with
"parametrically weighted covariates" (Yamaguchi 2002), and a
Multiple Indicators and Multiple Causes (MIMIC) model (Hauser and Goldberger 1971).
The latter two can be estimated using {cmd:propcnsreg}, while the first
can be estimated using {cmd:sheafcoef}, which is also available from SSC.
{dlgtab:Sheaf coefficient}
{p 4 4 2}The sheaf coefficient is the simplest model of the three. Say we want
to explain a variable y using three observed variables x1, x2, and x3, and we
think that x1 and x2 actually influence y through a latent variable eta.
Because eta is a latent variable we need to fix the origin and the unit. The
origin can be fixed by setting eta to 0 when both x1 and x2 are 0, the
unit can be fixed by setting the standard deviation of eta equal to 1. The
model starts with a simple regression model, where the b-s are the regression
coefficients and e a normally distributed error term, with a mean of 0 and
a standard deviation that is to be estimated:
{p 8 4 2}
(1) y = b0 + b1 x1 + b2 x2 + b3 x3 + e
{p 4 4 2}We want to rewrite this as the following two equations, where l is the
effect of the latent variable and the c-s are the effects of the observed
variables on the latent variable:
{p 8 4 2}
(2) y = b0 + l eta + b3 x3 + e
{p 8 4 2}
(3) eta = c0 + c1 x1 + c2 x2
{p 4 4 2}
We can fix the origin of eta by constraining c0 to be 0; this way eta will be
0 when both x1 and x2 equal 0. This leaves c1 and c2. We want to choose
values for these parameters such that eta optimally predicts y, and the standard
deviation of eta equals 1. This means that c1 and c2 are going to be a
transformation of b1 and b2. We can start with an initial guess that c1 equals
b1 and c2 equals b2, and call the resulting latent variable eta'. This will
get us closer to where we want to be, as we now have values for all parameters:
c0=0, c1'=b1, c2'=b2, and l'=1. The value for l' follows from the fact that
it is the only value for which equations (2) and (3) lead to equation (1).
However, the standard deviation of eta' will generally not be equal to 1.
We can calculate the standard deviation of eta' as follows:
{p 8 4 2}
sd(eta') = sqrt{c -(}b1^2 var(x1) + b2^2 var(x2) + 2 b1 b2 cov(x1, x2){c )-}
{p 4 4 2}
We can recover eta by dividing eta' by its standard deviation, which means that
the true values of c1 and c2 are actually b1/sd(eta') and b2/sd(eta'). If we
divide eta' by its standard deviation, then we must multiply l' by that same
number to ensure that equations (2) and (3) continue to lead to equation (1).
As a consequence l will equal sd(eta').
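{p 4 4 2}
The steps above can be retraced by hand. The following is a minimal sketch, not
the implementation used by {cmd:sheafcoef}: the choice of {it:grade} and
{it:ttl_exp} as the block of variables behind the latent variable, and the names
{it:eta_raw}, {it:eta}, and {it:sd_raw}, are assumptions made purely for
illustration. It only reproduces the point estimates.{p_end}

{cmd}
    sysuse nlsw88, clear
    * equation (1): the regular regression
    regress wage grade ttl_exp tenure
    * eta' from the initial guess c1' = b1, c2' = b2 (and c0 = 0)
    gen double eta_raw = _b[grade]*grade + _b[ttl_exp]*ttl_exp if e(sample)
    quietly summarize eta_raw
    scalar sd_raw = r(sd)
    * rescale so that eta has a standard deviation of 1
    gen double eta = eta_raw / sd_raw
    * equation (2): the coefficient of eta should reproduce sd(eta')
    regress wage eta tenure
    display "sd(eta') = " sd_raw "   l = " _b[eta]
{txt}

{p 4 4 2}
The two numbers displayed at the end should coincide, because the second
regression spans the same fitted values as the first.{p_end}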
{p 4 4 2}
Notice that the effect of the latent variable will thus always be positive.
This is necessary because we have only specified the origin and unit of the
latent variable but not its direction. Say x1 is the proportion of vegetables
in a person's diet and x2 the number of minutes spent exercising per day. If we
did not fix the effect of the latent variable to be positive, then there would
always be two sets of estimates that would represent exactly the same
information. If the c's are positive then the latent variable represents the
healthiness of someone's lifestyle, and if the c's are negative then the latent
variable represents the unhealthiness of that person's lifestyle. Saying that
the healthiness of someone's lifestyle has a positive effect is exactly the
same as saying that the unhealthiness of someone's lifestyle has a negative
effect. Stata can't choose between these two, since both statements are the
same, so we need to choose for it. We can do so by either fixing the
direction of the latent variable or fixing the direction of the effect. The
default is to fix the direction of the effect, but we can also specify one
key variable and fix the direction of the latent variable relative to this
key variable either by stating that the latent variable is high when the key
variable is high and low when the key variable is low, or exactly the
opposite.
{p 4 4 2}
This illustrates how the following set of assumptions can be used to recover
the latent variable and its effect on the dependent variable:
{pmore}
- the latent variable is a weighted sum of the observed variables such that the
latent variable optimally predicts the dependent variable.
{pmore}
- a constraint that fixes the origin of the latent variable.
{pmore}
- a constraint that fixes the unit of the latent variable.
{pmore}
- a constraint that either fixes the direction of the latent variable or the
direction of the effect of the latent variable.
{p 4 4 2}
However, a sheaf coefficient just reorders the information you obtained from
a regular regression. It is just a different way of looking at the regression
results, which can be useful but it does not impose a testable constraint.
{p 4 4 2}
One possible application of the sheaf coefficient is the comparison of effect
sizes of different blocks of variables. For example, we may have a block of
variables representing the family situation of the respondent and another
block of variables representing characteristics of the work situation and we
wonder whether the work situation or the family situation is more important
for determining a certain outcome variable. In that case we would estimate
a model with two latent variables, one for the family situation and one for
the work situation, and since both latent variables are standardized their
effects will be comparable.
{dlgtab:Parametrically weighted covariates}
{p 4 4 2}
The model with parametrically weighted covariates builds on the model with
sheaf coefficients, but adds a testable constraint by assuming that the
effect of the latent variable changes over another observed variable. This
means that instead of equation (2) we will be estimating equation (4) where
the effect of eta changes over x3:
{p 8 4 2}
(4) y = b0 + (l0 + l1 x3) eta + b3 x3 + e
{p 4 4 2}
If we replace eta with equation (3), and fix the origin of eta by constraining c0
to be zero, we get:
{p 8 4 2}
y = b0 + (l0 + l1 x3) (c1 x1 + c2 x2) + b3 x3 + e
{p 8 4 2}
= b0 + (l0 + l1 x3) c1 x1 + (l0 + l1 x3) c2 x2 + b3 x3 + e
{p 4 4 2}
This means the effect of x1 (through eta) on y equals (l0 + l1 x3) c1, and that the
effect of x2 (through eta) on y equals (l0 + l1 x3) c2. This implies the following
constraint: for every value of x3, the effect of x1 relative to x2 will always be
{c -(}(l0 + l1 x3) c1{c )-} / {c -(}(l0 + l1 x3) c2{c )-} = c1/c2, which is a
constant. In other words, the model with parametrically weighted covariates imposes
a proportionality constraint. A test of this constraint is reported at the bottom
of the output from {cmd:propcnsreg} (when the {cmd:mimic} option is not specified).
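{p 4 4 2}
One way to get a feel for this constraint is to look at the unconstrained
counterpart: a regression with interaction terms, in which proportionality is a
nonlinear restriction on the coefficients. The following is a minimal sketch,
assuming the variables {it:lnwage}, {it:hs}, {it:sc}, and {it:c} are created as
in the Example section below, with {it:tenure} in the role of x3. The Wald test
from {cmd:testnl} only illustrates the restriction; it is not necessarily the
test that {cmd:propcnsreg} reports, and Wald tests of nonlinear restrictions
depend on how the restriction is written.{p_end}

{cmd}
    * unconstrained model: every education effect may change freely over tenure
    regress lnwage hs sc c tenure c.tenure#c.hs c.tenure#c.sc c.tenure#c.c
    * proportionality: main effects and interaction effects share the same ratios
    testnl (_b[hs]*_b[c.tenure#c.c] = _b[c]*_b[c.tenure#c.hs]) ///
           (_b[sc]*_b[c.tenure#c.c] = _b[c]*_b[c.tenure#c.sc])
{txt}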
{p 4 4 2} This proportionality constraint can also be of substantive interest without
referring to a latent variable. Consider a
model where one wants to explain the respondent's education ({it:ed}) with the
education of the father ({it:fed}) and the mother ({it:med}), and suppose that one is
interested in testing whether the relative contribution of the mother's
education has increased over time. {cmd:propcnsreg} will estimate this model
under the null hypothesis that the relative contributions of {it:fed} and {it:med}
have remained constant over time. Notice that the effects of {it:fed} and {it:med}
are allowed to change over time, but they are
constrained to change by the same proportion. So if the effect of
{it:fed} drops by 10% over a decade, then so does the effect of {it:med}.
{p 4 4 2}{cmd:propcnsreg} will allow you to identify the unit of the latent
variable in one of the following three ways:
{pmore}
- By setting the standard deviation of the latent variable to 1, effectively
standardizing the latent variable. This is the default parametrization, but it can also
be explicitly requested by specifying the {cmd:standardized} option. One can
specify one key variable by prefixing that variable in the {cmd:constrained}
option with either a {cmd:+} or a {cmd:-}. The {cmd:+} means that the latent
variable is high when the key variable is high and the latent variable is low
when the key variable is low. The {cmd:-} means exactly the opposite. If no
key variable is specified then l0 is constrained to be positive.
{pmore}
- By setting the coefficient l0 to 1, which means that c1 and c2 represent the
indirect effects of x1 and x2 through the latent variable on y when x3 equals 0.
{pmore}
- By setting either the coefficient c1 or c2 to 1, which means that the unit of the latent
variable will equal the unit of either x1 or x2 respectively. This can be done by
specifying the {opt unit(varname)} option.
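{p 4 4 2}
The three ways of identifying the unit listed above could be requested as
follows. This is a sketch based on the syntax diagram and the option
descriptions in this help file; the particular model (log wage, with the effect
of education changing over {it:tenure}) and the choice of {it:c} as the key
variable are assumptions made for illustration, and convergence is not
guaranteed for every specification.{p_end}

{cmd}
    sysuse nlsw88, clear
    gen hs = grade == 12 if grade < .
    gen sc = grade > 12 & grade < 16 if grade < .
    gen c  = grade >= 16 if grade < .
    gen lnwage = ln(wage)

    * (a) standardized latent variable, pointing in the same direction as c
    propcnsreg lnwage tenure, lambda(tenure) constrained(hs sc +c) standardized

    * (b) l0 fixed to 1: effects of hs, sc, and c through eta when tenure is 0
    propcnsreg lnwage tenure, lambda(tenure) constrained(hs sc c) lcons

    * (c) unit of the latent variable equal to the unit of c
    propcnsreg lnwage tenure, lambda(tenure) constrained(hs sc c) unit(c)
{txt}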
{dlgtab:MIMIC}
{p 4 4 2}
The MIMIC model builds on the model with parametrically weighted covariates by
assuming that the latent variable is measured with error. This means that the
following model is estimated:
{p 8 4 2}
(5) y = b0 + (l0 + l1 x3) eta + b3 x3 + e_y
{p 8 4 2}
(6) eta = c0 + c1 x1 + c2 x2 + e_eta
{p 4 4 2}
Where e_y and e_eta are independent normally distributed error terms with means zero
and standard deviations that need to be estimated. By replacing eta in equation (5)
with equation (6) one can see that the error term of this model is:
{p 8 4 2}
e_y + (l0 + l1 x3) e_eta
{p 4 4 2}
This combined error term will also be normally distributed, as the sum of two
independent normally distributed variables is itself normally distributed, with
mean zero and the following standard deviation:
{p 8 4 2}
sqrt{c -(}var(e_y) + (l0 + l1 x3)^2 var(e_eta){c )-}
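{p 4 4 2}
As a small worked example with purely hypothetical values, take var(e_y) = 1,
var(e_eta) = 0.25, l0 = 1, and l1 = 0.5. The standard deviation of the combined
error term is then about 1.12 when x3 equals 0 and about 1.41 when x3 equals 2:{p_end}

{cmd}
    * x3 = 0: sqrt(1 + 1^2 * 0.25)
    display sqrt(1 + (1 + 0.5*0)^2 * 0.25)
    * x3 = 2: sqrt(1 + 2^2 * 0.25)
    display sqrt(1 + (1 + 0.5*2)^2 * 0.25)
{txt}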
{p 4 4 2}
So the empirical information that is used to separate the standard deviation of e_y
from the standard deviation of e_eta is the change in the residual variance over x3.
So the data will only contain rather indirect information that can be used for
estimating this model, and the model may thus not always converge. However, if the
model is correct it will enable one to control for measurement error in the latent
variable.
{p 4 4 2}
There is an important downside to this model, and that is that heteroscedasticity,
and in particular changes in the variance of e_y over x3, could have a distorting
influence on the parameter estimates of l0 and l1. Consider again the example where
one wants to explain the respondent's education with the education of the father and
the mother, but now assume that we are interested in how the effect of the latent
variable changes over time. In this case we have good reason to suspect that the
variance of e_y will also change over time: Education consists of a discrete number
of categories, and in early cohorts most of the respondents tend to cluster in the
lowest categories. Over time the average level of education tends to increase, which
means that the respondents tend to cluster less in the lowest category, and have
more room to differ from one another. As a consequence the residual variance is likely
to have increased over time. Normally this heteroscedasticity would not be an issue
of great concern, but in a MIMIC model this heteroscedasticity is incorrectly
interpreted as indicating that there is measurement error in the latent variable
representing parental education. Moreover, this "information" on the measurement error
is used to "refine" the estimates of l0 and l1. So, this would be an example where the
MIMIC model would not be appropriate.
{dlgtab:Maximization of the likelihood function}
{p 4 4 2} A difficulty with both the model with parametrically weighted covariates and
the MIMIC model is that the parameters are highly correlated, which makes it hard for
the standard maximization algorithms to find the maximum of the likelihood function.
To overcome this issue an EM algorithm is first used to find suitable starting values.
The EM algorithm breaks the correlation by first treating the weights for the observed
variables as fixed and estimating the effect of the latent variable, and then treating the
effect of the latent variable as fixed and estimating the weights. This is iterated,
by default at most 20 times, until either the vector of parameters or the log likelihood
changes by less than some predetermined amount (see
{help propcnsreg##em_maximize_options:em_maximize_options}). These parameter estimates
are then used as starting values for the regular {help ml} algorithm.
{title:Options}
{dlgtab 4 2:Model}
{phang}
{opt con:strained(varlist_c)} specifies the variables that can be thought of as being
measurements of the same latent variable. The effects of these variables are
constrained to change by the same proportion as the variables specified in
{opt lambda()} change.
{pmore}
If the {cmd:standardized} option is specified one can
identify one variable as a key variable that identifies the direction of the latent
variable, either in the same direction as the key variable ({cmd:+}) or in the
opposite direction ({cmd:-}). If the {cmd:standardized} option is specified but no
key variable is specified, then the constant of the lambda equation will be
constrained to be positive.
{phang}
{opt lambda(varlist_l)} specifies the variables along which the effect of the
latent variable changes.
{phang}
{opt mimic} specifies that a MIMIC model is to be estimated.
{phang}
{opt logit} specifies that the dependent variable is binary and that the influence of
the latent and control variables on the probability is modeled through a logistic
regression model.
{dlgtab 4 2:Identification}
{phang}
{opt standardized} specifies that the unit of the latent variable is identified by
constraining the standard deviation of the latent variable to be equal to 1. This
is the default parametrization.
{phang}
{opt lcons} specifies that the parameters of the variables specified in the
option {cmd:constrained()} measure the indirect effect of these variables through the
latent variable on the dependent variable when all variables specified in the option
{cmd:lambda()} are zero.
{phang}
{opt unit(varname)} specifies that the scale of the latent variable is
identified by constraining the unit of the latent variable to be equal to the unit
of {it:varname}. The variable {it:varname} must be specified in {it:varlist_c}.
{dlgtab 4 2:SE/robust/reporting}
{phang}
{opt r:obust} specifies that the Huber/White/sandwich estimator
of variance is to be used in place of the traditional calculation; see
{hi:[U] 23.14 Obtaining robust variance estimates}. {cmd:robust}
combined with {cmd:cluster()} allows observations which are not
independent within cluster (although they must be independent between
clusters).
{phang}
{opt c:luster(clustervar)} specifies that the observations
are independent across groups (clusters) but not necessarily within groups.
{it:clustervar} specifies to which group each observation belongs; e.g.,
{cmd:cluster(personid)} in data with repeated observations on individuals. See
{hi:[U] 23.14 Obtaining robust variance estimates}. Specifying {cmd:cluster()}
implies {cmd:robust}.
{phang}
{opt l:evel(#)} specifies the confidence level, in percent,
for the confidence intervals of the coefficients; see help {help level}.
{phang}
{opt or} specifies that odds ratios are to be displayed. If the {cmd:lcons} option
is specified then the parameters in all three equations (unconstrained, lambda, and
constrained) will be exponentiated. In all other cases only the parameters in the
first two equations (unconstrained and lambda) will be exponentiated.
{marker em_maximize_options}{...}
{dlgtab 4 2:em_maximize_options}
{phang}
{opt emiter:ate(#)} specifies the maximum number of iterations for the
EM algorithm. When the number of iterations equals {cmd:emiterate()}, the EM
algorithm stops. If convergence is declared before this threshold is reached,
it will stop when convergence is declared. The default value of
{opt emiterate(#)} is 20.
{phang}
{opt emtol:erance(#)} specifies the tolerance for the coefficient vector.
When the relative change in the coefficient vector from one iteration to the
next is less than or equal to {opt emtolerance()}, the {opt emtolerance()}
convergence criterion is satisfied. {cmd:emtolerance(1e-6)} is the default.
{phang}
{opt emltol:erance(#)} specifies the tolerance for the log likelihood. When
the relative change in the log likelihood from one iteration to the next is
less than or equal to {opt emltolerance()}, the {opt emltolerance()}
convergence criterion is satisfied. {cmd:emltolerance(1e-7)} is the default.
{phang}
These options are seldom used.
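{p 4 4 2}
If the default starting values turn out not to be good enough, the EM stage
could be given more room before {help ml} takes over. The call below is a
sketch only: it assumes the variables created in the first example in the
Example section, and the particular values for {opt emiterate()} and
{opt emtolerance()} are arbitrary choices, not recommendations.{p_end}

{cmd}
    propcnsreg lnwage white tenure tenure2,  ///
        lambda(tenure tenure2 white)         ///
        constrained(hs sc c) unit(c)         ///
        emiterate(50) emtolerance(1e-8) difficult
{txt}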
{marker maximize_options}{...}
{dlgtab 4 2:maximize_options}
{p 4 4 2}
{opt diff:icult},
{opt tech:nique(algorithm_spec)},
{opt iter:ate(#)},
{opt tr:ace},
{opt grad:ient},
{opt showstep},
{opt hess:ian},
{opt shownr:tolerance},
{opt tol:erance(#)},
{opt ltol:erance(#)},
{opt gtol:erance(#)},
{opt nrtol:erance(#)},
{opt nonrtol:erance};
see {help maximize}. These options are seldom used.
{title:Example}
{pstd}
Example illustrating the use of {help predict} to help with interpreting the model:
{cmd}
sysuse nlsw88, clear
gen hs = grade == 12 if grade < .
gen sc = grade > 12 & grade < 16 if grade < .
gen c = grade >= 16 if grade < .
gen lnwage = ln(wage)
gen tenure2 = tenure^2
gen white = race == 1 if race < .
propcnsreg lnwage white tenure tenure2, /*
*/ lambda(tenure tenure2 white) /*
*/ constrained(hs sc c) unit(c)
predict effect, xb eq(lambda)
predict se_effect, stdp eq(lambda)
gen lb = effect - 1.96*se_effect
gen ub = effect + 1.96*se_effect
sort tenure
twoway rarea lb ub tenure if white == 1 || /*
*/ rarea lb ub tenure if white == 0, /*
*/ astyle(ci ci) || /*
*/ line effect tenure if white == 1 || /*
*/ line effect tenure if white == 0, /*
*/ yline(0) clpattern(longdash shortdash) /*
*/ legend(label(1 "95% conf. int.") /*
*/ label(2 "95% conf. int.") /*
*/ label(3 "white") /*
*/ label(4 "non-white") /*
*/ order(3 4 1 2)) /*
*/ ytitle("effect of education on wage")
{txt}
{phang2}{it:({stata "propcnsreg_ex 1":click to run})}{p_end}
{pstd}
An example for a binary dependent variable. Note that in this
case the parameters in both the unconstrained and the lambda
equations are odds ratios (naming the constant in the
latter equation __cons instead of _cons is a trick to force
its display).
{cmd}
sysuse nlsw88, clear
gen byte high = occupation < 3 if !missing(occupation)
gen byte white = race == 1 if !missing(race)
gen byte hs = grade == 12 if !missing(grade)
gen byte sc = grade > 12 & grade < 16 if !missing(grade)
gen byte c = grade >= 16 if !missing(grade)
propcnsreg high white ttl_exp married never_married age, ///
lambda(ttl_exp white) ///
constrained(hs sc c) unit(c) logit or
{txt}
{phang2}{it:({stata "propcnsreg_ex 2":click to run})}{p_end}
{title:Author}
{p 4 4 2}Maarten L. Buis, Universitaet Tuebingen{break}maarten.buis@ifsoz.uni-tuebingen.de
{title:References}
{p 4 4 2}
Bollen, Kenneth A. 1984. "Multiple Indicators: Internal Consistency or
No Necessary Relationship?" {it:Quality and Quantity} 18(4): 377{c -}385.
{p 4 4 2}
Bollen, Kenneth A. and Richard Lennox. 1991. "Conventional Wisdom
on Measurement: A Structural Equation Perspective."
{it:Psychological Bulletin} 110(2): 305{c -}314.
{p 4 4 2}
Hauser, Robert M. and Arthur S. Goldberger. 1971.
"The Treatment of Unobservable Variables in Path Analysis."
{it:Sociological Methodology} 3: 81{c -}117.
{p 4 4 2}
Heise, David R. 1972. "Employing nominal variables, induced
variables, and block variables in path analysis."
{it:Sociological Methods & Research} 1(2): 147{c -}173.
{p 4 4 2}
Yamaguchi, Kazuo. 2002. "Regression models with parametrically
weighted explanatory variables." {it:Sociological Methodology}
32: 219{c -}245.
{title:Suggested citation if using propcnsreg in published work}
{p 4 4 2}
{cmd:propcnsreg} is not an official Stata command. It is a free contribution
to the research community, like a paper. Please cite it as such.
{p 4 4 2}
Buis, Maarten L. 2007. "PROPCNSREG: Stata program fitting a linear regression
with a proportionality constraint by maximum likelihood"
{browse "http://ideas.repec.org/c/boc/bocode/s456858.html"}
{title:Also see:}
{p 4 4 2}
{helpb factor}, {helpb sheafcoef} (if installed)