{smcl}
{* 30Jun2006}{...}
{hline}
help for {hi:apc_ie}
{hline}

{title:Intrinsic estimator for age-period-cohort effects in generalized linear models}

{p 4}Syntax

{p 8 14}{cmd:apc_ie} {it:depvar} [{it:indepvars}]
[{cmd:if} {it:exp}] [{cmd:in} {it:range}] [{it:weight}]
{cmd:,}
{bind:[{cmd:age(}{it:varname}{cmd:)}}
{cmd:period(}{it:varname}{cmd:)}
{cmd:cohort(}{it:varname}{cmd:)}
{cmdab:gen:erate(}{it:newvarname}{cmd:)}
{bind:{it:glm_options} ]}

{p 4}Replay syntax

{p 8 14}{cmd:apc_ie} {bind:[{cmd:,}} {cmdab:nohead:er}
{cmdab:le:vel(}{it:cilevel}{cmd:)} {bind:{cmdab:ef:orm} ]}

{p 4} {cmd:apc_ie} allows all {it:varlists} and {it:weights} that are
allowed by {cmd:glm}.


{title:Description}

{p} {cmd:apc_ie} estimates generalized linear models with age, period and 
cohort effects using the intrinsic estimator (IE) described by Yang, Fu 
and Land (2004). Estimates obtained from conventional CGLIM estimators 
may depend on the structure of the design matrix, that is, on the numbers 
of age groups and time periods (see {help apc_cglim}). The IE employs 
a special principal components regression that removes the influence of 
the null (column) space of the design matrix on the estimator.


{title:Methods}

{p} The IE estimates a constrained parameter vector that corresponds to 
the projection of the model's parameter vector onto the non-null (column) 
subspace of the design matrix. {cmd:apc_ie} computes this constrained 
parameter vector by a special principal components regression.

{p} The estimation algorithm first applies an orthonormal matrix 
transformation to the {it:X'X} matrix of the linear model (or its 
equivalent in a generalized linear model). This transformation yields 
the nonzero eigenvalues of the matrix and their corresponding 
eigenvectors. The principal components regression is then estimated 
using these eigenvectors as variables. Finally, the orthonormal 
transformation is applied in reverse to convert the estimated regression 
coefficients back to the original age, period and cohort effects for 
ease of interpretation.

{p} Instead of omitting one reference category from each set of 
indicator variables, the IE uses the constraint that the sum of 
coefficients in each set is zero. The algorithm in {cmd:apc_ie} adds an 
indicator variable for each unique value of {it:age}, {it:period} and 
{it:cohort} to the list of independent variables, but omits one category 
for each of {it:age}, {it:period} and {it:cohort} for computational 
purposes. Because {it:age}+{it:cohort}={it:period}, one additional 
indicator variable is redundant. {cmd:apc_ie} therefore replaces the 
{it:A-1+P-1+C-1} indicator variables with the {it:A-1+P-1+C-2} principal 
components that correspond to nonzero eigenvalues. After estimating the
principal components regression, the IE uses the zero-sum constraints
to obtain estimates for the deleted age, period and cohort categories.
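
{p} As a worked illustration with hypothetical dimensions, suppose the 
data form a complete table of {it:A}=5 age groups by {it:P}=4 periods 
with equal interval widths, so that there are {it:C}={it:A}+{it:P}-1=8 
cohorts. The indicator set then has {it:A-1+P-1+C-1}=4+3+7=14 variables, 
and {cmd:apc_ie} would replace them with {it:A-1+P-1+C-2}=13 principal 
components.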

{p} Because the IE -- including the coefficients obtained from zero-sum 
constraints -- is a linear transformation of the coefficients in the 
principal components model, the variance-covariance matrix of the IE is 
{it:V_ie=B*V_pc*B'}, where {it:B} is a transformation matrix and {it:V_pc} 
is the variance-covariance matrix of the principal components estimator. 
See {help glm} for options for computing variance-covariance matrices in 
generalized linear models; {cmd:apc_ie} simply applies a transformation to 
whatever variance-covariance matrix you choose to compute.

{p} The IE is always defined based on principal components of a design 
matrix that has one observation per age-by-period cell. Missing data or 
multiple observations per cell do not affect which principal components 
are used in calculating the IE. Likewise, the principal components do not 
depend on {it:indepvars} or {it:weights}, if any. However, if any of the 
{it:indepvars} are collinear with {it:age}, {it:period} or {it:cohort},
or if too many cells in the age-by-period matrix have no data, 
eliminating only one principal component does not resolve the 
identification problem. {cmd:apc_ie} returns an error when this happens.


{title:Options}

{p 0 4} {cmd:age(}{it:varname}{cmd:)}, {cmd:period(}{it:varname}{cmd:)}
and {cmd:cohort(}{it:varname}{cmd:)} specify the {it:age}, {it:period} and
{it:cohort} variables. At least two of the three must be specified. If
all three are specified, they must satisfy 
{it:age}+{it:cohort}={it:period}.
If only two are specified, {cmd:apc_ie} computes the third from the 
identity {it:age}+{it:cohort}={it:period}.

{p 0 4} {cmd:generate(}{it:newvarname}{cmd:)} stores in {it:newvarname}
the {it:age}, {it:period} or {it:cohort} variable that {cmd:apc_ie} 
computes when only two of the three are specified.

{p 0 4} {it:glm_options} can be any options allowed by {help glm}.
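

{title:Examples}

{p} The following sketch illustrates the syntax only. The variable names 
{it:deaths}, {it:age}, {it:year} and {it:cohort} are hypothetical, and the 
{cmd:glm} options shown are one possible choice of family and link, not a 
recommendation.

{p} Specify two of the three variables and save the computed cohort 
variable:

{p 8 12}{inp:. apc_ie deaths, age(age) period(year) generate(cohort) family(poisson) link(log)}

{p} Specify all three variables, first checking that they satisfy the 
required identity:

{p 8 12}{inp:. assert age + cohort == year}

{p 8 12}{inp:. apc_ie deaths, age(age) period(year) cohort(cohort) family(poisson) link(log)}

{p} Replay the last results with exponentiated coefficients:

{p 8 12}{inp:. apc_ie, eform}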


{title:References}

{p 0 4}Yang, Y., Fu, W., and Land, K. 2004. A Methodological Comparison of 
Age-Period-Cohort Models: The Intrinsic Estimator and Conventional 
Generalized Linear Models. {it:Sociological Methodology} 34(1), 75-110.


{title:Authors}

Sam Schulhofer-Wohl
Department of Economics
The University of Chicago
1126 E. 59th St.
Chicago, IL 60637
sschulh1@uchicago.edu

Yang Yang, Ph.D.
Department of Sociology
Population Research Center and Center on Aging at NORC
The University of Chicago
1126 E. 59th St.
Chicago, IL 60637
(O) 773-834-1113
yangy@uchicago.edu


{title:Also see}

{p 0 19} Online: help for {help glm}; {help apc_cglim} (if installed).{p_end}