{smcl}
{* *! version 1.1.7  03apr2020 Ben Jann & Simon Seiler}{...}
{vieweralsosee "[R] mlogit" "help mlogit"}{...}
{viewerjumpto "Syntax" "udiff##syntax"}{...}
{viewerjumpto "Description" "udiff##description"}{...}
{viewerjumpto "Options" "udiff##options"}{...}
{viewerjumpto "Postestimation" "udiff##postest"}{...}
{viewerjumpto "Examples" "udiff##examples"}{...}
{viewerjumpto "Methods and formulas" "udiff##methods"}{...}
{viewerjumpto "Saved results" "udiff##saved_results"}{...}
{viewerjumpto "References" "udiff##references"}{...}
{viewerjumpto "Authors" "udiff##authors"}{...}
{hi:help udiff}
{hline}

{title:Title}

{pstd}{hi:udiff} {hline 2}  Generalized unidiff model for individual-level data


{marker syntax}{...}
{title:Syntax}

{pstd}
    Simple syntax:

{p 8 15 2}
    {cmd:udiff} {depvar} {help varlist:{it:xvars}} {help varname:{it:layervar}} {ifin} {weight} [{cmd:,}
    {help udiff##opts:{it:options}} ]

{pstd}
    Advanced syntax:

{p 8 15 2}
    {cmd:udiff} {depvar} {it:term} [{it:term} ...] [{help varlist:{it:controlvars}}] {ifin} {weight} [{cmd:,}
    {help udiff##opts:{it:options}} ]

{pmore}
    where {it:term} is a unidiff term specified as

            {cmd:(}{help varlist:{it:xvars}} {help varname:{it:layervar}}{cmd:)}
        or
            {cmd:(}{help varlist:{it:xvars}} {cmd:<-} {help varlist:{it:layervars}}{cmd:)}
        or
            {cmd:(}{help varlist:{it:layervars}} {cmd:->} {help varlist:{it:xvars}}{cmd:)}

{pmore}
    {it:xvars} must be unique across unidiff terms, {it:layervars} may be repeated; parentheses may be
    omitted if there are no control variables and if only one unidiff term is specified.


{synoptset 22 tabbed}{...}
{marker opts}{...}
{synopthdr}
{synoptline}
{syntab :Main}
{synopt :{opt cf:only}}estimate constant-fluidity model instead of unidiff model{p_end}
{synopt :{opth constr:aints(numlist)}}apply specified linear constraints{p_end}
{synopt :{opt b:aseoutcome(#)}}value of {depvar} that will be the base outcome{p_end}
{synopt :{opt nocons:tant}}suppress constant term{p_end}

{syntab :SE/Robust}
{synopt :{opth vce(vcetype)}}{it:vcetype} may be {opt oim},
   {opt r:obust}, {opt cl:uster} {it:clustvar}, {opt boot:strap}, or
   {opt jack:knife}{p_end}
{synopt :{opt r:obust}}synonym for {cmd:vce(robust)}{p_end}
{synopt :{opt cl:uster(clustvar)}}synonym for {cmd:vce(cluster} {it:clustvar}{cmd:)}{p_end}

{syntab :Reporting}
{synopt :{opt l:evel(#)}}set confidence level; default is {cmd:level(95)}{p_end}
{synopt :{opt all:equations}}report results for all equations; by default only the unidiff parameters are displayed{p_end}
{synopt :{opt eform}}report coefficients in exponentiated form{p_end}
{synopt :{opt noh:eader}}suppress header display above coefficient table{p_end}
{synopt :{it:{help estimation_options##display_options:display_options}}}standard display options{p_end}
{synopt :{opt coefl:egend}}display legend instead of statistics{p_end}
{synopt :{opt noi:sily}}display output from initial constant-fluidity model{p_end}

{syntab :Maximization}
{synopt :{it:{help maximize:maximize_options}}}maximization options{p_end}
{synoptline}
{p 4 6 2}{it:xvars}, {it:layervars}, and {it:controlvars} may contain factor variables; see {help fvvarlist}.{p_end}
{p 4 6 2}{helpb svy} and {helpb mi estimate} are supported; see {help prefix}.{p_end}
{p 4 6 2}{cmd:fweight}s, {cmd:aweight}s, {cmd:iweight}s, and {cmd:pweight}s are allowed; see help {help weight}.{p_end}
{p 4 6 2}{helpb udiff##postest:predict} and other postestimation commands are available after {cmd:udiff}; see {help udiff##postest:below}.{p_end}
{p 4 6 2}{helpb udiff##postest:estat rescale} computes rescaled unidiff parameters after {cmd:udiff}; see {help udiff##postest:below}.{p_end}
{p 4 6 2}{helpb udiff##postest:estat lambda} computes lambda coefficients after {cmd:udiff}; see {help udiff##postest:below}.{p_end}
{p 4 6 2}{helpb udiff##postest:estat kappa} computes kappa indices after {cmd:udiff}; see {help udiff##postest:below}.{p_end}


{marker description}{...}
{title:Description}

{pstd}
    {cmd:udiff} estimates parameters of the so-called unidiff model (Erikson
    and Goldthorpe 1992), also known as the log-multiplicative layer effect
    model (Xie 1992), which is often used to study differences in
    intergenerational class mobility between birth cohorts or countries.

{pstd}
    The original unidiff model has been expressed as a log-linear model of cell
    frequencies in a three-way contingency table (origin by destination by
    cohort or country). The model, however, can also be expressed at the
    individual-level (similar to a multinomial logit model). {cmd:udiff} estimates such a
    re-expressed unidiff model for individual-level data. Furthermore, it generalized the
    model to allow for multiple layers and non-categorical predictors. For details see
    {help udiff##methods:Methods and Formulas} below. For an implementation
    of the classic log-linear unidiff model for aggregate data see Pisati (2000).

{pstd}
    {it:depvar} is the (categorical) destination variable (e.g. class of
    respondent).

{pstd}
    {it:xvars} specifies the origin variable(s) (e.g. class of
    respondent's parents). Typically, {it:xvars} is a single
    categorical variable specified as {cmd:i.}{it:varname}, although multiple
    variables as well as continuous variables are allowed.

{pstd}
    {it:layervars} specifies the layer variable(s) to be interacted with
    {it:xvars}. Typically, {it:layervars} is a single categorical variable
    specified as {cmd:i.}{it:varname} (e.g. countries or birth-cohort
    categories), although multiple variables as well as continuous variables
    are allowed. For example, specify {cmd:(}{it:xvars} {cmd:<-}
    {cmd:c.cohort##c.cohort}{cmd:)} to model the unidiff scaling factor
    as a quadratic function of variable {cmd:cohort}. Likewise, if your data
    contains information on countries and birth cohorts, you could type
    {cmd:(}{it:xvars} {cmd:<-} {cmd:i.country i.cohort}{cmd:)} to include
    separate unidiff parameters for both dimensions. Furthermore, you could
    type  {cmd:(}{it:xvars} {cmd:<-} {cmd:i.country##i.cohort}{cmd:)}
    to include unidiff parameters for all country-cohort combinations.

{pstd}
    {it:controllvars} are control variables whose effects are assumed to be
    constant across layers.


{marker options}{...}
{title:Options}

{phang}
    {opt cfonly} causes the constant-fluidity model to be reported instead
    of the unidiff model. Estimation of the unidiff model will be skipped.

{phang}
    {opth constraints(numlist)} applies linear constraints to
    the estimation. {it:numlist} specifies the constraints by number, after
    they have been defined using the {helpb constraint} command. An
    {help udiff##exconstr:example} is provided below.

{phang}
    {opt baseoutcome(#)} specifies the value of {depvar} to be treated as the base
    outcome. The default is to choose the most frequent outcome.

{phang}
    {opt noconstant} suppresses the constant (outcome-specific intercepts)
    in the model.

{phang}
    {opt vce(vcetype)} specifies the type of variance estimation to be used
    to determine the standard errors. {it:vcetype} may be {opt oim},
   {opt r:obust}, {opt cl:uster} {it:clustvar}, {opt boot:strap}, or
   {opt jack:knife}; see {help vce_option:[R] {it:vce_option}}.

{phang}
    {opt robust} is a synonym for {cmd:vce(robust)}.

{phang}
    {opt cluster(clustvar)} is a synonym for {cmd:vce(cluster} {it:clustvar}{cmd:)}.

{phang}
    {opt level(#)} specifies the confidence level, as a percentage, for
    confidence intervals. The default is {cmd:level(95)}
    or as set by {helpb set level}.

{phang}
    {opt allequations} reports results for all equations of the model. By default,
    only the first equation containing the unidiff parameters is displayed.

{phang}
    {opt eform} displays the coefficients in exponentiated form. That is, for each coefficient,
    exp({it:b}) rather than {it:b} is displayed, and standard errors and
    confidence intervals are transformed accordingly.

{phang}
    {opt noheader} suppresses the header above the coefficient table
    that displays the final log-likelihood value, the number of observations,
    and the unidiff significance test.

{phang}
    {it:display_options} are standard display options; see
    {helpb estimation_options##display_options:[R] estimation options}.

{phang}
    {opt coeflegend} specifies that the legend of the coefficients and how
    to specify them in an expression be displayed rather than displaying the
    statistics for the coefficients.

{phang}
    {opt noisily} displays the {helpb mlogit} output of the initial
    constant-fluidity model. By default, the initial model is not displayed.

{phang}
    {it:maximize_options} are maximization options such as {cmd:iterate()} or
    {cmd:difficult}. See {helpb maximize:[R] maximize}. These options will only
    be applied to the unidiff model, but not to the initial constant-fluidity model.


{marker postest}{...}
{title:Postestimation commands}

{pstd}
    Usual postestimation commands such as {helpb predict}, {helpb test}, {helpb estat},
    {helpb lincom}, {helpb nlcom}, {helpb margins}, or {helpb suest} are available
    after {cmd:udiff}. Details on {cmd:estat} and {cmd:predict} are as follows.

{pstd}
    Note that, after a model that has been estimated using the {helpb svy}
    prefix, {cmd:estat rescale}, {cmd:estat lambda}, and {cmd:estat kappa} have to be specified as

        {cmd:. udiff_estat} {it:subcmd}

{pstd}
    where {it:subcmd} is {cmd:rescale}, {cmd:lambda}, or {cmd:kappa}.

{dlgtab:estat rescale}

{p 8 15 2}
    {cmd:estat} {cmdab:res:cale} [{it:{help numlist}}] [{cmd:,} {opt post} {opt l:evel(#)} {it:{help estimation_options##display_options:display_options}} ]

{pstd}
    Report rescaled unidiff parameters using the normalization suggested by
    Xie (1992). The normalization is only supported for unidiff terms that contain
    a single categorical layer variable specified as {cmd:i.}{it:varname}
    (factor variable). The normalization is such that the sum of the squared
    parameters equals 1 (within each unidiff term).

{phang}
    {it:numlist} specifies the unidiff terms to be included; this is only
    relevant if a model contains multiple unidiff terms. The default is to
    include all unidiff terms found in the model. To only include, say, the
    second unidiff term, type {cmd:estat rescale 2}.

{phang}
    {opt post} causes the rescaled results to be posted in {cmd:e(b)} and {cmd:e(V)}. This
    will clear out the previous estimation results. Without the {cmd:post} option, the results
    are stored in {cmd:r(b)} and {cmd:r(V)}; see {help udiff##saved_results:Saved results} below.

{phang}
    {opt level(#)} specifies the confidence level, as a percentage, for
    confidence intervals. The default is {cmd:level(95)}
    or as set by {helpb set level}.

{phang}
    {it:display_options} are standard display options; see
    {helpb estimation_options##display_options:[R] estimation options}.

{dlgtab:estat lambda}

{p 8 15 2}
    {cmd:estat} {cmdab:lam:bda} [{it:#}] [{cmd:,} {opt std:ize} {opt eform} {opt comp:act} {opt post} {opt l:evel(#)} {it:{help estimation_options##display_options:display_options}} ]

{pstd}
    Report lambda coefficients for unidiff term {it:#} (if {it:#} is omitted,
    the first unidiff term is used). {cmd:estat lambda} only supports unidiff terms that contain
    a single categorical layer variable and a single categorical predictor, both specified
    as {cmd:i.}{it:varname} (factor variable). See Pisati (2000) for a definition
    of the lambda coefficients.

{phang}
    {opt stdize} requests standardized lambda coefficients. The default is to
    report raw lambda coefficients.

{phang}
    {opt eform} reports the results in exponentiated form.

{phang}
    {opt compact} requests that the lambda coefficients be displayed in a
    two-way table with one column per outcome level. Standard errors will not be displayed
    in this case. The default is to display the coefficients in a one-way
    table including standard errors and confidence intervals.

{phang}
    {opt post} causes the lambda coefficients to be posted in {cmd:e(b)} and {cmd:e(V)}. This
    will clear out the previous estimation results. Without the {cmd:post} option, the results
    are stored in {cmd:r(b)} and {cmd:r(V)}; see {help udiff##saved_results:Saved results} below.

{phang}
    {opt level(#)} specifies the confidence level, as a percentage, for
    confidence intervals. The default is {cmd:level(95)}
    or as set by {helpb set level}.

{phang}
    {it:display_options} are standard display options; see
    {helpb estimation_options##display_options:[R] estimation options}.

{dlgtab:estat kappa}

{p 8 15 2}
    {cmd:estat} {cmdab:kap:pa} [{it:#}] [{cmd:,} {opt post} {opt l:evel(#)} {it:{help estimation_options##display_options:display_options}} ]

{pstd}
    Report kappa indices coefficients for unidiff term {it:#} (if {it:#} is omitted,
    the first unidiff term is used). {cmd:estat kappa} only supports unidiff terms that contain
    a single categorical layer variable and a single categorical predictor, both specified
    as {cmd:i.}{it:varname} (factor variable). See Pisati (2000) for a definition
    of the kappa indices.

{phang}
    {opt post} causes the kappa indices to be posted in {cmd:e(b)} and {cmd:e(V)}. This
    will clear out the previous estimation results. Without the {cmd:post} option, the results
    are stored in {cmd:r(b)} and {cmd:r(V)}; see {help udiff##saved_results:Saved results} below.

{phang}
    {opt level(#)} specifies the confidence level, as a percentage, for
    confidence intervals. The default is {cmd:level(95)}
    or as set by {helpb set level}.

{phang}
    {it:display_options} are standard display options; see
    {helpb estimation_options##display_options:[R] estimation options}.

{dlgtab:predict}

{p 8 15 2}
    {cmd:predict} [{it:{help datatypes:type}}] {newvar} {ifin} [{cmd:,} {opt xb} {opt e:quation(equation)} ]

{p 8 15 2}
    {cmd:predict} [{it:{help datatypes:type}}] {newvar} {ifin}{cmd:,} {opt p:r} [ {opt o:utcome(outcome)} ]

{p 8 15 2}
    {cmd:predict} [{it:{help datatypes:type}}] {c -(}{it:stub}{cmd:*} | {help newvarlist:{it:newvarlist}}{c )-}  {ifin}{cmd:,}
    {opt sc:ores} [ {opt e:quation(equation)} ]

{phang}
    {opt xb} calculates linear predictions for the equation specified by
    {cmd:equation()}. {cmd:xb} is the default unless {cmd:pr} or {cmd:scores}
    is specified. If {opt equation()} is omitted, linear predictions are calculated
    for the first equation.

{phang}
    {opt equation(equation)} specifies the equation for which linear
    predictions are to be calculated. {it:equation} can be an equation name, or
    an equation index specified as {cmd:#1}, {cmd:#2}, etc. Option
    {opt equation()} is not allowed with {cmd:pr}.

{phang}
    {opt pr} calculates predicted probabilities for the outcome specified by
    {cmd:outcome()}. If {opt outcome()} is omitted, predicted probabilities are
    calculated for the first outcome.

{phang}
    {opt outcome(outcome)} specifies the outcome for which predicted
    probabilities are to be calculated. {it:outcome} can be an
    outcome value, or an outcome index specified as {cmd:#1}, {cmd:#2}, etc. Option
    {opt outcome()} is only allowed with {cmd:pr}.

{phang}
    {opt scores} calculates equation-level score variables (first derivative
    of the log likelihood). If {opt equation()} is omitted, score variables
    are generated for all equations (one variable per equation; if {it:k} is
    the number of outcomes, then the number of equations is equal to ({it:k}-1)*2+1).


{marker examples}{...}
{title:Examples}

    {help udiff##exbasic:Basic example}
    {help udiff##exrescale:Normalized unidiff parameters}
    {help udiff##exlambda:Lambda coefficients and kappa indices}
    {help udiff##exconstr:Specifying constraints}
    {help udiff##exfit:Testing model fit}
    {help udiff##excontinuous:Continuous origin variables}
    {help udiff##exmultiple:Multiple unidiff terms}
    {help udiff##excont:Continuous layer variables}
    {help udiff##excontrol:Control variables}

{marker exbasic}{...}
{dlgtab:Basic example}

{pstd}
    The unidiff model in Example 2 in Pisati (2000) can be reproduced as follows:

        . {stata "use http://www.stata.com/stb/stb55/sg142/example2.dta, clear"}
        . {stata udiff son i.father i.country [fweight=obs]}

{pstd}
    Using advanced syntax we could type

        . {stata udiff son (i.father <- i.country) [fweight=obs]}
    or
        . {stata udiff son (i.country -> i.father) [fweight=obs]}

{pstd}
    A likelihood-ratio test against the constant-fluidity model is included in the
    header of the output table. In the example, the test is highly significant and confirms that
    there are differences in the unidiff parameters between the countries.

{pstd}
    By default, {cmd:udiff} omits the base category from the output
    (Australia in this example) and displays the unidiff parameters in logarithmic form. To
    include the base category in the output, specify {cmd:baselevels}; to
    report unidiff parameters as multipliers, add the {cmd:eform} option:

        . {stata udiff, eform baselevels}

{pstd}
    Furthermore, by default only the unidiff scaling parameters are reported. To
    report all parameters of the model, specify option {cmd:all}:

        . {stata udiff, all}

{marker exrescale}{...}
{dlgtab:Normalized unidiff parameters}

{pstd}
    To obtain rescaled unidiff parameters using the normalization suggested
    by Xie (1992), you can apply command {helpb udiff##postest:estat rescale} after
    model estimation:

        . {stata "use http://www.stata.com/stb/stb55/sg142/example2.dta, clear"}
        . {stata udiff son i.father i.country [fweight=obs], eform base}
        . {stata estat rescale}

{pstd}
    Note that {helpb udiff##postest:estat rescale} is only supported for unidiff terms
    that contain a single categorical layer variable.

{marker exlambda}{...}
{dlgtab:Lambda coefficients and kappa indices}

{pstd}
    To obtain lambda coefficients (see Pisati 2000) you can apply command
    {helpb udiff##postest:estat lambda} after
    model estimation:

        . {stata "use http://www.stata.com/stb/stb55/sg142/example1.dta, clear"}
        . {stata udiff son i.father i.country [fweight=obs], eform base}
        . {stata estat lambda, stdize eform compact}

{pstd}
    The {cmd:compact} option has been specified to display the coefficients
    in a two-way table. This means that standard errors are not
    shown. Omit the {cmd:compact} option if you are interested in the standard
    errors or confidence intervals.

{pstd}
    The kappa indices, which are based on standardized lambda coefficients, can
    be obtained as follows:

        . {stata estat kappa}

{pstd}
    Note that {helpb udiff##postest:estat lambda} and {helpb udiff##postest:kappa} are
    only supported for unidiff terms
    that contain a single categorical layer variable and a single categorical
    predictor.

{marker exconstr}{...}
{dlgtab:Specifying constraints}

{pstd}
    In case of empty cells or similar problems, it may be necessary to specify
    constraints for the model to converge. Using the same data as above, assume
    that the combinations of father = "NonManual" and son = "Farm" is missing:

        . {stata "use http://www.stata.com/stb/stb55/sg142/example2.dta, clear"}
        . {stata replace obs = 0 if son==3 & father==1}

{pstd}
    To make {cmd:udiff} converge in this example, we can set the parameter
    for "NonManual" in the psi-equation for "Farm" to zero (while at the same time
    making sure that "NonManual" is not used as the base category). The following
    commands would do:

        . {stata "constraint 1 [Psi_3]: 1.father"}
        . {stata udiff son ib2.father i.country [fweight=obs], allequations constraints(1)}

{marker exfit}{...}
{dlgtab:Testing model fit}

{pstd}
    To test the fit of the unidiff model, a likelihood-ratio test against a
    saturated model can be performed, where the saturated model is a
    fully-interacted multinomial logit. A significant test statistic would
    indicate, that the saturated model fits the data significantly better than the
    unidiff model. An example is as follows:

        . {cmd:use http://www.stata.com/stb/stb55/sg142/example1.dta, clear}
        . {cmd:udiff son i.father i.country [fweight=obs]}
        . {cmd:estimates store udiff}
        . {cmd:mlogit son i.father##i.country [fweight=obs]}
        . {cmd:lrtest udiff ., force}

{pstd}
    Option {cmd:force} is needed because different estimation commands have been
    used to estimate the two models.

{pstd}
    Be aware that the likelihood-ratio test is only valid in case of simple
    random sampling. Do not use the test with complex samples, i.e., if
    sampling weights or the {cmd:svy} prefix have been specified.

{marker excontinuous}{...}
{dlgtab:Continuous origin variables}

{pstd}
    Assume that, apart from the categorical information on father's class,
    your data also contains a continuous origin variable such as father's ISEI score
    ({cmd:fisei}). Such information could easily be included in the model by adding the
    variable to the list of predictors in the unidiff term:

        . {cmd:udiff son (i.father fisei <- i.country)}

{marker exmultiple}{...}
{dlgtab:Multiple unidiff terms}

{pstd}
    Assume your data also contains information on mothers. You could include this
    information in the unidiff model, for example, as follows:

        . {cmd:udiff son (i.father i.mother <- i.country)}

{pstd}
    In this case, a single unidiff scaling factor would be used for both the effects of
    fathers and the effects of mothers. To use different unidiff factors and thus
    allow the effects of father and mothers to vary differently across countries, you
    could type:

        . {cmd:udiff son (i.father <- i.country) (i.mother <- i.country)}

{marker excont}{...}
{dlgtab:Continuous layer variables}

{pstd}
    The layer variable(s) do not need to be categorical. For example, if you have
    individual-level data containing information on the birth years
    of the respondents, you could model the layer effects
    as a parabolic function of the birth year to analyze how social mobility changes over
    time. To avoid convergence issues it is
    a good idea to center the birth years at a date that actually exists in the data. For
    example, define {cmd:cohort} = (birth year - 1980) and then type

        . {cmd:udiff son (i.father <- c.cohort##c.cohort)}

{pstd}
    Likewise, you could model the layer effects in terms of country characteristics:

        . {stata "use http://www.stata.com/stb/stb55/sg142/example2.dta, clear"}
        . {stata udiff son (i.father <- develop socdem i.east i.asia) [fweight=obs]}

{pstd}
    Statistical inference may not be credible in this example and we might want to
    cluster on countries:

        . {stata udiff son (i.father <- develop socdem i.east i.asia) [fweight=obs], cluster(country)}

{pstd}
    No value for the joint Wald test of the unidiff parameters (i.e. the
    test against the constant-fluidity model) is reported in this case due to the
    way how {helpb ml} (the underlying command used for model estimation) determines the
    degrees of freedom for the test. You can obtain the test using the
    {helpb test} command after model estimation:

        . {stata test [Phi]}

{pstd}
    However, note that the number of countries is small. Cluster-robust
    standard errors may be inconsistent in such a setting (a general recommendation
    is that the number of clusters should be at least 40 or 50).

{marker excontrol}{...}
{dlgtab:Control variables}

{pstd}
    Assume that the age structure (or distribution of birth years) is different
    across countries and you want to take account of that in your analysis. You could,
    for example, type

        . {cmd:udiff son (i.father <- i.country) age}

{pstd}
    In this way a an age effect that is common to all countries is included
    in the model. You could, of course, also use a more complex specification,
    such as, e.g.,

        . {cmd:udiff son (i.father <- i.country) c.age##c.age}


{marker methods}{...}
{title:Methods and formulas}

{dlgtab:The unidiff model}

{pstd}
    The unidiff model is typically used to study differences in
    intergenerational social mobility between birth cohorts or countries. Let
    {it:mu}(x,y,z) be the cell frequencies in a three-way table of X (origin
    class, e.g. class of parents) by Y (destination class, e.g. class of
    children) by Z (e.g. cohort). Lowercase x, y, and z denote the
    levels of X, Y, and Z. In a saturated log-linear model the cell
    frequencies are parametrized as

        ln {it:mu}(x,y,z) = {it:a} + {it:a}(x) + {it:a}(y) + {it:a}(z) + {it:a}(x,y) + {it:a}(x,z) + {it:a}(y,z) + {it:a}(x,y,z)

{pstd}
    where {it:a} is an overall intercept capturing the average cell frequency,
    {it:a}(x), {it:a}(y), and {it:a}(z) are factors capturing the marginal distributions
    of X, Y, and Z, {it:a}(x,y), {it:a}(x,z), and {it:a}(y,z)
    capture two-way associations, and {it:a}(x,y,z) captures the three-way
    association. For example, if X, Y, and Z are independent from each other,
    {it:a}(x,y), {it:a}(x,z), {it:a}(y,z), and {it:a}(x,y,z) will be zero for
    all x, y, and z. Likewise, if the association between X and Y is
    constant over cohorts, {it:a}(x,y,z) will be zero for all x, y, and z,
    such that

        ln {it:mu}(x,y,z) = {it:a} + {it:a}(x) + {it:a}(y) + {it:a}(z) + {it:a}(x,y) + {it:a}(x,z) + {it:a}(y,z)

{pstd}
    This is the so-called constant-fluidity model. The saturated
    model accurately describes the data, but has too many parameters
    to be informative; the constant-fluidity model is too
    simplistic because it assumes away any change in relative mobility. The unidiff
    model takes a middle ground in that it allows the association between X and Y
    to vary with Z, but places a specific restriction on the form of this
    variation. In particular, the unidiff model introduces a scaling factor
    {it:b}(z) such that

        ln {it:mu}(x,y,z) = {it:a} + {it:a}(x) + {it:a}(y) + {it:a}(z) + {it:a}(x,z) + {it:a}(y,z) + {it:a}(x,y) * {it:b}(z)

{pstd}
    That is, the unidiff model assumes that there is a common association pattern
    between X and Y, but the "strength" of the pattern can differ across
    cohorts.

{dlgtab:Re-expression at the individual level}

{pstd}
    Traditionally, the unidiff model has been estimated from tabular data.
    However, the model (or, at least, the interesting part of it) can also be
    expressed such that it takes the form of a regression model fitted to
    individual-level data. From a perspective with Y as the "dependent"
    variable, the saturated log-linear model is equivalent to a multinomial
    logit of Y on X, Z, and the interaction between X and Z, where X and Z are
    treated as factor variables. Likewise, the constant-fluidity model is a
    multinomial logit of Y on X and Z, without interaction between X and Z.
    Furthermore, the unidiff model is equivalent to a multinomial logit written
    as

        Pr(Y = y| X, Z) = exp(W'{it:theta}(y) + X'{it:psi}(y) * exp(Z'{it:phi})) / D

{pstd}
    where D is the sum of the expression in the numerator across all levels of
    Y, and W is equal to Z augmented by a constant, i.e. W = (1,Z')' (again, X
    and Z are treated as factor variables, i.e. think of X and Z as vectors
    of dummy variables). {it:theta}(y), {it:phi},
    and {it:psi}(y) are parameter vectors; {it:phi} is common to all levels of
    Y, {it:theta}(y) and {it:psi}(y) are level-specific. In this model,
    {it:theta}(y) represents {it:a}(y) and {it:a}(y,z) (the marginal
    distribution of Y as well as the main effects of Z, i.e. how the marginal
    distribution of Y depends on Z), exp({it:phi}) represents {it:b}(z) (the
    unidiff scaling factors), and {it:psi}(y) represents {it:a}(x,y) (the
    association between X and Y). Terms {it:a}(x) (marginal distribution of X),
    {it:a}(z) (marginal distribution of Z), {it:a}(x,z) (association between X
    and Z) are not represented in the model (i.e., the model only contains
    parameters that are related to Y).

{dlgtab:Generalization: multiple unidiff terms}

{pstd}
    Generally seen, the unidiff model is just a multinomial logit model
    that contains a special kind of interaction terms. The model may thus be
    useful also for research questions that have nothing to do with social
    mobility. Furthermore, the model can be generalized so that it contains
    multiple unidiff terms. Let X1 and X2 be two sets of independent
    variables, Z1 and Z2 two sets of layer variables, and C a set of
    control variables that are not interacted with Z1 or Z2. The model can then be
    written as:

{p 8 8 2}Pr(Y = y| X1, Z1, X2, Z2, C) ={p_end}
{p 12 12 2}exp(W'{it:theta}(y) + X1'{it:psi1}(y) * exp(Z1'{it:phi1}) + X2'{it:psi2}(y) * exp(Z2'{it:phi2})) / D{p_end}

{pstd}
    where W = (1, Z1', Z2', C')'. The model can be extended analogously
    to accommodate more than two unidiff terms.

{dlgtab:Estimation}

{pstd}
    {cmd:udiff} estimates the unidiff model using {helpb ml}. To obtain good
    starting values, {cmd:udiff} first fits a constant-fluidity model (which is
    equivalent to a standard {helpb mlogit} model ignoring the layer
    variables). A test of the unidiff model against the constant-fluidity model
    is included in the output (as an LR test or a Wald test, depending on
    context).

{pstd}
    As usual in a multinomial logit, the coefficients are set to zero for one
    of the levels of Y to identify the model. Furthermore, as is usual for factor variables,
    {it:phi} is set to zero for one of the levels of Z if Z is a categorical variable. exp({it:phi})
    then expresses the unidiff scaling factors with respect to this base category.

{pstd}
    Estimating the unidiff model from individual-level data is more demanding
    than fitting the model to a contingency table (although note that, for
    efficient computation, {cmd:fweight}s can be used on collapsed data),
    but it brings about enhanced flexibility. For example, it is easily
    possible to include continuous (rather than categorical) origin and layer
    variables, control variables whose effects as assumed constant over cohorts
    can be taken into account (by including them in W), and standard errors for
    the parameter estimates are readily available (including support for
    sampling weights or other characteristics of a complex survey design).


{marker saved_results}{...}
{title:Saved results}

{pstd}
    {cmd:udiff} stores results as described in {helpb ml##results:[R] ml},
    as well as the following elements:

{p2colset 7 22 26 2}{...}
{p2col 5 22 26 2: Scalars}{p_end}
{p2col : {cmd:e(k_out)}}number of outcomes
    {p_end}
{p2col : {cmd:e(ibaseout)}}index of the base outcome
    {p_end}
{p2col : {cmd:e(k_unidiff)}}number of unidiff terms
    {p_end}
{p2col : {cmd:e(k_eform)}}number of equations to be affected by the {cmd:eform} option
    {p_end}

{p2col 5 22 26 2: Macros}{p_end}
{p2col : {cmd:e(cmd)}}{cmd:udiff}
    {p_end}
{p2col : {cmd:e(predict)}}{cmd:udiff_p}
    {p_end}
{p2col : {cmd:e(estat_cmd)}}{cmd:udiff_estat}
    {p_end}
{p2col : {cmd:e(cfonly)}}{cmd:cfonly} or empty
    {p_end}
{p2col : {cmd:e(layervars)}}names of layer variables; if {cmd:e(k_unidiff)}=1
    {p_end}
{p2col : {cmd:e(layervars#)}}names of layer variables of #th unidiff term; if {cmd:e(k_unidiff)}>1
    {p_end}
{p2col : {cmd:e(xvars)}}names of independent variables; if {cmd:e(k_unidiff)}=1
    {p_end}
{p2col : {cmd:e(xvars#)}}names of independent variables of #th unidiff term; if {cmd:e(k_unidiff)}>1
    {p_end}
{p2col : {cmd:e(controlvars)}}names of control variables
    {p_end}
{p2col : {cmd:e(eqnames)}}names of equations
    {p_end}
{p2col : {cmd:e(out)}}values of {it:depvar}
    {p_end}
{p2col : {cmd:e(baseout)}}value of {it:depvar} treated as the base outcome
    {p_end}
{p2col : {cmd:e(out_labels)}}value labels of {it:depvar} (if available)
    {p_end}

{pstd}
    Without the {cmd:post} option, {cmd:estat rescale}, {cmd:estat lambda}, and {cmd:estat kappa} store the following
    results in {cmd:r()}:

{p2colset 7 22 26 2}{...}
{p2col 5 22 26 2: Scalars}{p_end}
{p2col : {cmd:r(N)}}number of observations
    {p_end}

{p2col 5 22 26 2: Matrices}{p_end}
{p2col : {cmd:r(b)}}coefficients
    {p_end}
{p2col : {cmd:r(V)}}variance matrix
    {p_end}
{p2col : {cmd:r(lambda)}}compact representation coefficients ({cmd:estat lambda} only)
    {p_end}

{pstd}
    Without the {cmd:post} option, {cmd:estat rescale},
    {cmd:estat lambda}, and
    {cmd:estat kappa}
    store the following results in {cmd:e()}:

{p2colset 7 22 26 2}{...}
{p2col 5 22 26 2: Scalars}{p_end}
{p2col : {cmd:r(N)}}number of observations
    {p_end}
{p2col : {cmd:r(N_clust)}}number of clusters (if {it:vcetype} is {cmd:cluster})
    {p_end}
{p2col : {cmd:r(k_eq)}}number of equations
    {p_end}
{p2col : {cmd:e(k_eform)}}number of equations to be affected by the {cmd:eform} option
    {p_end}

{p2col 5 22 26 2: Macros}{p_end}
{p2col : {cmd:e(cmd)}}{cmd:udiff_estat}
    {p_end}
{p2col : {cmd:e(subcmd)}}{cmd:rescale} or {cmd:lambda}
    {p_end}
{p2col : {cmd:e(estat_cmd)}}{cmd:udiff_estat}
    {p_end}
{p2col : {cmd:e(title)}}title used in output
    {p_end}
{p2col : {cmd:e(vce)}}{it:vcetype} as specified when calling {cmd:udiff}
    {p_end}
{p2col : {cmd:e(vcetype)}}title used to label Std. Err.
    {p_end}
{p2col : {cmd:e(clustvar)}}name of cluster variable
    {p_end}
{p2col : {cmd:e(properties)}}{cmd:b V}
    {p_end}

{p2col 5 22 26 2: Matrices}{p_end}
{p2col : {cmd:e(b)}}coefficients
    {p_end}
{p2col : {cmd:e(V)}}variance matrix
    {p_end}
{p2col : {cmd:r(lambda)}}compact representation coefficients ({cmd:estat lambda} only)
    {p_end}


{marker references}{...}
{title:References}

{phang}
    Erikson, R., J.H. Goldthorpe. 1992. The Constant Flux: A Study of Class
    Mobility in Industrial Societies. Oxford: Oxford University Press.
    {p_end}
{phang}
    Pisati, M. 2000. {stata "net describe sg142, from(http://www.stata.com/stb/stb55)":sg142}: Uniform
    layer effect models for the analysis of differences in two-way associations. Stata
    Technical Bulletin 55: 33-47.
    {p_end}
{phang}
    Xie, Y. 1992. The Log-Multiplicative Layer Effect Model for Comparing Mobility
    Tables. American Sociological Review 57(3): 380–395.
    {p_end}


{marker authors}{...}
{title:Authors}

{pstd}
    Ben Jann, University of Bern, ben.jann@soz.unibe.ch
    {p_end}
{pstd}
    Simon Seiler, University of Bern, simon.seiler@icer.unibe.ch

{pstd}
    Thanks for citing this software as follows:

{pmore}
    Jann, B., S. Seiler. 2019. udiff: Stata module to estimate the generalized
    unidiff model for individual-level data. Available from
    {browse "http://ideas.repec.org/c/boc/bocode/s458711.html"}.