{smcl}
{* 31maj2022}{...}
{hline}
help {hi:cwmglm}

{title:Title}

{cmd:cwmglm} - Cluster Weighted Model for Generalized Linear Models

{marker syntax}{...}
{title:Syntax}

{p 8 17 2}
{cmd:cwmglm} {it:{help varname:depvar indepvars}} {ifin}, {cmdab:post:erior(stub)}
[{cmd:start(}{it:svmethod}{cmd:)} {cmd:k(#)} {cmdab:iter:ate(#)} {help cwmglm##xnormal_opts:{it:xnormal_opts}}
{cmdab:xn:ormal(varlist)} {cmdab:xpoi:sson(varlist)} {cmdab:xbin:omial(varlist)} {cmdab:xmult:inomial(varlist)}
{cmd:family(}{it:familyname}{cmd:)} {cmd:initial(varlist)}
{cmdab:nd:raw(#)} {cmdab:iteratex:norm(#)} {cmdab:conv:crit(#)}]

{synoptset 22 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Main Options}
{synopt:{opt posterior(stub)}} generates posterior probabilities. Required option. {p_end}
{synopt:{opt k(#)}} the number of mixture components. Default is 2. {p_end}

{syntab:Marginalization Options}
{synopt:{opt xnormal}({help varname:varlist})} normal covariates (marginal distribution) {p_end}
{synopt:{opt xbinomial}({help varname:varlist})} binomial covariates (marginal distribution) {p_end}
{synopt:{opt xmultinomial}({help varname:varlist})} multinomial covariates (marginal distribution) {p_end}
{synopt:{opt xpoisson}({help varname:varlist})} Poisson covariates (marginal distribution) {p_end}
{synopt:{opt xnormal_opts}} parsimonious models for multivariate normal covariates. The possible options are eii, vii, eei, vei, evi, vvi, eee, vee, eve, vve, eev, vev, evv and vvv. {p_end}

{syntab:Regression Options}
{synopt:{opt family}({it:familyname})} family for the GLM. The allowed families are {it:gaussian}, {it:poisson} and {it:binomial}. {p_end}

{syntab:Initialization}
{synopt:{opt start}({it:svmethod})} sets the initialization method. Allowed methods are {it:kmeans}, {it:randomid}, {it:randompr} and {it:custom}. {p_end}
{synopt:{opt initial}({help varname:varlist})} starting values of class membership.
Applies only if {cmd:start(custom)} is specified. {p_end}

{syntab:Maximization options}
{synopt:{opt iterate(#)}} the number of iterations of the EM algorithm {p_end}
{synopt:{opt iteratexnorm(#)}} the number of iterations for the maximization of parsimonious models {p_end}
{synopt:{opt convcrit(#)}} the stopping criterion for the Aitken acceleration. Default is 1e-5. {p_end}
{synoptline}
{p2colreset}{...}
{p 4 6 2}{it:indepvars} may contain factor variable operators; see {help fvvarlist}.{p_end}

{marker weight}{...}
{p 4 6 2}
See {cmd: cwmglm postestimation} for features available after estimation. {p_end}

{title:Description}

{cmd:cwmglm} fits cluster-weighted models, that is, mixtures of regression models with random covariates, by maximum likelihood using the expectation-maximization (EM) algorithm.

{title:Options}

{synoptset 30 tabbed}{...}
{dlgtab:Main options}

{synopt: {opt posterior(stub)}} Required option. Generates a set of posterior group probabilities (one variable for each mixture component). {p_end}
{synopt: {opt k(#)}} the number of mixture components. Default is 2. {p_end}

{dlgtab:Marginalization options}

{synopt: {opt xnormal(varlist)}} variables having normal distributions {p_end}
{synopt: {opt xpoisson(varlist)}} variables having Poisson distributions {p_end}
{synopt: {opt xbinomial(varlist)}} variables having binomial distributions {p_end}
{synopt: {opt xmultinomial(varlist)}} variables having multinomial distributions. Factor variable syntax is not allowed. Categories are detected automatically. {p_end}
{synopt: {it: xnormal_opts}} indicates the parsimonious model to be fitted in {opt xnormal(varlist)}. If {opt xnormal(varlist)} contains a single variable, the possible options are {opt eee} and {opt vvv}. Default is {opt vvv}.
(See {help cwmglm##refrlink:Celeux and Govaert, 1995}.) The possible multivariate normal models are the following: {p_end}

{marker xnormal_opts}{...}
{pstd}
{synopt: {opt eii}} Equal volume, spherical shape {p_end}
{synopt: {opt vii}} Variable volume, spherical shape {p_end}
{synopt: {opt eei}} Equal volume, equal shape, axis-aligned orientation {p_end}
{synopt: {opt vei}} Variable volume, equal shape, axis-aligned orientation {p_end}
{synopt: {opt evi}} Equal volume, variable shape, axis-aligned orientation {p_end}
{synopt: {opt vvi}} Variable volume, variable shape, axis-aligned orientation {p_end}
{synopt: {opt eee}} Equal volume, equal shape, equal orientation {p_end}
{synopt: {opt vee}} Variable volume, equal shape, equal orientation {p_end}
{synopt: {opt eve}} Equal volume, variable shape, equal orientation {p_end}
{synopt: {opt vve}} Variable volume, variable shape, equal orientation {p_end}
{synopt: {opt eev}} Equal volume, equal shape, variable orientation {p_end}
{synopt: {opt vev}} Variable volume, equal shape, variable orientation {p_end}
{synopt: {opt evv}} Equal volume, variable shape, variable orientation {p_end}
{synopt: {opt vvv}} Variable volume, variable shape, variable orientation {p_end}

{dlgtab:Regression options}

{synopt: {opt family(familyname)}} specifies the distribution of {help varname:depvar} for the GLM (see {help glm}). {cmd:family(gaussian)} (identity link) is the default. The other allowed distributions are {cmd:family(binomial)} (logit link) and {cmd:family(poisson)} (log link). {p_end}

{dlgtab:Initialization options}

{synopt: {opt start(svmethod)}} specifies the initialization procedure for the component membership probabilities or the component memberships. {p_end}
{synopt: {opt ndraws(#)}} specifies the number of random draws for selecting the starting values if {opt start(randompr)} or {opt start(randomid)} is specified.
The starting values that yield the highest log-likelihood value from the EM iterations are selected. Default is 10. {p_end}
{phang2}
{opt start(kmeans)} specifies that starting values are computed by assigning each observation to an initial latent class that is determined by running a {opt kmeans} cluster analysis on {it:{help varname:depvar indepvars}}. This is the default. {p_end}
{phang2}
{opt start(randomid)} specifies that starting values are computed by randomly assigning observations to initial classes. {p_end}
{phang2}
{opt start(randompr)} specifies that starting values are computed by randomly assigning initial class probabilities. {p_end}
{phang2}
{opt start(custom)} specifies that starting values are provided by the user. {p_end}
{phang2}
{opt initial(varlist)} starting values of class membership. {it:varlist} must contain a list of k numeric variables. This option is ignored unless {opt start(custom)} is specified. {p_end}

{dlgtab:Maximization options}

{synopt: {opt iterate(#)}} the number of EM iterations. Default is 1200. {p_end}
{synopt: {opt iteratexnorm(#)}} the number of iterations for the parsimonious models (see the {cmd:xnormal(varlist)} and {help cwmglm##xnormal_opts:{it:xnormal_opts}} options). It affects only the estimation of the vee, eve, vve, vev and vei models. Default is 200. {p_end}
{synopt: {opt convcrit(#)}} the stopping criterion for the Aitken acceleration procedure. Default threshold is 1e-5.
{p_end}

{title:Saved Results}

{synoptset 15 tabbed}{...}
{phang2}{p_end}
{p2col 5 11 15 2:Scalars}{p_end}
{synopt:{cmd:e(k)}} number of mixture components {p_end}
{synopt:{cmd:e(N)}} number of observations {p_end}
{synopt:{cmd:e(df_r)}} number of estimated parameters {p_end}
{synopt:{cmd:e(ll)}} log likelihood {p_end}
{synopt:{cmd:e(bic)}} Bayesian information criterion (BIC) {p_end}
{synopt:{cmd:e(aic)}} Akaike information criterion (AIC) {p_end}
{synopt:{cmd:e(nmulti)}} number of multinomial concomitant variables {p_end}

{synoptset 15 tabbed}{...}
{phang2}{p_end}
{p2col 5 11 15 2:Matrices}{p_end}
{synopt:{cmd:e(b)}} coefficient vector of the GLM {p_end}
{synopt:{cmd:e(V)}} variance-covariance matrix of the GLM {p_end}
{synopt:{cmd:e(phi0)}} dispersion parameter for the GLM (see {cmd: help glm}) {p_end}
{synopt:{cmd:e(cl_table)}} estimated group sizes {p_end}
{synopt:{cmd:e(localdeviance)}} within deviance decomposition matrix for the GLM {p_end}
{synopt:{cmd:e(globaldeviance)}} the overall residual deviance, the overall explained deviance, the between deviance and the total deviance {p_end}
{synopt:{cmd:e(R2)}} generalized coefficients of determination for the GLM {p_end}
{synopt:{cmd:e(prior)}} mixture component weights {p_end}
{synopt:{cmd:e(p_multi_#)}} probabilities of each outcome for the {cmd:xmultinomial} variables
(returns n matrices, where n is the number of multinomial variables) {p_end}
{synopt:{cmd:e(p_binomial)}} probabilities of a positive outcome for the {cmd:xbinomial} variables {p_end}
{synopt:{cmd:e(lambda)}} means of the {cmd:xpoisson} variables {p_end}
{synopt:{cmd:e(mu)}} means of the {cmd:xnorm} variables {p_end}
{synopt:{cmd:e(epsilon)}} variance-covariance matrices of the {cmd:xnorm} variables {p_end}
{synopt:{cmd:e(ic)}} AIC and BIC {p_end}

{synoptset 15 tabbed}{...}
{phang2}{p_end}
{p2col 5 11 15 2:Macros}{p_end}
{synopt:{cmd:e(depvar)}} the dependent variable {p_end}
{synopt:{cmd:e(indepvars)}} list of covariates for the regression model {p_end}
{synopt:{cmd:e(cmd)}} {cmd:cwmglm} {p_end}
{synopt:{cmd:e(xnorm)}} the variables with normal marginalization {p_end}
{synopt:{cmd:e(xnormodel)}} the parsimonious model used for the normal marginalization {p_end}
{synopt:{cmd:e(xpoisson)}} the variables with Poisson marginalization {p_end}
{synopt:{cmd:e(xbinomial)}} the variables with binomial marginalization {p_end}
{synopt:{cmd:e(xmultinomial)}} the variables with multinomial marginalization {p_end}
{synopt:{cmd:e(glmcmd)}} the command used for the GLM {p_end}

{synoptset 15 tabbed}{...}
{phang2}{p_end}
{p2col 5 11 15 2:Functions}{p_end}
{synopt:{cmd:e(sample)}} marks estimation sample {p_end}

{title:Examples}

{pstd}Setup{p_end}
{phang2}{cmd:. use students, clear}{p_end}
{phang2}{cmd:. describe}{p_end}

{pstd}Mixture of regressions with random covariates, model EEE{p_end}
{phang2}{cmd:. cwmglm weight height heightf, k(2) posterior(z) xnormal(height heightf) eee}{p_end}

{hline}
{pstd}Setup{p_end}
{phang2}{cmd:. use multinorm, clear}{p_end}

{pstd}Preparing the loop for model selection by information criteria{p_end}
{phang2}{cmd:. local models vev evv vvv eei vei evi vvi eii vii eee vee eve vve eev}{p_end}
{phang2}{cmd:. local bestbic=10e20}{p_end}
{phang2}{cmd:. local bestAIC=10e20}{p_end}
{phang2}{cmd:. cap matrix drop res}{p_end}

{pstd}Looping over the parsimonious multivariate normal models and letting k range from 2 to 5{p_end}
{phang2}{cmd:. foreach model of local models {c -(}}{p_end}
{phang3}{cmd:. forval i=2/5 {c -(}}{p_end}
{phang3}{cmd:. cap drop _tau*}{p_end}
{phang3}{c 47}{c 47} note the absence of {it:depvar indepvars}{p_end}
{phang3}{cmd:. quietly cwmglm, xnorm(x1 x2) k(`i') posterior(_tau) `model'}{p_end}
{phang3}{space 4}{cmd:. if (e(converged)==1) {c -(}}{p_end}
{phang3}{space 4}{cmd:. matrix ic=(e(ic),`i', e(ll))}{p_end}
{phang3}{space 4}{cmd:. matrix rownames ic= "`model'"}{p_end}
{phang3}{space 4}{cmd:. matrix res = nullmat(res) \ ic}{p_end}
{phang3}{space 4}{cmd:. local current_BIC=e(ic)[1,2]}{p_end}
{phang3}{space 4}{cmd:. if (`current_BIC'<`bestbic') {c -(}}{p_end}
{phang3}{space 8}{cmd:. local bestbic=`current_BIC'}{p_end}
{phang3}{space 8}{cmd:. local bestk_BIC=`i'}{p_end}
{phang3}{space 8}{cmd:. local bestmodel_BIC `model'}{p_end}
{phang3}{space 8}{cmd:. {c )-}}{p_end}
{phang3}{space 4}{cmd:. local current_AIC=e(ic)[1,1]}{p_end}
{phang3}{space 4}{cmd:. if (`current_AIC'<`bestAIC') {c -(}}{p_end}
{phang3}{space 8}{cmd:. local bestAIC=`current_AIC'}{p_end}
{phang3}{space 8}{cmd:. local bestk_AIC=`i'}{p_end}
{phang3}{space 8}{cmd:. local bestmodel_AIC `model'}{p_end}
{phang3}{space 8}{cmd:. {c )-}}{p_end}
{phang3}{space 4}{cmd:. {c )-}}{p_end}
{phang3}{space 4}{cmd:. else di in red "model `model' with `i' mixture components did not converge"}{p_end}
{phang3}{cmd:. {c )-}}{p_end}
{phang2}{cmd:. {c )-}}{p_end}
{phang2}{cmd:. di as result "best model according to BIC: k=`bestk_BIC' type `bestmodel_BIC'"}{p_end}
{phang2}{cmd:. di as result "best model according to AIC: k=`bestk_AIC' type `bestmodel_AIC'"}{p_end}

{title:References}

{marker refrlink}{...}
Celeux, G., & Govaert, G. (1995). {browse "https://www.sciencedirect.com/science/article/pii/0031320394001256":Gaussian parsimonious clustering models}. Pattern Recognition, 28(5), 781-793.

Ingrassia, S., Punzo, A., Vittadini, G., & Minotti, S. C. (2015). {...}
{browse "https://link.springer.com/article/10.1007/s00357-015-9177-z":Erratum to: The generalized linear mixed cluster-weighted model}. Journal of Classification, 32(2), 327-355.

{title:Authors}

{phang}
Daniele Spinelli, corresponding author (University of Milano-Bicocca, daniele.spinelli@unimib.it) {p_end}
{phang}
Salvatore Ingrassia (University of Catania, s.ingrassia@unict.it) {p_end}
{phang}
Giorgio Vittadini (University of Milano-Bicocca, giorgio.vittadinid@unimib.it) {p_end}