{smcl}
{* 24oct2023}{...}
{hline}
help {hi:cwmglm}

{title:Title}

{cmd:cwmglm} - Cluster Weighted Model for Generalized Linear Models

{marker syntax}{...}
{title:Syntax}

{p 8 17 2}
{cmd:cwmglm} {it:{help varname:depvar indepvars}} {ifin}, {cmdab:post:erior(stub)} [{cmd:start(}{it:svmethod}{cmd:)} {cmd:k(#)} {cmdab:iter:ate(#)} {help cwmglm##xnormal_opts:{it:xnormal_opts}} {cmdab:xn:ormal(varlist)} {cmdab:xpoi:sson(varlist)} {cmdab:xbin:omial(varlist)} {cmdab:xmult:inomial(varlist)} {cmdab:nd:raw(#)} {cmdab:iteratex:norm(#)} {cmdab:conv:crit(#)} {cmd:nolog} {cmdab:nocl:ustertable} {cmdab:nodev:iance} {cmdab:nomar:ginal} {cmdab:noregt:able}]

{synoptset 22 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Main Options}
{synopt:{opt k(#)}} the number of mixture components. Default is 2. {p_end}

{syntab:Marginalization Options}
{synopt:{opt xnormal}({help varname:varlist})} normal covariates (marginal distribution) {p_end}
{synopt:{opt xbinomial}({help varname:varlist})} binomial covariates (marginal distribution) {p_end}
{synopt:{opt xmultinomial}({help varname:varlist})} multinomial covariates (marginal distribution) {p_end}
{synopt:{opt xpoisson}({help varname:varlist})} Poisson covariates (marginal distribution) {p_end}
{synopt:{opt xnormal_opts}} parsimonious models for multivariate normal covariates. The possible options are eii, vii, eei, vei, evi, vvi, eee, vee, eve, vve, eev, vev, evv, and vvv {p_end}

{syntab:Regression Options}
{synopt:{opt family}({it:familyname})} family for the GLM. The allowed families are {it:gaussian}, {it:poisson}, and {it:binomial} {p_end}

{syntab:Initialization}
{synopt:{opt start}({it:svmethod})} sets the initialization method. Allowed methods are {it:kmeans}, {it:randomid}, {it:randompr}, and {it:custom} {p_end}
{synopt:{opt initial}({help varname:varlist})} starting values of class membership.
Applies only if start({it:custom}) is specified {p_end}

{syntab:Maximization options}
{synopt:{opt iterate(#)}} the number of iterations of the EM algorithm {p_end}
{synopt:{opt iteratexnorm(#)}} the number of iterations for the maximization of parsimonious models {p_end}
{synopt:{opt convcrit(#)}} the stopping criterion for the Aitken acceleration. Default is 1e-5 {p_end}

{syntab:Display options}
{synopt:{opt nolog}} suppresses the iteration log {p_end}
{synopt:{opt noclustertable}} requests {cmd:cwmglm} not to display the clustering table {p_end}
{synopt:{opt nodeviance}} requests {cmd:cwmglm} not to display the deviance measures {p_end}
{synopt:{opt nomarginal}} requests {cmd:cwmglm} not to display the parameters of the marginal distributions {p_end}
{synopt:{opt noregtable}} requests {cmd:cwmglm} not to display the regression table {p_end}
{synoptline}
{p2colreset}{...}
{p 4 6 2}{it:indepvars} may contain factor variable operators; see {help fvvarlist}.{p_end}
{marker weight}{...}
{p 4 6 2}
See {cmd:cwmglm postestimation} for features available after estimation. {p_end}

{title:Description}

{pstd}
{cmd:cwmglm} fits mixtures of regression models with random covariates by maximum likelihood, using the expectation-maximization (EM) algorithm. {p_end}

{title:Options}

{synoptset 30 tabbed}{...}
{dlgtab:Main options}
{synopt: {opt k(#)}} the number of mixture components. Default is 2; the minimum is 1. {p_end}

{dlgtab:Marginalization options}
{synopt: {opt xnormal(varlist)}} variables having normal distributions {p_end}
{synopt: {opt xpoisson(varlist)}} variables having Poisson distributions {p_end}
{synopt: {opt xbinomial(varlist)}} variables having binomial distributions. This option accepts only binary variables coded 0/1. {p_end}
{synopt: {opt xmultinomial(varlist)}} variables having multinomial distributions. Factor variable syntax is not allowed. Categories are detected automatically.
{p_end}
{synopt: {it:xnormal_opts}} indicates the parsimonious model to be fitted to the variables in {opt xnormal(varlist)} (see {help cwmglm##refrlink:Celeux and Govaert, 1995}). If {opt xnormal(varlist)} contains a single variable, the possible options are {opt eee} and {opt vvv}. Default is {opt vvv}. The possible multivariate normal models are the following: {p_end}
{marker xnormal_opts}{...}
{synopt: {opt eii}} Equal volume, spherical shape {p_end}
{synopt: {opt vii}} Variable volume, spherical shape {p_end}
{synopt: {opt eei}} Equal volume, equal shape, axis-aligned orientation {p_end}
{synopt: {opt vei}} Variable volume, equal shape, axis-aligned orientation {p_end}
{synopt: {opt evi}} Equal volume, variable shape, axis-aligned orientation {p_end}
{synopt: {opt vvi}} Variable volume, variable shape, axis-aligned orientation {p_end}
{synopt: {opt eee}} Equal volume, equal shape, equal orientation {p_end}
{synopt: {opt vee}} Variable volume, equal shape, equal orientation {p_end}
{synopt: {opt eve}} Equal volume, variable shape, equal orientation {p_end}
{synopt: {opt vve}} Variable volume, variable shape, equal orientation {p_end}
{synopt: {opt eev}} Equal volume, equal shape, variable orientation {p_end}
{synopt: {opt vev}} Variable volume, equal shape, variable orientation {p_end}
{synopt: {opt evv}} Equal volume, variable shape, variable orientation {p_end}
{synopt: {opt vvv}} Variable volume, variable shape, variable orientation {p_end}

{dlgtab:Regression options}
{synopt: {opt family(familyname)}} specifies the distribution of {help varname:depvar} for the GLM (see {help glm}). {cmd:family(gaussian)} (identity link) is the default. The other allowed distributions are {cmd:family(binomial)} (logit link) and {cmd:family(poisson)} (log link). {p_end}

{dlgtab:Initialization options}
{synopt: {opt start(svmethod)}} specifies the initialization procedure for the component membership probabilities or the component memberships.
{p_end}
{phang2} {opt start(kmeans)} specifies that starting values are computed by assigning each observation to an initial latent class determined by running a {opt kmeans} cluster analysis on {it:{help varname:depvar indepvars}}. This is the default. {p_end}
{phang2} {opt start(randomid)} specifies that starting values are computed by randomly assigning observations to initial classes. {p_end}
{phang2} {opt start(randompr)} specifies that starting values are computed by randomly assigning initial class probabilities. {p_end}
{phang2} {opt start(custom)} specifies that starting values are provided by the user. {p_end}
{phang2} {opt initial(varlist)} starting values of class membership. {it:varlist} must contain k numeric variables. This option is ignored unless {opt start(custom)} is specified. {p_end}
{synopt: {opt ndraws(#)}} specifies the number of random draws used to select the starting values when {opt start(randompr)} or {opt start(randomid)} is specified. The starting values with the highest log-likelihood value from the EM iterations are selected. Default is 10. {p_end}

{dlgtab:Maximization options}
{synopt: {opt iterate(#)}} the number of EM iterations. Default is 1200. {p_end}
{synopt: {opt iteratexnorm(#)}} the number of iterations for the parsimonious models (see the {cmd:xnorm(varlist)} and {help cwmglm##xnormal_opts:{it:xnormal_opts}} options). It affects only the estimation of the vee, eve, vve, vev, and vei models. Default is 200. {p_end}
{synopt: {opt convcrit(#)}} the stopping criterion for the Aitken acceleration procedure. Default threshold is 1e-5.
{p_end}

{dlgtab:Display options}
{synopt:{opt nolog}} suppresses the iteration log {p_end}
{synopt:{opt noclustertable}} requests {cmd:cwmglm} not to display the clustering table {p_end}
{synopt:{opt nodeviance}} requests {cmd:cwmglm} not to display the deviance measures {p_end}
{synopt:{opt nomarginal}} requests {cmd:cwmglm} not to display the parameters of the marginal distributions {p_end}
{synopt:{opt noregtable}} requests {cmd:cwmglm} not to display the regression table {p_end}

{title:Saved Results}

{synoptset 15 tabbed}{...}
{phang2}{p_end}
{p2col 5 11 15 2:Scalars}{p_end}
{synopt:{cmd:e(k)}} the number of mixture components {p_end}
{synopt:{cmd:e(N)}} the number of observations {p_end}
{synopt:{cmd:e(df_r)}} the number of estimated parameters {p_end}
{synopt:{cmd:e(ll)}} log likelihood {p_end}
{synopt:{cmd:e(bic)}} Bayesian information criterion (BIC) {p_end}
{synopt:{cmd:e(aic)}} Akaike information criterion (AIC) {p_end}
{synopt:{cmd:e(nmulti)}} the number of multinomial concomitant variables {p_end}

{synoptset 15 tabbed}{...}
{phang2}{p_end}
{p2col 5 11 15 2:Matrices}{p_end}
{synopt:{cmd:e(b)}} coefficient vector of the GLM {p_end}
{synopt:{cmd:e(V)}} variance-covariance matrix of the GLM {p_end}
{synopt:{cmd:e(phi0)}} dispersion parameter for the GLM (see {cmd:help glm}) {p_end}
{synopt:{cmd:e(cl_table)}} estimated group size {p_end}
{synopt:{cmd:e(localdeviance)}} within deviance decomposition matrix for the GLM {p_end}
{synopt:{cmd:e(globaldeviance)}} the overall residual deviance, the overall explained deviance, the between deviance, and the total deviance {p_end}
{synopt:{cmd:e(R2)}} generalized coefficients of determination for the GLM {p_end}
{synopt:{cmd:e(prior)}} mixture component weights {p_end}
{synopt:{cmd:e(p_multi_#)}} probabilities of each outcome for the {cmd:xmultinomial} variables
(returns n matrices, where n is the number of multinomial variables) {p_end}
{synopt:{cmd:e(p_binomial)}} probabilities of a positive outcome for the {cmd:xbinomial} variables {p_end}
{synopt:{cmd:e(lambda)}} means of the {cmd:xpoisson} variables {p_end}
{synopt:{cmd:e(mu)}} means of the {cmd:xnorm} variables {p_end}
{synopt:{cmd:e(epsilon)}} variance-covariance matrices of the {cmd:xnorm} variables {p_end}
{synopt:{cmd:e(ic)}} AIC and BIC {p_end}

{synoptset 15 tabbed}{...}
{phang2}{p_end}
{p2col 5 11 15 2:Macros}{p_end}
{synopt:{cmd:e(depvar)}} the dependent variable {p_end}
{synopt:{cmd:e(indepvars)}} list of covariates for the regression model {p_end}
{synopt:{cmd:e(cmd)}} {cmd:cwmglm} {p_end}
{synopt:{cmd:e(xnorm)}} the variables with normal marginalization {p_end}
{synopt:{cmd:e(xnormodel)}} the parsimonious model used for the normal marginalization {p_end}
{synopt:{cmd:e(xpoisson)}} the variables with Poisson marginalization {p_end}
{synopt:{cmd:e(xbinomial)}} the variables with binomial marginalization {p_end}
{synopt:{cmd:e(xmultinomial)}} the variables with multinomial marginalization {p_end}
{synopt:{cmd:e(glmcmd)}} the command used for the GLM {p_end}

{synoptset 15 tabbed}{...}
{phang2}{p_end}
{p2col 5 11 15 2:Functions}{p_end}
{synopt:{cmd:e(sample)}} marks the estimation sample {p_end}

{title:Examples}

{pstd}Setup{p_end}
{phang2}{cmd:. use covid, clear}{p_end}
{phang2}{cmd:. describe}{p_end}

{pstd}Mixture of Poisson GLMs with random covariates (k=2){p_end}
{phang2}{cmd:. cwmglm y x1 x2 x3 n1 female, xnormal(x1 x2 x3) vvv xpoisson(n1) xbin(female) k(2) family(poisson)}{p_end}

{hline}

{pstd}Setup{p_end}
{phang2}{cmd:. use students, clear}{p_end}
{phang2}{cmd:. describe}{p_end}

{pstd}Mixture of regressions with random covariates, model EEE{p_end}
{phang2}{cmd:. cwmglm weight height heightf, k(2) xnormal(height heightf) eee}{p_end}

{hline}

{pstd}Setup{p_end}
{phang2} {cmd:. 
use multinorm, clear}{p_end}

{pstd}Prepare the loop for model selection by information criteria{p_end}
{phang2}{cmd:. local models vev evv vvv eei vei evi vvi eii vii eee vee eve vve eev}{p_end}
{phang2}{cmd:. global CWMs}{p_end}

{pstd}Loop over the parsimonious multivariate normal models, letting k range from 2 to 5{p_end}
{phang2}{cmd:. foreach model of local models {c -(}}{p_end}
{phang3}{cmd:. forval i=2/5 {c -(}}{p_end}
{phang3}{c 47}{c 47} note the absence of {it:depvar indepvars}{p_end}
{phang3}{cmd:. quietly cwmglm, xnorm(x1 x2) k(`i') `model'}{p_end}
{phang3}{space 4}{cmd:. if (e(converged)==1) {c -(}}{p_end}
{phang3}{space 4}{cmd:. estimates store `model'`i'}{p_end}
{phang3}{space 4}{cmd:. global CWMs $CWMs `model'`i'}{p_end}
{phang3}{space 4}{cmd:. {c )-}}{p_end}
{phang3}{space 4}{cmd:. else di in red "model `model' with `i' mixture components did not converge"}{p_end}
{phang3}{space 2}{cmd:. {c )-}}{p_end}
{phang2}{space 2}{cmd:. {c )-}}{p_end}

{pstd}Model selection{p_end}
{phang2}{space 2}{cmd:. cwmcompare $CWMs}{p_end}

{pstd}Activate the estimates from the best model{p_end}
{phang2}{space 2}{cmd:. estimates restore `r(bestAIC)'}{p_end}

{title:References}

{marker refrlink}{...}
{phang} Celeux, G., & Govaert, G. (1995). {browse "https://www.sciencedirect.com/science/article/pii/0031320394001256":Gaussian parsimonious clustering models}. Pattern Recognition, 28(5), 781-793. {p_end}

{phang} Ingrassia, S., Punzo, A., Vittadini, G., & Minotti, S. C. (2015). {browse "https://link.springer.com/article/10.1007/s00357-015-9177-z":Erratum to: The generalized linear mixed cluster-weighted model}. Journal of Classification, 32(2), 327-355. {p_end}

{title:Authors}

{phang} Daniele Spinelli, corresponding author (University of Milano-Bicocca, daniele.spinelli@unimib.it) {p_end}
{phang} Salvatore Ingrassia (University of Catania, s.ingrassia@unict.it) {p_end}
{phang} Giorgio Vittadini (University of Milano-Bicocca, giorgio.vittadinid@unimib.it) {p_end}