{smcl}
{* 18jan2021}{...}
{hi:help mgof}{...}
{right:{browse "http://github.com/benjann/mgof/"}}
{hline}

{title:Title}

{pstd}{hi:mgof} {hline 2} Goodness-of-fit tests for multinomial data

{title:Syntax}{smcl}

{p 8 15 2}
{cmd:mgof} {varname} [ {cmd:=} {it:{help exp}} ] {ifin} {weight} [{cmd:,} {it:options} ]

{p 8 15 2}
{cmd:mgofi} {it:f1} {it:f2} {it:...} [ {cmd:/} {it:p1} {it:p2} {it:...} ] [{cmd:,} {it:options} ]


{synoptset 18 tabbed}{...}
{synopthdr:options}
{synoptline}
{syntab :Method 1}
{synopt :{opt a:pprox}[{cmd:(}{it:nfit}{cmd:)}]}compute large sample chi-squared
tests (the default)
{p_end}
{p2coldent:* {cmd:svy}[{cmd:(}{it:spec}{cmd:)}]}adjust tests for survey design
{p_end}
{p2coldent:* {opt vce(vcetype)}}adjust tests using {helpb proportion} variance estimate
{p_end}
{p2coldent:* {opt cl:uster(varname)}}adjust tests for intragroup correlation
{p_end}
{synopt :{opt noi:sily}}show output from {helpb proportion}
{p_end}

{syntab :Method 2}
{synopt :{opt mc}}compute Monte Carlo exact tests
  {p_end}
{synopt :{opt reps(#)}}number of replications for {cmd:mc}; default is {cmd:reps(10000)}
  {p_end}
{synopt :{opt l:evel(#)}}set confidence level for {cmd:mc}; default is {cmd:level(99)}
  {p_end}
{synopt :{opt ci:type(type)}}set confidence interval type
for {cmd:mc}; default is {cmd:citype(exact)}
{p_end}

{syntab :Method 3}
{synopt :{opt ee}}compute exhaustive enumeration exact tests
  {p_end}

{syntab :Test statistics}
{synopt :{opt nox:2}}suppress Pearson's X2 statistic
  {p_end}
{synopt :{opt nolr}}suppress log likelihood ratio statistic
  {p_end}
{synopt :{opt cr}[{cmd:(}{it:lambda}{cmd:)}]}include Cressie-Read ({it:lambda})
statistic; {it:lambda} defaults to {cmd:2/3}
  {p_end}
{synopt :{opt mlnp}}include log outcome probability statistic ({cmd:mc}
and {cmd:ee} only)
  {p_end}
{synopt :{opt ks:mirnov}}include Kolmogorov-Smirnov statistic ({cmd:mc}
and {cmd:ee} only)
  {p_end}

{syntab :Other options}
{synopt :{opt f:req}}display frequency distribution
  {p_end}
{synopt :{opt p:ercent}}display frequency distribution in percent
  {p_end}
{p2coldent:* {opt mat:rix(name)}}provide matrix containing observed and expected counts
  {p_end}
{synopt :{opt exp:ected(name)}}provide matrix (column vector) containing expected counts
  {p_end}
{synopt :{opt nodot:s}}suppress progress dots ({cmd:mc} and {cmd:ee} only)
  {p_end}
{synoptline}
{p 4 4 2}
* {cmd:svy}, {cmd:vce()}, {cmd:cluster()}, and {cmd:matrix()} not allowed with {cmd:mgofi}.

{p 4 4 2}
{cmd:by} is allowed (unless {cmd:svy} is specified); see help {helpb by}.

{p 4 4 2} {cmd:fweight}s, {cmd:pweight}s, and {cmd:iweight}s are allowed
(see help {help weight}), but {cmd:pweight}s are not allowed with {cmd:ee}
or {cmd:mc}, and {cmd:iweight}s are not allowed with {cmd:ee} and not
allowed with {cmd:mc} if the {cmd:mlnp} option is specified.


{title:Description}

{pstd} {cmd:mgof} computes goodness-of-fit tests for the distribution of
{varname}, where {varname} is a discrete (categorical, multinomial) variable. The
default is to perform classical large sample chi-squared approximation
tests based on Pearson's X2 statistic and the log likelihood ratio (G2)
statistic. Alternatively, {cmd:mgof} computes exact tests using
Monte Carlo methods or exhaustive enumeration (see the {cmd:mc} and
{cmd:ee} options).

{pstd}The (theoretical) null-distribution (the distribution against
which {varname} is tested) is specified by {it:{help exp}}. {it:{help exp}}
is assumed to evaluate to the hypothesized probabilities of the categories of {varname}
or to quantities proportional to these probabilities (e.g. expected counts;
the scale does not matter). If {it:{help exp}} is omitted, the uniform
(geometric, equiprobable) distribution is used.

{pstd}
{cmd:mgofi} is the immediate form of {cmd:mgof} (see {help immed}) where  {it:f1},
{it:f2}, etc. specify the observed counts and, optionally, {it:p1},
{it:p2}, etc. specify the theoretical probabilities or expected counts.

{pstd}The heavy lifting is done in {help Mata}. See help
{helpb mf_mm_mgof:mata mm_mgof()} for details. See Jann (2008) for an article
discussing multinomial goodness-of-fit tests and the {cmd:mgof} command.


{title:Dependencies}

{pstd}
{cmd:mgof} requires {cmd:moremata}. Type

        {com}. {net "describe moremata, from(http://fmwww.bc.edu/repec/bocode/m/)":ssc describe moremata}{txt}


{title:Options}

{dlgtab:Method 1}

{phang}{opt approx}[{cmd:(}{it:nfit}{cmd:)}],
the default method, computes classical large sample chi-squared approximation
tests based on Pearson's X2 and the log likelihood ratio (G2)
statistic (see, e.g., Horn 1977; Cressie and Read 1989; Sokal and
Rohlf 1995, Ch. 17). The degrees of freedom for
chi-squared tests are determined as {it:k}-{it:nfit}-1 where {it:k} is the
number of categories and {it:nfit}, provided by the user, indicates the number
of fitted parameters (imposed restrictions) ({it:nfit}'s default is 0). If
{cmd:pweight}s are specified, the tests are corrected as outlined in the
Methods and Formulas section.

{phang}{cmd:svy}[{cmd:(}[{it:vcetype}] [{cmd:,} {it:svy_options}]{cmd:)}]
specifies that the test results be adjusted for
survey design effects according to the {helpb svyset}
specifications. {it:vcetype} and {it:svy_options} are as described in
help {helpb svy}. The correction procedure is described in the
Methods and Formulas section below. {cmd:svy} is not allowed with {cmd:mgofi}.

{phang}{opt vce(vcetype)} specifies that the variance-covariance matrix of the
proportions be estimated using {helpb proportion}
and that the tests be adjusted based on this estimate (see Methods and Formulas
below). {it:vcetype} may be {cmd:analytic},
{bind:{cmd:cluster} {it:clustvar}}, {cmd:bootstrap}, or {cmd:jackknife}
(plus possible suboptions as described in help {it:{help vce_option}}). {cmd:analytic}
and {bind:{cmd:cluster} {it:clustvar}} are not allowed with Stata 9. {cmd:vce()} is
not allowed with {cmd:mgofi}.

{phang}{opt cluster(clustvar)} is Stata 9 syntax for
{bind:{cmd:vce(cluster} {it:clustvar}{cmd:)}}. {cmd:cluster()} is
not allowed with {cmd:mgofi}.

{phang}{opt noisily} displays the output from the {helpb proportion} command,
which is used to estimate the variances of the proportions if {cmd:svy},
{cmd:vce()}, or {cmd:cluster()} is specified or if {cmd:pweights} are applied.

{dlgtab:Method 2}

{phang}{opt mc} causes the exact p-values to be approximated by sampling
from the null distribution (Monte Carlo simulation). The default number
of replications for the simulation is
10000; see the {opt reps()} option (the same set of samples is used
for all test statistics). 99% confidence intervals are
displayed for the estimated p-values.

{phang}{opt reps(#)} sets the number of replications for the {cmd:mc} method. The
default is 10'000.

{phang}{opt level(#)} sets the level for the confidence intervals of the
p-values computed by the {cmd:mc} method. The default is
{cmd:level(99)}. Note that, unlike many other Stata commands,
{cmd:mgof} does {it:not} depend on {helpb level:set level}.

{phang}{opt citype(type)} specifies how the binomial
confidence intervals for the p-values from the {cmd:mc} method
are to be calculated. Available types are {opt exa:ct},
{opt wa:ld}, {opt w:ilson}, {opt a:gresti}, and
{opt j:effreys}. See help {helpb ci}. {cmd:citype(exact)} is the
default.

{dlgtab:Method 3}

{phang}{opt ee} causes the exact p-values to be computed by cycling through all
possible data compositions given the sample size and the number of categories. Since
the number of compositions grows very fast -- it is equal to
({it:n}+{it:k}-1)!/(({it:k}-1)!{it:n}!) where {it:n} is the sample size and
{it:k} is the number of categories -- the {cmd:ee} method is only feasible for
very small samples and few categories. An important exception
is when the null distribution is uniform (and {cmd:ksmirnov}
is not specified). In this case the tests are based
on enumerating partitions, which are much fewer in number than
compositions. For example, with {it:n}=40 and {it:k}=10,
the number of compositions is 2'054'455'634, but the number
of partitions is only 16'928 (Hirji 1997).

{dlgtab:Test statistics}

{phang}{opt nox2} suppresses Person's X2 statistic.

{phang}{opt nolr} suppresses the log likelihood ratio (G2) statistic.

{phang}{opt cr}[{cmd:(}{it:lambda}{cmd:)}] specifies that the
Cressie-Read statistic with parameter {it:lambda} be included (Cressie and Read
1984; also see Weesie 1997). The default for {it:lambda} is {cmd:2/3}.

{phang}{opt mlnp} requests that a test based on the (minus log) multinomial
probability of the observed outcome be included (see Horn 1977). {opt mlnp}
is not allowed with the {cmd:approx} method.

{phang}{opt ksmirnov} requests that the two-sided Kolmogorov-Smirnov
statistic be included.  The Kolmogorov-Smirnov statistic is sensitive to
the order of the categories and should only be used with variables that
have a natural order (i.e. ordinal or discrete metric data). Note that the
Kolmogorov-Smirnov test implemented in official Stata's {helpb ksmirnov}
is conservative in the case of discrete data (see, e.g., Conover 1972). The
methods implemented here are exact. {opt ksmirnov} is not
allowed with {cmd:approx}.

{dlgtab:Other options}

{phang}{opt freq} displays a table containing observed and
expected frequencies.

{phang}{opt percent} displays a table containing observed and
expected percent.

{phang}{opt matrix(name)} specifies that the observed and
expected counts are to be taken from matrix {it:name}
(see help {helpb matrix}). The first column of the
matrix provides the observed counts and the second column, if present,
provides the the expected counts or theoretical probabilities. The uniform
distribution is used if the matrix does not contain a second column. Do not
provide non-integer observed counts with the {cmd:ee} or {cmd:mc}
method. {cmd:matrix()} is not allowed with {cmd:mgofi}.

{phang}{opt expected(name)} specifies that the expected
counts or theoretical probabilities are to be taken from column vector
{it:name} (see help {helpb matrix}). {cmd:mgof} aborts if the
number of elements in the vector does not match the number outcomes.

{phang}{opt nodots} causes the progress dots for the {cmd:mc} and {cmd:ee}
methods to be suppressed. The default is to display a dot for each 2 percent
of completed computations.

{marker mf}
{title:Methods and formulas}

{pstd}The survey design correction procedure for the large-sample tests
is based on Rao and Scott (1981) and parallels the default independence test
correction used in {helpb "svy: tabulate twoway"} (see
{bf:[SVY] svy: tabulate twoway} and the references therein).

{pstd}Let V/(n-1) be a consistent estimate of the
variance-covariance matrix of the proportions p_i, i=1...k, where n is the number
of observations. Furthermore, let v_ij
denote an element of V, m be the number of PSUs/clusters, and L be the
number of strata. The correction then takes

        F  =  X2 / (delta * (a2 + 1)) / d  =  X2 / delta / (k-1)

{pstd}as asymptotically F(d, d*r) distributed where X2 stands for the uncorrected
test statistic and where

        delta = 1/(k-1) * sum_i v_ii/p_i

        d = (k-1) / (1 + a2)

        a2 = [ 1/(k-1) * sum_ij v_ij^2/(p_i*p_j) ] / delta^2 - 1

        r = m - L

{pstd}delta can be seen as the mean of the "generalized design effects"
for the proportions, a2 as the square of their variation
coefficient. V/(n-1) is estimated by {helpb proportion} taking into account
{cmd:pweight}s, cluster structure, or other complex survey design properties.


{title:Examples}

{p 8 8 2}{help mgof##ex1:Approximate chi-squared test}{p_end}
{p 8 8 2}{help mgof##ex2:Monte Carlo exact test}{p_end}
{p 8 8 2}{help mgof##ex3:Exhaustive enumeration exact test}{p_end}
{p 8 8 2}{help mgof##ex4:Kolmogorov-Smirnov test}{p_end}
{p 8 8 2}{help mgof##ex5:Complex survey design correction}{p_end}
{p 8 8 2}{help mgof##ex6:Test against non-uniform distribution}{p_end}
{p 8 8 2}{help mgof##ex7:Using the immediate command or the the matrix() option}{p_end}
{p 8 8 2}{help mgof##ex8:Empty cells}{p_end}

{marker ex1}{dlgtab:Approximate chi-squared test}

{pstd}A classical chi-squared tests against the uniform distribution can be performed as follows:

        {com}. drop _all
        {txt}
        {com}. set seed 38
        {txt}
        {com}. set obs 20
        {txt}obs was 0, now 20

        {com}. gen byte x = ceil(uniform()*5)
        {txt}
        {com}. mgof x, freq cr
        {res}
                               {txt}Number of obs ={res}      20
                               {txt}N of outcomes ={res}       5
                               {txt}Chi2 df       ={res}       4

        {txt}{hline 22}{c TT}{hline 23}
              Goodness-of-fit {c |}       Coef.    P-value
        {hline 22}{c +}{hline 23}
                 Pearson's X2 {c |}   {res}       10     0.0404
         {txt}Log likelihood ratio {c |}   {res}  9.55691     0.0486
           {txt}Cressie-Read (2/3) {c |}   {res} 9.699921     0.0458
        {txt}{hline 22}{c BT}{hline 23}

        {space 0}{space 0}{ralign 12:x}{space 1}{c |}{space 1}{ralign 9:observed}{space 1}{space 1}{ralign 9:expected}{space 1}
        {space 0}{hline 13}{c   +}{hline 11}{hline 11}
        {space 0}{space 0}{ralign 12:1}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        3}}}{space 1}{space 1}{ralign 9:{res:{sf:        4}}}{space 1}
        {space 0}{space 0}{ralign 12:2}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        1}}}{space 1}{space 1}{ralign 9:{res:{sf:        4}}}{space 1}
        {space 0}{space 0}{ralign 12:3}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        9}}}{space 1}{space 1}{ralign 9:{res:{sf:        4}}}{space 1}
        {space 0}{space 0}{ralign 12:4}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        2}}}{space 1}{space 1}{ralign 9:{res:{sf:        4}}}{space 1}
        {space 0}{space 0}{ralign 12:5}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        5}}}{space 1}{space 1}{ralign 9:{res:{sf:        4}}}{space 1}
        {space 0}{hline 13}{c   +}{hline 11}{hline 11}
        {space 0}{space 0}{ralign 12:Total}{space 1}{c |}{space 1}{ralign 9:{res:{sf:       20}}}{space 1}{space 1}{ralign 9:{res:{sf:       20}}}{space 1}
        {txt}

{marker ex2}{dlgtab:Monte Carlo exact test}

{pstd}If the number of observations is low, exact p-values should be
computed. One approach is to approximate the exact
p-values by sampling from the null distribution ({cmd:mc} method):

        {com}. mgof x, mc cr
        {res}
        {txt}Percent completed ({res}10000{txt} replications)
        {txt}0 {hline 5} 20 {hline 6} 40 {hline 6} 60 {hline 6} 80 {hline 5} 100
        ..................................................

                                                       Number of obs ={res}      20
                                                       {txt}N of outcomes ={res}       5
                                                       {txt}Replications  ={res}   10000

        {txt}{hline 22}{c TT}{hline 47}
                              {c |}                  Exact
              Goodness-of-fit {c |}       Coef.    P-value    [99% Conf. Interval]
        {hline 22}{c +}{hline 47}
                 Pearson's X2 {c |}   {res}       10     0.0369      0.0322      0.0420
         {txt}Log likelihood ratio {c |}   {res}  9.55691     0.0753      0.0687      0.0824
           {txt}Cressie-Read (2/3) {c |}   {res} 9.699921     0.0402      0.0353      0.0455
        {txt}{hline 22}{c BT}{hline 47}
        {txt}

{marker ex3}{dlgtab:Exhaustive enumeration exact test}

{pstd} There are (20+5-1)!/((5-1)!20!) = 10626 different ways in which 20
observations can be divided into 5 (or less) categories. Performing an
exact test based on generating all these compositions would not be
much of a problem. However, because the null distribution is uniform,
only 192 partitions, a subset of the 10626 composition, have to be generated:

        {com}. mgof x, ee cr
        {res}
        {txt}Percent completed ({res}192{txt} partitions)
        {txt}0 {hline 5} 20 {hline 6} 40 {hline 6} 60 {hline 6} 80 {hline 5} 100
        ..................................................

                               Number of obs ={res}      20
                               {txt}N of outcomes ={res}       5
                               {txt}Partitions    ={res}     192

        {txt}{hline 22}{c TT}{hline 23}
                              {c |}                  Exact
              Goodness-of-fit {c |}       Coef.    P-value
        {hline 22}{c +}{hline 23}
                 Pearson's X2 {c |}   {res}       10     0.0392
         {txt}Log likelihood ratio {c |}   {res}  9.55691     0.0773
           {txt}Cressie-Read (2/3) {c |}   {res} 9.699921     0.0432
        {txt}{hline 22}{c BT}{hline 23}
        {txt}

{marker ex4}{dlgtab:Kolmogorov-Smirnov test}

{pstd}The {cmd:ksmirnov} option performs an exact Kolmogorov-Smirnov test for discrete data
(not allowed with {cmd:approx}). Example:

        {com}. mgof x, ee nox2 nolr ksmirnov
        {res}
        {txt}Percent completed ({res}10626{txt} compositions)
        {txt}0 {hline 5} 20 {hline 6} 40 {hline 6} 60 {hline 6} 80 {hline 5} 100
        ..................................................

                               Number of obs ={res}      20
                               {txt}N of outcomes ={res}       5
                               {txt}Compositions  ={res}   10626

        {txt}{hline 22}{c TT}{hline 23}
                              {c |}                  Exact
              Goodness-of-fit {c |}       Coef.    P-value
        {hline 22}{c +}{hline 23}
         Kolmogorov-Smirnov D {c |}   {res}       .2     0.1656
        {txt}{hline 22}{c BT}{hline 23}
        {txt}

{marker ex5}{dlgtab:Complex survey design correction}

{pstd}{cmd:mgof} may be used with {cmd:pweight}s, in which case the correction
outlined in the Methods and Formulas section is applied. Example:

        {com}. gen w = exp(invnorm(uniform()))
        {txt}
        {com}. mgof x [pw = w]
        {res}
                                          {txt}Number of obs ={res}      20
                                          {txt}N of outcomes ={res}       5
                                          {txt}F df1         ={res} 3.46153
                                          {txt}F df2         ={res}  65.769

        {txt}{hline 22}{c TT}{hline 34}
              Goodness-of-fit {c |}       Coef.    F-value    P-value
        {hline 22}{c +}{hline 34}
                 Pearson's X2 {c |}   {res} 6.867921     1.1741     0.3287
         {txt}Log likelihood ratio {c |}   {res} 7.946182     1.3584     0.2611
        {txt}{hline 22}{c BT}{hline 34}{txt}

{pstd}More generally, use the {cmd:svy} option to take account of the complex survey
design set by {helpb svyset}:

        {com}. svyset [pw = w]

              {txt}pweight:{col 16}{res}w
                  {txt}VCE:{col 16}{res}linearized
          {txt}Single unit:{col 16}{res}missing
             {txt}Strata 1:{col 16}<one>
                 SU 1:{col 16}<observations>
                FPC 1:{col 16}<zero>
        {p2colreset}{...}

        {com}. mgof x, svy
        {res}
        {txt}Number of strata ={res}       1        {txt}Number of obs ={res}      20
        {txt}Number of PSUs   ={res}      20        {txt}Pop size      ={res}  35.039
                                          {txt}Design df     ={res}      19
                                          {txt}N of outcomes ={res}       5
                                          {txt}F df1         ={res} 3.46153
                                          {txt}F df2         ={res}  65.769

        {txt}{hline 22}{c TT}{hline 34}
              Goodness-of-fit {c |}       Coef.    F-value    P-value
        {hline 22}{c +}{hline 34}
                 Pearson's X2 {c |}   {res} 6.867921     1.1741     0.3287
         {txt}Log likelihood ratio {c |}   {res} 7.946182     1.3584     0.2611
        {txt}{hline 22}{c BT}{hline 34}
        {txt}

{marker ex6}{dlgtab:Test against non-uniform distribution}

        {com}. recode x (1=0.1) (2=0.2) (3=0.4) (4=0.2) (5=0.1), generate(p)
        {txt}(20 differences between x and p)

        {com}. mgof x = p, freq ee nodots
        {res}
                               {txt}Number of obs ={res}      20
                               {txt}N of outcomes ={res}       5
                               {txt}Compositions  ={res}   10626

        {txt}{hline 22}{c TT}{hline 23}
                              {c |}                  Exact
              Goodness-of-fit {c |}       Coef.    P-value
        {hline 22}{c +}{hline 23}
                 Pearson's X2 {c |}   {res}    8.375     0.0762
         {txt}Log likelihood ratio {c |}   {res} 8.170615     0.1115
        {txt}{hline 22}{c BT}{hline 23}

        {space 0}{space 0}{ralign 12:x}{space 1}{c |}{space 1}{ralign 9:observed}{space 1}{space 1}{ralign 9:expected}{space 1}
        {space 0}{hline 13}{c   +}{hline 11}{hline 11}
        {space 0}{space 0}{ralign 12:1}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        3}}}{space 1}{space 1}{ralign 9:{res:{sf:        2}}}{space 1}
        {space 0}{space 0}{ralign 12:2}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        1}}}{space 1}{space 1}{ralign 9:{res:{sf:        4}}}{space 1}
        {space 0}{space 0}{ralign 12:3}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        9}}}{space 1}{space 1}{ralign 9:{res:{sf:        8}}}{space 1}
        {space 0}{space 0}{ralign 12:4}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        2}}}{space 1}{space 1}{ralign 9:{res:{sf:        4}}}{space 1}
        {space 0}{space 0}{ralign 12:5}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        5}}}{space 1}{space 1}{ralign 9:{res:{sf:        2}}}{space 1}
        {space 0}{hline 13}{c   +}{hline 11}{hline 11}
        {space 0}{space 0}{ralign 12:Total}{space 1}{c |}{space 1}{ralign 9:{res:{sf:       20}}}{space 1}{space 1}{ralign 9:{res:{sf:       20}}}{space 1}
        {txt}

{marker ex7}{dlgtab:Using the immediate command or the the matrix() option}

        {com}. mgofi 3 1 9 2 5 / 2 4 8 4 2, ee nodots
        {res}
                               {txt}Number of obs ={res}      20
                               {txt}N of outcomes ={res}       5
                               {txt}Compositions  ={res}   10626

        {txt}{hline 22}{c TT}{hline 23}
                              {c |}                  Exact
              Goodness-of-fit {c |}       Coef.    P-value
        {hline 22}{c +}{hline 23}
                 Pearson's X2 {c |}   {res}    8.375     0.0762
         {txt}Log likelihood ratio {c |}   {res} 8.170615     0.1115
        {txt}{hline 22}{c BT}{hline 23}

        {com}. matrix A = (3\1\9\2\5),(2\4\8\4\2)
        {txt}
        {com}. mgof, matrix(A) ee nodots
        {res}
                               {txt}Number of obs ={res}      20
                               {txt}N of outcomes ={res}       5
                               {txt}Compositions  ={res}   10626

        {txt}{hline 22}{c TT}{hline 23}
                              {c |}                  Exact
              Goodness-of-fit {c |}       Coef.    P-value
        {hline 22}{c +}{hline 23}
                 Pearson's X2 {c |}   {res}    8.375     0.0762
         {txt}Log likelihood ratio {c |}   {res} 8.170615     0.1115
        {txt}{hline 22}{c BT}{hline 23}
        {txt}

{marker ex8}{dlgtab:Empty cells}

{pstd}Empty cells can be introduced by adding observations with 0 weights
(or by using {cmd:mgofi} or the {cmd:matrix()} option):

        {com}. gen w = 1
        {txt}
        {com}. set obs 21
        {txt}obs was 20, now 21

        {com}. replace x = 6 in 21
        {txt}(1 real change made)

        {com}. replace w = 0 in 21
        {txt}(1 real change made)

        {com}. mgof x [fw = w], freq ee nodots
        {res}
                               {txt}Number of obs ={res}      20
                               {txt}N of outcomes ={res}       6
                               {txt}Partitions    ={res}     282

        {txt}{hline 22}{c TT}{hline 23}
                              {c |}                  Exact
              Goodness-of-fit {c |}       Coef.    P-value
        {hline 22}{c +}{hline 23}
                 Pearson's X2 {c |}   {res}       16     0.0071
         {txt}Log likelihood ratio {c |}   {res} 16.84977     0.0083
        {txt}{hline 22}{c BT}{hline 23}

        {space 0}{space 0}{ralign 12:x}{space 1}{c |}{space 1}{ralign 9:observed}{space 1}{space 1}{ralign 9:expected}{space 1}
        {space 0}{hline 13}{c   +}{hline 11}{hline 11}
        {space 0}{space 0}{ralign 12:1}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        3}}}{space 1}{space 1}{ralign 9:{res:{sf: 3.333333}}}{space 1}
        {space 0}{space 0}{ralign 12:2}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        1}}}{space 1}{space 1}{ralign 9:{res:{sf: 3.333333}}}{space 1}
        {space 0}{space 0}{ralign 12:3}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        9}}}{space 1}{space 1}{ralign 9:{res:{sf: 3.333333}}}{space 1}
        {space 0}{space 0}{ralign 12:4}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        2}}}{space 1}{space 1}{ralign 9:{res:{sf: 3.333333}}}{space 1}
        {space 0}{space 0}{ralign 12:5}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        5}}}{space 1}{space 1}{ralign 9:{res:{sf: 3.333333}}}{space 1}
        {space 0}{space 0}{ralign 12:6}{space 1}{c |}{space 1}{ralign 9:{res:{sf:        0}}}{space 1}{space 1}{ralign 9:{res:{sf: 3.333333}}}{space 1}
        {space 0}{hline 13}{c   +}{hline 11}{hline 11}
        {space 0}{space 0}{ralign 12:Total}{space 1}{c |}{space 1}{ralign 9:{res:{sf:       20}}}{space 1}{space 1}{ralign 9:{res:{sf:       20}}}{space 1}
        {txt}

{title:Returned results}

{pstd}Depending on options, {cmd:mgof} saves the following in {cmd:r()}:

{pstd}Scalars{p_end}
{p2colset 7 23 24 2}{...}
{p2col : {cmd:r(N)}}             number of observations{p_end}
{p2col : {cmd:r(N_pop)}}         population size{p_end}
{p2col : {cmd:r(N_strata)}}      number of strata{p_end}
{p2col : {cmd:r(N_psu)}}         number of PSUs{p_end}
{p2col : {cmd:r(N_clust)}}       number of clusters{p_end}
{p2col : {cmd:r(df_r)}}          design degrees of freedom{p_end}
{p2col : {cmd:r(df)}}            degrees of freedom for the chi2 statistics{p_end}
{p2col : {cmd:r(df1)}}           numerator d.f. for the F statistics{p_end}
{p2col : {cmd:r(df2)}}           denominator d.f. for the F statistics{p_end}
{p2col : {cmd:r(delta)}}         mean generalized design effect{p_end}
{p2col : {cmd:r(a2)}}            squared variation coefficient of generalized design effects{p_end}
{p2col : {cmd:r(reps)}}          number of replications{p_end}
{p2col : {cmd:r(partitions)}}    number of partitions{p_end}
{p2col : {cmd:r(compositions)}}  number of compositions{p_end}
{p2col : {cmd:r(}{it:stat}{cmd:)}}      value of test statistic{p_end}
{p2col : {cmd:r(F_}{it:stat}{cmd:)}}      value of the corrected F statistic{p_end}
{p2col : {cmd:r(p_}{it:stat}{cmd:)}}    p-value of {cmd:r(}{it:stat}{cmd:)} or {cmd:r(F_}{it:stat}{cmd:)} {p_end}
{p2col : {cmd:r(p_}{it:stat}{cmd:_srs)}}    uncorrected p-value of {cmd:r(}{it:stat}{cmd:)}{p_end}
{p2col : {cmd:r(p_}{it:stat}{cmd:_lb)}} lower C.I. bound for {cmd:r(p_}{it:stat}{cmd:)}{p_end}
{p2col : {cmd:r(p_}{it:stat}{cmd:_ub)}} upper C.I. bound for {cmd:r(p_}{it:stat}{cmd:)}{p_end}

{p 23 23 2}where {it:stat} is:{p_end}
{p 23 23 2}{cmd:x2} (Pearson's X2){p_end}
{p 23 23 2}{cmd:lr} (log likelihood ratio){p_end}
{p 23 23 2}{cmd:cr} (Cressie-Read statistic){p_end}
{p 23 23 2}{cmd:mlnp} (minus log outcome probability){p_end}
{p 23 23 2}{cmd:ksmirnov} (Kolmogorov-Smirnov D){p_end}

{pstd} Macros{p_end}
{p2col : {cmd:r(depvar)}} name of tabulated variable{p_end}
{p2col : {cmd:r(h0)}} definition of the theoretical distribution; either
{cmd:"= }{it:exp}{cmd:"} or {cmd:"= 1/}{it:k}{cmd:"}, depending on whether
{it:exp} was specified, where {it:k} is the number of categories{p_end}
{p2col : {cmd:r(method)}} method used to compute the p-values{p_end}
{p2col : {cmd:r(stats)}} list of tested statistics{p_end}
{p2col : {cmd:r(lambda)}} lambda parameter of the Cressie-Read statistic{p_end}
{p2col : {cmd:r(citype)}} Monte Carlo confidence interval type {p_end}
{p2col : {cmd:r(cilevel)}} Monte Carlo confidence level{p_end}

{pstd} Matrices{p_end}
{p2col : {cmd:r(count)}} observed counts and expected counts{p_end}


{title:References}

{phang} Conover, W. J. (1972). A Kolmogorov Goodness-of-Fit
  Test for Discontinuous Distributions. Journal of the
  American Statistical Association 67: 591-596.

{phang}Cressie, N., T. R. C. Read (1984). Multinomial Goodness-of-Fit Tests. Journal
  of the Royal Statistical Society (Series B) 46: 440-464.

{phang}Cressie, N., T. R. C. Read (1989). Pearson's X^2 and
  the Loglikelihood Ratio Statistic G^2: A Comparative Review. International
  Statistical Review 57: 19-43.

{phang}Hirji, K. F. (1997). A Comparison of Algorithms for Exact
Goodness-of-Fit Tests for Multinomial Data. Communications in
Statistics-Simulation and Computation 26: 1197-1227.

{phang}Horn, S. D. (1977). Goodness-of-Fit Tests for Discrete Data: A
  Review and an Application to a Health Impairment Scale. Biometrics 33: 237-247.

{phang}Jann, B. (2008). Multinomial goodness-of-fit: large sample tests
with survey design correction and exact tests for small samples. The Stata
Journal 8(2): 147-169. [Working paper version available from
{browse "http://ideas.repec.org/p/ets/wpaper/2.html"}.]

{phang}Rao, J. N. K., A. J. Scott (1981). The Analysis of Categorical
Data From Complex Sample Surveys: Chi-Squared Tests for Goodness of Fit and
Independence in Two-Way Tables. Journal of the American Statistical
Association 76: 221-230.
{p_end}

{phang}Sokal, R. R., F. J. Rohlf (1995). Biometry. The Principles and
Practice of Statistics in Biological Research. Third Edition. New
York: W. H. Freeman and Company.

{phang}Weesie, J. (1997). sg68: Goodness-of-fit
statistics for multinomial distributions. Stata Technical
Bulletin Reprints 6: 183-186.


{title:Author}

{pstd} Ben Jann, University of Bern, ben.jann@soz.unibe.ch

{pstd} Thanks for citing {cmd:mgof} as follows:

{pmore}
Jann, B. (2008). Multinomial goodness-of-fit: large sample tests
with survey design correction and exact tests for small
samples. {browse "http://doi.org/10.1177%2F1536867X0800800201":The Stata Journal 8(2): 147-169}.

{pstd} or

{pmore}
Jann, B. (2007). mgof: Stata module to perform goodness-of-fit tests for
multinomial data. Available from
{browse "http://ideas.repec.org/c/boc/bocode/s456854.html"}.


{title:Also see}

{psee} Online:  {helpb ksmirnov}, {helpb matrix}, {helpb "svy: tabulate twoway"},
{helpb proportion}, {helpb mf_mm_mgof:mm_mgof()}, {helpb moremata}

{psee} User packages:{p_end}

             {bf:{net "describe tab_chi, from(http://fmwww.bc.edu/RePEc/bocode/t)":tabchi}}  by Nicholas J. Cox
             {bf:{net "describe multgof, from(http://www.fss.uu.nl/soc/iscore/stata)":multgof}} by Jeroen Weesie
             {bf:{net "describe chi2fit, from(http://web.missouri.edu/~kolenikovs/stata)":chi2fit}} by Stas Kolenikov
             {bf:{net "describe csgof, from(http://www.ats.ucla.edu/stat/stata/ado/analysis)":csgof}}   by Michael N. Mitchell