help ghk2()
-------------------------------------------------------------------------------

Title

    ghk2() -- Geweke-Hajivassiliou-Keane (GHK) multivariate normal simulator
                       using pre-generated points


Syntax

                   P = ghk2setup(real scalar n, real scalar m, real scalar d,
                         string scalar type,
                         | real scalar pi, pointer (real colvector function)
                         pfn)

        real colvector ghk2(P, real matrix X, real matrix V, real scalar
                         anti, real scalar start)

        real colvector ghk2(P, real matrix Xl, real matrix Xu, real matrix V,
                         real scalar anti, real scalar start)

        real colvector ghk2(P, real matrix X, real matrix V, real scalar
                         anti, real scalar start, real matrix dfdx, real
                         matrix dfdv)

        real colvector ghk2(P, real matrix Xl, real matrix Xu, real matrix V,
                         real scalar anti, real scalar start, real matrix
                         dfdxl, real matrix dfdxu, real matrix dfdv)

        real colvector ghk2SqrtScrambler(real scalar p)

        real colvector ghk2NegSqrtScrambler(real scalar p)

    where P, if it is declared, should be declared

                transmorphic P
                
    where pfn, if it is passed, should point to a Mata function declared like
    ghk2SqrtScrambler() or ghk2NegSqrtScrambler(),

    and where dfdx, dfdxl, dfdxu, and dfdv are outputs

                real matrix dfdx
                real matrix dfdxl
                real matrix dfdxu
                real matrix dfdv
 

Input parameters

        n            Number of observations for which to prepare draws
        m            Draws per observation and simulated dimension
        d            Dimension of cumulative integrals for which to be
                     prepared to simulate
        type         Sequence type
        pi           Optional starting index of prime bases for Halton
                     sequences (1->2, 2->3, 3->5, 4->7...) (default=1)
        pfn          Optional pointer to scrambling function such as
                     ghk2SqrtScrambler() or ghk2NegSqrtScrambler()
        P            Draws prepared by ghk2setup()
        X            Upper bounds of integration
        Xl, Xu       Lower and upper bounds of integration
        V            Covariance matrix
        anti         Dummy for inclusion of antithetics (required; any
                     nonzero value requests them)
        start        Starting point to use in block of draws prepared by
                     ghk2setup()
        p            Number, normally prime, for which the vector (0, 1, ...,
                     p-1)' should be scrambled
                
Output parameters

        dfdx         Scores with respect to X
        dfdxl, dfdxu Scores with respect to Xl, Xu
        dfdv         Scores with respect to V, stored as vectorized
                     lower-triangular matrices
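
    As an illustration of the d(d+1)/2 columns of dfdv, the sketch below
    lists which element of V each column would correspond to. It assumes the
    column-by-column ordering of the lower triangle used by Mata's vech();
    that ordering is an assumption here and is not stated in this help file.
    The helper name vech_positions() is hypothetical.

```python
def vech_positions(d):
    """(row, col) of each lower-triangle element, stacked column by column.

    Assumption: same ordering as Mata's vech(); indices are 1-based.
    """
    return [(row, col) for col in range(1, d + 1) for row in range(col, d + 1)]


positions = vech_positions(3)
print(len(positions))  # 6, i.e. d(d+1)/2 for d = 3
print(positions)       # [(1, 1), (2, 1), (3, 1), (2, 2), (3, 2), (3, 3)]
```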


Description

    ghk2() implements the Geweke-Hajivassiliou-Keane algorithm for simulating
    the cumulative multivariate normal distribution, optionally computing
    scores, and optionally accepting lower as well as upper bounds. (See
    Cappellari and Jenkins 2003, 2006; Gates 2006.) It is modeled on
    ghkfast(), which is included in Stata 10; see its documentation for more
    explanation. Like ghkfast(), its first argument is a pre-generated set of
    points on the unit interval, in this case produced by ghk2setup(), which
    has the same syntax and semantics as ghkfastsetup(). The two commands'
    point sets are not interchangeable. ghk2() differs from ghkfast() in the
    following ways:

      * It accepts lower as well as upper bounds for integration (second and
        fourth syntaxes above). This allows efficient estimation of
        probabilities over bounded rectilinear regions such as {(x1, x2) |
        l1<=x1<=u1, l2<=x2<=u2}. Without this feature, the routine would need
        to be called 2^d times, where d is the dimension of the distribution.
        For example, the probability just mentioned would have to be computed
        as Phi(u1, u2) - Phi(l1, u2) - Phi(u1, l2) + Phi(l1, l2), where Phi is
        a bivariate cumulative normal distribution with some given covariance
        structure. Individual entries in the lower and upper bounds, Xl and
        Xu, may be missing ("."), and are interpreted as -infinity and
        +infinity, respectively. The fourth syntax differs from the second in
        requesting score matrices for upper and lower bounds, as well as for
        the covariance matrix V.

      * ghk2() does not "pivot" the bounds of integration (in X, Xl, or Xu).
        On the recommendation of Genz (1992), ghk() and ghkfast() reorder
        each vector of bounds to put the larger entries toward the end, which
        reduces the variability of the simulated probability. However,
        pivoting has the disadvantage of creating discontinuities in results.
        Small changes in the bounds can produce relatively large ones in the
        results when they trigger reorderings of the pivoted vector.
        Especially when the number of draws is low, these discontinuities can
        stymie a search by ml. Thus ghk2() behaves very smoothly even at low
        draw counts, at the expense of more variability. (As of Stata 10.1,
        ghk() and ghkfast() also allow pivoting to be turned off.)

      * ghk2() works in Stata 9. Stata 9 does ship with ghk(), but this does
        not use pre-generated points, and so is slower.

      * ghk2() is generally faster than ghkfast(), at least in
        single-processor versions of Stata. It is optimized for contexts with
        a large number of observations relative to draws. In extreme cases,
        such as 10,000 observations and 10 draws, it can perform an order of
        magnitude faster. But at the opposite extreme, with, say, 100
        observations and 1,000 draws, it can run half as fast.

      * ghk2() accepts an optional scrambling function. Halton sequences
        based on large primes can have decidedly non-uniform coverage of the
        unit hypercube (Drukker and Gates 2006). "Scrambling" the sequences
        can increase uniformity (Kolenikov 2012).

      * The start argument lets the user shift the starting observation
        within the pre-computed block of draws. For example, if the block was
        prepared for 200 observations and ghk2() is called with a data matrix
        of only 100 rows, start can select rows 101-200 of the draws instead
        of the default rows 1-100.

      * The anti argument, specifying whether to include antithetical draws,
        is required. Any non-zero value is interpreted as requesting them.

      * It does not take a rank argument.  (ghk() and ghkfast() lost it in
        Stata 10.1 as well.)
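
    The following Python sketch illustrates the first point above. It is not
    the ghk2() source: it is a minimal GHK recursion written from the
    standard description of the algorithm, with unscrambled Halton draws and
    no pivoting, antithetics, or scores. The names ghk_sketch() and
    rectangle_by_inclusion_exclusion() are hypothetical. It compares one
    call with lower and upper bounds against the 2^d upper-bound-only calls
    that would otherwise be needed.

```python
from itertools import product
from math import inf, sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal


def cholesky(V):
    """Lower-triangular L with L L' = V (V symmetric positive definite)."""
    d = len(V)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]
    return L


def radical_inverse(i, base):
    """i-th point (i >= 1) of the van der Corput sequence in this base."""
    x, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        x += digit * f
        f /= base
    return x


def ghk_sketch(lower, upper, V, m, primes=(2, 3, 5, 7, 11)):
    """Simulate P(lower <= x <= upper), x ~ N(0, V), with m Halton draws.

    The default primes tuple covers up to 6 dimensions.
    """
    d, L = len(V), cholesky(V)
    total = 0.0
    for i in range(1, m + 1):
        z, prob = [], 1.0
        for j in range(d):
            shift = sum(L[j][k] * z[k] for k in range(j))
            lo = N01.cdf((lower[j] - shift) / L[j][j])
            hi = N01.cdf((upper[j] - shift) / L[j][j])
            prob *= hi - lo
            if j < d - 1:  # the last dimension needs no draw
                u = radical_inverse(i, primes[j])
                # draw from the truncated normal; clamp to keep inv_cdf valid
                p = min(max(lo + u * (hi - lo), 1e-12), 1 - 1e-12)
                z.append(N01.inv_cdf(p))
        total += prob
    return total / m


def rectangle_by_inclusion_exclusion(lower, upper, V, m):
    """Same probability from 2^d calls that use upper bounds only."""
    d = len(V)
    total = 0.0
    for corner in product((0, 1), repeat=d):  # 1 = take the lower bound
        bounds = [lower[j] if c else upper[j] for j, c in enumerate(corner)]
        sign = (-1) ** sum(corner)
        total += sign * ghk_sketch([-inf] * d, bounds, V, m)
    return total


V = [[1.0, 0.5], [0.5, 1.0]]
direct = ghk_sketch([0.0, 0.0], [1.0, 2.0], V, 1000)
pieced = rectangle_by_inclusion_exclusion([0.0, 0.0], [1.0, 2.0], V, 1000)
orthant = ghk_sketch([-inf, -inf], [0.0, 0.0], V, 1000)  # exact value is 1/3
print(direct, pieced, orthant)
```

    The two rectangle estimates agree only approximately, because each
    simulated term carries its own error; the single bounded call also does
    one quarter of the work in this two-dimensional case.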

Remarks

    The type argument may be "halton", "hammersley", "random", or "ghalton".
    "Random" and generalized Halton sequences are influenced by the state of
    the random number generator just before ghk2setup() is called. See [M-5]
    uniform().
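
    To make the role of pi concrete, here is a small Python sketch of an
    unscrambled Halton point set whose bases start at the pi-th prime. It
    follows the textbook radical-inverse construction; whether ghk2setup()
    starts its sequences at the same index is not stated here, so treat the
    indexing as an assumption. The function names are hypothetical.

```python
def first_primes(count):
    """The first `count` primes, by trial division (fine for small counts)."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % q for q in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def radical_inverse(i, base):
    """Reflect the base-`base` digits of i about the radix point."""
    x, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        x += digit * f
        f /= base
    return x


def halton_points(m, dims, pi=1):
    """m points in `dims` dimensions; bases are primes number pi, pi+1, ..."""
    bases = first_primes(pi - 1 + dims)[pi - 1:]
    return [[radical_inverse(i, b) for b in bases] for i in range(1, m + 1)]


print(halton_points(3, 2))        # bases 2 and 3
print(halton_points(1, 2, pi=2))  # bases 3 and 5
```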

    The ghk2() routine performs error checking and then calls one of four
    additional routines, whose syntaxes correspond to the four listed above:
    _ghk2(), _ghk2_2(), _ghk2_d(), and _ghk2_2d(). You can call these
    routines directly for a slight speed gain.

    ghk2SqrtScrambler(p) scrambles the modulo-p numbers u=(0, 1, ..., p-1)'
    with the formula mod(u * floor(sqrt(p)), p). ghk2NegSqrtScrambler(p) uses
    mod(u * ( -floor(sqrt(p))), p). The user may provide alternative
    functions with the same type of arguments and output; see Kolenikov
    (2012) for ideas.
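
    The two formulas can be checked with a few lines of Python. The first
    two functions transcribe the formulas above. The third shows one common
    way a scrambler is applied, by permuting each base-p digit before the
    radical inverse is formed; that application step is an assumption and is
    not taken from the ghk2() source. All function names are hypothetical.

```python
from math import isqrt


def sqrt_scrambler(p):
    """mod(u * floor(sqrt(p)), p) for u = 0, 1, ..., p-1."""
    return [(u * isqrt(p)) % p for u in range(p)]


def neg_sqrt_scrambler(p):
    """mod(u * (-floor(sqrt(p))), p) for u = 0, 1, ..., p-1."""
    return [(u * -isqrt(p)) % p for u in range(p)]


def scrambled_radical_inverse(i, p, scrambler):
    """Radical inverse of i in base p with each digit permuted (assumed use)."""
    perm = scrambler(p)
    x, f = 0.0, 1.0 / p
    while i > 0:
        i, digit = divmod(i, p)
        x += perm[digit] * f
        f /= p
    return x


print(sqrt_scrambler(5))      # [0, 2, 4, 1, 3]
print(neg_sqrt_scrambler(5))  # [0, 3, 1, 4, 2]
print(sqrt_scrambler(3))      # [0, 1, 2]: no effect for p = 3
print(neg_sqrt_scrambler(3))  # [0, 2, 1]
```

    Python's % operator, like Mata's mod(), returns a nonnegative result for
    a positive modulus, so the negative multiplier needs no special handling.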

Examples

    * ghk() and ghkfast() syntax changed in Stata 10.1, but these examples
        are not updated yet.
    . version 9.0

    * Setup: pre-generate Halton draws and define the inputs
    p = ghkfastsetup(10000, 5, 3, "halton")
    p2 = ghk2setup(10000, 5, 3, "halton")
    V = 1, .5, .4 \ .5, 1, .3 \ .4, .3, 1
    rank = dfdx = dfdv = .
    anti = 0
    start = .

    * Exact matches, using Halton sequence
    ghk((1,2,3), V, (1,5), rank)
    ghkfast(p, (1,2,3), V, rank)
    ghk2(p2, (1,2,3), V, anti, start)

    * Inexact matches because ghk() and ghkfast() pivot the data vector,
        ordering from low to high
    ghk((3,2,1), V, (1,5), rank)
    ghkfast(p, (3,2,1), V, rank)
    ghk2(p2, (3,2,1), V, anti, start)

    * Timing comparisons for many observations, few draws, with and without
        score computation
    X = J(10000,3,1)
    timer_clear()
    timer_on(1); mean(ghkfast(p, X, V, rank, ., anti)); timer_off(1)
    timer_on(2); mean(ghk2(p2, X, V, anti, start)); timer_off(2)
    timer()

    timer_clear()
    timer_on(1); mean(ghkfast(p, X, V, rank, ., anti, dfdx, dfdv));
        timer_off(1)
    timer_on(2); mean(ghk2(p2, X, V, anti, start, dfdx, dfdv)); timer_off(2)
    timer()

    * Timing comparisons for fewer observations, many draws, including
        antithetical draws
    anti = 1
    p = ghkfastsetup(1000, 250, 3, "halton")
    p2 = ghk2setup(1000, 250, 3, "halton")
    X = J(1000,3,1)
    timer_clear()
    timer_on(1); mean(ghkfast(p, X, V, rank, ., anti)); timer_off(1)
    timer_on(2); mean(ghk2(p2, X, V, anti, start)); timer_off(2)
    timer()

    timer_clear()
    timer_on(1); mean(ghkfast(p, X, V, rank, ., anti, dfdx, dfdv));
        timer_off(1)
    timer_on(2); mean(ghk2(p2, X, V, anti, start, dfdx, dfdv)); timer_off(2)
    timer()

    * Demonstration of using lower and upper bounds. The two versions agree
        asymptotically in the number of draws.
    * The first is about 8 times faster than the second, which makes 2^3 = 8
        calls.
    l1=l2=l3=0; u1=1; u2=2; u3=3
    ghk2(p2, (l1,l2,l3), (u1,u2,u3), V, anti, start)
    ghk2(p2,(u1,u2,u3),V,1,.) - ghk2(p2,(l1,u2,u3),V,1,.) -
        ghk2(p2,(u1,l2,u3),V,1,.) - ghk2(p2,(u1,u2,l3),V,1,.) +
        ghk2(p2,(l1,l2,u3),V,1,.) + ghk2(p2,(u1,l2,l3),V,1,.) +
        ghk2(p2,(l1,u2,l3),V,1,.) - ghk2(p2,(l1,l2,l3),V,1,.)

    * Demonstration of scrambling. Square-root scrambling doesn't affect
        primes 2 and 3; negative square-root doesn't affect 2.
    (0::1), ghk2SqrtScrambler(2), ghk2NegSqrtScrambler(2)
    (0::2), ghk2SqrtScrambler(3), ghk2NegSqrtScrambler(3)
    (0::4), ghk2SqrtScrambler(5), ghk2NegSqrtScrambler(5)

    * Examples of scrambling in ghk2(): 4-dimensional problem uses primes 2,
        3, 5.
    V = 1, .5, .5, .5 \ .5, 1, .5, .5 \ .5, .5, 1, .5 \ .5, .5, .5, 1
    p2 = ghk2setup(1, 5, 4, "halton", 1)
    ghk2(p2, (1,1,1,1), V, anti, start)
    p2 = ghk2setup(1, 5, 4, "halton", 1, &ghk2SqrtScrambler())
    ghk2(p2, (1,1,1,1), V, anti, start)
    p2 = ghk2setup(1, 5, 4, "halton", 1, &ghk2NegSqrtScrambler())
    ghk2(p2, (1,1,1,1), V, anti, start)



Conformability

    ghk2setup(n, m, d, type, | pi, pfn):
                n:  1 x 1
                m:  1 x 1
                d:  1 x 1
             type:  1 x 1
               pi:  1 x 1
              pfn:  1 x 1
           result:  transmorphic

    ghk2(P, X, V, anti, start):
        input:
                P:  transmorphic
                X:  n x d 
                V:  d x d (symmetric, positive definite)
             anti:  1 x 1
            start:  1 x 1
        output:
           result:  n x 1 

    ghk2(P, Xl, Xu, V, anti, start):
        input:
                P:  transmorphic
               Xl:  n x d 
               Xu:  n x d 
                V:  d x d (symmetric, positive definite)
             anti:  1 x 1
            start:  1 x 1
        output:
           result:  n x 1 

    ghk2(P, X, V, anti, start, dfdx, dfdv):
        input:
                P:  transmorphic
                X:  n x d
                V:  d x d (symmetric, positive definite)
             anti:  1 x 1
            start:  1 x 1
        output:
           result:  n x 1
             dfdx:  n x d
             dfdv:  n x d(d+1)/2

    ghk2(P, Xl, Xu, V, anti, start, dfdxl, dfdxu, dfdv):
        input:
                P:  transmorphic
               Xl:  n x d
               Xu:  n x d
                V:  d x d (symmetric, positive definite)
             anti:  1 x 1
            start:  1 x 1
        output:
           result:  n x 1
            dfdxl:  n x d
            dfdxu:  n x d
             dfdv:  n x d(d+1)/2

    ghk2SqrtScrambler(p):
        input:
                p:  1 x 1
        output:
           result:  p x 1

    ghk2NegSqrtScrambler(p):
        input:
                p:  1 x 1
        output:
           result:  p x 1

Source code

    ghk2.mata


References

    Cappellari, L., and S. Jenkins. 2003. Multivariate probit regression
        using simulated maximum likelihood.  Stata Journal 3(3): 278-94.
    Cappellari, L., and S. Jenkins. 2006. Calculation of multivariate normal
        probabilities by simulation, with applications to maximum simulated
        likelihood estimation. Stata Journal 6(2): 156-89.
    Drukker, D., and R. Gates. 2006. Generating Halton sequences using Mata.
        Stata Journal 6(2): 214-28.
    Gates, R. 2006. A Mata Geweke-Hajivassiliou-Keane multivariate normal
        simulator. Stata Journal 6(2): 190-213.
    Genz, A. 1992. Numerical computation of multivariate normal
        probabilities. Journal of Computational and Graphical Statistics 1:
        141-49.
    Kolenikov, S. 2012. Scrambled Halton sequences in Mata. Stata Journal
        12(1): 29-44.

Author

    David Roodman
    Senior Fellow
    Center for Global Development
    Washington, DC
    droodman@cgdev.org


Also see

    Online:  [M-5] ghk(), [M-5] ghkfast(), [M-5] halton()