help ghk2()
-------------------------------------------------------------------------------

Title

ghk2() -- Geweke-Hajivassiliou-Keane (GHK) multivariate normal simulator using pre-generated points

Syntax

P = ghk2setup(real scalar n, real scalar m, real scalar d, string scalar type, | real scalar pi, pointer (real colvector function) pfn)

real colvector ghk2(P, real matrix X, real matrix V, real scalar anti, real scalar start)

real colvector ghk2(P, real matrix Xl, real matrix Xu, real matrix V, real scalar anti, real scalar start)

real colvector ghk2(P, real matrix X, real matrix V, real scalar anti, real scalar start, real matrix dfdx, real matrix dfdv)

real colvector ghk2(P, real matrix Xl, real matrix Xu, real matrix V, real scalar anti, real scalar start, real matrix dfdxl, real matrix dfdxu, real matrix dfdv)

real colvector ghk2SqrtScrambler(real scalar p)

real colvector ghk2NegSqrtScrambler(real scalar p)

where P, if it is declared, should be declared

transmorphic P where pfn, if it is passed, should point to a Mata function declared like ghk2SqrtScrambler() or ghk2NegSqrtScrambler(),

and where dfdx, dfdxl, dfdxu, and dfdv are outputs

real matrix dfdx real matrix dfdxl real matrix dfdxu real matrix dfdv

Input parameters

n Number of observations for which to prepare draws m Draws per observation and simulated dimension d Dimension of cumulative integrals for which to be prepared to simulate type Sequence type pi Optional starting index of prime bases for Halton sequences (1->2, 2->3, 3->5, 4->7...) (default=1) pfn Optional pointer to scrambling function such as ghk2SqrtScrambler() or ghk2NegSqrtScrambler() P Draws prepared by ghk2setup() X Upper bounds of integration Xl, Xu Lower and upper bounds of integration V Covariance matrix anti Optional dummy for inclusion of antithetics (default=0) start Starting point to use in block of draws prepared by ghk2setup() p Number, normally prime, for which the vector (0, 1, ..., p-1)' should be scrambled Output parameters

dfdx Scores with respect to X dfdxl, dfdxu Scores with respect to Xl, Xu dfdv Scores with respect to V, stored as vectorized lower-triangular matrices

Description

ghk2() implements the Geweke-Hajivassiliou-Keane algorithm for simulating the cumulative multivariate normal distribution, optionally computing scores, and optionally accepting lower as well as upper bounds. (See Cappellari and Jenkins 2003, 2005; Gates 2006.) It is modeled on ghkfast(), which is included in Stata 10, and which see for more explanation. Like ghkfast(), its first argument is a pre-generated set of points on the unit interval, in this case produced by ghk2setup(), which has the same syntax and semantics as ghkfastsetup(). The two commands' point sets are not interchangeable. ghk2() differs from ghkfast() in the following ways:

* It accepts lower as well as upper bounds for integration (second and fourth syntaxes above). This allows efficient estimation of probabilities over bounded rectilinear regions such as {(x1, x2) | l1<=x1<=u1, l2<=x2<=u2}. Without this feature, the routine would need to be called 2^d times, where d is the dimension of distribution. For example, the probability just mentioned would have to be computed as Phi(u1, u2) - Phi(l1, u2) - Phi(u1, l2) + Phi(l1, l2), where Phi is a bivariate cumulative normal distribution with some given covariance structure. Individual entries in the lower and upper bounds, Xl and Xu, may be missing ("."), and are interpreted as -infinity and +infinity, respectively. The fourth syntax differs from the second in requesting score matrices for upper and lower bounds, as well as for the covariance matrix V.

* ghk2() does not "pivot" the bounds of integration (in X, Xl, or Xu). On the recommendation of Genz (1992), ghk() and ghkfast() reorder each vector of bounds to put the larger entries toward the end, which reduces the variability of the simulated probability. However, pivoting has the disadvantage of creating discontinuities in results. Small changes in the bounds can produce relatively large ones in the results when they trigger reorderings of the pivoted vector. Especially when the number of draws is low, these discontinuities can stymie a search by ml. Thus ghk2() behaves very smoothly even at low draw counts, at the expense of more variability. (As of Stata 10.1, ghk() and ghkfast() also allow pivoting to be turned off.)

* ghk2() works in Stata 9. Stata 9 does ship with ghk(), but this does not use pre-generated points, and so is slower.

* ghk2() is generally faster than ghkfast(), at least in single-processor versions of Stata. It is optimized for contexts with a large number of observations relative to draws. In extreme cases, such as 10,000 observations and 10 draws, it can perform an order of magnitude faster. But at the opposite extreme, with, say, 100 observations and 1,000 draws, it can run half as fast.

* ghk2() accepts an optional scrambling function. Halton sequences based on large primes can have decidedly non-uniform coverage of the unit hypercube (Drukker and Gates 2006). "Scrambling" the sequences can increase uniformity (Kolenikov 2012).

* The start argument allows the user to shift the starting observation within the pre-computed block of draws. E.g., if the pre-computed block of draws is for 200 observation rows, when calling ghk2() with a data matrix that has only 100 rows the start argument would allow rows 101-200 of the draws to be used rather than the usual 1-100.

* The anti argument, specifying whether to include antithetical draws, is required. Any non-zero value is interpreted as requesting them.

* It does not take a rank argument. (ghk() and ghkfast() lost it in Stata 10.1 as well.)

Remarks

The type argument may be "halton", "hammersley", "random", or "ghalton". "Random" and generalized Halton sequences are influenced by the state of the random number generator just before ghk2setup() is called. See [M-5] uniform().

The ghk2() routine performs error checking and then calls one of four additional routines, whose syntaxes correspond to the four listed above: _ghk2(), _ghk2_2(), _ghk2_d, and _ghk2_2d. You can call these routines directly for a slight speed gain.

ghk2SqrtScrambler(p) scrambles the modulo-p numbers u=(0, 1, ... p-1}' with the formula mod(u * floor(sqrt(p)), p). ghk2NegSqrtScrambler(p) uses mod(u * ( -floor(sqrt(p))), p). The user may provide alternative functions with the same type of arguments and output; see Kolenikov (2012) for ideas.

Examples (colored text is clickable)

* ghk() and ghkfast() syntax changed in Stata 10.1, but these examples are not updated yet. . version 9.0

* Exact matches, using Halton sequence p = ghkfastsetup(10000, 5, 3, "halton") p2 = ghk2setup(10000, 5, 3, "halton") V = 1, .5, .4 \ .5, 1, .3 \ .4, .3, 1 rank = dfdx = dfdv = . anti = 0 start = .

* Exact matches, using Halton sequence ghk((1,2,3), V, (1,5), rank) ghkfast(p, (1,2,3), V, rank) ghk2(p2, (1,2,3), V, anti, start)

* Inexact matches because ghk() and ghkfast() pivot the data vector, ordering from low to high ghk((3,2,1), V, (1,5), rank) ghkfast(p, (3,2,1), V, rank) ghk2(p2, (3,2,1), V, anti, start)

* Timing comparisons for many observations, few draws, with and without score computation X = J(10000,3,1) timer_clear() timer_on(1); mean(ghkfast(p, X, V, rank, ., anti)); timer_off(1) timer_on(2); mean(ghk2(p2, X, V, anti, start)); timer_off(2) timer()

timer_clear() timer_on(1); mean(ghkfast(p, X, V, rank, ., anti, dfdx, dfdv)); timer_off(1) timer_on(2); mean(ghk2(p2, X, V, anti, start, dfdx, dfdv)); timer_off(2) timer()

* Timing comparisons for fewer observations, many draws, including antithetical draws anti = 1 p = ghkfastsetup(1000, 250, 3, "halton") p2 = ghk2setup(1000, 250, 3, "halton") X = J(1000,3,1) timer_clear() timer_on(1); mean(ghkfast(p, X, V, rank, ., anti)); timer_off(1) timer_on(2); mean(ghk2(p2, X, V, anti, start)); timer_off(2) timer()

timer_clear() timer_on(1); mean(ghkfast(p, X, V, rank, ., anti, dfdx, dfdv)); timer_off(1) timer_on(2); mean(ghk2(p2, X, V, anti, start, dfdx, dfdv)); timer_off(2) timer()

* Demonstration of using lower and upper bounds. The two versions agree asymptotically in the number of draws. * The first is 8 times faster than the last. l1=l2=l3=0; u1=1; u2=2; u3=3 ghk2(p2, (l1,l2,l3), (u1,u2,u3), V, anti, start) ghk2(p2,(u1,u2,u3),V,1,.)-ghk2(p2,(l1,u2,u3),V,1,.)-ghk2(p2,(u1,l2,u3),V, > 1,.)-ghk2(p2,(u1,u2,l3),V,1,.)+ghk2(p2,(l1,l2,u3),V,1,.)+ghk2(p2,(u > 1,l2,l3),V,1,.)+ghk2(p2,(l1,u2,l3),V,1,.)-ghk2(p2,(l1,l2,l3),V,1,.)

* Demonstration of scrambling. Square-root scrambling doesn't affect primes 2 and 3; negative square-root doesn't affect 2. (0::1), ghk2SqrtScrambler(2), ghk2NegSqrtScrambler(2) (0::2), ghk2SqrtScrambler(3), ghk2NegSqrtScrambler(3) (0::4), ghk2SqrtScrambler(5), ghk2NegSqrtScrambler(5)

* Examples of scrambling in ghk2(): 4-dimensional problem uses primes 2, 3, 5. V = 1, .5, .5, .5 \ .5, 1, .5, .5 \ .5, .5, 1, .5 \ .5, .5, .5, 1 p2 = ghk2setup(1, 5, 4, "halton", 1) ghk2(p2, (1,1,1,1), V, anti, start) p2 = ghk2setup(1, 5, 4, "halton", 1, &ghk2SqrtScrambler()) ghk2(p2, (1,1,1,1), V, anti, start) p2 = ghk2setup(1, 5, 4, "halton", 1, &ghk2NegSqrtScrambler()) ghk2(p2, (1,1,1,1), V, anti, start)

Conformability

ghk2setup(n, m, d, type, | pi): n: 1 x 1 m: 1 x 1 d: 1 x 1 type: 1 x 1 pi: 1 x 1 result: transmorphic

ghk2(P, X, V, anti, start): input: P: transmorphic X: n x d V: d x d (symmetric, positive definite) anti: 1 x 1 start: 1 x 1 output: result: n x 1

ghk2(P, Xl, Xu, V, anti, start): input: P: transmorphic Xl: n x d Xu: n x d V: d x d (symmetric, positive definite) anti: 1 x 1 start: 1 x 1 output: result: n x 1

ghk2(P, X, V, anti, start, dfdx, dfdv): input: P: transmorphic X: n x d V: d x d (symmetric, positive definite) anti: 1 x 1 start: 1 x 1 output: result: n x 1 dfdx: n x d dfdv: n x d(d+1)/2

ghk2(P, Xl, Xu, V, anti, start, dfdxl, dfdxu, dfdv): input: P: transmorphic Xl: n x d Xu: n x d V: d x d (symmetric, positive definite) anti: 1 x 1 start: 1 x 1 output: result: n x 1 dfdxl: n x d dfdxu: n x d dfdv: n x d(d+1)/2

ghk2SqrtScrambler(p): input: p: 1 x 1 output: result: n x 1

ghk2NegSqrtScrambler(p): input: p: 1 x 1 output: result: n x 1

Source code

ghk2.mata

References

Cappellari, L., and S. Jenkins. 2003. Multivariate probit regression using simulated maximum likelihood. Stata Journal 3(3): 278-94. Cappellari, L., and S. Jenkins. 2006. Calculation of multivariate normal probabilities by simulation, with applications to maximum simulated likelihood estimation. Stata Journal 6(2): 156-89. Drukker, D., and R. Gates. 2006. Generating Halton sequences using Mata. Stata Journal 6(2): 214-28. Kolenikov, S. 2012. Scrambled Halton sequences in Mata. Stata Journal 12(1): 29-44. Gates, R. 2006. A Mata Geweke-Hajivassiliou-Keane multivariate normal simulator. Stata Journal 6(2): 190-213. Genz, A. 1992. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics 1: 141–149.

Author

David Roodman Senior Fellow Center for Global Development Washington, DC droodman@cgdev.org

Also see

Online: [M-5] ghk(), [M-5] ghkfast(), [M-5] halton()