help ghk2()-------------------------------------------------------------------------------

Title

ghk2()-- Geweke-Hajivassiliou-Keane (GHK) multivariate normal simulator using pre-generated points

Syntax

P=ghk2setup(real scalar n,real scalar m,real scalar d,string scalar type,|real scalar pi,pointer (real colvector function)pfn)

real colvectorghk2(P,real matrix X,real matrix V,real scalaranti,real scalar start)

real colvectorghk2(P,real matrix Xl,real matrix Xu,real matrix V,real scalar anti,real scalar start)

real colvectorghk2(P,real matrix X,real matrix V,real scalaranti,real scalar start,real matrix dfdx,realmatrix dfdv)

real colvectorghk2(P,real matrix Xl,real matrix Xu,real matrix V,real scalar anti,real scalar start,real matrixdfdxl,real matrix dfdxu,real matrix dfdv)

real colvectorghk2SqrtScrambler(real scalar p)

real colvectorghk2NegSqrtScrambler(real scalar p)where

P, if it is declared, should be declared

transmorphicPwherepfn, if it is passed, should point to a Mata function declared like ghk2SqrtScrambler() or ghk2NegSqrtScrambler(),and where

dfdx,dfdxl,dfdxu, anddfdvare outputs

real matrix dfdxreal matrix dfdxlreal matrix dfdxureal matrix dfdv

Input parameters

nNumber of observations for which to prepare drawsmDraws per observation and simulated dimensiondDimension of cumulative integrals for which to be prepared to simulatetypeSequence typepiOptional starting index of prime bases for Halton sequences (1->2, 2->3, 3->5, 4->7...) (default=1)pfnOptional pointer to scrambling function such as ghk2SqrtScrambler() or ghk2NegSqrtScrambler()PDraws prepared byghk2setup()XUpper bounds of integrationXl, XuLower and upper bounds of integrationVCovariance matrixantiOptional dummy for inclusion of antithetics (default=0)startStarting point to use in block of draws prepared byghk2setup()pNumber, normally prime, for which the vector (0, 1, ..., p-1)' should be scrambledOutput parameters

dfdxScores with respect toXdfdxl, dfdxuScores with respect toXl, XudfdvScores with respect toV, stored as vectorized lower-triangular matrices

Description

ghk2()implements the Geweke-Hajivassiliou-Keane algorithm for simulating the cumulative multivariate normal distribution, optionally computing scores, and optionally accepting lower as well as upper bounds. (See Cappellari and Jenkins 2003, 2005; Gates 2006.) It is modeled on ghkfast(), which is included in Stata 10, and which see for more explanation. Likeghkfast(), its first argument is a pre-generated set of points on the unit interval, in this case produced byghk2setup(), which has the same syntax and semantics as ghkfastsetup(). The two commands' point sets are not interchangeable.ghk2()differs from ghkfast() in the following ways:* It accepts lower as well as upper bounds for integration (second and fourth syntaxes above). This allows efficient estimation of probabilities over bounded rectilinear regions such as {(x1, x2) | l1<=x1<=u1, l2<=x2<=u2}. Without this feature, the routine would need to be called 2^d times, where d is the dimension of distribution. For example, the probability just mentioned would have to be computed as Phi(u1, u2) - Phi(l1, u2) - Phi(u1, l2) + Phi(l1, l2), where Phi is a bivariate cumulative normal distribution with some given covariance structure. Individual entries in the lower and upper bounds,

XlandXu, may be missing ("."), and are interpreted as -infinity and +infinity, respectively. The fourth syntax differs from the second in requesting score matrices for upper and lower bounds, as well as for the covariance matrixV.*

ghk2()does not "pivot" the bounds of integration (inX,Xl, orXu). On the recommendation of Genz (1992), ghk() and ghkfast() reorder each vector of bounds to put the larger entries toward the end, which reduces the variability of the simulated probability. However, pivoting has the disadvantage of creating discontinuities in results. Small changes in the bounds can produce relatively large ones in the results when they trigger reorderings of the pivoted vector. Especially when the number of draws is low, these discontinuities can stymie a search by ml. Thusghk2()behaves very smoothly even at low draw counts, at the expense of more variability. (As of Stata 10.1, ghk() and ghkfast() also allow pivoting to be turned off.)*

ghk2()works in Stata 9. Stata 9 does ship with ghk(), but this does not use pre-generated points, and so is slower.*

ghk2()is generally faster than ghkfast(), at least in single-processor versions of Stata. It is optimized for contexts with a large number of observations relative to draws. In extreme cases, such as 10,000 observations and 10 draws, it can perform an order of magnitude faster. But at the opposite extreme, with, say, 100 observations and 1,000 draws, it can run half as fast.*

ghk2()accepts an optional scrambling function. Halton sequences based on large primes can have decidedly non-uniform coverage of the unit hypercube (Drukker and Gates 2006). "Scrambling" the sequences can increase uniformity (Kolenikov 2012).* The

startargument allows the user to shift the starting observation within the pre-computed block of draws. E.g., if the pre-computed block of draws is for 200 observation rows, when callingghk2()with a data matrix that has only 100 rows thestartargument would allow rows 101-200 of the draws to be used rather than the usual 1-100.* The

antiargument, specifying whether to include antithetical draws, is required. Any non-zero value is interpreted as requesting them.* It does not take a

rankargument. (ghk() and ghkfast() lost it in Stata 10.1 as well.)

RemarksThe

typeargument may be"halton","hammersley","random", or"ghalton". "Random" and generalized Halton sequences are influenced by the state of the random number generator just beforeghk2setup()is called. See [M-5] uniform().The

ghk2()routine performs error checking and then calls one of four additional routines, whose syntaxes correspond to the four listed above:_ghk2(),_ghk2_2(),_ghk2_d, and_ghk2_2d. You can call these routines directly for a slight speed gain.

ghk2SqrtScrambler(p)scrambles the modulo-p numbers u=(0, 1, ... p-1}' with the formula mod(u * floor(sqrt(p)), p).ghk2NegSqrtScrambler(p)uses mod(u * ( -floor(sqrt(p))), p). The user may provide alternative functions with the same type of arguments and output; see Kolenikov (2012) for ideas.

Examples (colored text is clickable)

* ghk() and ghkfast() syntax changed in Stata 10.1, but these examplesare not updated yet.. version 9.0

* Exact matches, using Halton sequencep = ghkfastsetup(10000, 5, 3, "halton") p2 = ghk2setup(10000, 5, 3, "halton") V = 1, .5, .4 \ .5, 1, .3 \ .4, .3, 1 rank = dfdx = dfdv = . anti = 0 start = .

* Exact matches, using Halton sequenceghk((1,2,3), V, (1,5), rank) ghkfast(p, (1,2,3), V, rank) ghk2(p2, (1,2,3), V, anti, start)

* Inexact matches because ghk() and ghkfast() pivot the data vector,ordering from low to highghk((3,2,1), V, (1,5), rank) ghkfast(p, (3,2,1), V, rank) ghk2(p2, (3,2,1), V, anti, start)

* Timing comparisons for many observations, few draws, with and withoutscore computationX = J(10000,3,1) timer_clear() timer_on(1); mean(ghkfast(p, X, V, rank, ., anti)); timer_off(1) timer_on(2); mean(ghk2(p2, X, V, anti, start)); timer_off(2) timer()timer_clear() timer_on(1); mean(ghkfast(p, X, V, rank, ., anti, dfdx, dfdv)); timer_off(1) timer_on(2); mean(ghk2(p2, X, V, anti, start, dfdx, dfdv)); timer_off(2) timer()

* Timing comparisons for fewer observations, many draws, includingantithetical drawsanti = 1 p = ghkfastsetup(1000, 250, 3, "halton") p2 = ghk2setup(1000, 250, 3, "halton") X = J(1000,3,1) timer_clear() timer_on(1); mean(ghkfast(p, X, V, rank, ., anti)); timer_off(1) timer_on(2); mean(ghk2(p2, X, V, anti, start)); timer_off(2) timer()timer_clear() timer_on(1); mean(ghkfast(p, X, V, rank, ., anti, dfdx, dfdv)); timer_off(1) timer_on(2); mean(ghk2(p2, X, V, anti, start, dfdx, dfdv)); timer_off(2) timer()

* Demonstration of using lower and upper bounds. The two versions agreeasymptotically in the number of draws.* The first is 8 times faster than the last.l1=l2=l3=0; u1=1; u2=2; u3=3 ghk2(p2, (l1,l2,l3), (u1,u2,u3), V, anti, start) ghk2(p2,(u1,u2,u3),V,1,.)-ghk2(p2,(l1,u2,u3),V,1,.)-ghk2(p2,(u1,l2,u3),V, > 1,.)-ghk2(p2,(u1,u2,l3),V,1,.)+ghk2(p2,(l1,l2,u3),V,1,.)+ghk2(p2,(u > 1,l2,l3),V,1,.)+ghk2(p2,(l1,u2,l3),V,1,.)-ghk2(p2,(l1,l2,l3),V,1,.)

* Demonstration of scrambling. Square-root scrambling doesn't affectprimes 2 and 3; negative square-root doesn't affect 2.(0::1), ghk2SqrtScrambler(2), ghk2NegSqrtScrambler(2) (0::2), ghk2SqrtScrambler(3), ghk2NegSqrtScrambler(3) (0::4), ghk2SqrtScrambler(5), ghk2NegSqrtScrambler(5)

* Examples of scrambling in ghk2(): 4-dimensional problem uses primes 2,3, 5.V = 1, .5, .5, .5 \ .5, 1, .5, .5 \ .5, .5, 1, .5 \ .5, .5, .5, 1 p2 = ghk2setup(1, 5, 4, "halton", 1) ghk2(p2, (1,1,1,1), V, anti, start) p2 = ghk2setup(1, 5, 4, "halton", 1, &ghk2SqrtScrambler()) ghk2(p2, (1,1,1,1), V, anti, start) p2 = ghk2setup(1, 5, 4, "halton", 1, &ghk2NegSqrtScrambler()) ghk2(p2, (1,1,1,1), V, anti, start)

Conformability

ghk2setup(n,m,d,type, |pi):n: 1x1m: 1x1d: 1x1type: 1x1pi: 1x1result:transmorphic

ghk2(P,X,V,anti,start):input:P:transmorphicX:n x dV:d x d(symmetric, positive definite)anti: 1x1start: 1x1output:result: nx1

ghk2(P,Xl,Xu,V,anti,start):input:P:transmorphicXl:n x dXu:n x dV:d x d(symmetric, positive definite)anti: 1x1start: 1x1output:result: nx1

ghk2(P,X,V,anti,start,dfdx,dfdv):input:P:transmorphicX:n x dV:d x d(symmetric, positive definite)anti: 1x1start: 1x1output:result:n x1dfdx:n x ddfdv:n x d(d+1)/2

ghk2(P,Xl,Xu,V,anti,start,dfdxl,dfdxu,dfdv):input:P:transmorphicXl:n x dXu:n x dV:d x d(symmetric, positive definite)anti: 1x1start: 1x1output:result:n x1dfdxl:n x ddfdxu:n x ddfdv:n x d(d+1)/2

ghk2SqrtScrambler(p):input:p:1 x1output:result:n x1

ghk2NegSqrtScrambler(p):input:p:1 x1output:result:n x1

Source codeghk2.mata

ReferencesCappellari, L., and S. Jenkins. 2003. Multivariate probit regression using simulated maximum likelihood.

Stata Journal3(3): 278-94. Cappellari, L., and S. Jenkins. 2006. Calculation of multivariate normal probabilities by simulation, with applications to maximum simulated likelihood estimation.Stata Journal6(2): 156-89. Drukker, D., and R. Gates. 2006. Generating Halton sequences using Mata.Stata Journal6(2): 214-28. Kolenikov, S. 2012. Scrambled Halton sequences in Mata.Stata Journal12(1): 29-44. Gates, R. 2006. A Mata Geweke-Hajivassiliou-Keane multivariate normal simulator.Stata Journal6(2): 190-213. Genz, A. 1992. Numerical computation of multivariate normal probabilities.Journal of Computational and Graphical Statistics1: 141–149.

AuthorDavid Roodman Senior Fellow Center for Global Development Washington, DC droodman@cgdev.org

Also seeOnline:

[M-5] ghk(),[M-5] ghkfast(),[M-5] halton()