help egen vreldif                                        Author: Stas Kolenikov


[D] egen -- Extensions to generate


egen [type] newvar = vreldif(varlist1) [if] [in] , by(varlist2)

egen ... = vreldif() creates a variable that contains the relative differences (see reldif() and mreldif()) between the values of the variables in varlist1 within each group identified by varlist2. It is useful for comparing results in two appended data sets when minor numeric discrepancies are expected.
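As an illustration, Stata's built-in scalar function reldif(x, y) computes |x - y|/(|y| + 1). Two values around 3000 that differ by 0.5 give a relative difference of roughly 1.7e-4, the same order of magnitude as in the weight example further down. This sketch uses only built-in functions and does not call vreldif() itself:

```stata
* reldif(x, y) returns |x - y| / (|y| + 1)
display reldif(3000, 3000.5)
* the same quantity written out by hand
display abs(3000 - 3000.5) / (abs(3000.5) + 1)
* identical values give a relative difference of zero
display reldif(5, 5)
```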

by(varlist2) is required. Each unique combination of the variables in varlist2 is expected to identify at most two observations in the data set.
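One way to confirm this requirement before calling egen is to assert the group sizes directly. The sketch below uses the auto data with every observation duplicated, an illustrative setup that is not part of vreldif(); substitute your own varlist2 for make. Whether vreldif() performs such a check on its own is not stated here.

```stata
* illustrative setup: two copies of every observation in the auto data
sysuse auto, clear
expand 2, generate(datacopy)
* every group defined by the by() variables must have at most two observations
bysort make: assert _N <= 2
* report how many groups have one, two, or more copies
duplicates report make
```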

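Comparable functionality can be achieved with the following Stata pseudocode, in which varlist1, varlist2, and newvar stand for the actual variable names:

* start from zero and accumulate the relative difference of each variable
generate double newvar = 0
foreach x of varlist varlist1 {
    * compare the first and the last observation within each group
    bysort varlist2: replace newvar = newvar + reldif(`x'[1], `x'[_N])
}
* groups with a single observation have nothing to compare against
bysort varlist2: replace newvar = . if _N == 1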


. sysuse auto, clear

. set seed 10101

. gen byte replic = ceil( 0.5+1.5*runiform())

. expand replic, gen( datacopy )

. tabulate datacopy

. replace weight = weight + runiform()

. egen check1 = vreldif(mpg price), by(make)

. egen check2 = vreldif(mpg weight), by(make)

The variable check1 should be zero in the observations that were doubled up, and missing in the unique observations:

. assert check1 == 0 if !missing( check1 )

The variable check2 will not be zero in the observations that were doubled up, so this assert should fail:

. assert check2 == 0 if !missing( check2 )

Because the values of weight in the two "copies" of the data (identified by the datacopy variable) differ only in the fourth significant digit (weight takes four-digit values, and the added noise is less than 1), the nonmissing values of check2 should be of the order of 1e-4: