{smcl}
{* 18feb2004}{...}
{hline}
help for {hi:_gmlabvpos}{right:manual:  [R] egen}
{right:dialog:  }
{hline}

{title:Egen function to generate a variable for mlabvpos()}

{p 8 18 2}{cmd:egen} [{it:type}] {it:newvar} {cmd:=} {cmd:mlabvpos(}{it:yvar xvar}{cmd:)}
[{cmd:if} {it:exp}] [{cmd:in} {it:range}]
[{cmd:,} {cmd:log} {cmdab:poly:nom(}{it:#}{cmd:)} {cmdab:mat:rix(}{it:5x5 matrix}{cmd:)}]

{p 4 4 2}
where {it:yvar} is the variable to be plotted on the Y axis of a scatterplot
and {it:xvar} is the variable that forms the X axis of that scatterplot.


{title:Description}

{p 4 4 2}
{cmd:_gmlabvpos} attempts to generate automatically a variable holding the
clock positions of marker labels in scatterplots. The generated variable can be
passed to the {cmd:scatter} option {cmd:mlabvpos()}.

{p 4 4 2}
Note that the program does not try to prevent marker labels from overplotting
one another, which is quite likely in datasets with many observations. In such
situations you might be better off simply generating random clock positions
(integers from 1 to 12):

{p 8 12 2}{cmd:. gen clock = int(uniform()*12) + 1}{p_end}

{p 4 4 2}
The general idea behind {cmd:_gmlabvpos} is to push marker labels outward, away
from the center of the data region. For example, markers in the lower left
corner of the data region are labeled at clock position 7 or 8, markers in the
upper right corner at clock position 1 or 2, and so on.
{p 4 4 2}
More precisely, if the rectangle below represents the data region of a
scatterplot, marker labels of symbols in each area receive the clock position
shown for that area:

{col 10}{c TLC}{hline 14}{c TRC}
{col 10}{c |}11 12 12 12  1{c |}
{col 10}{c |}10 11 12  1  2{c |}
{col 10}{c |} 9  9 12  3  3{c |}
{col 10}{c |} 8  7  6  5  4{c |}
{col 10}{c |} 7  6  6  6  5{c |}
{col 10}{c BLC}{hline 14}{c BRC}

{p 4 4 2}
If {it:yvar} and {it:xvar} are highly correlated, the data occupy a diagonal
band of the plot, and the same general idea yields clock positions like these:

{col 10}{c TLC}{hline 14}{c TRC}
{col 10}{c |}      12  1  3{c |}
{col 10}{c |}   12 12  3  4{c |}
{col 10}{c |}11 11 12  5  5{c |}
{col 10}{c |}10  9  6  6   {c |}
{col 10}{c |} 9  7  6      {c |}
{col 10}{c BLC}{hline 14}{c BRC}

{p 4 4 2}
To calculate the clock positions, the program first divides the X axis into
5 equal-sized intervals around the mean of {it:xvar}. It then divides the
residuals of a linear regression of {it:yvar} on {it:xvar} (see the options
{cmd:log} and {cmd:polynom()}) into 5 equal-sized intervals. The two
categorized variables are then combined into clock positions according to the
rule of the first table above, with the {it:xvar} categories as columns and the
residual categories as rows. The rule can be changed with the option
{cmd:matrix()}.


{title:Options}

{p 4 8 2}
{cmd:log} computes the residuals from a regression of {it:yvar} on the
logarithm of {it:xvar}. This may be useful if the scatterplot shows a strongly
curvilinear relationship.

{p 4 8 2}
{cmd:polynom(}{it:#}{cmd:)} computes the residuals from a regression of
{it:yvar} on a polynomial of degree {it:#} in {it:xvar}. For example, use
{cmd:polynom(2)} if the scatterplot shows a U-shaped relationship.

{p 4 8 2}
{cmd:matrix(}{it:5x5 matrix}{cmd:)} changes the rule that assigns clock
positions to areas of the data region. The clock positions are specified as a
5x5 matrix whose upper left cell gives the clock position of marker labels in
the upper left part of the data region, and so on; rows are separated by
{cmd:\}.


{title:Examples}

{p 4 8 2}{cmd:. sysuse auto, clear}{p_end}
{p 4 8 2}{cmd:. egen clock = mlabvpos(mpg weight)}{p_end}
{p 4 8 2}{cmd:. sc mpg weight, mlab(make) mlabvpos(clock)}{p_end}

{p 4 8 2}{cmd:. egen clock2 = mlabvpos(mpg weight), matrix(11 1 12 11 1 \ 10 2 12 10 2 \ 9 3 12 9 3 \ 8 4 6 8 4 \ 7 5 6 7 5)}{p_end}
{p 4 8 2}{cmd:. sc mpg weight, mlab(make) mlabvpos(clock2)}{p_end}


{title:Also see}

{p 4 13 2}
Online: help for {help egen}, {help scatter}
{p_end}


{title:Author}

{p 4 13 2}
Ulrich Kohler, WZB, kohler@wz-berlin.de
{p_end}