{smcl} {* 11dec2022/12dec2022/18dec2023}{...} {vieweralsosee "[R] Diagnostic plots" "mansection R Diagnosticplots"}{...} {hline} help for {hi:qqplotg} {hline} {title:Quantile-quantile plots, generalized} {phang2}{cmd:qqplotg} {it:{help varname:varname1}} {it:{help varname:varname2}} {ifin} {cmd:,} [ {opt a(str)} {opt flip} {opt trans:form(specification)} {c -(} {opt dvm} {c |} {opt dvp} {c )-} {opt gen:erate(stub)} {opt by(byvar)} {opt miss:ing} {opt lpolyopts(options)} {opt rlopts(options)} {it:graph_options}] {phang2}{cmd:qqplotg} {it:{help varname:varname}} {ifin} {cmd:,} {opt over(groupvar)} [ {opt a(str)} {opt flip} {opt trans:form(specification)} {c -(} {opt dvm} {c |} {opt dvp} {c )-} {opt generate(stub)} {opt by(byvar)} {opt miss:ing} {opt lpolyopts(options)} {opt rlopts(options)} {it:graph_options}] {title:Description} {pstd} {cmd:qqplotg} plots the quantiles of one distribution against the quantiles of another distribution. Here quantiles means ordered values. It is a generalization of official command {helpb qqplot}. Names for this plot include quantile-quantile plot and q-q or Q-Q plot. {pstd} The two distributions may be of unequal size: if so, corresponding quantiles are calculated by interpolation. {pstd} There are two main syntaxes. In the first, emulating {help qqplot}, the two distributions are given by the values of two variables, {it:varname1} and {it:varname2}. {pstd} In the second, the distributions are given by the values of {it:varname} for two distinct groups of {it:groupvar} named in the compulsory option {cmd:over()}. The help for {cmd:qqplot} explains how to set up such a plot, but a one-line command may be convenient. {pstd} By default a reference line of equality is shown to aid in identifying any systematic or random differences between the two distributions. {pstd} Optionally, the distributions may be plotted as differences between corresponding quantiles versus their means; or as differences between corresponding quantiles versus fraction of the data (a.k.a. cumulative probability or plotting position). In each case, a smooth will be added using {help twoway lpoly} of the difference over its support. {pstd} Transformations on the fly are supported. It is suggested as essential practice to supply an informative note or title (unless an informative text caption is given otherwise); and as good practice to use axis labels on the original scale. See {help nicelabels} and {help mylabels} (Cox 2022) for support. {title:Options} {p 4 4 2}{cmd:over()} is a required option whenever you need to specify a group variable that takes on precisely two distinct numeric or string values. {p 8 8 2}{cmd:group()} is allowed as a synonym. {p 4 4 2}{cmd:a()} specifies {it:a} within the plotting position recipe ({it:i} - {it:a}) / ({it:n} + 1 - 2{it:a}) for distinct or unique ranks {it:i} running over the integers from 1 to sample size {it:n}. The default is 0.5, yielding ({it:i} - 0.5) / {it:n}. Alternatives should specify a number such as {cmd:a(0)} or a numeric expression such as {cmd:a(1/3)}. For more detail, see Cox (2014). {p 4 4 2}{cmd:flip} swaps axes as compared with the default. This can be especially helpful when a first pass shows that two groups would be better plotted the other way but you have no desire to recode {it:groupvar}. {p 4 4 2}{cmd:transform()} specifies a transformation to apply to what is plotted on both axes. There are two syntaxes. 1. A bare function name such as {cmd:ln} or {cmd:sqrt} will be applied directly. Do not supply parentheses {cmd:()}. 2. An expression mentioning {cmd:@} will be applied with {cmd:@} replaced on the fly with the appropriate variable name. Hence {cmd:@^(1/3)} specifies cube roots of zero or positive values and {cmd:1/@} specifies reciprocals. If {cmd:dvm} or {cmd:dvp} is also specified, then transforms are calculated first. {p 8 8 2}A warning will be displayed if the transform creates missing values. For example, taking logarithms of zero or negative values would do that. {p 4 4 2}{cmd:dvm} plots differences between corresponding quantiles versus their means as an alternative to plotting quantile versus quantile. The reference becomes the horizontal line defining difference zero. {p 8 8 2}{cmd:diffvsmean} is allowed as a synonym. {p 4 4 2}{cmd:dvp} plots differences between corresponding quantiles versus their plotting positions as an alternative to plotting quantile versus quantile. The reference becomes the horizontal line defining difference zero. {p 8 8 2}{cmd:dvm} and {cmd:dvp} may not be specified together. {p 4 4 2}{cmd:generate(}{it:stub}{cmd:)} generates the quantiles as two new variables, variously {it:stub}{cmd:1} and {it:stub}{cmd:2}; OR {it:stub}{cmd:d} and {it:stub}{cmd:m} if {cmd:dvm} is also specified; OR {it:stub}{cmd:d} and {it:stub}{cmd:p} if {cmd:dvp} is also specified. {p 4 4 2}{opt by(byvar, byopts)} is supported to produce separate plots for the distinct values of a variable {it:byvar}. By default {it:byopts} includes {cmd:legend(off) note("")}. Missing values of {it:byvar} will be ignored unless the further option {cmd:missing} is specified. {p 4 4 2}{cmd:lpolyopts()} are options of {help twoway lpoly} that tune the smooth that appears with options {cmd:dvm} or {cmd:dvp}. Note that {cmd:lpolyopts(nodraw)} suppresses display of such graphs. {p 4 4 2}{cmd:rlopts()} may be used to tune the rendering of reference lines. {p 4 4 2}{it:graph_options} are other options allowed with {help scatter}. Specifically {cmd:aspect(1)} may be a good idea with quantile-quantile plots. {title:Remarks} {p 4 4 2} Quantile plots have a long history, especially but not only in the form of (1) plotting quantiles against rank order or cumulative probability (equivalent to plotting versus the quantiles of a uniform distribution) (2) plotting quantiles against equivalent quantiles of a normal or Gaussian distribution (also known (e.g.) as a normal probability plot, normal scores plot, normal plot, probit plot, or fractile diagram). The terminology {it:quantiles} appears to have been introduced in the late 1930s, so names may vary. Modern history starts with an outstanding paper by Wilk and Gnanadesikan (1968). Chambers, Cleveland, Kleiner, and Tukey (1983) and Cleveland (1993, 1994) remain authoritative and lucid. For Stata-related discussions, see for example Cox (2005, 2007). The help for {help qplot} includes much more discussion. {p 4 4 2} Plots such as those from the {cmd:dvm} and {cmd:dvp} options are known as delta plots in psychology (De Jong {it:et al.} 1994; Speckman {it:et al.} 2008.) {p 4 4 2} It may often be sensible and sufficient to plot selected quantiles, especially if a dataset is very large. That idea is not supported here, but see for example Cox (2016). {p 4 4 2} {cmd:qqplotg} does not explicitly support comparison with expected quantiles from some reference distribution, but the examples include calculations for a normal quantile plot and associated plots. The procedure boils down to calculating plotting positions, possibly obtaining parameter estimates, and pushing plotting positions through code for a quantile function. For more, see Cox (2007, 2014). {title:Examples} {p 4 8 2}{cmd:. sysuse auto, clear}{p_end} {p 4 8 2}{cmd:. nicelabels mpg, local(la) tight}{p_end} {p 4 8 2}{cmd:. qqplotg mpg, over(foreign) flip xla(`la') yla(`la') subtitle(raw scale) name(QQG1, replace)}{p_end} {p 4 8 2}{cmd:. mylabels `la', myscale(@^(1/3)) local(la2)}{p_end} {p 4 8 2}{cmd:. qqplotg mpg, over(foreign) flip transform(@^(1/3)) xla(`la2') yla(`la2') subtitle(cube root scale) name(QQG2, replace)}{p_end} {p 4 8 2}{cmd:. mylabels `la', myscale(ln(@)) local(la3)}{p_end} {p 4 8 2}{cmd:. qqplotg mpg, over(foreign) flip transform(ln) xla(`la3') yla(`la3') subtitle(log scale) name(QQG3, replace)}{p_end} {p 4 8 2}{cmd:. mylabels `la', myscale(1/@) local(la4)}{p_end} {p 4 8 2}{cmd:. qqplotg mpg, over(foreign) flip transform(1/@) xla(`la4') yla(`la4') ysc(reverse) xsc(reverse) subtitle(reciprocal scale) name(QQG4, replace)}{p_end} {p 4 8 2}{cmd:. graph combine QQG1 QQG2 QQG3 QQG4, name(QQG5, replace)}{p_end} {p 4 8 2}{cmd:. * using egen is over the top here, but extends easily to groups}{p_end} {p 4 8 2}{cmd:. egen rank = rank(mpg), unique}{p_end} {p 4 8 2}{cmd:. egen n = count(mpg)}{p_end} {p 4 8 2}{cmd:. su mpg}{p_end} {p 4 8 2}{cmd:. gen normal = r(mean) + r(sd) * invnormal((rank - 0.5)/n)}{p_end} {p 4 8 2}{cmd:. label var normal "Expected normal quantiles"}{p_end} {p 4 8 2}{cmd:. qqplotg mpg normal, name(QQG6, replace)}{p_end} {p 4 8 2}{cmd:. qqplotg mpg normal, dvp lpolyopts(kernel(biweight) bw(0.1)) name(QQG7, replace)}{p_end} {p 4 8 2}{cmd:. use ozone, clear}{p_end} {p 4 8 2}{cmd:. qqplotg stamford yonkers, xla(0(50)150) yla(0(50)250) name(QQG8, replace)}{p_end} {p 4 8 2}{cmd:. mylabels 10 20 50 100 200, myscale(ln(@)) local(la)}{p_end} {p 4 8 2}{cmd:. qqplotg stamford yonkers, xla(`la') yla(`la') transform(ln) subtitle(log scale) name(QQG9, replace)}{p_end} {p 4 8 2}{cmd:. qqplotg stamford yonkers, by(month, subtitle(log scale)) transform(ln) xla(`la') yla(`la') name(QQG10, replace)}{p_end} {title:Author} {p 4 4 2}Nicholas J. Cox, Durham University{break} n.j.cox@durham.ac.uk {title:Also see} {p 4 4 2}Help for{break} {help qqplot}{break} {help qplot} ({it:Stata Journal}) (if installed){break} {help distplot} ({it:Stata Journal}) (if installed){break} {help nicelabels} ({it:Stata Journal}) (if installed){break} {help mylabels} ({it:Stata Journal}) (if installed){break} {help stripplot} (SSC) (if installed) {title:References} {p 4 8 2} Chambers, J. M., W. S. Cleveland, B. Kleiner, and P. A. Tukey. 1983. {it:Graphical Methods for Data Analysis.} Belmont, CA: Wadsworth. {p 4 8 2} Cleveland, W. S. 1993. {it:Visualizing Data.} Summit, NJ: Hobart Press. {p 4 8 2} Cleveland, W. S. 1994. {it:The Elements of Graphing Data.} Summit, NJ: Hobart Press. {p 4 8 2} Cox, N. J. 2005. Speaking Stata: The protean quantile plot. {it:Stata Journal} 5: 442{c -}460. {p 4 8 2} Cox, N. J. 2007. Stata tip 47: Quantile{c -}quantile plots without programming. {it:Stata Journal} 7: 275{c -}279. {p 4 8 2} Cox, N. J. 2014. How can I calculate percentile ranks? How can I calculate plotting positions? {browse "http://www.stata.com/support/faqs/statistics/percentile-ranks-and-plotting-positions/":http://www.stata.com/support/faqs/statistics/percentile-ranks-and-plotting-positions/} {p 4 8 2} Cox, N. J. 2016. Speaking Stata: Letter values as selected quantiles. {it:Stata Journal} 16: 1058{c -}1071. {p 4 8 2} Cox, N. J. 2022. Speaking Stata: Automating axis labels: Nice numbers and transformed scales. {it:Stata Journal} 22: 975{c -}995. {p 4 8 2} De Jong, R., C. C. Liang and E. Lauber. 1994. Conditional and unconditional automaticity: A dual-process model of effects of spatial stimulus-response concordance. {it:Journal of Experimental Psychology: Human Perception and Performance} 20: 731{c -}750. {p 4 8 2} Speckman, P. L., J. N. Rouder, R. D. Morey and M.S. Pratte. 2008. Delta plots and coherent distribution ordering. {it:The American Statistician} 62: 262{c -}266. doi: 10.1198/000313008X333493 {p 4 8 2} Wilk, M. B. and R. Gnanadesikan. 1968. Probability plotting methods for the analysis of data. {it:Biometrika} 55: 1{c -}17. https://doi.org/10.2307/2334448