{smcl} {* *! version 1.2.0 23Mar2019}{...} {vieweralsosee "[P] glevelsof" "mansection P glevelsof"}{...} {vieweralsosee "" "--"}{...} {vieweralsosee "[P] foreach" "help foreach"}{...} {vieweralsosee "" "--"}{...} {viewerjumpto "Syntax" "glevelsof##syntax"}{...} {viewerjumpto "Description" "glevelsof##description"}{...} {viewerjumpto "Options" "glevelsof##options"}{...} {viewerjumpto "Remarks" "glevelsof##remarks"}{...} {viewerjumpto "Stored results" "glevelsof##results"}{...} {title:Title} {p2colset 5 18 23 2}{...} {p2col :{cmd:glevelsof} {hline 2}}Efficiently get levels of variable using C plugins{p_end} {p2colreset}{...} {pstd} {it:Important}: Please run {stata gtools, upgrade} to update {cmd:gtools} to the latest stable version. {marker syntax}{...} {title:Syntax} {phang} This is a fast option to Stata's {opt levelsof}. It can additionally take multiple variables. It is 3 to 13 times faster in Stata/IC and 2.5-7 times faster in MP {p 8 17 2} {cmd:glevelsof} {varlist} {ifin} [{cmd:,} {it:options}] {pstd} Instead of {varlist}, it is possible to specify {p 8 17 2} [{cmd:+}|{cmd:-}] {varname} [[{cmd:+}|{cmd:-}] {varname} {it:...}] {pstd} To change the sort order of the results. {synoptset 25 tabbed}{...} {marker table_options}{...} {synopthdr} {synoptline} {syntab :Options} {synopt:{opt c:lean}}display string values without compound double quotes{p_end} {synopt:{opt l:ocal(macname)}}insert the list of values in the local macro {it:macname}{p_end} {synopt:{opt miss:ing}}include missing values of {varlist} in calculation{p_end} {synopt:{opt s:eparate(separator)}}separator to serve as punctuation for the values of returned list; default is a space{p_end} {syntab:Extras} {synopt:{opt nolocal:var}}Do not store the levels of {opt varlist} in a local macro.{p_end} {synopt:{opt silent}}Do not display the levels of varlist. For use with {opt gen()} and {opt mata:save}{p_end} {synopt:{opt mata:save}[{cmd:(}{it:str}{cmd:)}]}Save results in mata object (default name is {bf:GtoolsByLevels}){p_end} {synopt:{opt gen([prefix], [replace])}}Store the levels of {it:varlist} in new varlist ({opt prefix}) or {opt replace} {it:varlist} with its levels{p_end} {synopt:{opt cols:eparate(separator)}}separator to serve as punctuation for the columns of returned list; default is a pipe{p_end} {synopt:{opth numfmt(format)}}Number format for numeric variables. Default is {opt %.16g} (or {opt %16.0g} with {opt matasave}).{p_end} {synopt:{opt unsorted}}do not sort levels (ignored if inputs are integers){p_end} {syntab:Gtools} {synopt :{opt compress}}Try to compress strL to str#. {p_end} {synopt :{opt forcestrl}}Skip binary variable check and force gtools to read strL variables. {p_end} {synopt :{opt v:erbose}}Print info during function execution. {p_end} {synopt :{cmd:bench}[{cmd:(}{int}{cmd:)}]}Benchmark various steps of the plugin. Optionally specify depth level. {p_end} {synopt :{opth hash:method(str)}}Hash method (default, biject, or spooky). Intended for debugging. {p_end} {synopt :{opth oncollision(str)}}Collision handling (fallback or error). Intended for debugging. {p_end} {synoptline} {p2colreset}{...} {marker description}{...} {title:Description} {pstd} {cmd:glevelsof} displays a sorted list of the distinct values of {varlist}. It is meant to be a fast replacement of {cmd:levelsof}. Unlike {cmd:levelsof}, it can take a single variable or multiple variables. {pstd} {cmd:glevelsof} is part of the {manhelp gtools R:gtools} project. {marker options}{...} {title:Options} {dlgtab:Options} {phang} {cmd:clean} displays string values without compound double quotes. By default, each distinct string value is displayed within compound double quotes, as these are the most general delimiters. If you know that the string values in {varlist} do not include embedded spaces or embedded quotes, this is an appropriate option. {cmd:clean} does not affect the display of values from numeric variables. {phang} {cmd:local(}{it:macname}{cmd:)} inserts the list of values in local macro {it:macname} within the calling program's space. Hence, that macro will be accessible after {cmd:glevelsof} has finished. This is helpful for subsequent use, especially with {helpb foreach}. {phang} {cmd:missing} specifies that missing values of {varlist} should be included in the calculation. The default is to exclude them. {phang} {cmd:separate(}{it:separator}{cmd:)} specifies a separator to serve as punctuation for the values of the returned list. The default is a space. A useful alternative is a comma. {phang} {cmd:colseparate(}{it:separator}{cmd:)} specifies a separator to serve as punctuation for the columns of the returned list. The default is a pipe. Specifying a {varlist} instead of a {varname} is only useful for double loops or for use with {helpb gettoken}. {phang} {opth numfmt(format)} Number format for printing. By default numbers are printed to 16 digits of precision, but the user can specify the number format here. By default, only "%.#g|f" and "%#.#g|f" are accepted since this is formated internally in C. However, with option {opt matasave} this is formated in mata and has to be a mata format. {phang} {opth unsorted} Do not sort levels. This option is experimental and only affects the output when the input is not an integer (for integers, the levels are sorted internally regardless; the user would request the spooky hash method via {opt hash()}, which obeys the {opt unsorted} option, but this is intended for debugging). While not sorting the levels is faster, {cmd:glevelsof} is typically used when the number of levels is small (10s, 100s, 1000s) and thus speed savings will be minimal. {phang} {opt nolocalvar}Do not store the levels of {opt varlist} in a local macro. This is specially useful with option {opt gen()}. {phang} {opt silent}Do not display the levels of varlist. Mainly for use with {opt gen()} and {opt mata:save}. With {opt mata:save}, the levels are not sepparately stored as a string matrix, but the raw levels {it:are} kept. {phang} {opt mata:save}[{cmd:(}{it:str}{cmd:)}]Save results in mata object (default name is {bf:GtoolsByLevels}). See {opt GtoolsByLevels.desc()} for more. This object contains the raw variable levels in {opt numx} and {opt charx} (since mata does not allow matrices of mixed-type). The levels are saved as a string in {opt printed} (with value labels correctly applied) unless option {opt silent} is also specified. {phang} {opt gen([prefix], [replace])} Store the levels of {it:varlist} in new varlist ({opt prefix}) or {opt replace} {it:varlist} with its levels. These options are mutually exclusive. {dlgtab:Gtools} {phang} {opt compress} Try to compress strL to str#. The Stata Plugin Interface has only limited support for strL variables. In Stata 13 and earlier (version 2.0) there is no support, and in Stata 14 and later (version 3.0) there is read-only support. The user can try to compress strL variables using this option. {phang} {opt forcestrl} Skip binary variable check and force gtools to read strL variables (14 and above only). {opt Gtools gives incorrect results when there is binary data in strL variables}. This option was included because on some windows systems Stata detects binary data even when there is none. Only use this option if you are sure you do not have binary data in your strL variables. {phang} {opt verbose} prints some useful debugging info to the console. {phang} {opt bench:mark} and {opt bench:marklevel(int)} print how long in seconds various parts of the program take to execute. The user can also pass {opth bench(int)} for finer control. {opt bench(1)} is the same as benchmark but {opt bench(2)} and {opt bench(3)} additionally print benchmarks for internal plugin steps. {phang} {opth hashmethod(str)} Hash method to use. {opt default} automagically chooses the algorithm. {opt biject} tries to biject the inputs into the natural numbers. {opt spooky} hashes the data and then uses the hash. {phang} {opth oncollision(str)} How to handle collisions. A collision should never happen but just in case it does {opt gtools} will try to use native commands. The user can specify it throw an error instead by passing {opt oncollision(error)}. {marker remarks}{...} {title:Remarks} {pstd} {cmd:glevelsof} serves two different functions. First, it gives a compact display of the distinct values of {it:varlist}. More commonly, it is useful when you desire to cycle through the distinct values of {it:varlist} with (say) {cmd:foreach}; see {helpb foreach:[P] foreach}. {cmd:glevelsof} leaves behind a list in {cmd:r(levels)} that may be used in a subsequent command. {pstd} {cmd:glevelsof} may hit the {help limits} imposed by your Stata. However, it is typically used when the number of distinct values of {it:varlist} is modest. If you have many levels in varlist then an alternative may be {help gtoplevelsof}, which shows the largest or smallest levels of a varlist by their frequency count. {marker examples}{...} {title:Examples} {phang}{cmd:. sysuse auto} {phang}{cmd:. glevelsof rep78}{p_end} {phang}{cmd:. display "`r(levels)'"}{p_end} {phang}{cmd:. glevelsof rep78, miss local(mylevs)}{p_end} {phang}{cmd:. display "`mylevs'"}{p_end} {phang}{cmd:. glevelsof rep78, sep(,)}{p_end} {phang}{cmd:. display "`r(levels)'"}{p_end} {phang}{cmd:. glevelsof foreign rep78, sep(,)}{p_end} {phang}{cmd:. display `"`r(levels)'"'}{p_end} {phang}{cmd:. glevelsof foreign rep78, gen(uniq_) nolocal}{p_end} {phang}{cmd:. desc uniq_*}{p_end} {phang}{cmd:. glevelsof foreign rep78, mata(uniq) nolocal}{p_end} {phang}{cmd:. mata uniq.desc()}{p_end} {pstd} See the {browse "http://gtools.readthedocs.io/en/latest/usage/glevelsof/index.html#examples":online documentation} for more examples. {marker results}{...} {title:Stored results} {pstd} {cmd:glevelsof} stores the following in {cmd:r()}: {synoptset 15 tabbed}{...} {p2col 5 15 19 2: Macros}{p_end} {synopt:{cmd:r(levels)}}list of distinct values{p_end} {p2colreset}{...} {synoptset 20 tabbed}{...} {p2col 5 20 24 2: Scalars}{p_end} {synopt:{cmd:r(N) }} number of non-missing observations {p_end} {synopt:{cmd:r(J) }} number of groups {p_end} {synopt:{cmd:r(minJ)}} largest group size {p_end} {synopt:{cmd:r(maxJ)}} smallest group size {p_end} {p2colreset}{...} {pstd} With {opt matasave}, the following data is stored in {opt GtoolsByLevels}: real scalar anyvars 1: any by variables; 0: no by variables real scalar anychar 1: any string by variables; 0: all numeric by variables real scalar anynum 1: any numeric by variables; 0: all string by variables string rowvector byvars by variable names real scalar kby number of by variables real scalar rowbytes number of bytes in one row of the internal by variable matrix real scalar J number of levels real matrix numx numeric by variables string matrix charx string by variables real scalar knum number of numeric by variables real scalar kchar number of string by variables real rowvector lens > 0: length of string by variables; <= 0: internal code for numeric variables real rowvector map map from index to numx and charx real rowvector charpos position of kth character variable string matrix printed formatted (printf-ed) variable levels (not with option -silent-) {marker author}{...} {title:Author} {pstd}Mauricio Caceres Bravo{p_end} {pstd}{browse "mailto:mauricio.caceres.bravo@gmail.com":mauricio.caceres.bravo@gmail.com }{p_end} {pstd}{browse "https://mcaceresb.github.io":mcaceresb.github.io}{p_end} {title:Website} {pstd}{cmd:glevelsof} is maintained as part of {manhelp gtools R:gtools} at {browse "https://github.com/mcaceresb/stata-gtools":github.com/mcaceresb/stata-gtools}{p_end} {marker acknowledgment}{...} {title:Acknowledgment} {pstd} This help file was based on StataCorp's own help file for {it:levelsof}. {p_end} {pstd} This project was largely inspired by Sergio Correia's {it:ftools}: {browse "https://github.com/sergiocorreia/ftools"}. {p_end} {pstd} The OSX version of gtools was implemented with invaluable help from @fbelotti; see {browse "https://github.com/mcaceresb/stata-gtools/issues/11"}. {p_end} {title:Also see} {p 4 13 2} help for {help gtoplevelsof}, {help gtools}; {help flevelsof} (if installed), {help ftools} (if installed)