Insert gap observations in a data set
ingap [ numlist ] [ if expression ] [ in range ] [ , after gapindicator(newvarname) neworder(newvarname) rowlabel(string_varname) growlabels(string_list) grexpression(gap_row_label_expression) rstring(string_replacement_option) ]
where numlist is an optional list of integers, string_list is a list of strings, gap_row_label_expression is a string-valued expression, and string_replacement_option can be name, label or labname.
by varlist : may be used with ingap; see help for by.
Description
ingap inserts gap observations into a list of positions in an existing data set. All existing variables in the data set will have missing values in the gap observations, unless the user specifies otherwise. Often, the user specifies non-missing values in the gap observations for one particular existing string variable, known as the row label variable. This row label variable may then be output with a list of other variables to form a publication-ready table using the listtex package. Alternatively, the row label variable may be encoded, using the sencode package, to form a numeric variable with value labels, which can then be plotted on one axis of a graph to define axis labels. The sencode and listtex packages are downloadable from SSC.
ingap inserts a gap observation next to (before or after) each of a list of observations specified by the numlist. A positive number i in the numlist specifies the ith existing observation in the data set, or in each by-group if by varlist : is specified. A negative number -i in the numlist specifies the ith existing observation, in reverse order, from the end of the data set, or from the end of each by-group if by varlist : is specified. A zero or out-of-range number in the numlist is ignored. The numlist is set to 1 if not specified by the user.
ingap assumes that the data set in memory has up to 3 classes of variables. These are the by-variables (which define by-groups possibly representing the pages of a table), a row label variable (possibly containing the row labels in the left margin of the table), and the remaining variables (which may form the entries in the table rows). A gap observation inserted by ingap has the same values for the by-variables as the observation next to which it was inserted, a row label value specified by the growlabels() or grexpression() options, and missing values (or possibly column headings) in the remaining variables.
ingap may also generate new variables, indicating whether the observation is a gap observation and/or the new order of the observation in the data set (or by-group) after the gap observations have been inserted.
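As a minimal sketch of how the numlist is interpreted, the commands below insert one gap observation before the first and one before the last existing observation of the auto data. This assumes that ingap has been installed from SSC; the variable names isgap and neword are chosen here for illustration only.

```stata
* Sketch only: assumes ingap is installed (for example via: ssc install ingap)
sysuse auto, clear
* 1 refers to the first existing observation, -1 to the last existing observation
ingap 1 -1, gapindicator(isgap) neworder(neword)
* Display the gap observations together with their new positions
list make isgap neword if isgap == 1
```

Because after is not specified, each gap observation should appear immediately before the observation it refers to.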
Options
after specifies that each gap observation will be inserted after the corresponding existing observation in the data set or by-group specified in the numlist. If after is not specified, then each gap observation will be inserted before the corresponding existing observation.
gapindicator(newvarname) specifies the name of a new variable to be generated, equal to 1 for the newly-inserted gap observations and 0 for all other observations.
neworder(newvarname) specifies the name of a new variable to be generated, equal to the new sequential order of the observation within the data set (or within the by-group if by varlist : is specified), after the gap observations have been inserted. The new variable has no missing values. After execution of ingap, the data set in memory is sorted primarily by the by-variables (if specified), and secondarily by the neworder() variable (if specified).
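The interaction of after, gapindicator() and neworder() with by-groups might be explored as follows. This is an illustrative sketch on the auto data, assuming that ingap is installed; isgap and neword are hypothetical variable names.

```stata
* Sketch only: assumes ingap is installed from SSC
sysuse auto, clear
sort foreign make
* Insert a gap after the first and after the last car in each by-group
by foreign: ingap 1 -1, after gapindicator(isgap) neworder(neword)
* neword is expected to restart at 1 within each value of foreign
list foreign make isgap neword if isgap == 1
```

Listing only the gap observations makes it easy to check that their positions are counted within each by-group rather than across the whole data set.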
rowlabel(string_varname) specifies the name of an existing string variable, used as the row labels for a table whose rows are the observations. In the gap observations, this string variable is set to the value specified by the corresponding string listed in the growlabels() option if that option is specified (see below), or to a missing value otherwise. The rowlabel() variable may not be a by-variable.
growlabels(string_list) specifies a string value for each of the row labels in the gap observations. The jth string in the string_list is written to the rowlabel() variable in the gap observation inserted next to the jth observation mentioned in the numlist. If the rowlabel() option is present and the growlabels() option is absent, then the rowlabel() variable is initialised to missing in the gap observations.
grexpression(gap_row_label_expression) specifies a string expression, to be evaluated in all gap observations to give the final values of the rowlabel() variable in these gap observations. If grexpression() and growlabels() are both specified, then the result of grexpression() replaces any values set by growlabels(). (However, the name of the rowlabel() variable may appear in the grexpression() expression, so that the values of the rowlabel() variable can be modified in ways depending on the original values set by the growlabels() list.) Note that, when the grexpression() expression is evaluated, all variables other than the rowlabel() variable have been set to their final values, which are missing for all variables except the by-variables and the rowlabel() variable, unless they have been set to other values by the rstring() option (see below). However, the grexpression() expression may access values of variables in adjacent observations using subscripting. If by-variables are present, then any subscripts in the expression specified by grexpression() are defined within by-groups, and are defined including the gap observations. For instance, if a gap observation is inserted at the beginning of each by-group, then the value of _n in these gap observations will be 1.
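The way grexpression() can build on a value first set by growlabels() might look like the following sketch. It assumes that ingap is installed and uses the auto data; the label text is illustrative.

```stata
* Sketch only: assumes ingap is installed from SSC
sysuse auto, clear
sort foreign make
* growlabels() first sets make to "cars" in each gap observation;
* grexpression() then prefixes it according to the by-variable foreign
by foreign: ingap, rowlabel(make) growlabels("cars") ///
    grexpression(cond(foreign, "Non-US ", "US ") + make)
list foreign make if missing(mpg) & missing(price)
```

The by-variable foreign is available inside the expression because gap observations inherit the by-variable values of the observation next to which they are inserted.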
rstring(string_replacement_option) specifies a rule for replacing the values of string variables (other than the by-variables and the row label variable) in gap observations. If rstring() is set to name, then string variables that are not by-variables or row label variables are set to their variable names in gap observations. If rstring() is set to label, then these string variables are set to their variable labels in gap observations, or to missing values if their variable labels are missing. If rstring() is set to labname, then these string variables are set to their variable labels in gap observations, or to their variable names if their variable labels are missing. If rstring() is set to any other value, or not set, then these string variables are set to missing values. (Note that numeric variables that are not by-variables are always set to numeric missing values in gap observations.)
The rstring() option allows the user to add a row of column headings to a data set of string variables, or to add a row of column headings to each by-group of a data set of string variables. Note that numeric variables may be converted to string variables using the sdecode package, downloadable from SSC, before using ingap and listtex. This allows the user to use the rstring() option, and also to format numeric variables in ways not possible using Stata formats alone, such as adding parentheses to confidence limits.
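A small sketch of rstring(name) on a data set with two string variables is given below. It assumes that ingap is installed; origin is a hypothetical variable created here with the official decode command so that no other SSC package is needed.

```stata
* Sketch only: assumes ingap is installed from SSC
sysuse auto, clear
keep make foreign
* Create a second string variable from the value labels of foreign
decode foreign, generate(origin)
* Insert one gap observation before observation 1 (the default numlist)
ingap, rstring(name)
* The gap row should hold the variable names of the string variables
list in 1/3
```

Under the rules described above, the numeric variable foreign is expected to be missing in the gap observation, while make and origin hold their own names as a heading row.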
Remarks
ingap is typically used to convert a Stata data set to a form with 1 observation per table row (including gap rows), or 1 observation per graph axis label (including gap axis labels). The user can then list the data set as a TeX, LaTeX, HTML or Microsoft Word table, using the listtex package (downloadable from SSC). Alternatively, for immediate impact, the user can use the sencode package (downloadable from SSC) to encode the row labels to a numeric variable, and then plot this numeric variable against other variables using Stata graphics programs. For instance, a user of Stata 8 or above might use eclplot (downloadable from SSC) to produce horizontal confidence interval plots, with the row labels on the vertical axis. It is often advisable for the user to type preserve before a sequence of commands including ingap, and to type restore after a sequence of commands using ingap, because ingap modifies the data set by adding new observations. It is often also advisable for the user to place the whole sequence of commands in a do-file, and to execute this do-file, rather than to type the sequence of commands one by one at the terminal.
Examples
. ingap, g(toprow)
. ingap 1 53, g(toprow) row(make) grow("US cars" "Non-US cars")
. by foreign: ingap, g(gind) row(make) grow("Car model")
. sort foreign rep78 make
. by foreign rep78: ingap
. by foreign: ingap -1, after
. by foreign: ingap, row(make) grow("Car model")
. list
The following example works in the auto data if the user has installed the listtex package, downloadable from SSC. It outputs to the Results window a generic ampersand-delimited text table, which can be cut and pasted into a Microsoft Word document, and then converted to the rows of a table inside Microsoft Word, using the menu sequence Table->Convert->Text to Table. (Note that the listtex command can alternatively create table rows suitable for input into a TeX, LaTeX or HTML file.)
. preserve
. by foreign: ingap, row(make) grexp(cond(foreign,"Non-US cars","US cars"))
. listtex make mpg weight, type
. restore
The following example works in the auto data if the user has installed the listtex package, and also the sdecode package, both of which can be downloaded from SSC. It outputs to the Results window a generic ampersand-delimited text table, which can be cut and pasted into a Microsoft Word document (as in the previous example), and then converted into two tables, one for American cars and one for non-American cars, each with a title line containing the variable labels in the auto data. Note that, to do this, the user must convert the numeric variables to string variables, and this is done using sdecode.
. preserve
. sdecode mpg, replace
. sdecode weight, replace
. sdecode price, replace
. by foreign: ingap, rstring(labname)
. listtex make mpg weight price, type
. restore
The following example works in the auto data if the user has installed the sdecode and sencode packages, downloadable from SSC. It produces a graph of mileage by car type (US or non-US) and repair record.
. preserve
. sdecode rep78, gene(row) miss
. by foreign: ingap, row(row) grexp(cond(foreign,"Others:","US cars:")) gap(gapind)
. sencode row, replace many gs(foreign -gapind rep78)
. lab var row "Repair record"
. version 7: graph row mpg, yreverse ylab(1(1)12) yscale(0 13) xlab(0(10)50)
. restore
Other examples of the use of ingap, together with other packages, can be found in Newson (2003).
Author
Roger Newson, King's College, London, UK. Email: roger.newson@kcl.ac.uk
References
Newson, R. 2003. Confidence intervals and p-values for delivery to the end user. The Stata Journal 3(3): 245-269. Also downloadable from Roger Newson's website at http://www.kcl-phs.org.uk/rogernewson.
Acknowledgement
I would like to thank Nicholas J. Cox, of the University of Durham, U.K., for writing the hplot package, downloadable from SSC. This package gave me a lot of the ideas used in ingap, and was also my preferred package for producing confidence interval plots under Stata Versions 6 and 7, before I had access to the improved graphics of Stata Version 8.
Also see
Manual: [U] 14.1.2 by varlist:, [U] 14.5 by varlist: construct, [U] 31.2 The by construct, [R] by, [R] expand
On-line: help for by, byprog, ssc help for listtex, sencode, sdecode, hplot, eclplot if installed