{smcl}
{.-}
help for {cmd:ingap} {right:(Roger Newson)}
{.-}

{title:Insert gap observations in a dataset}

{p 8 21 2}
{cmd:ingap} [ {help numlist:{it:numlist}} ] {ifin} [ ,
{cmdab:af:ter}
{break}
{cmdab:g:apindicator}{cmd:(}{newvar}[,{cmd:replace}]{cmd:)}
{cmdab:newo:rder}{cmd:(}{newvar}[,{cmd:replace}]{cmd:)}
{break}
{cmdab:ro:wlabel}{cmd:(}{varname}{cmd:)}
{cmdab:gr:owlabels}{cmd:(}{it:string_list}{cmd:)}
{cmdab:gre:xpression}{cmd:(}{it:gap_row_label_expression}{cmd:)}
{cmdab:rs:tring}{cmd:(}{it:string_replacement_option}{cmd:)}
{cmd:fast}
]

{p}
where {help numlist:{it:numlist}} is an optional list of integers,
{it:string_list} is a list of strings,
{it:gap_row_label_expression} is a string-valued expression,
and {it:string_replacement_option} can be

{p}
{cmd:order} | {cmd:name} | {cmd:type} | {cmd:format} | {cmd:varlab} | {cmd:char} {help char:{it:characteristic_name}} | {cmd:label} | {cmd:labname}

{p}
and {help char:{it:characteristic_name}} is the name of a {help char:variable characteristic}.

{p}
The {helpb by} prefix can be used with {cmd:ingap}; see help for {help prefix}.

{title:Description}

{p}
{cmd:ingap} inserts gap observations into a list of positions in an existing dataset.
All existing variables in the dataset (apart from {help by:by-variables}) will have missing values in the gap observations,
unless the user specifies otherwise.
Often, the user specifies non-missing values in the gap observations for one particular existing string variable,
known as the row label variable.
This row label variable may then be output with a list of other variables to form a publication-ready table,
using the {helpb listtab} package (or possibly the {helpb listtex} package).
Alternatively, the row label variable may be encoded, using the {helpb sencode} package,
to form a numeric variable with {help label:value labels},
which can then be plotted on one axis of a {help graph:graph} to define axis labels.
The {helpb sencode}, {helpb listtab} and {helpb listtex} packages are downloadable from {help ssc:SSC}.

{p}
{cmd:ingap} inserts a gap observation next to (before or after) each of a list of observations
specified by the {help numlist:{it:numlist}}.
A positive number {hi:i} in the {help numlist:{it:numlist}} specifies the {hi:i}th existing observation in the dataset,
or in each by-group if the {helpb by} prefix is specified.
A negative number {hi:-i} in the {help numlist:{it:numlist}} specifies the {hi:i}th existing observation, in reverse order,
from the end of the dataset, or from the end of each by-group if the {helpb by} prefix is specified.
A zero or out-of-range number in the {help numlist:{it:numlist}} is ignored.
The {help numlist:{it:numlist}} is set to 1 if not specified by the user.
{cmd:ingap} assumes that the dataset in memory has up to 3 classes of variables.
These are the by-variables (which define by-groups possibly representing the pages of a table),
a row label variable (possibly containing the row labels in the left margin of the table),
and the remaining variables (which may form the entries in the table rows).
A gap observation inserted by {cmd:ingap} has the same values for the by-variables as the observation next to which it was inserted,
a row label value specified by the {cmd:growlabels()} or {cmd:grexpression()} options,
and missing values in the remaining variables (unless the user specifies otherwise).
{cmd:ingap} may also generate new variables,
indicating whether the observation is a gap observation
and/or the new order of the observation in the dataset (or by-group) after the gap observations have been inserted.

{title:Options}

{phang}
{cmd:after} specifies that each gap observation will be inserted after the corresponding existing observation
in the dataset or by-group specified in the {help numlist:{it:numlist}}.
If {cmd:after} is not specified, then each gap observation will be inserted before the corresponding existing observation.
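{pstd}
As a minimal sketch of the position rules described above (an illustration, not one of the package's own examples),
and assuming that the {hi:auto} data are loaded using {helpb sysuse} and are sorted by {hi:foreign},
the following commands should insert one gap observation before the first existing observation,
and another before the last existing observation, of each by-group.
The variable name {hi:gap} is illustrative only.
Adding the {cmd:after} option should instead place each gap observation after the corresponding existing observation.
{p_end}

{p 8 16}{cmd:. sysuse auto, clear}{p_end}
{p 8 16}{cmd:. by foreign: ingap 1 -1, gapindicator(gap)}{p_end}
{p 8 16}{cmd:. list foreign make gap, sepby(foreign)}{p_end}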
{phang}
{cmd:gapindicator(}{it:newvarname}[,{cmd:replace}]{cmd:)} specifies the name of a new variable to be generated,
equal to 1 for the newly-inserted gap observations and 0 for all other observations.
The {cmd:replace} suboption specifies that any existing variable with the same name will be replaced.

{phang}
{cmd:neworder(}{it:newvarname}[,{cmd:replace}]{cmd:)} specifies the name of a new variable to be generated,
equal to the new sequential order of the observation within the dataset
(or within the by-group if {help by:the by prefix} is specified),
after the gap observations have been inserted.
The new variable has no missing values.
After execution of {cmd:ingap}, the dataset in memory is sorted primarily by the by-variables (if specified),
and secondarily by the {cmd:neworder()} variable (if specified).
The {cmd:replace} suboption specifies that any existing variable with the same name will be replaced.

{phang}
{cmd:rowlabel(}{it:string_varname}{cmd:)} specifies the name of an existing string variable,
used as the row labels for a table whose rows are the observations.
In the gap observations, this string variable is set to the value specified by the corresponding string
listed in the {cmd:growlabels()} option if that option is specified (see below),
or to a missing value otherwise.
The {cmd:rowlabel()} variable may not be a by-variable.

{pstd}
Note that the {cmd:neworder()}, {cmd:gapindicator()} and {cmd:rowlabel()} options may not specify the same variable names,
and may not specify the names of {help by:by-variables}.
Also, note that the {cmd:neworder()} and {cmd:gapindicator()} variables are always non-missing,
even in observations not included in the sample defined by the {helpb if} and {helpb in} qualifiers.
These qualifiers only specify that an observation may have observations inserted before it
(or after it, if {cmd:after} is specified),
if its sequential order in the dataset or by-group is included in the {help numlist:{it:numlist}}.
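{pstd}
The following sketch illustrates the {cmd:gapindicator()} and {cmd:neworder()} options described above.
It assumes that the {hi:auto} data are loaded using {helpb sysuse},
and the variable names {hi:gapind} and {hi:neword} are illustrative only.
With the default {help numlist:{it:numlist}} of 1, the gap observation in each by-group
should be listed with {hi:neword} equal to 1 and {hi:gapind} equal to 1.
{p_end}

{p 8 16}{cmd:. sysuse auto, clear}{p_end}
{p 8 16}{cmd:. by foreign: ingap, gapindicator(gapind) neworder(neword)}{p_end}
{p 8 16}{cmd:. list foreign neword gapind make if neword <= 3, noobs sepby(foreign)}{p_end}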
{phang}
{cmd:growlabels(}{it:string_list}{cmd:)} specifies a string value for each of the row labels in the gap observations.
The {hi:j}th string in the {it:string_list} is written to the {cmd:rowlabel()} variable
in the gap observation newly inserted next to the {hi:j}th observation mentioned in the {help numlist:{it:numlist}}.
If the {cmd:rowlabel()} option is present and the {cmd:growlabels()} option is absent,
then the {cmd:rowlabel()} variable is initialised to missing in the gap observations.

{phang}
{cmd:grexpression(}{it:gap_row_label_expression}{cmd:)} specifies a string expression,
to be evaluated in all gap observations to give the final values of the {cmd:rowlabel()} variable
in these gap observations.
If {cmd:grexpression()} and {cmd:growlabels()} are both specified,
then the result of {cmd:grexpression()} replaces any values set by {cmd:growlabels()}.
(However, the name of the {cmd:rowlabel()} variable may appear in the {cmd:grexpression()} expression,
so that the values of the {cmd:rowlabel()} variable can be modified in ways depending on the original values
set by the {cmd:growlabels()} list.)
Note that, when the {cmd:grexpression()} expression is evaluated,
all variables other than the {cmd:rowlabel()} variable have been set to their final values,
which are missing for all variables except the by-variables and the {cmd:rowlabel()} variable,
except if they have been set to other values by the {cmd:rstring()} option (see below).
However, the {cmd:grexpression()} expression may access values of variables in adjacent observations
using {help subscripting}.
If by-variables are present, then any subscripts in the expression specified by {cmd:grexpression()}
are defined within by-groups, and are defined including the gap observations.
For instance, if a gap observation is inserted at the beginning of each by-group,
then the value of {hi:_n} in these gap observations will be 1.
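{pstd}
As a hedged illustration of how {cmd:growlabels()} and {cmd:grexpression()} interact,
the following sketch assumes that the {hi:auto} data are loaded using {helpb sysuse},
and uses the illustrative variable name {hi:gapind}.
Under the rules described above, the row label in each gap observation is first set to "Car model"
by {cmd:growlabels()}, and the {cmd:grexpression()} expression should then append a colon to that value,
so that the gap observations are listed with {hi:make} equal to "Car model:".
{p_end}

{p 8 16}{cmd:. sysuse auto, clear}{p_end}
{p 8 16}{cmd:. by foreign: ingap, rowlabel(make) growlabels("Car model") grexpression(make + ":") gapindicator(gapind)}{p_end}
{p 8 16}{cmd:. list foreign make if gapind, noobs}{p_end}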
{phang}
{cmd:rstring(}{it:string_replacement_option}{cmd:)} specifies a rule for replacing the values of string variables
(other than the by-variables and row label variables) in gap observations.
If {cmd:rstring()} is not set, then these variables will be set to a missing value (an empty string)
in the gap observations.
{cmd:rstring()} can be set to {cmd:order}, {cmd:name}, {cmd:type}, {cmd:format}, {cmd:varlab},
{cmd:char} {help char:{it:characteristic_name}}, {cmd:label}, or {cmd:labname}.
The options {cmd:order}, {cmd:name}, {cmd:type}, {cmd:format}, {cmd:varlab}
and {cmd:char} {help char:{it:characteristic_name}}
imply that the value of each string variable, in the gap observations, will be set to
the order of the variable in the existing dataset, the name of the variable,
the {help type:storage type} of the variable, the {help format:display format} of the variable,
the {help label:variable label} of the variable,
or the {help char:characteristic} of the variable with the name {help char:{it:characteristic_name}}, respectively.
The option {cmd:label} is a synonym for {cmd:varlab}.
The option {cmd:labname} specifies that the value of each string variable, in the gap observations,
will be set to its {help label:variable label}, if that label exists, and to its name otherwise.
(Note that numeric variables that are not by-variables, {cmd:gapindicator()} variables or {cmd:neworder()} variables
are always set to the numeric missing value {cmd:.} in gap observations.)
The {cmd:rstring()} option allows the user to add a row of column headings to a dataset of string variables,
or to add a row of column headings to each by-group of a dataset of string variables.
Note also that numeric variables may be converted to string variables using the {helpb sdecode} package,
downloadable from {help ssc:SSC}, before using {cmd:ingap} and {helpb listtab}.
This allows the user to use the {cmd:rstring()} option,
and also to format numeric variables in ways not possible using Stata formats alone,
such as adding parentheses to confidence limits.

{phang}
{cmd:fast} is an option for programmers.
It specifies that {cmd:ingap} will do no work to ensure that the original dataset is preserved
in the event that {cmd:ingap} fails, or if the user presses {help break:the Break key}.
If {cmd:fast} is not specified, and {cmd:ingap} fails, or the user presses {help break:the Break key},
then the original existing dataset is preserved, with no additional gap observations.

{title:Remarks}

{pstd}
{cmd:ingap} is typically used to convert a Stata dataset to a form with 1 observation per table row
(including gap rows), or 1 observation per graph axis label (including gap axis labels).
The user can then list the dataset as a TeX, LaTeX, HTML or Microsoft Word table,
using the {helpb listtab} package (downloadable from {help ssc:SSC}).
Alternatively, for immediate impact, the user can use the {helpb sencode} package
(downloadable from {help ssc:SSC}) to encode the row labels to a numeric variable,
and then plot this numeric variable against other variables using {help graph:Stata graphics programs}.
For instance, a user of Stata 8 or above might use {helpb eclplot} (downloadable from SSC)
to produce horizontal confidence interval plots, with the row labels on the vertical axis.
It is often advisable for the user to type {helpb preserve} before a sequence of commands including {cmd:ingap},
and to type {helpb restore} after a sequence of commands using {cmd:ingap},
because {cmd:ingap} modifies the dataset by adding new observations.
It is often also advisable for the user to place the whole sequence of commands in a {help do:do-file},
and to execute this {help do:do-file},
rather than to type the sequence of commands one by one at the terminal.

{pstd}
The {helpb listtab} package is described in {help ingap##references:Newson (2012)}.
It inputs a list of variables in a Stata dataset, and outputs a text table, in a file or on the screen,
containing these variables as columns and the observations as rows, and formatted using a row style.
This row style may correspond to table rows in plain TeX, LaTeX, HTML, XML, or RTF tables,
or to rows of tab-delimited, comma-delimited or ampersand-delimited generic text spreadsheets,
or to rows in other styles that may be invented in future.
The row style is defined using a row-beginning string, a row-end string, and a between-column delimiter string.
{helpb listtab} is a successor to the {helpb listtex} package,
described in {help ingap##references:Newson (2006), Newson (2004) and Newson (2003)}.
The main change introduced in {helpb listtab} is that empty delimiter strings are now allowed.
Users of {help version:Stata versions} 10 and above are advised to use {helpb listtab}
in preference to {helpb listtex}, although both packages are still downloadable from SSC.

{title:Examples}

{p 8 16}{cmd:. ingap, g(toprow)}{p_end}
{p 8 16}{cmd:. ingap 1 53, g(toprow) row(make) grow("US cars" "Non-US cars")}{p_end}
{p 8 16}{cmd:. by foreign: ingap, g(gind) row(make) grow("Car model")}{p_end}

{p 8 16}{cmd:. sort foreign rep78 make}{p_end}
{p 8 16}{cmd:. by foreign rep78: ingap}{p_end}
{p 8 16}{cmd:. by foreign: ingap -1, after}{p_end}
{p 8 16}{cmd:. by foreign: ingap, row(make) grow("Car model")}{p_end}
{p 8 16}{cmd:. list}{p_end}

{p}
The following example works in the {hi:auto} data if the user has installed the {helpb listtab} package,
downloadable from {help ssc:SSC}.
It outputs to the Results window a generic ampersand-delimited text table,
which can be cut and pasted into a Microsoft Word document,
and then converted to the rows of a table inside Microsoft Word,
using the menu sequence {cmd:Table->Convert->Text to Table}.
(Note that the {helpb listtab} command can alternatively create table rows suitable for input
into a TeX, LaTeX or HTML file.)

{p 8 16}{cmd:. 
preserve}{p_end}
{p 8 16}{cmd:. by foreign: ingap, row(make) grexp(cond(foreign,"Non-US cars","US cars"))}{p_end}
{p 8 16}{cmd:. listtab make mpg weight, delim(&) type}{p_end}
{p 8 16}{cmd:. restore}{p_end}

{p}
The following example works in the {hi:auto} data if the user has installed the {helpb listtab} package,
and also the {helpb sdecode} package, both of which can be downloaded from {help ssc:SSC}.
It outputs to the Results window a generic ampersand-delimited text table,
which can be cut and pasted into a Microsoft Word document (as in the previous example),
and then converted into two tables, one for American cars and one for non-American cars,
each with a title line containing the variable labels in the {hi:auto} data.
Note that, to do this, the user must convert the numeric variables to string variables,
and this is done using {helpb sdecode}.

{p 8 16}{cmd:. preserve}{p_end}
{p 8 16}{cmd:. sdecode mpg, replace}{p_end}
{p 8 16}{cmd:. sdecode weight, replace}{p_end}
{p 8 16}{cmd:. sdecode price, replace}{p_end}
{p 8 16}{cmd:. by foreign: ingap, rstring(labname)}{p_end}
{p 8 16}{cmd:. listtab make mpg weight price, delim(&) type}{p_end}
{p 8 16}{cmd:. restore}{p_end}

{p}
The following example works in the {hi:auto} data if the user has installed
the {helpb sdecode} and {helpb sencode} packages, downloadable from {help ssc:SSC}.
It produces a graph of mileage by car type (US or non-US) and repair record.

{p 8 16}{cmd:. preserve}{p_end}
{p 8 16}{cmd:. sdecode rep78, gene(row) miss}{p_end}
{p 8 16}{cmd:. by foreign: ingap, row(row) grexp(cond(foreign,"Others:","US cars:")) gap(gapind)}{p_end}
{p 8 16}{cmd:. sencode row, replace many gs(foreign -gapind rep78)}{p_end}
{p 8 16}{cmd:. lab var row "Repair record"}{p_end}
{p 8 16}{cmd:. scatter row mpg, yscale(reverse range(0 13)) ylab(1(1)12, valuelabel angle(0)) xlab(0(10)50)}{p_end}
{p 8 16}{cmd:. 
restore}{p_end}

{p}
Other examples of the use of {cmd:ingap}, together with other packages, can be found in
{help ingap##references:Newson (2012), Newson (2006), Newson (2004) and Newson (2003)}.

{title:Author}

{p}
Roger Newson, National Heart and Lung Institute, Imperial College London, UK.
Email: {browse "mailto:r.newson@imperial.ac.uk":r.newson@imperial.ac.uk}

{marker references}{title:References}

{phang}
Newson, R. B. 2012.
From resultssets to resultstables in Stata.
{it:The Stata Journal} 12(2): 191-213.
Download from {browse "http://www.stata-journal.com/article.html?article=st0254":{it:The Stata Journal} website}.

{phang}
Newson, R. 2006.
Resultssets, resultsspreadsheets and resultsplots in Stata.
Presented at the {browse "http://ideas.repec.org/s/boc/dsug06.html":4th German Stata User Meeting, Mannheim, 31 March, 2006}.

{phang}
Newson, R. 2004.
From datasets to resultssets in Stata.
Presented at the {browse "http://ideas.repec.org/s/boc/usug04.html":10th United Kingdom Stata Users' Group Meeting, London, 29 June, 2004}.

{phang}
Newson, R. 2003.
Confidence intervals and {it:p}-values for delivery to the end user.
{it:The Stata Journal} 3(3): 245-269.
Download from {browse "http://www.stata-journal.com/article.html?article=st0043":{it:The Stata Journal} website}.

{title:Acknowledgement}

{p}
I would like to thank Nicholas J. Cox, of the University of Durham, U.K.,
for writing the {helpb hplot} package, downloadable from {help ssc:SSC}.
This package gave me a lot of the ideas used in {cmd:ingap},
and was also my preferred package for producing confidence interval plots under Stata Versions 6 and 7,
before I had access to the improved graphics of Stata Version 8.

{title:Also see}

{p 0 21}
{bind: }Manual: {hi:[U] 11 Language syntax}, {hi:[D] by}, {hi:[D] expand}, {hi:[P] byable}, {hi:[R] ssc}
{p_end}
{p 0 21}
On-line: help for {helpb by}, {helpb byprog}, {helpb expand}, {helpb ssc}
{p_end}
{p 10 21}
help for {helpb listtab}, {helpb listtex}, {helpb sencode}, {helpb sdecode}, {helpb hplot}, {helpb eclplot} if installed
{p_end}