-------------------------------------------------------------------------------
help for descsave                                                (Roger Newson)
-------------------------------------------------------------------------------

Save descriptive attributes of variables to a do-file and/or a Stata dataset

descsave [varlist] [using filename] [ , dofile(dofilename [, replace])
    list( [varlist] [if] [in] [ , [list_options] ] )
    saving(datafilename [, replace]) norestore fast
    flist(global_macro_name) charlist(characteristic_list)
    idnum(#) idstr(string)
    rename(oldvarname_1 newvarname_1 ... oldvarname_n newvarname_n)
    gsort(gsort_list) keep(varlist) ]

where list_options is a list of options accepted by the list command, characteristic_list is a list of characteristic_names and/or asterisks (*) separated by spaces, and gsort_list is a list of one or more elements of the form

[+|-]varname

as used by the gsort command.

Description

descsave is an extended version of describe. It lists descriptive attributes for the variables given by varlist, or for all variables in the dataset if varlist is not specified. The dataset is the current dataset in memory, unless using is used to specify a dataset in a file. The descriptive attributes are variable names, storage types, display formats, value labels and variable labels (as output by describe), and also (optionally) a list of characteristics specified by the charlist() option.

descsave creates an output Stata dataset (or resultsset) with one observation per variable and data on these descriptive attributes. This dataset may be listed using the list() option, and/or saved to a file using the saving() option, and/or written to memory using the norestore or fast option, overwriting any existing dataset. The file specified by dofile() is a do-file, containing commands which can be run to reconstruct the descriptive attributes of the variables, assuming that variables of the same names have been created and are numeric or string-valued as appropriate.

descsave can be used together with outsheet to create a definitive generic spreadsheet version of the current dataset, together with a Stata do-file to reconstruct the descriptive attributes of the variables after the spreadsheet has been input using insheet.

Options

These options fall into the following 2 groups:

Option group    Description
--------------------------------------------------------------------------
outdest_opts    Output-destination options for the do-file and/or resultsset
conspec_opts    Content-specifying options for the resultsset
--------------------------------------------------------------------------

Output-destination options

dofile(dofilename [, replace]) specifies an output Stata do-file, with commands to reconstruct the variable descriptive attributes (storage types, display formats, value labels, variable labels and selected characteristics), assuming that variables with those names already exist and are numeric or string-valued as appropriate. If replace is specified, then any existing file of the same name is overwritten.

list(varlist [if exp] [in range] [, list_options ] ) specifies a list of variables in the output dataset, which will be listed to the Stata log by descsave. The user may optionally also specify if or in clauses to list subsets of variables, or change the display style using a list of list_options allowed as options by the list command. If the rename() option is specified (see below), then any variable names specified by the list() option must be the new names. If the list() option is absent, then nothing is listed.

saving(datafilename [, replace]) specifies an output file containing a Stata dataset, with one observation per variable, and data on the descriptive attributes of the variable. If replace is specified, then any existing file of the same name is overwritten.

norestore specifies that the output dataset will be written to the memory, overwriting any pre-existing dataset. This option is automatically set if fast is specified. Otherwise, if norestore is not specified, then the pre-existing dataset is restored in the memory after the execution of descsave.

fast is a stronger version of norestore, intended for use by programmers. It specifies that the pre-existing dataset in the memory will not be restored, even if the user presses Break during the execution of descsave. If norestore is specified and fast is absent, then descsave will go to extra work so that it can restore the original data if the user presses Break.

Note that the user must specify at least one of the 5 options dofile(), list(), saving(), norestore and fast.

flist(global_macro_name) specifies the name of a global macro, containing a filename list (possibly empty). If saving() is also specified, then descsave will append the filename specified in the saving() option to the value of the global macro specified in flist(). This enables the user to build a list of filenames in a global macro, naming the output datasets saved by a sequence of calls to descsave. These files may later be concatenated using append, or using dsconcat if installed.
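
As a minimal sketch of how flist() might be used, assume that the datasets auto.dta and auto2.dta are in the current directory and that dsconcat is installed. The global macro name desclist and the output filenames desc1.dta and desc2.dta are hypothetical, chosen only for illustration. The idnum() option (described below) tags each output dataset, so that the source of each observation remains identifiable after concatenation.

. global desclist ""
. descsave using auto, saving(desc1.dta, replace) flist(desclist) idnum(1)
. descsave using auto2, saving(desc2.dta, replace) flist(desclist) idnum(2)
. dsconcat $desclist
. list idnum name type, noobs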

Content-specifying options

charlist(characteristic_list) specifies a list of characteristic names and/or asterisks (*), separated by spaces. The characteristics specified will be reconstructed by the do-file specified by dofile() (if specified), and will be written to variables in the output dataset. If a characteristic has length greater than the maximum length for a string variable, which is 244 in Version 9 of Stata (see help for data_types), then it will be truncated to that maximum length in the output dataset and/or do-file. (This is not expected to cause problems very often.) descsave expands the characteristic_list by replacing each asterisk * with a list of the names of all characteristics of all variables in the varlist, and then contracts the characteristic_list by removing the rightmost occurrences of all duplicate characteristic names. Therefore, charlist(*) specifies a list of all characteristics belonging to all variables in the varlist, and charlist(omit missing *) specifies a list of the same characteristics, with omit appearing first and missing appearing second. In the second case, the output variable char1 will contain the omit characteristic, and the output variable char2 will contain the missing characteristic. (See Output dataset created by descsave below for details on output variables.)
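
The ordering rule for charlist() can be illustrated with a short sketch in the auto data. The characteristic values assigned here are hypothetical and serve only to give the variables something to report. Under the expansion and contraction rules described above, omit should be written to char1 and missing to char2.

. char rep78[omit] 3
. char foreign[omit] 0
. char rep78[missing] 9
. descsave rep78 foreign, list(name char1 char2, noobs) charlist(omit missing *)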

idnum(#) specifies an ID number for the output dataset. It is used to create a numeric variable idnum in the output dataset, with that value for all observations. This is useful if the output dataset is concatenated with other descsave output datasets using append, or using dsconcat if installed.

idstr(string) specifies an ID string for the output dataset. It is used to create a string variable idstr in the output dataset, with that value for all observations. (An output dataset may have idnum, idstr, both or neither.)

rename(oldvarname_1 newvarname_1 ... oldvarname_n newvarname_n) specifies a list of pairs of variable names. The first variable name of each pair specifies a variable in the output dataset, which is renamed to the second variable name of the pair. (See Output dataset created by descsave below for details on output variables.)

gsort(gsort_list) specifies a generalized sorting order (as recognised by gsort) for the observations in the output dataset. If gsort() is not specified, then the output dataset will be sorted by the single variable order. If rename() is specified, then gsort() must use the new names.

keep(varlist) specifies a list of variables to be kept in the output dataset. If keep() is not specified, then the output dataset contains all the variables listed in the next section. If rename() is specified, then keep() must use the new names.
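
The interaction of rename(), gsort() and keep() can be sketched as follows in the auto data. The new names varname and variable_label are hypothetical. Because rename() is specified, the list(), gsort() and keep() options all refer to the new names, while the unrenamed variable type keeps its default name.

. descsave, list(, noobs abbrev(32)) rename(name varname varlab variable_label) keep(varname type variable_label) gsort(-type varname)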

Output dataset created by descsave

The output dataset (or resultsset) created by descsave has one observation per variable in the varlist. If the rename() option is not specified, then it contains the following variables:

Default name    Description
--------------------------------------------------------------------------
idnum           Numeric dataset ID
idstr           String dataset ID
order           Variable order
name            Variable name
type            Storage type
format          Display format
vallab          Value label
varlab          Variable label
charn           char[characteristic_name]
--------------------------------------------------------------------------

The variable order contains the sequential order of the variable in the input varlist specified for descsave, or the order of that variable in the dataset, if the user does not specify an input varlist. The variables idnum and idstr are only present if requested in the options of the same names. There is one charn variable for each characteristic_name in the list specified by the charlist() option. The variable charn contains the value of the nth characteristic specified in the charlist() option (truncated if necessary to the maximum length for a string variable under the current version of Stata). All of these variables can be renamed using the rename() option, or used by the gsort() option to specify the sorting order. If the keep() option is used, then the output dataset will contain only the specified subset of these variables.

Remarks

descsave can be used together with outsheet and insheet to construct a definitive generic spreadsheet version of the data. This is useful if the user needs either to convert the data to old versions of Stata not supported by saveold, or to return to the data decades into the future, when all proprietary software has evolved beyond recognition. The do-file specified by dofile() can be used to reconstruct variable attributes after inputting the definitive version of the data using insheet, assuming that the variables are still numeric or string-valued, as specified in the original Stata data. (The user may need to use destring after using insheet, if some of the numeric variables in the definitive generic spreadsheet are formatted in nonstandard ways.) The output do-file can also be translated manually into other software languages if the user wants to use the data under other software platforms.

descsave can also be used with the parmest and factext packages (see help for parmby, parmest or factext if installed). Typically, the user uses descsave to save to a do-file the attributes of variables representing categorical factors, generates dummy variables for these categorical factors using tabulate or xi, enters these dummy variables into a regression analysis, saves the results of the regression to a dataset using parmby or parmest, and then reconstructs the categorical factors from the variable label in the parmest output dataset using the factext package.

Examples

. descsave, list(,)

. descsave make mpg weight, list(name varlab vallab, clean noobs)

. descsave, list(, subvar noobs sepa(0) abbrev(32)) char(omit)

. descsave, do(auto.do, replace)

. descsave, saving(autodesc.dta, replace)

. descsave, list(, noobs abb(32)) do(auto.do, replace) saving(autodesc.dta, replace) rename(varlab variable_label format variable_format)

. descsave using auto2, list(,)

. descsave model mpg price using auto2, list(,) saving(auto2desc, replace)

. descsave, norestore

The following example will work in the auto data. The first part creates a generic text spreadsheet in auto.txt, with a program to reconstruct the variable attributes in auto.do. The second part reconstructs the auto data from auto.txt, using auto.do.

. descsave, do(auto.do, replace) sa(autodesc.dta, replace) charlist(omit *)
. outsheet using auto.txt, nolabel replace

. insheet using auto.txt, clear
. run auto.do
. describe

The following example will work in the auto data if the packages parmest, factext and eclplot are installed. All of these packages can be downloaded from SSC.

. tab foreign, gene(type_) nolabel
. qui descsave foreign, do(foreign.do, replace)
. parmby "regress mpg type_*, noconst robust", label norestore
. factext foreign, do(foreign.do)
. eclplot estimate min95 max95 foreign, xscal(range(-1 2)) xlab(0 1)

The following advanced example will work under Stata 8 or above in the auto data if the dsconcat and xcollapse packages are installed. Both packages can be downloaded from SSC. The example creates a dataset with 1 observation for each of a list of variables and data on their names and median values, using xcollapse and dsconcat, and then uses merge to merge in a dataset created by descsave, with 1 observation per variable and data on the variable names, variable labels and display formats.

. tempfile tf0
. descsave price mpg headroom trunk weight length turn displacement gear_ratio, saving(`tf0', replace) gsort(name) keep(order name varlab format)
. global tflist ""
. local i1=0
. foreach X of var price mpg headroom trunk weight length turn displacement gear_ratio {
.   local i1=`i1'+1
.   tempfile tf`i1'
.   xcollapse (median) med=`X', idstr("`X'") nidstr(name) saving(`tf`i1'', replace) flist(tflist)
. }
. dsconcat $tflist
. sort name
. lab var med "Median value"
. merge name using `tf0'
. sort order
. list order name varlab med

Author

Roger Newson, Imperial College London, UK. Email: r.newson@imperial.ac.uk

Also see

Manual:  [D] describe, [D] destring, [D] gsort, [D] insheet, [D] label, [D] outsheet
         [R] tabulate, [R] xi
         [U] 12.8 Characteristics
         [P] char
On-line: help for append, char, describe, destring, gsort, insheet, label, outsheet, saveold, tabulate, xi
         help for dsconcat, eclplot, factext, parmby, parmest, xcollapse if installed