.TH savas Local 09Dec03
.SH NAME
savas \- based on the file extensions, makes SAS (version 6.09 or later) data file copies of Stata data files, or makes Stata (version 7 or later) data file copies of SAS (version 6.09 or later) data files
.SH SYNOPSIS
.B savas
[\-options] StataDataSetName ...
.SH EXAMPLES
savas mystata.dta
.TP
savas mysas.sas7bdat
.TP
savas \-fmts mystata.dta
.TP
savas \-r mystata.dta
.TP
savas \-r ../group/mysas.sas7bdat
.TP
savas \-c ../group/mystata.dta
.TP
savas \-i \-o mysas6.ssd01
.TP
savas analysis.dta analysis2.sas7bdat child_data.dta
.TP
savas \-fmts mysas.sas7bdat
.TP
savas mysas6.ssd01
.TP
savas \-x analysis.xpt analysis2.exp child_data.Apr02.stx
.SH DESCRIPTION
.I savas
copies one or more SAS/Stata datasets as a Stata/SAS file. The output dataset will have the same name, but with the appropriate filename extension:
.PP
SAS Version 7, 8 or 9: .sas7bdat
.br
SAS Version 6: .ssd01
.br
SAS 6 Transport/Xport: .xpt, .xport, .exp, .export, .sasx, .stx, .v5x, .v6x, .trans, or .expt file extensions plus whatever file extension the file might have.
.br
SAS transport files created by PROC CPORT: .cport and .ssp file extensions plus whatever file extension the file might have.
.br
SPSS portable files: .por file extension (saved only to Stata).
.br
Stata: .dta
.PP
By default the Stata/SAS file is created in the same directory as the SAS/Stata data file, with the appropriate filename extension, and contains all observations and every variable in the SAS/Stata data file.
.I savas
requires both Stata and SAS to be available on the same machine.
.PP
.I savas
cannot process files whose filenames or directory names contain single or double quotes.
.PP
The procedure is as follows:
.PP
(1)
.I savas
creates a Stata/SAS program that loads the Stata/SAS dataset into Stata/SAS and calls the
.I savas
Stata/SAS program.
.PP
(2)
.I savas
either uses Stata's fdasave command to save the dataset in memory temporarily as a SAS xport data file, or has SAS write the data to an ASCII file.
.PP
(3)
.I savas
writes a Stata/SAS input program to load the dataset into Stata/SAS and to assign variable names, labels (and formats).
.PP
(4)
.I savas
runs the program in Stata/SAS in batch mode to load the data.
.PP
(5) Stata/SAS saves the data as whichever version of Stata/SAS file type was specified.
.PP
Note: If saving to SAS version 6, SAS version 6 transport/xport, or Stata 6,
.I savas
checks for variable names that are longer than 8 characters and, if the `\-rename' option is issued, renames them to the first 8 characters or up to the first 7 plus a number. In addition,
.I savas
displays the list of renamed variables.
.PP
If the SAS/Stata dataset is sorted by one or more variables, the Stata/SAS dataset will also be sorted by those same variables.
.PP
The maximum length for a string variable passed from Stata to SAS is 200 characters. For longer strings, only the first 200 characters are passed on to SAS (this is a limitation of the SAS xport dataset used to transfer data from Stata to SAS). Depending on the version of Stata you are using, there are also limits on the length of character variables that can be passed on to Stata; see Stata's help page called limits. Stata variable labels can be up to 80 characters in length.
.SH OPTIONS
.TP
.B \-c/\-curdir
.I savas
saves the output dataset to the current working directory, even if the input dataset is located elsewhere.
.TP
.B \-r/\-replace
By default,
.I savas
warns the user if the output dataset already exists, and asks permission to overwrite it. Option `\-replace' suppresses this interactive behavior and replaces any existing output dataset without warning. If more than one dataset is submitted to
.IR savas ,
this option only works for the first dataset. See the `\-force' option.
.TP
.B \-force
is equivalent to using both \-rename and \-replace, and keeps these options in effect for every dataset if more than one dataset is submitted to
.IR savas .
.TP
.B \-sas6
saves the Stata file as a SAS version 6 file.
.TP
.B \-sasx
saves the Stata file as a SAS version 6 transport/xport file using the xport engine.
.TP
.B \-o/\-old
saves the Stata file in the format of the Stata version preceding the current version, e.g., version 7.
.TP
.B \-i/\-intercooled
saves the Stata file as Intercooled. This is only necessary if Stata SE is being used.
.TP
.B \-char2lab
uses the SAS macro char2fmt to convert long character variables to numeric variables with Stata value labels. This is like Stata's -encode- command. This option is only helpful when saving to a Stata 9 or higher dataset, since Stata 9 added the feature of allowing value labels to be up to 32,000 characters long.
.TP
.B \-fmts/\-formats
when creating a SAS dataset, saves the value labels that exist in the Stata dataset as SAS formats in a file that has the same name as the data file but with the ".sas7bcat" file extension; when creating a Stata dataset, uses such a file. This formats catalog file is created in, or needs to be in, the same directory as the SAS data file. By default value labels are neither saved nor created. NOTE: SAS format names have to be 8 characters or less and cannot end in a number.
.I savas
makes some attempt to rename invalid SAS formats, but it is best to rename or drop them in Stata before using
.IR savas .
Stata does not allow user-defined formats on string variables or on numbers with decimal values.
.TP
.B \-q/\-quotes
replaces double quotes ( " ) occurring in character variables with single quotes ( ' ), and replaces compound quotes ( `" or "' ) occurring in variable labels or formats with single quotes ( ' ).
.I savas
cannot process character variables with double quotes, or variable labels or formats with compound quotes, when converting a dataset from SAS to Stata.
.TP
.B \-x/\-xport
.I savas
converts SAS transport files into Stata data files.
Note: Multiple transport data files can be processed at a time, but all data files need to be SAS transport files. Regular SAS/Stata data files and transport files cannot be mixed when using this option.
.TP
.B \-f/\-float
prevents the use of Stata's variable type `double'. All variables whose SAS precision would require Stata's `double' type are created as `float' (which is the default numeric storage type for Stata). This option may lead to a loss of precision, but saves space: a `float' is stored in 4 bytes, a `double' in 8 bytes.
.TP
.B \-check
creates two check files for the user to compare the input dataset with the output dataset to make sure
.I savas
created the files correctly. This is a comparison that should be done after any data file is converted to any other type of data file by any software. The files are created in the same directory as the output data file and are named starting with the name of the data file followed by either "_SAScheck.lst" (SAS) or "_STATAcheck.log" (Stata), e.g. "mydata_SAScheck.lst" and "mydata_STATAcheck.log".
.TP
.B \-rights
sets the file permissions of the new SAS file to whatever the default file permissions would be for a new file in that directory. Without this option, the new file gets the same permissions as the Stata data file.
.TP
.B \-rename
specifies that any required renaming of file names or variable names is to be done. The `\-rename' option is only necessary when saving to an older version of SAS or Stata, or when variable names are not unique in SAS. When saving to an older version, `\-rename' attempts to make long variable names (more than 8 characters) unique by shortening them to the first 8 characters or up to the first 7 plus a number.
.I savas
lists all variables that were renamed. If more than one dataset is submitted to
.IR savas ,
this option only works for the first dataset. See the `\-force' option.
.TP
.B \-b/\-beep
beeps upon completion.
.TP
.B \-s/\-silent
be silent; in this case,
.I savas
does not print any output to the screen, except for error messages. By default,
.I savas
reports which stage of the conversion process is currently being executed, along with the number of variables, the number of observations, and more.
.TP
.B \-ascii/\-sascode
specifies that only a data file and an input program are to be created. By default,
.I savas
executes all of the steps outlined above. The `\-ascii/\-sascode' option stops this process after step (3). The user then needs to read in the data manually using Stata/SAS.
.I savas
writes a SAS program (mydata_infile.sas) to read in the xport data file (mydata.xpt).
.TP
.B \-m/\-messy
specifies that the intermediary files created by
.I savas
during its operation are not to be deleted. The `\-messy' option prevents
.I savas
from cleaning up after it has finished. This option is mostly useful for debugging, to find out where something went wrong. All intermediary files have a name starting with an underscore (_) followed by the process ID and are located in the temp directory.
.TP
.B \-obs=n
converts only the first
.I n
observations. By default,
.I savas
converts all observations of the Stata/SAS dataset.
.TP
.B \-varfile=filename
may be used to select only a subset of variables to be included in the Stata/SAS dataset. This speeds up the conversion process and is useful when the number of variables is too large for a non-SE (Special Edition) Stata file, that is, more than 2,047 variables. The
.I filename
is the name of a file whose contents are variable names only. These variable names are case-insensitive when saving to Stata. If saving to SAS, multiple variables can be listed using any of Stata's varlist rules. For example, var* is understood as var1, var2, ... If saving to Stata, multiple variables with the same stem may be specified as ranges according to general SAS rules.
For example, var1-var20 is understood as var1, var2, ..., var20.
.TP
.B \-n/\-nice
runs SAS/Stata nicely, that is, at a lower scheduling priority. The default is 20. This should be used if you have a very large data file and there are others using the UNIX box, e.g. savas \-n 10 mystata.dta
.SH FEATURES
.I savas
attempts to transfer Stata value labels to SAS formats and vice versa. Date formats are translated as closely as possible. Fixed SAS formats (Fw.d) translate into Stata's %w.df format. Unformatted variables get Stata's default formats for the appropriate data type (%8.0g for bytes and ints, %9.0g for floats, and %10.0g for doubles), except for long variables, which
.I savas
formats as %12.0g.
.PP
.I savas
stamps the SAS creation date and time on the Stata data set name, so that the Stata user knows not only when the Stata data set was created, but also the original SAS creation date and time.
.PP
Not all SAS variable names are acceptable in Stata.
.I savas
attempts to prevent conflicts by using uppercase names for reserved names. These names are `_all', `_B', `_coef', `_cons', `if', `in', `byte', `int', `long', `float', `double', `_pi', `_pred', `_rc', `_se', `_skip', `using', and `with', as well as names starting with `str' and followed by an integer. (For example, the name `street' does not pose any problems, but the SAS name `str10' will be translated into the Stata name `STR10'.) The SAS name `_n' translates into `_______N' (and a warning is issued).
.PP
.I savas
can process multiple files at a time. Try: savas *.sas7bdat or savas *.dta.
.PP
Not all Stata variable names are acceptable in SAS, because Stata treats variable names that differ only in upper, lower, or mixed case as distinct. So the variable
.I gender
can be in the same dataset as
.I Gender
or
.I GENder
etc.
.I savas
attempts to prevent conflicts by testing for situations like the gender example and, when the
.I \-ren/\-rename
option is issued,
.I savas
attempts to make the variable names unique by adding a number to the end of the variable name. If saving to an older version, `\-rename' also shortens all variable names that are longer than 8 characters.
.SH FILES
.TP
.B /usr/local/bin/savas
the (C shell) program
.I savas
.TP
.B /usr/local/ado/s/savasas.ado
the Stata program savasas.ado
.TP
.B /usr/local/ado/s/savastata.sas
the SAS macro SAVASTATA
.TP
.B /afs/isis/pkg/stata/.install/common/ado/updates/char2fmt.sas
the SAS macro CHAR2FMT
.TP
.B /bin/gawk
`gawk', the GNU version of `awk', which
.I savas
also needs
.PP
In addition, numerous standard UNIX utilities are used.
.SH AUTHOR
Dan Blanchette
.PP
Developed at The Carolina Population Center and Research Computing, University of North Carolina Chapel Hill
.PP
Center of Entrepreneurship and Innovation, Duke University's Fuqua School of Business, Durham, NC USA
.PP
(dan.blanchette@duke.edu)
.SH ACKNOWLEDGEMENTS
This script was inspired by the sas2stata script developed at RAND.
.SH VERSION
The current version is 3.0
.SH BUGS
SAS character variables with non-Roman characters can corrupt the intermediary ASCII data set when transferring data to Stata.
.I savas
will not create or overwrite the Stata dataset if that is the case.
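.SH NOTES
The 8-character renaming rule described under `\-rename' can be illustrated with a short Python sketch. It is not the code that savas runs: the function name shorten_names is hypothetical, and the order in which numbers are assigned is an assumption, since this page only states that long names become the first 8 characters, or up to the first 7 plus a number. The sketch also assumes the input names are unique.
.PP
.nf
```python
def shorten_names(names, max_len=8):
    """Map each name to a unique name of at most max_len characters.

    Hypothetical illustration only; the numbering scheme is an assumption.
    """
    used = set()
    mapping = {}
    # Names that already fit are kept as they are and reserved first.
    for name in names:
        if len(name) <= max_len:
            used.add(name)
            mapping[name] = name
    # Long names: try the first max_len characters, then a shorter
    # stem plus a counter until the result is unique.
    for name in names:
        if name in mapping:
            continue
        candidate = name[:max_len]
        counter = 1
        while candidate in used:
            suffix = str(counter)
            candidate = name[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping


renamed = shorten_names(["id", "household_income", "household_size", "househol"])
for old, new in renamed.items():
    if old != new:
        print(old, "->", new)
```
.fi
.PP
With these assumptions, household_income and household_size both collide with the existing name househol, so the sketch falls back to a 7-character stem plus a number. Comparing such a list against the one savas prints is a quick way to see which variables were affected.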