-------------------------------------------------------------------------------
help for gzsave, gzsaveold, gzuse, gzappend, gzmerge (Henrik Stovring)
-------------------------------------------------------------------------------

Save, use, append, and merge datasets compressed by gzip on Windows/Unix/Linux/MacOSX

gzsave [filename] [, replace save_options]

gzsaveold [filename] [, replace saveold_options]

gzuse filename [, clear use_options]

gzuse [varlist] [if] [in] using filename [, clear use_options]

gzappend [varlist] using filename [filename ...] [, append_options]

gzmerge [varlist] using filename [filename ...] [, merge_options]

Description

gzsave compresses and stores the current dataset on disk under the name filename. If no filename is specified, the command tries to use the filename under which the data were last known to Stata (c(filename)). If filename is specified without an extension, .dta.gz is used.

gzsaveold compresses and stores the current dataset on disk in Stata 7 format.

gzuse loads a Stata-format dataset previously saved by gzsave into memory. If filename is specified without an extension, .dta.gz is assumed. In the second syntax for gzuse, a subset of the data may be read.

gzappend appends a Stata-format dataset previously saved by gzsave to the current dataset in memory.

gzmerge merges one or more Stata-format dataset(s) previously saved by gzsave with the current dataset in memory.

Obviously, all these commands require the gzip command to be available at the command line. It can be downloaded for free at http://www.gzip.org, but is preinstalled on most Unix/Linux/MacOSX systems. On Unix/Linux/MacOSX you can check if gzip is available with

. shell which gzip

which should return something like '/usr/bin/gzip'. On Windows, gzip.exe must similarly be found in the path, cf. the documentation on your Windows version. The easiest way to check whether your path is set up correctly on Windows is to try out gzsave and gzuse on a test dataset (i.e. an artificial dataset you can afford to lose!). If this does not work, the most likely explanation is that either gzip is not installed or your path is not configured correctly.
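On Unix/Linux/MacOSX the same check can be scripted in a terminal outside Stata. The following is a minimal sketch, assuming a POSIX shell; the variable names and the scratch file are hypothetical and not part of the gz* commands. It looks for gzip in the path and, if found, compresses and restores a throwaway file to confirm that the round trip returns the original contents.

```shell
# Run in a terminal (POSIX shell), not at the Stata prompt.

# 1. Is gzip in the path?
if command -v gzip >/dev/null 2>&1; then
    gzip_status="found"
else
    gzip_status="missing"
fi
echo "gzip: $gzip_status"

# 2. If so, compress and restore a scratch file and compare it with a copy.
roundtrip="skipped"
if [ "$gzip_status" = "found" ]; then
    tmpdir=$(mktemp -d)
    printf 'scratch data, safe to lose\n' > "$tmpdir/test.txt"
    cp "$tmpdir/test.txt" "$tmpdir/orig.txt"
    gzip "$tmpdir/test.txt"          # leaves test.txt.gz and removes test.txt
    gzip -d "$tmpdir/test.txt.gz"    # restores test.txt
    if cmp -s "$tmpdir/test.txt" "$tmpdir/orig.txt"; then
        roundtrip="ok"
    else
        roundtrip="failed"
    fi
    rm -rf "$tmpdir"
fi
echo "round trip: $roundtrip"
```

If the first line reports "missing", gzsave and gzuse cannot work until gzip is installed or the path is corrected.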

In principle, the commands should work on any system with gzip installed and ordinary shell commands available. They have, however, only been tested on the platforms mentioned above, so as always use at your own risk!

Options

replace permits gzsave to overwrite an existing dataset. replace may not be abbreviated.

clear permits the data to be loaded even if there is a dataset already in memory and even if that dataset has changed since it was last saved.

save_options are all options available with save.

saveold_options are all options available with saveold.

use_options are all options available with use.

append_options are all options available with append.

merge_options are all options available with merge.

Remarks

These commands are useful for two purposes:

First, they reduce the disk space used by a dataset, which may be important when storing very large datasets.

Second, they may help reduce network load, and hence the time used for storing and opening datasets, when using a distributed disk system such as NFS. This is because only the compressed dataset is transferred over the network: the uncompressed dataset is stored only as a temporary data file, which typically resides on the local disk (where local is relative to the running instance of Stata).

The price paid for saving disk space (and network load) is the CPU time used by gzip. Please test for yourself whether compression is actually advantageous in your specific set-up.
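One way to get a first impression of the trade-off is to compress a representative file by hand and compare sizes. The following is a rough sketch, assuming a POSIX shell with gzip in the path; the generated sample file is a hypothetical stand-in for a dataset, and a highly repetitive one, so the ratio it shows says nothing about your own data.

```shell
# Run in a terminal (POSIX shell), not at the Stata prompt.
tmpdir=$(mktemp -d)
data="$tmpdir/sample.dat"            # hypothetical stand-in for a dataset

# Build a repetitive text file to compress.
i=0
while [ "$i" -lt 5000 ]; do
    echo "id=$i group=A value=0.0"
    i=$((i + 1))
done > "$data"

# Sizes before and after; $(( ... )) strips any padding printed by wc.
orig_bytes=$(( $(wc -c < "$data") ))
gzip -c "$data" > "$data.gz"         # -c writes to stdout and keeps the original
gz_bytes=$(( $(wc -c < "$data.gz") ))

echo "original:   $orig_bytes bytes"
echo "compressed: $gz_bytes bytes"
gzip -l "$data.gz"                   # gzip's own report, including the ratio

rm -rf "$tmpdir"
```

To gauge the CPU cost as well, point the gzip line at one of your own files and, in shells that provide it, prefix the line with time.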

Note that the gzsave and gzuse commands set the filename associated with the dataset, so that, for example, describe correctly reports the originating filename of the compressed dataset. An unwanted side effect occurs when issuing save, replace on a dataset opened with gzuse: this saves an uncompressed dataset that replaces the original compressed dataset. This is, however, considered a feature, as it ensures that commands such as describe point exactly to the relevant file, and it remains the user's responsibility to be aware of which filename is actually used by save, replace.
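If you suspect that a file carrying the .gz name has been overwritten with uncompressed data in this way, gzip's integrity test (gzip -t) can tell the two cases apart. The following is a minimal sketch, assuming a POSIX shell with gzip in the path; the file names and the check function are hypothetical and serve only as an illustration.

```shell
# Run in a terminal (POSIX shell), not at the Stata prompt.
tmpdir=$(mktemp -d)

# A genuinely compressed file (stand-in for one written by gzsave).
printf 'some data\n' > "$tmpdir/good.dta"
gzip "$tmpdir/good.dta"                        # creates good.dta.gz

# An uncompressed file that merely carries the .gz name
# (stand-in for the result of save, replace after gzuse).
printf 'some data\n' > "$tmpdir/plain.dta.gz"

# gzip -t exits with a nonzero status if the file is not valid gzip data.
check() {
    if gzip -t "$1" 2>/dev/null; then
        echo "compressed"
    else
        echo "NOT gzip-compressed"
    fi
}

good_result=$(check "$tmpdir/good.dta.gz")
plain_result=$(check "$tmpdir/plain.dta.gz")
echo "good.dta.gz:  $good_result"
echo "plain.dta.gz: $plain_result"

rm -rf "$tmpdir"
```

A file reported as not compressed is presumably an ordinary Stata dataset despite its name; issuing gzsave, replace instead of save, replace avoids the situation in the first place.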

If you prefer to use zip for compression, have a look at zipsave.

Examples

. gzsave myfile
. gzsave myfile, replace

. gzuse myfile
. gzuse myfile, clear

. gzappend using myfile2

. gzmerge id using myfile3, unique

. shell gzip -l myfile.dta.gz
(shows compression ratio)

Author

Henrik Stovring (stovring@biostat.au.dk), Biostatistics, Department of Public Health, Aarhus University, Denmark.

Acknowledgements

Jun Xu, Department of Sociology, Indiana University, USA, for useful suggestions and help with testing the command for Windows XP.

Morten Andersen, Research Unit of General Practice, University of Southern Denmark, Denmark, gave suggestions on filename handling, tested on MacOSX, and requested saveold.

Christopher Paul Barrington-Leigh, University of British Columbia, Canada, requested handling of filenames with blanks and prompted me to add gzappend and gzmerge to the list.

Also see

Manual: [R] save

Online: help for save, use, compress, append, merge, zipsave (if installed, otherwise describe zipsave)