-------------------------------------------------------------------------------
help for zipsave, zipsaveold, zipuse, zipappend, zipmerge     (Henrik Stovring)
-------------------------------------------------------------------------------

Save and use datasets compressed by zip on WindowsXP/Unix/Linux/MacOSX

zipsave [filename] [, replace save_options]

zipsaveold [filename] [, replace saveold_options]

zipuse filename [, clear dtafile(dtafilename) use_options]

zipuse [varlist] [if] [in] using filename [, clear dtafile(dtafilename) use_options]

zipappend [varlist] using filename [filename ...] [, dtafile(dtafilename) append_options]

zipmerge [varlist] using filename [filename ...] [, dtafile(dtafilename) merge_options]

Description

zipsave compresses the dataset currently in memory and stores it on disk under the name filename. If no filename is specified, the command uses the filename under which the data were last known to Stata (c(filename)). If filename is specified without an extension, .dta.zip is used.

zipuse loads into memory a Stata-format dataset that was previously saved and zipped, either directly by zipsave or manually. If the zip-file filename is specified without an extension, .dta.zip is assumed. If the zip-file contains only one datafile, dtafilename need not be specified, but if the zip-file is an archive containing several files, then dtafilename must be specified to indicate which file to decompress.

zipappend appends a Stata-format dataset previously saved in zip-format (see description above for zipuse) to the current dataset in memory.

zipmerge merges one or more Stata-format dataset(s) previously saved in zip-format (see description above for zipuse) with the current dataset in memory.

Obviously, the commands require the zip command to be available at the command line. It can be downloaded for free at http://www.info-zip.org, but is preinstalled on most Unix/Linux systems. On Unix/Linux/MacOSX you can check if zip is available with

. shell which zip

which should return something like '/usr/bin/zip'. On Windows, zip.exe and unzip.exe must similarly be found in the path; see the documentation for your Windows version. The easiest way to check whether your path is set up correctly on Windows is to try out zipsave and zipuse on a test dataset (i.e. an artificial dataset you can afford to lose!). If it does not work, either you have not installed zip or your path is not configured correctly.
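
For a check that does not involve Stata at all, the following POSIX shell sketch reports whether both zip and unzip can be found through the path on Unix/Linux/MacOSX. The helper name has_tool is illustrative only and is not part of the zipsave package.

```shell
# has_tool NAME: succeeds if NAME can be found through PATH.
# (has_tool is an illustrative helper, not part of zipsave.)
has_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of both tools the commands rely on.
for tool in zip unzip; do
    if has_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If either tool is reported as not found, install it or adjust the path before trying zipsave and zipuse.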

In principle, the commands should work on any system with zip installed and ordinary shell commands available. They have, however, only been tested on the platforms mentioned above, so as always, use at your own risk!

Options

replace permits zipsave to overwrite an existing dataset. replace may not be abbreviated.

clear permits the data to be loaded even if there is a dataset already in memory and even if that dataset has changed since the data were last saved.

dtafile must be specified when the zip-file contains two or more datafiles. The full name of the datafile must be given.

save_options are all options available with save.

saveold_options are all options available with saveold.

use_options are all options available with use.

append_options are all options available with append.

merge_options are all options available with merge.

Remarks

These commands are useful for two purposes:

First, they reduce the disk space used by a dataset, which may be important when storing very large datasets.

Second, they may help reduce network load, and hence the time used for storing and opening datasets, when using a distributed disk system such as NFS. This is because the commands transfer only the compressed dataset over the network: the uncompressed dataset is stored only as a temporary datafile, which typically resides on the local disk (where local is relative to the running instance of Stata).

The price paid for saving disk space (and network load) is the CPU time used by zip. Please test for yourself whether compression is actually advantageous in your specific set-up.

Note that with zipsave the filename listed inside the zip-archive is that of a temporary file created in the process (St12709.0000001, for example), so you should not extract the dataset using a simple unzip command. Put differently, you should only use the command zipuse to open data saved with zipsave. zipuse can, however, also open a manually created zip-archive consisting of one or more datafiles by use of the option dtafile.
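
When a manually created archive holds several files, the exact member name to pass in dtafile() can be read off the archive listing. The following is a minimal shell sketch, assuming the Info-ZIP unzip tool is installed; list_members is a hypothetical helper name, and multfile.zip is the example archive name used elsewhere in this help file.

```shell
# list_members ARCHIVE: print the table of contents of a zip archive,
# so the full member name for dtafile() can be copied from it.
# (list_members is a hypothetical helper, not part of zipsave.)
list_members() {
    if [ ! -f "$1" ]; then
        echo "no such archive: $1" >&2
        return 1
    fi
    unzip -l "$1"
}

# Usage (assumes multfile.zip exists in the current directory):
#   list_members multfile.zip
```

The name shown in the listing is what dtafile() expects, given in full.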

Also note that the zipsave and zipuse commands set the filename associated with the dataset, so that, for example, describe correctly reports the originating filename of the compressed dataset. An unwanted side-effect of this occurs when issuing save, replace on a dataset opened with zipuse, as this will save an uncompressed dataset that replaces the original compressed dataset. This is, however, considered a feature, as it ensures that commands such as describe point exactly to the relevant file, and it remains the user's responsibility to be aware of which filename is actually used by the command save, replace.

If you prefer to use gzip for compression, have a look at gzsave.

Examples

. zipsave myfile
. zipsave myfile, replace

. zipuse myfile
. zipuse myfile, clear
. zipuse multfile.zip, clear dtafile(data.dta)

. zipappend using myfile2
. zipappend using multfile2, dtafile(data2.dta)

. zipmerge id using myfile3, unique

Author

Henrik Stovring (stovring@biostat.au.dk), Biostatistics, Department of Public Health, Aarhus University, Denmark.

Also see

Manual: [R] save

Online: help for save, use, compress, append, merge, gzsave (if installed, otherwise describe gzsave)