-------------------------------------------------------------------------------
help for todate
-------------------------------------------------------------------------------

Generate Stata date variables from run-together date variables

todate varlist [if exp] [in range] , generate(newvarlist) pattern(patternspec) [format(format) cend(year)]

Description

todate takes run-together date variables consisting of integers or integer characters and generates the corresponding Stata date variables. Depending on input, the resulting variables will have units that are the numbers of days, weeks, months, quarters or half-years since the beginning of 1960.

Remarks

As explained in [U] 24.2.2.2, Stata's date() function cannot convert run-together date variables which consist of integers or integer characters and represent some permutation of years, months and days, with values such as 19520121 or 520121 or "19520121", to Stata date variables. However, it is possible to use other numeric or string functions to extract year, month and day and then to compute the dates.

A similar problem and a similar solution arise with run-together dates representing years and one of half-years, quarters, months or weeks.

todate offers an alternative. It can handle one or more numeric and/or string variables with run-together dates which are integers or integer characters, hereafter both referred to as digits.

Options

generate() specifies a list of new variable names to hold the Stata date variables. There must be as many new names as there are existing run-together date variables. This is a required option.

pattern() specifies the pattern for interpreting digits in the dates.

y indicates year. h indicates half-year. q indicates quarter. m indicates month. w indicates week. d indicates day.

The patterns supported are of types mdy, yh, yq, ym and yw. For example, type mdy consists of some permutation of ms, ds and ys. In general, all ys, ... , all ds should be contiguous.

Thus p(yyyymmdd) indicates that 19520121 is to be interpreted as 1952 January 21. Dates like 3281952 and 12251999 for March 28 1952 and December 25 1999, in which the first digit for shorter dates is not specified, but could be 0, should be specified as p(mmddyyyy).

todate cannot handle input variables whose maximum length does not match the length of pattern specified, nor input variables whose non- missing values vary in length by more than 1 digit. Such variables are skipped.

This is a required option.

cend() specifies the end of the century (any period of 100 years) to which years belong. With cend(2000) year digits of 52 and 99 will be interpreted as 1952 and 1999. With cend(2060) they will be interpreted as 2052 and 1999. cend() is ignored if 4 digit years are given.

format() specifies a format to be attached to each new variable, usually but not necessarily a date format. See help on date formats or time-series formats. By default the formats will be %d, %tw, %tm, %tq or %th as appropriate.

Examples

. todate d1 d2 d3, gen(ndate1-ndate3) p(yyyymmdd) f(%dd_m_cy) . todate d1 d2 d3, gen(ndate1-ndate3) p(mmddyy) c(2000) f(%dd_m_cy) . todate d1 d2 d3, gen(ndate1-ndate3) p(yyyyq) f(%tq_q_cy)

Author

Nicholas J. Cox, Durham University, U.K. n.j.cox@durham.ac.uk

Acknowledgments

Kit Baum, Roger Harbord, David Kantor and Gary Longton made very helpful comments.

Also see

On-line: help for datefcn, ywfcns, dfmt, tfmt Manual: [U] 27 Commands for dealing with dates FAQ: http://www.stata.com/support/faqs/data/dateseq.html