{smcl}
{* 3-Aug2004 rev 8-11-2004, 4-11-2005, 1-27-2012, 2013feb24 & Mar8, May4, July25, 2014may16, 2016jan15}
{hline}
help for {hi:carryforward}
{hline}

{title:Carry values forward, filling in missing values.}

{p 8 17 2}
{cmd:carryforward}
{it:varlist} [{cmd:if} {it:exp}] [{cmd:in} {it:range}]{cmd:, }{c -(}{cmd:gen(}{it:newvarlist1}{cmd:)} | {cmd:replace}{c )-}
[{cmd:cfindic(}{it:newvarlist2}{cmd:) back carryalong(}{it:varlist2}{cmd:)} {cmd:strict} {cmd:nonotes}
{cmd:dynamic_condition(}{it:dyncond}{cmd:)} {cmd:extmiss}]

{p 4 4 2}
{cmd:by} {it:...} {cmd::} may be used with {cmd:carryforward}; see {help by}.

{title:Description}

{p 4 4 2}
{cmd:carryforward} will carry non-missing values forward from one observation to the next,
filling in missing values with the previous value. Thus, if you consider a sequence of missing values
as a gap in the overall sequence, this operation will fill the gaps with values that
appear before the gap.

{p 4 4 2}
It is important to understand that
this is not appropriate for imputing missing values; more on this later, under "Additional Remarks".

{p 4 4 2}
The value-carrying action proceeds sequentially in the existing order of observations, or as sorted by
{help bysort}, cascading values from one observation to the next, potentially
carrying a given value through many observations. The process stops upon encountering
a non-missing value, an excluded observation, or the end of a {cmd:by} group (that is, a change in value
of the primary sort-variable, when used with {help by}). The process resumes when another missing value is encountered.

{p 4 4 2}
An example will illustrate:

{cmd:. carryforward x, gen(y)}
{txt}(6 real changes made)

{cmd:. list, noobs sep(0)}
{txt}
  {c TLC}{hline 4}{c -}{hline 4}{c TRC}
  {c |} {res} x    y {txt}{c |}
  {c LT}{hline 4}{c -}{hline 4}{c RT}
  {c |} {res}12   12 {txt}{c |}
  {c |} {res} 4    4 {txt}{c |}
  {c |} {res} .    4 {txt}{c |}
  {c |} {res} .    4 {txt}{c |}
  {c |} {res} .    4 {txt}{c |}
  {c |} {res} 3    3 {txt}{c |}
  {c |} {res} .    3 {txt}{c |}
  {c |} {res} 7    7 {txt}{c |}
  {c |} {res} .    7 {txt}{c |}
  {c |} {res} .    7 {txt}{c |}
  {c BLC}{hline 4}{c -}{hline 4}{c BRC}

{p 4 4 2}
Notice that each value is carried until a non-missing value of x is encountered.
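
{p 4 4 2}
As a minimal sketch, the listing above can be reproduced from scratch as follows, assuming the dataset
holds only the variable x shown. The variable {cmd:y2} is a hypothetical name introduced here; it is built with the
familiar hand-written {cmd:replace} idiom, purely to illustrate the cascading action in the simple case
(no qualifiers, no {cmd:by} groups). It is not meant to describe how {cmd:carryforward} is implemented.

    {cmd:. clear}
    {cmd:. input x}
    {cmd:  12}
    {cmd:  4}
    {cmd:  .}
    {cmd:  .}
    {cmd:  .}
    {cmd:  3}
    {cmd:  .}
    {cmd:  7}
    {cmd:  .}
    {cmd:  .}
    {cmd:  end}
    {cmd:. carryforward x, gen(y)}
    {cmd:. gen y2 = x}
    {cmd:. replace y2 = y2[_n-1] if missing(y2) & !missing(y2[_n-1])}
    {cmd:. assert y == y2}

{p 4 4 2}
The hand-written {cmd:replace} cascades because Stata evaluates it one observation at a time, in order, so each
filled value is available to the next observation. For this simple dataset, the final {cmd:assert} would be expected to pass.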

{title:Options}

{p 4 4 2}
{cmd:gen(}{it:newvarlist1}{cmd:)} specifies the new variable(s) that will receive the
values. If it is specified, then {it:newvarlist1} must have exactly as many names as there are in
{it:varlist}; the variable names in the two lists will correspond in the order presented. The variables
in {it:newvarlist1} will equal their corresponding variables in {it:varlist} wherever the latter are
non-missing.

{p 4 4 2}
{cmd:replace} specifies that the new values are to be written directly into
the variables of {it:varlist}. Under this option, {cmd:carryforward} functions as a
{help replace} operation.

{p 4 4 2}
You must use either {cmd:gen()} or {cmd:replace}, but not both.

{p 4 4 2}
{cmd:cfindic(}{it:newvarlist2}{cmd:)} specifies indicator variable(s) that will
be generated, indicating which observations received carry-forward values, that is, which
observations were altered by the process. This is probably more useful under the {cmd:replace}
option, since with {cmd:gen()}, this information is discernible by comparing the original
and generated values. If {cmd:cfindic(}{it:newvarlist2}{cmd:)} is specified, then {it:newvarlist2} 
must have exactly as many names as there are in
{it:varlist}; the variable names in the two lists will correspond in the order presented.
Furthermore, {it:newvarlist2} may not have any names in common with {it:newvarlist1}.
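
{p 4 4 2}
As a small sketch, the indicator can be requested alongside {cmd:replace} and then inspected. The names {cmd:x} and
{cmd:x_cf} are hypothetical, and the {cmd:tabulate} step is included only as a way to see how the indicator is coded
in your data.{p_end}
{p 6 8 2}{cmd:. carryforward x, replace cfindic(x_cf)}{p_end}
{p 6 8 2}{cmd:. tabulate x_cf, missing}{p_end}
{p 6 8 2}{cmd:. list x x_cf, sep(0)}{p_end}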

{p 4 4 2}
{cmd:carryalong(}{it:varlist2}{cmd:)} specifies additional variables that will
have their values carried along in concert with {it:varlist}.
These variables get their values carried forward, but the
set of observations that are affected is determined by {it:varlist}
rather than the variables in {it:varlist2} themselves.
This may be specified only if {it:varlist} consists of a single name.
Be aware that this is essentially a {help replace} operation, with no regard
for the original values in {it:varlist2}. Whereas 
{it:varlist} (with {cmd:replace}) never has non-missing values overwritten,
the variables in {it:varlist2} can, indeed, have non-missing values overwritten.
(If you are concerned
about overwriting values, keep a copy in a separate variable. But typically,
you would use this option to carry values into what were originally missing
values.)
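
{p 4 4 2}
A hedged sketch, using hypothetical variable names: suppose {cmd:employer} is the variable being carried and
{cmd:employer_src} records where each employer value came from. To keep the source attribute in step with
the carried value, and to keep a copy in case non-missing values get overwritten, one might write:{p_end}
{p 6 8 2}{cmd:. clonevar employer_src_orig = employer_src}{p_end}
{p 6 8 2}{cmd:. bysort person_id (date): carryforward employer, replace carryalong(employer_src)}{p_end}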

{p 4 4 2}
{cmd:back} merely affects the wording of labels and notes, and has no
effect on the data. It inserts text into labels and notes, indicating that the operation was performed backward.
Typically, you would use it when you "fool" {cmd:carryforward} into carrying values backward (see example).

{p 4 4 2}
{cmd:strict} imposes an additional constraint on the treatment of excluded observations which result from
{cmd:if} or {cmd:in} qualifiers. Such observations are always excluded from having missing values filled in
(with values from the previous observation).
With the {cmd:strict} option, they are also excluded from having non-missing values carried forward (into the next observation).
This will be illustrated below.

{p 4 4 2}
{cmd:nonotes} prevents the setting of notes on the generated or replaced variable. This pertains to the note stating that the
variable was subjected to a carryforward operation; it does not affect the transfer of existing notes to the new variable under
the {cmd:gen()} option. This option is provided for instances where the notes may not be appropriate, such as when
carryforward is used as a tool for constructing a summary measure, rather than for modifying existing data.
(For example, you derive a new variable to detect a condition; the new variable initially may be sparsely populated;
you do a carryforward, followed by a reduction to the last observation per group.)

{p 4 4 2}
{cmd:dynamic_condition(}{it:dyncond}{cmd:)} specifies a restricting condition which may
include references to the value being carried. It is a more-capable alternative to the {cmd:if} {it:exp} qualifier
(though the two can be combined as well). The difference is that the {cmd:if} {it:exp} qualifier
operates only on conditions that are "static" in that they must be computable at the start of the process;
by contrast, the {cmd:dynamic_condition()} option allows for references to
values as they get propagated during the carryforward process.

{p 4 4 2}
Another limitation of the {cmd:if} {it:exp} qualifier {c -} a consequence of its static nature {c -} is that,
when there are multiple variables
in {it:varlist}, the {cmd:if} {it:exp} qualifier establishes a restriction pattern that is the same 
for all the variables; the {cmd:dynamic_condition} option can affect each variable differently.

{p 4 4 2}
Note that a reference to the value being carried would be {it:var}{cmd:[_n-1]}, where {it:var} is the 
variable being operated on. You can specify such a reference in {cmd:if} {it:exp}, but it may not work as you 
would want, since {it:var}{cmd:[_n-1]} will likely refer to observations that do not yet have the
desired values in them at the start of the carrying process (in instances where the value would be
carried more than once). That is, such a reference in {cmd:if} {it:exp} is allowed, but it refers
to values {it:before} the carrying operation begins {c -} not as they get carried.
The {cmd:dynamic_condition} option enables you to reference these values during the process of being
carried. Thus, for example, you might write,{p_end}
{p 6 8 2}{cmd:. by person_id (date): carryforward a, replace dynamic_condition(a[_n-1]<y)}{p_end}

{p 4 4 2}
That condition states that the value in a is to be carried as long as it is less than the value in y.
(That is, the carried value of a is compared to the present value of y.) Note that
this condition cannot be adequately implemented under an {cmd:if} {it:exp} qualifier.

{p 4 4 2}
You can use the special symbol {cmd:@} 
to refer to the value being carried from the prior observation; it stands for
{it:var}{cmd:[_n-1]}, where {it:var} is the variable under consideration.
Thus, the prior example could be written{p_end}
{p 6 8 2}{cmd:. by person_id (date): carryforward a, replace dynamic_condition(@<y)}{p_end}

{p 4 4 2}
The {cmd:@} symbol is generally preferable, as it makes it easier to formulate and
understand such conditions. But it has an additional advantage: it refers to each of
the carried variables in succession when there are multiple variables in {it:varlist}.
Therefore, it can be used with multiple variables if the condition is the same relative
to each variable individually. Thus, you can write:{p_end}
{p 6 8 2}{cmd:. by person_id (date): carryforward a b c, replace dynamic_condition(@<y)}{p_end}

{p 4 4 2}
Which is equivalent to:{p_end}
{p 6 8 2}{cmd:. by person_id (date): carryforward a, replace dynamic_condition(@<y)}{p_end}
{p 6 8 2}{cmd:. by person_id (date): carryforward b, replace dynamic_condition(@<y)}{p_end}
{p 6 8 2}{cmd:. by person_id (date): carryforward c, replace dynamic_condition(@<y)}{p_end}

{p 4 4 2}
Which is equivalent to:{p_end}
{p 6 8 2}{cmd:. by person_id (date): carryforward a, replace dynamic_condition(a[_n-1]<y)}{p_end}
{p 6 8 2}{cmd:. by person_id (date): carryforward b, replace dynamic_condition(b[_n-1]<y)}{p_end}
{p 6 8 2}{cmd:. by person_id (date): carryforward c, replace dynamic_condition(c[_n-1]<y)}{p_end}

{p 4 4 2}
But the first form is concise and preferable.

{col 12}{hline}
{p 12 12 12}
{hi:Special notes:} If you do use a hard-coded reference such as {cmd:a[_n-1]}, it usually makes sense only when
{it:varlist} contains a single variable. But if you have multiple 
variables in {it:varlist} and also use
hard-coded references, be aware that the carryforward operation
will be repeated for each variable in sequence, subject to the evolving set of values in these variables.
It may be complicated and results may depend on the order of the variables.

{p 12 12 12}
Would you ever reference {it:var}, rather than {it:var}{cmd:[_n-1]} (where {it:var} is the 
variable being operated on)? It is unlikely that you would ever need to do this, since such a reference would
be applicable where {it:var} is missing, and therefore its only use would be in distinguishing various
extended missing values. Furthermore, a condition involving {it:var} and not {it:var}{cmd:[_n-1]} could be
expressed using {cmd:if} {it:exp}.

{p 12 12 12}
If {cmd:dynamic_condition(}{it:dyncond}{cmd:)} is combined with the {cmd:if} {it:exp} qualifier,
the carryforward action is subject to the conjunction of both conditions (as well as
{cmd:in} {it:range}, if applicable).{p_end}
{col 12}{hline}

{p 4 4 2}
{cmd:extmiss} applies to numeric variables only. This specifies that extended missing values (.a, .b, etc.)
are to be treated the same as actual numeric values; only sysmis (.) will be replaced, and extended missing values
are potentially carried into succeeding observations, just as actual numbers are (if the original values therein are sysmis). 
This option would be appropriate where the gaps that you want to fill in are coded exclusively with sysmis, and
other (extended) missing values have special significance that you want to preserve and carry forward. This situation
will occur after a {help merge} operation, where the pertinent variables in the original datasets never contain sysmis 
but may contain extended missing values that do not signify gaps to be filled.
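
{p 4 4 2}
A minimal sketch of the difference, using a made-up variable x in which {cmd:.a} carries special meaning:

    {cmd:. clear}
    {cmd:. input x}
    {cmd:  5}
    {cmd:  .}
    {cmd:  .a}
    {cmd:  .}
    {cmd:  6}
    {cmd:  .}
    {cmd:  end}
    {cmd:. carryforward x, gen(y) extmiss}
    {cmd:. carryforward x, gen(z)}
    {cmd:. list, noobs sep(0)}

{p 4 4 2}
Going by the description above, y would be expected to hold 5, 5, .a, .a, 6, 6 (the {cmd:.a} is preserved and carried into
the following gap), whereas z, generated without {cmd:extmiss}, would be expected to hold 5, 5, 5, 5, 6, 6, since the
{cmd:.a} is then treated as a gap like any other missing value.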

{title:Remarks}

{p 4 4 2}
The effect of {cmd:carryforward} is sensitive to the prevailing order of the observations.
Thus, you should have the data sorted in an order that is meaningful with
respect to what is being carried forward. This can be done with a preceding
{help sort} operation, or in conjunction with {help bysort}.
There are two purposes of {cmd:by} or {cmd:bysort} in this context: (a) to limit the flow of values to stay within
{cmd:by} groups (consecutive observations with
the same value of the primary sort-variable), that is, to prevent values from spilling over 
into observations where they don't belong;
and (b) to assure that the sequence of observations within {cmd:by} groups has a uniquely determined order and is
appropriate for the carryforward
operation. For a, there should be a primary sort-variable (a {cmd:by} group identifier), representing distinct entities such as persons.
For b, there should be a secondary
"sequencing" variable, typically representing date or time, written in parentheses, to control the order of observations within {cmd:by} groups.
Thus, symbolically, you would write{p_end}
{p 6 8 2}{cmd:. by primary_variable (sequencing_variable): carryforward}...{p_end}

{p 4 4 2}
The sequencing variable should be such that it assures a unique sort within {cmd:by} groups.
That is, the combination of primary and sequencing variables should be sufficient to uniquely sort the data. (In database terminology, they
constitute a key.) The uniqueness of the sorting sequence is important: if we are carrying values from one observation to the
next, it makes sense to require that the "next" observation be uniquely determined. Otherwise, the concept of "next observation" is not
meaningful. Furthermore, this will assure consistent results if the
operation is done multiple times on the same starting dataset. Note that just having that sequencing variable in the command does
not guarantee a unique sort order; it is up to the user to assure that uniqueness.

{p 4 4 2}
(For the present purpose, the salient feature of sorting on the primary sort-variable is that same-valued
observations are located together; the particular order in which they occur is not important. On the other hand,
the order imposed by the sequencing variable is important.)

{p 4 4 2}
If the primary variable is person_id and the sequencing variable is date, then we would write...{p_end}
{p 6 8 2}{cmd:. bysort person_id (date): carryforward}...{p_end}
{p 4 4 2}or equivalently,{p_end}
{p 6 8 2}{cmd:. sort person_id date}{p_end}
{p 6 8 2}{cmd:. by person_id (date): carryforward}...{p_end}

{p 4 4 2}
The author advocates using {help assertky} in place of {cmd:sort} in these situations, to
sort the dataset and assure that the sorting sequence is unique. (See note, below.) Thus, the preferred form is...{p_end}
{p 6 8 2}{cmd:. assertky person_id date}{p_end}
{p 6 8 2}{cmd:. by person_id (date): carryforward}...{p_end}

{p 4 4 2}
Having the sequencing variable in parentheses is important. Do not write...{p_end}
{p 6 8 2}{cmd:. bysort person_id date: carryforward}...{p_end}
{p 4 4 2}as that will limit the flow of data to single observations; nothing will happen.

{p 4 4 2}(The foregoing examples assume that the primary and sequencing variables were each a single variable. Naturally,
either of them could consist of multiple variables. As a separate matter, it may (rarely) be possible that there is no
primary sort-variable; the whole dataset is one contiguous sequence. In that case, you can't use {cmd:by}; just precede the
{cmd:carryforward} with a {cmd:sort} or {cmd:assertky} on the sequencing variable.)

{p 4 4 2}
{cmd:carryforward} will call on {help clonevar} to copy the original variable(s) when using the {cmd:gen()} option.
This will copy the variable {help label} and any existing {help notes}.
Regardless of whether you use {cmd:gen()} or {cmd:replace}, the variable will receive {help notes} indicating that the
variable was subjected to a carryforward operation (unless {cmd:nonotes} is specified). This behavior has changed as of version 4.3.

{p 4 4 2}
When values are carried forward, you will see a message such as
{cmd:(22 real changes made)}, reporting the number of originally missing
values that were replaced, and referring to either {it:varlist} or {it:newvarlist1},
depending on which option ({cmd:gen()} or {cmd:replace}) was used.

{p 4 4 2}
The presence of {cmd:if} {it:exp} or {cmd:in} {it:range} qualifiers
will exclude the non-eligible observations from having values
carried into them, {it:and} will interrupt the carrying
of values past that point. That is, excluded observations are not merely excluded from
getting their missing values replaced; they
affect subsequent observations. Note that, unlike in many commands, such excluded observations
are not totally "out of the picture"; they have a real effect. (Furthermore, under the {cmd:gen()} option, they can
receive non-missing values if the original variables have non-missing values in the
excluded observations. But that is not the typical situation in the use of exclusionary qualifiers.)
An example will illustrate.

{com}. carryforward x if c1, gen(y)
{txt}(4 real changes made)
{com}. carryforward x if c1, gen(z) strict
{txt}(2 real changes made)

{com}. list, noobs sep(0)
{txt}
  {c TLC}{hline 4}{c -}{hline 4}{c -}{hline 4}{c -}{hline 4}{c TRC}
  {c |} {res} x   c1    y    z {txt}{c |}
  {c LT}{hline 4}{c -}{hline 4}{c -}{hline 4}{c -}{hline 4}{c RT}
  {c |} {res}12    1   12   12 {txt}{c |}
  {c |} {res} 4    1    4    4 {txt}{c |}
  {c |} {res} .    1    4    4 {txt}{c |}
  {c |} {res} .    0    .    . {txt}{c |}
  {c |} {res} .    1    .    . {txt}{c |}
  {c |} {res} 3    1    3    3 {txt}{c |}
  {c |} {res} .    1    3    3 {txt}{c |}
  {c |} {res} 7    0    7    7 {txt}{c |}
  {c |} {res} .    1    7    . {txt}{c |}
  {c |} {res} .    1    7    . {txt}{c |}
  {c BLC}{hline 4}{c -}{hline 4}{c -}{hline 4}{c -}{hline 4}{c BRC}

{p 4 4 2}
Notice that the fourth observation did not receive a value in y, since c1=0,
and that the fifth observation also did not receive a value, as the fourth
observation interrupted the flow of values.

{p 4 4 2}
Also notice that the 0 in c1 in observation 8 had no effect on y, since x is
non-missing in that observation. The basic behavior is that excluded
observations are restricted from having their missing values replaced,
but by default, they are not restricted from having their non-missing values carried forward.
With the {cmd:strict} option, they are also excluded from having their non-missing values carried forward,
as illustrated by z in this example.

{p 4 4 2}
Notice that, without {cmd:strict}, excluded observations interrupt the flow of values only if the
original value is missing. With {cmd:strict}, excluded observations always interrupt the flow of values.
If you prefer that certain excluded observations not interrupt the flow of values,
you should arrange sorting variables so as to move these observations out of the way.
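
{p 4 4 2}
As a sketch, the listing in this example can be reproduced with the following commands, assuming the dataset
consists only of x and c1 as shown:

    {cmd:. clear}
    {cmd:. input x c1}
    {cmd:  12 1}
    {cmd:  4  1}
    {cmd:  .  1}
    {cmd:  .  0}
    {cmd:  .  1}
    {cmd:  3  1}
    {cmd:  .  1}
    {cmd:  7  0}
    {cmd:  .  1}
    {cmd:  .  1}
    {cmd:  end}
    {cmd:. carryforward x if c1, gen(y)}
    {cmd:. carryforward x if c1, gen(z) strict}
    {cmd:. list, noobs sep(0)}

{p 4 4 2}
Comparing y and z in observations 9 and 10 isolates the effect of {cmd:strict}: the 7 in the excluded observation 8
is carried into y but not into z.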

{col 12}{hline}
{p 12 12 12}
{hi:Technical note:} It would be possible to devise an option such that
excluded observations would be skipped over, and would not stop the flow of
values. (Thus, observation 5 in the above example would receive 4 in y, and
observations 9 and 10 would receive a 3 {c -} not a 7.)
This is a potential avenue for future development, and the author
welcomes comments on whether this is desirable.
{p_end}
{col 12}{hline}

{p 4 4 2}
When using {cmd:carryalong(}{it:varlist2}{cmd:)} there is nothing to stop you
from including (the single name in) {it:varlist} among {it:varlist2}, but there is no point in doing
so. This is effectively equivalent to specifying {cmd:replace}. (If you
specified {cmd:replace}, then there is no additional effect; if you specified
{cmd:gen(}{it:newvarlist1}{cmd:)}, then {it:newvarlist1} and {it:varlist}
will be equal {c -} as if you had specified both {cmd:gen(}{it:newvarlist1}{cmd:)}
and {cmd:replace}, if that were allowed.)

{title:Examples}

{p 4 8 2}{cmd:. by personid spellno (year): carryforward statefp, replace}

{p 4 8 2}{cmd:. gen int negyear = -year}{p_end}
{p 4 8 2}{cmd:. bysort personid (negyear): carryforward educ2, gen(educ2b) back}
{cmd:cfindic(educ2b_cbi) carryalong(educ2_from_hw educ2_cfi)}{p_end}

{p 4 4 2}
In the latter example, we are going backward; thus, the {cmd:back} option.
Also, educ2_from_hw is an attribute about how educ2 was constructed, so
we want it to be carried along with educ2. Similarly for educ2_cfi, but that
was actually a cfindic variable from an earlier carryforward operation (not
shown). (That earlier operation was in the forward direction; the present
one goes backward. In between, certain observations were dropped; otherwise,
there would be little use in having educ2_cfi in the carryalong variables.)

{p 4 4 2}
Note that in going backward, it is necessary to reverse only the sequencing variable, not the
primary sorting variable.

{title:Additional Remarks}

{p 4 4 2}
{cmd:carryforward} is not intended for imputing missing values; indeed, this operation is considered to be a bad choice for missing-value imputation.
The intent is, rather, to fill in gaps in the sequence of values of designated variables, where it is natural that these gaps ought to be filled
with the values that precede them. 
It is important to understand that only certain variables have this property {c -} that values persist, in the context of an
appropriate sort order, until they explicitly change; let us
call them "prevailing-value" variables.
Examples include a person's address, marital status, educational attainment, and various attributes about his/her employment.

{p 4 4 2}
Typically the observations correspond to dates or times, and the sequence of observations is an important
aspect of interpreting the data. That is, it is important to always have the dataset sorted by the date or time {c -} or
typically, by date or time within groups of some other entity such as persons. Thus, you might{p_end}
{p 4 8 2}{cmd:. sort person_id date}{p_end}
{p 4 4 2}or possibly,{p_end}
{p 4 8 2}{cmd:. assertky person_id date}{p_end}

{p 4 4 2}
The dataset will have observations corresponding to when these attributes change ("change-events").
Due to the way the dataset is constructed, there may be observations for dates/times that are between change-events for a given variable.
That is, there may be more observations than change-events for a given variable. A common situation is that there are multiple prevailing-value
variables, and they may change on different dates/times.
Naturally, a between-change-event observation should retain the value established at the most recent change-event, but the data-construction
process may have left some between-change observations with missing values, as can happen with non-matched observations in a
{help merge} operation.

{p 4 4 2}
(For simplicity, in what follows, we will use date as representative of any sequencing variable. Understand that, generally, it might also
include time, or it may be some other entity that determines a unique sort order that is appropriate for the data.)

{p 4 4 2}
Suppose you start with two or more prevailing-value attributes recorded in multiple datasets. Each of these original datasets should be
uniquely sorted on an identifier and date variable. They have mostly the same identifier values, but possibly
different date values for any given identifier value. Thus, after a {cmd:merge}, there are non-matched observations, resulting in 
missing values {c -} gaps in the sequence. (It is important to {help sort} the data again after the {cmd:merge}.)

{p 4 4 2}
Usually, it is important to distinguish these gaps, which are artifacts of the 
merging process, from "originally" missing values in content variables in the pre-merge datasets.
(This discussion relates to content variables; identifying variables must never be missing.)
The pre-merge datasets typically have observations only for change-events, and don't contain any gaps themselves,
though they might have missing values.
A missing value in a pre-merge dataset is presumably a "genuine" missing value, representing
an unknown value on a change-event; its true value may differ from the prior value, and
you would not want to replace it with a non-missing value, and you would not want to potentially carry such a value
into subsequent observations in the merged dataset.

{p 4 4 2}
Suppose salary.dta contains salary, and marstat.dta contains marit_stat.

{p 4 8 2}{cmd:. use salary}{p_end}
{p 4 8 2}{cmd:. gen byte rec_sal = 1}

{p 4 8 2}{cmd:. merge person_id date using marstat, uniq}{p_end}
{p 4 8 2}{cmd:. gen byte rec_mar = _merge==2 | _merge==3}{p_end}
{p 4 8 2}{cmd:. drop _merge}

{p 4 8 2}{cmd:. recode rec_sal (mis=0)}

{p 4 8 2}{cmd:. assertky person_id date}{p_end}
{p 4 8 2}{cmd:. by person_id (date): carryforward salary if ~rec_sal, replace}{p_end}
{p 4 8 2}{cmd:. by person_id (date): carryforward marit_stat if ~rec_mar, replace}

{p 4 4 2}
{cmd:person_id} and {cmd:date} are the primary and sequencing variables as described above. ({cmd:by person_id} ensures that you limit the 
carrying of values to within person-based groups,
as you don't want to carry a value from one person to another. The presence of {cmd:(date)} assures that the sort 
order is correct within each such group.)

{p 4 4 2}
The {cmd:if ~rec_sal} qualifier is there to prevent carrying an actual value into an originally missing value, as explained above.
Observations in the merged dataset with ~rec_sal (which must have missing values for salary) comprise the gaps.
They correspond to observations in {cmd:marstat} which fall between 
observations that come from {cmd:salary}.
Cases with rec_sal=1, on the other hand, take values from {cmd:salary}; any missing values there should be regarded as "truly missing" and
should not be overwritten. Similarly for rec_mar as it relates to marit_stat.

{p 4 4 2}
{cmd:assertky} is a program that sorts the data and assures that the sort order is unique. See more on this, below.

{p 4 4 2}
(In the code sequence above, it would have been possible to calculate {cmd:rec_sal} as {cmd:_merge==1 | _merge==3} after the
{cmd:merge}, but the scheme shown here generalizes to more than two files.)
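
{p 4 4 2}
The {cmd:merge} command in the sequence above is written in the older syntax. As a hedged sketch, on Stata 11 or later
the same step might be written in the newer syntax as follows, assuming that person_id and date together uniquely
identify observations in both files; the {cmd:_merge} codes (1, 2, 3) have the same meaning, so the remaining commands are unchanged.{p_end}
{p 4 8 2}{cmd:. use salary}{p_end}
{p 4 8 2}{cmd:. gen byte rec_sal = 1}{p_end}
{p 4 8 2}{cmd:. merge 1:1 person_id date using marstat}{p_end}
{p 4 8 2}{cmd:. gen byte rec_mar = _merge==2 | _merge==3}{p_end}
{p 4 8 2}{cmd:. drop _merge}{p_end}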

{p 4 4 2}
One possible situation that arises is where a prevailing-value attribute has gaps, but the data is such that the non-missing value
following the gap is not necessarily on the date of the change-event; the change may have occurred at that point, or it may have
occurred on one of the observations within
a gap {c -} but you don't know which one. Usually, then, it is not appropriate to do
a carryforward, but there is one situation where you can safely do it: (a) the attribute is on an ordered scale and changes monotonically;
and (b) the values on either side of the gap are the same. As an example, think of educational attainment; it never decreases. If
the value is "High School Graduate" at two distinct times, then it should be the same at any time in between. The way to handle this
situation is to fill in the gaps with values from both sides, going forward and backward, as follows:

{p 4 8 2}{cmd:. assertky person_id date}{p_end}
{p 4 8 2}{cmd:. by person_id (date): carryforward educ, gen(educ_fwd)}{p_end}
{p 4 8 2}{cmd:. gen int negdate = -date}{p_end}
{p 4 8 2}{cmd:. assertky person_id negdate}{p_end}
{p 4 8 2}{cmd:. by person_id (negdate): carryforward educ, gen(educ_back) back}{p_end}
{p 4 8 2}{cmd:. assertky person_id date}{p_end}
{p 4 8 2}{cmd:. replace educ = educ_fwd if mi(educ) & educ_fwd == educ_back}{p_end}

{p 4 4 2}
The idea is that you tentatively carry values in from both sides, using {cmd:gen()} so as not to replace values at that point.
If the values agree, then you can fill in the gap.
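
{p 4 4 2}
A small worked sketch may make the two-sided fill concrete. The data are made up for illustration, and {cmd:bysort} and
{cmd:sort} are used here in place of {cmd:assertky} so that no additional program is needed:

    {cmd:. clear}
    {cmd:. input person_id date educ}
    {cmd:  1 1 12}
    {cmd:  1 2 .}
    {cmd:  1 3 .}
    {cmd:  1 4 12}
    {cmd:  1 5 .}
    {cmd:  1 6 16}
    {cmd:  end}
    {cmd:. bysort person_id (date): carryforward educ, gen(educ_fwd)}
    {cmd:. gen int negdate = -date}
    {cmd:. bysort person_id (negdate): carryforward educ, gen(educ_back) back}
    {cmd:. sort person_id date}
    {cmd:. replace educ = educ_fwd if mi(educ) & educ_fwd == educ_back}
    {cmd:. drop educ_fwd educ_back negdate}

{p 4 4 2}
Here the gap at dates 2 and 3 lies between two values of 12, so both directions agree and the gap is filled with 12.
The gap at date 5 lies between 12 and 16; the forward and backward values disagree, so educ is left missing there.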

{p 4 4 2}
One final note: Be aware of a phenomenon that can occur in numeric variables that have been converted to string, such as with {help tostring}.
In this situation, missing numeric values are rendered as "." (or ".a", ".b", etc.). While these look like missing values, they are not,
and {cmd:carryforward} will not affect them. If you want {cmd:carryforward} to replace them, then either replace them with null
strings ("") prior to applying {cmd:carryforward}, or apply {cmd:carryforward} to the numeric variable prior to converting it to string.
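
{p 4 4 2}
As a sketch of the first approach, suppose {cmd:s} is a hypothetical string variable produced by {cmd:tostring}, in which
only the codes ".", ".a", and ".b" occur as rendered missing values (extend the list as needed, and check that none of
these strings is a legitimate value in your data):{p_end}
{p 4 8 2}{cmd:. replace s = "" if inlist(s, ".", ".a", ".b")}{p_end}
{p 4 8 2}{cmd:. carryforward s, replace}{p_end}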

{title:About assertky}

{p 4 4 2}
As mentioned above, assertky is a program that sorts the dataset {it:and} assures that the sorting sequence is unique, which is
useful in preparation for {cmd:carryforward} (as well as {cmd:merge}). It is by this same author, and available from SSC.
If you prefer to use a standard Stata
command, the same results can be obtained by using {help isid} with the {cmd:sort} and {cmd:missok} options.
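
{p 4 4 2}
A minimal sketch of that alternative, using the person_id, date, and salary names from the earlier example:{p_end}
{p 4 8 2}{cmd:. isid person_id date, sort missok}{p_end}
{p 4 8 2}{cmd:. by person_id (date): carryforward salary, replace}{p_end}

{p 4 4 2}
{cmd:isid} exits with an error if person_id and date do not uniquely identify the observations, so the
{cmd:carryforward} step is reached only when the sort order is unique.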

{title:Author}

{p 4 4 2}
David Kantor. Initial work was done at The Institute for Policy Studies, Johns Hopkins University.
Email {browse "mailto:kantor.d@att.net":kantor.d@att.net} if you observe any
problems. The author thanks several users who have requested the enhancement to allow multiple variables.

{title:Also See}

{p 4 4 2}
{help replace}; {help gen_tail}, a related program by the same author.