------------------------------------------------------------------------------- help for mark_changes -------------------------------------------------------------------------------

Generate a variable indicating where one or more variables change value.

mark_changes varlist, gen(newvar)


mark_changes generates a variable that indicates where a change in value occurs in any of the variables in varlist. A change is deemed to occur in an observation when, for at least one variable in varlist, the value differs from that variable's value in the prior observation. newvar will be 1 where a change occurs and 0 otherwise. Additionally, newvar will be 1 for the first observation.

mark_changes can be combined with by; in that case, newvar will be 1 for the first observation in each by-group. See by.
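The marking rule can be reproduced with plain Stata commands, which may help in understanding what newvar contains. This is a sketch only and not necessarily how mark_changes itself is implemented. The dataset and the variable names (personid, date, weight, wch_all, wch) are hypothetical.

Each observation is compared with the prior one through the subscript weight[_n-1]. Under by, _n restarts at 1 in each group, and weight[_n-1] is missing for a group's first observation. As a result, the comparison never reaches across a group boundary.

. clear
. set obs 5
. * hypothetical data: person 1 has dates 1-3, person 2 has dates 1-2
. gen personid = cond(_n <= 3, 1, 2)
. gen date = cond(_n <= 3, _n, _n - 3)
. gen weight = cond(_n <= 2, 70, 72)
. * without by: 1 for the first observation or wherever weight differs from the prior observation
. gen byte wch_all = _n == 1 | weight != weight[_n-1]
. * with by: the same rule, applied separately within each person
. sort personid date
. by personid (date): gen byte wch = _n == 1 | weight != weight[_n-1]

Here wch_all is 1 in observations 1 and 3. wch is additionally 1 in observation 4, the first observation for person 2, even though weight did not change there. For several variables, join the comparisons with |, as in x != x[_n-1] | y != y[_n-1].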


mark_changes is useful for detecting changes in the values of varlist, which is equivalent to detecting the starts of spells of constant values of varlist. Typically, one might subsequently keep only these starting observations, as the others (the non-changing ones) may be deemed irrelevant. Alternatively, one might take a running sum of newvar, thus generating a spell identifier (though spell by Nicholas Cox and Richard Goldstein, available from ssc, is another way to achieve that result).
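A running sum of the change indicator turns it into a spell identifier. In the sketch below, the hypothetical variable ch stands in for the output of mark_changes ... , gen(ch) run under by personid (date). The values of ch are set by hand only so that the example is self-contained.

Computed without by, sum(ch) numbers spells consecutively across the whole dataset. This works because each person's first observation is marked. Computed with by personid, the count restarts for each person. The names spell and pspell are illustrative.

. clear
. set obs 5
. gen personid = cond(_n <= 3, 1, 2)
. gen date = cond(_n <= 3, _n, _n - 3)
. * stand-in for mark_changes output: spells start in observations 1, 3, and 4
. gen byte ch = inlist(_n, 1, 3, 4)
. sort personid date
. * dataset-wide spell number: 1 1 2 3 3
. gen long spell = sum(ch)
. * spell number within each person: 1 1 2 1 1
. by personid: gen long pspell = sum(ch)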

The order of observations is critical. You should be sure that the observations are in an order that makes sense with respect to the changes you want to detect; typically, there is a time-based variable involved. See sort.

When using mark_changes with by, you would typically use the form of by that includes a second, parenthesized varlist, as in

. by personid (date): mark_changes ...

The reason is that you want newvar to respect the primary grouping, marking the first observation in each group (the first observation for each person, in this example), while also imposing a specific order within each group (sorting by date, in this example).

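Furthermore, in such a situation, it is important that the by variables (both varlists taken together) uniquely identify the observations. That is, when you sort on these variables, the resulting order of observations should be unique. Otherwise, observations with tied values have no defined order among themselves, and the changes detected among them may differ from one run to the next. assertky from ssc can be helpful in ensuring this condition. Thus the prior example would be preceded by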

. assertky personid date
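If assertky is not installed, Stata's built-in isid command performs a comparable check. It exits with an error unless the listed variables uniquely identify the observations, and by default it also rejects missing values in them. With the sort option, it additionally leaves the data sorted by those variables, ready for by. The small dataset below is hypothetical.

. clear
. set obs 4
. * hypothetical data: two persons, dates 1 and 2 each
. gen personid = cond(_n <= 2, 1, 2)
. gen date = mod(_n - 1, 2) + 1
. * exits with an error if personid and date do not uniquely identify the observations
. isid personid date, sort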


. mark_changes weight, gen(weight_change)

. by personid (date): mark_changes locationcode jobcode salary, gen(ch)

The previous example might be followed by

. keep if ch

. keep personid date locationcode jobcode salary

the idea being that the data may have many more variables, with many observations reporting changes in those variables, but you want to notice changes in only a few selected variables. After keep if ch, it is usually appropriate to keep only the variables mentioned in the by and mark_changes specifications, as other variables may have values that are unrelated to the change. Thus, you are reducing the number of variables and potentially reducing the number of observations.