Restricted cubic spline smoothing of proportions
proprcspline yvar xvar [cvars] [if] [in] [weight] [ , at(cvar1 # [cvar2 # [...]]) by(by_option) showknots cataxis catlegend rareaopt#(rarea_options) addplot(plot) lablength("all" | #) stub(stub) generate(stub) mkspline_options mlogitopts(mlogit_options) ]
fweights and pweights are allowed; see weight.
Description
proprcspline computes a restricted cubic spline smooth of proportions of observations in each category of yvar given xvar, and graphs them as a stacked area plot. Optionally, these smoothed proportions can be adjusted for a set of control variables (cvars).
Remarks
proprcspline calls mkspline, cubic to create variables containing a restricted cubic spline of xvar. It then calls mlogit to regress yvar against those new variables, and thus obtains predicted (smoothed) values of proportions in each category of yvar given xvar. Finally, it calls graph to plot the smooth.
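The steps described above can be roughly reproduced by hand. The following is only a sketch using official Stata commands, not the code proprcspline actually runs; the variable names (marst, rcs*, p1-p3, cum1, cum2, zero, one) are illustrative assumptions, and the graph is a bare version of the stacked area plot.

```stata
* Sketch only: mimics the documented steps, not proprcspline's own code.
sysuse nlsw88, clear
gen marst = cond(never_married, 1, cond(married, 2, 3)) ///
    if !missing(married, never_married)

* 1. restricted cubic spline basis of xvar (mkspline's default knots)
mkspline rcs = grade, cubic

* 2. multinomial logit of yvar on the spline variables
mlogit marst rcs*

* 3. predicted (smoothed) proportion in each category of yvar
predict p1 p2 p3, pr

* 4. cumulate the proportions and draw them as stacked areas
gen cum1 = p1
gen cum2 = p1 + p2
gen zero = 0
gen one  = 1
sort grade
twoway rarea zero cum1 grade || ///
       rarea cum1 cum2 grade || ///
       rarea cum2 one  grade
```

Because the predicted probabilities sum to one at every value of xvar, the top area always ends at 1.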
When control variables are specified, they are also added to the mlogit model. When predicting the smoothed proportions, each control variable is held at the value specified in the at() option, or at its mean if it is not mentioned in the at() option.
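As a rough illustration of what holding a control variable constant means, the sketch below fits the model by hand and predicts with the control fixed first at a chosen value and then at its mean. It is not proprcspline's own code; the variable names (black, black_orig, p0_*, pm_*) are illustrative, and computing the mean over the estimation sample is an assumption made for this sketch.

```stata
* Sketch only: adjusted predictions done by hand.
sysuse nlsw88, clear
gen marst = cond(never_married, 1, cond(married, 2, 3)) ///
    if !missing(married, never_married)
gen black = race == 2 if race < .

mkspline rcs = grade, cubic
mlogit marst rcs* black

* keep a copy of the control variable so it can be put back afterwards
gen black_orig = black

* control held at a chosen value, in the spirit of at(black 0)
replace black = 0
predict p0_1 p0_2 p0_3, pr

* control held at its mean, the fallback when at() does not mention it
* (mean taken over the estimation sample: an assumption of this sketch)
summarize black_orig if e(sample), meanonly
replace black = r(mean)
predict pm_1 pm_2 pm_3, pr

* restore the original values
replace black = black_orig
drop black_orig
```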
More generally, the main intended usage of proprcspline is for descriptive analysis and informal exploratory analysis. More formal uses would require specification of the stub() option to save the variables created. Some consideration might need to be given to the implications of any data snooping.
Options
at(cvar1 # [cvar2 # [...]]) specifies the values at which the control variables (cvars) are held constant when predicting the smoothed proportions. All control variables that are not mentioned in the at() option will be fixed at their overall mean. When the by() option is specified, the mean will be computed once for the entire sample (allowing for if and in selection criteria), not separately for each group specified in the by() option.
by(by_option) allows comparing the smoothed proportions across groups. See help on by_option.
showknots specifies that the positions of the knots be shown on the graph by vertical lines.
cataxis specifies that the categories of yvar are labeled on the right y-axis. This is the default when the by() option is not specified or when the by() option implies the comparison of only two groups.
catlegend specifies that the categories of yvar are labeled in a legend. This is the default when the by() option is specified such that more than 2 groups are compared.
rareaopt#(rarea_options) specifies options to be applied to the area representing category number # of yvar. These options are listed in twoway rarea.
addplot(plot) provides a way to add other plots to the generated graph. See help on addplot_option.
lablength("all" | #) specifies the maximum number of characters used from each value label when the categories are labeled on the second y-axis or in the legend. The default is 20. One can either specify "all" to make proprcspline use all characters of the value labels, or a positive integer indicating the maximum number of characters used.
stub(stub) specifies that the variables containing the spline be saved in variables with prefix stub.
generate(stub) specifies that the smoothed proportions be saved in variables with prefix stub.
mkspline_options are options of mkspline, cubic.
nknots() specifies the number of knots that are to be used for a restricted cubic spline. This number must be between 3 and 7 unless the knot locations are specified using knots(). The default number of knots is 5.
knots() specifies the exact location of the knots to be used for a restricted cubic spline. The values of these knots must be given in increasing order. When this option is omitted, the default knot values are based on Harrell's recommended percentiles with the additional restriction that the smallest knot may not be less than the fifth-smallest value of xvar and the largest knot may not be greater than the fifth-largest value of xvar. If both nknots() and knots() are given, they must specify the same number of knots.
mlogitopts() contains options of mlogit. It is difficult to know why you would want to specify any.
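A few of the options above in use. This is a hedged sketch that assumes proprcspline is installed; the knot positions and the prefixes rcs and phat are illustrative choices, and the prefixes must not clash with existing variable names.

```stata
* Assumes proprcspline is installed; values and prefixes are illustrative.
sysuse nlsw88, clear
gen marst = cond(never_married, 1, cond(married, 2, 3)) ///
    if !missing(married, never_married)

* three knots at the default positions, marked by vertical lines
proprcspline marst grade, nknots(3) showknots

* knots at user-chosen positions, given in increasing order
proprcspline marst grade, knots(8 12 14 17) showknots

* keep the spline variables and the smoothed proportions
proprcspline marst grade, nknots(3) stub(rcs) generate(phat)
describe rcs* phat*
```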
Examples
    * Basic use
    sysuse nlsw88, clear
    gen marst = cond(never_married, 1,     ///
                cond(married, 2, 3))       ///
                if !missing(married, never_married)
    label define marst 1 "never married"   ///
                       2 "married"         ///
                       3 "divorced/widowed"
    label value marst marst
    proprcspline marst grade, xlab(0(5)15)
    * Set the color of each area with rareaopt#()
    sysuse nlsw88, clear
    gen marst = cond(never_married, 1,     ///
                cond(married, 2, 3))       ///
                if !missing(married, never_married)
    label define marst 1 "never married"   ///
                       2 "married"         ///
                       3 "divorced/widowed"
    label value marst marst
    proprcspline marst grade, xlab(0(5)15) ///
        rareaopt1(color(red))              ///
        rareaopt2(color(blue))             ///
        rareaopt3(color(gs10))
    * Compare groups with by()
    sysuse nlsw88, clear
    gen marst = cond(never_married, 1,     ///
                cond(married, 2, 3))       ///
                if !missing(married, never_married)
    label define marst 1 "never married"   ///
                       2 "married"         ///
                       3 "divorced/widowed"
    label value marst marst

    label define c_city 1 "in central city" ///
                        0 "outside central city"
    label value c_city c_city

    proprcspline marst grade, xlab(0(5)15) ///
        by(c_city, note(""))
    * Adjust for a control variable held constant with at()
    sysuse nlsw88, clear
    gen marst = cond(never_married, 1,     ///
                cond(married, 2, 3))       ///
                if !missing(married, never_married)
    label define marst 1 "never married"   ///
                       2 "married"         ///
                       3 "divorced/widowed"
    label value marst marst

    label define c_city 1 "in central city" ///
                        0 "outside central city"
    label value c_city c_city
    gen black = race == 2 if race < .
    label define black 1 "black"           ///
                       0 "non-black"
    label value black black

    proprcspline marst grade black, xlab(0(5)15) ///
        by(c_city, note("")) at(black 0)
Saved results
    r(N_knots)   number of knots (scalar)
    r(knots)     knot positions (matrix)
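The saved results can be inspected or copied right after the command has run. A minimal sketch, assuming proprcspline is installed; the names nk and K are illustrative.

```stata
* Assumes proprcspline is installed; nk and K are illustrative names.
sysuse nlsw88, clear
gen marst = cond(never_married, 1, cond(married, 2, 3)) ///
    if !missing(married, never_married)

proprcspline marst grade, nknots(3)

* copy the results before another command overwrites r()
scalar nk = r(N_knots)
matrix K  = r(knots)

display "number of knots: " nk
matrix list K
```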
Author
    Maarten L. Buis
    Universitaet Tuebingen
    Institut fuer Soziologie
    maarten.buis@uni-tuebingen.de
Acknowledgments
Large portions of the code are based upon rcspline by Nicholas J. Cox.
Also see
    Online: lowess, lpoly
    If installed: rcspline, mvrs