Title
sqindexplot -- Sequence index plots
Syntax
sqindexplot [if] [in] [, options]
options                 Description
-------------------------------------------------------------------------
ranks(numlist)          restrict tabulation to the most frequent numlist
se                      apply same-elements similarity
so                      apply same-order similarity
order(varlist)          specify order of vertical axis
by(varname)             plot groups of sequences based on varname
color(colorstyle)       apply colors to the elements
overplot(#)             fine-tune bar width
rbar                    bars instead of spikes
gapinclude              include sequences with gaps in the tabulation
subsequence(a,b)        use only subsequence between positions a and b
twoway_options          options allowed with graph twoway
-------------------------------------------------------------------------
Description
sqindexplot draws sequence index plots. These plots draw a horizontal line for each sequence, which changes its colors according to the elements.
Out of the box, sequence index plots have several shortcomings, which should be dealt with when fine-tuning the graph:
o In general, colored versions of sequence index plots are more sensible than black-and-white versions. The color() option allows fine-tuning of the colors used for the elements.
o Depending on variables such as the resolution of the screen, the viewer used to show the figure, the resolution of the printer, or the graph size, there can be a tendency either to overplot the lines, which overrepresents elements with higher category values (levels), or to have white stripes between the lines. The effect can be moderated by tuning the option overplot() and/or the aspect ratio. It might also be sensible to restrict the graph to the most frequent sequences by using the ranks() option.
o Sequence index plots depend heavily on the order of the sequences along the vertical axis. Without further options, a naive algorithm is used to order the sequences; however, the order() option sorts the sequences according to a user-defined variable list. It is sensible to use the results of sqom to order the sequences in a sequence index plot.
Options
ranks(numlist) is used to restrict the output to the most frequent sequences. numlist refers to the position of the sequences in the sorted frequency table. Hence, ranks(1) refers to the most frequent sequence, whereas ranks(1/10) refers to the 10 most frequent sequences. You can also specify ranks(2(2)20).
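As a brief illustration of the numlist forms described above, the following sketch assumes that the data have already been declared as sequence data with sqset; only the numlists named in the text are used.

```stata
* Assumes the data are already sqset
* Plot only the most frequent sequence
sqindexplot, ranks(1)

* Plot the 10 most frequent sequences
sqindexplot, ranks(1/10)

* Plot every second rank from 2 to 20
sqindexplot, ranks(2(2)20)
```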
se requests a plot that shows only the elements of the sequences (same-elements similarity). Hence, with this option sequences like A-B-A-B, B-A-A-B, and A-B-B-A would all be drawn as A-B.
so is used to request a plot where only the order of elements is shown (same-order similarity). With this option the sequences A-B-B-A and A-B-A-A would both be drawn as if they were A-B-A.
order(varlist) is used to control the order of the sequences along the vertical axis. Without this option, a simple algorithm for the order is used. However, an order derived from an application of sqom is preferable. Note that the default algorithm is still applied among sequences that share the same values on the order variables.
by(varname) plots groups of sequences separately, based on varname.
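One way to derive an order from sqom might look like the sketch below. It assumes that the data are already sqset and that sqom can store its distances in a variable named through a name() option; the variable name omdist is hypothetical. Check the sqom help file for the options actually available before relying on this.

```stata
* Assumes the data are already sqset
* Assumption: sqom accepts name() to store the distances in a variable;
* omdist is a hypothetical variable name chosen for this illustration
sqom, name(omdist)

* Order the sequences along the vertical axis by the stored distances
sqindexplot, order(omdist)
```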
color(colorstyle) specifies the colors for the elements. You can specify one color for each element, whereby the first color refers to the element with the lowest level. See colorstyle for a list of color choices.
overplot(#) lets you fine-tune the amount of overplotting. The command tries to be smart about this setting, but the solutions are not always satisfying, especially with either rather small or rather large numbers of sequences. The default setting is overplot(60). Choose a smaller number if the lines for higher levels appear thicker than the lines for lower levels. Choose a larger number if there are white stripes between the lines. If you have only a few observations and/or draw sequence index plots with option by(), you may want to use option rbar.
rbar uses bars instead of spikes to draw the sequences. This option leads to serious overplotting even for a moderate number of observations but can be advantageous for small sample sizes or when plots are drawn with option by().
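The adjustments described for overplot() and rbar could be tried along the following lines. This is only a sketch: the values 40 and 80 are arbitrary starting points on either side of the default of 60, and the grouping variable sex is hypothetical. The data are assumed to be sqset.

```stata
* Assumes the data are already sqset
* Lines for higher levels look too thick: try a value below the default of 60
sqindexplot, overplot(40)

* White stripes between the lines: try a value above the default
sqindexplot, overplot(80)

* Few sequences per group: bars instead of spikes, one panel per group
* (sex is a hypothetical grouping variable)
sqindexplot, by(sex) rbar
```

Because the right value depends on screen, viewer, and graph size, it is usually worth redrawing the plot a few times with different settings.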
gapinclude is used to include sequences with gaps. The default behavior is to drop sequences with gaps from the graph. The term gap refers only to missing values on the element variable within a sequence. Sequences with missing values at the beginning and at the end of a sequence are included in any case. You might consider using sqset with option trim to get rid of superfluous missings (see sq for details).
subsequence(a,b) is used to include only the part of the sequence that lies between positions a and b, whereby a and b refer to the positions defined by the order variable.
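For illustration, gapinclude and subsequence() might be used as in the sketch below. The positions 1 and 12 are arbitrary and assume that the order variable takes at least these values; the data are assumed to be sqset.

```stata
* Assumes the data are already sqset
* Keep sequences that have missing values inside the sequence
sqindexplot, gapinclude

* Plot only the part of each sequence between positions 1 and 12
* (1 and 12 are arbitrary values of the order variable)
sqindexplot, subsequence(1,12)
```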
twoway_options are a set of common options supported by all twoway commands; see twoway_options.
Examples
. sqindexplot
. sqindexplot, color(black red yellow)
. sqindexplot, so
. sqindexplot, se
Author
Ulrich Kohler, WZB, kohler@wzb.eu
Also see
Manual: [G] graph, [G] graph twoway rbar, [G] barlook options
Online: sq, sqdemo, sqset, sqdes, sqegen, sqstat, sqindexplot, sqmodalplot, sqparcoord, sqom, sqclusterdat, sqclustermat