Recoding variables using grouped values
---------------------------------------

^egen^ newvar = ^cut(^varname^)^, ^at(^x1,x2,...,xk^)^ [^ic^odes ^lab^el ^g^roup(k)]

Description
-----------

The ^cut^ function of ^egen^ creates a new categorical variable coded with the left-hand ends of the grouping intervals specified in ^at()^. The breaks may be given as a Stata numlist, (labelled) integer codes may be used in place of the left-hand ends of the intervals, and the function can also produce groups of approximately equal frequency.

Options
-------

^at( )^ supplies the breaks for the groups, as an ascending Stata numlist. If no breaks are specified, the command expects the option ^group^.

^icodes^ requests that the codes 0, 1, 2, etc. be used in place of the left-hand ends of the intervals.

^label^ requests that the integer coded values of the grouped variable be labelled with the left-hand ends of the grouping intervals. Specifying this option automatically invokes ^icodes^.

^group( )^ specifies the number of equal-frequency grouping intervals to be used when no breaks are supplied in ^at( )^. Specifying this option automatically invokes ^icodes^. The command works by first calculating the appropriate percentiles using the command ^pctile^ and then using the percentiles as break points.
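The percentile-then-cut mechanism described for ^group( )^ can be imitated by hand. The following is only a rough sketch, not the command's actual code: it assumes a recent Stata in which ^sysuse auto^ and ^_pctile^ are available, the variable name lgrp2 and the local macro names are made up for illustration, and the choice of the minimum and the maximum plus one as outer breaks is an assumption.

```stata
* Sketch: approximate group(4) by hand (assumptions noted in the text above)
sysuse auto, clear

* quartile cut points of length, returned in r(r1), r(r2), r(r3)
_pctile length, nquantiles(4)
local q1 = r(r1)
local q2 = r(r2)
local q3 = r(r3)

* outer breaks: assumed here to be the minimum and the maximum plus one
summarize length, meanonly
local lo = r(min)
local hi = r(max) + 1

* use the percentiles as break points; icodes mirrors what group() invokes
egen lgrp2 = cut(length), at(`lo' `q1' `q2' `q3' `hi') icodes
tabulate lgrp2
```

Comparing ^tab lgrp2^ with the result of ^egen^ ... ^group(4)^ on the same data is a simple way to see whether the sketch matches the built-in behaviour.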

Example
-------

Using the variable ^length^ from the ^auto^ data, the commands

^egen lgrp=cut( length), at(140,180,200,220,240)^
^tab lgrp^

produce the output

       lgrp |      Freq.     Percent        Cum.
------------+-----------------------------------
        140 |         31       41.89       41.89
        180 |         16       21.62       63.51
        200 |         20       27.03       90.54
        220 |          7        9.46      100.00
------------+-----------------------------------
      Total |         74      100.00

So will the command

^egen lgrp=cut( length), at(140,180(20)240)^

Values outside the range 140-240 are coded as missing.

The command

^egen lgrp = cut(length), at(140,180(20)240) label^

will produce a variable coded 0, 1, 2, 3 but labelled 140-, 180-, 200-, 220-.

Thus ^tab lgrp^ produces

       lgrp |      Freq.     Percent        Cum.
------------+-----------------------------------
       140- |         31       41.89       41.89
       180- |         16       21.62       63.51
       200- |         20       27.03       90.54
       220- |          7        9.46      100.00
------------+-----------------------------------
      Total |         74      100.00

and ^tab lgrp, nolab^ produces

       lgrp |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         31       41.89       41.89
          1 |         16       21.62       63.51
          2 |         20       27.03       90.54
          3 |          7        9.46      100.00
------------+-----------------------------------
      Total |         74      100.00

The commands

^egen lgrp = cut(length), group(4) label^
^tab lgrp^

will produce

       lgrp |      Freq.     Percent        Cum.
------------+-----------------------------------
       142- |         17       22.97       22.97
       170- |         20       27.03       50.00
     192.5- |         18       24.32       74.32
       204- |         19       25.68      100.00
------------+-----------------------------------
      Total |         74      100.00
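The treatment of values outside the breaks, noted earlier in this example, can be checked by deliberately choosing breaks that do not cover the whole range of ^length^. This is a sketch only: it assumes a recent Stata with ^sysuse auto^, the variable name lgrp3 is made up for illustration, and the comment about the last break rests on the assumption that the grouping intervals are closed on the left and open on the right.

```stata
* Sketch: breaks that leave some cars outside every interval
sysuse auto, clear
egen lgrp3 = cut(length), at(160,180,200,220)

* cars shorter than 160 should fall outside the breaks and be missing;
* assuming left-closed, right-open intervals, so should cars of 220 or more
count if missing(lgrp3)

* show the missing category alongside the coded groups
tabulate lgrp3, missing
```

If missing values are not wanted, the first and last breaks in ^at( )^ can be widened so that every observed value falls inside some interval.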