Recoding variables using grouped values
---------------------------------------
^egen^ newvar = ^cut(^varname^)^, { ^at(^x1,x2,...,xk^)^ | ^g^roup^(^k^)^ } [^ic^odes ^lab^el]
Description
-----------
The ^cut()^ function of ^egen^ creates a new categorical variable coded with the left-hand ends of the grouping intervals specified in ^at()^. The breaks may be given as a Stata number list, integer codes (optionally labelled) may be used in place of the left-hand ends of the intervals, and groups of approximately equal frequency can be produced.
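For illustration only, the coding rule can be sketched in Python; this is not Stata's implementation. The sketch assumes half-open intervals, so with breaks 140,180,... the first interval covers 140 up to but not including 180, and a value equal to the last break falls outside every interval. None stands in for Stata's missing value, and the function name `cut_codes` is hypothetical.

```python
from bisect import bisect_right


def cut_codes(values, breaks, icodes=False):
    """Code each value by the interval [breaks[i], breaks[i+1]) holding it.

    Returns the left-hand end breaks[i], or the index i when icodes=True.
    Values outside [breaks[0], breaks[-1]) become None (Stata: missing).
    Half-open intervals are an assumption of this sketch.
    """
    out = []
    for x in values:
        # bisect_right counts how many breaks are <= x
        i = bisect_right(breaks, x) - 1
        if 0 <= i < len(breaks) - 1:
            out.append(i if icodes else breaks[i])
        else:
            out.append(None)
    return out


breaks = [140, 180, 200, 220, 240]
sample = [139, 140, 179, 186, 233, 240]
print(cut_codes(sample, breaks))               # [None, 140, 140, 180, 220, None]
print(cut_codes(sample, breaks, icodes=True))  # [None, 0, 0, 1, 3, None]
```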
Options
-------
^at( )^ supplies the breaks for the groups, in ascending order, as a Stata number list. The list may include the form a(d)b, meaning from a to b in steps of d. If no breaks are specified, the command expects the ^group( )^ option.
^icodes^ requests that the codes 0, 1, 2, etc. be used in place of the left-hand ends of the intervals.
^label^ requests that the integer-coded values of the grouped variable be labelled with the left-hand ends of the grouping intervals. Specifying this option automatically invokes ^icodes^.
^group( )^ specifies the number of equal-frequency grouping intervals to be used in the absence of ^at( )^. Specifying this option automatically invokes ^icodes^. The command first calculates the appropriate percentiles using the command ^pctile^ and then uses those percentiles as breakpoints.
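The percentile-then-cut procedure can be sketched in Python under stated assumptions. The percentile rule is modelled on the default definition described for Stata's pctile: with n sorted values x(1) <= ... <= x(n) and P = n*p/100, the p-th percentile is (x(P) + x(P+1))/2 when P is an integer and x(floor(P)+1) otherwise. Treat that as an assumption to check against the pctile documentation. Using the minimum as the lowest break and a value above the maximum as the highest break is also an assumption, as are the names `equal_freq_breaks` and `group_codes`.

```python
from bisect import bisect_right


def equal_freq_breaks(values, k):
    """Breaks for k groups of roughly equal frequency (illustrative sketch)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0 or k < 1:
        raise ValueError("need at least one value and k >= 1")
    cuts = []
    for j in range(1, k):
        # Percentile p = 100*j/k, so P = n*p/100 = n*j/k.
        if (n * j) % k == 0:
            pos = n * j // k  # P is an integer: average x(P) and x(P+1)
            cuts.append((xs[pos - 1] + xs[pos]) / 2)
        else:
            cuts.append(xs[n * j // k])  # otherwise take x(floor(P) + 1)
    # Assumed outer breaks: the minimum, and a value above the maximum so
    # the largest observation still falls inside the last half-open interval.
    # Ties in the data can repeat a break; this sketch does not handle that.
    return [xs[0]] + cuts + [xs[-1] + 1]


def group_codes(values, breaks):
    """Integer code i for each value lying in [breaks[i], breaks[i+1])."""
    return [bisect_right(breaks, x) - 1 for x in values]


data = list(range(1, 9))
b = equal_freq_breaks(data, 4)
print(b)                     # [1, 2.5, 4.5, 6.5, 9]
print(group_codes(data, b))  # [0, 0, 1, 1, 2, 2, 3, 3]
```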
Example
-------
Using the variable ^length^ from the ^auto^ data, the commands
^egen lgrp = cut(length), at(140,180,200,220,240)^
^tab lgrp^
produce the output
       lgrp |      Freq.     Percent        Cum.
------------+-----------------------------------
        140 |         31       41.89       41.89
        180 |         16       21.62       63.51
        200 |         20       27.03       90.54
        220 |          7        9.46      100.00
------------+-----------------------------------
      Total |         74      100.00
The command
^egen lgrp = cut(length), at(140,180(20)240)^
produces the same variable, because 180(20)240 expands to 180,200,220,240.
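A simplified Python sketch of how a break list such as 140,180(20)240 expands. The function `expand_numlist` is hypothetical; it accepts only plain numbers and the a(d)b form with a positive step, separated by commas or spaces, whereas Stata's own number-list syntax supports more forms.

```python
import re

# a(d)b: from a to b in steps of d (plain decimal numbers only)
STEP = re.compile(r"^(-?\d+(?:\.\d+)?)\((-?\d+(?:\.\d+)?)\)(-?\d+(?:\.\d+)?)$")


def expand_numlist(text):
    """Expand a simplified number list such as '140,180(20)240'."""
    result = []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        m = STEP.match(token)
        if m:
            a, d, b = (float(g) for g in m.groups())
            if d <= 0 or b < a:
                raise ValueError(f"this sketch needs d > 0 and a <= b in {token!r}")
            # small tolerance guards against floating-point error in (b - a) / d
            n = int((b - a) / d + 1e-9) + 1
            result.extend(a + i * d for i in range(n))
        else:
            result.append(float(token))
    # at() expects ascending breaks; this sketch insists on strictly ascending
    if any(x >= y for x, y in zip(result, result[1:])):
        raise ValueError("breaks must be strictly ascending")
    return result


print(expand_numlist("140,180(20)240"))  # [140.0, 180.0, 200.0, 220.0, 240.0]
```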
Values outside the range 140 to 240 are coded as missing. The command
^egen lgrp = cut(length), at(140,180(20)240) label^
will produce a variable coded 0, 1, 2, 3 but labelled 140-, 180-, 200-, 220-.
Thus ^tab lgrp^ produces
       lgrp |      Freq.     Percent        Cum.
------------+-----------------------------------
       140- |         31       41.89       41.89
       180- |         16       21.62       63.51
       200- |         20       27.03       90.54
       220- |          7        9.46      100.00
------------+-----------------------------------
      Total |         74      100.00
and ^tab lgrp, nolab^ produces
       lgrp |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         31       41.89       41.89
          1 |         16       21.62       63.51
          2 |         20       27.03       90.54
          3 |          7        9.46      100.00
------------+-----------------------------------
      Total |         74      100.00
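The labelling in these two listings can be sketched in Python: code i is displayed as the left-hand end of interval i followed by a dash. The function name `interval_labels` is hypothetical, and formatting the numbers with Python's `g` format is an assumption chosen so that 140 prints as 140 and 192.5 as 192.5; it is not necessarily how Stata builds its value labels.

```python
def interval_labels(breaks):
    """Map integer codes 0, 1, ... to labels built from left-hand ends."""
    # The last break only closes the final interval, so it gets no label.
    # Note: the 'g' format keeps 6 significant digits, so very large or very
    # precise breaks would be abbreviated in this sketch.
    return {i: f"{left:g}-" for i, left in enumerate(breaks[:-1])}


print(interval_labels([140, 180, 200, 220, 240]))
# {0: '140-', 1: '180-', 2: '200-', 3: '220-'}
```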
The commands
^egen lgrp = cut(length), group(4) label^
^tab lgrp^
will produce
       lgrp |      Freq.     Percent        Cum.
------------+-----------------------------------
       142- |         17       22.97       22.97
       170- |         20       27.03       50.00
     192.5- |         18       24.32       74.32
       204- |         19       25.68      100.00
------------+-----------------------------------
      Total |         74      100.00
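The Percent and Cum. columns follow from the frequencies alone: Percent is 100*Freq./Total, and Cum. applies the same ratio to the running total of frequencies. A short Python check against the group(4) table above; `tab_percents` is a hypothetical name.

```python
def tab_percents(freqs):
    """Percent and cumulative percent columns of a one-way frequency table."""
    total = sum(freqs)
    rows, running = [], 0
    for f in freqs:
        running += f
        # Round the exact ratios; summing already-rounded percents can drift.
        rows.append((f, round(100 * f / total, 2), round(100 * running / total, 2)))
    return rows


for row in tab_percents([17, 20, 18, 19]):
    print(row)
# (17, 22.97, 22.97)
# (20, 27.03, 50.0)
# (18, 24.32, 74.32)
# (19, 25.68, 100.0)
```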