Recoding variables using grouped values
---------------------------------------

    ^egen^ newvar = ^cut(^varname^)^, ^at(^x1,x2,...,xk^)^
                        [^ic^odes ^lab^el ^g^roup(k)]


Description
-----------

The ^cut^ function of ^egen^ creates a new categorical variable coded with
the left-hand ends of the grouping intervals specified in ^at( )^. The breaks
may be given as a Stata number list, the new variable may carry (labelled)
integer codes in place of the left-hand ends of the intervals, and ^cut^ can
also form groups of approximately equal frequency.


Options
-------

^at( )^ supplies the breaks for the groups, as an ascending Stata number
    list. If no breaks are specified, the command expects the option
    ^group( )^.

^icodes^ requests that the codes 0, 1, 2, etc. be used in place of the
    left-hand ends of the intervals.

^label^ requests that the integer-coded values of the grouped variable be
    labelled with the left-hand ends of the grouping intervals. Specifying
    this option automatically invokes ^icodes^.

^group( )^ specifies the number of equal-frequency grouping intervals to be
    used when ^at( )^ is not given. Specifying this option automatically
    invokes ^icodes^. The command works by first calculating the
    appropriate percentiles using the command ^pctile^ and then using the
    percentiles as break points.


Example
-------

Using the variable ^length^ from the ^auto^ data, the commands

    ^egen lgrp = cut(length), at(140,180,200,220,240)^
    ^tab lgrp^

produce the output

           lgrp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            140 |         31       41.89       41.89
            180 |         16       21.62       63.51
            200 |         20       27.03       90.54
            220 |          7        9.46      100.00
    ------------+-----------------------------------
          Total |         74      100.00

So will the command

    ^egen lgrp = cut(length), at(140,180(20)240)^

Values outside the range 140 to 240 are coded as missing.

The command

    ^egen lgrp = cut(length), at(140,180(20)240) label^

will produce a variable coded 0, 1, 2, 3 but labelled 140-, 180-, 200-,
220-. Thus ^tab lgrp^ produces

           lgrp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           140- |         31       41.89       41.89
           180- |         16       21.62       63.51
           200- |         20       27.03       90.54
           220- |          7        9.46      100.00
    ------------+-----------------------------------
          Total |         74      100.00

and ^tab lgrp, nolab^ produces

           lgrp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         31       41.89       41.89
              1 |         16       21.62       63.51
              2 |         20       27.03       90.54
              3 |          7        9.46      100.00
    ------------+-----------------------------------
          Total |         74      100.00

The commands

    ^egen lgrp = cut(length), group(4) label^
    ^tab lgrp^

will produce

           lgrp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           142- |         17       22.97       22.97
           170- |         20       27.03       50.00
         192.5- |         18       24.32       74.32
           204- |         19       25.68      100.00
    ------------+-----------------------------------
          Total |         74      100.00
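
Checking the coding by hand
---------------------------

The coding rule for ^at( )^ can be checked with a few lines of ordinary
Stata. What follows is a minimal sketch, not part of the command itself. It
assumes that the ^auto^ data are already in memory, and that each interval
includes its left-hand end and excludes its right-hand end, which is how the
"left-hand ends" coding described above is usually read. The variable names
lchk and lgrp2 are made up for the illustration.

```stata
* Assumption: the auto data are in memory, and lchk and lgrp2 do not yet exist.
* Assumption: each interval is closed on the left and open on the right.

* Code each observation by hand with the left-hand end of its interval.
generate lchk = .
replace lchk = 140 if length >= 140 & length < 180
replace lchk = 180 if length >= 180 & length < 200
replace lchk = 200 if length >= 200 & length < 220
replace lchk = 220 if length >= 220 & length < 240

* Let egen do the same job, then confirm that the two variables agree.
egen lgrp2 = cut(length), at(140,180(20)240)
assert lchk == lgrp2

* For comparison with the group(4) labels, list the minimum and quartiles.
summarize length, detail
display r(min) "  " r(p25) "  " r(p50) "  " r(p75)
```

If the ^assert^ passes without error, the hand-coded variable and the one
produced by ^cut^ agree for every observation, including any that fall
outside the breaks and are therefore missing in both. The last two lines
only print the minimum and the quartiles of ^length^, so that they can be
compared by eye with the labels shown for ^group(4)^.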