*stndzxage tutorial *by Sarah Reynolds *2-27-19 *The file checks how the command stndzxage differs from zscore *The file illustrates how to use the command clear all set more off cd "C:\Users\saris\Dropbox\ado\plus\s\stndzxage additional files" use "stndzxage_sample_data.dta", clear count *1,429 children in the data count if TestScore~=. *1,420 were tested hist AgeMonth *ages concentrated in the center stndzxage TestScore AgeMonth sum stx_TestScore *mean about 0 & standard deviation about 1, as expected *however, there are fewer observations *Do a loop to check standardization with stata command levelsof AgeMonth, local(ages) gen Z_TestScore=. foreach age of local ages { zscore TestScore if AgeMonth==`age' replace Z_TestScore=z_TestScore if AgeMonth==`age' drop z_TestScore } sum Z_TestScore *mean about 0 & standard deviation about 1, as expected *however, there are more observations, equal to the *number of children who took the test - 1 sum AgeMonth if Z_TestScore==. & TestScore~=. *The - 1 corresponds to the child who was the only one of thier age *Check to see how well they line up if there are both standardization variables scatter Z_TestScore stx_TestScore tab AgeMonth if stx_TestScore~=. tab AgeMonth if Z_TestScore~=. *mismatch in missings because stndzxage has 30 observations minimum *find out how many are in each month to re-standardize the *using the smallest number of observations! tab AgeMonth stndzxage TestScore AgeMonth, minbinsize(12) assert stx_TestScore==Z_TestScore *This error turns out to be from rounding gen stx_round=round(stx_TestScore, 0.0001) gen Z_round=round(Z_TestScore, 0.0001) assert stx_round==Z_round *****Validation complete******** ****Exploring options**** *GRAPHING stndzxage TestScore AgeMonth, graph *Notice there are more ages with raw data points than have means *These ages had too few observations (default minbinsize is 30) *BIN WIDTH *let's widen the age bins so more ages are grouped together, resulting in *a larger number of observations in each bin stndzxage TestScore AgeMonth, binwidth(6) graph *the waves in the standardized data indicate bins are probably too wide stndzxage TestScore AgeMonth, binwidth(3) graph *still some age dependency but not so much *note the last bin included 4 ages (see help file chart about bin grouping) *MININIMUM BIN SIZE *let's increase the minimum number of observations allowed in each bin stndzxage TestScore AgeMonth, binwidth(3) minbinsize(150) graph *CONTINUOUS *continuous standardization is a good option when data density has gaps (in tails) stndzxage TestScore AgeMonth, continuous graph sum stx_TestScore *note all observations are standardized stndzxage TestScore AgeMonth, continuous poly(1) graph // linear stndzxage TestScore AgeMonth, continuous poly(5) graph // a bit more curvature *STANDARDIZING OVER ADDITIONAL VARIABLES *you can use if to standardize a single subgroup stndzxage TestScore AgeMonth if Male==1, binwidth(3) tab Male, sum(stx_TestScore) stndzxage TestScore AgeMonth if Male==0, binwidth(3) tab Male, sum(stx_TestScore) *but below is more efficient *standardize by age & gender stndzxage TestScore AgeMonth Male, binwidth(3) graph tab Male, sum(stx_TestScore) *note means & s.d. are 0 in both cases *standardize by age, gender, and urban stndzxage TestScore AgeMonth Male Urban, continuous graph tab Male Urban, sum(stx_TestScore) *STANDARDIZING WTIH REGARDS TO A REFERENCE GROUP stndzxage TestScore AgeMonth, binwidth(3) reference(Male) graph *The graph only illustrates the data for the reference group, which was used *for standardizing tab Male, sum(stx_TestScore) *note here the mean & s.d. is ~0 & ~1 for the reference group, but different for *the non reference group *USING A REFERENCE GROUP & A SUBGROUP *can you do it both reference group stndzxage TestScore AgeMonth Urban, binwidth(3) minbinsize(30) reference(Male) graph tab Male Urban, sum(stx_TestScore) *USING A DIFFERENT RUNNING VARIABLE *Suppose the test was administered with different questions to different ages *Cut the data at the ages for each group egen testgroups=cut(AgeMonth), at(10, 13, 16, 19, 25, 30) tostring testgroups, replace encode testgroups, gen(TestGroups) label values TestGroups // remove label from TestGroup2 stndzxage TestScore TestGroups, graph rename stx_TestScore testgroups_z *This graph has the test groups all lumped together *If you want to see the ages graphed also, use the if option. *Select the binwidth to be the widest number of ages in a bin. levelsof TestGroups, local(groups) gen testgroups_if_z=. foreach i of local groups { stndzxage TestScore AgeMonth if TestGroups==`i', binwidth(6) graph replace testgroups_if_z=stx_TestScore if TestGroups==`i' } assert testgroups_z==testgroups_if_z *Though the syntax below is appealing, it does not work because *the ages are divided up by binwidth before the TestGroups * stndzxage TestScore AgeMonth TestGroups, binwidth(6) graph *don't use this code! *FLOORS & CEILINGS *let's make an artificial floor in this data replace TestScore=35 if TestScore<35 hist TestScore scatter TestScore AgeMonth *If your data actually looked like this, you might be ok with the test ceiling, but *you might want to rethink the appropriateness of the test for the younger kids: *the test best discriminates after about 15 months. stndzxage TestScore AgeMonth, continuous graph sum stx_TestScore stndzxage TestScore AgeMonth, continuous floor graph sum stx_TestScore *The floor option uses a Tobit adjustment, which assumes a spread farther below *that which is censored. Censoring pushes the mean up. Without the adjustment, *the mean used to standardize is higher than the mean used to standardize with a *Tobit adjustment. Average standadrdized scores are higher in the Tobit adjustment *We can take ceilings into account as well. replace TestScore=60 if TestScore>60 & TestScore~=. stndzxage TestScore AgeMonth, floor ceiling minbinsize(30) reference(Male) graph *USING THE MEDIAN & RESCALING *The median can be used for standardizing instead of the mean. *A different standard mean/median & standard deviation can be selected stndzxage TestScore AgeMonth, sd(15) mean(100) binw(3) sum stx_TestScore stndzxage TestScore AgeMonth, median sd(15) mean(100) binw(3) sum stx_TestScore