Test 2: confidence intervals and tests

Download your data. In Stata, type:
use http://web.missouri.edu/~kolenikovs/Stat3500/test2/data-test2-#####.dta, clear
where ##### is your course ID number, found in the upper right-hand corner of your syllabus.

The exercise concerns confidence intervals and testing statistical hypotheses.

With these data, answer the following questions, reporting any non-integer result accurate to four decimal places:

  1. What is the mean of variable x?
  2. Test the hypothesis that the mean of x is equal to 2.7 against a two-sided alternative. Report the p-value.
  3. Should the null hypothesis be retained or rejected at the 5% level? Enter 1 if the null is rejected, and 0 if the null is retained.
  4. Identify the confidence interval in the test output. Report the 90% confidence interval, with the answer to Question 4 being the left limit...
  5. ... and the answer to Question 5 being the right limit.
  6. Variable group records whether the observation belongs to group 1 or group 2. Test the hypothesis that the proportion p of observations in group 1 is at least 0.50, against a one-sided alternative: H0: p ≥ 0.5, H1: p < 0.5. Report the p-value of the test.
  7. Should the null hypothesis be retained or rejected at the 10% level? Enter 1 if the null is rejected, and 0 if the null is retained.
  8. Let us now compare the values of variable y between the two groups. For Question 8, report the mean of y in group 1.
  9. Report the mean of y in group 2.
  10. Report the difference in group means (group 1 minus group 2, as ttest reports it).
  11. The default null hypothesis is H0: there is no difference between the group means. What is the value of the t-statistic?
  12. How many degrees of freedom does the t-statistic have?
  13. If the alternative hypothesis is two-sided, should the null be retained or rejected at the 10% level? Enter 1 if the null is rejected, and 0 if the null is retained.
  14. A modification of the t-test can account for differences in variances between the two groups. This is done with the unequal option of ttest. Try it out. First of all, did the reported means change? For Question 14, enter 0 if they did not change, and 1 if they did.
  15. Report the value of the t-statistic.
  16. How many degrees of freedom does the t-statistic have? Note that this may be a fractional number, and that is expected.
  17. If the alternative hypothesis is two-sided, should the null be retained or rejected at the 10% level? Enter 1 if the null is rejected, and 0 if the null is retained.
  18. Does it make a difference whether the correction for unequal variances is made? Enter 0 if your test results are the same, and 1 if they are different.

Create a file called answers-test2-#####.dta with two variables, Question and Answer. (Capitalization matters in the variable names!) The first variable contains the question number, and the second contains your numeric answer. In Stata, this can be done with the post commands (see help post):
. postfile test2 Question Answer using answers-test2-#####, replace
. post test2 (1) (your answer here)
. ...
. postclose test2

Parentheses in the post command are important!
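To see what the parentheses do, here is a small sketch that is not a required step: each value that post writes is an expression in its own pair of parentheses, so it can be a computed quantity rather than a typed-in number. The sketch feeds John Doe's summary numbers (from the example below) to ttesti, the immediate form of ttest, and writes to a temporary file; the names f, demo, xbar and pval are made up for illustration. It clears the data in memory, so try it in a fresh session rather than in the middle of your test.

```stata
* Illustration only: f, demo, xbar and pval are made-up names
clear
tempfile f
postfile demo Question Answer using `f'

* Immediate one-sample t test: #obs #mean #sd #hypothesized-mean
ttesti 35 2.998796 .6337177 2.7
scalar xbar = r(mu_1)    // sample mean stored by ttesti
scalar pval = r(p)       // two-sided p-value stored by ttesti

* Each value sits in its own parentheses; round() keeps four decimals
post demo (1) (round(scalar(xbar), 0.0001))
post demo (2) (round(scalar(pval), 0.0001))
postclose demo

* Load and inspect the temporary answers file
use `f', clear
list
```

Rounding with round() is only a convenience; typing the numbers in by hand, as John does below, works just as well.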

Upload the resulting file to Blackboard. Go to: Assignments / Stata Assignments / Stata test2 / View and complete assignment. In panel 2, attach your answers-test2-#####.dta using the "Attach local file" window. Click "Submit" to finish.

Example using post

A student named John Doe has Course ID number 99999. He starts a log file, downloads his data, and sets up his answer file:

. log using John-Doe-test2, replace
(note: file John-Doe-test2.smcl not found)
--------------------------------------------------------------------------------
       log:  John-Doe-test2.smcl
  log type:  smcl

. use http://web.missouri.edu/~kolenikovs/Stat3500/test2/data-test2-99999, clear
(Assignment test2 for student # 99999)

. postfile test2 Question Answer using answers-test2-99999, replace
(note: file answers-test2-99999.dta not found)

.

The first few questions deal with the results of a t-test, so let's look at the ttest help file and then run the test:
. help ttest

. ttest x = 2.7

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      35    2.998796    .1071178    .6337177    2.781107    3.216486
------------------------------------------------------------------------------
    mean = mean(x)                                                t =   2.7894
Ho: mean = 2.7                                   degrees of freedom =       34

   Ha: mean < 2.7               Ha: mean != 2.7               Ha: mean > 2.7
 Pr(T < t) = 0.9957         Pr(|T| > |t|) = 0.0086          Pr(T > t) = 0.0043

Here John finds everything he needs for the first three questions: the mean of x (2.998796, which rounds to 2.9988), the two-sided p-value (the middle column at the bottom of the output), and the decision: since 0.0086 is below the 5% cut-off, the null is rejected:
. post test2 ( 1 ) ( 2.9988 )

. post test2 ( 2 ) ( 0.0086 )

. post test2 ( 3 ) ( 1 )

.
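Where do these numbers come from? As a quick check (a sketch, not part of the assignment), the t-statistic is the distance between the sample mean and 2.7 measured in standard errors, and the two-sided p-value doubles the upper-tail probability of Student's t with n - 1 = 34 degrees of freedom, which Stata's ttail(df, t) function returns. The inputs are copied from John's output and are already rounded; the scalar names tstat and p2 are arbitrary.

```stata
* t = (sample mean - hypothesized mean) / std. err., copied from the ttest output
scalar tstat = (2.998796 - 2.7) / .1071178
display "t = " %7.4f scalar(tstat)

* ttail(df, t) = Pr(T > t); doubling it gives Pr(|T| > |t|)
scalar p2 = 2*ttail(34, abs(scalar(tstat)))
display "two-sided p-value = " %6.4f scalar(p2)
```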

However, the reported confidence interval is at the 95% level, not the 90% level. To change that, specify the level() option (not only in ttest, but in nearly every Stata command that reports confidence intervals as part of its output):
. ttest x = 2.7, level( 90 )

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [90% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      35    2.998796    .1071178    .6337177    2.817668    3.179925
------------------------------------------------------------------------------
    mean = mean(x)                                                t =   2.7894
Ho: mean = 2.7                                   degrees of freedom =       34

   Ha: mean < 2.7               Ha: mean != 2.7               Ha: mean > 2.7
 Pr(T < t) = 0.9957         Pr(|T| > |t|) = 0.0086          Pr(T > t) = 0.0043

. post test2 ( 4 ) ( 2.8177 )

. post test2 ( 5 ) ( 3.1799 )

.
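The level() option only changes the critical value used for the interval. A minimal sketch of the arithmetic, assuming the mean and standard error copied from the output above: the 90% interval is mean ± t* × std. err., where t* = invttail(34, 0.05) leaves 5% in each tail of Student's t with 34 degrees of freedom. The scalar names are arbitrary.

```stata
* Critical value: 5% in the upper tail and 5% in the lower tail -> 90% coverage
scalar tcrit = invttail(34, 0.05)

* mean -/+ tcrit * std. err., copied from the level(90) output
scalar lo = 2.998796 - scalar(tcrit)*.1071178
scalar hi = 2.998796 + scalar(tcrit)*.1071178
display "90% CI: " %6.4f scalar(lo) " to " %6.4f scalar(hi)
```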

The next few questions deal with a test of proportions. There are several ways to proceed. Probably the easiest is to tabulate the data and look at the counts:
. tab group

      group |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         12       34.29       34.29
          2 |         23       65.71      100.00
------------+-----------------------------------
      Total |         35      100.00

.

These counts can be used for a proportion test. John Doe does not know the command (and neither does Stas, who writes for him), so he has to do some digging:
. search test for proportions

Keyword search

        Keywords:  test for proportions
          Search:  (1) Official help files, FAQs, Examples, SJs, and STBs

Search of official help files, FAQs, Examples, SJs, and STBs


[R]     bitest  . . . . . . . . . . . . . . . . . .  Binomial probability test
        (help bitest)

[R]     ci  . . . . .  Confidence intervals for means, proportions, and counts
        (help ci)

[R]     prtest  . . . . . . . . . . . One- and two-sample tests of proportions
        (help prtest)
--Break--

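... and another three to five screens of output were skipped by pressing the Break button. Click the bitest link in the Stata output to open its help file. Since we already have the counts, we can use the "immediate" version of the command, bitesti, which takes the number of trials (35), the observed number of successes (the 12 observations in group 1), and the hypothesized probability (0.5):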
. bitesti 35 12 0.5

        N   Observed k   Expected k   Assumed p   Observed p
------------------------------------------------------------
       35         12         17.5       0.50000      0.34286

  Pr(k >= 12)            = 0.979520  (one-sided test)
  Pr(k <= 12)            = 0.044766  (one-sided test)
  Pr(k <= 12 or k >= 23) = 0.089531  (two-sided test)

.
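As an aside, bitest also has a non-immediate form that works on a 0/1 variable in memory, which saves copying counts by hand. The sketch below assumes a toy dataset rebuilt to match John's tab output (12 observations in group 1, 23 in group 2) instead of the downloaded file; the indicator name g1 is made up. It replaces the data in memory, so do not run it in the middle of your own session.

```stata
* Toy data mimicking John's counts: observations 1-12 in group 1, 13-35 in group 2
clear
set obs 35
generate byte group = 1 + (_n > 12)

* bitest wants a 0/1 variable: 1 = "success" = belongs to group 1
generate byte g1 = (group == 1)
bitest g1 == 0.5

* r(p_l), r(p_u), r(p): lower one-sided, upper one-sided, two-sided p-values
return list
```

With these counts, the output should match the bitesti table above.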

Now we have some figuring out to do: which p-value should we use? The null hypothesis is that group 1 makes up at least half of the data. Evidence against it comes from small counts in group 1, so the critical (rejection) region is in the lower tail, which corresponds to the ≤ sign in the output. Hence the right p-value is in the second row, Pr(k <= 12) = 0.044766, and since it is below the 10% cut-off, the null is rejected:
. post test2 ( 6 ) ( 0.0448 )

. post test2 ( 7 ) ( 1 )

.
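To double-check which row is which, the cumulative binomial functions can be evaluated directly. A small sketch with John's numbers (n = 35, k = 12, p = 0.5): binomial(n, k, p) returns Pr(K ≤ k), the lower tail that matches H1: p < 0.5, while binomialtail(n, k, p) returns Pr(K ≥ k), the tail that would fit the opposite alternative H1: p > 0.5.

```stata
* Lower tail Pr(K <= 12): p-value for H0: p >= 0.5 vs H1: p < 0.5
* (should match the second row of the bitesti output)
display %8.6f binomial(35, 12, 0.5)

* Upper tail Pr(K >= 12): would be the p-value for H1: p > 0.5 instead
* (should match the first row of the bitesti output)
display %8.6f binomialtail(35, 12, 0.5)
```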

The next big block of questions deals with the two-sample t-test. The ttest help file is already open somewhere on the screen, so let's find what we need for a two-sample comparison: the by() option, which compares y across the values of group:
. ttest y, by( group )

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |      12     .320678    .0550999    .1908717    .1994039    .4419521
       2 |      23    .5068851    .0929011    .4455381      .31422    .6995502
---------+--------------------------------------------------------------------
combined |      35    .4430427    .0650873    .3850617    .3107694     .575316
---------+--------------------------------------------------------------------
    diff |           -.1862071    .1353579               -.4615948    .0891805
------------------------------------------------------------------------------
    diff = mean(1) - mean(2)                                      t =  -1.3757
Ho: diff = 0                                     degrees of freedom =       33

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0891         Pr(|T| > |t|) = 0.1782          Pr(T > t) = 0.9109

.

The third column (Mean) gives the group means and their difference, mean(1) - mean(2). Just below the table, John finds the t-statistic and its degrees of freedom. The two-sided alternative is in the middle column at the bottom of the output, and its p-value, 0.1782, is not convincing: it is above the 10% cut-off, so the null is retained. A long string of post commands can now follow:
. post test2 ( 8 ) ( 0.3207 )

. post test2 ( 9 ) ( 0.5069 )

. post test2 ( 10 ) ( -0.1862 )

. post test2 ( 11 ) ( -1.3757 )

. post test2 ( 12 ) ( 33 )

. post test2 ( 13 ) ( 0 )

.
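For the curious, a sketch of where the equal-variance t-statistic comes from, using only numbers copied from the output above: the two sample variances are pooled with weights n1 - 1 = 11 and n2 - 1 = 22, and the standard error of the difference is the pooled standard deviation times sqrt(1/n1 + 1/n2). The immediate command ttesti, which takes n, mean, and standard deviation for each group, should reproduce the same test without the raw data. The scalar names are arbitrary.

```stata
* Pooled variance with df = 12 + 23 - 2 = 33
scalar sp2 = (11*.1908717^2 + 22*.4455381^2) / 33
scalar se_pool = sqrt(scalar(sp2)*(1/12 + 1/23))
scalar t_pool = (.320678 - .5068851) / scalar(se_pool)
display "std. err. = " %9.7f scalar(se_pool) ", t = " %7.4f scalar(t_pool)

* Same test from summary statistics: #obs1 #mean1 #sd1 #obs2 #mean2 #sd2
ttesti 12 .320678 .1908717 23 .5068851 .4455381
```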

One of the assumptions of the standard two-sample t-test is that the variances are the same in both groups. In John Doe's output, however, the standard deviations differ by a factor of more than 2 (.4455 versus .1909), which is hardly satisfactory! A somewhat more advanced version of the procedure accounts for unequal variances; it is invoked with the unequal option:
. ttest y, by( group ) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |      12     .320678    .0550999    .1908717    .1994039    .4419521
       2 |      23    .5068851    .0929011    .4455381      .31422    .6995502
---------+--------------------------------------------------------------------
combined |      35    .4430427    .0650873    .3850617    .3107694     .575316
---------+--------------------------------------------------------------------
    diff |           -.1862071    .1080121               -.4061604    .0337461
------------------------------------------------------------------------------
    diff = mean(1) - mean(2)                                      t =  -1.7239
Ho: diff = 0                     Satterthwaite's degrees of freedom =   32.225

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0472         Pr(|T| > |t|) = 0.0943          Pr(T > t) = 0.9528

.

Did the means change? No, they did not, and they should not have: we are changing how the variances are treated, not the means. The changes are confined to the last (diff) row of the table, where the difference is analyzed: its standard error shrank from .1354 to .1080, so the t-statistic moved further from zero (from -1.3757 to -1.7239), the degrees of freedom now come from Satterthwaite's approximation, and the two-sided p-value, 0.0943, has dropped below the 10% cut-off. So the information that John needs to enter is:
. post test2 ( 14 ) ( 0 )

. post test2 ( 15 ) ( -1.7239 )

. post test2 ( 16 ) ( 32.225 )

. post test2 ( 17 ) ( 1 )

. post test2 ( 18 ) ( 1 )

.
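A sketch of what the unequal option changes, again using only numbers copied from the output above: each group keeps its own variance, so the standard error of the difference is sqrt(se1^2 + se2^2), built from the Std. Err. column, and the degrees of freedom come from Satterthwaite's approximation. Since ttest prints those degrees of freedom with only three decimals, the stored result r(df_t) is a handy place to look for more digits when an answer calls for four. The scalar names are arbitrary.

```stata
* Squared standard errors of the two group means (Std. Err. column)
scalar v1 = .0550999^2
scalar v2 = .0929011^2

* Unequal-variance std. err. of the difference and the t-statistic
scalar se_w = sqrt(scalar(v1) + scalar(v2))
scalar t_w = (.320678 - .5068851) / scalar(se_w)

* Satterthwaite's approximate degrees of freedom (n1 - 1 = 11, n2 - 1 = 22)
scalar df_w = (scalar(v1) + scalar(v2))^2 / (scalar(v1)^2/11 + scalar(v2)^2/22)
display "t = " %7.4f scalar(t_w) ", df = " %8.4f scalar(df_w)

* Same test from summary statistics; r(df_t) holds the df at full precision
ttesti 12 .320678 .1908717 23 .5068851 .4455381, unequal
display %10.6f r(df_t)
```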

Finally, John wraps things up and verifies his file:
. postclose test2

. use answers-test2-99999, clear

. list

     +--------------------+
     | Question    Answer |
     |--------------------|
  1. |        1    2.9988 |
  2. |        2     .0086 |
  3. |        3         1 |
  4. |        4    2.8177 |
  5. |        5    3.1799 |
     |--------------------|
  6. |        6     .0448 |
  7. |        7         1 |
  8. |        8     .3207 |
  9. |        9     .5069 |
 10. |       10    -.1862 |
     |--------------------|
 11. |       11   -1.3757 |
 12. |       12        33 |
 13. |       13         0 |
 14. |       14         0 |
     |--------------------|
 15. |       15   -1.7239 |
 16. |       16    32.225 |
 17. |       17         1 |
 18. |       18         1 |
     +--------------------+

. log close

.

Now the only thing left is to upload the resulting file to Blackboard.


Any questions? Ask Stas or email the class list.