capture program drop pinocchio
program pinocchio
    version 14.0  // runiformint(), used below, requires Stata 14 or newer
    // Note: the local is defined without "=" so the long list is copied
    // verbatim rather than evaluated as a string expression, which can be
    // truncated in older Stata versions.
    #d ;
    local pinocchio `"
    "1. In a univariate regression model (i.e., a model without explanatory variables) the constant or intercept has no interpretation."
    "2. In a bivariate regression the constant (or intercept) is equal to the mean of the dependent variable (Y): constant or alpha = Y_bar."
    "3. In a bivariate regression the R-squared is equal to the correlation coefficient."
    "4. In a bivariate regression the R-squared is equal to the correlation coefficient."
    "5. In a bivariate regression the R-squared will always be greater than the absolute value of the correlation coefficient."
    "6. In a bivariate regression the p-value of an F-test is always greater than that of a t-test."
    "7. In a bivariate linear regression the beta coefficient is calculated as the covariance between X and Y divided by the variance of Y."
    "8. Regression residuals are calculated as the difference between predicted and observed values, therefore positive residuals indicate overpredictions and negative residuals underpredictions."
    "9. A bivariate regression is a model with two explanatory variables (X1 and X2) to predict or explain the variation in another variable (Y)."
    "10. In a simple linear regression the errors or residuals of the estimation are calculated as e = Y_obs - alpha_hat + B1_hat*X1_obs + B2_hat*X2_obs."
    "11. In a regression model the Root Mean Square Error (RMSE) is equal to the square root of the Model Mean Squares (MMS) or the square root of the quotient of the Model Sum of Squares (MSS) and the Model degrees of freedom (k)."
    "12. In a regression the F-statistic is equal to the Model Mean Squares (Model MS) minus the Residual Mean Squares (Residual MS)."
    "13. The Mean Squares (MS) are the Sum of Squares (SS) divided by the number of observations (N)."
    "14. In a regression, the t-statistic of any variable is equal to its beta coefficient divided by its standard deviation, i.e., t = beta/sd."
    "15. The R-squared of a regression ranges between 0 and 100."
    "16. The Adjusted R-squared and the R-squared are always positive."
    "17. The Adjusted R-squared adjusts for heteroskedasticity in the data."
    "18. The R-squared is equal to the Total Sum of Squares minus the Model Sum of Squares divided by the Total Sum of Squares, i.e., (TSS-MSS)/TSS = TSS/TSS - MSS/TSS = 1 - MSS/TSS."
    "19. The F-test of a regression evaluates whether all of the explanatory variables are non-zero. As long as all variables are statistically significant the alternative hypothesis is accepted."
    "20. An F-test in a regression model tests whether the R-squared is statistically significant."
    "21. The F-test of a regression model is always two-tailed."
    "22. The Total Sum of Squares (TSS) measures the difference between the Residual Sum of Squares (RSS) and the Model Sum of Squares (MSS)."
    "23. The Total Sum of Squares (TSS) is equal to the sum of the differences between the observed and the mean value of Y, i.e., TSS = SUM(Y_obs - Y_bar)."
    "24. The Residual Sum of Squares (RSS), also known as the Sum of Squared Errors (SSE), is the sum of the residuals or the sum of the differences between the observed Y and the predicted Y, i.e., RSS = SUM(Y_obs - Y_hat)."
    "25. The Model Sum of Squares (MSS), also known as the Explained Sum of Squares (ESS) or the Regression Sum of Squares (SSR), is the sum of the differences between the predicted and the mean value of Y, i.e., MSS = SUM(Y_hat - Y_bar)."
    "26. The Explained Sum of Squares (ESS) is approximately equal to the Model Sum of Squares (MSS)."
    "27. The Sum of Squares can be expressed as TSS = RSS + MSS or (Y_obs - Y_bar) = (Y_obs - Y_hat) - (Y_hat - Y_bar)."
    "28. In a regression model, Y and X are the independent and dependent variables, respectively."
    "29. In a regression model, Y and X are the regressor and the regressand, respectively."
    "30. Depending on the context, the independent variable X is sometimes also called regressand, outcome, predicted variable, explained variable, response variable, measured variable, observed variable, responding variable, and output variable."
    "31. Depending on the context, the dependent variable Y is sometimes also called regressor, covariate, predictor variable, exposure variable, control variable, manipulated variable, explanatory variable, and input variable."
    "32. In econometrics, the error term is an estimate of the disturbance term."
    "33. In regression analysis, statistical errors and residuals are the same."
    "34. The errors and the residuals are the same thing."
    "35. In a regression, fitted values differ from predicted values as they are closer to the true observed value."
    "36. In a regression, the Residual degrees of freedom (df) are equal to N-k, where N is the number of observations and k the number of explanatory variables."
    "37. Residual degrees of freedom (df) represent the number of parameters used by the regression model and are equal to N-k-1."
    "38. Model degrees of freedom (df) are the number of explanatory variables left unused by the regression model and are equal to N-k-1."
    "39. In a log-linear or log-level regression model, a beta of 0.05 indicates that a 1 unit change in X causes approximately a 0.05*100 = 5 percentage point (pp) change in Y."
    "40. In a linear-log or level-log regression model, a beta of 250 indicates that a 1% change in X causes approximately a 250/100 = 2.5 percentage point increase in Y."
    "41. In a log-log regression model, a beta of 3 indicates that a 1 percentage point (pp) change in X causes approximately a 3 percentage point (pp) increase in Y."
    "42. To interpret a model with only a log-transformed dependent variable (Y), exponentiate the coefficient. For example, if the coefficient is 0.251 then exp(0.251) = 1.2853. For every one-unit increase in X, the Y variable increases by 1.2853%."
    "43. To interpret a model with only a log-transformed independent variable (X), multiply the coefficient by e^(1.x), where x is the percent increase and e is Euler's number. For example, if the coefficient is 0.251 then a 10% increase in X causes a change in Y of 0.251*e^(1.10) = 0.75404 units."
    "44. To interpret a model where both the dependent (Y) and independent (X) variables are log-transformed, calculate 1.x to the power of the coefficient, where x is the percent change in X. For example, if the coefficient is 0.251 a 10% increase in X will change Y by 1.10^0.251 = 1.024."
    "45. In a log-transformed dependent variable (Y) model (i.e., log-linear) a larger beta improves the approximate interpretation of the effect of X on Y."
    "46. In a log-transformed dependent variable (Y) model (i.e., log-linear) the true effect of beta is smaller than its approximation."
    "47. The root mean square error (Root MSE) can be loosely interpreted as the average standard error of the model's beta coefficients."
    "48. The Durbin–Wu–Hausman test (often referred to as the Hausman test) is used to determine whether the errors in a regression model are serially uncorrelated."
    "49. The Durbin–Watson and the Breusch–Godfrey tests are used to detect model misspecification."
    "50. The Ramsey Regression Equation Specification Error Test (RESET) is a general test for multicollinearity in regression analysis."
    "51. The Variance Inflation Factor (VIF) is a way to measure and test for heteroskedasticity in regression analysis."
    "52. The Jarque-Bera test is a test for multicollinearity and is based on the rule of thumb that if the test statistic is more than 5 or 10 there is multicollinearity."
    "53. The Granger causality test is used to determine if two variables are causally associated or just correlated."
    "54. The Breusch–Pagan and the White test are used to decide between random effects and fixed effects in panel data analysis."
    "55. Heteroskedasticity tests check whether the residuals are correlated."
    "56. The Interquartile Range (IQR) is a measure of statistical dispersion and is defined as the difference between the mean and the median of the data."
    "57. A boxplot is a standardized way of displaying the distribution of data based on the 10th, 25th, 50th, 75th, and 90th percentiles."
    "58. The mean, median, and mode are three common measures of describing the variance of a set of data."
    "59. In mathematics, the mean always refers to the Pythagorean arithmetic mean (i.e., the average)."
    "60. The Pythagorean arithmetic mean is a type of average that is calculated by taking the nth root of the product of n numbers."
    "61. The Pythagorean geometric mean is a type of average that takes the reciprocal of the arithmetic mean of the reciprocals of the terms in the data set."
    "62. The Pythagorean harmonic mean is a type of average that takes the sum of all measurements and divides by the number of observations in the data set."
    "63. Central tendency is a statistical concept that summarizes the ability of a variable to converge with another variable."
    "64. The truncated mean (also known as the trimmed mean) is a type of average that is calculated on a data set where we suspect right or left censoring."
    "65. The Winsorized mean is a type of average that removes extreme values to reduce the influence of outliers."
    "66. In mathematics, the mode represents the most central value in a dataset (i.e., the 50th percentile)."
    "67. Chebyshev's inequality, also known as Chebyshev's theorem, states that 99% of a distribution's values must be within +/- 3 standard deviations from the mean."
    "68. Chebyshev's inequality states that over 1 − 1/k^2 of a distribution's values are more than k standard deviations away from the mean."
    "69. The 68–95–99.7 rule, also known as the empirical rule, is a rule of thumb that applies to all distributions."
    "70. In any distribution, approximately 68% of the data falls within one standard deviation (SD) of the mean, approximately 95% falls within two SD, and approximately 99.7% falls within three SD."
    "71. The three-sigma rule of thumb states that all values in a normal distribution lie within 3 standard deviations of the mean."
    "72. According to Chebyshev's theorem at least 98.8% of cases should fall within three standard deviations of the mean."
    "73. In a normal distribution, approximately 99.7% of the data lies in a range of 3 standard deviations."
    "74. A skewed to the right distribution (right-skewed or positive-skew distribution) is a distribution where the right tail of the distribution is shorter than the left tail."
    "75. In a right-skewed distribution, the majority of the data is concentrated on the right side of the distribution."
    "76. As the degrees of freedom increase, the t-distribution diverges from the standard normal distribution."
    "77. The Gaussian distribution is a special type of distribution with heavy tails that is used to explain data with a higher likelihood of extreme values."
    "78. A uniform distribution is a continuous probability distribution where each subsequent value has a higher probability of occurring than the previous one."
    "79. A rectangular distribution is a heavy-tailed distribution, which gives it a rectangular appearance."
    "80. The critical value for an upper one-tailed test at the 95% confidence level assuming a normal distribution (5% significance level) is approximately 1.96."
    "81. The critical value for a two-tailed test at the 95% confidence level assuming a normal distribution (5% significance level) is approximately +/- 1.645."
    "82. The critical value for a two-tailed test at the 99% confidence level assuming a normal distribution (1% significance level) is approximately +/- 1.96."
    "83. For any given significance or confidence level, one-tailed and two-tailed tests have the same critical value."
    "84. A standard normal distribution is a Gaussian distribution that can take any values."
    "85. A sampling distribution is the bell-shaped distribution of values taken from a sample."
    "86. A discrete distribution is a probability distribution that describes the occurrence of very rare events."
    "87. The roll of a die generates a continuous distribution with p = 1/6 for each outcome."
    "88. Percentage points (pp) and percent (%) are terms that can be used interchangeably."
    "89. The covariance between two variables, just like the variance, can only be positive."
    "90. It is possible to perform logarithmic transformations on any given value."
    "91. To standardize a variable, divide each observation by the variable's standard deviation."
    "92. Panel data is also known as longitudinal data and repeated cross-sectional data."
    "93. A random sample is a sampling technique in which existing study participants recruit additional participants from their social network or community to join the study."
    "94. A confounder (Z) is the variable that explains the relationship between the explanatory (X) and the dependent (Y) variables. In other words, it is the mechanism that causes the association between X and Y."
    "95. A mediator (M) is an omitted variable that can cause a spurious relationship between the explanatory (X) and the dependent (Y) variables when not controlled for."
    "96. In multilevel modelling, the grand mean refers to the average of the outcome variable (Y)."
    "97. In multilevel analysis, the model's constant is the average value of the outcome variable (Y)."
    "98. In panel data analysis, fixed effects models are not always the preferred choice because they are not always consistent."
    "99. In panel data analysis, a random effects model is less efficient than a fixed effects model but could be an inconsistent estimator."
    "100. In a fixed effects model, the effect of time-invariant and cluster-invariant (i.e., constant) variables is reduced."
    "101. It is not possible to estimate the effect of constants, such as time- or cluster-invariant variables, when using a model that estimates within-unit effects."
    "102. In a random effects model, all unobserved heterogeneity is removed."
    "103. Multilevel, hierarchical, nested, and panel analysis are terms that refer to different data structures."
    "104. A variance components analysis is a type of principal component analysis used to reduce the dimensionality of a data set."
    "105. The intraclass correlation is the ratio of the within-cluster variance to the total variance."
    "106. The intraclass correlation can explain the mechanisms behind the variability in an outcome."
    "107. Odds refer to the likelihood of an event occurring in one group compared to that of it occurring in another group."
    "108. An odds ratio refers to the likelihood of an event occurring compared to the likelihood of it not occurring."
    "109. If the odds ratio (OR) is > 1, this means that the odds are lower in numerator group A than in denominator group B."
    "110. The probability and the odds are bounded between 0 and 1."
    "111. Permutations refer to the selection of objects without considering the order, while combinations refer to the arrangements of objects in a specific order."
    "112. Permutations and combinations are the same mathematical concept."
    "113. The second order condition of a function at a critical point identifies the slope of a function."
    "114. If the second order condition (second derivative) of the function evaluated at the critical point is positive, then the critical point is a local maximum."
    "115. If the second order condition (second derivative) of the function evaluated at the critical point is negative, the critical point is a local minimum."
    "116. An open interval includes its endpoints and is indicated with parentheses. For example, (0,1)."
    "117. A closed interval does not include its limit points and is denoted with square brackets. For example, [0,1]."
    "118. A half-open interval denotes an interval that is open at its lower value and closed at its upper value."
    "119. A correlation of zero means that two variables are not dependent at all."
    "120. Hazard ratios and relative risk ratios are the same."
    "121. The relative risk ratio is an adjustment correction of the risk ratio."
    "122. Odds ratios measure the number of positive events in relation to the number of trials."
    "123. The odds ratio of flipping a head is 0.5 (or 1:2), while the risk ratio of flipping a head is 1:1."
    "124. Risk measures relative probability."
    "125. The null hypothesis is the hypothesis that there is a significant difference between two groups or variables, and the alternative hypothesis is the hypothesis that there is no significant difference between two groups or variables."
    "126. If the effect size is larger, it becomes harder to detect, requiring a larger sample."
    "127. Larger samples result in a greater chance of rejecting the null hypothesis, which means a decrease in the power of the hypothesis test."
    "128. Power is the probability of correctly rejecting the alternative hypothesis (H1) when it is false."
    "129. Power is the probability of finding a false positive."
    "130. A Type 1 Error is the mistaken rejection of the alternative hypothesis (H1)."
    "131. A Type 1 Error is a False Negative (failing to reject a false null hypothesis)."
    "132. A Type 1 Error rejects H1 when it is true."
    "133. A Type 2 Error is the mistaken acceptance of the alternative hypothesis (H1)."
    "134. A Type 2 Error is a False Positive (rejecting a true null hypothesis)."
    "135. A Type 2 Error fails to reject H1 when it is false."
    "136. The probability of correctly failing to reject a true null hypothesis (H0) is equal to 1 minus beta."
    "137. The probability of correctly rejecting a false null hypothesis (H0) is equal to 1 minus alpha."
    "138. The probability of making a Type 1 Error is equal to beta."
    "139. The probability of making a Type 2 Error is equal to alpha."
    "140. A large p-value means that there is certainly no evidence of a relationship between the two variables."
    "141. A confidence interval is calculated as the mean +/- the standard error."
    "142. The standard error is calculated by dividing the standard deviation by the sample size."
    "' ;
    #d cr
    // Count the statements, draw one at random, and display it.
    local countstatements : word count `pinocchio'
    local randomnum = runiformint(1, `countstatements')
    local statement : word `randomnum' of `pinocchio'
    di "`statement'"
end
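// A minimal usage sketch: after the program above has been defined, each call
// prints one randomly drawn statement from the list. The seed value below is
// arbitrary; set it only if you want a reproducible draw.
set seed 1234
pinocchio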