------------------------------------------------------------------------------------------------------ log: c:\Imbook\bwebpage\Section2\mma08p3diagnostics.txt log type: text opened on: 17 May 2005, 14:10:13 . . ********** OVERVIEW OF MMA08P3DIAGNOSTICS.DO ********** . . * STATA Program . * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi . * used for "Microeconometrics: Methods and Applications" . * by A. Colin Cameron and Pravin K. Trivedi (2005) . * Cambridge University Press . . * Chapter 8.7.3 pages 290-1 . * Model diagnostics example (Table 8.3) . . * (A) DIFFERENT R-SQUAREDS . * (B) CALCULATION OF RESIDUALS . * for a Poisson model with simulated data (see below). . . * The data generation requires free Stata add-on command rndpoix . * In Stata: search rndpoix . . * This program gives results for model 2 . * For model 1 need to rerun with only x3 as regressor . . ********** SETUP ********** . . set more off . version 8.0 . set scheme s1mono /* Used for graphs */ . . ********** GENERATE DATA ********** . . * Model is . * y ~ Poisson[exp(b1 + b2*x2 + b3*x3] . * where . * x2 and x3 are iid ~ N[0,1] . * and b1=0.5 and b2=0.5 and b3=0.5. . . * The Diagnostics below are from Poisson regression of y on x3 alone . * or from Poisson regression of y on x3 and x3sq. [Note" x2 is omitted] . . set seed 10001 . set obs 100 obs was 0, now 100 . scalar b1 = 0.5 . scalar b2 = 0.5 . scalar b3 = 0.5 . . * Generate regressors . gen x2 = invnorm(uniform()) . gen x3 = invnorm(uniform()) . . * Generate y . gen mupoiss = exp(b1+b2*x2+b3*x3) . * The next requires Stata add-on. In Stata: search rndpoix . rndpoix(mupoiss) ( Generating ......... ) Variable xp created. . gen y = xp . sum Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- x2 | 100 .0053689 1.000686 -2.173506 2.106561 x3 | 100 -.0235884 1.024207 -2.857666 2.149822 mupoiss | 100 2.020511 1.400564 .3380426 7.029678 xp | 100 1.92 1.835013 0 8 y | 100 1.92 1.835013 0 8 . . * Write data to a text (ascii) file so can use with programs other than Stata . outfile y x2 x3 using mma08p3diagnostics.asc, replace . . ********* SETUP FOR THIS PROGRAM ********** . . * Change this if want different regressors . gen x3sq = x3*x3 . * global XLIST x3 /* Model 1 */ . global XLIST x3 x3sq /* Model 2 */ . . ********* R-SQUARED (reported in Table 8.3 p.291) ********** . . * The following code can be changed to diffferent models than poisson . * For RsqRES, RsqEXP and RsqCOR need . * y dependent variable . * yhat predicted value of dependent variable . * For RsqWRSS additionally need . * sigmasq predicted variance of dependent variable . * For RsqRG need log density evaluated at values given below . . * Obtain exp(x'b) Will vary with the model . poisson y $XLIST Iteration 0: log likelihood = -176.09611 Iteration 1: log likelihood = -176.09119 Iteration 2: log likelihood = -176.09119 Poisson regression Number of obs = 100 LR chi2(2) = 30.96 Prob > chi2 = 0.0000 Log likelihood = -176.09119 Pseudo R2 = 0.0808 ------------------------------------------------------------------------------ y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- x3 | .3588412 .07035 5.10 0.000 .2209578 .4967245 x3sq | .0912999 .0514311 1.78 0.076 -.0095032 .1921029 _cons | .492656 .0958903 5.14 0.000 .3047144 .6805975 ------------------------------------------------------------------------------ . predict yhat (option n assumed; predicted number of events) . scalar dof = e(N)-e(k) . . * RsqRES and RsqEXP are R-squared from sums of squares . * First get TSS, ESS and RSS . egen ybar = mean(y) . gen ylessybarsq = (y - ybar)^2 . quietly sum ylessybarsq . scalar totalss = r(mean) . gen yhatlessybarsq = (yhat - ybar)^2 . quietly sum yhatlessybarsq . scalar explainedss = r(mean) . gen residualsq = (y - yhat)^2 . quietly sum residualsq . scalar residualss = r(mean) . * Second computed the rsquared . scalar sereg = sqrt(residualss/dof) . scalar RsqRES = 1 - residualss/totalss . scalar RsqEXP = explainedss/totalss . . * RsqCOR uses sample correlation . quietly correlate y yhat . scalar RsqCOR = r(rho)^2 . . di "standard error of regression: " sereg standard error of regression: .16620308 . di "totalss: " totalss _n "explainedss: " explainedss _n "residualss: " residualss totalss: 3.3336 explainedss: .69556676 residualss: 2.6794761 . di "RsqRES: " RsqRES _n "RsqEXP: " RsqEXP _n "RsqCOR: " RsqCOR RsqRES: .19622149 RsqEXP: .20865333 RsqCOR: .19640666 . . * RsqWRSS uses weighted sums of squares . * First generate estimated variance of y . * Here for Poisson use fact that variance = mean . gen sigmasq = yhat . gen weightedylessybarsq = ((y - ybar)^2) / sigmasq . quietly sum weightedylessybarsq . scalar weightedtotalss = r(mean) . gen weightedresidualsq = ((y - yhat)^2) / sigmasq . quietly sum weightedresidualsq . scalar weightedresidualss = r(mean) . scalar RsqWRSS = 1 - weightedresidualss/weightedtotalss . di "RsqWRSS: " RsqWRSS RsqWRSS: .16945018 . . * RsqRG is from ML. Difficult to generalize beyond LEF models. . * Need . * lnL_fit log-likelihood at fitted values (the usual) . * lnL_0 log-likelihood at intecept only . * lnL_max log-likelihood at best fit . quietly poisson y $XLIST . scalar lnL_fit = e(ll) . scalar lnL_0 = e(ll_0) . * The following applies only for Poisson. Differs for otehr models. . * lnf(y) = -mu + y*ln(mu) - ln(y!) . * is maximized at mu = y . * so compute lnL_max = sum of [-y + y*ln(y) - lny!] . * Following sets 0*ln0 = 0 . gen ylny = 0 . replace ylny = y*ln(y) if y > 0 (51 real changes made) . gen lnfyatmax = -y + ylny - lnfact(y) . quietly sum lnfyatmax . scalar lnL_max = r(sum) . scalar RsqRG = (lnL_fit - lnL_0) / (lnL_max - lnL_0) . . * RsqQ should only be used for binary and other discrete choice models . * And definitely use only if lnL_fit < 0 . scalar RsqQ = 1 - lnL_fit/lnL_0 . . di "lnL_0: " lnL_0 _n "lnL_fit: " lnL_fit _n "lnL_max: " lnL_max lnL_0: -191.57162 lnL_fit: -176.09119 lnL_max: -101.12402 . di "RsqRG: " RsqRG _n "RsqQ: " RsqQ RsqRG: .17115358 RsqQ: .08080754 . . * Check . sum Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- x2 | 100 .0053689 1.000686 -2.173506 2.106561 x3 | 100 -.0235884 1.024207 -2.857666 2.149822 mupoiss | 100 2.020511 1.400564 .3380426 7.029678 xp | 100 1.92 1.835013 0 8 y | 100 1.92 1.835013 0 8 -------------+-------------------------------------------------------- x3sq | 100 1.039067 1.446146 .0000877 8.166255 yhat | 100 1.92 .838208 1.150405 5.398193 ybar | 100 1.92 0 1.92 1.92 ylessybarsq | 100 3.3336 5.966374 .0064 36.9664 yhatlessyb~q | 100 .6955668 1.572256 4.82e-06 12.09783 -------------+-------------------------------------------------------- residualsq | 100 2.679476 4.830379 .0000825 36.93972 sigmasq | 100 1.92 .838208 1.150405 5.398193 weightedyl~q | 100 1.681324 2.560112 .0018502 19.23135 weightedre~q | 100 1.396423 2.424518 .0000276 19.21747 ylny | 100 2.15694 3.48234 0 16.63553 -------------+-------------------------------------------------------- lnfyatmax | 100 -1.01124 .6233793 -1.969071 0 . poisson y $XLIST /* Stata Rsq = RsqQ */ Iteration 0: log likelihood = -176.09611 Iteration 1: log likelihood = -176.09119 Iteration 2: log likelihood = -176.09119 Poisson regression Number of obs = 100 LR chi2(2) = 30.96 Prob > chi2 = 0.0000 Log likelihood = -176.09119 Pseudo R2 = 0.0808 ------------------------------------------------------------------------------ y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- x3 | .3588412 .07035 5.10 0.000 .2209578 .4967245 x3sq | .0912999 .0514311 1.78 0.076 -.0095032 .1921029 _cons | .492656 .0958903 5.14 0.000 .3047144 .6805975 ------------------------------------------------------------------------------ . . *** The following results are for Model 2 in Table 8.3 p.291 . *** For model 1 R-squareds need to rerun with only x3 as regressor . di "standard error of regression: " sereg standard error of regression: .16620308 . di "RsqRES: " RsqRES _n "RsqEXP: " RsqEXP _n "RsqCOR: " RsqCOR RsqRES: .19622149 RsqEXP: .20865333 RsqCOR: .19640666 . di "RsqWRSS: " RsqWRSS _n "RsqRG: " RsqRG _n "RsqQ: " RsqQ RsqWRSS: .16945018 RsqRG: .17115358 RsqQ: .08080754 . . ********* RESIDUAL ANALYSIS (text bottom p.290 to top p.291) ********** . . * Assume that from earlier have yhat . . * raw residual . gen raw = y - yhat . gen sigma = sqrt(yhat) . gen Pearson = (y - yhat)/sigma . * Note that earlier defined ylny = 0 if y=0 and = yln(y) otherwise . gen deviance = sign(y-yhat)*sqrt(2*(-y+ylny)-2*(-yhat+y*ln(yhat))) . . *** The following are results reported in text bottom p.290 to top p.291 . sum raw Pearson deviance Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- raw | 100 -2.38e-09 1.645157 -2.993904 6.077806 Pearson | 100 -.0014455 1.187656 -1.498094 4.383774 deviance | 100 -.2103819 1.212345 -2.016939 3.264961 . corr raw Pearson deviance (obs=100) | raw Pearson deviance -------------+--------------------------- raw | 1.0000 Pearson | 0.9852 1.0000 deviance | 0.9625 0.9818 1.0000 . * Example of use to find whether x3 belongs in the model . * graph twoway scatter Pearson x3 . . ********** CLOSE OUTPUT . log close log: c:\Imbook\bwebpage\Section2\mma08p3diagnostics.txt log type: text closed on: 17 May 2005, 14:10:13 ----------------------------------------------------------------------------------------------------