* racd05.do  January 2013 for Stata version 12

capture log close
log using racd05.txt, text replace

********** OVERVIEW OF racd05.do **********

* STATA Program 
* copyright C 2013 by A. Colin Cameron and Pravin K. Trivedi 
* used for "Regression Analysis of Count Data" SECOND EDITION
* by A. Colin Cameron and Pravin K. Trivedi (2013)
* Cambridge University Press

* Chapter 5
*  5.2.5 BASICS
*  5.3.4 GOODNESS-OF-FIT

* To run you need file
*   racd05data.dta
* in your directory
* and Stata user-written commands
*   countfit
*   hplogit

********** SETUP **********

set more off
version 12
clear all
set linesize 82
set scheme s1mono  // Graphics scheme

************

* This STATA program does analysis of takeover bids studied in chapter 5
*  5.2.5 RESIDUALS
*  5.3.4 R-SQUARED and GOODNESS-OF-FIT
*  5.4.2 TESTS OF NONNESTED MODELS 


********** DATA DESCRIPTION

* The original data are from Sanjiv Jaggia and Satish Thosar, 1993,
* "Multiple Bids as a Consequence of Target Management Resistance"
* Review of Quantitative Finance and Accounting, 447-457.
* The data are also used in 
* A.C. Cameron and Per Johansson (1997), 
* "Count Data Regression Models using Series Expansions: with Applications", 
* Journal of Applied Econometrics, May, Vol. 12, pp.203-223.

* For more details see these datasets and racd05makedata.dta

*************** 5.2.5 TAKEOVER BIDS: DESCRIPTIVE STATISTICS 

use racd05data.dta, clear

summarize
describe 

global XLIST LEGLREST REALREST FINREST WHTKNGHT BIDPREM INSTHOLD SIZE SIZESQ REGULATN

*** TABLE 5.1: ACTUAL FREQUENCY DISTRIBUTION

tabulate NUMBIDS

*** TABLE 5.2: VARIABLE DEFINITIONS AND SUMMARY STATISTICS

describe NUMBIDS $XLIST
summarize NUMBIDS $XLIST

*************** 5.2.5 (Continued) RESIDUALS

*** TABLE 5.3: POISSON QMLE Estimates, Standard Errors, T-statistics

poisson NUMBIDS $XLIST, vce(robust)

*** OVERDISPERSION TESTS presented in text - here underdispersion
* Estimate from Pearson statistic divided by (n-k)
quietly glm NUMBIDS $XLIST, family(poisson)
display "Var = phi*E[y] where phi = " e(dispers_ps)
* LM Overdispersion test - here underdispersion
quietly poisson NUMBIDS $XLIST 
predict mu, n
generate ystar = ((NUMBIDS - mu)^2 - NUMBIDS) / mu
* Test against NB2 variance
regress ystar mu, noconstant
* Test against NB1 variance
regress ystar, vce(robust)
sum mu ystar
drop mu ystar
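
* Aside: a minimal sketch of the same dispersion estimate computed by hand,
* phi = sum of squared Pearson residuals / (n-k), as a cross-check on e(dispers_ps).
* It should agree up to convergence tolerance.
* The variable names mucheck and pearsq are hypothetical and used only here.
quietly poisson NUMBIDS $XLIST
predict mucheck, n
generate pearsq = (NUMBIDS - mucheck)^2/mucheck
quietly summarize pearsq
display "phi computed by hand = " r(sum)/(e(N)-e(k))
drop mucheck pearsq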

*** CONSTRUCT RESIDUALS after command glm
* NOTE: Stata glm uses different terminology from the book
* Stata "standardized" multiplies the residual by (1-h_ii)^(-1/2);
* the book calls this studentized (our star)
* Stata "studentized" multiplies the residual by one over the
* square root of the estimated scale parameter
* NOTE: Deviance residual differs from that in First Edition (error in first)
quietly glm NUMBIDS $XLIST, family(poisson)
predict mu, mu
generate raw = NUMBIDS - mu
predict pear, pearson
predict pearstar, pearson standardized
predict dev, deviance
predict devstar, deviance standardized
generate devadj = dev + 1/(6*sqrt(mu))
predict anscombe, anscombe
predict hat, hat
* Extras for completeness
predict pearstud, pearson studentized
predict pearstan, pearson standardized
predict devstud, deviance studentized
predict devstan, deviance standardized
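
* Aside: a minimal sketch checking three residuals by hand, assuming the usual
* Poisson formulas: Pearson = (y-mu)/sqrt(mu); Stata's standardized version
* divides by sqrt(1-h_ii); deviance = sign(y-mu)*sqrt(2*(y*ln(y/mu)-(y-mu))),
* with y*ln(y/mu) taken as 0 when y = 0.
* The names pearchk, pearstarchk and devchk are hypothetical and dropped afterwards.
generate pearchk = raw/sqrt(mu)
generate pearstarchk = pearchk/sqrt(1 - hat)
generate devchk = sign(raw)*sqrt(2*(cond(NUMBIDS > 0, NUMBIDS*ln(NUMBIDS/mu), 0) - raw))
summarize pear pearchk pearstar pearstarchk dev devchk
drop pearchk pearstarchk devchk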

*** TABLE 5.4: DESCRIPTIVE STATISTICS FOR VARIOUS RESIDUALS

summarize raw pear pearstar dev devstar devadj anscombe
tabstat raw pear pearstar dev devstar devadj anscombe, ///
  statistics(mean sd skew kurt min p10 p90 max) col(stat) format(%9.2f)

*** TABLE 5.5: CORRELATIONS OF VARIOUS RESIDUALS

correlate raw pear pearstar dev devstar devadj anscombe

*** RESIDUAL PLOTS (several)

* Anscombe residual plotted against y
label variable anscombe "Anscombe residual"
graph twoway scatter anscombe NUMBIDS, msize(medium) xlabel(#6) saving(racd05graph1, replace)
* graph twoway (scatter anscombe NUMBIDS, msize(medium)) ///
*  (lowess anscombe NUMBIDS, lwidth(medthick)), xlabel(#6) saving(racd05graph1, replace)

* Anscombe residual plotted against fitted mean
label variable mu "Predicted bids"
graph twoway scatter anscombe mu, msize(medium) xlabel(#6) saving(racd05graph2, replace)
* graph twoway (scatter anscombe mu, msize(medium)) ///
*   (lowess anscombe mu, lwidth(medthick)), xlabel(#6) saving(racd05graph2, replace)

* Ordered anscombe residual plotted against standard normal ordinates 
* NOTE: Axes reversed from the First Edition
qnorm anscombe, msize(medium) xlabel(#6) saving(racd05graph3, replace)

* Diagonal entries in Hat matrix for each observation plotted against observation number
generate obsno = _n
label variable obsno "Observation number"
label variable hat "Diagonal entry in H"
graph twoway scatter hat obsno, msize(medium) xlabel(#6) saving(racd05graph4, replace)

*** FIGURE 5.1: RESIDUAL PLOTS

graph combine racd05graph1.gph racd05graph2.gph racd05graph3.gph racd05graph4.gph, ///
   iscale(0.7) ysize(5) xsize(6) rows(2)
graph export racd05fig1.eps, replace
capture graph export racd05fig1.wmf, replace  // WMF export is available only in Stata for Windows

* Identify and drop the observations with largest HAT matrix diagonal term
poisson NUMBIDS $XLIST, vce(robust)
estimates store PFULL
scalar kreg = e(k)
scalar Nobs = e(N)
list obsno hat if hat > 3*kreg/Nobs
poisson NUMBIDS $XLIST if hat < 3*kreg/Nobs, vce(robust) 
estimates store PNOOUTLIERS
estimates table PFULL PNOOUTLIERS, b(%9.3f) se stats(ll)

*************** 5.3.4 R-SQUARED and CHISQUARE GOODNESS-OF-FIT

*** Deviance, Pearson and R-squared measures presented in text
* Fitted model
glm NUMBIDS $XLIST, family(poisson) vce(robust)
display "Deviance Statistic = " e(deviance)
display "Pearson Statistic  = " e(deviance_p)
scalar Devfitted = e(deviance)
scalar Pearsfitted = e(deviance_p)
* Intercept-only model
quietly glm NUMBIDS, family(poisson) vce(robust)
display "Deviance Statistic = " e(deviance)
display "Pearson Statistic  = " e(deviance_p)
scalar Devintercept = e(deviance)
scalar Pearsintercept = e(deviance_p)
* Calculate R-squared Deviance and Pearson
scalar R2_Dev = 1 - Devfitted/Devintercept
scalar R2_Pears = 1 - Pearsfitted/Pearsintercept
display "Deviance R-squared = " R2_Dev "   Fitted = " Devfitted "   Intercept = " Devintercept 
display "Pearson R-squared  = " R2_Pears "   Fitted = " Pearsfitted "   Intercept = " Pearsintercept
* Squared correlation coefficient
capture drop mu
quietly poisson NUMBIDS $XLIST, vce(robust)
predict mu, n
quietly correlate NUMBIDS mu
display "Squared correlation coefficient = " r(rho)^2
* Compare to OLS
quietly regress NUMBIDS $XLIST
display "OLS R-squared = " e(r2)
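
* Aside: a minimal sketch of the deviance R-squared computed from log likelihoods,
* using the Poisson identity deviance = 2*(ll_saturated - ll_fitted), so that
* R2_Dev = (ll_fitted - ll_intercept)/(ll_saturated - ll_intercept).
* The names llfit, llnull, llsat, llsat_i and R2_Dev_ll are hypothetical and used only here.
quietly poisson NUMBIDS $XLIST
scalar llfit = e(ll)
quietly poisson NUMBIDS
scalar llnull = e(ll)
* Saturated log likelihood sets mu = y for each observation (y*ln(y) = 0 when y = 0)
generate llsat_i = cond(NUMBIDS > 0, NUMBIDS*ln(NUMBIDS), 0) - NUMBIDS - lnfactorial(NUMBIDS)
quietly summarize llsat_i
scalar llsat = r(sum)
scalar R2_Dev_ll = (llfit - llnull)/(llsat - llnull)
display "Deviance R-squared from log likelihoods = " R2_Dev_ll
drop llsat_i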

*** Predicted Probabilities and begin Chi-square Goodness-of-fit test

** In January 2013 there was a forthcoming Stata Journal article and
** user-written add-on to implement the chi-square goodness-of-fit test.

* This program is written for categories j = 0, 1, 2, ..., $REST or more
global Y NUMBIDS
global MAXCOUNT = 4        // Form cells y = 0, 1, 2, ... , maxcount
global REST = 5            // The remaining category y >= $REST
* Create indicators for y = 0, 1, 2, ...., maxcount and y >= $REST
forvalues i = 0/$MAXCOUNT {
   generate Dummy`i' = $Y==`i'
   }
generate Dummy$REST = $Y > $MAXCOUNT
* Create corresponding predicted probabilities of y = 0, 1, 2, ...
quietly poisson $Y $XLIST
forvalues i = 0/$MAXCOUNT {
   predict Predicted`i', pr(`i')
   }
predict Predicted$REST, pr($REST,.)
* The preceding required Stata 12. Could instead use user-written addon countfit
* or use recursion for Poisson probabilities as follows ..
/*
quietly poisson $Y $XLIST
capture drop mu
predict mu, n
generate Predicted0 = exp(-mu)
forvalues i = 1/$MAXCOUNT {
   local j = `i' - 1
   generate Predicted`i' = Predicted`j'*mu/`i'
   }
generate Predicted$REST = 1
forvalues i = 0/$MAXCOUNT {
   replace Predicted$REST = Predicted$REST - Predicted`i'
   }
*/
* Create differences between actual and predicted
forvalues i = 0/$REST {
   generate Difference`i' = Dummy`i' - Predicted`i'
   }

*** TABLE 5.6: ACTUAL AND PREDICTED FREQUENCIES

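* Name the variable groups explicitly so that no other variable beginning with P or D is picked up
summarize Predicted* Dummy* Difference*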
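
* Aside: a minimal sketch that converts the sample means above into frequencies
* by summing each indicator and each predicted probability over observations.
* The last cell is y >= $REST. The local macro name actual is hypothetical.
forvalues i = 0/$REST {
   quietly summarize Dummy`i'
   local actual = r(sum)
   quietly summarize Predicted`i'
   display "cell " `i' "   actual = " `actual' "   predicted = " %6.2f r(sum)
   }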

*** Continue Chi-square Goodness-of-fit test

* Obtain the scores to be used later
generate score = $Y - mu
foreach var of varlist $XLIST {
   generate scorefor`var' = score*`var'
   }
* Run the auxiliary regression
generate ones = 1
quietly regress ones Difference* score scorefor*, noconstant
scalar CGOF = e(N)*e(r2)
* Degrees of freedom = number of cells minus one = $REST
di "Chi-square GOF Test: " CGOF     "  p-value: " chi2tail($REST,CGOF)

* Compare to Stata user-written command countfit
countfit NUMBIDS $XLIST, maxcount(10) prm nograph noestimates nofit

* Aside: Stata command estat gof is a quite different test
* of whether the deviance statistic is statistically different from chisquare(n-k)
quietly glm NUMBIDS $XLIST, family(poisson) vce(robust)
display chi2tail((e(N)-e(k)),e(deviance))
quietly poisson NUMBIDS $XLIST, vce(robust)
estat gof

* Classification table (confusion matrix)
* Find the modal count for each observation (i.e. the k that maximizes Pr[y = k])
generate mode = 0
forvalues i = 1/5 {
   local j = `i' - 1
   quietly replace mode = `i'   if Predicted`i'> Predicted`j'
   }
* Compare the actual count to the predicted mode 
generate NUMBIDSgrouped = NUMBIDS
replace NUMBIDSgrouped = $REST if NUMBIDS > $REST
tabulate NUMBIDSgrouped mode
tabulate mode
count if NUMBIDSgrouped == mode
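* Aside: a minimal sketch expressing the count above as a proportion of the sample,
* assuming no observations have missing NUMBIDS
quietly count if NUMBIDSgrouped == mode
display "Proportion correctly classified = " %5.3f r(N)/_N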

*************** 5.4.2 NON-NESTED MODELS: AIC, BIC and Vuong TEST

* Poisson
poisson NUMBIDS $XLIST
estat ic
estimates store POISSON
* Hurdle logit / Poisson
hplogit NUMBIDS $XLIST
estat ic
estimates store PHURDLE
* ZIP
quietly zip NUMBIDS $XLIST, inflate($XLIST)
estat ic
estimates store ZIP

*** VUONG TEST presented in text
zip NUMBIDS $XLIST, inflate($XLIST) vuong

*** TABLE 5.7: AIC and BIC

* Does not list coefficients of all the regressors 
estimates table POISSON PHURDLE ZIP, b(%9.1f) keep(LEGLREST) ///
   stats(N k ll aic bic) equations(1)
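
* Aside: a minimal sketch recomputing AIC and BIC for the Poisson model by hand,
* assuming the definitions AIC = -2*ll + 2*k and BIC = -2*ll + k*ln(N),
* with k taken from e(rank). The scalar names aicchk and bicchk are hypothetical.
quietly estimates restore POISSON
scalar aicchk = -2*e(ll) + 2*e(rank)
scalar bicchk = -2*e(ll) + e(rank)*ln(e(N))
display "AIC = " aicchk "   BIC = " bicchk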

********** CLOSE OUTPUT

log close
