* racd04.do  January 2013 for Stata version 12

capture log close
log using racd04.txt, text replace

********** OVERVIEW OF racd04.do **********

* STATA Program 
* copyright C 2013 by A. Colin Cameron and Pravin K. Trivedi 
* used for "Regression Analysis of Count Data" SECOND EDITION
* by A. Colin Cameron and Pravin K. Trivedi (2013)
* Cambridge University Press

* This STATA program analyzes patents data for chapter 4
*   4.2   TWO CROSSINGS THEOREM
*   4.8.1 MIXTURE OF 2 POISSON
*   4.8.8 EXAMPLE: PATENTS

* To run you need file
*   racd09data.dta
* and user-written Stata addon
*   fmm
* in your directory

********** SETUP **********

set more off
version 12
clear all
* set linesize 82
set scheme s1mono  /* Graphics scheme */

********** DATA DESCRIPTION

*  The original data is from 
*  Bronwyn Hall, Zvi Griliches, and Jerry Hausman (1986), 
*  "Patents and R&D: Is There a Lag?", 
*  International Economic Review, 27, 265-283.
*  See this article for more detailed discussion 
*  Also see racd09makedata.do for further details  
*  NOTE: Here we use just 1979 data 

********** 4.2 TWO CROSSINGS THEOREM

* Poisson with mean mu = 10 
* Negative binomial with mean mu = 10 and alpha = 0.2
* so 1/alpha = 1/.2 = 5 and 1/(1+alpha*mu) = 1/(1+0.2*10) = 1/3
* and alpha*mu/(1+alpha*mu) = 0.2*10/(1+0.2*10) = 2/3

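* ASIDE: The following gave an unexpected result
* Stata function nbinomialp(n,k,p) returns the probability of 
* observing exactly k failures before the nth success,
* where p is the probability of success on one trial
* Setting p = a*mu/(1+a*mu) = 2/3 = 0.667 and n = 1/a = 1/.2 = 5,
* generate ynegbin = nbinomialp(5,k,0.6666666) gave mean 2.5, var = 3.75
* These equal n(1-p)/p = 2.5 and n(1-p)/p^2 = 3.75, so matching the NB above
* requires the success probability p = 1/(1+a*mu) = 1/3 instead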

set obs 21
generate k = _n - 1
generate prob_poisson = exp(-10)*(10^k)/exp(lngamma(k+1))
generate check_poisson = poissonp(10,k)
generate prob_nb = exp(lngamma(k+5)-lngamma(k+1)-lngamma(5))*((1/3)^5)*((2/3)^k)
generate badprob_nb = nbinomialp(5,k,0.6666666)
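* Cross-check (a sketch, not part of the original program): the hand-coded
* pmfs should agree with Stata's built-in functions. check_nb is a name
* introduced here for illustration; it assumes the third argument of
* nbinomialp() is the success probability 1/(1+alpha*mu) = 1/3.
generate check_nb = nbinomialp(5,k,1/3)
assert reldif(prob_poisson, check_poisson) < 1e-6
assert reldif(prob_nb, check_nb) < 1e-6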
list, clean

graph twoway (line prob_nb k, connect(stairstep))                         ///
  (line prob_poisson k, connect(stairstep) lstyle(p3)), scale(1.2)        ///
  legend( ring(0) rows(2) pos(1) label(1 "NB {&mu} = 10 {&alpha} = .2")   ///
  label(2 "Poisson {&mu} = 10")) ytitle("Probability that Y = y") xtitle("y")

*** FIGURE 4.1: TWO CROSSINGS THEOREM

graph export racd04fig1.eps, replace
graph export racd04fig1.wmf, replace
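
* Numerical check of the two crossings property (an illustrative sketch;
* nb_above and crossing are names introduced here, not in the original).
* The NB pmf should exceed the Poisson pmf in both tails and lie below it
* in the middle, so the indicator should switch exactly twice over k = 0,...,20.
generate byte nb_above = prob_nb > prob_poisson
generate byte crossing = (nb_above != nb_above[_n-1]) if _n > 1
list k prob_nb prob_poisson nb_above if crossing == 1, clean
count if crossing == 1
assert r(N) == 2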

**********   4.8.1 MIXTURE OF 2 POISSON

clear
set obs 100000
set seed 10101
generate xpmix = rpoisson(0.2)
replace xpmix = rpoisson(6) if runiform() > 0.5
label variable xpmix "X is .5xP[0.2] + .5xP[6.0]"
tabulate xpmix
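
* Sanity check of the simulated mixture (a sketch added for illustration).
* For a 50/50 mixture of Poisson(0.2) and Poisson(6):
*   mean     = .5*0.2 + .5*6                   = 3.1
*   E[X^2]   = .5*(0.2 + 0.2^2) + .5*(6 + 6^2) = 21.12
*   variance = 21.12 - 3.1^2                   = 11.51
* The variance far exceeds the mean, so the mixture is overdispersed.
summarize xpmix
display "Sample mean     = " r(mean) "  (theory 3.1)"
display "Sample variance = " r(Var)  "  (theory 11.51)"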

* For appearance drop a few of the largest values
histogram xpmix if xpmix < 17, discrete scale(1.2)

*** FIGURE 4.2: 50/50 MIXTURE OF POISSONS

graph export racd04fig2.eps, replace
graph export racd04fig2.wmf, replace

********** 4.8.8 EXAMPLE: PATENTS

use racd09data.dta, clear

* Create log of total R&D over the current and preceding five years
generate LOGRandD = ln(exp(LOGR)+exp(LOGR1)+exp(LOGR2)+exp(LOGR3)+exp(LOGR4)+exp(LOGR5))

* Use only 1979 data
keep if YEAR==5

* Regressor list
global XLIST LOGRandD LOGK SCISECT

* Variable descriptions and summary statistics
describe PAT $XLIST
summarize PAT $XLIST

*** TABLE 4.2: FREQUENCIES OF PAT (with grouping)

recode PAT (0=0) (1=1) (2/5=2) (6/20=6) (21/50=31) (51/100=51)  ///
   (101/200=101) (201/300=201) (301/600=301), gen(PATgrouped)
tabulate PATgrouped

* Poisson
poisson PAT $XLIST, vce(robust)
estimates store POISS

* NB models: NB1 and NB2
nbreg PAT $XLIST, dispersion(constant) vce(robust)
estimates store NB1
nbreg PAT $XLIST, vce(robust)
estimates store NB2

* PIG model
* Not estimated here, though ideally it would be included.

* Finite mixture models: NB1 and NB2
fmm PAT $XLIST, components(2) mixtureof(negbin1) vce(robust)
estimates store FMMNB1

fmm PAT $XLIST, components(2) mixtureof(negbin2) vce(robust)
estimates store FMMNB2
predict mu1, equation(component1)
predict mu2, equation(component2)
summarize mu*

* estimates table (including NB1 and FMMNB1)
estimates table POISS NB1 NB2 FMMNB1 FMMNB2, b(%9.3f) se stats(ll aic bic N k) equations(1)

*** TABLE 4.3: MODEL ESTIMATES

estimates table POISS NB2 FMMNB2, b(%9.3f) se stats(ll aic bic N k) equations(1)

********** CLOSE OUTPUT

* log close
* clear
* exit
