* racd13.do  January 2013 for Stata version 12

capture log close
log using racd13.txt, text replace

********** OVERVIEW OF racd13.do **********

* STATA Program 
* copyright C 2013 by A. Colin Cameron and Pravin K. Trivedi 
* used for "Regression Analysis of Count Data" SECOND EDITION
* by A. Colin Cameron and Pravin K. Trivedi (2013)
* Cambridge University Press

* This STATA program estimates measurement error models for chapter 13
*   13.7 SIMULATION EXAMPLE: POISSON WITH MISMEASURED REGRESSOR

* To run you need no data (the data are generated)
* but you do need the user-written Stata add-ons
*   rcal, simex, qvf, cme, and gllamm (cme is a wrapper for gllamm)
* in your directory

********** SETUP **********

set more off
version 12
clear all
* set linesize 82
set scheme s1mono  /* Graphics scheme */
* set maxvar 100 width 1000

************

* This STATA program estimates some measurement error models
*           TRUE: Poisson model with true regressor x1star
*          NAIVE: Poisson model with observed x1 with measurement error
*           RCAL: Regression calibration with duplicate observation z for x1star
*          SIMEX: SIMEX with duplicate observation z for x1star
*         NL2SLS: NLIV with instrument z for x1star
*       IVAPPROX: Carroll's IV with duplicate observation (instrument) z for x1star
* MLE STRUCTURAL: Maximum likelihood of normal structural model

* NOTE: SIMEX and MLE STRUCTURAL are commented out as they take a long time.
* Results from these commands are included in comments below.
* SIMEX can be sped up by reducing the number of bootstrap replications.
 
********** DATA DESCRIPTION

*  The data are generated
*    y ~ Poisson(exp(mu))
*    mu = 0 + 1*x1star + 1*x2   (mu is the linear index, as in the code below)

* For normally distributed data (standard deviations as used in the code below)
*    x1star ~ N[0, .4^2]
*    x1 = x1star + e1  where e1 ~ N[0, .4^2]
*    z  = x1star + e2  where e2 ~ N[0, .4^2]
*    x2 ~ N[0, .4^2]

* For rescaled chisquare data
*    x1star ~ (0.4/sqrt(2)) x (w-1) where w ~ chi2(1)
*    x1 = x1star + e1  where e1 ~ (0.4/sqrt(2)) x (w-1) where w ~ chi2(1)
*    z  = x1star + e2  where e2 ~ (0.4/sqrt(2)) x (w-1) where w ~ chi2(1)
*    x2 ~ (0.4/sqrt(2)) x (w-1) where w ~ chi2(1)

********** PART A: NORMALLY DISTRIBUTED DATA

clear
set obs 10000
set seed 10101

generate x1star = rnormal(0, .4)
generate e1 = rnormal(0, .4)
generate e2 = rnormal(0, .4)
generate x2 = rnormal(0, .4)
generate x1 = x1star + e1
generate z = x1star + e2
generate mu = 0 + 1*x1star + 1*x2
generate y = rpoisson(exp(mu))
generate ynox2 = rpoisson(exp(0 + 1*x1star))
generate ylinear = mu + rnormal(0,1)
generate xgamma = rgamma(1,1)
generate munegbin = xgamma*exp(mu)
generate ynegbin = rpoisson(munegbin)
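
* Sketch (not in book): sample check of the reliability ratio implied by the
* data generated above. x1star and e1 are both drawn with rnormal(0, .4), so
* var(x1star)/var(x1) should be close to 0.5. The scalar names are illustrative.
* No random numbers are drawn here, so later results are unaffected.
quietly summarize x1star
scalar vx1star = r(Var)
quietly summarize x1
scalar vx1 = r(Var)
scalar relx1 = scalar(vx1star)/scalar(vx1)
display "Sample reliability of x1 = " %5.3f scalar(relx1)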

/*
lowess ynox2 x1, msize(tiny) lineopts(lwidth(medthick)) lstyle(solid) saving(graph1, replace) xlabel(#6)
lpoly ynox2 x1star, msize(tiny) lineopts(lwidth(medthick)) saving(graph2, replace) xlabel(#6)
graph combine graph1.gph graph2.gph iscale(0.7) ysize(5) xsize(6) xcommon
*/

summarize
summarize y, detail
tabulate y

* Linear model with truth - not in book
regress ylinear x1star x2
* Linear model with observed - not in book
regress ylinear x1 x2
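
* Sketch (not in book): linear IV analogue of the correction, using the
* replicate z as an instrument for the mismeasured x1. Because e2 is generated
* independently of e1, the slope on x1 is expected to move back toward 1,
* unlike the OLS fit on x1 just above.
ivregress 2sls ylinear x2 (x1 = z)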

* TRUE: Poisson model with true regressor x1star
poisson y x1star x2
estimates store NTrue

* NAIVE: Poisson model with observed x1 with measurement error
poisson y x1 x2
estimates store NNaive
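
* Sketch (not in book): side-by-side view of the attenuation. Under the normal
* DGP, E[exp(x1star)|x1] is log-linear in x1 with slope equal to the
* reliability ratio, so the naive slope on x1 is expected to be near 0.5
* rather than 1 (the intercept shifts as well).
estimates table NTrue NNaive, b(%9.3f) se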

* RCAL: Regression calibration with duplicate observation z for x1star
* Standard errors vary with the seed
rcal (y = x2) (w1: x1 z), family(poisson) bstrap brep(400) seed(10101)
estimates store NRCAL
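
* Sketch (not in book): a simplified hand-rolled regression calibration, to
* illustrate the idea behind the rcal call above. This is NOT rcal's internal
* algorithm. It ignores x2 in the calibration step (x2 is generated
* independently here), and the Poisson standard errors ignore the fact that
* xhat is estimated. All new variable and scalar names are illustrative.
preserve
generate double wbar = (x1 + z)/2
generate double wdiff = x1 - z
* var(x1 - z) = var(e1) + var(e2), so half of it estimates one error variance
quietly summarize wdiff
scalar s2u = r(Var)/2
quietly summarize wbar
scalar mwbar = r(mean)
scalar vwbar = r(Var)
* reliability of the mean of the two measurements
scalar lamw = (scalar(vwbar) - scalar(s2u)/2)/scalar(vwbar)
* best linear predictor of x1star given wbar
generate double xhat = scalar(mwbar) + scalar(lamw)*(wbar - scalar(mwbar))
poisson y xhat x2
restore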

* SIMEX: SIMEX with duplicate observation z for x1star
* Commented out as takes a long time
* matrix theta=(0,.5,1,1.5,2,2.5,3,3.5)
* simex (y = x2) (w1: x1 z), family(poisson) theta(theta) median bstrap brep(400) seed(10101)
* Estimated coefficients vary with the seed
* simex (y = x2) (w1: x1 z), family(poisson) theta(theta) median bstrap brep(400) seed(12345)
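
* Sketch (not in book): a quicker trial run with fewer bootstrap replications,
* as suggested in the notes at the top of this file. The value brep(50) is an
* arbitrary illustration; standard errors will be noisier than with brep(400).
* Left commented out like the runs above; uncomment both lines to try it.
* matrix theta=(0,.5,1,1.5,2,2.5,3,3.5)
* simex (y = x2) (w1: x1 z), family(poisson) theta(theta) median bstrap brep(50) seed(10101)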

/* SIMEX Results. SIMEX commented out as takes a long time
. * SIMEX: SIMEX with duplicate observation z for x1star
. matrix theta=(0,.5,1,1.5,2,2.5,3,3.5)
. simex (y = x2) (w1: x1 z), family(poisson) theta(theta) median bstrap brep(400) seed(10101)
Estimated time to perform bootstrap: 15.5 minutes.
Simulation extrapolation                        No. of obs         =     10000
                                                Bootstraps reps    =       400
Residual df  =      9997                        Wald F(2,9997)     =   1501.29
                                                Prob > F           =    0.0000
Variance Function: V(u) = u                        [Poisson]
Link Function    : g(u) = ln(u)                    [Log]
------------------------------------------------------------------------------
             |              Bootstrap
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |   1.012512   .0240066    42.18   0.000     .9654542     1.05957
          w1 |    .858099   .0238988    35.91   0.000     .8112526    .9049454
       _cons |   .0062725    .009817     0.64   0.523    -.0129708    .0255157
------------------------------------------------------------------------------
*/

* NL2SLS: NLIV with instrument z for x1star
gmm (y - exp({xb:x1 x2} + {b0})), instruments(z x2) onestep
estimates store NNL2SLS

* IVAPPROX: Carroll's IV with duplicate observation (instrument) z for x1star
qvf y x1 x2 (z x2), family(poisson)
estimates store NIVApprox

* MLE STRUCTURAL: Maximum likelihood of normal structural model
* Takes a long time to run so commented out
* cme y x2 (x1true: x1 z), link(log) family(poisson) robust

/* CME RESULTS. CME commented out as takes a long time
. * MLE STRUCTURAL: Maximum likelihood of normal structural model
. cme y x2 (x1true: x1 z), link(log) family(poisson) robust
Running adaptive quadrature
Iteration 0:    log likelihood = -29442.215
Iteration 1:    log likelihood = -28836.534
Iteration 2:    log likelihood =  -28802.87
Iteration 3:    log likelihood = -28802.777
Iteration 4:    log likelihood = -28802.777
Adaptive quadrature has converged, running Newton-Raphson
Iteration 0:   log likelihood = -28802.777  
Iteration 1:   log likelihood = -28802.777  
Iteration 2:   log likelihood = -28802.776  
gllamm covariate measurement error model           No. of obs      = 10000
log likelihood = -28802.776
OUTCOME MODEL
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
          x2 |    1.01163   .0248953    40.64   0.000     .9628365    1.060424
      x1true |   .9904211     .02921    33.91   0.000     .9331706    1.047672
       _cons |  -.0026743   .0110999    -0.24   0.810    -.0244297    .0190811
------------------------------------------------------------------------------
TRUE COVARIATE MODEL
------------------------------------------------------------------------------
      x1true |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1true       |
          x2 |  -.0033133   .0122784    -0.27   0.787    -.0273785     .020752
       _cons |   -.006885   .0048946    -1.41   0.160    -.0164782    .0027082
-------------+----------------------------------------------------------------
   res. var. |   .1603168    .003648                      .1532465    .1675466
------------------------------------------------------------------------------
MEASUREMENT MODEL
------------------------------------------------------------------------------
  error var. |   .1587588   .0022676                      .1543761     .163266
 reliability |   .5024414   .0075969                      .4875535    .5173258
------------------------------------------------------------------------------
*/

*** TABLE 13.1 RESULTS: PANEL A: POISSON MODEL AND NORMAL DGP

estimates table NTrue NNaive NRCAL NNL2SLS NIVApprox, b(%9.3f) se equations(1)


********** PART B: RESCALED CHISQUARE DISTRIBUTED DATA

clear
set obs 10000
set seed 10101

generate x1star = 0.4*(rchi2(1)-1)/sqrt(2)
generate e1 = 0.4*(rchi2(1)-1)/sqrt(2)
generate e2 = 0.4*(rchi2(1)-1)/sqrt(2)
generate x2 = 0.4*(rchi2(1)-1)/sqrt(2)

generate x1 = x1star + e1
generate z = x1star + e2
generate mu = 0 + 1*x1star + 1*x2
generate y = rpoisson(exp(mu))
generate ynox2 = rpoisson(exp(0 + 1*x1star))
generate ylinear = mu + rnormal(0,1)
generate xgamma = rgamma(1,1)
generate munegbin = xgamma*exp(mu)
generate ynegbin = rpoisson(munegbin)
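
* Sketch (not in book): check that the rescaled chi-square draw has mean near 0
* and standard deviation near 0.4 but, unlike Part A, is right-skewed.
* No random numbers are drawn here, so later results are unaffected.
quietly summarize x1star, detail
display "x1star: mean = " %6.3f r(mean) "  sd = " %6.3f r(sd) "  skewness = " %6.3f r(skewness)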

/*
lowess ynox2 x1, msize(tiny) lineopts(lwidth(medthick)) lstyle(solid) saving(graph1, replace) xlabel(#6)
lpoly ynox2 x1star, msize(tiny) lineopts(lwidth(medthick)) saving(graph2, replace) xlabel(#6)
graph combine graph1.gph graph2.gph iscale(0.7) ysize(5) xsize(6) xcommon
*/

summarize
summarize y, detail
tabulate y

* Linear model with truth - not in book
regress ylinear x1star x2
* Linear model with observed - not in book
regress ylinear x1 x2

* TRUE: Poisson model with true regressor x1star
poisson y x1star x2
estimates store CTrue

* NAIVE: Poisson model with observed x1 with measurement error
poisson y x1 x2
estimates store CNaive

* RCAL: Regression calibration with duplicate observation z for x1star
* Standard errors vary with the seed
rcal (y = x2) (w1: x1 z), family(poisson) bstrap brep(400) seed(10101)
estimates store CRCAL

* SIMEX: SIMEX with duplicate observation z for x1star
* matrix theta=(0,.5,1,1.5,2,2.5,3,3.5)
* simex (y = x2) (w1: x1 z), family(poisson) theta(theta) median bstrap brep(400) seed(10101)
* Estimated coefficients vary with the seed
* simex (y = x2) (w1: x1 z), family(poisson) theta(theta) median bstrap brep(400) seed(12345)

/* SIMEX Results. SIMEX commented out as takes a long time
* SIMEX: SIMEX with duplicate observation z for x1star
. simex (y = x2) (w1: x1 z), family(poisson) theta(theta) median bstrap brep(400) seed(10101)
Estimated time to perform bootstrap: 15.5 minutes.
Simulation extrapolation                        No. of obs         =     10000
                                                Bootstraps reps    =       400
Residual df  =      9997                        Wald F(2,9997)     =   3289.04
                                                Prob > F           =    0.0000
Variance Function: V(u) = u                        [Poisson]
Link Function    : g(u) = ln(u)                    [Log]
------------------------------------------------------------------------------
             |              Bootstrap
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |    .983736   .0178388    55.15   0.000     .9487683    1.018704
          w1 |    .956486   .0147981    64.64   0.000     .9274788    .9854933
       _cons |   .0078898   .0106569     0.74   0.459    -.0129999    .0287794
------------------------------------------------------------------------------
*/

* NL2SLS: NLIV with instrument z for x1star
gmm (y - exp({xb:x1 x2} + {b0})), instruments(z x2) onestep
estimates store CNL2SLS

* IVAPPROX: Carroll's IV with duplicate observation (instrument) z for x1star
qvf y x1 x2 (z x2), family(poisson)
estimates store CIVApprox

* MLE STRUCTURAL: Maximum likelihood of normal structural model
* Takes a long time to run so commented out
* cme y x2 (x1true: x1 z), link(log) family(poisson) robust

/* CME RESULTS. CME commented out as takes a long time
. * MLE STRUCTURAL: Maximum likelihood of normal structural model
. cme y x2 (x1true: x1 z), link(log) family(poisson) robust
Running adaptive quadrature
Iteration 0:    log likelihood = -29704.418
Iteration 1:    log likelihood = -29107.377
Iteration 2:    log likelihood = -29037.282
Iteration 3:    log likelihood = -29036.541
Iteration 4:    log likelihood = -29036.541
Adaptive quadrature has converged, running Newton-Raphson
Iteration 0:   log likelihood = -29036.541  
Iteration 1:   log likelihood = -29036.541  (backed up)
Iteration 2:   log likelihood = -29036.538  
Iteration 3:   log likelihood = -29036.538  
gllamm covariate measurement error model           No. of obs      = 10000
log likelihood = -29036.538
OUTCOME MODEL
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
          x2 |   .9698153   .0157735    61.48   0.000     .9388999    1.000731
      x1true |   1.213485      .0215    56.44   0.000     1.171346    1.255624
       _cons |  -.0178152   .0115338    -1.54   0.122     -.040421    .0047907
------------------------------------------------------------------------------
TRUE COVARIATE MODEL
------------------------------------------------------------------------------
      x1true |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1true       |
          x2 |   .0253178    .012859     1.97   0.049     .0001147     .050521
       _cons |    .001485   .0049441     0.30   0.764    -.0082053    .0111753
-------------+----------------------------------------------------------------
   res. var. |   .1658849   .0065873                      .1532253    .1790469
------------------------------------------------------------------------------
MEASUREMENT MODEL
------------------------------------------------------------------------------
  error var. |   .1563348    .004127                      .1484518    .1646365
 reliability |   .5148191   .0124117                      .4904801     .539103
------------------------------------------------------------------------------
*/

*** TABLE 13.1 RESULTS: PANEL B: POISSON MODEL AND CHISQUARE DGP

estimates table CTrue CNaive CRCAL CNL2SLS CIVApprox, b(%9.3f) se equations(1)
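
* Sketch (not in book): compare the naive and RCAL fits across the two DGPs.
* Assumes the Part A results (N*) are still stored: the plain -clear- at the
* start of Part B drops the data but should leave stored estimates in place.
estimates table NNaive CNaive NRCAL CRCAL, b(%9.3f) se equations(1)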

/* NOT IN BOOK .....

* REPEAT FOR NEGATIVE BINOMIAL D.G.P. - BUT USING POISSON METHODS

* Negative binomial model with truth
nbreg ynegbin x1star x2
* Negative binomial model with observed
nbreg ynegbin x1 x2

* Regression calibration with duplicate observation z for x1star
rcal (ynegbin = x2) (w1: x1 z), family(poisson) bstrap brep(100)

* SIMEX with duplicate observation z for x1star
matrix theta=(0,.5,1,1.5,2,2.5,3,3.5)
simex (ynegbin = x2) (w1: x1 z), family(poisson) theta(theta) median bstrap brep(10)

* Carroll's IV with duplicate observation (instrument) z for x1star
qvf ynegbin x1 x2 (z x2), family(poisson)

* NLIV with instrument z for x1star
gmm (ynegbin - exp({xb:x1 x2} + {b0})), instruments(z x2) onestep

* REPEAT FOR NEGATIVE BINOMIAL - BUT USING NBINOMIAL

* Negative binomial model with truth
nbreg ynegbin x1star x2
* Negative binomial model with observed
nbreg ynegbin x1 x2

* Regression calibration with duplicate observation z for x1star
rcal (ynegbin = x2) (w1: x1 z), family(nbinomial) bstrap brep(100)

* SIMEX with duplicate observation z for x1star
matrix theta=(0,.5,1,1.5,2,2.5,3,3.5)
simex (ynegbin = x2) (w1: x1 z), family(nbinomial) theta(theta) median bstrap brep(10)

* Carroll's IV with duplicate observation (instrument) z for x1star
qvf ynegbin x1 x2 (z x2), family(nbinomial)

*/

********** CLOSE OUTPUT

* log close
* clear
* exit
