------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma05p1mle.txt
  log type:  text
 opened on:  17 May 2005, 13:48:11

. 
. ********** OVERVIEW OF MMA05P1MLE.DO **********
. 
. * STATA Program 
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi 
. * used for "Microeconometrics: Methods and Applications" 
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press 
. 
. * Chapter 5.9 pp.159-63 
. * Maximum likelihood analysis.
. 
. * Provides first two columns of Table 5.7
. *   (1) OLS      using Stata command regress 
. *   (2) MLE      using Stata command exp for exponential MLE  
. *   (3) MLE      using Stata command ml for user-provided log-likelihood
. * using generated data (see below)
. 
. * Related programs:
. *   mma05p2nls.do          NLS, WNLS, FGNLS for same data using nl command
. *   mma05p3nlsbyml.do      NLS, WNLS, FGNLS for same data using ml command
. *   mma05p4margeffects.do  Calculates marginal effects     
. 
. ********** SETUP **********
. 
. set more off

. version 8

. 
. ********** GENERATE DATA and SUMMARIZE **********
. 
. * Model is  y ~ exponential(exp(a + bx))
. *           x ~ N[mux, sigx^2]
. *        f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *      lnf(y) = (a + bx) - y*exp(a + bx)
. *        E[y] = exp(-(a + bx))    note sign reversal for the mean
. *        V[y] = exp(-(a + bx)) = E[y]^2
. 
. * The dgp sets particular values of a, b, mux and sigx
. * Here a = 2, b = -1  and  x ~ N[1, 1]
. scalar a = 2

. scalar b = -1

. scalar mux = 1  

. scalar sigx = 1 

. 
. * Set the sample size. Table 5.7 uses N=10,000 
. set obs 10000
obs was 0, now 10000

. 
. * Generate x and y
. set seed 2003

. gen x = mux + sigx*invnorm(uniform()) 

. gen lamda = exp(a + b*x)

. gen Ey = 1/lamda

. * To generate exponential with mean mu=Ey use
. *   Integral 0 to a of (1/mu)exp(-x/mu) dx   by change of variables
. * = Integral 0 to a/mu of exp(-t)dt
. * = incomplete gamma function P(0,a/mu) in the terminology of Stata
. gen y = Ey*invgammap(1,uniform())

. gen lny = ln(y)

. gen lnfy = ln(lamda) - y*lamda

. * twoway scatter Ey x
. 
. * Descriptive Statisitcs
. describe

Contains data
  obs:        10,000                          
 vars:             6                          
 size:       280,000 (97.3% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
x               float  %9.0g                  
lamda           float  %9.0g                  
Ey              float  %9.0g                  
y               float  %9.0g                  
lny             float  %9.0g                  
lnfy            float  %9.0g                  
-------------------------------------------------------------------------------
Sorted by:  
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |     10000    1.014313    1.004905  -2.895741   4.994059
       lamda |     10000    4.457478    5.939084   .0500838   133.7191
          Ey |     10000    .6185677    .8294007   .0074784   19.96655
           y |     10000    .6194352    1.291416   .0000445   30.60636
         lny |     10000   -1.554348     1.62358  -10.02114   3.421208
-------------+--------------------------------------------------------
        lnfy |     10000   -.0209485    1.419595   -7.52596   4.402257

. 
. ********** WRITE DATA TO A TEXT FILE **********
. 
. * Write data to a text (ascii) file
. * used for programs mma05p2nlsbyml.do, mma05p3nlsbynl.do 
. * and mma05p4margeffects.do
. * and can also use with programs other than Stata  
. outfile y x using mma05data.asc, replace

. 
. ********** DO THE ANALYSIS: OLS and MLE **********
. 
. ** (1) OLS ESTIMATION
. 
. * OLS is inconsistent in this example
. regress y x

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 3030.74
       Model |  3879.13606     1  3879.13606           Prob > F      =  0.0000
    Residual |  12796.7438  9998  1.27993037           R-squared     =  0.2326
-------------+------------------------------           Adj R-squared =  0.2325
       Total |  16675.8799  9999  1.66775476           Root MSE      =  1.1313

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .6198182   .0112587    55.05   0.000     .5977488    .6418876
       _cons |  -.0092545    .016075    -0.58   0.565    -.0407648    .0222558
------------------------------------------------------------------------------

. estimates store rols

. regress y x, robust

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  1,  9998) =  596.30
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2326
                                                       Root MSE      =  1.1313

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .6198182   .0253823    24.42   0.000     .5700638    .6695725
       _cons |  -.0092545   .0171978    -0.54   0.591    -.0429655    .0244566
------------------------------------------------------------------------------

. estimates store rolsrobust

. 
. ** (2) ML ESTIMATION USING STATA COMMAND FOR EXPONENTIAL MLE
. 
. * The following uses Stata duration model commands. 
. * First need to define the duration variable (here y)
. stset y

     failure event:  (assumed to fail at time=y)
obs. time interval:  (0, y]
 exit on or before:  failure

------------------------------------------------------------------------------
    10000  total obs.
        0  exclusions
------------------------------------------------------------------------------
    10000  obs. remaining, representing
    10000  failures in single record/single failure data
 6194.352  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  30.60636

. streg x, dist(exp) nohr

         failure _d:  1 (meaning all fail)
   analysis time _t:  y

Iteration 0:   log likelihood = -20754.005  
Iteration 1:   log likelihood = -17232.884  
Iteration 2:   log likelihood = -15760.556  
Iteration 3:   log likelihood = -15752.193  
Iteration 4:   log likelihood =  -15752.19  
Iteration 5:   log likelihood =  -15752.19  

Exponential regression -- log relative-hazard form 

No. of subjects =        10000                     Number of obs   =     10000
No. of failures =        10000
Time at risk    =  6194.352495
                                                   LR chi2(1)      =  10003.63
Log likelihood  =    -15752.19                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0098692  -100.27   0.000    -1.008971   -.9702842
       _cons |   1.982921   .0141496   140.14   0.000     1.955188    2.010654
------------------------------------------------------------------------------

. estimates store rexp

. streg x, dist(exp) nohr robust

         failure _d:  1 (meaning all fail)
   analysis time _t:  y

Iteration 0:   log pseudo-likelihood = -20754.005  
Iteration 1:   log pseudo-likelihood = -17232.884  
Iteration 2:   log pseudo-likelihood = -15760.556  
Iteration 3:   log pseudo-likelihood = -15752.193  
Iteration 4:   log pseudo-likelihood =  -15752.19  
Iteration 5:   log pseudo-likelihood =  -15752.19  

Exponential regression -- log relative-hazard form 

No. of subjects       =        10000               Number of obs   =     10000
No. of failures       =        10000
Time at risk          =  6194.352495
                                                   Wald chi2(1)    =   9914.62
Log pseudo-likelihood =    -15752.19               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0099388   -99.57   0.000    -1.009107   -.9701479
       _cons |   1.982921   .0144307   137.41   0.000     1.954637    2.011205
------------------------------------------------------------------------------

. estimates store rexprobust

. 
. ** (3) ML ESTIMATION USING STATA ML COMMAND
. 
. * For MLE computation can use the following Stata commands
. *   ml model lf      provide the log-density
. *   ml model D0      provide the log-likelihood 
. *   ml model D1      provide the log-likelihood and gradient
. *   ml model D2      provide the log-likelihood, gradient and hessian
. 
. * At a minimum need to provide 
. * (A) program define fcn     where fcn is the function name
. *        defines the log-density  (independent observations assumed)                    
. * (B) ml model lf fcn + some extras 
. *        the extras give the dependent variable and regressors
. * (C) ml maximize 
. *        obtains the mle
. * (D) ml model lf fcn + some extras, robust 
. *        provides robust sandwich standard errors
. 
. * Here we provide the log-density (ml model lf) as this is simplest,
. * and the Stata manual says that numerically only D2 is better.
. 
. * (A) Define the log-density
. *      lnf(y) = (a+bx) - y*exp(a+bx) = theta - y*exp(theta) where theta = x'b
. program define mleexp0
  1.   version 8.0
  2.   args lnf theta      /* Must use lnf while could use name other than theta */
  3.   quietly replace `lnf' = `theta' - $ML_y1*exp(`theta')
  4. end

. 
. * (B) Say that dependent variable is y and regressors are x plus a constant
. ml model lf mleexp0 (y = x)

. 
. * (C) Obtain the MLE
. ml search        /* Optional - can provide better starting values */
initial:       log likelihood = -6194.3525
improve:       log likelihood = -6194.3525
alternative:   log likelihood = -5212.7607
rescale:       log likelihood = -5212.7607

. ml maximize

initial:       log likelihood = -5212.7607
rescale:       log likelihood = -5212.7607
Iteration 0:   log likelihood = -5212.7607  
Iteration 1:   log likelihood = -1563.9176  
Iteration 2:   log likelihood =  -217.6055  
Iteration 3:   log likelihood = -208.73633  
Iteration 4:   log likelihood = -208.71383  
Iteration 5:   log likelihood = -208.71383  

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =   10054.85
Log likelihood = -208.71383                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0098692  -100.27   0.000    -1.008971   -.9702842
       _cons |   1.982921   .0141496   140.14   0.000     1.955188    2.010654
------------------------------------------------------------------------------

. estimates store rmle

. 
. * (D) Obtain robust standard errors 
. ml model lf mleexp0 (y = x), robust

. ml search
initial:       log pseudo-likelihood = -6194.3525
improve:       log pseudo-likelihood = -6194.3525
alternative:   log pseudo-likelihood = -5212.7607
rescale:       log pseudo-likelihood = -5212.7607

. ml maximize

initial:       log pseudo-likelihood = -5212.7607
rescale:       log pseudo-likelihood = -5212.7607
Iteration 0:   log pseudo-likelihood = -5212.7607  
Iteration 1:   log pseudo-likelihood = -1563.9176  
Iteration 2:   log pseudo-likelihood =  -217.6055  
Iteration 3:   log pseudo-likelihood = -208.73633  
Iteration 4:   log pseudo-likelihood = -208.71383  
Iteration 5:   log pseudo-likelihood = -208.71383  

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =    9914.62
Log pseudo-likelihood = -208.71383                Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0099388   -99.57   0.000    -1.009107   -.9701479
       _cons |   1.982921   .0144307   137.41   0.000     1.954637    2.011205
------------------------------------------------------------------------------

. estimates store rmlerobust

. 
. * (E) Calculate R-squared and log-likelihood at the ML estimates
. * lnL sums lnf(y) = ln(lamda) - y*lamda
. gen lamdaml = exp(_b[_cons] + _b[x]*x)

. gen lnfml = ln(lamdaml) - y*lamdaml

. quietly means lnfml

. scalar LLml = r(mean)*r(N)

. * R-squared = 1 - Sum_i(y_i - yhat_i)^2 / Sum_i(y_i - ybar)^2
. gen yhatml = 1/lamdaml

. egen ybar = mean(y)

. * quietly means y
. * scalar ybar = r(mean)
. gen y_yhatsqml = (y - yhatml)^2

. gen y_ybarsq = (y - ybar)^2

. quietly means y_yhatsqml

. scalar SSresidml = r(mean)

. quietly means y_ybarsq

. scalar SStotal = r(mean)

. scalar Rsqml = 1 - SSresidml/SStotal

. di LLml "    " Rsqml
-208.71383    .39062307

. 
. ********** DISPLAY RESULTS: First two columns of Table 5.7 p.161
. 
. * (1) OLS - nonrobust and robust standard errors 
. *     Here OLS is inconsistent. 
. *     And expect sign reversal for slope as in true model mean E[y] = exp(-x'b)
. estimates table rols rolsrobust, b(%10.4f) se(%10.4f) t stats(N ll r2) keep(_cons x)

----------------------------------------
    Variable |    rols      rolsrobust  
-------------+--------------------------
       _cons |    -0.0093      -0.0093  
             |     0.0161       0.0172  
             |      -0.58        -0.54  
           x |     0.6198       0.6198  
             |     0.0113       0.0254  
             |      55.05        24.42  
-------------+--------------------------
           N | 10000.0000   10000.0000  
          ll | -1.542e+04   -1.542e+04  
          r2 |     0.2326       0.2326  
----------------------------------------
                          legend: b/se/t

. 
. * (2) MLE by command ereg - nonrobust and robust standard errors 
. estimates table rexp rexprobust, b(%10.4f) se(%10.4f) t stats(N ll) keep(_cons x)

----------------------------------------
    Variable |    rexp      rexprobust  
-------------+--------------------------
       _cons |     1.9829       1.9829  
             |     0.0141       0.0144  
             |     140.14       137.41  
           x |    -0.9896      -0.9896  
             |     0.0099       0.0099  
             |    -100.27       -99.57  
-------------+--------------------------
           N | 10000.0000   10000.0000  
          ll | -1.575e+04   -1.575e+04  
----------------------------------------
                          legend: b/se/t

. 
. * (3) MLE by command ml - nonrobust and robust standard errors 
. estimates table rmle rmlerobust, b(%10.4f) se(%10.4f) t stats(N ll) keep(_cons x)

----------------------------------------
    Variable |    rmle      rmlerobust  
-------------+--------------------------
       _cons |     1.9829       1.9829  
             |     0.0141       0.0144  
             |     140.14       137.41  
           x |    -0.9896      -0.9896  
             |     0.0099       0.0099  
             |    -100.27       -99.57  
-------------+--------------------------
           N | 10000.0000   10000.0000  
          ll |  -208.7138    -208.7138  
----------------------------------------
                          legend: b/se/t

. * And ML log-likelihood (check) and R-squared (needed to be computed)
. di "Log likeihood for ML: " LLml
Log likeihood for ML: -208.71383

. di "R-squared for MLE:    " Rsqml
R-squared for MLE:    .39062307

. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma05p1mle.txt
  log type:  text
 closed on:  17 May 2005, 13:48:18
----------------------------------------------------------------------------------------------------
