------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma07p3montecarlo.txt
  log type:  text
 opened on:  18 May 2005, 11:28:58

. 
. ********** OVERVIEW OF MMA07P3MONTECARLO.DO **********
. 
. * STATA Program 
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi 
. * used for "Microeconometrics: Methods and Applications" 
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press 
. 
. * Chapter 7.7.1-7.7.5 pp. 250-4
. * Size and power of the Wald test
.  
. * (1) Figure 7.2 Density of Wald test statistic
. * (2) Table 7.2  Actual size of Wald test at various nominal sizes
. * (3) Table 7.2  Actual power of Wald test at various nominal sizes
. * (4) Table 7.2  Nominal power of Wald test at various nominal sizes
. * (5) Alternative way to simulate using postfile rather than simulate
. 
. * on the slope coefficient for a Probit model with simulated data (see below).
. 
. * NOTE: Because this is a simulation using many samples (here 10,000)
. * the generated data are not saved in a text file.
. 
. * Problem can arise if in one of the simulations all of sample is y=0 or y=1
. * Then the probit model is not estimable.
. * Then need increase sample size, change dgp or reduce number of simulations.
. * Here used N=40 with S=10000 for size and for power
. * Another possible change is to have same regressors x across simulations
. 
. ********** SETUP ********** 
. 
. set more off

. version 8.0

. set scheme s1mono  /* Graphics scheme */

. 
. ********** MONTE CARLO OVERVIEW ********** 
. 
. * The data generating process is 
. * - Probit with Pr[y=1] = Phi(b1 + b2*x2) 
. * - where b1 = 0 and b2 = 1
. * - and regressor x ~ N[0,1] is fixed throughout the simulations 
. 
. * The sample size N set below in the global numobs
. * The number of simulations S is set below in the global numsims
. * A third option is to switch to same x in each sample. This needs to be done manually.
. 
. * The simulation is done using stata command simulate
. * At the end of the program, an alternative using postfile is given
. 
. * The program investigates both size and power 
. * of the Wald test that b2 = 1.
. * For power the dgp instead uses b2 = 2.
. 
. ********** INITIAL SIMULATION SET UP ********** 
. 
. set seed 10101

. * Change the following for different sample size N
. global numobs "40"

. * Change the following for different number of simulations S
. global numsims "10000"

. 
. ****** ANALYSIS: SIMULATION OF PROBIT MODEL SLOPE ESTIMATES AND WALD TEST
. 
. * The program is rclass.
. * This means the results returned by the program are put into r( )
. * Here we return meany, vary, betahat, sebetahat, ztestforbetaeq1
. 
. * The probit model is Pr[y=1] = Phi(b1 + b2*x2) where b1=0 and b2=1
. * For size calculations:  b2 = 1
. * For power calculations: b2 = 1.5   (as an example)
. * So pass the argument trueb2 as an argument.
. 
. * The following three lines are only needed 
. * if the regressors are constant across simulations,
. * as then need to generate once and put in a data file to be reused.
. * They are commented out here as here (x,y) both resampled.
. * Also simprobit and simprobit2 need one line changed if x is fixed.
. /*
> set obs numobs     
> gen x = invnorm(uniform())
> save xforsim, replace
> */
. * This version of the program instead redraws both x and y in each simulation
. 
. * The program has one argument
. *   - trueb2 = value of b2 in the dgp 
. 
. program simprobit, rclass
  1.     version 8.0 
  2.     /* define arguments. Here trueb2 = b2 in Phi(b1 + b2*x2) */ 
.     args trueb2
  3.     /* Generate the data: here x and y */ 
.     drop _all
  4.     set obs $numobs
  5.     gen x = invnorm(uniform())
  6.     /* If instead want same x in each simulation,
>        replace above line with:  use xforsim */
.     gen y = 0
  7.     replace y = 1 if 0 + `trueb2'*x + invnorm(uniform()) > 0
  8.     /* Summarize the generated data as a check */
.     summarize y
  9.     return scalar ymean=r(mean) 
 10.     return scalar yvar=r(Var)
 11.     /* Do probit and store key results */
.     probit y x
 12.     return scalar b2hat=_b[x]
 13.     return scalar seb2hat = _se[x]
 14.     return scalar ztestforb2eq1 = (_b[x]-1)/_se[x]
 15. end

. 
. ****** (1) DISTRIBUTION OF WALD TEST STATISTIC (Figure 7.2 p.253)
. 
. * Now call the program simprobit where
. *  - include values for each argument within the quotes "   "
. *    (here the argument is b2true and is set to 1 for size and 1.5 for power)
. *  - make sure that ask for each of the returned results
. 
. * For size calculations set trueb2 = 1
. simulate "simprobit 1" ymean=r(ymean) yvar=r(yvar) b2hat=r(b2hat) /* 
>   */ seb2hat=r(seb2hat) ztestforb2eq1=r(ztestforb2eq1), reps($numsims) 

command:      simprobit 1
statistics:   ymean      = r(ymean)
              yvar       = r(yvar)
              b2hat      = r(b2hat)
              seb2hat    = r(seb2hat)
              ztestfor~1 = r(ztestforb2eq1)

. 
. * Summary of the results returned by simulate
. * For Wald test key output is ztestforb2eq1
. describe 

Contains data
  obs:        10,000                          simulate: simprobit 1
 vars:             5                          18 May 2005 11:29
 size:       240,000 (97.7% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ymean           float  %9.0g                  r(ymean)
yvar            float  %9.0g                  r(yvar)
b2hat           float  %9.0g                  r(b2hat)
seb2hat         float  %9.0g                  r(seb2hat)
ztestforb2eq1   float  %9.0g                  r(ztestforb2eq1)
-------------------------------------------------------------------------------
Sorted by:  

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ymean |     10000      .49946    .0794447       .225       .775
        yvar |     10000    .2499373    .0089917   .1788462   .2564103
       b2hat |     10000    1.133952    .4516738  -.0306482   9.389184
     seb2hat |     10000    .3589645    .1561059   .1902922   4.583915
ztestforb2~1 |     10000    .1141294    .9558451  -4.087344   2.278257

. 
. * For b2hat there are two ways to estimate the standard deviation.
. * One is the average of seb2hat, the standard error of b2hat
. * The other is the standard deviation of b2hat. 
. * These are equal asymptotically, but perhaps not in small samples due to bias. 
. * Also aveseb2hat is used later in calculating asymptotic power.
. sum seb2hat

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     seb2hat |     10000    .3589645    .1561059   .1902922   4.583915

. scalar aveseb2hat = r(mean)

. sum b2hat

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       b2hat |     10000    1.133952    .4516738  -.0306482   9.389184

. scalar stdevb2hat = r(sd)

. di "Average standard error of b2hat:  " aveseb2hat
Average standard error of b2hat:  .3589645

. di "Standard deviation of b2hat:      " stdevb2hat
Standard deviation of b2hat:      .45167383

. 
. * The Wald test statistic will be called Wald
. gen Wald = ztestforb2eq1

. label var Wald "Wald test statistic" 

. 
. * The mean and st.dev. should be 0 and 1 if Wald ~ N[0,1]
. sum Wald

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        Wald |     10000    .1141294    .9558451  -4.087344   2.278257

. 
. * The 2.5 and 97.5 percentiles should be -1.96 and 1.96 if Wald ~ N[0,1]
. * They can be used to get size-adjusted Wald test at 5 percent.
. _pctile Wald, p(2.5,99.5)

. display "Wald: Lower 2.5 percentile = " r(r1) "  Upper 2.5 percentile = " r(r2) 
Wald: Lower 2.5 percentile = -1.904708  Upper 2.5 percentile = 2.0034728

. 
. * The density of the simulated values of the Wald test should be 
. * a standard normal density if Wald ~ N[0,1]
. * The following plots kernel estimate of density of Wald and a N[0,1] density
. * Could also do Student[N-k] but this looks same as N[0,1] if N>=30.
. gen N01density = normden(Wald)

. sum Wald

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        Wald |     10000    .1141294    .9558451  -4.087344   2.278257

. 
. graph twoway (kdensity Wald, range(-3 3) clstyle(p1)) /*
>   */  (connect N01density Wald if Wald>-3 & Wald<3, clstyle(p2) sort(Wald) s(i)),  /*
>   */ scale (1.2) plotregion(style(none)) /*
>   */ title("Monte Carlo Simulations of Wald Test") /*
>   */ xtitle("Wald Test Statistic", size(medlarge)) xscale(titlegap(*5)) /* 
>   */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /* 
>   */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
>   */ legend( label(1 "Monte Carlo") label(2 "Standard Normal") /*
>   */         label(3 "Test size = 0.01")) 

. graph export ch7montecarlo.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch7montecarlo.wmf written in Windows Metafile format)

. 
. ****** (2) ACTUAL SIZE OF THE WALD TEST STATISTIC (Table 7.2, p.253)
. 
. * Obtain the size properties of a two-sided Wald test 
. * That rejects if |Wald| > z_alpha/2  where alpha = .01, .05, .1, .2
. 
. * Convert to two-sided test by taking absolute value
. gen absWald = abs(Wald)

. 
. * Give key percentiles of |Wald|
. * Percentiles must be in ascending order for Stata
. _pctile absWald, p(0.80,0.90,0.95,0.99)

. display "I[Upper percentiles of |Wald|: " " 1 " r(r4) " 5 " r(r3) " 10 " r(r2) " 20 " r(r1) 
I[Upper percentiles of |Wald|:  1 .0115847 5 .01074749 10 .00998338 20 .00923005

. 
. * Program to calculate actual size given nominal size
. * Temporary variables and scalars are in quotes ` '
. program size, rclass
  1.      version 8.0 
  2.      args nominalsize
  3.      tempvar reject
  4.      tempname normalcriticalvalue
  5.      quietly {
  6.         scalar `normalcriticalvalue' = invnorm(1-(`nominalsize'/2))
  7.         gen `reject' = 0
  8.         replace `reject' = 1 if absWald > `normalcriticalvalue'
  9.         summarize `reject'
 10.         return scalar actualsize = r(mean)
 11.      } 
 12. end

. 
. * Calculate actual size for nominal sizes 0.01, 0.05, 0.10 and 0.20
. size 0.01

. scalar actualsize01 = r(actualsize) 

. size 0.05

. scalar actualsize05 = r(actualsize) 

. size 0.10

. scalar actualsize10 = r(actualsize) 

. size 0.20

. scalar actualsize20 = r(actualsize) 

. 
. * Following gives Actual Size column of Table 7.2 (p.253)
. * Nominal Sizes and Actual Sizes of Two-sided Wald Test
. di "0.01: " actualsize01 _new "0.05: " actualsize05 _new /*
>   */  "0.10: " actualsize10 _new "0.20: " actualsize20 
0.01: .0053
0.05: .0294
0.10: .0805
0.20: .1922

. 
. ****** (3) ACTUAL POWER OF THE WALD TEST STATISTIC (Table 7.2, p.253)
. 
. * Consider power when b2 = 2 rather than 1
. 
. * Obtain the actual power by simulation 
. * Use the same program simprobit as for size,  
. * except the argument b2true is 2.0 rather than 1.0
. 
. drop _all

. 
. * For size calculations set trueb2 = 2
. simulate "simprobit 2" ymean=r(ymean) yvar=r(yvar) b2hat=r(b2hat) /* 
>   */ seb2hat=r(seb2hat) ztestforb2eq1=r(ztestforb2eq1), reps(10000) 

command:      simprobit 2
statistics:   ymean      = r(ymean)
              yvar       = r(yvar)
              b2hat      = r(b2hat)
              seb2hat    = r(seb2hat)
              ztestfor~1 = r(ztestforb2eq1)

. 
. * Calculate |Wald| 
. gen Wald = ztestforb2eq1
(71 missing values generated)

. gen absWald = abs(Wald)
(71 missing values generated)

. 
. summarize 

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ymean |      9929    .4998389    .0791531       .225       .825
        yvar |      9929     .249985    .0090933   .1480769   .2564103
       b2hat |      9929    2.581075     2.73046   .8547966   209.9805
     seb2hat |      9929    1.002628    5.799384   .2816004   540.1536
ztestforb2~1 |      9929    1.667773    .3853416  -.4042006    2.59991
-------------+--------------------------------------------------------
        Wald |      9929    1.667773    .3853416  -.4042006    2.59991
     absWald |      9929    1.668285     .383118   .0033462    2.59991

. 
. * Calculate actual power for nominal sizes 0.01, 0.05, 0.10 and 0.20
. * This can use the earlier program size
. size 0.01

. scalar actualpower01 = r(actualsize) 

. size 0.05

. scalar actualpower05 = r(actualsize) 

. size 0.10

. scalar actualpower10 = r(actualsize) 

. size 0.20

. scalar actualpower20 = r(actualsize) 

. 
. * Following gives Actual Power column of Table 7.2 (p.253)
. * Nominal Sizes and Actual Power of Two-sided Wald Test
. di "0.01: " actualpower01 _new "0.05: " actualpower05 _new /*
>   */ "0.10: " actualpower10 _new "0.20: " actualpower20 
0.01: .0073
0.05: .2257
0.10: .6077
0.20: .8583

. 
. ****** (4) ASYMPTOTIC POWER OF THE WALD TEST STATISTIC (Table 7.2, p.253)
. 
. * Consider power when b2 = 2 rather than 1
. 
. * Calculate asymptotic theoretical power using noncentral chisquare
. * Asymptotic power = Pr[W > chi-square(alpha) | W ~ noncentral chi-square(alpha,ncp)
. * The noncentrality parameter is 0.5*(delta^2)/(se[b2]^2)
. * Here size has b2 = 1 and power has b2 = 1+delta
. * So delta = b2true - 1. 
. * Need to find the standard error of b2. 
. * Use the average from earlier simulations.
. 
. * Program to calculate asymptotic power given nominal size
. * Temporary variables and scalars and arguments are in quotes ` '
. * invchi2tail gives cv such that Pr(Chi2 > cv) = nominalsize
. * Power is 1 minus cdf of noncentral chisquare 
. * nchi2 gives the cdf of noncentral chisquare
. 
. drop _all

. 
. * Arguments are alpha (size), lamda and df (degrees of freedom)
. program power, rclass
  1.      version 8.0 
  2.      args alpha lamda df
  3.      tempname criticalvalue powervianoncentralchi
  4.      quietly {
  5.         scalar `criticalvalue' = invchi2tail(`df',`alpha')
  6.         scalar `powervianoncentralchi' = 1-nchi2(`df',`lamda',`criticalvalue')  
  7.         return scalar asymppower = `powervianoncentralchi'
  8.      } 
  9. end

. 
. * scalar criticalvalue = invchi2tail(df,alpha)   
. * replace power = 1-nchi2(df,lamda,criticalvalue)  
. 
. * Calculate df and lamda.
. * This uses an estimate of se[beta] obtained earlier
. scalar delta = 1     /* Here 2 - 1. Changes for different alternatives */

. scalar lamda = 0.5*(delta*delta)/(aveseb2hat*aveseb2hat)

. scalar df = 1

. di "delta: " delta "  aveseb2hat: " aveseb2hat "  lamda: " lamda "  df: " df 
delta: 1  aveseb2hat: .3589645  lamda: 3.8803151  df: 1

. 
. * Calculate asymptotic power for nominal sizes 0.01, 0.05, 0.10 and 0.20
. power 0.01 lamda df

. scalar asymppower01 = r(asymppower) 

. power 0.05 lamda df

. scalar asymppower05 = r(asymppower) 

. power 0.10 lamda df

. scalar asymppower10 = r(asymppower) 

. power 0.20 lamda df

. scalar asymppower20 = r(asymppower) 

.    
. * Following gives Asymptotic Power column of Table 7.2 (p.253)
. * Nominal Sizes and Asymptotic Power of Two-sided Wald Test
. di "0.01: " asymppower01 _new "0.05: " asymppower05 _new /*
>   */ "0.10: " asymppower10 _new "0.20: " asymppower20 
0.01: .2722675
0.05: .50398701
0.10: .62755902
0.20: .75494224

. 
. ****** (5) ALTERNATIVE ANALYSIS: SIMULATION METHOD USING POSTFILE
. 
. * This is an alternative, given for completeness.
. * This fails if the model is not estimable in any of the simulation samples.
. * By contrast, simulate just drops that simulation sample and continues simulating.
. 
. * For each round of the simulation, the variables in `sim' are sent 
. * as a new line to a stata data set simprobitresults.
. * The names of these variables are given in quotes after S_1
. * Need as many names in quotes after S_1 as variables at post
. * Then can analyze these using summarize etcetera
. 
. * This program has two arguments
. *   - numsims = desired number of simulations
. *   - trueb2  = slope coefficient used to generate the data
. 
. drop _all 

. 
. program simprobit2
  1.      version 8.0 
  2.      args numsims trueb2 
  3.      tempname sim
  4.      postfile `sim' meany vary beta sterror ztestforbeta using probitsimresults, replace
  5.      quietly {
  6.         forvalues i = 1/`numsims' {
  7.             drop _all
  8.             set obs $numobs      /* may need to change */
  9.             gen x = invnorm(uniform())
 10.             /* If instead want same x in each simulation 
>                replace above line with:  use xforsim   */
.             gen y = 0
 11.             /* Use b2 = 1.0 for size and 1.5 for power */
.             replace y = 1 if 0+`trueb2'*x+invnorm(uniform()) > 0
 12.             summarize y
 13.             scalar meany=r(mean) 
 14.             scalar vary=r(Var)
 15.             probit y x
 16.             scalar beta=_b[x]
 17.             scalar sterror = _se[x]
 18.             scalar ztestforbeta = (beta-1)/sterror
 19.      post `sim' (meany) (vary) (beta) (sterror) (ztestforbeta)
 20.         }
 21.      } 
 22.      postclose `sim'
 23. end

. 
. simprobit2 $numsims 1

. use probitsimresults, clear

. 
. * Here we just summarize results for comparison with earlier
. * But could do the further analysis as above
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       meany |     10000    .4989575    .0791248       .225       .775
        vary |     10000    .2499885    .0090127   .1788462   .2564103
        beta |     10000    1.135003    .4315248   .0901358   7.205799
     sterror |     10000    .3583266     .133302   .1863547   3.360862
ztestforbeta |     10000    .1218973     .954814  -3.401833   2.299991

. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma07p3montecarlo.txt
  log type:  text
 closed on:  18 May 2005, 11:29:29
----------------------------------------------------------------------------------------------------