* MMA24P1OLSCLUSTER.DO May 2005 for Stata version 8.0
log using mma24p1olscluster.txt, text replace
********** OVERVIEW OF MMA24P1OLSCLUSTER.DO **********
* STATA Program
* copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
* used for "Microeconometrics: Methods and Applications"
* by A. Colin Cameron and Pravin K. Trivedi (2005)
* Cambridge University Press
* Chapter 24.7 pages 848-53 Table 24.4
* Cluster robust inference for OLS cross-section application using
* Vietnam Living Standards Survey data
* (0) Descriptive Statistics (Table 24.3 first half)
* (1) Linear regression (in logs) with household data (Table 24.4)
* For Tables 24.5-6 for clustered count data see MMA24P2POISCLUSTER.DO
* The cluster effects model is
* y_it = x_it'b + a_i + e_it
* Default xtreg output assumes e_it is iid.
* This is usually too strong an assumption.
* Instead one should obtain cluster-robust standard errors after xtreg
* See Section 21.2.3 pages 709-12
* Stata Version 8 does not do this but Stata version 9 does.
* Here we do a panel bootstrap - results not reported in the text
* To speed up programs reduce breps - the number of bootstrap reps
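* A sketch only, not run by this program: in later Stata versions (an
* assumption about the reader's installation) cluster-robust standard errors
* can be requested directly after declaring the cluster identifier, e.g.
*   xtset COMMUNE
*   xtreg LNEXP12M $XLISTLINEAR, fe vce(cluster COMMUNE)
* The variable names are those created by the rename commands in this program.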
* To run this program you need data set
* vietnam_ex1.dta
********** SETUP **********
set more off
version 8.0
set scheme s1mono /* Used for graphs */
********** DATA DESCRIPTION **********
* The data come from the World Bank 1997 Vietnam Living Standards Survey
* A subset was used in chapter 4.6.4.
* The larger sample here is described on pages 848-9
* The data are HOUSEHOLD data
* There are N=5006 households in 194 clusters
* The separate data set vietnam_ex2.dta has individual-level data
********** READ IN HOUSEHOLD DATA and SUMMARIZE (Table 24.3) **********
use vietnam_ex1.dta
desc
sum
rename sex SEX
rename age AGE
rename comped98 EDUC
rename farm FARM
rename hhsize HHSIZE
rename commune COMMUNE
rename lhhexp1 LNHHEXP
rename lhhex12m LNEXP12M
gen HHEXP = exp(LNHHEXP)
* Following should give same descriptive statistics
* as in top half (Household) in Table 24.3 p.850
* But there are some differences plus here have FARM not URBAN
sum LNEXP12M AGE SEX HHSIZE FARM EDUC HHEXP LNHHEXP COMMUNE
* Write data to a text (ascii) file so can use with programs other than Stata
* Note that LNEXP12M has some missing values coded as .
outfile LNEXP12M AGE SEX HHSIZE FARM EDUC LNHHEXP COMMUNE /*
*/using vietnam_ex1.asc, replace
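* Added sketch (not part of the text's results): count the observations with
* LNEXP12M missing, since these are dropped from the regressions below.
* The condition >= . is used so that any extended missing codes also count.
count if LNEXP12M >= .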
********** ANALYSIS: CLUSTER ANALYSIS FOR LINEAR MODEL [Table 24.4 p.851] **********
* Regressor list for the linear regressions
global XLISTLINEAR LNHHEXP AGE SEX HHSIZE FARM EDUC
* OLS with usual standard errors (Table 24.4 columns 1-2)
regress LNEXP12M $XLISTLINEAR
estimates store olsiid
* OLS with heteroskedastic-robust standard errors (Table 24.4 column 3)
regress LNEXP12M $XLISTLINEAR, robust
estimates store olshet
* OLS with cluster-robust standard errors (Table 24.4 column 4)
regress LNEXP12M $XLISTLINEAR, cluster(COMMUNE)
estimates store olsclust
* Random effects estimation (FGLS) (Table 24.4 columns 5-6)
* This uses the xtreg command which first requires identifying the cluster
iis COMMUNE
xtreg LNEXP12M $XLISTLINEAR, re
estimates store refgls
* Note that one can cluster bootstrap, if desired, to get more robust standard errors
* This is done at the end of the program
* Fixed effects estimation (within estimator) (Table 24.4 columns 7-8)
xtreg LNEXP12M $XLISTLINEAR, fe
estimates store fe
* Note that one can cluster bootstrap, if desired, to get more robust standard errors
* This is done at the end of the program
* Random effects estimation by MLE assuming normality (Table 24.4 columns 5-6)
* This uses the xtreg command which first requires identifying the cluster
iis COMMUNE
xtreg LNEXP12M $XLISTLINEAR, mle
estimates store remle
* Test of the RE specification using Breusch-Pagan test
* This is statistic in third bottom row of Table 24.4
quietly xtreg LNEXP12M $XLISTLINEAR, re
xttest0
* Hausman test of FE vs. RE specification
* This test is not a robust version.
* Its validity assumes that errors are iid after including COMMUNE-specific effect
* For this example this may be reasonable as cluster bootstrap se's close to usual se's
xthausman
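* Added sketch: the same comparison can be written with the general hausman
* command and the stored estimates, assuming the names fe and refgls stored
* above. The consistent estimator (fe) is listed first and the efficient one
* (refgls) second. This is also a non-robust test, so the same caveat applies.
hausman fe refgls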
* Alternative GLS estimation using the GEE approach
* Same as xtgee with family(gaussian) link(id) corr(exchangeable)
* So GLS with equicorrelated errors
xtreg LNEXP12M $XLISTLINEAR, pa
estimates store pa
********** DISPLAY TABLE 24.4 RESULTS page 851 **********
estimates table olsiid olshet olsclust, /*
*/ b(%10.3f) t(%10.2f) stats(r2 N)
estimates table pa fe refgls remle, /*
*/ b(%10.3f) t(%10.2f) stats(r2 N)
********** ADDITIONALLY DO CLUSTER BOOTSTRAPS **********
* These results not given in the text
* Output at website uses breps 500
global breps = 500
* Note that one can bootstrap, if desired, to get more robust standard errors
* The first reproduces reg , cluster(COMMUNE)
bootstrap "reg LNEXP12M $XLISTLINEAR" _b, cluster(COMMUNE) reps($breps) level(95)
* The t-statistic vector is e(b)./e(se) where ./ is elt. by elt. division
* But Stata Version 8 does not do ./ so instead need the following
matrix tols = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
matrix list tols, format(%10.2f)
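* Added sketch: a more explicit loop that should give the same t-statistics as
* the matrix expression above. The names bvec, sevec, tloop, k and names are
* hypothetical helpers introduced here for illustration only.
matrix bvec = e(b)
matrix sevec = e(se)
local k = colsof(bvec)
matrix tloop = J(`k',1,0)
forvalues j = 1/`k' {
    * Divide coefficient j by its bootstrap standard error
    matrix tloop[`j',1] = bvec[1,`j']/sevec[1,`j']
}
* Label the rows with the coefficient names
local names : colnames bvec
matrix rownames tloop = `names'
matrix list tloop, format(%10.2f)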
* The next two reproduce xtreg , cluster(COMMUNE)
* but the cluster option for xtreg is not available for Stata version 8
* For this example the cluster bootstrap se's are within 10 percent
* of the usual xtreg se's, so usual se's may be okay here
* Fixed effects estimator
bootstrap "xtreg LNEXP12M $XLISTLINEAR, fe" _b, cluster(COMMUNE) reps($breps) level(95)
matrix tfe = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
matrix list tfe, format(%10.2f)
* Random effects estimator
bootstrap "xtreg LNEXP12M $XLISTLINEAR, re" _b, cluster(COMMUNE) reps($breps) level(95)
matrix tre = (vecdiag(diag(e(b))*syminv(diag(e(se)))))'
matrix list tre, format(%10.2f)
********** CLOSE OUTPUT **********
log close
clear
exit