---------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\section6jan2007\mma27p1milinear.txt
  log type:  text
 opened on:  30 Jan 2007, 21:42:45

. 
. ********** OVERVIEW OF MMA27P1MILINEAR.DO **********
. 
. * STATA Program by A. Colin Cameron and Pravin K. Trivedi (2005) for
. * "Microeconometrics: Methods and Applications", Cambridge University Press
. 
. * Chapter 27.8.1 pp. 936-937 Missing Data Imputation in a Linear Model
. 
. * This program creates the first three columns of Tables 27.2-27.4
. * and it creates the data sets analyzed by SAS for multiple imputation
. * to give the remaining columns of Tables 27.2-27.4
. 
. * There are four cases
. *   1: 10% missing, rho=0.64, for Table 27.2 and mma27linear1.asc
. *   2: 25% missing, rho=0.64, for Table 27.3 and mma27linear2.asc
. *   3: 10% missing, rho=0.36, not tabulated; mma27linear3.asc
. *   4: 25% missing, rho=0.36, for Table 27.4 and mma27linear4.asc
. 
. * THIS PROGRAM DIFFERS FROM THE PROGRAM THAT CREATED THE TABLE GIVEN IN THE BOOK.
. * IT USES A DIFFERENT SEED LEADING TO DIFFERENT DATA SETS
. 
. * The created data are then analyzed using MMA27P2MILINEAR.SAS
. * to construct the remaining columns of Tables 27.2-27.4
. 
. ********** SETUP **********
. 
. set more off
. version 8.0
. set scheme s1mono   /* Graphics scheme */
. 
. ********** SIMULATION OVERVIEW **********
. 
. * The data generating process is
. *   y = 1 + x1 + x2 + u,
. *   x1, x2 ~ bivariate normal with covariance matrix (1,rho \ rho,1)
. *   u ~ normal with mean 0 and variance set so R^2 = 0.25 in the true OLS regression
. *   N = 1000
. * The missing data process is
. *   10% (or 25%) of x1 are randomly missing
. *   10% (or 25%) of x2 are randomly missing
. * x1 and x2 need not be missing on the same observation.
. 
. ************ PROGRAM TO CREATE AND ANALYZE MISSING DATA ***********
. 
. * This program has four arguments
. *   `1' is rho - correlation between x1 and x2
. *   `2' is percentage nonmissing (so 100 - `2' is percentage missing)
. *   `3' is the number for the data set created
. *   `4' is the variance of u set so that R^2 = 0.25 in true OLS regression
. 
. * The program
. *   creates a missing data set
. *   estimates using listwise deletion and mean imputation
. *   writes out data set for later multiple imputation by SAS
. 
. capture program drop missing
. 
. program define missing

        /* (1) Create complete data set */
        di
        clear
        set obs 1000                      /* set sample size */
        matrix covvar = (1,`1' \ `1',1)   /* set covariance matrix for x1, x2 */
        matrix means = (0,0)              /* set means for x1, x2 */
        drawnorm x1 x2, seed(123) cov(covvar) means(means)  /* draw x1, x2 */
        sum x1 x2                         /* check x1, x2 correctly drawn */
        corr x1 x2
        drawnorm u, seed(1234) means(0) cov(`4')   /* draw error u */
        sum u                             /* check draws of u */
        gen cons = 1
        gen y = x1 + x2 + u + cons        /* generate y */
        gen id = _n
        sort id
        save x1x2uy.dta, replace

        /* (2) Create data set with some observations missing */
        use x1x2uy.dta, clear             /* randomly set 100-`2' % of x1 missing */
        keep x1
        gen id=_n
        sample `2'
        sort id
        rename x1 x1missing               /* rename resulting x1 as x1missing */
        save x1.dta, replace
        use x1x2uy.dta, clear             /* randomly set 100-`2' % of x2 missing */
        keep x2
        gen id=_n
        sample `2'
        sort id
        rename x2 x2missing               /* rename resulting x2 as x2missing */
        save x2.dta, replace
        use x1x2uy, clear                 /* merge x1missing and x2missing */
        sort id
        merge id using x1
        rename _merge merge1
        sort id
        merge id using x2

        /* (3) Create the first three columns of Tables 27.2-27.4 */

        /* OLS with no data missing */
        di _n "Column 1: OLS with no data missing"
        reg y x1 x2

        /* OLS with listwise deletion of missing data */
        di _n "Column 2: OLS with listwise deletion of missing data"
        reg y x1missing x2missing

        /* OLS with mean imputation of missing data */
        /* Generate mean imputations of x1 and x2 */
        gen x1meanimpute=x1missing
        gen x2meanimpute=x2missing
        sum x1missing
        replace x1meanimpute=r(mean) if x1meanimpute==.
        sum x2missing
        replace x2meanimpute=r(mean) if x2meanimpute==.
        di _n "Column 3: OLS with mean imputation of missing data"
        reg y x1meanimpute x2meanimpute

        /* Save data for later SAS multiple imputation use */
        /* save x1x2missuy.dta, replace */
        outfile y x1missing x2missing using mma27linear`3'.asc, replace
        clear

. end
. 
. ************ RUN THE PROGRAM TO CREATE SEVERAL MISSING DATA SETS ***********
. 
. * This program has four arguments
. *   `1' is rho - correlation between x1 and x2
. *   `2' is percentage nonmissing (so 100 - `2' is percentage missing)
. *   `3' is the number for the data set created
. *       e.g. the first will be mma27linear1.asc
. *   `4' is the variance of u set so that R^2 = 0.25 in true OLS regression
. 
. * Table 27.2
. missing 0.64 90 1 10   /* Case 1: high correlation and low missing */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0081246    1.009194  -3.609674   3.751572
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.6459   1.0000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |      1000   -.1516427    3.164734  -10.70795   9.553249
file x1x2uy.dta saved
(100 observations deleted)
file x1.dta saved
(100 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =  168.94
       Model |  3390.39865     2  1695.19933           Prob > F      =  0.0000
    Residual |  10004.3647   997  10.0344681           R-squared     =  0.2531
-------------+------------------------------           Adj R-squared =  0.2516
       Total |  13394.7634   999  13.4081715           Root MSE      =  3.1677

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.044427   .1307845     7.99   0.000     .7877827    1.301072
          x2 |   .9732352   .1300799     7.48   0.000     .7179734    1.228497
       _cons |   .8486462   .1001794     8.47   0.000     .6520595    1.045233
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

      Source |       SS       df       MS              Number of obs =     812
-------------+------------------------------           F(  2,   809) =  143.70
       Model |  2758.83984     2  1379.41992           Prob > F      =  0.0000
    Residual |  7765.72807   809  9.59916943           R-squared     =  0.2621
-------------+------------------------------           Adj R-squared =  0.2603
       Total |  10524.5679   811  12.9772724           Root MSE      =  3.0983

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |   1.121366   .1430648     7.84   0.000     .8405437    1.402188
   x2missing |   .9221814   .1442559     6.39   0.000     .6390215    1.205341
       _cons |    .894865   .1087522     8.23   0.000     .6813954    1.108335
------------------------------------------------------------------------------
(100 missing values generated)
(100 missing values generated)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x1missing |       900    -.002589    1.003696  -3.408802   3.808294
(100 real changes made)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x2missing |       900     .012184     .993411  -3.609674   3.751572
(100 real changes made)

Column 3: OLS with mean imputation of missing data

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =  153.76
       Model |  3157.56292     2  1578.78146           Prob > F      =  0.0000
    Residual |  10237.2004   997  10.2680044           R-squared     =  0.2357
-------------+------------------------------           Adj R-squared =  0.2342
       Total |  13394.7634   999  13.4081715           Root MSE      =  3.2044

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |    1.18795    .130398     9.11   0.000     .9320638    1.443836
x2meanimpute |   .9191342   .1317481     6.98   0.000     .6605988     1.17767
       _cons |   .8467516   .1013475     8.35   0.000     .6478727    1.045631
------------------------------------------------------------------------------

. 
. * Table 27.3
. missing 0.64 75 2 10   /* Case 2: high correlation and high missing */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0081246    1.009194  -3.609674   3.751572
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.6459   1.0000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |      1000   -.1516427    3.164734  -10.70795   9.553249
file x1x2uy.dta saved
(250 observations deleted)
file x1.dta saved
(250 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =  168.94
       Model |  3390.39865     2  1695.19933           Prob > F      =  0.0000
    Residual |  10004.3647   997  10.0344681           R-squared     =  0.2531
-------------+------------------------------           Adj R-squared =  0.2516
       Total |  13394.7634   999  13.4081715           Root MSE      =  3.1677

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.044427   .1307845     7.99   0.000     .7877827    1.301072
          x2 |   .9732352   .1300799     7.48   0.000     .7179734    1.228497
       _cons |   .8486462   .1001794     8.47   0.000     .6520595    1.045233
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

      Source |       SS       df       MS              Number of obs =     564
-------------+------------------------------           F(  2,   561) =   78.09
       Model |  1584.77811     2  792.389056           Prob > F      =  0.0000
    Residual |  5692.33232   561  10.1467599           R-squared     =  0.2178
-------------+------------------------------           Adj R-squared =  0.2150
       Total |  7277.11044   563  12.9255958           Root MSE      =  3.1854

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |    1.00912   .1793531     5.63   0.000     .6568342    1.361406
   x2missing |   .8950129   .1783059     5.02   0.000     .5447843    1.245242
       _cons |   .9427479    .134133     7.03   0.000     .6792836    1.206212
------------------------------------------------------------------------------
(250 missing values generated)
(250 missing values generated)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x1missing |       750   -.0060357    1.008517  -3.408802   3.808294
(250 real changes made)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x2missing |       750    .0137276    .9989503  -3.609674   3.751572
(250 real changes made)

Column 3: OLS with mean imputation of missing data

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =  117.87
       Model |  2561.52852     2  1280.76426           Prob > F      =  0.0000
    Residual |  10833.2348   997  10.8658323           R-squared     =  0.1912
-------------+------------------------------           Adj R-squared =  0.1896
       Total |  13394.7634   999  13.4081715           Root MSE      =  3.2963

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |   1.196407   .1337799     8.94   0.000     .9338849     1.45893
x2meanimpute |   .9605264    .135061     7.11   0.000     .6954898    1.225563
       _cons |   .8489102   .1042654     8.14   0.000     .6443054    1.053515
------------------------------------------------------------------------------

. 
. * Not tabulated
. missing 0.36 90 3 10   /* Case 3: low correlation and low missing */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0105351    1.007028  -2.773818   3.677286
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.3702   1.0000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |      1000   -.1516427    3.164734  -10.70795   9.553249
file x1x2uy.dta saved
(100 observations deleted)
file x1.dta saved
(100 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =  139.74
       Model |  2804.48667     2  1402.24333           Prob > F      =  0.0000
    Residual |  10004.3647   997  10.0344682           R-squared     =  0.2189
-------------+------------------------------           Adj R-squared =  0.2174
       Total |  12808.8514   999  12.8216731           Root MSE      =  3.1677

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.035233   .1074822     9.63   0.000      .824316    1.246151
          x2 |   .9779566   .1071331     9.13   0.000     .7677243    1.188189
       _cons |   .8486462   .1001794     8.47   0.000     .6520595    1.045233
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

      Source |       SS       df       MS              Number of obs =     812
-------------+------------------------------           F(  2,   809) =  117.33
       Model |  2252.57307     2  1126.28653           Prob > F      =  0.0000
    Residual |   7765.7281   809  9.59916947           R-squared     =  0.2248
-------------+------------------------------           Adj R-squared =  0.2229
       Total |  10018.3012   811  12.3530224           Root MSE      =  3.0983

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |   1.094635   .1172644     9.33   0.000     .8644561    1.324813
   x2missing |    .935909   .1188084     7.88   0.000     .7026999    1.169118
       _cons |    .894865   .1087522     8.23   0.000     .6813954    1.108335
------------------------------------------------------------------------------
(100 missing values generated)
(100 missing values generated)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x1missing |       900    -.002589    1.003696  -3.408802   3.808294
(100 real changes made)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x2missing |       900     .014072    .9871184  -2.773818   3.677286
(100 real changes made)

Column 3: OLS with mean imputation of missing data

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =  123.92
       Model |  2550.13389     2  1275.06695           Prob > F      =  0.0000
    Residual |  10258.7175   997  10.2895863           R-squared     =  0.1991
-------------+------------------------------           Adj R-squared =  0.1975
       Total |  12808.8514   999  12.8216731           Root MSE      =  3.2077

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |   1.130347   .1128203    10.02   0.000     .9089547     1.35174
x2meanimpute |   .9394267    .114715     8.19   0.000     .7143161    1.164537
       _cons |   .8469922   .1014524     8.35   0.000     .6479074    1.046077
------------------------------------------------------------------------------

. 
. * Table 27.4
. missing 0.36 75 4 10   /* Case 4: low correlation and high missing */

obs was 0, now 1000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x1 |      1000   -.0016071    1.003757   -4.27458   3.808294
          x2 |      1000    .0105351    1.007028  -2.773818   3.677286
(obs=1000)

             |       x1       x2
-------------+------------------
          x1 |   1.0000
          x2 |   0.3702   1.0000

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |      1000   -.1516427    3.164734  -10.70795   9.553249
file x1x2uy.dta saved
(250 observations deleted)
file x1.dta saved
(250 observations deleted)
file x2.dta saved

Column 1: OLS with no data missing

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =  139.74
       Model |  2804.48667     2  1402.24333           Prob > F      =  0.0000
    Residual |  10004.3647   997  10.0344682           R-squared     =  0.2189
-------------+------------------------------           Adj R-squared =  0.2174
       Total |  12808.8514   999  12.8216731           Root MSE      =  3.1677

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.035233   .1074822     9.63   0.000      .824316    1.246151
          x2 |   .9779566   .1071331     9.13   0.000     .7677243    1.188189
       _cons |   .8486462   .1001794     8.47   0.000     .6520595    1.045233
------------------------------------------------------------------------------

Column 2: OLS with listwise deletion of missing data

      Source |       SS       df       MS              Number of obs =     564
-------------+------------------------------           F(  2,   561) =   63.73
       Model |  1293.25167     2  646.625835           Prob > F      =  0.0000
    Residual |  5692.33236   561    10.14676           R-squared     =  0.1851
-------------+------------------------------           Adj R-squared =  0.1822
       Total |  6985.58403   563  12.4077869           Root MSE      =  3.1854

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   x1missing |   .9730563   .1480736     6.57   0.000     .6822099    1.263903
   x2missing |   .9135332   .1468518     6.22   0.000     .6250866     1.20198
       _cons |   .9427479    .134133     7.03   0.000     .6792836    1.206212
------------------------------------------------------------------------------
(250 missing values generated)
(250 missing values generated)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x1missing |       750   -.0060357    1.008517  -3.408802   3.808294
(250 real changes made)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   x2missing |       750    .0107041    1.001285  -2.773818   3.677286
(250 real changes made)

Column 3: OLS with mean imputation of missing data

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =   91.81
       Model |  1992.09242     2  996.046211           Prob > F      =  0.0000
    Residual |   10816.759   997  10.8493069           R-squared     =  0.1555
-------------+------------------------------           Adj R-squared =  0.1538
       Total |  12808.8514   999  12.8216731           Root MSE      =  3.2938

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1meanimpute |    1.10629   .1233564     8.97   0.000     .8642216    1.348358
x2meanimpute |   .9388805   .1242473     7.56   0.000     .6950643    1.182697
       _cons |   .8539126   .1041736     8.20   0.000      .649488    1.058337
------------------------------------------------------------------------------

. 
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\section6jan2007\mma27p1milinear.txt
  log type:  text
 closed on:  30 Jan 2007, 21:42:45
-------------------------------------------------------------------------------
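
Note on the error variance. The program comments say the variance of u is set so that R^2 = 0.25, and all four runs pass 10 as the fourth argument. The short derivation below is a sketch under the population moments of the stated data generating process (unit variances for x1 and x2, correlation rho, u independent of the regressors). The symbol \sigma_u^2 for the variance of u is introduced here for illustration and does not appear in the program. Under these assumptions the 0.25 target appears to hold approximately only for rho = 0.64.

```latex
\begin{align*}
\operatorname{Var}(x_1 + x_2) &= 1 + 1 + 2\rho = 2 + 2\rho, \\
R^2 &= \frac{2 + 2\rho}{2 + 2\rho + \sigma_u^2}, \\
\rho = 0.64,\ \sigma_u^2 = 10: \quad R^2 &= \frac{3.28}{13.28} \approx 0.247, \\
\rho = 0.36,\ \sigma_u^2 = 10: \quad R^2 &= \frac{2.72}{12.72} \approx 0.214.
\end{align*}
```

These population values are in line with the sample R-squared of 0.2531 and 0.2189 reported under Column 1 for the two designs.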