* MMA27P3MILOGIT.DO  January 2007 for Stata version 8.0
* based on logitmcar.do

clear
capture log close
log using mma27p3milogit.txt, text replace

********** OVERVIEW OF MMA27P3MILOGIT.DO **********

* STATA Program
* by A. Colin Cameron and Pravin K. Trivedi (2005) for
* "Microeconometrics: Methods and Applications", Cambridge University Press

* Chapter 27.8.2 pp. 937-939  Missing Data Imputation in a Logit Model

* This program creates the first three columns of Tables 27.5-27.6
* and it creates the data sets analyzed by SAS for multiple imputations
* to give the remaining columns of Tables 27.5-27.6

* There are four cases
*   1: 10% missing rho=0.64 for Table 27.5 and mma27logit1.asc
*   2: 25% missing rho=0.64 for mma27logit2.asc
*   3: 10% missing rho=0.36 for mma27logit3.asc
*   4: 25% missing rho=0.36 for Table 27.6 and mma27logit4.asc

* THIS PROGRAM DIFFERS FROM THE PROGRAM THAT CREATED THE TABLE GIVEN IN THE BOOK.
* IT USES A DIFFERENT SEED LEADING TO DIFFERENT DATA SETS

* The created data are then analyzed using MMA27P4MILOGIT.SAS
* to construct the remaining columns of Tables 27.5-27.6

********** SETUP **********

set more off
version 8.0
set scheme s1mono   /* Graphics scheme */

********** SIMULATION OVERVIEW **********

* The data generating process is logit with
*    y = 1(ystar > 0)
*    ystar = constant + x1 + x2 + u,
*    x1, x2 ~ bivariate normal with covariance matrix (1,rho\rho,1)
*    u ~ logistic with variance pi^2/3
*    N = 1000

* The missing data process is
*    10% (or 25%) of x1 are randomly missing
*    10% (or 25%) of x2 are randomly missing
* They need not be missing on the same observation.

* Note that the estimated model will give
*    estimated coefficients -1/sqrt(pi^2/3), which equals -0.551 approx.
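
* Added illustration (not part of the original program): a minimal check of
* the scaling constant quoted in the note above. It only evaluates
* 1/sqrt(pi^2/3), the magnitude of the coefficient the note refers to,
* using Stata's built-in _pi. The sign is not checked here.
di "1/sqrt(pi^2/3) = " %6.4f 1/sqrt(_pi^2/3)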

************ PROGRAM TO CREATE AND ANALYZE MISSING DATA ***********

* This program has four arguments
*   `1' is rho - correlation between x1 and x2
*   `2' is percentage nonmissing (so 100 - `2' is percentage missing)
*   `3' is the number for the data set created
*   `4' is the variance of u set so that R^2 = 0.25 in true OLS regression
*       (not used in the body of this program)

* The program
*   creates a missing data set
*   estimates using listwise deletion and mean imputation
*   writes out data set for later multiple imputation by SAS

capture program drop missing
program define missing

/* (1) Create complete data set */
di
clear
set obs 1000                      /* set sample size */
matrix covvar = (1,`1' \ `1',1)   /* set covariance matrix for x1, x2 */
matrix means = (0,0)              /* set means for x1, x2 */
drawnorm x1 x2, seed(123) cov(covvar) means(means)   /* draw x1, x2 */
sum x1 x2                         /* check x1, x2 correctly drawn */
corr x1 x2
/* logit(uniform()) is a standard logistic draw; it is rescaled by sqrt(pi^2/3) */
gen u = sqrt(_pi^2/3)*logit(uniform())   /* draw logistic error u */
sum u                             /* check draws of u */
gen cons = 1
gen ystar = x1 + x2 + u + cons    /* generate ystar */
gen y = 0                         /* generate y */
/* NB: y is coded 1 when ystar <= 0, the reverse of y = 1(ystar > 0) in the */
/* overview; this coding matches the negative coefficients quoted there     */
replace y=1 if ystar<=0
gen id = _n
sort id
save x1x2uy.dta, replace

/* (2) Create data set with some observations missing */
use x1x2uy.dta, clear     /* randomly set 100-`2' % of x1 missing */
keep x1
gen id=_n
sample `2'                /* keep a random `2' % of observations */
sort id
rename x1 x1missing       /* rename resulting x1 as x1missing */
save x1.dta, replace

use x1x2uy.dta, clear     /* randomly set 100-`2' % of x2 missing */
keep x2
gen id=_n
sample `2'                /* keep a random `2' % of observations */
sort id
rename x2 x2missing       /* rename resulting x2 as x2missing */
save x2.dta, replace

use x1x2uy, clear         /* merge x1missing and x2missing */
sort id
merge id using x1
rename _merge merge1
sort id
merge id using x2

/* (3) Create the first three columns of Tables 27.5-27.6 */

/* Logit with no data missing */
di _n "Column 1: Logit with no data missing"
logit y x1 x2

/* Logit with listwise deletion of missing data */
di _n "Column 2: Logit with listwise deletion of missing data"
logit y x1missing x2missing

/* Logit with mean imputation of missing data */
/* Generate mean imputations of x1 and x2 */
gen x1meanimpute=x1missing
gen x2meanimpute=x2missing
sum x1missing
replace x1meanimpute=r(mean) if x1meanimpute==.
sum x2missing
replace x2meanimpute=r(mean) if x2meanimpute==.
di _n "Column 3: Logit with mean imputation of missing data"
logit y x1meanimpute x2meanimpute

/* Save data for later SAS multiple imputation use */
/* save x1x2missuy.dta, replace */
outfile y x1missing x2missing using mma27logit`3'.asc, replace

clear
end

************ RUN THE PROGRAM TO CREATE SEVERAL MISSING DATA SETS ***********

* This program has four arguments
*   `1' is rho - correlation between x1 and x2
*   `2' is percentage nonmissing (so 100 - `2' is percentage missing)
*   `3' is the number for the data set created
*       e.g. the first will be mma27logit1.asc
*   `4' is the variance of u set so that R^2 = 0.25 in true OLS regression
*       (not used in the body of this program)

* Table 27.5
missing 0.64 90 1 10    /* Case 1: high correlation and low missing */

* Not tabulated
missing 0.64 75 2 10    /* Case 2: high correlation and high missing */

* Not tabulated
missing 0.36 90 3 10    /* Case 3: low correlation and low missing */

* Table 27.6
missing 0.36 75 4 10    /* Case 4: low correlation and high missing */

********** CLOSE OUTPUT **********
log close
clear
exit