EXCEL 2007: Statistical Inference for Two-variable Regression

A. Colin Cameron, Dept. of Economics, Univ. of Calif. - Davis

  • This January 2009 help sheet gives information on
  • This handout is the place to go to for statistical inference for two-variable regression output.
     

    REGRESSION USING THE DATA ANALYSIS ADD-IN

    This requires the Data Analysis Add-in: see Excel 2007: Access and Activating the Data Analysis Add-in
    The data used are in carsdata.xls
    The method is explained in Excel 2007: Two-Variable Regression using Data Analysis Add-in

    Regression of CARS on HH SIZE led to the following Excel output:

    Two-variable regression output

    The regression output has three components:


    INTERPRET REGRESSION STATISTICS TABLE
     


    Explanation
    Multiple R 0.894427 R = square root of R2
    R Square 0.8 R2 = coefficient of determination
    Adjusted R Square 0.733333 Adjusted R2 used if more than one x variable
    Standard Error 0.365148 This is the sample estimate of the standard deviation of the error u
    Observations 5 Number of observations used in the regression (n)

    The Regression Statistics Table gives the overall goodness-of-fit measures:
          R2 = 0.8
          Correlation between y and x is 0.8944 (when squared gives correlation squared = 0.8 = R2 ).
          Adjusted R2  is discussed later under multiple regression.

    The standard error here refers to the estimated standard deviation of the error term u.
    It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)).
    It is not to be confused with the standard error of y itself (from descriptive statistics) or with the standard errors of the regression coefficients given below.
     

    INTERPRET ANOVA TABLE
     

    df SS MS F Signifiance F
    Regression 1 1.6 1.6 12 0.04519
    Residual 3 0.4 0.133333

    Total 4 2.0


    The ANOVA (analysis of variance) table splits the sum of squares into its components.

    Total sums of squares
    = Residual (or error) sum of squares + Regression (or explained) sum of squares.

    Thus Σ i (yi - ybar)2 = Σ i (yi - yhati)2 + Σ i (yhati - ybar)2
    where yhati is the value of ypredicted from the regression line
    and ybar is the sample mean of  y.

    For example:
    R2 = 1 - Residual SS / Total SS    (general formula for R2)
         = 1 - 0.4/2.0            (from data in the ANOVA table)
         = 0.8                        (which equals R2 given in the regression Statistics table).

    The remainder of the ANOVA table is described in more detail in Excel: Multiple Regression.
     

    INTERPRET REGRESSION COEFFICIENTS TABLE
     

    Coefficient Standard error t Stat P-value Lower 95% Upper 95%
    Intercept      0.8 0.38297 2.089 0.1279 -0.4188 2.0188
    HH SIZE
         0.4 0.11547 3.464 0.0405 0.0325 0.7675

    The population regression model is:    y = β1 + β2 x + u
    where the error u is assumed to be distributed independently with mean 0 and constant variance.

    Here we focus on inference on β2, using the row that begins with hh size.
    Similar interpretation is given for inference on β1, using the row that begins with intercept.

    The column "Coefficient" gives the least squares estimates of β1 and β2.

    The column "Standard error" gives the standard errors (i.e.the estimated standard deviation) of the least squares estimate of β1 and β2 .

    The second row of the column "t Stat" gives the computed t-statistic for H0: β2 = 0 against Ha: β2 ≠ 0.
    This is the coefficient divided by the standard error: here 0.4 / 0.11547 = 3.464.
    It is compared to a T distribution with (n-k) degrees of freedom where here n = 5and k = 2.

    The column "P-value" gives for hh size are for H0: β2 = 0 against Ha: β2 ≠ 0.
    This equals the Pr{|T| > t-Stat}where T is a T-distributed random variable with n-k degreres of freedom and t-Stat is the computed value of the t-statistic given is the previous column.
    Note that this P-value is for a 2-sided test.
    For a 1-sided test divide this P-value by 2 (also checking the sign of the t-Stat).

    The columns "Lower 95%" and "Upper 95%" values define a 95% confidence interval for β2.

    A simple summary of the above output is that


    CONFIDENCE INTERVALS FOR SLOPE COEFFICIENT

    95% confidence interval for slope coefficient β2 is from Excel output (.0325, .7675).

    Excel computes this as
       b2 ± t_.025(3) × se(b2)
    = 0.8 ± TINV(0.05, 3) × 0.11547
    = 0.8 ± 3.182 × 0.11547
    = 0.8 ± .367546
    = (.0325,.7675).

    Other confidence intervals can be obtained.
    For example, to find 99% confidence intervals: in the Regression dialog box (in the Data Analysis Add-in),
    check the Confidence Level box and set the level to 99%.


    TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")

    Excel automatically gives output to make this test easy.

    Consider test H0: β2 = 0 against Ha: β2 ≠ 0 at significance level α = .05.

    Using the p-value approach

    Using the critical value approach

    If instead one-sided tests are performed, we need to adjust the above.


    TEST HYPOTHESIS OF SLOPE COEFFICIENT EQUAL TO VALUE OTHER THAN ZERO

    For non-zero hypthesized value of the slope parameter we need to manually do the computations.
     
    Consider test H0: β2 = 1 against Ha: β2 ≠ 1 at significance level α = .05.
    This is that an extra household member means an extra car.

    Then
      t = (b2 - H0 value of β2) / (standard error of b2 )
        = (0.4 - 1.0) / 0.11457
        = -5.196.

    Using the p-value approach

    Using the critical value approach


    FITTED VALUES AND RESIDUALS FROM REGRESSION LINE

    Fitted values and residuals from the regression line.
      
    y =CARS x = HH SIZE
    yhat = 0.8+0.4*x e = y - yhat
       1    1     1.2    -.2
       2    2     1.6    0.4
       2    3     2.0    0.0
       2    4     2.4    -.4
       3    5     2.8    0.2

    These can be obtained using the Regression dialog box in the Data Analysis regression Add-in,
    by checking the Residuals box.

    For further information on how to use Excel go to
         http://cameron.econ.ucdavis.edu/excel/excel.html