Michael Wolf from the University of Zurich asks the following brilliant OVB question:
Say the long regression of interest is
(1) $y_i = \alpha + \rho s_i + \gamma_1 MO_i + \gamma_2 IQ_i + v_i$.
Here, MO stands for motivation and IQ stands for intelligence. In your notation then, $A_i = (MO_i, IQ_i)'$ and $\gamma = (\gamma_1, \gamma_2)'$.
In practice, motivation and intelligence are not observed and one estimates the short regression
$y_i = \alpha + \rho s_i + \eta_i$,
with
(2) $\eta_i = A_i'\gamma + v_i$.
Since $s_i$ is correlated with the error term $\eta_i$ (unless $s_i$ is uncorrelated with $A_i$ or $\gamma = 0$), the short regression has OVB. So far so good.
But then you claim that if one estimates the short regression by IV, using a suitable instrument $x_i$ that is uncorrelated with both omitted control variables $MO_i$ and $IQ_i$ (and thus uncorrelated with the error term $\eta_i$) but correlated with the regressor $s_i$, one can estimate $\rho$ consistently. Here, I have some doubt.
What if instead of (1), the long regression of interest were ‘only’
(3) $y_i = \alpha^* + \rho^* s_i + \gamma\, IQ_i + u_i$.
So here $A_i = IQ_i$. Since the instrument $x_i$ is uncorrelated with the (single) omitted control variable $IQ_i$, estimating the short regression (2) by IV with the same instrument $x_i$ also yields a consistent estimator of $\rho^*$, according to your logic. But this would seem a contradiction, since $\rho^*$ differs from $\rho$.
Yikes! A worrisome contradiction indeed . . . or so it would seem.
But the regressions in our discussion are linked with causal parameters, and that makes all the difference.
We start with 4.1.1, which defines a constant linear causal effect, so LATE = rho in this setup. The A (ability) variable at the top of page 116 is not a generic omitted variable; it’s a (set of) omitted variable(s) “that give a selection on observables story… the variables A are assumed to be the only reason eta and S are correlated, so that E[Sv]=0.” In other words, the regression of log wages on S and A produces the constant causal effect, rho, as the coefficient on S. This is not generic; it’s an assumption.
Some other OLS regression, which controls for only, say, a subset of the variables in A (assuming A is multivariate), does not produce the same rho, as Michael rightly notes. But IV is indifferent to the various OLS regs you’re thinking about running. We have anchored the IV parameter by making regression 4.1.2 causal and arguing that it is this rho that IV uncovers.
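A small simulation can make the point concrete. The sketch below is only an illustration under assumed values: the coefficients, the normal distributions, the independence of MO and IQ, and the function names `ols_slopes` and `simulate` are all choices made here, not anything from the book or from Michael's note. It generates data from equation (1) with an instrument that is independent of MO, IQ, and v, and then compares the schooling coefficient from three OLS regressions with the IV estimate.

```python
import numpy as np


def ols_slopes(y, *regressors):
    """OLS coefficients of y on a constant plus the given regressors (intercept first)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=200_000, rho=1.0, gamma1=2.0, gamma2=1.5, seed=0):
    """Draw data from equation (1) and return four estimates of the schooling coefficient.

    All numeric values are hypothetical, chosen only so the biases are easy to see.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)    # instrument: independent of MO, IQ and v
    mo = rng.normal(size=n)   # motivation (unobserved in practice)
    iq = rng.normal(size=n)   # intelligence (unobserved in practice)
    # Schooling depends on the instrument and on both omitted variables.
    s = x + 0.8 * mo + 0.6 * iq + rng.normal(size=n)
    v = rng.normal(size=n)    # residual in (1), unrelated to s, MO, IQ, x
    y = 0.5 + rho * s + gamma1 * mo + gamma2 * iq + v

    return {
        "ols_long": ols_slopes(y, s, mo, iq)[1],   # equation (1)
        "ols_iq_only": ols_slopes(y, s, iq)[1],    # equation (3)
        "ols_short": ols_slopes(y, s)[1],          # short regression
        # Simple IV (Wald-type) ratio for the short regression.
        "iv": np.cov(y, x)[0, 1] / np.cov(s, x)[0, 1],
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:12s} {value:.3f}")
```

With these assumed values, the long regression and IV should both land close to the true rho of 1, while the IQ-only regression and the short regression should both sit well above it (population values of roughly 1.61 and 1.83 under this design). IV does not move toward the IQ-only coefficient just because the instrument happens to be uncorrelated with IQ.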
Another way to put this: Given our assumptions, LATE is rho in 4.1.1. What OLS regression produces this same parameter? Only the one including the controls required for selection on observables. Since Michael’s equation (3) is inadequately controlled, it won’t generate the same rho. How to see this in the math? It’s subtle. Take the residual, eta, in 4.1.1, and regress that on the IQ variable that appears in equation (3). The residual from this is orthogonal to IQ, of course. But since our A is Michael’s [MO, IQ], it’s not orthogonal to S, because we must control for MO as well as IQ to get orthogonality with S. Therefore the schooling coefficient in (3) is not the schooling coefficient in (1) or in our causal model, 4.1.1 and 4.1.2.
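One way to spell this out is sketched below. The auxiliary coefficients $\pi_0, \pi_s, \pi_{IQ}$ and the residual $e_i$ are notation introduced here for illustration. The sketch treats (3) as the population regression of y on s and IQ, and assumes that $v_i$ is uncorrelated with $s_i$, $MO_i$, $IQ_i$ and $x_i$, and that $x_i$ is uncorrelated with $MO_i$ and $IQ_i$.

```latex
\begin{align*}
% Auxiliary population regression of the still-omitted MO on the regressors kept in (3)
MO_i &= \pi_0 + \pi_s s_i + \pi_{IQ} IQ_i + e_i,
  \qquad E[s_i e_i] = E[IQ_i e_i] = 0 \\
% Substitute into (1); the new error is orthogonal to s and IQ, so this is (3)
y_i &= (\alpha + \gamma_1 \pi_0) + (\rho + \gamma_1 \pi_s)\, s_i
  + (\gamma_2 + \gamma_1 \pi_{IQ})\, IQ_i + (\gamma_1 e_i + v_i) \\
\rho^* &= \rho + \gamma_1 \pi_s, \qquad u_i = \gamma_1 e_i + v_i \\
% IV estimand, computed from (1) and (2)
\frac{\operatorname{Cov}(y_i, x_i)}{\operatorname{Cov}(s_i, x_i)}
  &= \rho + \frac{\operatorname{Cov}(\eta_i, x_i)}{\operatorname{Cov}(s_i, x_i)} = \rho \\
% The same estimand, computed from (3): x is uncorrelated with IQ but not with u
\operatorname{Cov}(u_i, x_i) &= \gamma_1 \operatorname{Cov}(e_i, x_i)
  = -\gamma_1 \pi_s \operatorname{Cov}(s_i, x_i) \\
\frac{\operatorname{Cov}(y_i, x_i)}{\operatorname{Cov}(s_i, x_i)}
  &= \rho^* + \frac{\operatorname{Cov}(u_i, x_i)}{\operatorname{Cov}(s_i, x_i)}
  = \rho^* - \gamma_1 \pi_s = \rho
\end{align*}
```

Under these assumptions, the instrument is valid for the error in (2) but not for the error left over once only IQ is controlled, since $u_i$ still contains a piece of $s_i$ through $e_i$. That is why IV recovers $\rho$ and not $\rho^*$.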
Finally, we’re led to conjecture that if the OVB concept was easy and obvious, econometricians would spend more time explaining it.
The Cosmic Allness of OVB