… in the Andean foothills, Chile, November 2011
Covariate Contradiction?
Thoughtful reader Nikhil from UBC asks:
I had a question regarding LATE. In your book you say that in a model with covariates, 2SLS leads to a sort of "covariate-averaged LATE" even when one does not have a saturated model. Does this mean that as one introduces covariates the 2SLS estimate is likely to change, and that this change is not a comment on the validity of the instrument? However, in your empirical examples you seem to suggest that invariance of 2SLS estimates to the introduction of covariates is a desirable thing. For example, in the first paragraph on p. 152 of Chapter 4, below Table 4.6.1, you state: "The invariance to covariates seems desirable: since the same-sex instrument is essentially independent of the covariates, control for covariates is unnecessary to eliminate bias and should primarily affect precision." Essentially my question is: should I start worrying if I see my 2SLS estimates change as I introduce more covariates in my model? Thanks

Wow, awesome question! MHE is indeed a little fast and loose on this. Let me take a stab at clarification.

In Section 4.6.2, we talk about how models with covariates can be understood as generating a weighted average of covariate-specific LATEs across covariate cells. True enough ... if the instrument is discrete and the first stage saturates (includes a full set of covariate interactions). So far so good. Of course, in practice, you might not want to saturate. OK, so do Abadie kappa weighting and get the best-in-class linear approximation to the fully saturated model. Too lazy to do Abadie? Just do plain old 2SLS, and that will likely be close enough to a more rigorously justified approximation or weighted average.

Later, however, as Nikhil notes - below Table 4.6.1 and on the following page - we express relief (or satisfaction at least) when IV estimates come out insensitive to covariates (using samesex), on the grounds that samesex is independent of the covariates. Contradiction?

Marginal LATE, that is, LATE with no covariates, is also a weighted average of covariate-specific LATEs. The weight here is the histogram of X (convince yourself of this using the law of iterated expectations; a sketch appears below). Now, sticking the covariates in and saturating (where we start in 4.5.2) produces a weighted average with a different, more complex weighting scheme: instead of the histogram of X as for marginal LATE, it's the histogram times the variance of the conditional-on-covariates first stage, as in Theorem 4.5.1. In practice, though, without too much heterogeneity, we don't expect weighting this way or that to be a big deal. On the other hand, even under constant effects, covariates may matter big time when there's substantial omitted variables bias. Seeing that a randomly assigned instrument generates IV estimates invariant to covariates makes me happy - as always, it's the OVB I worry about first!

So to be specific - Nikhil asks if he should worry when IV estimates are sensitive to covariates - I'd say, yes, worry a little. Try to figure out whether what you thought was a good instrument is in fact highly confounded with the covariates. If so, it's maybe not such a great experiment after all. If not, then perhaps the sensitivity you're seeing is just a difference in weighting schemes at work.

JA
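For readers who want the law-of-iterated-expectations step spelled out, here is a minimal sketch in the potential-outcome notation of Chapter 4 (my paraphrase, not a quote from the book):

$$E[Y_{1i}-Y_{0i}\mid D_{1i}>D_{0i}] = \sum_{x} E[Y_{1i}-Y_{0i}\mid D_{1i}>D_{0i},\,X_i=x]\;P(X_i=x\mid D_{1i}>D_{0i}).$$

So the no-covariates LATE averages covariate-specific LATEs with weights given by the distribution of X (among compliers), while the saturated-with-covariates estimand reweights the same cells by the strength of the conditional first stage. That difference in weights is why the two can move apart when effects are heterogeneous.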
Why are There So Many Dummies?
Lina from Essex writes: When talking about grouped data and 2SLS (Section 4.1.3), you mention that expanding a continuous instrument into dummies is equivalent to having a set of Wald estimators that consistently estimate the causal effect of interest, and in the Vietnam paper you mention that using the whole set of dummies as instruments is more efficient. I was wondering whether grouping the data and instrumenting with the set of dummies for different values of the continuous instrument differs from using the continuous instrument itself (i.e., in your case, using the continuous RSN). Is there any gain in efficiency, or is the point just to interpret the result as a set of Wald estimators? In other words, if you have the continuous instrument, why would you expand it and end up overidentified? Thank you very much, all the best.

Good question, Lina. One answer is the conceptual appeal of putting together Wald estimators. Takes the mystery out of 2SLS! But there is a more formal argument for dummying out intervals of a continuous instrument and then doing 2SLS with the dummies. As discussed in Section 4.1.3, in a homoskedastic constant-effects model with a continuous instrument, the efficient method of moments estimator uses the (unknown) E[D|Z] as an instrument, where D is the variable to be instrumented and Z is the continuous instrument. You can think of a model with many dummies for intervals of Z as a nonparametric approximation to this efficient but infeasible procedure. Just using Z itself as an instrument is a merely parametric approximation and therefore, perhaps, not as good. Of course, you could add polynomials in Z for a similar nonparametric flavor, but the first stage would be ugly, and, as you conjecture, we would lose the conceptual appeal of combining Wald estimators. My 1990 draft lottery paper shows this reasoning in action. See Newey (1990) for the theory. JA
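For concreteness, here's a minimal Stata sketch of the dummying-out idea. The variable names (y for the outcome, d for the endogenous regressor, z for the continuous instrument) are hypothetical stand-ins, and the choice of 10 intervals is arbitrary:

xtile zcell = z, nq(10)        /* cut the continuous instrument into 10 interval groups */
tab zcell, gen(zdum)           /* zdum1-zdum10: one dummy per interval */
ivreg2 y (d = z)               /* 2SLS with the continuous instrument: a parametric first stage */
ivreg2 y (d = zdum2-zdum10)    /* 2SLS with the interval dummies: a step-function first stage */
                               /* zdum1 is dropped because the model includes a constant */

Overidentification here is a feature rather than a bug: the dummies let the first stage trace out a flexible step-function approximation to E[D|Z], and 2SLS implicitly combines the underlying grouped-data Wald estimators.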
Regression what?!
Matt from Western Kentucky U comments on Chapter 3. . .
Question: You state:
“Our view is that regression can be motivated as a particular sort of
weighted matching estimator, and therefore the differences between
regression and matching estimates are unlikely to be of major
empirical importance” (Chapter 3 p. 70)
I take this to mean that in a ‘mostly harmless way’ regular OLS
regression is in fact a method of matching, or is a matching
estimator. Is that an appropriate interpretation? In The Stata
Journal and on his blog, Andrew Gelman takes issue with my understanding.
He states:
“A casual reader of the book might be left with the unfortunate
impression that matching is a competitor to regression rather than a
tool for making regression more effective.”
Any guidance?
Well Matt, Andrew Gelman's intentions are undoubtedly good, but I'm afraid he risks doing some harm here. Suppose you're interested in the effects of a treatment, D, and you have a discrete control variable, X, for a selection-on-observables story. Regress the outcome on D and a full set of dummies for X (i.e., a saturated model for X). The resulting estimate of the effect of D is the same as matching on X and then averaging across covariate cells, weighting by the variance of treatment conditional on X, as explained in Chapter 3 (the weighting is sketched below). While you might not always want to saturate, any other regression model for X gives the best linear approximation to this version, subject to whatever parameterization you're using.
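Here is a sketch of the weighting I have in mind, in the spirit of the Chapter 3 comparison (my notation, with $\delta_x$ the X-cell treatment-control contrast):

$$\delta_R=\frac{\sum_x \delta_x\,\sigma^2_D(x)\,P(X_i=x)}{\sum_x \sigma^2_D(x)\,P(X_i=x)},\qquad \sigma^2_D(x)=P(D_i=1\mid X_i=x)\bigl[1-P(D_i=1\mid X_i=x)\bigr],$$

where $\delta_x=E[Y_i\mid X_i=x,D_i=1]-E[Y_i\mid X_i=x,D_i=0]$. Matching estimands average the same $\delta_x$ using weights based only on the distribution of X (for example, the distribution among the treated for the effect on the treated). Regression and matching therefore combine identical cell-by-cell contrasts, just with different weights.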
This means that I can't imagine a situation where matching makes sense but regression does not (though some may say that I'm known for my lack of imagination when it comes to econometric methods).
JA
High Fashion at the Spring Meeting of Young Economists
as seen at the University of Groningen . . . what a good-lookin crew!
what are they spelling? I wish I knew
good eye!
Hui Cao from China caught this one . . . and it’s in the “corrected printing” to boot!
On 6/21/11 12:44 PM, 晖 曹 wrote:
On page 75: [P(Xi = x | Di = 1)(1 − P(Xi = x | Di = 1))] should be [P(Di = 1 | Xi = x)(1 − P(Di = 1 | Xi = x))]
Yes indeed!
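For the record, the corrected expression is the conditional variance of the Bernoulli treatment indicator, the same object the regression-versus-matching weights in Chapter 3 are built from (a one-line reminder, in my notation):

$$Var(D_i\mid X_i=x)=P(D_i=1\mid X_i=x)\bigl[1-P(D_i=1\mid X_i=x)\bigr].$$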
Fixed effects, lagged dependent variables, or what?
Arzu Kibris asks: Suppose you have data on turnout at the county level for two periods, t and t-1, and suppose there has been a change in the electoral threshold in some states from t-1 to t. You want to analyze whether this change has affected turnout. Because counties are clustered within states, you think there might be some unobserved state-level effects, which you control for with state dummies. In models of electoral behavior, a lagged dependent variable is also included as a control to capture habit. So, in this example, turnout at t-1 is included as an independent variable. Would such a model be considered a fixed effects model with a lagged dependent variable? It does not include county-level fixed effects; nonetheless, state-level fixed effects are still accounted for. I could not place such models in your discussion. I would really appreciate it if you could clarify.
Interesting question, Arzu. The first thing I would do to clarify the problem is focus on the source of variation. If the law changes of interest are at the state level, then that's where the action is. You want to control for state effects since that's the source of OVB. You can control for county effects, but counties are a red herring once you've got states under control. It sounds like you are trying to have both fixed effects and lagged dependent variables. I don't find the idea of lagged dependent variables very appealing in state-level DD. It's hard to see why the lagged dependent variable is a primary source of OVB, while there are almost surely time-invariant state effects to worry about. (A sketch of the sort of specification I have in mind appears below.)
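Here is a minimal sketch of that specification, with hypothetical variable names (turnout, post, treatstate, state, year) standing in for Arzu's data; state and year are assumed to be numeric identifiers:

/* post = 1 in period t; treatstate = 1 for states whose electoral threshold changed */
gen dd = post*treatstate
/* state dummies absorb time-invariant state effects; cluster at the level of the law change */
regress turnout dd i.year i.state, vce(cluster state)

Note what's not in there: no county effects and no lagged dependent variable. With the policy varying at the state level, the state effects do the heavy lifting.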
Good luck with your project!
JA
AP F-stat . . . one more time!
One more correction here, folks. If you follow our prescriptions, you won't get the degrees of freedom for the F-stat right. ivreg2 gets this right for you (thanks to Jenny Hunt for pointing out this discrepancy and to Mark Schaffer from the ivreg2 team for resolving it).
But if you want to impress your friends and do it longhand . . .
Suppose you have two endogenous variables x1 and x2, and you want the AP F-stat for x2. There is also a third (exogenous) covariate x3 and you have q instruments, z1, z2, …, zq. Using ivreg2 you would run
ivreg2 y (x1 x2 = z*) x3, ffirst
To do this manually in Stata, try:
local q = 3                     /* or whatever the number of your instruments is */
reg x1 z* x3                    /* first stage for the other endogenous variable, x1 */
predict double x1hat            /* fitted values of x1 */
reg x2 x1hat x3                 /* partial x1hat (and x3) out of x2 */
predict double x2res, resid     /* the part of x2 not explained by x1hat and x3 */
reg x2res z* x3                 /* regress that residual on the instruments */
testparm z*                     /* joint F-test on the instruments (q numerator dof) */
dis r(F)*(`q'/(`q'-1))          /* rescale so the numerator dof is q-1, as it should be */
The testparm command in the second-to-last line produces an F-test with q numerator degrees of freedom. But you lose one degree of freedom for partialling out x1 (you need one of your q instruments to identify x1), so the correct numerator dof is q-1. The last line of the code fixes the dof and should produce the same answer you get from ivreg2. If you instead follow the procedure in MHE, pp. 217-18, and first partial x3 out of the instruments, you won't have an x3 in the last regression you run for the F-test. In that case, you will also have to fix the denominator dof for the F-test, to adjust for the dof lost to the exogenous covariates.
SP
QOB Qonfusion
Ilyssa wonders
Question: In Table 4.1.1 (p. 124), how are there 30 instruments in Column 8 rather than 27 (= 3 qob dummies * 9 year of birth dummies)?
Why indeed? There are still the 3 QOB main effects: 27 interactions plus 3 main effects makes 30.
JA