• Welcome to the MHE Blog. We'll use this space to post corrections and comments and to address any questions you might have about the material in Mostly Harmless Econometrics. We especially welcome questions that are likely to be of interest to other readers. The econometric universe is infinite and expanding, so we ask that questions be brief and relate to material in the book. Here is an example:
    "In Section 8.2.3 (on page 319), you suggest that 42 clusters is enough for the usual cluster variance formula to be fairly accurate. Is that a joke, or do you really think so?"
    To which we'd reply: "Yes."

Typo on p. 136

Careful reader Christian Perez notes that on page 136 the parameter to be estimated in equation 4.1.15 is \rho, not \e as stated in the text. Thanks Chris!

Published | Tagged | 1 Comment

Multivariate first stage F . . . NOT

This just in from the ivreg2 team (Chris Baum, Mark Schaffer, and Steve Stillman):

How should you construct a first-stage F stat to measure instrument strength when you have more than one endogenous variable?  Not by following the instructions we gave at the bottom of page 218.  Although the theoretical expressions that motivate the p. 218 procedure are right, the computational algorithm we gave is not.

Specifically, where it says:

“regress the first-stage fitted values (for x_2) on the other first-stage fitted values and any exogenous covs . . .”

it should read:

“regress x_2 on the other first stage fitted values and any exogenous covs . . .”

If you do what we originally wrote, you’ll get an R2 of one, always a cause for concern.
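Here's a quick numerical check of the difference (our sketch, not part of the original post; the data-generating process and variable names are invented, and the degrees-of-freedom adjustments behind the actual first-stage F stat are ignored). The point is just that residualizing the *fitted values* leaves something the instruments explain perfectly, while residualizing x_2 itself does not:

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS fitted values of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def r_squared(X, y):
    yhat = ols_fit(X, y)
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 1_000
w = rng.normal(size=n)                      # exogenous covariate
z = rng.normal(size=(n, 3))                 # three excluded instruments
x1 = z @ np.array([0.5, 0.2, 0.0]) + w + rng.normal(size=n)  # endogenous var 1
x2 = z @ np.array([0.0, 0.3, 0.4]) + w + rng.normal(size=n)  # endogenous var 2

exog = np.column_stack([np.ones(n), w])     # constant + exogenous covs
Z_full = np.column_stack([exog, z])         # everything exogenous

x1_hat = ols_fit(Z_full, x1)                # first-stage fitted values for x1
x2_hat = ols_fit(Z_full, x2)                # first-stage fitted values for x2
other = np.column_stack([exog, x1_hat])

# WRONG (the original p. 218 instruction): residualize x2's FITTED values.
# The residual still lies in the span of the instruments, so the
# instruments "explain" all of it and R2 comes out as one.
bad_r2 = r_squared(Z_full, x2_hat - ols_fit(other, x2_hat))

# RIGHT (the correction): residualize x2 itself, then see how much the
# instruments actually explain (this is what feeds a first-stage F).
good_r2 = r_squared(Z_full, x2 - ols_fit(other, x2))

print(bad_r2, good_r2)
```

The first R2 is numerically one no matter how weak the instruments are; the second one is informative.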

Thanks guys for cleaning this up!

Published | Tagged , | 1 Comment

Bbbbb . . . bivariate probit!

Raphael Studer from Switzerland noticed that the bivariate probit likelihood on page 199 looks suspiciously like the likelihood for old-fashioned monaural probit.

Thanks Raphael – this is indeed the wrong likelihood, so don’t try to maximize that at home, folks.  It works only if you don’t have an endogenous regressor in the first place.  For the correct biprobit likelihood, see, e.g., pp. 849-851 in Greene (2007) or, better yet, just do it in Stata using biprobit (if you must).

This of course raises the question of how we came to make such a mistake.  Is it because Angrist has such a strong aversion to latent-index models that he couldn’t stand the sight of the full likelihood?  Or is it just another silly mistake Steve missed in galleys?

Published | Tagged | Leave a comment

Adding lagged dependent variables to differenced models

Reader Christopher Ordowich asks:

In sections 5.3-5.4, there is a great discussion of using
fixed effects vs. a lagged dependent variable with panel data. I am
having trouble reconciling some of this discussion with a section in a
recent paper by Imbens and Wooldridge (2008) titled “Recent
Developments in the Econometrics of Program Evaluation.” On page 68 of
their paper (as published by IZA in 2008) they suggest that it might
be better in some circumstances with two periods of data to use first
differencing and a lag of the dependent variable (assuming
unconfoundedness given lagged outcomes). I understand your discussion
of instrumenting for lagged variables if you have more than two
periods, but with two periods, how do you react to adding a lag (the
baseline value of the dependent variable) after first differencing
with only two periods of data? I have had difficulty finding support
for this approach elsewhere and given that you have given much thought
to this issue, I was wondering what your opinion might be.

The way I see it, once you add a lagged dependent variable to a differenced model, you are really doing lagged-dependent-variable control and not fixed effects.  Steve may disagree (he’s generally less dogmatic than I am).  This is not always exactly true, but it is a theorem for the simple example we use to contrast fixed effects and lagged-dependent-variable control in Section 5.4.

Here’s that again:

two periods
no covariates
the treatment, D_it, is zero for everybody in period 1 and switched on for some in period 2 (think of a training program that some people participate in between periods; period 1 is before, period 2 is after, similar to Ashenfelter and Card, 1985)

Ignoring constants, fixed effects estimation fits

(1) Y_it – Y_it-1 = aD_it + error

lagged dependent variable estimation fits

(2) Y_it = gY_it-1 + bD_it + error

As I understand it, the Imbens-Wooldridge proposal is to throw Y_it-1 into equation (1):

(3) Y_it – Y_it-1 = dY_it-1 + cD_it + error

But in this case, c is (algebraically) the same as b.  Why?  The coefficient c is

c = COV(Y_it – Y_it-1, D_it*)/V(D_it*)

where D_it* is the residual from a regression of D_it on Y_it-1.  But this residual is orthogonal to Y_it-1, hence

c = COV(Y_it – Y_it-1, D_it*)/V(D_it*) = COV(Y_it, D_it*)/V(D_it*) = b in equation (2)
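This algebra is easy to verify numerically. A minimal sketch (ours, not part of the original exchange, with a made-up data-generating process): estimate equations (2) and (3) on the same simulated two-period data and compare the treatment coefficients.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
y1 = rng.normal(size=n)                        # period-1 (lagged) outcome
d = (rng.random(n) < 0.4).astype(float)        # treatment switched on in period 2
y2 = 0.6 * y1 + 1.5 * d + rng.normal(size=n)   # period-2 outcome

X = np.column_stack([np.ones(n), y1, d])       # constant, lag, treatment

# (2) lagged-dependent-variable model: Y_2 on Y_1 and D
_, g, b = np.linalg.lstsq(X, y2, rcond=None)[0]

# (3) differenced model with the lag thrown in: Y_2 - Y_1 on Y_1 and D
_, dcoef, c = np.linalg.lstsq(X, y2 - y1, rcond=None)[0]

print(b, c)           # the treatment coefficients coincide exactly
print(g - 1, dcoef)   # and the lag coefficients differ by exactly one
```

The equality is exact in any sample, not just on average: the two regressions share the same regressors, and their dependent variables differ by Y_1, which is itself one of the regressors.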

So I say: “You wanna do fixed effects?  No lagged dependent variable, please (or at least be prepared to instrument it if you include one).  You wanna control for lagged dependent variables?  Then, just do it!”

— JDA

Published | Tagged , | Leave a comment

Typo on page 130

Well, we like the occasional casual relationship as much as the next guy, but on page 130 the relationship between draft-eligibility and earnings is meant to be causal . . .

Thanks to Peter Dizikes for pointing this out!

Published | Tagged | Leave a comment

In good company at The Economist Book Shop


Taken August 28, 2009, really!

Published | | Leave a comment

Typo on page 174

Hendrik Juerges from the University of Mannheim caught this one:
Bottom of page 174
— should read: “where rho_1 is LATE using …”
— not: “where psi_1 is LATE using …”

many thanks Hendrik!

Published | Tagged | Leave a comment

Is 2SLS really OK?

Elias Dinas from EUI asks: In Section 4.6.1 you explain very clearly the problems from the straightforward use of the 2SLS logic in binary choice and/or endogenous treatment models. You also provide a simple ‘linearized’ alternative, but this is useful at the cost of introducing back-door identifying information. It so happens that I have a continuous Y and a binary D, instrumented with two Zs (one binary, the other continuous). I guess that if Y were also a dummy, MLE could provide consistent estimates for the average effect (following Wooldridge 2003: 478). However, in this case, I think I am left with two alternatives: 2-stage probit least squares (the cdsimeq command in Stata), whose second stage however seems to belong in the forbidden-regressions family, and the ‘linearized’ 2-stage solution you suggest in the book. So my question is: should I prefer one over the other, or even consider a third option? Thank you very much for your help and looking forward to your reply. Elias Dinas

Thanks for your question Elias.

Section 4.6.1 discusses two approaches to 2SLS with a dummy endogenous variable, forbidden (plug-in) regression and the use of nonlinear fitted values as instruments, neither of which we really like. Rather, as suggested by our discussion of nonlinear models with endogenous regressors in Section 6.4.3 (LDV reprise), we think you should use garden-variety 2SLS (IV) for dummy endogenous variables (as always; of course you can try fancier methods in the privacy of your own home, but this is what we like to see in published papers). With a single Bernoulli instrument, IV gives you LATE; with two Bernoulli instruments, you get a weighted average of the two underlying LATEs. When one instrument is continuous, the weighting is a little trickier (see, e.g., the “fish paper”). But my experience is that the marginal effects from nonlinear structural models will be close to 2SLS (that’s how you can tell the structural model MFX were done correctly), and with 2SLS you might even get the standard errors right!
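For readers who want to see the garden-variety 2SLS point in miniature: with a single binary instrument, IV collapses to the Wald estimator, which estimates LATE. A sketch with simulated data (our illustration, not from the book; the data-generating process is invented):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
z = (rng.random(n) < 0.5).astype(float)        # binary instrument
ability = rng.normal(size=n)                   # unobserved confounder
# binary endogenous treatment: the instrument shifts take-up
d = ((0.8 * z + ability + rng.normal(size=n)) > 0.5).astype(float)
y = 1.0 * d + ability + rng.normal(size=n)     # continuous outcome

# Wald estimator: reduced form divided by first stage
wald = (y[z == 1].mean() - y[z == 0].mean()) / (
        d[z == 1].mean() - d[z == 0].mean())

# garden-variety IV: cov(z, y) / cov(z, d) -- algebraically identical here
iv = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(wald, iv)
```

Note that a naive OLS regression of y on d would be badly biased here because of the common `ability` term, while the Wald/IV estimate recovers the effect for compliers.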

–JA

Published | Tagged | Leave a comment

MHE goes viral!

Recently seen at Logan airport checkpoint

Lucky she got through!


(and this is not one of the authors)

Published | | Leave a comment

OLS is between the effect on the treated and the effect on controls

We learn something new (and useful!) every day . . .

Macartan Humphreys of Columbia University has shown why regression estimates of treatment effects can often be expected to fall between the average effect on the treated and the average effect on controls.  His theorem goes like this: Let D denote treatment, let p(X) denote the propensity score E[D|X], and let M(X) denote the covariate-specific treatment effects, E[Y1-Y0|X].  Suppose that M(X) varies in a monotone way with p(X) (either weakly increasing or weakly decreasing). Then OLS estimates of the treatment effect in a model with saturated control for covariates (i.e., the sort of regression discussed in Section 3.3.1 of MHE) will lie between E[Y1-Y0|D=1] and E[Y1-Y0|D=0].  Read all about it in his working paper.

Why is a treatment effect likely to be monotone in the propensity score?  This happens in the Angrist (1998) study of the effects of military service because those who benefit the most from military service are least likely to be qualified and therefore least likely to be treated.  In other cases, where self-selection is more important than qualifications (as in the Roy [1951] model), those most likely to benefit from treatment may be the most likely to get treated.  Either case is fine as long as it’s one or the other.
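The in-between property can be checked directly from the weighting formulas: saturated OLS weights the covariate-cell effects by p(x)(1-p(x)) times the cell frequency, the effect on the treated weights them by p(x), and the effect on controls by 1-p(x) (this is the Angrist 1998 / Section 3.3.1 weighting). A small numerical sketch with made-up cells where the effect is decreasing in the propensity score:

```python
import numpy as np

# Discrete covariate cells: frequencies f, propensity scores p,
# and cell-specific treatment effects m (weakly decreasing in p)
f = np.array([0.2, 0.3, 0.3, 0.2])
p = np.array([0.1, 0.3, 0.6, 0.8])
m = np.array([2.0, 1.5, 1.0, 0.5])

att = np.average(m, weights=f * p)            # effect on the treated
atc = np.average(m, weights=f * (1 - p))      # effect on controls
ols = np.average(m, weights=f * p * (1 - p))  # saturated-regression estimand

print(att, ols, atc)   # OLS lands between the other two
```

Because m is decreasing in p, ATT down-weights the high-effect cells the most and ATC down-weights them the least, with the variance-weighted OLS estimand in between.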

Why is this useful?  It’s one more reason why OLS is a good summary statistic for program impact.  Check out this figure from Macartan’s paper, which illustrates the OLS-is-in-between property using the Angrist (1998) data:

Figure 3 from Humphreys (2009)

The figure shows how OLS estimates of the effects of voluntary military service are almost always between matching estimates of effects on veterans and matching estimates of effects on non-veterans.  This happens because covariate-specific estimates of veteran effects are either unrelated to the propensity score or they are a weakly decreasing function of the propensity score.

Published | Tagged | Leave a comment