🗓️ Week 9
Regression Discontinuity Designs

26 Nov 2024

Check-in

  • We’re thinking through ways that we can identify the effect of interest without having to control for everything
  • One way is by focusing on within variation - if all the endogeneity can be controlled for, or only varies between individuals, we can focus on within variation to identify the effect
  • Con: washes out a lot of variation! Result can be noisier if there’s not much within-variation to work with
  • Also, this requires no endogenous variation over time
  • That might be a tricky assumption! Often there are plenty of back doors that shift over time

Regression Discontinuity

  • Today we are going to talk about Regression discontinuity design (RDD)
  • It doesn’t apply everywhere, but when it does, it’s very easy to buy the identification assumptions
  • Not that it doesn’t have its own issues, of course, but it’s pretty good!

Regression Discontinuity

The basic idea is this:

  • We look for a treatment that is assigned on the basis of being above/below a cutoff value of a continuous variable
  • For example, if you get above a certain test score they let you into a “gifted and talented” program
    • Or if you are just on one side of a time zone line, your day starts one hour earlier/later
    • Or if a candidate gets 50.1% of the vote they’re in, 49.9% and they’re out

We call these continuous variables “Running variables” because we run along them until we hit the cutoff

Regression Discontinuity

  • But wait, hold on, if treatment is driven by running variables, won’t we have a back door going through those very same running variables?? Yes!
  • And we can’t just control for RunningVar because that’s where all the variation in treatment comes from. Uh oh!

Regression Discontinuity

  • The key here is realizing that the running variable affects treatment only when you go across the cutoff
  • So really the diagram looks like this!

Regression Discontinuity

  • So what does this mean?
  • If we can control for the running variable everywhere except the cutoff, then…
  • We will be controlling for the running variable, closing that back door
  • But leaving variation at the cutoff open, allowing for variation in treatment
  • We focus on just the variation around the cutoff, narrowing the range of the running variable we use so sharply that it’s basically controlled for. Then the effect of the cutoff on treatment is like an experiment!

Regression Discontinuity

  • Basically, the idea is that right around the cutoff, treatment is randomly assigned
  • If you have a test score of 89.9 (not high enough for gifted-and-talented), you’re basically the same as someone who has a test score of 90.0 (just barely high enough)
  • But we get variation in treatment!
  • This specifically gives us the effect of treatment for people who are right around the cutoff a.k.a. a “local average treatment effect” (we still won’t know the effect of being put in gifted-and-talented for someone who gets a 30)

Regression Discontinuity

  • A very basic idea of this, before we even get to regression, is to create a binned chart
  • And see how the bin values jump at the cutoff
  • A binned chart chops the X-axis up into bins
  • Then takes the average Y value within each bin. That’s it!
  • Then, we look at how those X bins relate to the Y binned values.
  • If it looks like a pretty normal, continuous relationship… then JUMPS UP at the cutoff X-axis value, that tells us that the treatment itself must be doing something!
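  • For example, here’s a minimal sketch of a binned chart in R, assuming a data frame df with running variable X, outcome Y, and a cutoff at .5 (all placeholder names):
library(tidyverse)

df %>%
  mutate(X_bins = cut(X, breaks = 0:10/10)) %>%  # chop the X-axis into ten bins
  group_by(X_bins) %>%
  summarize(Y_mean = mean(Y)) %>%                # average Y within each bin
  ggplot(aes(x = X_bins, y = Y_mean)) +
  geom_point() +
  geom_vline(xintercept = 5.5, linetype = 'dashed')  # the cutoff of .5 sits between the 5th and 6th bins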

Regression Discontinuity

Concept Checks

  • Why is it important that we look as narrowly as possible around the cutoff? What does this get us over comparing the entire treated and untreated groups?
  • Can you think of an example of a treatment that is assigned at least partially on a cutoff?
  • Why can’t we just control for the running variable as we normally would to solve the endogeneity problem?

Fitting Lines in RDD

  • Looking purely just at the cutoff and making no use of the space away from the cutoff throws out a lot of useful information
  • We know that the running variable is related to the outcome, so we can probably improve our prediction of the value on either side of the cutoff by using data away from the cutoff to help with that prediction, rather than only using data right near the cutoff (which is what that animation does)
  • We can do this with good ol’ OLS.
  • The bin plot we did can help us pick a functional form for the slope

Fitting Lines in RDD

  • To be clear, producing the line(s) below is our goal. How can we do it?
  • The true model I’ve made is an RDD effect of .7, with a slope of 1 to the left of the cutoff and a slope of 1.5 to the right

Regression in RDD

  • First, we need to transform our data
  • We need a “Treated” variable that’s TRUE when treatment is applied - i.e. on whichever side of the cutoff gets the treatment
  • Then, we are going to want a bunch of things to change at the cutoff. This will be easier if the running variable is centered around the cutoff. So we’ll turn our running variable \(X\) into \(X - cutoff\) and call that \(XCentered\)
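  • For example (a minimal sketch, assuming a data frame df with running variable X and a cutoff of .5; the names are placeholders):
library(dplyr)

df <- df %>%
  mutate(treated = X >= .5,        # TRUE on the treated side of the cutoff
         X_centered = X - .5)      # running variable centered at the cutoff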

Varying Slope

  • Typically, you will want to let the slope vary to either side
  • In effect, we are fitting an entirely different regression line on each side of the cutoff
  • We can do this by interacting both slope and intercept with \(treated\)!
  • Coefficient on Treated is how the intercept jumps - that’s our RDD effect. Coefficient on the interaction is how the slope changes

\[Y = \beta_0 + \beta_1Treated + \beta_2XCentered + \beta_3Treated\times XCentered + \varepsilon\]
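  • In R, with the treated and X_centered variables from before, this is one interacted regression (a sketch using the fixest package):
library(fixest)

# treated*X_centered expands to treated + X_centered + treated:X_centered
feols(Y ~ treated*X_centered, data = df)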

Varying Slope

(as an aside, sometimes the effect of interest is the interaction term - the change in slope! This answers the question “does the effect of \(X\) on \(Y\) change at the cutoff?” This is called a “regression kink” design. We won’t go more into it here, but it is out there!)

Polynomial Terms

  • We don’t need to stop at linear slopes!
  • Just like we brought in our knowledge of binary and interaction terms to understand the linear slope change, we can bring in polynomials too. Add a square maybe!
  • Don’t get too wild with cubes, quartics, etc. - polynomials tend to be at their “weirdest” near the edges, and we don’t want super-weird predictions right at the cutoff. It could give us a mistaken result!
  • A square term should be enough

Polynomial Terms

  • How do we do this? Interactions again. Take any regression equation… \[Y = \beta_0 + \beta_1X + \beta_2X^2 + \varepsilon\]

  • And just center the \(X\) (let’s call it \(XC\)), then add on a set of the same terms multiplied by \(Treated\) (don’t forget \(Treated\) by itself - that’s \(Treated\) times the intercept!)

\[Y = \beta_0 + \beta_1XC + \beta_2XC^2 + \beta_3Treated + \beta_4Treated\times XC + \beta_5Treated\times XC^2 + \varepsilon\]
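  • In R, one way to write this might look like the sketch below (same placeholder names as before; I() protects the squared term inside the formula):
# expands to treated + XC + XC^2 + treated:XC + treated:XC^2
feols(Y ~ treated*(X_centered + I(X_centered^2)), data = df)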

  • The coefficient on \(Treated\) remains our “jump at the cutoff” - our RDD estimate!
                              feols(Y ~ X_cent..
Dependent Var.:                                Y
                                                
Constant                        -0.0340 (0.0385)
X_centered                      0.6990. (0.3641)
treatedTRUE                   0.7677*** (0.0577)
X_centered square               -0.5722 (0.7117)
X_centered x treatedTRUE         0.7509 (0.5359)
treatedTRUE x I(X_centered^2)     0.5319 (1.034)
_____________________________ __________________
S.E. type                                    IID
Observations                               1,000
R2                                       0.84779
Adj. R2                                  0.84702
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Concept Checks

  • Would the coefficient on \(Treated\) still be the regression discontinuity effect estimate if we hadn’t centered \(X\)? Why or why not?
  • Why might we want to use a polynomial term in our RDD model?
  • What relationship are we assuming between the outcome variable and the running variable if we choose not to include \(XCentered\) in our model at all (i.e. a “zero-order polynomial”)?

Assumptions

  • We knew there must be some assumptions lurking around here
  • What are we assuming about the error term and endogeneity here?
  • Specifically, we are assuming that the only thing jumping at the cutoff is treatment
  • Sort of like parallel trends (see week 11), but maybe more believable since we’ve narrowed in so far
  • For example, if having an income below 150% of the poverty line gets you access to food stamps AND to job training, then we can’t really use that cutoff to get the effect of just food stamps
  • The only thing different about just above/just below should be treatment

Graphically

Other Difficulties

More assumptions, limitations, and diagnostics!

  • Windows
  • Granular running variables
  • Manipulated running variables
  • Fuzzy regression discontinuity

Windows

  • The basic idea of RDD is that we’re interested in the cutoff
  • The points away from the cutoff are only useful in helping us predict values at the cutoff
  • Do we really want that full range? Is someone’s test score of 30 really going to help us much in predicting \(Y\) at a test score of 89?
  • So we might limit our analysis within just a narrow window around the cutoff, just like that initial animation we saw!
  • This makes the exogenous-at-the-jump assumption more plausible, and lets us worry less about functional form (over a narrow range, not too much difference between a linear term and a square), but on the flip side reduces our sample size considerably

Windows

  • Pay attention to the sample sizes, accuracy (true value .7) and standard errors!
m1 <- feols(Y~treated*X_centered, data = df)
m2 <- feols(Y~treated*X_centered, data = df %>% filter(abs(X_centered) < .25))
m3 <- feols(Y~treated*X_centered, data = df %>% filter(abs(X_centered) < .1))
m4 <- feols(Y~treated*X_centered, data = df %>% filter(abs(X_centered) < .05))
m5 <- feols(Y~treated*X_centered, data = df %>% filter(abs(X_centered) < .01))
etable(m1,m2,m3,m4,m5, keep = 'treatedTRUE')
                                         m1                 m2
Dependent Var.:                           Y                  Y
                                                              
treatedTRUE              0.7467*** (0.0376) 0.7723*** (0.0566)
treatedTRUE x X_centered 0.4470*** (0.1296)   0.6671. (0.4022)
________________________ __________________ __________________
S.E. type                               IID                IID
Observations                          1,000                492
R2                                  0.84769            0.74687
Adj. R2                             0.84723            0.74531

                                         m3                 m4              m5
Dependent Var.:                           Y                  Y               Y
                                                                              
treatedTRUE              0.7086*** (0.0900) 0.6104*** (0.1467) 0.5585 (0.4269)
treatedTRUE x X_centered     -1.307 (1.482)      6.280 (4.789)   41.21 (72.21)
________________________ __________________ __________________ _______________
S.E. type                               IID                IID             IID
Observations                            206                 93              15
R2                                  0.69322            0.59825         0.48853
Adj. R2                             0.68867            0.58470         0.34904
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Granular Running Variable

  • One assumption we’re making is that the running variable varies more or less continuously
  • That makes it plausible that someone with, say, a test score of 89 is basically the same as someone with a test score of 90, differing only by random chance
  • But what if our data only had test score in big chunks? I don’t know you’re 89 or 90, I just know you’re “80-89” or “90-100”
  • A lot less believable that the only difference between these groups is random chance and we’ve closed the back doors by focusing on the cutoff
  • Plenty of other things change between 80 and 100! That’s not “smooth at the cutoff”

Granular Running Variable

  • Not a whole lot we can do about this
  • There are some fancy RDD estimators that allow for granular running variables
  • But in general, if this is what you’re facing, you might be in trouble
  • Before doing an RDD, think “is it plausible that someone with the highest value just below the cutoff, and someone with the lowest value just above the cutoff are only at different values because of random chance?”

Looking for Lumping

  • Ok, now let’s go back to our continuous running variables
  • What if the running variable is manipulated?
  • Imagine you’re a teacher grading the gifted-and-talented exam. You see someone with an 89 and think “aww, they’re so close! I’ll just give them an extra point…”
  • Or, if you live just barely on one side of a time zone line, but decide to move to the other side because you prefer waking up later
  • Suddenly, that treatment is a lot less randomly assigned around the cutoff!

Looking for Lumping

  • If there’s manipulation of the running variable around the cutoff, we can often see it in the presence of lumping
  • I.e. if there’s a big cluster of observations to one side of the cutoff and a seeming gap on the other side

Looking for Lumping

  • Here’s an example from the real world in medical research - under the null hypothesis, p-values should be uniformly distributed, and in any case there’s no reason for their distribution to jump right at .05
  • But it’s hard to get insignificant results published in some journals. So people might “p-hack” until they find some form of analysis that’s significant, and also we have heavy selection into publication based on \(p < .05\). Can’t use that cutoff for an RDD!

p-value graph from Perneger & Combescure, 2017

Looking for Lumping

  • How can we look for this stuff?
  • We can look graphically by just checking for a jump at the cutoff in number of observations after binning
df_bin_count <- df %>%
  # Select breaks so that one of the breakpoints is the cutoff
  mutate(X_bins = cut(X, breaks = 0:10/10)) %>%
  group_by(X_bins) %>%
  count()
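  • Then plot the bin counts and eyeball whether they jump at the cutoff (a quick sketch with ggplot2, using the counts built above):
ggplot(df_bin_count, aes(x = X_bins, y = n)) +
  geom_col() +
  geom_vline(xintercept = 5.5, linetype = 'dashed')  # the cutoff of .5 falls between the 5th and 6th bins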

Looking for Lumping

  • The first one looks pretty good. We have one that looks not-so-good on the right

Looking for Lumping

  • Another thing we can do is do a “placebo test”
  • Check if variables other than treatment or outcome vary at the cutoff
  • We can do this by re-running our RDD but just swapping out some other variable for our outcome
  • If we get a significant jump, that’s bad! That tells us that other things are changing at the cutoff which implies some sort of manipulation (or just super lousy luck)
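  • For example, if df also had some predetermined variable W (a hypothetical pre-treatment covariate), we could swap it in as the outcome:
# W is a placeholder; a significant jump in W at the cutoff is a red flag
feols(W ~ treated*X_centered, data = df)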

Regression Discontinuity in R

  • We can specify an RDD model with rdrobust by just telling it the dependent variable \(Y\), the running variable \(X\), and the cutoff \(c\).
  • We can also specify the polynomial order to use with p
  • (it applies the polynomials more locally than our linear OLS models do - a bit more flexible without weird corner predictions)
  • It will also pick a window for us with h
  • Plenty of other options
  • Including a fuzzy option to specify actual treatment outside of the running variable/cutoff combo

rdrobust

  • We’ve gone through all kinds of procedures for doing RDD in R already using regression
  • But often, professional researchers won’t do it that way!
  • We’ll use packages and formulas that do things like “picking a bandwidth (window)” for us in a smart way, or not relying so strongly on linearity
  • The rdrobust package does just that!

rdrobust

summary(rdrobust(df$Y, df$X, c = .5))
Sharp RD estimates using local polynomial regression.

Number of Obs.                 1000
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

Number of Obs.                  501          499
Eff. Number of Obs.             185          170
Order est. (p)                    1            1
Order bias  (q)                   2            2
BW est. (h)                   0.174        0.174
BW bias (b)                   0.293        0.293
rho (h/b)                     0.594        0.594
Unique Obs.                     501          499

=============================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
=============================================================================
  Conventional     0.707     0.085     8.311     0.000     [0.540 , 0.874]     
        Robust         -         -     6.762     0.000     [0.484 , 0.878]     
=============================================================================

rdrobust

summary(rdrobust(df$Y, df$X, c = .5, fuzzy = df$treatment))
Sharp RD estimates using local polynomial regression.

Number of Obs.                 1000
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

Number of Obs.                  501          499
Eff. Number of Obs.             185          170
Order est. (p)                    1            1
Order bias  (q)                   2            2
BW est. (h)                   0.174        0.174
BW bias (b)                   0.293        0.293
rho (h/b)                     0.594        0.594
Unique Obs.                     501          499

=============================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
=============================================================================
  Conventional     0.707     0.085     8.311     0.000     [0.540 , 0.874]     
        Robust         -         -     6.762     0.000     [0.484 , 0.878]     
=============================================================================

rdrobust

  • We can even have it automatically make plots of our RDD! Same syntax
rdplot(df$Y, df$X, c = .5)

That’s it!

  • That’s what we have for RDD
  • Go explore the regression discontinuity Seminar
  • And the paper to read!