🗓️ Week 7
Interactions & Beyond OLS

PB4A7- Quantitative Applications for Behavioural Science

13 Nov 2024

Interactions

  • For both polynomials and logarithms, the effect of a one-unit change in \(X\) differs depending on its current value (for logarithms, that's because a one-unit change in \(X\) is a different percentage change in \(X\) depending on its current value)
  • But why stop there? Maybe the effect of \(X\) differs depending on the current value of other variables! - Enter interaction terms!

\[ Y = \beta_0 + \beta_1X + \beta_2Z + \beta_3X\times Z + \varepsilon \]

  • Interaction terms are a little tough but also extremely important.

Interactions

Expect to come back to these slides, as you’re almost certainly going to use interaction terms in both our assessment and the dissertation

Interactions

  • Change in the value of a control can shift a regression line up and down
  • Using the model \(Y = \beta_0 + \beta_1X + \beta_2Z\), estimated as \(Y = .01 + 1.2X + .95Z\):

Interactions

  • But an interaction can both shift the line up and down AND change its slope
  • Using the model \(Y = \beta_0 + \beta_1X + \beta_2Z + \beta_3X\times Z\), estimated as \(Y = .035 + 1.14X + .94Z + 1.02X\times Z\):
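
Here's a rough sketch of both models in Python (simulated data, with the coefficient values from these slides baked in - the variable names are just illustrative):

```python
# A rough sketch with simulated data: without the interaction, Z only shifts
# the line; with the interaction, Z also changes the slope on X.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 1000
df = pd.DataFrame({"X": rng.normal(size=n), "Z": rng.integers(0, 2, size=n)})
df["Y"] = (0.035 + 1.14 * df["X"] + 0.94 * df["Z"]
           + 1.02 * df["X"] * df["Z"] + rng.normal(size=n))

shift_only = smf.ols("Y ~ X + Z", data=df).fit()   # Z moves the intercept only
interacted = smf.ols("Y ~ X * Z", data=df).fit()   # expands to X + Z + X:Z
print(shift_only.params)
print(interacted.params)  # the X:Z coefficient changes the slope on X
```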

Interactions

  • How can we interpret an interaction?
  • The idea is that the interaction shows how the effect of one variable changes as the value of the other changes
  • The derivative helps!

\[ Y = \beta_0 + \beta_1X + \beta_2Z + \beta_3X\times Z \] \[ \partial Y/\partial X = \beta_1 + \beta_3 Z \]

  • The effect of \(X\) is \(\beta_1\) when \(Z = 0\), or \(\beta_1 + \beta_3\) when \(Z = 1\), or \(\beta_1 + 3\beta_3\) if \(Z = 3\)!
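
A tiny sketch of that arithmetic, plugging in the estimated coefficients from the previous slide (1.14 on \(X\), 1.02 on the interaction):

```python
# Evaluate the effect of X at a few values of Z using dY/dX = b1 + b3 * Z
b1, b3 = 1.14, 1.02   # estimated coefficients from the previous slide
for z in (0, 1, 3):
    print(f"Z = {z}: effect of X = {b1 + b3 * z:.2f}")
```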

Interactions

  • Often we are doing interactions with binary variables to see how an effect differs across groups
  • Now, instead of the intercept giving the baseline and the binary coefficient giving the difference, the coefficient on \(X\) is the baseline effect of \(X\) and the interaction is the difference in the effect of \(X\)
  • The interaction coefficient becomes “the difference in the effect of \(X\) between the \(Z =\) No group and the \(Z =\) Yes group”
  • (What if it’s continuous? Mathematically the same but the thinking changes - the interaction term is the difference in the effect of \(X\) you get when increasing \(Z\) by one unit)
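
A hedged sketch with made-up data, where \(Z\) is a “No”/“Yes” group indicator:

```python
# The coefficient on X is the effect of X in the baseline ("No") group, and
# the interaction coefficient is how much that effect differs in the "Yes"
# group. Data and effect sizes here are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({"X": rng.normal(size=n),
                   "Z": rng.choice(["No", "Yes"], size=n)})
true_effect = np.where(df["Z"] == "Yes", 1.5, 0.5)  # effect of X differs by group
df["Y"] = true_effect * df["X"] + rng.normal(size=n)

fit = smf.ols("Y ~ X * Z", data=df).fit()
print(fit.params)
# X          ≈ 0.5   (effect of X when Z = "No")
# X:Z[T.Yes] ≈ 1.0   (difference in the effect of X between the groups)
```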

Notes on Interactions

  • Like with polynomials, the coefficients on their own now have little meaning and must be evaluated alongside each other. \(\beta_1\) by itself is just “the effect of \(X\) when \(Z = 0\)”, not “the effect of \(X\)”
  • Yes, you do almost always want to include both variables in un-interacted form and interacted form. Otherwise the interpretation gets very thorny

Notes on Interactions

  • Interaction effects are poorly powered. You need a lot of data to be able to tell whether an effect is different in two groups. If \(N\) observations gives you adequate power to see whether the effect itself is different from zero, you need a sample of roughly \(16\times N\) to see whether the difference in effects is nonzero. Sixteen times!!
  • It’s tempting to try interacting your effect with everything to see if it’s bigger/smaller/nonzero in some groups, but because interactions are poorly powered and you’re running lots of tests, this is a bad idea! The “significant” interactions you turn up are likely to be false positives
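
A rough simulation sketch of the power problem (all the numbers here are made up): with the same sample size, an overall effect of a given size is detected far more often than an equally sized difference in effects between two groups.

```python
# Simulated power comparison: a 0.15 overall effect vs. a 0.15 difference in
# effects between two equally sized groups, at the same sample size.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, reps = 300, 500
found_main, found_inter = 0, 0
for _ in range(reps):
    x = rng.normal(size=n)
    z = rng.integers(0, 2, size=n)
    y_main = 0.15 * x + rng.normal(size=n)        # one overall effect of X
    y_inter = 0.15 * z * x + rng.normal(size=n)   # effect of X differs by 0.15
    found_main += sm.OLS(y_main, sm.add_constant(x)).fit().pvalues[1] < 0.05
    X_inter = sm.add_constant(np.column_stack([x, z, x * z]))
    found_inter += sm.OLS(y_inter, X_inter).fit().pvalues[3] < 0.05
print("power for the overall effect:  ", found_main / reps)
print("power for the interaction term:", found_inter / reps)
```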

OLS and the Dependent Variable

A typical OLS equation looks like:

\[ Y = \beta_0 + \beta_1X + \varepsilon \]

and assumes that the error term, \(\varepsilon\), is normal.

  • The normal distribution is continuous and smooth and has infinite range
  • And the linear form stretches off to infinity in either direction as \(X\) gets small or big
  • Both of these imply that the dependent variable, \(Y\), is continuous and can take any value (why is that?)!
  • If that’s not true, then our model will be misspecified in some way

Non-Continuous Dependent Variables

When might dependent variables not be continuous and have infinite range?

  • Years working at current job (can’t be negative)
  • Are you self-employed? (Binary)
  • Number of children (must be a round number, can’t be negative)
  • Which brand of soda did you buy? (categorical)
  • Did you recover from your disease? (binary)
  • How satisfied are you with your purchase on a 1-5 scale? (must be a round number from 1 to 5, and the difference between 1 and 2 isn’t necessarily the same as the difference between 2 and 3)

Binary Dependent Variables

  • In many cases, such as variables that must be round numbers or can’t be negative, there are ways of properly handling these issues, but people will usually ignore the problem and just use OLS, as long as the data is continuous-ish (i.e. it doesn’t have a LOT of observations piled up right at 0 next to the impossible negative values, or it has enough distinct values that the round numbers smooth out)
  • However, the problems of using OLS are a bit worse for binary data, and so they’re the most common case in which we do something special to account for it
  • Binary dependent variables are also really common! We’re often interested in whether a certain outcome happened or didn’t (if we want to know if a drug was effective, we are likely asking if you are cured or not!)

So, how can we deal with having a binary dependent variable, and why do they give OLS such problems?

The Linear Probability Model

  • First off, let’s ignore the completely unexplained warnings I’ve just given you and do it with OLS anyway, and see what happens
  • Running OLS with a binary dependent variable is called the “linear probability model” or LPM

\[ D = \beta_0 + \beta_1X + \varepsilon \]

Throughout these slides, let’s use \(D\) to refer to a binary variable

The Linear Probability Model

  • In terms of how we run it, the interpretation is exactly the same as regular OLS, so you can bring in all your intuition
  • The only difference is that our interpretation of the dependent variable is now in probability terms
  • If \(\hat{\beta}_1 = .03\), that means that a one-unit increase in \(X\) is associated with a three percentage point increase in the probability that \(D = 1\)
  • (percentage points! Not percentage - an increase from .1 to .13, say, not .1 to .103)
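
A minimal LPM sketch with simulated data (the heteroskedasticity-robust standard errors are explained a couple of slides ahead):

```python
# The LPM is just OLS with a 0/1 outcome; the coefficient is in probability
# (percentage-point) terms. HC1 gives heteroskedasticity-robust standard errors.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 2000
df = pd.DataFrame({"X": rng.normal(size=n)})
df["D"] = (rng.uniform(size=n) < 0.5 + 0.03 * df["X"]).astype(int)

lpm = smf.ols("D ~ X", data=df).fit(cov_type="HC1")
print(lpm.params["X"])  # ≈ .03: one more unit of X -> about 3 percentage points
```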

The Linear Probability Model

So what’s the problem?

The linear probability model can lead to…

  • Terrible predictions
  • Incorrect slopes that don’t acknowledge the boundaries of the data

Terrible Predictions

  • OLS fits a straight line. So if you increase or decrease \(X\) enough, eventually you’ll predict that the probability of \(D = 1\) is bigger than 1, or lower than 0. Impossible!
  • We can address part of this by just not trying to predict outside the range of the data, but if \(X\) has a lot of variation in it, we might get those impossible predictions even for values in our data. And what do we do with that?
  • (Also, because errors tend to be small for certain ranges of \(X\) and large for others, we have to use heteroskedasticity-robust standard errors)

Terrible Predictions

Incorrect Slopes

  • Also, OLS requires that the slopes be constant
  • (Not necessarily if you use a polynomial or logarithm, but the following critique still applies)
  • This is not what we want for binary data!
  • As the prediction gets really close to 0 or 1, the slope should flatten out to nothing
  • If we predict there’s a .50 chance of \(D = 1\), a one-unit increase in \(X\) with \(\hat{\beta}_1 = .03\) would increase that to .53
  • If we predict there’s a .99 chance of \(D = 1\), a one-unit increase in \(X\) with \(\hat{\beta}_1 = .03\) would increase that to 1.02…
  • Uh oh! The slope should be flatter near the edges. We need the slope to vary along the range of \(X\)

Incorrect Slopes

  • We can see how much the OLS slopes are overstating changes in \(D\) as \(X\) changes near the edges by comparing an OLS fit to just regular ol’ local means, with no shape imposed at all
  • We’re not forcing the local means to flatten out - they do that naturally because the mean can’t possibly go any lower! OLS barrels on through, though
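
A rough sketch of that comparison in code (simulated data standing in for the figure): bin \(X\), take the mean of \(D\) within each bin, and set those “local means” against the straight LPM line.

```python
# Binned ("local") means of D flatten out near 0 and 1 on their own, while the
# straight OLS line keeps going past the boundaries.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
n = 5000
df = pd.DataFrame({"X": rng.normal(scale=3, size=n)})
df["D"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-df["X"]))).astype(int)

local_means = df.groupby(pd.cut(df["X"], bins=20), observed=True)["D"].mean()
lpm = smf.ols("D ~ X", data=df).fit()

print(local_means)                                     # always inside [0, 1]
print(lpm.fittedvalues.min(), lpm.fittedvalues.max())  # can stray outside it
```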

Linear Probability Model

So what can we make of the LPM?

  • Bad if we want to make predictions
  • Bad at estimating slope if we’re looking near the edges of 0 and 1
  • (which means it’s especially bad if the average of \(D\) is near 0 or 1)

When might we use it anyway?

  • It behaves better in small samples than methods estimated by maximum likelihood (which many other methods are)
  • If we only care about slopes far away from the boundaries
  • If alternate methods (like we’re about to go into) put too many other statistical demands on the data (OLS is very “easy” from a computational standpoint)
  • If we’re using lots of fixed effects (OLS deals with these far more easily than nonlinear methods)
  • If our right-hand side is just binary variables (with such a limited range of \(X\) values, the predictions might never leave 0-1 anyway!)

Generalized Linear Models

  • So LPM has problems. What can we do instead?
  • Let’s introduce the concept of the Generalized Linear Model

Here’s an OLS equation:

\[ Y = \beta_0 + \beta_1X + \varepsilon \]

Here’s a GLM equation:

\[ E(Y | X) = F(\beta_0 + \beta_1X) \]

Where \(F()\) is some function.

Generalized Linear Models

\[ E(D | X) = F(\beta_0 + \beta_1X) \]

  • We can call the \(\beta_0 + \beta_1X\) part, which is the same as in OLS, the index function. It’s a linear function of our variable \(X\) (plus whatever other controls we have in there), same as before
  • But to get our prediction of what \(D\) will be conditional on what \(X\) is ( \(D|X\) ), we do one additional step of running it through a function \(F()\) first. We call this function a link function since it links the index function to the outcome
  • If \(F(z) = z\), then we’re basically back to OLS
  • But if \(F()\) is nonlinear, then we can account for all sorts of nonlinear dependent variables!

So in other words, our prediction of \(D\) is still based on the linear index, but we run it through some nonlinear function first to get our nonlinear output!
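
As a sketch, here’s what that looks like in code with a logistic \(F()\) and simulated data (statsmodels’ Binomial family uses the logistic link by default):

```python
# The linear index b0 + b1*X is pushed through the logistic function, so every
# prediction lands strictly between 0 and 1.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 2000
df = pd.DataFrame({"X": rng.normal(size=n)})
true_p = 1 / (1 + np.exp(-(0.5 + 1.0 * df["X"])))
df["D"] = (rng.uniform(size=n) < true_p).astype(int)

glm = smf.glm("D ~ X", data=df, family=sm.families.Binomial()).fit()
print(glm.params)                                      # the index coefficients
print(glm.fittedvalues.min(), glm.fittedvalues.max())  # predictions in (0, 1)
```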

Generalized Linear Models

We can also think of this in terms of the latent variable interpretation

\[ D^* = \beta_0 + \beta_1X \]

Where \(D^*\) is an unseen “latent” variable that can take any value, just like a regular OLS dependent variable (and roughly the same in concept as our index function)

And we convert that latent variable to a probability using some function

\[ E(D | X) = F(D^*) \]

and perhaps saying something like “if we estimate \(D^*\) is above the number \(c\), then we predict \(D = 1\)”
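
One common way to make this concrete (a sketch that goes slightly beyond the slide): give the latent variable an error term and a cutoff \(c\), and the function \(F()\) is just the distribution of that error

\[ D^* = \beta_0 + \beta_1X + \varepsilon, \qquad D = 1 \text{ if } D^* > c \]

\[ P(D = 1 | X) = P(\varepsilon > c - \beta_0 - \beta_1X) = F(\beta_0 + \beta_1X - c) \]

where the last step uses the symmetry of the error distribution - logistic errors give the logit model, normal errors give the probit, and \(c\) just gets absorbed into the intercept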

Probit and Logit

  • Let’s go back to our index-and-function interpretation. What function should we use?
  • (many many different options depending on your dependent variable - poisson for count data, log link for nonnegative skewed values, multinomial logit for categorical data…)
  • For binary dependent variables the two most common link functions are the probit and logistic links. We often call a regression with a logistic link a “logit regression”

\[ Probit(index) = \Phi(index) \]

where \(\Phi()\) is the standard normal cumulative distribution function (i.e. the probability that a random standard normal value is less than or equal to \(index\) )

\[ Logistic(index) = \frac{e^{index}}{1+e^{index}} \]

For most purposes it doesn’t matter whether you use probit or logit, but logit is getting much more popular recently (due to its common use in data science - it’s computationally easier) so we’ll focus on that, and just know that pretty much all of this is the same with probit
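
A quick numeric sketch of the two links side by side (the index values are picked arbitrarily):

```python
# Both links map any index into (0, 1); the logistic is a bit more spread out.
import numpy as np
from scipy.stats import norm

index = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probit = norm.cdf(index)                        # Phi(index)
logistic = np.exp(index) / (1 + np.exp(index))
print(np.round(probit, 3))
print(np.round(logistic, 3))
```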

Logit

  • Notice that we can’t possibly predict a value below 0 or above 1, no matter how wild \(X\) and our index get
  • As \(index\) goes to \(-\infty\),

\[ Logistic(index) \rightarrow \frac{0}{1+0} = 0 \]

  • And as \(index\) goes to \(\infty\),

\[ Logistic(index) \rightarrow \frac{\infty}{1+\infty } = 1 \]

Logit

  • Also notice that, like the local means did, its slope flattens out near the edges
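
A one-line derivation of why (added here, not on the original slide):

\[ \frac{\partial \, Logistic(index)}{\partial \, index} = \frac{e^{index}}{(1+e^{index})^2} = Logistic(index)\left(1 - Logistic(index)\right) \]

which is largest when the predicted probability is .5 and shrinks toward zero as the prediction approaches 0 or 1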