Applied Econometrics

Fall 2025 | Methods
Professor: Brian Cadena
Published January 14, 2025

Week 1

  • In Stata, a period (.) is a missing (NA) value.

  • the mean of a dummy variable does tell us something: it is the share of observations with that characteristic.

  • Words in blue are just value labels attached to an underlying numeric value. This makes the data easier to read while keeping the same underlying value.

  • Why do you log? Heteroskedasticity and interpretation.

Regression Review

  • Regression as an Estimate of the CEF

    • Conditional Expectation Function E[y|x]

    • Regression is a way of calculating means for different types of observations with different values of x.

  • We are trying to come up with a CEF that fits a cloud of points.

  • Regression: let’s pick the line that minimizes the sum of squared residuals.

  • don’t forget the “on average” part when interpreting.

  • the intercept is a specific Conditional expectation value.

  • The constant is the expected value of y when all X values are zero. We usually don’t care much about it because it is substantively unimportant.

  • We need to contextualize the size of our coefficient.

  • important to rescale the x variable.

  • coefficient / standard error = t stat.

    • know what null hypothesis is!

    • in the example: \[t = \frac{\hat{\beta}-0}{se}\]

      • 0 is our null hypothesis.

      • let’s say the null wasn’t zero

        • see slides and do file (a short Stata sketch follows this list).

        • also look at the confidence interval
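A minimal Stata sketch of testing a non-zero null, continuing the wage-on-points example from class (the 0.5 null value is purely illustrative):

    reg wage points, robust
    * test the null that the slope equals 0.5 rather than 0
    test points = 0.5
    * any null value inside the 95% CI from the regression output
    * would not be rejected at the 5% level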

  • the t-stat and p-value are saying the same thing.

  • Start with the null hypothesis being true. If the Null is true then the probability of observing the value we observed is the p-value.

    • if the null is zero, is it strange to get the coefficient I got?
  • The null hypothesis for the constant is also zero; essentially, the null is that the line goes through the origin.

  • iid!

  • homoscedasticity is an important assumption!

    • it’s basically never true.
  • The standard error is the square root of the corresponding diagonal element of the variance-covariance (VCV) matrix.

  • What are robust standard errors

    • robust to heteroskedasticity

      • robust means resistant/strong.

      • these are standard errors that remain valid even if our errors have heteroskedasticity. The standard errors are strong enough.

        • won’t change our coefficients.
        • usually when we go robust, our SEs get bigger - more conservative in our hypothesis testing.
  • What goes in the omega matrix

    • residuals squared on the main diagonal and zeros on the off diagonal.
  • We pretty much always use robust standard errors.

    • reg wage points, robust

      • this is the code to have robust standard errors.

      • won’t change our betas but will influence our standard errors, t-stats, and p-values.

  • The upper and lower bounds of our CI will tell us the values that have exactly a p-value of .05.

  • Don’t just look at stars.

  • lincom - takes the most recent regression and computes a linear combination of its coefficients (sketch at the end of this section).

  • Regression output - see do file. For LaTeX, just use the "tex" option to get LaTeX output.
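A hedged sketch of lincom after the most recent regression (the 10-unit rescaling is just an illustration of contextualizing coefficient size):

    reg wage points, robust
    * effect of a 10-unit increase in points, with its SE, t-stat, and CI
    lincom 10*points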

Functional Form Changes

  • suppose we want to log our y

    • gen logwage = log(wage)

    • reg logwage points, robust

  • the log salary increase is .09.

    • proportional change in y when x goes up by one unit.

      • so it is percent.

        • a one-unit change in x ≈ a 9% change in y
  • Log of constant is pretty lame and irrelevant.

  • Why log y or x?

    • it doesn’t make sense to log variables that are already proportions.

    • Goal is to approximate CEF with linear form.

      • the reason people log is because we are trying to approximate the CEF with a linear form!

        • “I think the real world works in a proportional sense”

          • does one of those sentences make more sense than the other (log vs non log results)
        • reasonable shared effect of x on y.

          • counties grew on average by 15k OR counties grew on average by 2%
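One clarifying note on the “so it is percent” interpretation above: the coefficient in a log-y regression is a proportional (log-point) change, which is only approximately a percent change. The exact figure is

\[100 \times \left(e^{\hat{\beta}} - 1\right)\%, \qquad \text{e.g. } 100 \times \left(e^{0.09} - 1\right) \approx 9.4\%.\]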

Polynomials of x

  • Why square the x?

  • providing flexibility - don’t think of x^2 as a control

  • \(y = \beta_0 + \beta_1x+\beta_2x^2\)

    • \(\frac{dy}{dx} = \beta_1+2\beta_2x\)
  • Null hypothesis in the class example for the quadratic: is the line straight (i.e., is the coefficient on \(x^2\) zero)?
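A minimal Stata sketch of the quadratic specification (variable names reuse the class example; the evaluation points for the marginal effect are arbitrary):

    * quadratic in points using factor-variable notation
    reg wage c.points##c.points, robust
    * dy/dx = b1 + 2*b2*points, evaluated at chosen values of points
    margins, dydx(points) at(points = (10 20 30))
    * "is the line straight?" amounts to testing that the squared term is zero
    test c.points#c.points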

Categorical variables

  • Cardinal - the numerical value has a meaning

    • treating an x as continuous imposes cardinality. if you cannot justify that assumption, treat as if unordered.

      • it doesn’t make sense to say race increases by one unit. Thus, you should treat it as unordered and run it as a factor.
  • ordinal - the order has meaning, but the spacing between values does not.

  • unordered - categories with no inherent ranking (e.g., race).

F stat - tests whether the slopes are jointly all equal to 0.

  • Dummy variable trap = include all categories, you get perfect multicollinearity.

    • a non-invertible matrix is what is happening under the hood.
  • What happens if you leave out too many categories?

    • let’s say we leave out two categories

      • the constant is just the average over observations in cat 1 or cat 2 (pooled together).

      • interpretation just sucks.
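A hedged Stata sketch of treating an unordered categorical x as a factor (hypothetical variable names):

    * i.race enters a full set of category dummies, omitting one base category
    * (which is what avoids the dummy variable trap)
    reg wage i.race, robust
    * F test that the category coefficients are jointly zero
    testparm i.race
    * change which category is omitted, if a different base is more interpretable
    reg wage ib2.race, robust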

Interactions

TK MAKE SURE YOU INCLUDE TABLE OUTPUT!

  • Is the relationship between y and x, the same for 2 different groups/subsets/types (or more).

  • the coefficient on female is just the difference relative to the base-group coefficient (for the in-class example).

    • the yearsed coefficient is the slope for men.
  • the reason we run an interaction is not because we couldn’t already see that the coefficients differ; it is because we are interested in everything that comes after. We know the slopes are different, but are they statistically different?

  • take derivative with respect to female.

    • \(\frac{\partial y}{\partial female} = 0 + \beta_1 + 0 + \beta_2 \cdot yearsed\)

    • derivatives are your friend in interactions.
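A minimal sketch of the in-class interaction, assuming variables logwage, yearsed, and female:

    reg logwage c.yearsed##i.female, robust
    * slope for men = coefficient on yearsed
    * slope for women = coefficient on yearsed + interaction coefficient
    lincom yearsed + 1.female#c.yearsed
    * the interaction coefficient itself tests whether the two slopes differ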

Control Variables

  • To some extent we actually keep doing bivariate regressions but in a weird way.

  • saying hold this one variable constant.

  • when you say all else equal it insinuates a causal claim.

  • need to standardize coefficients for interpretation.

  • scale of x variable determines the scale of your coefficient.

  • Frisch-Waugh-Lovell (FWL) Theorem

    • afqt on loginc

    • afqt on years of ed

    • look closely at multiple regression stata do file.

    • we are basically running extra regressions.

      • it’s not magic - we are using regressions to purge the influence of X2 from X1 and Y.
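A sketch of FWL with the class variables (loginc, yearsed, afqt); the residual variable names are made up:

    * full (long) regression
    reg loginc yearsed afqt, robust
    * purge the influence of afqt from both the x of interest and y
    reg yearsed afqt
    predict yearsed_resid, residuals
    reg loginc afqt
    predict loginc_resid, residuals
    * residuals-on-residuals reproduces the yearsed coefficient from the long regression
    reg loginc_resid yearsed_resid, robust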

Week 4

  • Problem set 3 answers posted.

  • Problem set 4 is long.

  • internal validity: will this research design give me an unbiased estimate of the parameter that I am claiming to estimate?

    • RCT - want to estimate ATE. - will my design actually give me an unbiased estimate of the ATE.
  • external validity: does this parameter generalize?

  • why include control variables in a RCT regression.

  • ATE - best case scenario:

    • Required Assumption to identify parameter: Randomization done correctly

    • Fights with reviewers: External Validity, Sample Size (precision)

  • Estimand - the target - what we are trying to learn

  • estimator - the thing we actually do with our data. We use this to estimate our estimand.

Regression and Causality

  • Imagine you didn’t run an experiment - or there is no obvious natural experiment.

    • we just have some cross sectional data.

      • i am interested in this variable but it wasn’t randomly assigned.

        • but i have a lot of other variables I can use to control for.

          • when do i actually have a causal model?
  • Can we ever get causality from a regression?

    • when the conditional independence assumption holds.

      • what is that?

        • conditional on the Xs, actual treatment status is independent of the potential outcomes.

          • if you could take your data and find observations that have the same values on everything but have variation on some treated/untreated variable - what is the effect?

          • basically you are finding the counterfactual.

            • not perfect - example - look at Prince Charles and Ozzy Osbourne.
  • \(Y_{si}=f_i(s)\)

    • notation for all possible potential outcomes
  • CIA suggests that we can estimate many causal effects, but need more structure for regression motivation.

  • note: there is no i subscript on the coefficient in the CIA-to-regression slide.

  • This is basically all just omitted variable bias

  • Question: if a full regression model has a causal interp. and I leave out a variable, what goes wrong?

    • It is a biased estimate of the truth!!!!!
  • Long regression vs short regression:

    • what happens when you have OVB?

      • See slides with the big formulas (the standard OVB decomposition is written out below).
  • OVB moves your estimate along a number line - the bias has a direction! It is either positive or negative!
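The “big formula” on the slides is the standard omitted variable bias decomposition. A standard statement, assuming the long regression is \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e\) and the short regression omits \(x_2\):

\[\tilde{\beta}_1 = \beta_1 + \beta_2\,\delta_1, \qquad \text{where } \delta_1 \text{ is the slope from regressing } x_2 \text{ on } x_1.\]

The bias term \(\beta_2\,\delta_1\) is a product of two signed pieces, which is why it has a direction.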

Week 5

  • Does FWL give me variation that is as good as an experiment?

  • it is fine to leave out variables that don’t have an effect on the outcome.

  • also fine to leave out variables that are uncorrelated with x_1 even if they have an effect on outcome.

    • OVB is a product of these two. If one is zero then you are safe.
  • You can leave out mediators.

    • think school years -> income

      • you can leave out writing ability because it is a mediator.
  • I have a causal research question - wish i had a better research design but i don’t - thus this is the best I can do.

  • Propensity Score Matching

Propensity Score Matching

  • Matching helps us estimate our estimand.

  • Balance is very important: imbalance can bias our estimate of the estimand.

    • matching essentially allows you to rebalance your data
  • we are not in RCT land.

  • we are going to need to match on more than just one variable

  • exact matching vs. propensity score in matching.

  • why use matching?

    • no linear form assumption

    • OLS uses everyone. Matching only uses similar untreated obs.

  • matching does not fix OVB.

  • tough part: how do we assign the weight on the untreated outcome.

  • OLS is not quite the same: what OLS is doing is regressing residuals on residuals.

  • Propensity score : the probability of being treated conditional on some Xs.

    • we are going to collapse all of our Xs into one number 0-1 (probability)

    • Rosenbaum and Rubin (1983)

  • prediction problem - what is the probability of being treated conditional on all the Xs.

  • Compare treated vs. untreated obs that have very similar propensity scores.

  • need to show common support ALWAYS.

  • Nearest Neighbor Matching: find an obs that has the closest propensity score and use their untreated outcome.
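A hedged Stata sketch of propensity-score matching (treatment, outcome, and covariate names are hypothetical):

    * the propensity score: probability of treatment conditional on the Xs
    logit union age yearsed afqt
    predict pscore, pr
    * always show common support: compare the score's distribution by treatment status
    twoway (kdensity pscore if union == 1) (kdensity pscore if union == 0)
    * nearest-neighbor matching on the propensity score, estimating the ATT
    teffects psmatch (wage) (union age yearsed afqt, logit), atet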

Panel Data

  • we’ve only been thinking about causality with cross sectional data.

  • but now we have data over time.

  • need to know long vs wide

  • Long data = multiple dates per cross-sectional unit

    • multiple rows for each observation across time
  • Wide data = one row per observation - separate variables for each time period

  • stata panel commands want your data LONG.

  • reshape switches data from long to wide or vice versa.
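A minimal sketch of going from wide to long and declaring the panel (wage1990, wage1991, ..., and personid are hypothetical names):

    * wide: one row per person, with variables wage1990 wage1991 ...
    reshape long wage, i(personid) j(year)
    * long: one row per person-year; declare the panel so xt commands work
    xtset personid year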

Diff-in-Diff

When to Use It:

  • RCT is not an option

  • natural experiment = quasi-random event changes our ability to see treated vs. untreated outcomes.

    • Good news: you don’t have to randomize anything

    • Bad news: frequently randomization is less than ideal

      • usually treatment given to whole population

      • no obvious control group - you need to choose a comparison group.

  • Diff-in-diff basically comes down to comparing changes over time across two groups.

  • basically it is 4 sub-group means.

    • we can see

      • g = group

      • t = period of being treated

  • Equal trends assumption is very important.

    • fundamentally an assumption.
  • Time is additive and separable

  • this hinges on who you select as the counterfactual.

  • you need to go to bat for your assumption.

  • we are going to impose the equal counterfactual assumption

  • KEY ASSUMPTION: Change over time for comparison group estimates counterfactual, i.e. what would have happened to treated group w/o treatment.
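A hedged sketch of the 2x2 diff-in-diff as a regression on the four subgroup means (group, period, and cluster variable names are hypothetical):

    * treated = 1 for the treated group, post = 1 for the post-treatment period
    reg y i.treated##i.post, vce(cluster state)
    * the coefficient on 1.treated#1.post is the diff-in-diff estimate,
    * and it is only causal under the equal (counterfactual) trends assumption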

Beyond 2x2 DiD - Single Treatment Time

  • TWFE - group and time fixed effect

  • the goal is to convince the reader that you have equal counterfactual trends
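A minimal TWFE sketch for a single treatment time using built-in commands (variable names hypothetical):

    xtset state year
    * treatXpost = 1 for treated states in post-treatment years
    xtreg y treatXpost i.year, fe vce(cluster state)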

Placebo Analysis

  • key idea: run diff-in-diff using an alternative outcome that should be unaffected. Hope for null results

  • this is basically a robustness check.

  • you do this after a diff-in-diff.

  • there are still lingering concerns over the counterfactual, so let me do this.

  • Rhetorically, it comes after the main results

Triple Difference Setup

  • key idea: run diff-in-diff on two different subgroups of observations

WHEN WE USE DIFF IN DIFF AS AN EMPIRICAL STRATEGY

  • we are making an assumption about a counterfactual outcome

  • we cannot prove the counterfactual assumption

  • we can make it more believable

  • DID depends on an assumption about a counterfactual; there is nothing you can do to formally test whether it’s true

  • Don’t say you have tested your fundamental assumption and passed.

Synthetic Diff-in-Diff

  • Setup: Natural Experiment with a single treated group.

  • take obs from the donor pool -> create a weighted average of the donor pool, then use it as the counterfactual.

  • I have real California; I need to make a fake California, which is a weighted average of other states.

  • You use this instead of a diff-in-diff when there is no obvious counterfactual case.

  • Synthetic diff-in-diff is going to show really good equal trends, but it is still going to be messy: you still have to defend your synthetic control and its donor pool. If you are looking at, say, a treatment in California, your synthetic control may end up built from weights on Vermont, Wyoming, and so on, and you will need to defend that.

  • The algorithm picks the weights.

  • you choose what to match on and what the donor pool is so there is still some research control.

  • Abadie, A., Diamond, A., and Hainmueller, J. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association (JASA).

  • This method is good to get a point estimate of the treatment.
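A hedged sketch using the user-written synth package that accompanies the Abadie, Diamond, and Hainmueller paper (ssc install synth); the predictor list, unit number, and treatment period below are placeholders, not the paper's actual specification:

    * requires a declared panel
    xtset state year
    * trunit() is the treated unit, trperiod() the first treated period
    synth cigsale lnincome retprice cigsale(1980) cigsale(1985), trunit(3) trperiod(1989)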

Staggered Adoption Methods

  • Very active recently

  • different groups and different time periods AND treatments are turning off and on

Fixed Effects and Panel Data

  • Fixed effects as a research methodology

  • Fixed effects isolates variation that we think identifies a causal relation

  • Grouped Data - usually same cross-sectional units in multiple time periods

  • “it” subscripts - an individual in a year.

  • unions operate in areas that are not perfectly competitive.

    • if the firm is at the zero-profit condition -> then we don’t expect unions -> because a union would ask for more money and the firm would say no -> if it said yes, the firm would shut down because it wouldn’t be able to afford it.
  • We believe that \(A_{it}\) are the only thing standing between us and causality.

  • Required assumptions:

    • Normally, we can’t do anything when we can only apply the CIA to a conditional expectation that includes unobservables in the conditioning set.

      • we’ll assume the problem away.

        • Expectation is linear and additive in all terms

        • \(A_{it} = A_i\) (Time-constant unobservables)

        • constant treatment effects.

  • Controlling for who the person is!

    • it is a categorical variable for basically “who is this”.

    • partialling out the effect of WHO these people are.

    • its going to change our comparison

      • we are going to compare people WITHIN.
  • we are just treating the person as a categorical variable.

  • shifting comparison from across to WITHIN!

  • for the union example (a Stata sketch follows this block)

    • looking at variation over time in union status (hopefully you have variation for that)

      • so then you can say: when that person is in a union job, they earn more versus when they aren’t.

      • you are losing a lot of data because you need a lot of within variation.

      • fixed effects increases standard errors.

        • need to talk to brian about SE and FE and how they relate - bring in clustered SE discussion.
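A sketch of the within (fixed-effects) estimator for the union example, as referenced above (NLSY-style variable names are assumed):

    xtset personid year
    * identification comes from within-person changes in union status over time
    xtreg logwage union age, fe vce(cluster personid)
    * equivalent in spirit to including i.personid as a giant set of dummies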
  • If FE is run, should we ever believe the estimated effect?

    • FE Doesn’t give us a causal estimate naturally. It is very hard.
  • time-invariant is important. FE can only account for TIME INVARIANT!

    • need to make sure that OVB is time invariant.
  • FE IS SWITCHING FROM ACROSS TO WITHIN.

  • Why do i even have variation left?

    • thought process - what out there in the world is leading to me having within variation.
  • what kind of bias do we get with classical measurement error

    • measurement error in y goes into the error

    • measurement error in x is attenuation bias - bias towards zero!

      • x with noise!

        • affects our beta
  • FE helps with OVB BUT we might have measurement error in within variation that is not in cross-sectional.

    • switching to within variation can bring a lot more noise relative to signal, leading to attenuation with the within estimator.
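A standard statement of classical measurement error attenuation, assuming we observe \(x = x^* + u\) with noise \(u\) uncorrelated with the true \(x^*\) and with the regression error:

\[\text{plim}\,\hat{\beta} = \beta \cdot \frac{\sigma^2_{x^*}}{\sigma^2_{x^*} + \sigma^2_u},\]

so the estimate shrinks toward zero, and the shrinkage worsens when the (within) variation in \(x^*\) is small relative to the noise.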

Instrumental Variables

  • Do problem sets 7 and 8 for the midterm

  • IV helps solve OVB

  • If you can find an IV you can get the causal effect.

  • An IV must

    • be related to x

    • AND exclusion restriction

      • only reason you have a relationship between z and y is because of z affecting x.

      • Z CANNOT affect y DIRECTLY

      • it does not say z and y are unrelated.

        • THEY ARE RELATED, BUT ONLY THROUGH OUR X VARIABLE!

        • z needs to be unrelated to the error term (i.e., to the unobservables).
  • Two-stage least squares (2SLS) IS instrumental variables.

  • The reason we do an IV is because we think the effect of this variable on Y is more interesting than the effect of THE STORM (the instrument) on Y.

    • If the structural equation is more interesting than the reduced-form equation (see slides), THEN RUN AN IV!
  • the IV estimate is the reduced form divided by the first-stage coefficient (sketch below).
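A hedged Stata sketch (outcome y, endogenous regressor s, and instrument z are generic placeholders):

    * 2SLS / IV estimate
    ivregress 2sls y (s = z), robust
    * instrument strength in the first stage
    estat firststage
    * the IV estimate equals the reduced form over the first stage:
    reg y z, robust    // reduced form
    reg s z, robust    // first stage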

  • An IV is convincing when something is actually doing the randomization.

  • THE INSTRUMENT HAS TO WORK ENTIRELY THROUGH S

  • When reading/writing an IV paper

    • start by defending the CIA for the reduced form - NECESSARY!

      • reduced form regression is the causal effect of the instrument on the outcome!

        • instrument is as good as randomly assigned
    • Figure 1 and Figure 2 - first stage and reduced form, same z on the x-axis

    • show the change from OLS - remember the whole point is that OLS is biased

    • Defend the exclusion restriction - False experiments with unaffected groups/time periods

    • specific suggestions for Bartik/examiner designs in papers on reading list.

  • LATE vs. ATE is an external validity question

    • can i take this causal estimate and generalize it beyond the context of this study.

First differences

  • measurement error in x leads to attenuation bias.

  • fixed effects imposes a linear, additive functional form - you assume it.

  • For each individual, subtract the previous period’s row from the current one.

  • Which one to choose? Think about the DGP

  • FE compares all the time periods when someone is treated vs. all the periods when they are not.

    • FD: when a man goes from unmarried to married, does his income jump?

      • Do workers get bigger raises (from marriage) in the year the treatment occurs?
  • FD is kinda like regression discontinuity - effect of treatment is immediate

  • FD is good if the effect of treatment is immediate (sketch below).
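A minimal first-differences sketch for the marriage example (hypothetical variable names; assumes the panel has been declared):

    xtset personid year
    * D. creates year-over-year changes within person
    reg D.logwage D.married, vce(cluster personid)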

  • regression is a tool for making comparisons

    • it is a tool to compare observations to other observations

      • what comparison are we going to make

        • what regression is ‘right’ depends on what comparison is ‘right’

Standard Errors

Standard deviation of our estimated parameter

  • the standard error is an estimate of the standard deviation of our \(\hat{\beta}\)

  • When we talk about a SE - it is our estimate of the standard deviation of the distribution of that \(\hat{\beta}\)

  • denominator in our t-stat

  • determines how wide our confidence interval is

  • THESE ARE ESTIMATED!

  • We are worried about over-rejecting the null.

  • how variable our beta hat is.

  • ROBUST SE IS JUST TAKING THE MAIN DIAGONAL OF THE OMEGA MATRIX AND PUTTING THAT OMEGA INTO OUR SE. OFF-DIAGONAL REMAINS 0.

    • INSTEAD OF THE AVERAGE, OBSERVATION-SPECIFIC EPSILON-HAT SQUARED TERMS GO ON THE DIAGONAL.

    • THE OMEGA MATRIX IS STILL THERE IN THE OLS SE, BUT WE ASSUME IT IS PROPORTIONAL TO THE IDENTITY MATRIX.

  • Cluster SE is blocking the omega matrix along the main diagonal.

  • The cdf is the integral of our pdf

  • robust standard error literally calculates the main diagonal but then imposes zero on the off diagonals.

  • Four potential fixes

    • parametric fixes - model the correlation

    • clustering - use a White-like var-cov matrix, but allow for non-zero off-diagonal elements

      • literally blocks along the main diagonal, with some non-zero off-diagonal elements; errors are allowed to be correlated with each other within each block.
    • run group-level regressions

    • block bootstrap
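A hedged sketch of two of these fixes in Stata (the cluster variable is hypothetical):

    * clustering: allow errors to be correlated within state (blocks on the VCV)
    reg y x, vce(cluster state)
    * block bootstrap: resample whole clusters rather than individual observations
    bootstrap, cluster(state) reps(200): reg y x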

The Moulton Problem

Fixed effects vs. Random Effects

  • Random effects is the solution when

    • unobserved things about the individual affect the value of y through an individual-specific intercept

      • that intercept is uncorrelated with the treatment

      • random effects does not involve individual demeaning

        • we are not worried about omitted variable bias - we are just worried we don’t have independent observations.

  • Random effects is really just about autocorrelation. Fixed effects is to solve OVB.

  • Random effects is an alternative to clustering

Citation

BibTeX citation:
@online{neilon2025,
  author = {Neilon, Stone},
  title = {Applied {Econometrics}},
  date = {2025-01-14},
  url = {https://stoneneilon.github.io/notes/American_Behavior/},
  langid = {en}
}
For attribution, please cite this work as:
Neilon, Stone. 2025. “Applied Econometrics.” January 14, 2025. https://stoneneilon.github.io/notes/American_Behavior/.