Applied Econometrics
Week 1
In Stata, a period (.) is an NA (missing) value.
mean of a dummy variable does tell us something: it is the proportion of observations coded 1.
Words in blue are just value labels mapped onto underlying numeric values. This makes the data easier to read while keeping the same values.
Why do you log? Heteroskedasticity and interpretation.
Regression Review
Regression as an Estimate of the CEF
Conditional Expectation Function E[y|x]
Regression is a way of calculating means for different types of observations with different values of x.
We are trying to come up with a CEF that fits a cloud of points.
Regression: let's pick a line that minimizes the sum of squared residuals.
don’t forget the “on average” part when interpreting.
the intercept is a specific conditional expectation value.
The constant is the expected value of y when all values of X are zero. This is why we don't really care about it: that point is usually substantively unimportant.
We need to contextualize the size of our coefficient.
important to rescale the x variable.
coefficient / standard error = t stat.
know what null hypothesis is!
in the example: \[t = \frac{\hat{\beta}-0}{se}\]
0 is our null hypothesis.
let’s say the null wasn’t zero
see slides and do file.
also look at the confidence interval
t-stat and p-value are saying the same thing.
Start with the null hypothesis being true. If the null is true, then the p-value is the probability of observing a value at least as extreme as the one we observed.
- if the null is zero, is it strange to get the coefficient I got?
The null hypothesis for the constant is also zero; that null basically says the line goes through the origin.
iid!
homoscedasticity is an important assumption!
- its basically never true.
a standard error is the square root of the corresponding diagonal element of the variance-covariance (VCV) matrix.
What are robust standard errors
robust to heteroskedasticity
robust means resistant/strong.
these are standard errors that are robust even if our errors have heteroskedasticity. The standard errors are strong enough.
- won't change our coefficients.
- usually when we use robust standard errors, our SEs get bigger, making our hypothesis testing more conservative.
What goes in the omega matrix
- residuals squared on the main diagonal and zeros on the off diagonal.
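Putting the pieces together, this is the robust ("sandwich") variance estimator, with \(\hat{\Omega}\) as just described:
\[
\widehat{V}(\hat{\beta}) = (X'X)^{-1}\,X'\hat{\Omega}X\,(X'X)^{-1}, \qquad \hat{\Omega} = \operatorname{diag}(\hat{u}_1^2,\ldots,\hat{u}_n^2)
\]
The robust standard errors are the square roots of the diagonal of \(\widehat{V}(\hat{\beta})\).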
We pretty much always use robust standard errors.
reg wage points, robust
this is the code to have robust standard errors.
won't change our betas but will influence our standard errors, t-stats, and p-values.
The upper and lower bounds of our 95% CI are the null values that would have a p-value of exactly .05.
Don’t just look at stars.
lincom: takes the most recent regression and tests a linear combination of its coefficients.
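A minimal sketch after the regression above (the particular combination tested here is hypothetical):

* expected wage when points = 10: the constant plus 10 times the slope
lincom _cons + 10*points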
Regression output: see the do file. For LaTeX output, just add the "tex" option.
Functional Form Changes
suppose we want to log our y
gen logwage = log(wage)
reg logwage points, robust
the log salary increase is .09.
proportional change in y when x goes up by one unit.
so it is a percent.
- a one-unit change in x = roughly a 9% change in y
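The 9% reading is the usual approximation. The exact proportional change implied by a log-point coefficient of .09 is:
\[
e^{\hat{\beta}} - 1 = e^{0.09} - 1 \approx 0.094
\]
so the approximation is close when the coefficient is small.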
The constant in the logged regression is pretty lame and irrelevant.
Why log y or x?
it doesn't make sense to log variables that are already proportions.
Goal is to approximate CEF with linear form.
the reason people log is because we are trying to approximate the CEF with a linear form!
“I think the real world works in a proportional sense”
- does one of those sentences make more sense than the other (log vs non log results)
which is a more reasonable shared effect of x on y?
- counties grew on average by 15k OR counties grew on average by 2%
Polynomials of x
Why square the x?
providing flexibility - don’t think of x^2 as a control
\(y = \beta_0 + \beta_1x+\beta_2x^2\)
- \(\frac{dy}{dx} = \beta_1+2\beta_2x\)
Null hypothesis in the class quadratic example: \(\beta_2 = 0\), i.e., is the line straight?
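A minimal sketch of that test in Stata, with a hypothetical regressor exper:

* c.exper##c.exper includes both exper and its square
reg wage c.exper##c.exper, robust
* H0: the coefficient on the squared term is zero, i.e. the line is straight
test c.exper#c.exper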
Categorical variables
Cardinal - the numerical value has a meaning
treating an x as continuous imposes cardinality. if you cannot justify that assumption, treat as if unordered.
- it doesn't make sense to say race increases by one unit. Thus, you should always treat it as unordered and run it as a factor (see the sketch at the end of this section).
ordinal - the order of the values has meaning, but the spacing between them does not.
unordered - the values are just category labels with no ordering at all.
F-stat: a joint test that the slopes are all equal to 0.
Dummy variable trap: if you include all categories (plus a constant), you get perfect multicollinearity.
- a non-invertible X'X matrix is what is happening under the hood.
What happens if you leave out too many categories?
let's say we leave out two categories:
the constant is then the average of cat 1 and cat 2 pooled together.
interpretation just sucks.
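A minimal sketch of running an unordered categorical variable as a factor (variable names hypothetical); Stata drops a base category automatically, which avoids the dummy variable trap:

* i.race expands race into a set of dummies, omitting one base category
reg wage i.race, robust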
Interactions
TK MAKE SURE YOU INCLUDE TABLE OUTPUT!
Is the relationship between y and x, the same for 2 different groups/subsets/types (or more).
the coefficient on female is just the difference in intercepts relative to men. (for the in-class example)
- the yearsed coefficient is the slope for men.
the reason we run an interaction is not that we didn't know anything about the coefficients; it is that we are interested in everything after. We know the slopes are different, but are they statistically different?
take the derivative with respect to female: with \(y=\beta_0+\beta_1\,\text{female}+\beta_2\,\text{yearsed}+\beta_3(\text{female}\times\text{yearsed})\),
\[\frac{\partial y}{\partial\,\text{female}} = 0 + \beta_1 + 0 + \beta_3\,\text{yearsed}\]
derivatives are your friend in interactions.
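A minimal sketch of the interaction model above in Stata, assuming variables female (0/1) and yearsed:

* ## includes both main effects and the interaction
reg wage i.female##c.yearsed, robust
* slope for women = slope for men + interaction coefficient
lincom yearsed + 1.female#c.yearsed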
Control Variables
To some extent we actually keep doing bivariate regressions but in a weird way.
saying hold this one variable constant.
when you say all else equal it insinuates a causal claim.
need to standardize coefficients for interpretation.
scale of x variable determines the scale of your coefficient.
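One Stata option for putting coefficients on a comparable scale is beta, which reports standardized coefficients (variable names hypothetical):

* beta reports the effect of a 1-SD change in x, in SDs of y
reg wage yearsed age, beta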
Frisch-Waugh-Lovell (FWL) Theorem
aftq on loginc
aftq on years of ed
look closely at multiple regression stata do file.
we are basically running extra regressions.
- it's not magic - using regressions to purge the influence of X2 from X1 and Y.
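A minimal FWL sketch in Stata, reusing the variable names written above (aftq, loginc) plus a hypothetical yearsed for years of ed:

* step 1: purge the influence of the control (aftq) from yearsed
reg yearsed aftq
predict ed_resid, resid
* step 2: regress the outcome on the residualized regressor
reg loginc ed_resid
* the slope on ed_resid equals the yearsed coefficient from: reg loginc yearsed aftq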
Week 4
Problem set 3 answers posted.
Problem set 4 is long.
internal validity: will this research design give me an unbiased estimate of the parameter that I am claiming to estimate?
- RCT - want to estimate ATE. - will my design actually give me an unbiased estimate of the ATE.
external validity: does this parameter generalize?
why include control variables in a RCT regression.
ATE - best case scenario:
Required Assumption to identify parameter: Randomization done correctly
Fights with reviewers: External Validity, Sample Size (precision)
Estimand - the target - what we are trying to learn
estimator - the thing we actually do with our data. We use this to estimate our estimand.
Regression and Causality
Imagine you didn’t run an experiment - or there is no obvious natural experiment.
we just have some cross sectional data.
I am interested in this variable, but it wasn't randomly assigned.
But I have a lot of other variables I can use as controls.
- when do i actually have a causal model?
Can we ever get causality from a regression?
when the conditional independence assumption holds.
what is that?
conditional on the X's, actual treatment status is independent of the potential outcomes.
if you could take your data and find observations that have the same values on everything but have variation on some treated/untreated variable - what is the effect?
basically you are finding the counterfactual.
- not perfect - example - look at Prince Charles and Ozzy Osbourne.
\(Y_{si}=f_i(s)\)
- notation for all possible potential outcomes
CIA suggests that we can estimate many causal effects, but need more structure for regression motivation.
note: no i subscript on the coefficient on the "CIA to regression" slide; the causal effect is assumed to be the same across individuals.
This is basically all just omitted variable bias
Question: if a full regression model has a causal interp. and I leave out a variable, what goes wrong?
- It is a biased estimate of the truth!
Long regression vs short regression:
what happens when you have OVB?
- See the slides with the big formulas (sketched below).
OVB moves your estimate along a number line. The bias has a direction! It is either positive or negative.
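A sketch of the "big formula" from the slides: write the long regression as \(y=\beta_0+\beta_1x_1+\beta_2x_2+u\) and the short regression as \(y=\gamma_0+\gamma_1x_1+e\). Then
\[
\gamma_1 = \beta_1 + \beta_2\,\delta_1
\]
where \(\delta_1\) is the slope from regressing the omitted \(x_2\) on \(x_1\). The bias term \(\beta_2\,\delta_1\) is a product of two signed pieces, which is exactly why it has a direction.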
Week 5
Does FWL give me variation that is as good as an experiment?
it is fine to leave out variables that don't have an effect on the outcome.
also fine to leave out variables that are uncorrelated with x_1 even if they have an effect on outcome.
- OVB is a product of these two. If one is zero then you are safe.
You can leave out mediators.
think school years -> income
- you can leave out writing ability because it is a mediator.
I have a causal research question. I wish I had a better research design, but I don't, so this is the best I can do.
Propensity Score Matching
Matching helps with estimating our estimand.
Balance is very important because imbalance can bias our estimate.
- matching essentially allows you to rebalance your data
we are not in RCT land.
we are going to need to match on more than just one variable
exact matching vs. propensity score matching.
why use matching?
no linear form assumption
OLS uses everyone. Matching only uses similar untreated obs.
matching does not fix OVB.
tough part: how do we assign the weights on the untreated outcomes?
OLS is not quite the same: what OLS is doing is regressing residuals on residuals (FWL).
Propensity score : the probability of being treated conditional on some Xs.
we are going to collapse all of our Xs into one number 0-1 (probability)
Rosenbaum and Rubin (1983)
prediction problem - what is the probability of being treated conditional on all the Xs.
Compare treated vs. untreated obs that have very similar propensity scores.
need to show common support ALWAYS.
Nearest Neighbor Matching: find an obs that has the closest propensity score and use their untreated outcome.
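A minimal sketch in Stata, with a hypothetical 0/1 treatment treat, outcome y, and covariates x1-x3:

* step 1: prediction problem - estimate Pr(treated | X)
logit treat x1 x2 x3
predict pscore, pr
* step 2: show common support (overlap of scores across groups)
histogram pscore, by(treat)
* step 3: nearest-neighbor matching on the propensity score
teffects psmatch (y) (treat x1 x2 x3), atet nneighbor(1)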
Panel Data
we’ve only been thinking about causality with cross sectional data.
but now we have data over time.
need to know long vs wide
Long data = multiple dates per cross-sectional unit
- multiple rows for each unit, one per time period
Wide data = one row per unit, with separate variables for each time period
stata panel commands want your data LONG.
reshape switches data from long to wide or vice versa.
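A minimal sketch of reshape, with hypothetical variables id, inc2019, and inc2020:

* wide -> long: one row per id-year, values stacked into inc
reshape long inc, i(id) j(year)
* long -> wide: back to one row per id
reshape wide inc, i(id) j(year)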
Diff-in-Diff
When to Use It:
RCT is not an option
natural experiment = quasi-random event changes our ability to see treated vs. untreated outcomes.
Good news: you don’t have to randomize anything
Bad news: frequently randomization is less than ideal
usually treatment given to whole population
no obvious control group - you need to choose a comparison group.
Diff-in-diff is basically comparing two groups over time.
it boils down to 4 sub-group means.
we can see the treatment effect from those means, where
g = group (treated vs. comparison)
t = period (pre vs. post treatment)
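In regression form (a standard way to write the 2x2 case these notes describe, using g and t as defined above):
\[
y_{it} = \beta_0 + \beta_1 g_i + \beta_2 t_t + \delta\,(g_i \times t_t) + \varepsilon_{it}
\]
where \(\delta\) is the diff-in-diff estimate, equal to the difference of the four sub-group means:
\[
\delta = (\bar{y}_{g=1,t=1} - \bar{y}_{g=1,t=0}) - (\bar{y}_{g=0,t=1} - \bar{y}_{g=0,t=0})
\]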
Equal trends assumption is very important.
- fundamentally an assumption.
Time is additive and separable
this hinges on who you select as the counterfactual.
you need to go to bat for your assumption.
we are going to impose the equal counterfactual assumption
KEY ASSUMPTION: Change over time for comparison group estimates counterfactual, i.e. what would have happened to treated group w/o treatment.