Experiments

Fall 2025
Methods
Professor: Alex Siegel
Published: January 14, 2025

Week 1

What is an experiment?

  • Gold standard for causal inference.

  • A research design that purposefully manipulates a treatment and uses random assignment to create comparable groups, so differences in outcomes can be attributed to the treatment (a causal effect).

Unit: the unit of analysis (the entity, e.g., person, household, village, to which treatment is assigned and on which outcomes are measured).

Treatment: well-defined manipulation with clear versions (dose, timing, delivery, content). Must be implementable and replicable.

Outcome: Pre-specified primary measure (behavioral, attitudinal, etc.)

Assignment mechanism: the stochastic rule mapping units to treatments (complete, Bernoulli, blocked/stratified).

Randomization: implementation of the assignment mechanism so that treatment is independent of {Y(1), Y(0)} (in expectation); enables unbiased difference-in-means estimation and design-based inference.

Estimand: The quantity you want to learn from a study (ATE, ITT, LATE, CATE) defined for a population and a contrast between interventions.

Causal inference: learning how outcomes would change under different, well-defined interventions for specified units, settings, and times.

  • I.e., how outcomes would change under different treatments.

We can get close to causal inference in lots of ways, but we need to be clear about what our assumptions are.

Social science theories are almost always causal in nature.

Manski: data + assumptions = conclusions -> experiments make assumptions clearer and lighter.

  • Every causal claim rests on assumptions. Experiments make them explicit and often weaker.

  • SUTVA - Stable Unit Treatment Value Assumption: no interference between units, and only one version of each treatment.

Old way: kitchen-sink regression + causal weasel words (“associated with,” “linked to,” “drives,” “increases,” “predicts”)

Fundamental Problem of Causal Inference: For unit i, cannot observe \(Y_i(1)\) and \(Y_i(0)\) simultaneously
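The fundamental problem can be made concrete with a small simulation (all numbers illustrative): we generate both potential outcomes for every unit, something no real study observes, then show that random assignment lets a simple difference in means recover the ATE even though each unit reveals only one potential outcome.

```python
import random

random.seed(42)

# Generate BOTH potential outcomes Y(0), Y(1) for each unit --
# something we can never observe in a real study.
n = 10_000
y0 = [random.gauss(0, 1) for _ in range(n)]
y1 = [y + 0.5 for y in y0]  # true unit-level effect = 0.5 for everyone

true_ate = sum(a - b for a, b in zip(y1, y0)) / n

# Complete random assignment: exactly half the units treated.
idx = list(range(n))
random.shuffle(idx)
treated = set(idx[: n // 2])

# We only observe ONE potential outcome per unit.
obs_t = [y1[i] for i in treated]
obs_c = [y0[i] for i in range(n) if i not in treated]

diff_in_means = sum(obs_t) / len(obs_t) - sum(obs_c) / len(obs_c)

print(round(true_ate, 3))       # 0.5
print(round(diff_in_means, 3))  # close to 0.5
```

The estimator is unbiased over repeated randomizations; any single draw is off only by sampling noise.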

Good experimental design requires good theory - WHAT IS THE MANIPULABLE LEVER?

What is a Theory?

  • A logically coherent explanation of why/how a manipulable cause changes an outcome, within stated scope conditions, yielding testable implications.

What is a Mechanism?

  • The sequence of intermediate changes that transmit the effect from cause -> outcome (use verbs: inform -> update beliefs -> shift norms -> act).

What is a Hypothesis?

  • A falsifiable prediction about a causal contrast (treatment vs. control) derived from theory.

Why Preregister?

  • Clarifies the estimand, outcomes, and analysis before seeing data.

  • Reduces p-hacking/HARKing and the garden of forking paths.

  • Increases credibility with reviewers, partners, and future you.

  • Makes deviations transparent: changes are documented and justified.

  • What counts as preregistration?

    • A time-stamped, accessible record.

    • Can be public or embargoed until data collection ends.

    • Not a prison: you can amend; just explain why.

Activity:

Does the walkability of an environment increase voter turnout?

Yes, this is a causal question.

Design: get a walkability data set and a voter turnout data set and run a regression, controlling for various factors. Problem: endogeneity, because people can move and sort themselves; those who choose walkable areas may differ in ways that also affect turnout.

Among individuals between the 2016 and 2020 elections, what is the effect of living in a walkable vs. a non-walkable area on voter turnout?

Week 2

  • Causal questions are always about counterfactuals.

    • Does the minimum wage increase the unemployment rate?

      • Counterfactual: would it have gone up if the increase had not occurred?
  • Causal inference = a missing data problem.

  • Political canvassing study example

  • Pretreatment covariates: measured before assignment, so they cannot be affected by the treatment.

  • We want the unit causal effect; however, we don’t have two states of the world for any given person.

  • SATE - sample average treatment effect

    • average causal effect for the exact units in this study.

    • Complete randomization makes the difference-in-means estimator unbiased for the SATE.

  • SATT - Sample Average Treatment Effect on the Treated

    • Average effect among the units that actually received treatment in an RCT (randomized controlled trial).
  • PATE (population ATE): average effect in the broader population from which the sample is drawn.

  • CATE: conditional average treatment effect

    • heterogeneous effects

    • subgroups.

  • Z means assignment to treatment.

  • actual receipt: D (some accept, some don’t)

    • E.g., someone was assigned to watch a movie (the treatment) but went for a snack instead and didn’t actually receive it.
  • CACE/LATE assumptions:

    • Random Z (independence)

    • Exclusion: Z affects Y only via D.

    • Monotonicity: no defiers (no one who takes treatment only when assigned to control).

  • DEFINE estimands: ATE/ITT/CACE/CATE clearly.

  • Present ITT first, then CACE with the Wald ratio + SE/CI.

  • For CATE: pre-specify subgroups or use honest discovery; show subgroup baselines.

  • Post-treatment adjustment: don’t control for variables affected by D.

  • attrition/truncation: outcomes defined only for a subset

  • Interference: if spillovers likely, redesign or redefine estimand.

  • Bundled treatments are a major issue in nearly everything I do.
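A minimal sketch of the Wald/IV logic above, with hypothetical compliance rates and effect sizes: assignment Z is random, receipt D depends on compliance, the ITT is diluted by never-takers, and dividing ITT by the compliance rate (the ITT on D) recovers the CACE.

```python
import random

random.seed(7)

# Z = random assignment, D = actual receipt. Compliers take treatment
# only if assigned; never-takers never do. Under independence,
# exclusion, and monotonicity, the Wald ratio ITT / ITT_D = CACE.
n = 20_000
complier = [random.random() < 0.6 for _ in range(n)]  # 60% compliers
z = [random.random() < 0.5 for _ in range(n)]         # random assignment
d = [zi and ci for zi, ci in zip(z, complier)]        # actual receipt

# Outcome: effect of 2.0 only for units that actually receive treatment.
y = [random.gauss(0, 1) + (2.0 if di else 0.0) for di in d]

def mean(xs):
    return sum(xs) / len(xs)

# ITT: effect of assignment on the outcome.
itt = mean([yi for yi, zi in zip(y, z) if zi]) - \
      mean([yi for yi, zi in zip(y, z) if not zi])

# ITT_D: effect of assignment on receipt (the compliance rate here).
itt_d = mean([float(di) for di, zi in zip(d, z) if zi]) - \
        mean([float(di) for di, zi in zip(d, z) if not zi])

cace = itt / itt_d
print(round(itt, 2))   # ≈ 1.2: the 2.0 effect diluted by never-takers
print(round(cace, 2))  # ≈ 2.0: effect among compliers
```

In practice the same ratio comes out of IV/2SLS with Z instrumenting D, which also supplies SEs and CIs.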

Week 3

Readings: Book Chapters

MIDA

Model

  • The M in MIDA does not necessarily represent our beliefs about how the world actually works. Instead, it describes a set of possible worlds in enough detail that we can assess how our design would perform if the real world worked like those in M.

  • Examples of models:

    • Contact theory: When two members of different groups come into contact under specific conditions, they learn more about each other, which reduces prejudice, which in turn reduces discrimination.

    • Prisoner’s dilemma. When facing a collective action problem, each of two people will choose non-cooperative actions independent of what the other will do.

    • Health intervention with externalities. When individuals receive deworming medication, school attendance rates increase for them and for their neighbors, leading to improved labor market outcomes in the long run.

Inquiry

  • Think of inquiry as the question.

Data Strategy

  • The data strategy is the full set of procedures we use to gather information from the world. The three basic elements of data strategies parallel the three features of inquiries: units are selected, conditions are assigned, and outcomes are measured.

  • What are our sampling procedures? Treatment-assignment procedures? Measurement procedures?

Answer Strategy

The answer strategy is what we use to summarize the data produced by the data strategy. Just like the inquiry summarizes a part of the model, the answer strategy summarizes a part of the data.

  • Multilevel modeling and poststratification

  • Bayesian process tracing

  • Difference-in-means estimation
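A toy illustration of the post-stratification idea listed above (strata names, shares, and means are all made up): reweight stratum-level sample means by known population shares to correct for an unrepresentative sample.

```python
# Hypothetical sample: the young are over-represented relative to the
# population, so the raw sample mean is biased toward their outcome.
sample = {
    # stratum: (sample mean outcome, share of the sample)
    "young": (0.40, 0.70),
    "old":   (0.70, 0.30),
}
population_share = {"young": 0.50, "old": 0.50}  # known from a frame/census

raw_mean = sum(m * s for m, s in sample.values())

# Post-stratified estimate: reweight stratum means by population shares.
post_stratified = sum(sample[g][0] * population_share[g] for g in sample)

print(round(raw_mean, 3))         # 0.49, pulled toward the young stratum
print(round(post_stratified, 3))  # 0.55, the population-weighted answer
```

The same reweighting logic underlies survey weights and (with a model for stratum means) multilevel regression and post-stratification.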

Lecture

Declare Design

  • Use their package!

  • MI (theoretical) sets the challenge

  • DA (empirical) is your response

  • Chapter 18 reread! Very important but dense.

Research Question

  • Template: Among units in setting/time, what is the effect of A vs B on outcome?

  • Think of counterfactual!

  • Hypotheses must match structure of research questions (falsifiable)!

  • MINIMUM DETECTABLE EFFECT (MDE)!

    • If the result is significant and the MDE is equal to or greater than X, what will a policymaker/org do?

MIDA

  • M: Model- set of plausible worlds.

  • I: Inquiry: the estimands your questions target

  • D: Data Strategy: sampling, assignment, measurement

    • SAMPLING - who is going to be in your sample

      • convenience sampling:

        • non-probability sampling - easy and cheap

        • MTurk/Prolific, campus pools, social ads

        • Mechanism or existence proofs: you’re testing whether a causal pathway can operate (not how big it is in a target population).

        • Pick a sample in which the mechanism is most likely to operate OR least likely.

        • You get the SATE of the panel. External validity is limited.

      • Probability/Stratified

        • units drawn with known inclusion probabilities from a defined frame.

          • stratified random sampling: list-based (e.g., voter file/registry)
        • Strength: supports PATE with principled weights/post-stratification; clear coverage assumptions.

      • Quota/ Balanced Convenience

        • Non-probability sample with quotas so sample margins match benchmarks.
    • ASSIGNMENT - who is getting the treatment/control

      • Complete RA (CRA) - randomize across the full sample to a fixed share; covariate balance holds in expectation

        • No strong-priors/ simple baseline? -> Complete RA
      • Blocked/Stratified RA (BRA): Randomize within pre-defined strata so key covariates are balanced by design.

        • GOAL: Make units within strata more similar on Y’s drivers -> smaller SEs.

        • Best single choice: a pre-treatment measure of the outcome (lagged turnout, protest score, baseline index)

        • Block on subgroups you care about to guarantee CATE sample sizes.

        • Clustered trials: pair/block on cluster level baselines, cluster size, region.

        • Keep it parsimonious

        • Need precision + good predictors? -> Block Stratify

      • Clustered RA: Randomize whole groups (schools, villages, teams) together when delivery or spillovers are group-level.

        • Delivery or interference is group-level? -> Cluster.
      • Restricted randomization/re-randomization: Draw randomizations but only accept those that meet pre-set balance criteria.

        • Care about global covariate balance but can’t block finely? -> restricted re-randomization.
      • Stepped-wedge: stagger rollout across periods so everyone is treated by the end; analyze interim contrasts.

        • Ethics/logistics require treating all eventually or few units available? -> stepped wedge.
      • Randomized saturation: Randomize cluster-level treatment saturation (e.g., 25% vs. 75%), then assign individuals accordingly to study spillovers.

        • Want spillover/saturation effects? -> randomized saturation
  • A: Answer Strategy: estimators, SEs.

    • Complete RA -> OLS with robust SEs

    • Blocked RA -> block Fixed Effects

    • Clustered RA -> Cluster Robust SEs

    • Probability sample -> survey weights/ post-stratification

    • Noncompliance -> IV/2SLS for CACE.

    • Multiple comparisons: pre-specify families; apply Bonferroni corrections if needed.

  • POWER ANALYSIS: What it is & why it matters

    • Power: the chance your study detects a real effect (returns a significant result when the true effect is at least the size you care about).
    • MDE - minimum detectable effect - is it worth it to do the study?
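Power can be estimated by simulation, which is essentially what design-declaration tools like DeclareDesign automate. A minimal sketch (effect sizes, sample sizes, and simulation counts are hypothetical): simulate many trials at a candidate effect size and count how often the difference-in-means test rejects.

```python
import random

random.seed(1)

def mean(xs):
    return sum(xs) / len(xs)

def power_sim(n_per_arm, effect, sd=1.0, sims=500, crit=1.96):
    """Share of simulated two-arm trials whose difference-in-means
    z-statistic (known-sd standard error) exceeds the critical value."""
    hits = 0
    for _ in range(sims):
        control = [random.gauss(0, sd) for _ in range(n_per_arm)]
        treat = [random.gauss(effect, sd) for _ in range(n_per_arm)]
        se = (sd**2 / n_per_arm + sd**2 / n_per_arm) ** 0.5
        if abs(mean(treat) - mean(control)) / se > crit:
            hits += 1
    return hits / sims

# Power rises with sample size for a fixed effect of 0.3 SD.
print(power_sim(50, 0.3))   # underpowered
print(power_sim(200, 0.3))  # well-powered (roughly 0.85)
```

Inverting this search, i.e., finding the smallest effect detected with (say) 80% power at your budgeted n, gives the MDE.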

Week 4

Measurement!

  • We care about the effect of the concept that Z represents on the concept that Y represents

  • Valid inference requires: design validity and measurement validity for both Z and Y.

  • Validity:

    • content - the degree to which an indicator represents the universe of content entailed in the systematized concept being measured.

    • construct - the data behaves like theory predicts

    • convergent - correlates with related measures

    • discriminant - does not correlate with measures of distinct concepts.

  • Bundled treatment - the manipulation compounds many things at once, so the active ingredient is unclear.

  • Define compliance to match the concept (information actually processed, not merely door opened)

  • Manipulation checks: comprehension checks (did they understand?), perception checks (did they notice the attribute?), and behavioral probes.

  • Pre-register checks!

  • Hawthorne effect - behavior changes because subjects know they are being observed.

  • non-systematic noise - independent of Z; inflates variance, hurts power.

  • Systematic error - correlated with Z: induces bias.

  • As long as the error is not correlated with treatment, we are okay!

  • More noise = larger standard errors.

  • multiple noise indicators can improve precision

  • Gains: higher correlation with latent construct, tighter CIs, greater power.

  • If X is an index, the treatment is bundled (which component drives the effect?).
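The point about multiple noise indicators can be sketched in a quick simulation (all parameters are illustrative): averaging k independent noisy measures of the same latent outcome shrinks the measurement-error variance by roughly 1/k, which tightens CIs and raises power.

```python
import random

random.seed(3)

n, k = 2000, 4
latent = [random.gauss(0, 1) for _ in range(n)]  # the construct we care about

def noisy(x):
    # Non-systematic measurement error, independent of treatment.
    return x + random.gauss(0, 1)

single = [noisy(x) for x in latent]                           # one indicator
index = [sum(noisy(x) for _ in range(k)) / k for x in latent]  # k-item index

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# The index variance is closer to the latent variance (1.0):
print(round(var(single), 2))  # about 2.0  (1.0 latent + 1.0 noise)
print(round(var(index), 2))   # about 1.25 (1.0 latent + 1.0/k noise)
```

Less noise in the outcome means smaller standard errors on the treatment effect at the same sample size.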

Citation

BibTeX citation:
@online{neilon2025,
  author = {Neilon, Stone},
  title = {Experiments},
  date = {2025-01-14},
  url = {https://stoneneilon.github.io/notes/American_Behavior/},
  langid = {en}
}
For attribution, please cite this work as:
Neilon, Stone. 2025. “Experiments.” January 14, 2025. https://stoneneilon.github.io/notes/American_Behavior/.