The Causal Inference Series: Part I

Introduction

Sometimes, numbers tell a story that feels clear, until you look a little closer!

Opening Example
September 20, 2025

Take this example: Two hospitals are treating patients for the same illness. Hospital A presents an 80% recovery rate, while Hospital B trails behind at 60%. Naturally, you’d assume Hospital A is doing a better job.

But here’s the twist: Hospital A mainly handles simple cases. Hospital B, on the other hand, treats more severe ones. When you break down the data by case severity, it turns out that Hospital B actually has better recovery rates in both mild and severe cases. The overall numbers were hiding the truth.

This kind of statistical surprise is known as Simpson’s Paradox: when a trend appears in aggregate data but disappears or reverses when you look at the subgroups.
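
To see how the reversal can happen, here is a minimal Python sketch with made-up counts (the numbers are illustrative, chosen only to reproduce the 80% vs. 60% story above; they come from no real hospital):

```python
# Illustrative (made-up) counts: (recovered, total) per severity group.
data = {
    "Hospital A": {"mild": (770, 900), "severe": (30, 100)},
    "Hospital B": {"mild": (90, 100), "severe": (510, 900)},
}

for hospital, groups in data.items():
    recovered = sum(r for r, _ in groups.values())
    total = sum(t for _, t in groups.values())
    print(f"{hospital}: overall {recovered / total:.0%}", end="  ")
    for severity, (r, t) in groups.items():
        print(f"{severity} {r / t:.0%}", end="  ")
    print()

# Output:
# Hospital A: overall 80%  mild 86%  severe 30%
# Hospital B: overall 60%  mild 90%  severe 57%
```

Hospital B beats Hospital A within every severity group, yet loses in aggregate, simply because it handles far more severe cases.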

A famous real-world example happened in 1973 at UC Berkeley. At the time, their graduate school admissions data seemed to show discrimination against women: 44% of male applicants were admitted, compared to just 35% of female applicants. But when researchers dug deeper, the story flipped. Most departments were actually admitting women at higher rates than men. The paradox? Women were applying more often to highly competitive departments with lower acceptance rates overall.

Core Insight
September 22, 2025

These examples illustrate the fundamental challenge: observing that two things occur together tells us nothing about whether one causes the other.

So in this series, we’ll explore how concepts like Simpson’s Paradox lead us into the deeper world of causal reasoning. By the end, we’ll move beyond surface-level data and into the mathematical foundations of what it means to say that one thing causes another.

Correlation vs Causation

  • Correlation (or association): Two variables $X$ and $Y$ display a statistical relationship, e.g. as $X$ increases, $Y$ tends to increase (positive correlation), or tends to decrease (negative correlation).

  • Causation: A change in $X$ brings about (directly or indirectly) a change in $Y$; we write $X \to Y$.

A famous admonition is: “correlation does not imply causation.” That is, the fact that $X$ and $Y$ are correlated is not sufficient to conclude $X$ causes $Y$.

How correlations can arise without causation

When you see a correlation, several possible underlying “stories” could explain it; not all of them involve $X \to Y$. Here are common scenarios:

  1. Reverse causation (or bidirectional causation): $Y \to X$, or both directions.

    • E.g. more wealth might increase education opportunities, but also more education might lead to higher wealth.
  2. Confounding / common cause: A third variable $Z$ influences both $X$ and $Y$, so they move together even though neither causes the other; the correlation is spurious with respect to $X \to Y$.

    • Example: Ice cream sales and drowning incidents correlate (both rise in summer). The hidden confounder is ambient temperature (see the simulation sketch after this list).
  3. Coincidence / randomness: The correlation is simply by chance, especially if one looks at many variable pairs and picks those that “look interesting.”

    • Even if $\rho = 0.9$ (a high correlation), that does not guarantee a causal link. Correlations may arise spuriously in large data sets.
  4. Mediation / indirect causation: $X$ causes an intermediate variable $M$, which in turn causes $Y$. So $X$ is upstream of $Y$, but not via a direct link.

    • In this case one can talk of a causal chain $X \to M \to Y$.
  5. Measurement artifacts, selection bias, or data issues: The observed correlation is introduced by how data is collected, missing data, aggregation, or measurement errors.

  6. Simultaneity or feedback: $X$ and $Y$ respond to each other or to a common equilibrium (e.g. supply and demand).
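
Scenario 2 is easy to demonstrate. Below is a minimal Python sketch (a toy simulation with coefficients of my own choosing, assuming numpy is available) in which a confounder $Z$ drives both $X$ and $Y$, producing a strong correlation even though $X$ has no effect on $Y$ at all:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Z is the common cause (think: ambient temperature).
z = rng.normal(size=n)

# X and Y each depend on Z plus independent noise; X never enters Y.
x = 2.0 * z + rng.normal(size=n)   # e.g. ice cream sales
y = 1.5 * z + rng.normal(size=n)   # e.g. drowning incidents

print(np.corrcoef(x, y)[0, 1])     # ~0.74: strong correlation, zero causation
```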

Pearl’s Causal Hierarchy (Ladder of Causation)

Pearl proposes three levels of causal reasoning, each strictly more expressive than the previous.

Level 1: Association (Observing / Seeing)

  • Question form: “What is $P(Y \mid X)$?”
  • What it captures: statistical association / correlation / predictive relations.
  • What you need / can do: only the joint (or conditional) distributions of observed variables. Enables prediction, pattern recognition, and statistical inference.

Level 2: Intervention (Doing / Experimenting)

  • Question form: “What is $P(Y \mid \mathrm{do}(X))$?”
  • What it captures: the causal effect of forcing $X$ to a value.
  • What you need / can do: a causal model (or assumptions such as no unobserved confounding) that lets you reason about interventions via the “do-operator”. Enables policy evaluation, A/B testing, and decision-making.

Level 3: Counterfactuals (Imagining / Retrospective)

  • Question form: “What would $Y$ have been if $X$ had been different (for this same unit)?”
  • What it captures: individual-level causal reasoning, “what if” for the actual case.
  • What you need / can do: structural (mechanistic) causal models (structural equations, latent factors). Enables explanations, personalized treatment effects, and answering “why did this instance happen?”
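
To get a taste of the gap between Level 1 and Level 2, here is a minimal Python sketch (a toy structural model of my own invention, not from Pearl) where conditioning on $X$ and intervening on $X$ give different answers because of a confounder $Z$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Structural model (made up for illustration):
#   Z -> X, Z -> Y, plus a true causal effect X -> Y of exactly 1.0
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 1.0 * x + 2.0 * z + rng.normal(size=n)

# Level 1 (seeing): regress Y on observed X -- the slope is inflated by Z.
seeing_slope = np.cov(x, y)[0, 1] / np.var(x)
print(f"association slope:    {seeing_slope:.2f}")   # ~2.0

# Level 2 (doing): set X ourselves, which breaks the Z -> X arrow.
x_do = rng.normal(size=n)                  # do(X): X no longer listens to Z
y_do = 1.0 * x_do + 2.0 * z + rng.normal(size=n)
doing_slope = np.cov(x_do, y_do)[0, 1] / np.var(x_do)
print(f"interventional slope: {doing_slope:.2f}")    # ~1.0, the true effect
```
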
Key Insight
September 25, 2025

Moving up the ladder requires stronger assumptions and different methods. Most AI today operates only at Level 1. We’ll explore these levels in more depth later in the series; for now, let’s turn to the main notions we’ll use to model causality.


Fundamental Causal Concepts

Treatment and Outcome

Treatment $T$: The intervention or exposure of interest. Can be binary (drug vs. placebo), multi-valued (low/medium/high dose), or continuous (price).

Outcome $Y$: The variable we care about measuring. Must be well-defined and measurable.

Unit $i$: The entity receiving treatment; it can be a person, company, city, or any other observational unit.

Confounding

Confounder $C$: A variable that affects both treatment and outcome, creating a misleading correlation. Example: Ice cream sales correlate with drowning deaths. Temperature is a confounder: hot weather increases both ice cream consumption and swimming (which increases drowning risk).

Notation: $C \to T$ and $C \to Y$

Causal Effect

Individual Treatment Effect (ITE): The difference between outcomes under treatment vs. control for the same unit: \(\mathrm{ITE}_i = Y_{1,i} - Y_{0,i}\)

where $Y_{1,i}$ is the outcome if treated and $Y_{0,i}$ is the outcome if not treated.

Fundamental Problem
September 25, 2025

The problem is that we can never observe both $Y_{1,i}$ and $Y_{0,i}$ simultaneously for the same unit: this is the Fundamental Problem of Causal Inference.

Observable difference:

\(\mathbb{E}[Y \mid T = 1] - \mathbb{E}[Y \mid T = 0]\)

This is just comparing treated vs. untreated groups in observational data.

Observable vs. Causal quantity:

  • Observable: $\mathbb{E}[Y \mid T=1] - \mathbb{E}[Y \mid T=0]$ (comparing treated to control groups)
  • Causal: $\mathbb{E}[Y_{1}] - \mathbb{E}[Y_{0}]$ (comparing the same units under different treatments)
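
A minimal simulation makes the gap visible. The sketch below (toy numbers of my own choosing) can write down both potential outcomes only because it is a simulation; the observed data reveal just one per unit, and the naive group comparison lands far from the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated potential outcomes (only possible because this is a simulation;
# in real data each unit reveals just one of the two).
c = rng.normal(size=n)                  # confounder: baseline ability, say
y0 = 50 + 10 * c + rng.normal(size=n)   # outcome if untreated
y1 = y0 + 10                            # outcome if treated: true effect = +10

# Confounded assignment: low-ability units are more likely to take treatment.
t = (rng.normal(size=n) > c).astype(int)

y_obs = np.where(t == 1, y1, y0)        # we only ever see one potential outcome

print(f"true causal effect  E[Y1]-E[Y0]:       {np.mean(y1 - y0):.1f}")  # 10.0
print(f"observed difference E[Y|T=1]-E[Y|T=0]: "
      f"{y_obs[t == 1].mean() - y_obs[t == 0].mean():.1f}")  # ~ -1.3, sign-flipped
```
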
Important Note
October 01, 2025

The causal parameter $\mathbb{E}[Y_1] - \mathbb{E}[Y_0]$ equals the observable difference only under special conditions that we’ll explore in upcoming posts.

The Selection Bias Problem

Why doesn’t simple comparison work? Consider a study examining whether attending tutoring sessions $T$ improves exam scores $Y$. Observational data shows students who attended tutoring scored $5$ points lower on average than those who didn’t: $\mathbb{E}[Y \mid T=1] - \mathbb{E}[Y \mid T=0] = -5$. Does tutoring harm performance? No! Students who attended tutoring were struggling students with weaker backgrounds who sought extra help precisely because they expected to perform poorly. The true causal effect $\mathbb{E}[Y_{1}] - \mathbb{E}[Y_{0}]$ might actually be $+10$ points of improvement.

The observed $-5$ conflates the genuine treatment effect with selection bias: if the students who chose tutoring would have scored $15$ points lower than the others even without it, then the $+10$ benefit and the $-15$ baseline gap add up to the misleading $-5$. This is why simply comparing treated and untreated groups in observational studies can lead us astray when looking for cause-and-effect relationships.
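
Written out (under the simplifying assumption of a constant $+10$ effect, so the effect among the treated equals the overall effect), the observed difference decomposes into a causal term plus a selection-bias term:

\(\underbrace{\mathbb{E}[Y \mid T=1] - \mathbb{E}[Y \mid T=0]}_{-5} \;=\; \underbrace{\mathbb{E}[Y_1 - Y_0 \mid T=1]}_{+10} \;+\; \underbrace{\mathbb{E}[Y_0 \mid T=1] - \mathbb{E}[Y_0 \mid T=0]}_{-15}\)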

What’s Next

In this post, we’ve established why correlation differs from causation and introduced the basic framework for thinking causally. In the next post, we’ll formalize these intuitions with the Potential Outcomes Framework, a way to clearly define cause-and-effect and think about “what if” scenarios.

This post is licensed under CC BY 4.0 by the author.