LLMQuant Newsletter

Mean Reversion in Action: Building a Pairs Trading Strategy with the Ornstein–Uhlenbeck Process

How to implement, calibrate, and test a modern statistical arbitrage model

LLMQuant
Nov 13, 2025

Statistical arbitrage has long been one of the core strategies in quantitative equity trading, yet most retail and even some institutional frameworks remain stuck in the classic “pairs trading” mindset. The modern approach is far more sophisticated. It works not just with stock pairs but with stock portfolios; not with ad-hoc spread rules but with mathematically grounded mean-reverting processes; and not with intuition but with risk-factor decomposition, systematic signal generation, and rigorous backtesting.

In this article, we discuss a full statistical arbitrage strategy based on the Ornstein–Uhlenbeck (OU) process, following the approach proposed by Avellaneda and Lee in their influential 2008 paper Statistical Arbitrage in the U.S. Equities Market. We begin with a single-factor setup to extract residuals, calibrate their OU dynamics, generate s-scores, and construct long–short, market-neutral portfolios. In the second part, we explore extensions involving multiple ETFs, principal component analysis (PCA), and volume-adjusted signals, examining which refinements add value and which introduce noise.


1. What Makes Statistical Arbitrage “Statistical”?

Avellaneda and Lee emphasize three defining characteristics of statistical arbitrage strategies. First, trading signals must be fully systematic, generated by rules or models rather than by discretionary judgment. Second, the long–short portfolio must be market-neutral, designed such that the beta to broad market factors is effectively zero. Third, profits arise from statistical properties such as mean reversion, not from forecasts of fundamentals or macroeconomic trends.

In practice, this means that if two stocks share similar risk exposures, the residual obtained by regressing one on the other may behave like a stationary mean-reverting process. When the residual deviates far from its mean, we take long or short bets expecting it to revert. The OU process provides a natural mathematical model for this behavior.


2. Modeling Residuals with the Ornstein–Uhlenbeck Process

Consider two stocks, P and Q. Instead of trading them on raw prices or returns, we construct a spread by regressing one on the other. The residual serves as the candidate mean-reverting signal. More generally, for a stock and its sector ETF, we decompose returns into:

  • a systematic component explained by the ETF or risk factors, and

  • a residual component \(X_t\), which we model as an OU process.

The OU process is specified as:

\(dX_t = \kappa (m - X_t) dt + \sigma dW_t\)

where κ is the speed of mean reversion, m its long-term mean, and σ its volatility. Stocks whose residuals revert quickly provide stronger trading opportunities.
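To build intuition for how \(\kappa\), m, and σ shape the residual's path, here is a minimal simulation sketch using an Euler–Maruyama discretization. The parameter values are purely illustrative and not calibrated to any data.

# simulate an illustrative OU path (Euler–Maruyama discretization)
# the parameter values below are made up for demonstration only
import numpy as np

kappa, m, sigma = 8.4, 0.0, 0.3     # annualized speed, long-term mean, volatility
dt = 1 / 252                        # daily step in years
n_steps = 252

rng = np.random.default_rng(0)
X = np.empty(n_steps)
X[0] = 0.05                         # start away from the mean to show reversion
for t in range(1, n_steps):
    X[t] = X[t - 1] + kappa * (m - X[t - 1]) * dt + sigma * np.sqrt(dt) * rng.standard_normal()

The larger κ is, the faster the path is pulled back toward m after a shock, which is exactly the property we want to exploit.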

For our empirical test, we use the BBH Biotech ETF and its constituent stocks from 2019 to 2021. A 60-day rolling regression extracts daily betas and residuals, after which we calibrate OU parameters through a second regression on the cumulative residuals.
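As a sketch of this two-step procedure for a single 60-day window, assuming daily return arrays stock_ret and etf_ret (these names and the use of NumPy are assumptions, not the article's code):

# one 60-day window: regress stock returns on ETF returns, then fit an AR(1)
# to the cumulative residuals and map its coefficients to OU parameters
import numpy as np

def calibrate_window(stock_ret, etf_ret, trading_days=252):
    # step 1: market regression r_stock = alpha + beta * r_etf + eps
    beta, alpha = np.polyfit(etf_ret, stock_ret, 1)
    resid = stock_ret - alpha - beta * etf_ret

    # step 2: AR(1) fit on cumulative residuals X_k = sum of resid up to k
    X = np.cumsum(resid)
    b, a = np.polyfit(X[:-1], X[1:], 1)       # X_{k+1} = a + b * X_k + zeta_{k+1}
    zeta = X[1:] - (a + b * X[:-1])

    # map AR(1) coefficients to OU parameters (requires 0 < b < 1)
    kappa = -np.log(b) * trading_days         # annualized speed of mean reversion
    m = a / (1 - b)                           # long-term mean of the residual
    sigma_eq = np.sqrt(np.var(zeta) / (1 - b ** 2))  # equilibrium std dev of X
    sigma = sigma_eq * np.sqrt(2 * kappa)     # OU volatility
    return beta, kappa, m, sigma, sigma_eq

In a full backtest this routine would be applied to each stock on each rolling window, producing the daily betas, residuals, and OU parameters used below.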

A key step is filtering for stocks with sufficiently strong mean reversion. Following the paper, we select stocks with annualized \(\kappa > 252/30\), i.e., a characteristic mean-reversion time \(1/\kappa\) of less than 30 trading days, half the 60-day estimation window. A sketch of this filter appears below.
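As a minimal sketch, assuming the latest calibrated speeds are stored in a pandas Series kappas indexed by ticker (a hypothetical name, not defined in the article):

# keep only stocks whose residuals mean-revert fast enough to trade
# kappas is a hypothetical Series of annualized kappa estimates per stock
fast_reverters = kappas[kappas > 252 / 30].index.tolist()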


3. Generating Signals with the s-Score

Once we estimate OU parameters, we compute the s-score, essentially a standardized measure of deviation from the expected long-term mean. When s-scores exceed positive thresholds, the spread is considered “too high,” suggesting a short position; when they fall below negative thresholds, we go long.
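One standard way to make this precise, consistent with the OU model above, is to scale the deviation from the long-term mean by the process's equilibrium (stationary) standard deviation:

\(s_t = \frac{X_t - m}{\sigma_{eq}}, \qquad \sigma_{eq} = \frac{\sigma}{\sqrt{2\kappa}}\)

A large positive s-score therefore means the residual sits far above its long-term mean relative to its typical dispersion, and a large negative one means it sits far below.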

Using thresholds similar to Avellaneda and Lee:

  • open short when s>1.25;

  • close short when s<0.75;

  • open long when s<−1.25;

  • close long when s>−0.5.

This creates a systematic rule that reacts to meaningful deviations while avoiding noise.

Each trade consists of buying 1 dollar of the stock and shorting β dollars of the ETF (or vice versa), ensuring market neutrality.

# calculate positions: +1 = long stock / short beta*ETF, -1 = short stock / long beta*ETF, 0 = flat
import pandas as pd

algo_pos = pd.DataFrame(index=s_scores.index[1:], columns=stocks, dtype=float)

for s in stocks:
    positions = pd.Series(index=s_scores.index, dtype=float)
    pos = 0
    for t in s_scores.index:
        score = s_scores.loc[t, s]
        if score > 1.25:                      # open short
            pos = -1
        elif score < -1.25:                   # open long
            pos = 1
        elif score < 0.75 and pos == -1:      # close short
            pos = 0
        elif score > -0.5 and pos == 1:       # close long
            pos = 0
        # otherwise carry the current position forward
        positions.loc[t] = pos

    algo_pos[s] = positions                   # aligns on the shared dates
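To translate these positions into a market-neutral P&L, each unit position can be read as $1 of the stock hedged with β dollars of the ETF. The sketch below assumes DataFrames stock_returns and betas (dates × stocks) and a Series etf_returns of ETF daily returns aligned on the same dates; these are assumed inputs, not objects defined in the article.

# convert positions into daily market-neutral P&L
# stock_returns, betas: DataFrames (dates x stocks); etf_returns: Series (assumed inputs)
def strategy_pnl(algo_pos, stock_returns, etf_returns, betas):
    pos = algo_pos.shift(1).astype(float)                     # earn today's return on yesterday's position
    hedged = stock_returns - betas.mul(etf_returns, axis=0)   # $1 stock leg minus beta-dollar ETF hedge
    daily_pnl = (pos * hedged).sum(axis=1)                    # sum hedged P&L across stocks
    return daily_pnl, daily_pnl.cumsum()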
