ETF-Rebalancing Cascades

1. Motivation

This post looks at the consequences of ETF rebalancing. These funds follow pre-announced rules that involve discrete thresholds. The well-known SPDR tracks the S&P 500, but there are over 1400 different ETFs tracking a wide variety of different underlying indexes. When any of these underlying indexes changes, the corresponding ETFs have to change their holdings. These thresholding rules mean that, in an extreme example, if Verisk Analytics gets $\mathdollar 1$ larger and moves from being the $501$st largest stock to being the $500$th largest stock (this actually happened), then the ETFs tracking the S&P 500 are going to suddenly have to build large positions in Verisk over a relatively short period of time. See here for more examples.

When there are many different ETFs tracking many different thresholds, these rebalancing rules can interact with one another and amplify small initial demand shocks. For example, when Verisk increases in size and gets bought by ETFs tracking the S&P 500, it will be slightly more correlated with the market (Barberis, Shleifer, and Wurgler, 2005). As a result, ETFs like SPHB that track large-cap high-beta stocks might have to buy Verisk as well, which in turn can have further consequences down the line. This is what journalists have in mind when they worry about ETFs “turning the market close into a buying or selling frenzy.” To model this phenomenon, I use an approach based on branching processes à la Nirei (2006).

The idea that traders might herd (Devenow and Welch, 1996) or amplify shocks (Veldkamp, 2006) is not new. But, these rebalancing rules can also transmit shocks to completely unrelated corners of the market. This idea is new. Rebalancing involves buying one stock and selling another. It sounds obvious, but it matters that we’re looking at “rebalancing” rules and not just “purchasing” rules. For instance, when Verisk got added to the S&P 500, it replaced Joy Global, a mining-tools manufacturing company. So, all of the ETFs tracking the S&P 500 had to sell their positions in Joy Global, pushing Joy’s price down and making it more likely that ETFs that need to hold a position in mining companies choose one of Joy’s competitors like ABB Limited. And, this additional demand for ABB can cause subsequent rebalancing, which is the result of an initial change in the value of Verisk, a totally unrelated stock.

No trader would ever be able to guess that the reason another Swiss company like Novartis is realizing selling pressure is that Verisk replaced Joy Global in the S&P 500, which caused ETFs tracking the S&P 500 to sell Joy, which caused other industrial ETFs to replace Joy with ABB Limited, which caused still other ETFs tracking the largest stocks in each European country to replace their positions in Novartis with a position in ABB. So, even though each ETF’s rebalancing rules are completely predictable, their aggregate behavior generates noise. By analogy, the population of France at any given instant is a definite fact. It is not random. No one is timing births by coin flips, dice rolls, radioactive decay, etc… But, as John Maynard Keynes pointed out, whether this number is even or odd at each instant is effectively random. Were $1,324$ or $1,325$ people born in Paris during the time it took you to finish counting the population of Nice?

2. Market Structure

Suppose there’s a single stock that can be held by $F$ different ETFs, and each fund’s demand for the stock, $x_f$ , is the sum of $3$ different components:

(1) $\begin{align*} x_f &= \alpha_f + \mathrm{B}(\mathbf{x}) - \lambda \cdot z_f \qquad \text{where} \qquad \mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_F \end{bmatrix}^{\top}. \end{align*}$

When writing down models of the ETF market, people usually only think about the first component, $\alpha_f$ . This is just the intrinsic demand that an ETF would have for the stock if it were the only fund in the market, like the SPDR was in the early 1990s. If there are no large-cap high-beta ETFs, then having the SPDR purchase additional shares of Verisk wouldn’t have any knock-on effects in the example above. Think about shocks to each fund’s $\alpha_f$ as shocks to whether or not the stock is included in the fund’s benchmark index.

The second component, $\mathrm{B}(\mathbf{x})$ , captures how an ETF’s demand is affected by the decisions of other ETFs. This is the effect of S&P 500-tracking ETFs on the holdings of ETFs that track a large-cap high-beta index. Because buying by one ETF leads to additional buying by other ETFs, $\mathrm{B}(\mathbf{x})$ is increasing in ETF demand:

(2) $\begin{align*} {\textstyle \frac{\partial}{\partial x_f}}[\mathrm{B}(\mathbf{x})] &> 0. \end{align*}$

Because each ETF’s demand only has a subtle effect on other ETFs’ demand, we have that:

(3) $\begin{align*} {\textstyle \frac{\partial}{\partial x_f}}[\mathrm{B}(\mathbf{x})] &= \mathrm{O}(\sfrac{1}{F}), \end{align*}$

where $\mathrm{O}(\cdot)$ is Landau notation. So, when there are more ETFs that might hold a stock, each ETF’s decisions have a smaller effect on the decisions of its peers. In the theoretical analysis below, I’m going to study the function $\mathrm{B}(\mathbf{x}) = \frac{\beta}{F} \cdot \sum_{f=1}^F x_f$ , which satisfies this property.
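As a quick check, assuming $\beta > 0$ is held fixed as $F$ grows, differentiating this linear rule with respect to any one fund's demand gives

$\begin{align*} {\textstyle \frac{\partial}{\partial x_f}}[\mathrm{B}(\mathbf{x})] &= {\textstyle \frac{\partial}{\partial x_f}} \left[ {\textstyle \frac{\beta}{F}} \cdot {\textstyle \sum_{f'=1}^F} x_{f'} \right] = {\textstyle \frac{\beta}{F}}, \end{align*}$

which is strictly positive, as Equation (2) requires, and shrinks at rate $\sfrac{1}{F}$, as Equation (3) requires.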

Finally, the third component, $-\lambda \cdot z_f$ , reflects the fact that ETFs use thresholding rules. Even if Verisk has a market capitalization that is $\mathdollar 1$ smaller than the $500$th largest stock, Verisk won’t be held by the SPDR, which tracks the S&P 500. The moment Verisk becomes the $500$th largest stock, the SPDR is going to have to buy a large block of shares. This threshold-adjustment component is defined using modular arithmetic,

(4) $\begin{align*} z_f &= \lambda^{-1} \cdot \mathrm{mod}( y_f , \, \lambda), \end{align*}$

where $y_f = \alpha_f + \mathrm{B}(\mathbf{x})$ and $\mathrm{mod}( y_f , \, \lambda )$ is the remainder that’s left over after dividing $y_f$ by $\lambda$ . So, if $\lambda = 3$ shares and an ETF would have bought $y_f = 5$ shares, then $z_f = \sfrac{2}{3}$ and its resulting demand is $x_f = 3$ shares. A fund is always demanding some multiple of $\lambda$ shares. Think about $\lambda$ as the size of a typical ETF’s adjustment once a stock is added to its benchmark.
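The thresholding rule in Equations (1) and (4) can be sketched in a few lines of Python. This is only an illustration: the function name `thresholded_demand` is hypothetical, and it assumes $\lambda > 0$.

```python
def thresholded_demand(y, lam):
    """Return (z, x) for unconstrained demand y and block size lam > 0."""
    remainder = y % lam      # mod(y_f, lambda): shares left over after whole blocks
    z = remainder / lam      # z_f, the fraction of the way to the next threshold
    x = y - remainder        # x_f = y_f - lambda * z_f, a multiple of lambda
    return z, x


# The example from the text: lambda = 3 shares, unconstrained demand of 5 shares.
z, x = thresholded_demand(5.0, 3.0)
print(z, x)
```

With these inputs the fund holds one block of $3$ shares and sits two thirds of the way to its next purchase.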

3. Equilibrium Concept

What does it mean for this market to be in equilibrium, and how does the market transition between equilibria? Via Tarski’s fixed-point theorem, we know that for any choice of ${\boldsymbol \alpha}$ , if

both $\alpha_f$ and $\lambda$ are bounded, and
there are scalars $\underline{x} = \mathrm{B}(\underline{x},\,\underline{x},\,\ldots,\,\underline{x}) + \underline{\alpha} - \overline{\lambda}$ and $\overline{x} = \mathrm{B}(\overline{x},\,\overline{x},\,\ldots,\,\overline{x}) + \overline{\alpha} - \underline{\lambda}$ ,

then there exists a solution, $\mathbf{x}^\star$ , to Equation (1) for all $F$ funds. This solution is the equilibrium associated with ${\boldsymbol \alpha}$ . Here, $\overline{\alpha}$ and $\underline{\alpha}$ denote the upper and lower bounds on $\alpha_f$ —i.e., $\alpha_f \in [\underline{\alpha}, \, \overline{\alpha}]$ ; and, $\overline{\lambda}$ and $\underline{\lambda}$ denote the upper and lower bounds on $\lambda$ . To make things concrete, note that, if $\alpha_f \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{Unif}(0,\,\lambda)$ , then $\mathbf{x}^\star = 0$ .
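To see why $\mathbf{x}^\star = 0$ in this case, here is a short check. It assumes that $\mathrm{B}(\mathbf{0}) = 0$, which holds for the linear rule $\mathrm{B}(\mathbf{x}) = \frac{\beta}{F} \cdot \sum_{f=1}^F x_f$, and that $\alpha_f \in [0, \, \lambda)$:

$\begin{align*} y_f &= \alpha_f + \mathrm{B}(\mathbf{0}) = \alpha_f, \\ z_f &= \lambda^{-1} \cdot \mathrm{mod}( \alpha_f , \, \lambda) = \sfrac{\alpha_f}{\lambda}, \\ x_f &= \alpha_f + \mathrm{B}(\mathbf{0}) - \lambda \cdot z_f = \alpha_f - \alpha_f = 0. \end{align*}$

So every fund's best response to $\mathbf{x} = 0$ is to hold nothing, and Equation (1) is satisfied for all $F$ funds.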

At the start of each trading day, the market is in an equilibrium at $\mathbf{x}^\star$ associated with the intrinsic demand ${\boldsymbol \alpha}$ . We can normalize this value to $0$ . Then, over the course of the day, the stock’s characteristics change. It gets added to some ETFs’ benchmarks. I model this as a shock to each ETF’s intrinsic demand,

(5) $\begin{align*} \hat{\alpha}_f = \alpha_f + \sfrac{\epsilon_f}{F} \end{align*}$

where $\epsilon_f$ is positive with mean $\mu_{\epsilon} > 0$ and distributed i.i.d. across funds. This shock is divided through by $F$ so that its impact is the same magnitude as the feedback effect from one ETF’s demand to another. Given this shock, how will ETFs’ demand evolve over the course of the final $10$ to $15$ minutes of trading?

To answer this question, we need to define how ETFs update their demand. I follow the approach used in Cooper (1994) where ETFs adjust their positions by applying the best response function iteratively. In the first round, ETFs adjust their holdings by $\lambda$ if they realize a sufficiently large shock to their intrinsic demand:

(6) $\begin{align*} x_{f,1} &= \begin{cases} x_{f,0} + \lambda &\text{if } \sfrac{\epsilon_f}{F} \geq \lambda \cdot (1 - z_{f,0}) \\ x_{f,0} &\text{else} \end{cases} \\ \text{and} \quad z_{f,1} &= z_{f,0} + \lambda^{-1} \cdot ( \, \sfrac{\epsilon_f}{F} - \{ x_{f,1} - x_{f,0} \} \, ). \end{align*}$

Then, in each subsequent period, ETFs adjust their holdings by $\lambda$ if the demand pressure from other ETFs is sufficiently large:

(7) $\begin{align*} x_{f,t+1} &= \begin{cases} x_{f,t} + \lambda &\text{if } \{ \mathrm{B}(\mathbf{x}_t) - \mathrm{B}(\mathbf{x}_{t-1})\} \geq \lambda \cdot (1- z_{f,t}) \\ x_{f,t} &\text{else} \end{cases} \\ \text{and} \quad z_{f,t+1} &= z_{f,t} + \lambda^{-1} \cdot \left( \, \{ \mathrm{B}(\mathbf{x}_t) - \mathrm{B}(\mathbf{x}_{t-1})\} - \{ x_{f,t+1} - x_{f,t} \} \, \right). \end{align*}$

This process continues until no more changes are required at $T = \min\{ \, t \mid \mathbf{x}_{t+1} = \mathbf{x}_t \, \}$ . Initially, the shock is exogenous, $\sfrac{\epsilon_f}{F}$ ; then, in all later rounds the shock comes from other ETFs, $\mathrm{B}(\mathbf{x}_t) - \mathrm{B}(\mathbf{x}_{t-1})$ . Importantly, ETFs in this model don’t proactively adjust their positions to account for future changes in demand that they see coming down the line in future iterations. After all, ETFs are constrained to mimic their benchmarks.
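The updating rules in Equations (5) through (7) can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the code behind any figure in this post: it assumes the linear rule $\mathrm{B}(\mathbf{x}) = \frac{\beta}{F} \cdot \sum_f x_f$, draws $z_{f,0}$ uniformly on $[0, 1)$, and draws $\epsilon_f$ from an exponential distribution with mean $\mu_{\epsilon}$ (the text only requires a positive shock). The name `simulate_cascade` is hypothetical.

```python
import random


def simulate_cascade(F, beta, lam, mu_eps, rng):
    """Run one rebalancing cascade and return (L, T)."""
    # z[f] is fund f's fractional distance toward its next threshold, in [0, 1).
    z = [rng.random() for _ in range(F)]
    # Round 1 shock: eps_f / F, with eps_f exponential (an assumption of this sketch).
    shocks = [rng.expovariate(1.0 / mu_eps) / F for _ in range(F)]
    total, rounds = 0, 0
    while True:
        movers = 0
        for f in range(F):
            if shocks[f] >= lam * (1.0 - z[f]):
                # Fund buys one block of lam shares; z keeps the leftover pressure.
                z[f] += (shocks[f] - lam) / lam
                movers += 1
            else:
                z[f] += shocks[f] / lam
        if movers == 0:
            break
        total += movers
        rounds += 1
        # Later rounds: every fund sees B(x_t) - B(x_{t-1}) = beta * lam * movers / F.
        shocks = [beta * lam * movers / F] * F
    return total, rounds


rng = random.Random(0)
draws = [simulate_cascade(1000, 0.5, 0.1, 0.1, rng) for _ in range(200)]
print(sum(L for L, _ in draws) / len(draws))  # average number of position changes
print(max(T for _, T in draws))               # longest cascade, in rounds
```

Each fund moves at most one block per round, matching Equations (6) and (7), and no fund anticipates later rounds.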

4. Cascade Length

It is now possible to show that small initial shocks to each ETF’s demand can lead to long rebalancing cascades. Let $\ell_t$ denote the number of ETFs that buy $\lambda$ additional shares of the target stock in period $t$ :

(8) $\begin{align*} \ell_t &= \lambda^{-1} \cdot {\textstyle \sum_{f=1}^F} (x_{f,t} - x_{f,t-1}). \end{align*}$

Similarly, let $L$ denote the total number of ETF position changes from time $t=1$ to time $t=T$ :

(9) $\begin{align*} L &= {\textstyle \sum_{t=1}^T} \ell_t = \lambda^{-1} \cdot {\textstyle \sum_{f=1}^F} (x_{f,T} - x_{f,0}). \end{align*}$

So, when $L \gg 1$ , then a small initial shock will cause a large number of ETFs to change their positions later on. This is what I mean by the length of a rebalancing cascade.

In order to characterize the distribution of $L$ analytically, I need to make a couple of functional form assumptions. First, I assume that the interaction of ETF-demand schedules is governed by the rule:

(10) $\begin{align*} \mathrm{B}(\mathbf{x}) &= {\textstyle \frac{\beta}{F}} \cdot {\textstyle \sum_{f=1}^F} x_f, \end{align*}$

with parameter $0 \leq \beta \leq \mathrm{min}(\lambda^{-1},\,1)$ . As $\beta$ gets larger and larger, each ETF’s demand has a larger and larger effect on other ETFs’ demand schedules. Finally, suppose that the distribution of ETF intrinsic demands,

(11) $\begin{align*} \alpha_f \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{Unif}(0,\,\lambda), \end{align*}$

is uniformly distributed on the interval from $0$ to $\lambda$ .

Using results from Harris (1963) it’s possible to show that, as the number of funds gets large, $F \to \infty$ , the sequence of adjustments $\{ \ell_t \}$ converges to a Poisson-distributed Galton-Watson process. The probability-density function for the total cascade length is given by:

(12) $\begin{align*} \mathrm{pdf}(L) &= (\beta \cdot L + \sfrac{\mu_{\epsilon}}{\lambda})^{L-1} \cdot \frac{\sfrac{\mu_{\epsilon}}{\lambda} \cdot e^{-(\beta \cdot L + \sfrac{\mu_{\epsilon}}{\lambda})}}{L!}. \end{align*}$
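As a sanity check, the distribution can be evaluated numerically. The sketch below assumes the exponent is $-(\beta \cdot L + \sfrac{\mu_{\epsilon}}{\lambda})$, which is the sign under which the probabilities sum to one, and writes $\theta = \sfrac{\mu_{\epsilon}}{\lambda}$. The value $\theta = 1$ and the function name `cascade_pmf` are hypothetical choices for illustration.

```python
import math


def cascade_pmf(L, beta, theta):
    """Probability of a cascade of total length L, with theta = mu_eps / lam > 0."""
    # Work in logs so that the power and the factorial do not overflow.
    log_p = (math.log(theta)
             + (L - 1) * math.log(beta * L + theta)
             - (beta * L + theta)
             - math.lgamma(L + 1))
    return math.exp(log_p)


beta, theta = 0.10, 1.0  # theta = 1.0 is a hypothetical value of mu_eps / lam
probs = [cascade_pmf(L, beta, theta) for L in range(200)]
print(sum(probs))                              # total probability mass
print(sum(L * p for L, p in enumerate(probs)))  # mean cascade length
print(theta / (1.0 - beta))                    # closed-form mean for comparison
```

Truncating the sum at $L = 200$ is harmless for these parameters because the tail decays very quickly.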

The figure above plots this function when $\beta = 0.10$ and $\mu_{\epsilon} = 0.10$ .

A Galton-Watson process, $\{\ell_t\}$ , is a stochastic process that starts out with $\ell_0=1$ and then evolves according to the rule

(13) $\begin{align*} \ell_{t+1} &= {\textstyle \sum_{\ell=1}^{\ell_t}} \xi_{\ell}^{(t)}, \end{align*}$

where $\{ \, \xi_{\ell}^{(t)} \mid t , \, \ell \in \mathbb{N} \, \}$ is a set of i.i.d. Poisson-distributed random variables. To illustrate by example, the figure below displays a single realization from a Galton-Watson process. In this example, there are $L=7$ total ETFs that rebalance their holdings in response to an initial demand shock, and the rebalancing cascade lasts $T = 3$ rounds. There are $\ell_2 = 3$ ETFs that rebalance in the second round, and the rebalancing demand from the third of these ETFs causes two more ETFs to adjust their positions, $\xi_{3}^{(2)} = 2$ .
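A single realization of the rule in Equation (13) can be drawn with the short sketch below. It assumes each $\xi_{\ell}^{(t)}$ is Poisson with mean $\beta$ and starts from $\ell_0 = 1$ as in the definition above. The Poisson sampler uses Knuth's multiplication method, which is fine for small means, and the names `poisson` and `galton_watson` are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Draw a Poisson variate by multiplying uniforms until the product <= exp(-mean)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def galton_watson(beta, rng, max_rounds=10_000):
    """Return [l_0, l_1, ...] for Poisson(beta) offspring, starting from l_0 = 1."""
    generations = [1]
    while generations[-1] > 0 and len(generations) < max_rounds:
        # Each member of the current generation has its own independent offspring draw.
        children = sum(poisson(beta, rng) for _ in range(generations[-1]))
        generations.append(children)
    return generations


rng = random.Random(0)
path = galton_watson(0.5, rng)
print(path)       # generation sizes until the process dies out
print(sum(path))  # total number of members, counting the initial one
```

The `max_rounds` cap is only a safeguard: when $\beta < 1$ the process dies out with probability one.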

Given the probability-density function described above, it’s easy to show that the mean and variance of the number of ETFs that rebalance are:

(14) $\begin{align*} \mathrm{E}[L] &= {\textstyle \frac{\mu_{\epsilon}}{\lambda} \cdot \frac{1}{1-\beta}} \quad \text{and} \quad \mathrm{Var}[L] = {\textstyle \frac{\mu_{\epsilon}}{\lambda} \cdot \frac{1}{(1-\beta)^3}}. \end{align*}$
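The mean can also be recovered directly from the branching structure. This is a sketch in the large-$F$ limit, assuming $\beta < 1$. In the first round each fund moves with probability $\mathrm{E}[\sfrac{\epsilon_f}{(F \cdot \lambda)}]$ because $z_{f,0}$ is uniform, so $F$ funds produce $\sfrac{\mu_{\epsilon}}{\lambda}$ movers on average. After that, each mover triggers $\beta$ further movers on average:

$\begin{align*} \mathrm{E}[\ell_1] &= \sfrac{\mu_{\epsilon}}{\lambda}, \qquad \mathrm{E}[\ell_{t+1} \mid \ell_t] = \beta \cdot \ell_t, \\ \mathrm{E}[L] &= {\textstyle \sum_{t=1}^{\infty}} \mathrm{E}[\ell_t] = {\textstyle \frac{\mu_{\epsilon}}{\lambda}} \cdot {\textstyle \sum_{t=1}^{\infty}} \beta^{t-1} = {\textstyle \frac{\mu_{\epsilon}}{\lambda} \cdot \frac{1}{1-\beta}}. \end{align*}$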

So, even when each individual ETF realizes a shock to its intrinsic demand over the course of the trading day that is quite small on average, $\mu_{\epsilon} \approx 0$ , the stock can still realize large demand shocks when each ETF’s demand has a large spillover effect, $\beta \approx 1$ , and when they don’t have to make very granular position changes, $\lambda \approx 0$ . This is the amplification result I mentioned above. The figures below verify these calculations using simulations with $F = 10^3$ ETFs. In particular, the left panel shows that the initial shock of only $\mu_{\epsilon} = 0.10$ leads to more than one rebalancing cascade on average when $\lambda = 0.10$ .