Research Notebook

A Model of Rebalancing Cascades

October 23, 2016 by Alex

1. Motivating Examples

Trading strategies can interact with one another to amplify small initial shocks to fundamentals:

  • Quant Crisis, Aug 2007: “During the week of August 6, 2007, a number of [quantitative hedge funds] experienced unprecedented losses… Initial losses [were] due to the forced liquidation of one or more large equity market-neutral portfolios… and the subsequent price impact… caused other similarly constructed portfolios to experience losses. These losses, in turn, caused other funds to deleverage their portfolios [and] led to further losses [and] more deleveraging and so on.”
  • Flash Crash, May 2010: “The Dow Jones industrial average plunged more than 600 points in a matter of minutes that day and then recovered in a blink… [it] began with the sale by Waddell & Reed of 75,000 E-Mini S&P 500 futures contracts… late in the trading day… [with] many of the contracts bought by… computerized traders who… [then] traded contracts back and forth [like a] ‘hot potato’.”
  • Drop in Oil Prices, May 2011: “Never before had crude oil plummeted so deeply during the course of a day… prices were off by nearly $13 a barrel… [and] market players were unable to identify any single bank or fund orchestrating a massive sale to liquidate positions… [rather] computerized trading just kicked in when key price levels were reached.”
  • End-of-Day Volume, Oct 2011: “In the last 18 minutes of trading, the S&P 500-stock index jumped more than 10 points with no news to account for the rally. If you were left scratching your head, you were not alone… [and, the] culprit behind the late-day market swings: exchange-traded funds or ETFs.”
  • Sterling after Brexit, Oct 2016: “If a country’s exchange rate represents international investors’ confidence in its government’s policies, the markets have given Britain the thumbs-down… The most likely explanation for the plunge lies in the action of algorithmic trades… These sales can be contagious, with one program’s trades setting off the sell signals of other algorithms.”

I refer to these sorts of events as rebalancing cascades. Stock 1’s fundamentals change, so a trading strategy sells stock 1 and replaces it with stock 2. This purchase of stock 2 forces another trading strategy to buy stock 2 and sell stock 3. And, this sale forces…

The examples above seem to suggest that you don’t need a very big initial change in stock 1’s fundamentals to trigger a cascade. For instance, when oil prices suddenly dropped in May 2011, traders were “unable to identify any single bank or fund orchestrating a massive sale”. They were left “scratching [their] heads”, to use the language of the next example. So, in this post, I write down a random-networks model à la Watts (2002) to understand when we should expect small changes in fundamentals to trigger these sorts of long rebalancing cascades.

2. Market Structure

Consider a market with S stocks, s = 1, \, 2, \, \ldots, \, S, where S is a really big number. If a change in the fundamentals of stock s' will force some trading strategy to rebalance and buy stock s instead, then let’s say that stock s' and stock s are neighbors. Suppose that two randomly selected stocks are neighbors with probability \sfrac{\lambda}{S}. This uniform-random-matching assumption means that the number of neighbors that each stock has, N_s, is Poisson distributed with mean \lambda \overset{\scriptscriptstyle \mathrm{def}}{=} \mathrm{E}[N_s],

(1)   \begin{equation*} N_s \sim \mathrm{Pois}(\lambda). \end{equation*}

The fact that each stock is only neighbors with a fraction of the market captures the idea that different trading strategies rebalance in different ways. For technical reasons, let’s assume that \lambda = \mathrm{O}[\log(S)]. The figure below gives examples of this sort of random network when S=100.

[Figure: example random rebalancing networks with S = 100]
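The uniform-random-matching assumption is easy to simulate. Here’s a minimal Python sketch (all the names are mine): connect each pair of stocks independently with probability \sfrac{\lambda}{S} and check that the average number of neighbors lands near \lambda.

```python
import random

def make_network(S, lam, seed=0):
    """Connect each pair of stocks independently with probability lam/S."""
    rng = random.Random(seed)
    p = lam / S
    neighbors = {s: set() for s in range(S)}
    for s in range(S):
        for t in range(s + 1, S):
            if rng.random() < p:
                neighbors[s].add(t)
                neighbors[t].add(s)
    return neighbors

S, lam = 1000, 5.0
net = make_network(S, lam)
# With S large, each N_s is approximately Pois(lam), so the sample
# average degree should sit close to lam.
avg_degree = sum(len(v) for v in net.values()) / S
```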

Now, suppose that \Delta_s \in \{ 0, \, 1\} is an indicator variable for whether or not stock s’s fundamentals have changed. If a bunch of strategies start trading stock s because of changes in its neighboring stocks’ fundamentals, then this additional trading activity can affect stock s’s fundamentals. For example, if lots of funds buy a stock and push it into the S&P 500, then the stock will have a higher market beta. Let’s model neighboring stocks’ effect on stock s’s fundamentals as follows,

(2)   \begin{equation*} \Delta_s = \begin{cases} 1 &\text{if } (\sfrac{1}{N_{s}}) \cdot {\textstyle \sum_{s' \in \mathcal{N}_s}} \Delta_{s'} \geq \phi \\ 0 &\text{else} \end{cases}, \end{equation*}

where \phi \in (0, \, 1] captures the vulnerability of a stock’s fundamentals to rebalancing. If there are lots of different strategies trading stock s in lots of different ways, N_s > N^\star \overset{\scriptscriptstyle \mathrm{def}}{=} \lfloor \sfrac{1}{\phi} \rfloor, then no single rebalancing decision will be important enough to change stock s’s fundamentals. But, if stock s has at most N^\star neighbors, then a change in the fundamentals of a single neighbor will generate enough rebalancing to cause stock s’s fundamentals to change too. Let’s say that stock s has V_s \overset{\scriptscriptstyle \mathrm{def}}{=} \sum_{s' \in \mathcal{N}_s} 1_{\{ \, N_{s'} \leq N^\star \}} such vulnerable neighbors.
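Here’s a small Python sketch of the vulnerability count V_s on a tiny hand-built network (the helper names and the toy three-stock network are mine, purely for illustration):

```python
import math

def vulnerable_neighbors(neighbors, phi):
    """V_s = number of neighbors s' of s with N_{s'} <= N* = floor(1/phi)."""
    n_star = math.floor(1 / phi)
    degree = {s: len(nbrs) for s, nbrs in neighbors.items()}
    return {s: sum(1 for t in nbrs if degree[t] <= n_star)
            for s, nbrs in neighbors.items()}

# Toy network: stocks 0, 1, 2 all neighbors of one another.
toy_net = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
# phi = 0.5 gives N* = 2, and every stock has exactly 2 neighbors,
# so every stock is vulnerable and every stock has V_s = 2.
V = vulnerable_neighbors(toy_net, phi=0.5)
```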

[Figure: vulnerable stocks]

Here’s the exercise I have in mind. Imagine that we select a stock at random, s, and exogenously change its fundamentals, \Delta_s = 1. If s happens to have a vulnerable neighbor, s', then the rebalancing caused by our initial shock will change the fundamentals of a second stock, \Delta_{s'} = 1. And, if s' happens to have an additional vulnerable neighbor of its own, s'', then the second wave of rebalancing caused by our initial shock to stock s will change the fundamentals of a third stock as well, \Delta_{s''} = 1. If stock s'' doesn’t have any additional vulnerable neighbors, then we will have triggered a rebalancing cascade of length 3 with our initial shock to a single stock’s fundamentals. I want to characterize the distribution of cascade lengths for a randomly selected initial stock s,

(3)   \begin{equation*} C_s  =  \Delta_s  +  1_{\{ \mathcal{V}_s \neq \emptyset \}}  \cdot  \left\{  \,  {\textstyle \sum_{s' \in \mathcal{V}_s}}  \left( \, \Delta_{s'}  +  1_{\{ \mathcal{V}_{s'} \neq \emptyset \}}  \cdot  \left\{  \,  {\textstyle \sum_{s'' \in \mathcal{V}_{s'}}} \left( \,  \Delta_{s''}  +  \cdots  \,  \right) \,  \right\} \,  \right) \,  \right\}, \end{equation*}

as a function of the market’s average connectivity, \lambda, and vulnerability threshold, \phi.

3. Generating Functions

Generating functions make it possible to compute the distribution of cascade lengths. Here’s the basic idea; take a look at Graham, Knuth, and Patashnik (1994, Ch. 7) for more details. Suppose we’re flipping coins and counting the number of heads. The distribution of the number of heads, h, after one flip is given by:

(4)   \begin{equation*} \text{\# of heads}|\text{1 flip} =  \begin{cases} 1 &\text{w/ prob $q$, and} \\ 0 &\text{w/ prob $(1-q)$.} \end{cases} \end{equation*}

If q = \sfrac{1}{2}, then the coin is fair. The generating function for this same distribution is:

(5)   \begin{align*} \mathrm{G}(x|\text{1 flip}) &= {\textstyle \sum_{h=0}^1} \, p_h \cdot x^h. \end{align*}

Each term in the series is associated with one possible outcome for the total number of heads. p_0 = (1 - q) is the probability of realizing h=0 heads. p_1 = q is the probability of realizing h=1 heads. We say that \mathrm{G}(x|\text{1 flip}) generates the distribution because we can compute all its moments by evaluating the derivatives of the generating function at x=1. For example, the 0th-order derivative, \left. \mathrm{G}(x|\text{1 flip})\right|_{x=1} = 1, says that the coin never lands on its edge or winks out of existence. And, if we want to compute the expected number of heads, then we can use the 1st-order derivative:

(6)   \begin{align*} \mathrm{E}[h|\text{1 flip}] &= \left. x \cdot \mathrm{G}'(x|\text{1 flip}) \right|_{x=1}  \\ &= \left. x \cdot {\textstyle \sum_{h=0}^1} \, p_h \cdot h \cdot x^{h-1} \right|_{x=1} \\ &= \left. {\textstyle \sum_{h=0}^1} \, p_h \cdot h \cdot x^h \right|_{x=1} \\ &= p_0 \cdot 0 \cdot 1^0 + p_1 \cdot 1 \cdot 1^1  \\ &= q. \end{align*}

The fact that derivatives of the generating function give the moments of the associated distribution will be useful below. Let’s call this Property 1: derivatives are moments.
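Property 1 is easy to sanity-check numerically. The sketch below (names mine) encodes the one-flip generating function and recovers \mathrm{E}[h] = q from a finite-difference approximation to \mathrm{G}'(1):

```python
def G(x, q):
    """Generating function for the number of heads in one flip of a q-coin."""
    return (1 - q) * x**0 + q * x**1

def mean_via_derivative(q, h=1e-6):
    """E[heads | 1 flip] = G'(1), approximated with a central difference."""
    return (G(1 + h, q) - G(1 - h, q)) / (2 * h)

m = mean_via_derivative(0.5)  # should come back as q = 0.5
```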

Here are two additional properties to keep in mind. Property 2: multiple samples. If we raise the generating function for the number of heads in one flip to the nth power, then we get the generating function for the total number of heads in n flips. To illustrate, look at what happens if we square the generating function for the number of heads in one flip:

(7)   \begin{align*} \mathrm{G}(x|\text{1 flip})^2  &=  \left\{ \, p_0 \cdot x^0 + p_1 \cdot x^1 \, \right\} \times \left\{ \, p_0 \cdot x^0 + p_1 \cdot x^1 \, \right\} \\ &= p_0^2 \cdot x^0 + 2 \cdot p_0 \cdot p_1 \cdot x^1 + p_1^2 \cdot x^2 \\ &= (1 - q)^2 \cdot x^0 + 2 \cdot (1 - q) \cdot q \cdot x^1 + q^2 \cdot x^2. \end{align*}

The result is just the generating function for the number of heads in two flips, \mathrm{G}(x|\text{2 flips}).

Property 3: partial information. If we multiply the generating function for the total number of heads in (n-1) flips by x^1, then we have the generating function for the total number of heads in all n flips conditional on having already seen heads on the first flip. To illustrate, notice what happens when we multiply the generating function for the number of heads in one flip by x^1:

(8)   \begin{align*} \mathrm{G}(x|\text{2 flips}, \, \text{heads on 1st flip}) &= x^1 \cdot \mathrm{G}(x|\text{1 flip})  \\ &= p_0 \cdot x^1 + p_1 \cdot x^2 \\ &= (1 - q) \cdot x^1 + q \cdot x^2. \end{align*}

If we’ve already seen heads on the first flip, then there’s no way to realize h=0 heads. The lowest we can go is h=1 heads now. So, the first term is now h=1. And, this outcome occurs if we see tails on the second flip, which happens with probability (1-q).
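Both properties are mechanical if we store a generating function as a list of coefficients: squaring it is just polynomial multiplication, and multiplying by x^1 just shifts every coefficient up one slot. A quick sketch (helper names mine):

```python
def poly_mul(a, b):
    """Multiply two generating functions stored as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

q = 0.5
g1 = [1 - q, q]          # G(x | 1 flip)
g2 = poly_mul(g1, g1)    # Property 2: G(x | 2 flips)
g2_heads1 = [0.0] + g1   # Property 3: x^1 * G(x | 1 flip) shifts coefficients
```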

4. Cascade Lengths

Now, let’s return to our original problem. Let \mathrm{G}_c(x) be the generating function for the distribution of cascade lengths that would follow from an exogenous shock to stock s’s fundamentals:

(9)   \begin{equation*} \mathrm{G}_c(x) \overset{\scriptscriptstyle \mathrm{def}}{=} {\textstyle \sum_{c=1}^S} \, q_c \cdot x^c. \end{equation*}

The coefficient q_c gives the probability that a shock to stock s’s fundamentals would set off a cascade of length C_s = c. If stock s doesn’t have any vulnerable neighbors, then a shock to stock s’s fundamentals can only affect stock s, c=1. Whereas, if a shock to stock s’s fundamentals would set off a cascade affecting every other stock in the market, then c=S. Next, let \mathrm{G}_v(x) be the generating function for the number of vulnerable neighbors that stock s has:

(10)   \begin{equation*} \mathrm{G}_v(x) \overset{\scriptscriptstyle \mathrm{def}}{=} {\textstyle \sum_{v=0}^{S-1}} \, p_v \cdot x^v. \end{equation*}

So, the coefficient p_v is the probability that stock s has v vulnerable neighbors.

Notice how these two generating functions are linked. If stock s has v=1 vulnerable neighbor, s', then an initial shock to stock s‘s fundamentals will set off a cascade of length C_s = c if a shock to its one vulnerable neighbor will set off a cascade of length C_{s'} = c - 1 excluding stock s. If stock s has v=2 vulnerable neighbors, s' and s'', then an initial shock to stock s will set off a cascade of length C_s = c if shocks to its two vulnerable neighbors will set off cascades of combined length C_{s'} + C_{s''} = c - 1 excluding stock s. And, if stock s has v=3 vulnerable neighbors, then an initial shock to stock s will set off a cascade of length C_s = c if shocks to its three vulnerable neighbors will set off cascades of combined length C_{s'} + C_{s''} + C_{s'''} = c - 1.

[Figure: how the cascade-length generating function links to vulnerable neighbors]

We know from the previous section (property 2: multiple samples) that \mathrm{G}_c(x)^v is the generating function for the probability that v different shocks set off cascades of combined length c. And, we also know from the previous section (property 3: partial information) that we have to multiply through by x^1 if we want the generating function for the probability that v different shocks set off cascades of combined length (c-1). So, the generating function for the distribution of cascade lengths has to satisfy the following internal-consistency condition as the number of stocks gets large, S \to \infty:

(11)   \begin{align*} \mathrm{G}_c(x)  &= p_0 \cdot x + p_1 \cdot x \cdot \mathrm{G}_c(x) + p_2 \cdot x \cdot \mathrm{G}_c(x)^2 + p_3 \cdot x \cdot \mathrm{G}_c(x)^3 + \cdots \\ &= x \cdot \left\{ \, p_0 \cdot \mathrm{G}_c(x)^0 + p_1 \cdot \mathrm{G}_c(x)^1 + p_2 \cdot \mathrm{G}_c(x)^2 + p_3 \cdot \mathrm{G}_c(x)^3 + \cdots \, \right\} \\ &= x \cdot \mathrm{G}_v\left(\mathrm{G}_c(x)\right). \end{align*}

The outer function, \mathrm{G}_v(\cdot), gives the probability that the initial stock s has v vulnerable neighbors. The inner function, \mathrm{G}_c(x), gives the probability that shocks to these vulnerable neighbors would set off cascades of combined length c. And, the multiplication by x accounts for the fact that we want to compute the probability that shocks to these vulnerable neighbors would set off cascades of combined length (c-1), not c.

With this equation in hand, we can now compute the expected length of the rebalancing cascade that would follow from an initial shock to randomly selected stock s:

(12)   \begin{align*} \mathrm{E}[C_s] = \left. x \cdot \mathrm{G}_c'(x) \right|_{x=1}  &= 1 + \mathrm{G}_v'(1) \cdot \mathrm{G}_c'(1) \\ &= 1 + \mathrm{G}_v'(1) \cdot \mathrm{E}[C_s]. \end{align*}

Rearranging yields an expression for the expected cascade length:

(13)   \begin{align*} \mathrm{E}[C_s] = \frac{1}{1 - \mathrm{G}_v'(1)}. \end{align*}

And, in the exact same way that \mathrm{G}_c'(1) = \mathrm{E}[C_s] (property 1: derivatives are moments), the expected number of vulnerable neighbors that each stock has is given by \mathrm{G}_v'(1) = \mathrm{E}[V_s]. So, when stocks typically have fewer than one vulnerable neighbor, \mathrm{E}[V_s] < 1, we have an expression for the average rebalancing-cascade length as a function of the market’s connectivity, \lambda, and vulnerability threshold, \phi.

[Figure: expected cascade length as a function of \lambda and \phi]

The figure above plots the average length of the rebalancing-cascade that would emerge if we selected an initial stock s at random and shocked its fundamentals. It’s got a really interesting shape. A little math shows exactly why. We should expect short rebalancing cascades whenever stocks don’t have that many vulnerable neighbors. Here’s the expression for the average number of vulnerable neighbors that each stock has:

(14)   \begin{align*}  \mathrm{E}[V_s] = \mathrm{G}_v'(1)  &= \lambda \times \mathrm{Pr}[N_s \leq N^\star] \\ &= \lambda \times \mathrm{Pr}[N_s < (\lfloor \sfrac{1}{\phi} \rfloor +1)] \\ &=\lambda \times \left\{ \, \sum_{n < (\lfloor \sfrac{1}{\phi} \rfloor+1)} e^{-\lambda} \cdot \frac{\lambda^n}{n!} \, \right\}. \end{align*}

Notice that stocks can have fewer than one vulnerable neighbor on average for either of two reasons. First, they could have very few neighbors—that is, \lambda could be less than 1. Think about this as a fragmented market where very few people trade. This is the region on the bottom of the figure. Second, even if there are lots of people trading, stocks could have fundamentals that aren’t very vulnerable to the effects of rebalancing—that is, \phi is large. This is the region in the upper right of the figure. But, if the market isn’t too fragmented and stocks’ fundamentals are a little vulnerable to the effects of rebalancing, then long rebalancing cascades can emerge. In fact, they can be infinitely long…
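Equations (13) and (14) take only a few lines to code up. Here’s a sketch (function names mine) that builds \mathrm{E}[V_s] from the Poisson CDF and then returns the implied average cascade length whenever \mathrm{E}[V_s] < 1:

```python
import math

def expected_vulnerable(lam, phi):
    """E[V_s] = lam * Pr[N_s <= N*], with N* = floor(1/phi), N_s ~ Pois(lam)."""
    n_star = math.floor(1 / phi)
    cdf = sum(math.exp(-lam) * lam**n / math.factorial(n)
              for n in range(n_star + 1))
    return lam * cdf

def expected_cascade_length(lam, phi):
    """E[C_s] = 1 / (1 - E[V_s]); diverges once E[V_s] reaches 1."""
    ev = expected_vulnerable(lam, phi)
    return 1.0 / (1.0 - ev) if ev < 1 else float('inf')

length = expected_cascade_length(0.5, 0.3)  # a fragmented market: short cascades
```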

5. Infinite Cascades

…but what does that even mean? It’s actually much more reasonable than it first sounds. In a large market, S \to \infty, an infinitely long rebalancing cascade is just a cascade that affects a non-infinitesimal fraction of all stocks. If we now specify that \mathrm{G}_c(x) is the generating function for the distribution of finite-length rebalancing cascades, then we can define \theta as the fraction of all stocks affected by an infinitely long rebalancing cascade,

(15)   \begin{align*} \mathrm{G}_c(1) \overset{\scriptscriptstyle \mathrm{def}}{=} 1 - \theta. \end{align*}

Think back to the coin-flipping example where we said that \left. \mathrm{G}(x|\text{1 flip})\right|_{x=1} = 1 because the coin never landed on its edge or magically winked out of existence. If the coin didn’t obey the laws of physics and disappeared 20{\scriptstyle \%} of the time, then we would have said that \left. \mathrm{G}(x|\text{1 flip})\right|_{x=1} = 0.80. So, if \mathrm{G}_c(x) is the generating function for the distribution of finite-length rebalancing cascades, then realizing an infinitely long cascade is like realizing a magical event that’s not characterized by \mathrm{G}_c(x). And, this way of framing the problem, \mathrm{G}_c(1) = 1 - \theta = \mathrm{G}_v(\mathrm{G}_c(1)), gives us a way to solve for the fraction of the market that’s typically affected by an infinitely long rebalancing cascade. Since V_s is approximately Poisson distributed with mean \mathrm{E}[V_s], so that \mathrm{G}_v(x) = e^{\mathrm{E}[V_s] \cdot (x-1)}, the condition becomes \theta = 1 - e^{-\mathrm{E}[V_s] \cdot \theta}.
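Because the vulnerable-neighbor count is approximately Poisson, the consistency condition reduces to a one-dimensional fixed point, \theta = 1 - e^{-\nu \cdot \theta} with \nu = \mathrm{E}[V_s]. Here’s a sketch (names mine) that solves it by simple iteration; below the transition the only root is \theta = 0, and above it a positive root appears:

```python
import math

def cascade_fraction(nu, iters=200):
    """Solve theta = 1 - exp(-nu * theta) by fixed-point iteration.

    nu is the mean number of vulnerable neighbors per stock. For nu <= 1
    the iteration collapses to theta = 0; for nu > 1 it finds the
    positive root.
    """
    theta = 0.5  # start in the interior so we can find the positive root
    for _ in range(iters):
        theta = 1.0 - math.exp(-nu * theta)
    return theta

below = cascade_fraction(0.8)  # below the transition: no giant cascade
above = cascade_fraction(2.0)  # above the transition: a macroscopic cascade
```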

[Figure: two markets near the phase transition that look similar]

Finally, notice how sharp the phase transition is. Tiny changes in the market’s connectivity, \lambda, and vulnerability, \phi, can make all the difference between expecting infinitely long rebalancing cascades and expecting 16-stock-long rebalancing cascades. Take a look at the figure above. Each panel has S = 1000 stocks (the dots) and represents a single realization of trading-strategy rebalancing rules (the lines) in markets where each stock has \lambda = 6.40 neighbors (left) and \lambda = 6.41 neighbors (right) respectively. These two markets are essentially observationally equivalent. But, as shown in the figure below, when \phi=0.30 an initial shock to stock s’s fundamentals will yield a huge rebalancing cascade when \lambda = 6.40 (left) but not when \lambda = 6.41 (right). Small changes that push a market over the transition point where \mathrm{E}[V_s] = 1 can have huge effects on the cascade-length distribution. What’s more, right at this transition point where \mathrm{E}[V_s] = 1, the sizes of the rebalancing cascades follow a power-law distribution,

(16)   \begin{align*} \mathrm{Pr}[C_s = c] \sim c^{-\sfrac{3}{2}}, \end{align*}

as shown in Newman et al. (2002). Slight differences in how the market happens to be wired up today can affect whether or not a stock on the other side of the market will be affected by an initial shock to stock s’s fundamentals.
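To see the transition in simulation, here’s a rough Monte Carlo sketch (all names mine). It only propagates shocks through vulnerable stocks, à la Watts’s vulnerable cluster, so it ignores stocks that would need two or more shocked neighbors to flip:

```python
import math
import random

def simulate_cascade(S, lam, phi, seed=0):
    """Shock one random stock, then propagate through vulnerable neighbors."""
    rng = random.Random(seed)
    p = lam / S
    nbrs = {s: set() for s in range(S)}
    for s in range(S):
        for t in range(s + 1, S):
            if rng.random() < p:
                nbrs[s].add(t)
                nbrs[t].add(s)
    n_star = math.floor(1 / phi)
    start = rng.randrange(S)  # the exogenously shocked stock
    hit, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        for t in nbrs[s]:
            # t flips only if it is vulnerable: N_t <= N* neighbors
            if t not in hit and len(nbrs[t]) <= n_star:
                hit.add(t)
                frontier.append(t)
    return len(hit)  # the cascade length C_s

c = simulate_cascade(S=400, lam=3.0, phi=0.5)
```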

[Figure: cascade sizes in the two markets are different]


Intuition Behind the Bayesian LASSO

September 24, 2016 by Alex

1. Motivating Question

Imagine you’ve just seen Apple’s most recent return, r, which is Apple’s long-run expected return, \mu^\star, plus some random noise, \epsilon \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0, \, 1), where returns are measured in percent so that \sigma_{\epsilon} = 1.0{\scriptstyle \%}:

(1)   \begin{align*} r &= \mu^\star + \epsilon. \end{align*}

You want to use this realized return, r, to estimate Apple’s long-run expected return, \mu^\star. The LASSO is a popular way to solve this problem. The LASSO estimates Apple’s long-run expected return, \mu^\star, by choosing a \hat{\mu} that’s as close as possible to the realized r while taking into account an absolute-value penalty,

(2)   \begin{align*} \hat{\mu}(r) =  \arg \min_{\mu \in \mathrm{R}}  \left\{  \, {\textstyle \frac{1}{2}} \cdot (r - \mu)^2 + \lambda \cdot  |\mu| \, \right\}, \end{align*}

where \lambda \geq 0 is the strength of this penalty. If you use the LASSO, then you’ll estimate:

(3)   \begin{align*} \hat{\mu}(r) = \begin{cases} \mathrm{Sign}(r) \cdot (|r| - \lambda) &\text{if } |r| > \lambda, \text{ and} \\ 0 &\text{if } |r| \leq \lambda. \end{cases} \end{align*}

Suppose that you chose \lambda = 1.0{\scriptstyle \%}. If Apple’s most recent stock return was r = 0.3{\scriptstyle \%}, then the LASSO will pick \hat{\mu} = 0{\scriptstyle \%}. And, if Apple’s most recent stock return was r = -0.7{\scriptstyle \%}, then the LASSO will still pick \hat{\mu} = 0{\scriptstyle \%}. But, if Apple’s most recent stock return was r = 1.2{\scriptstyle \%}, then the LASSO will give an estimate of \hat{\mu} = 0.2{\scriptstyle \%}.
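Equation (3) is just the soft-thresholding operator. A minimal sketch (function name mine) reproducing the three examples above:

```python
def lasso_estimate(r, lam):
    """Soft-threshold the realized return r at level lam (equation (3))."""
    if abs(r) <= lam:
        return 0.0
    return (1.0 if r > 0 else -1.0) * (abs(r) - lam)

# The three examples from the text, with lam = 1.0 (i.e., 1.0%):
examples = [lasso_estimate(r, 1.0) for r in (0.3, -0.7, 1.2)]
```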

[Figure: LASSO coefficient estimates]

The LASSO seems like it’s throwing away lots of information. In the example above, you didn’t adjust your estimate of Apple’s long-run expected return at all when you saw returns of 0.3{\scriptstyle \%} and -0.7{\scriptstyle \%}. So, it’s surprising that, if Apple’s long-run expected return, \mu^\star, was drawn from a Laplace distribution,

(4)   \begin{align*} \mathrm{Pr}( \mu^\star = \mu ) = {\textstyle \frac{\lambda}{2}} \cdot e^{- \lambda \cdot |\mu|}, \end{align*}

then using the LASSO to estimate \mu^\star would be the Bayesian thing to do (Park and Casella, 2008). If \mu^\star \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{Laplace}(\lambda = 1.0{\scriptstyle \%}), then it’s correct to just ignore any return smaller than 1.0{\scriptstyle \%} when estimating \mu^\star.

Why is this? If you cross your eyes and squint, you can sort of see why the Laplace distribution might be linked to the LASSO. Both use the Greek letter \lambda and involve |\mu|. But, lots of distributions use the absolute-value operator (e.g., the Wishart distribution). And, there are lots of Greek letters. That’s how letters work. I could just as easily have called the scale parameter in the Laplace distribution \alpha, \beta, or \gamma instead of \lambda. So, what’s special about the Laplace distribution? What is it about the Laplace distribution that makes using the LASSO correct? How can it ever be Bayesian to throw information away?

2. Simpler Problem

To answer these questions, let’s start by looking at a simpler inference problem. Suppose that Apple’s long-run expected return is drawn from a Normal distribution, \mu^\star \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0, \, \sigma_\mu^2):

(5)   \begin{align*} \mathrm{Pr}(\mu^\star = \mu) &= {\textstyle \frac{1}{\sigma_\mu \cdot \sqrt{2 \cdot \pi}}} \cdot e^{- \frac{1}{2 \cdot \sigma_\mu^2} \cdot (\mu - 0)^2}. \end{align*}

If \mu^\star is drawn from a Normal distribution, then you definitely don’t want to use the LASSO.

Bayes’ rule tells you that:

(6)   \begin{align*} \mathrm{Pr}(\mu^\star = \mu|r) &\propto \mathrm{Pr}(r|\mu) \times \mathrm{Pr}(\mu) \\ &=  \left\{ \, {\textstyle \frac{1}{\sqrt{2 \cdot \pi}}} \cdot e^{- \frac{1}{2} \cdot (r - \mu)^2} \, \right\} \times  \left\{ \, {\textstyle \frac{1}{\sigma_\mu \cdot \sqrt{2 \cdot \pi}}} \cdot e^{- \frac{1}{2 \cdot \sigma_\mu^2} \cdot (\mu - 0)^2} \, \right\}. \end{align*}

\mathrm{Pr}(\mu^\star = \mu|r) is the posterior likelihood that Apple’s long-run expected return is \mu given that you’ve just seen a realized return of r. \mathrm{Pr}(r|\mu) is the probability that Apple realizes a return of r if its long-run expected return is \mu. And, \mathrm{Pr}(\mu) is the probability that Apple’s long-run expected return is \mu^\star = \mu in the first place.

You want to choose the \hat{\mu} that maximizes this posterior likelihood \mathrm{Pr}(\mu^\star = \mu|r), or equivalently, that minimizes the negative of the log of this posterior likelihood:

(7)   \begin{align*} \hat{\mu}(r) = \arg \min_{\mu \in \mathrm{R}} \left\{ \, (r - \mu)^2 +  (\sfrac{1}{\sigma_\mu^2}) \cdot (\mu - 0)^2 \, \right\}. \end{align*}

When Apple’s long-run expected return is drawn from a Normal distribution, you want to choose a \hat{\mu} that’s as close as possible to r while taking into account a quadratic penalty, not an absolute-value penalty. So, you’re never going to ignore small realized returns.

On one hand, you could pick a \hat{\mu} that’s really close to Apple’s recent return to make (r - \hat{\mu})^2 small. On the other hand, you could pick a \hat{\mu} close to 0 to make (\sfrac{1}{\sigma_\mu^2}) \cdot (\hat{\mu} - 0)^2 small. Your priors determine what you do:

(8)   \begin{align*} \hat{\mu}(r) = \left(  {\textstyle \frac{\sigma_\mu^2}{1.0{\scriptstyle \%}^2 + \sigma_\mu^2}} \right) \cdot  r. \end{align*}

If you don’t have very strong priors about Apple’s long-run expected return (\sigma_\mu \gg 1.0{\scriptstyle \%}), then you’re going to pick \hat{\mu} \approx r since \sfrac{\sigma_\mu^2}{(1.0{\scriptstyle \%}^2 + \sigma_\mu^2)} \approx 1. By contrast, if you have very strong priors (\sigma_\mu \ll 1.0{\scriptstyle \%}), then you’re going to pick \hat{\mu} \approx 0{\scriptstyle \%} since \sfrac{\sigma_\mu^2}{(1.0{\scriptstyle \%}^2 + \sigma_\mu^2)} \approx 0. To illustrate, suppose that you’re really sure that Apple’s long-run expected return is close to 0{\scriptstyle \%} with \sigma_{\mu} = 0.1{\scriptstyle \%}. Then, if you see Apple realize a return of r = 6.0{\scriptstyle \%}, you’re going to think that this realization was probably due to a positive random shock, \epsilon = 5.94{\scriptstyle \%}, and only pick \hat{\mu} = 0.06{\scriptstyle \%}.
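Here’s the shrinkage rule from equation (8) in code (names mine), including the strong-prior example at the end of the paragraph, with returns measured in percent:

```python
def normal_posterior_mean(r, sigma_mu, sigma_eps=1.0):
    """MAP estimate under a N(0, sigma_mu^2) prior: shrink r toward zero."""
    w = sigma_mu**2 / (sigma_eps**2 + sigma_mu**2)
    return w * r

# Strong prior (sigma_mu = 0.1%): a 6.0% return is mostly attributed to noise.
mu_hat = normal_posterior_mean(6.0, sigma_mu=0.1)
```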

3. Mixture Model

Now, let’s tweak the setup slightly. Suppose that, instead of being constant, the standard deviation of Apple’s long-run expected return can be either high or low,

(9)   \begin{align*} \overline{\sigma}_{\mu} \gg \sigma_{\epsilon} = 1.0{\scriptstyle \%} \gg \underline{\sigma}_{\mu}, \end{align*}

with the high value much larger than \sigma_{\epsilon} = 1.0{\scriptstyle \%} and the low value much smaller than \sigma_{\epsilon} = 1.0{\scriptstyle \%}. Each case is equally likely: \mathrm{Pr}(\sigma_\mu = \overline{\sigma}_{\mu} ) = \mathrm{Pr}( \sigma_\mu = \underline{\sigma}_{\mu} ) = \sfrac{1}{2}. It turns out that you’re going to behave a lot like someone using the LASSO when you estimate Apple’s long-run expected return in this mixture model.

Regardless of the model, if you want to estimate Apple’s long-run expected return, then you have to use Bayes’ rule. And, just like before, Bayes’ rule tells you that:

(10)   \begin{align*} \mathrm{Pr}(\mu^\star = \mu|r)  \propto  \mathrm{Pr}(r|\mu)  \times  \mathrm{Pr}(\mu). \end{align*}

But, now there’s an extra layer to the problem. The standard deviation of Apple’s long-run expected return can either be high or low,

(11)   \begin{align*} \mathrm{Pr}(\mu)  = {\textstyle \frac{1}{2}} \cdot \mathrm{Pr}(\mu|\sigma_\mu = \overline{\sigma}_\mu) +  {\textstyle \frac{1}{2}} \cdot \mathrm{Pr}(\mu|\sigma_\mu = \underline{\sigma}_\mu). \end{align*}

You don’t know which it is. But, if you knew that \sigma_{\mu} = \overline{\sigma}_{\mu} = 10{\scriptstyle \%} \gg 1.0{\scriptstyle \%}, then you’d pick \hat{\mu} = (\sfrac{100}{101}) \cdot r. Whereas, if you knew that \sigma_{\mu} = \underline{\sigma}_{\mu} = 0.10{\scriptstyle \%} \ll 1.0{\scriptstyle \%}, then you’d pick \hat{\mu} = (\sfrac{1}{101}) \cdot r. Your estimate when \sigma_\mu = \overline{\sigma}_{\mu} is going to be really different from your estimate when \sigma_\mu = \underline{\sigma}_{\mu}.

Let’s flesh out what this means. You want to estimate Apple’s long-run expected return, \mu^\star, by choosing the \hat{\mu} that maximizes the posterior likelihood \mathrm{Pr}(\mu^\star = \mu|r),

(12)   \begin{align*} \hat{\mu}(r) =  \arg \max_{\mu \in \mathrm{R}} \left\{  \,  {\textstyle \frac{1}{\sqrt{2 \cdot \pi}}} \cdot e^{- \frac{1}{2} \cdot (r - \mu)^2}  \,  \right\} \times  \left\{  \,  {\textstyle \frac{1}{2}} \cdot {\textstyle \frac{1}{\overline{\sigma}_\mu \cdot \sqrt{2 \cdot \pi}}} \cdot e^{- \frac{1}{2 \cdot \overline{\sigma}_\mu^2} \cdot (\mu - 0)^2} + {\textstyle \frac{1}{2}} \cdot {\textstyle \frac{1}{\underline{\sigma}_\mu \cdot \sqrt{2 \cdot \pi}}} \cdot e^{- \frac{1}{2 \cdot \underline{\sigma}_\mu^2} \cdot (\mu - 0)^2}  \,  \right\}. \end{align*}

It’s hard to solve for \hat{\mu}(r) analytically when \overline{\sigma}_{\mu} and \underline{\sigma}_{\mu} can take on arbitrary values, but the assumption that \overline{\sigma}_{\mu} \gg 1.0{\scriptstyle \%} \gg \underline{\sigma}_{\mu} simplifies things nicely. And, the resulting analysis reveals why you’re going to do something LASSO-esque when learning about Apple’s long-run expected return in this mixture model.

There are 2 cases. First, consider the case where Apple realizes a really big return, |r| \gg 1.0{\scriptstyle \%}. This really big return would be really unlikely if \sigma_\mu = \underline{\sigma}_\mu because \underline{\sigma}_\mu \ll 1.0{\scriptstyle \%} is really small. So, you can safely assume that \sigma_\mu = \overline{\sigma}_{\mu} and just solve the optimization problem from Section 2:

(13)   \begin{align*} \hat{\mu}(r) = \arg \min_{\mu \in \mathrm{R}} \left\{ \, (r - \mu)^2 +  (\sfrac{1}{\overline{\sigma}_\mu^2}) \cdot (\mu - 0)^2 \, \right\}. \end{align*}

But, as we saw in Section 2, if your priors are really weak (\overline{\sigma}_\mu \gg 1.0{\scriptstyle \%}), then you should ignore them since \sfrac{\overline{\sigma}_\mu^2}{(1.0{\scriptstyle \%}^2 + \overline{\sigma}_\mu^2)} \approx 1. So, you’re going to set \hat{\mu}(r) \approx r whenever |r| \gg 1.0{\scriptstyle \%}, just like someone using the LASSO.

Now, consider the other case where Apple realizes a really small return, |r| \ll 1.0{\scriptstyle \%}. Again, this really small return would be really unlikely if \sigma_\mu = \overline{\sigma}_\mu because \overline{\sigma}_\mu \gg 1.0{\scriptstyle \%} is really big. So, you can assume that \sigma_\mu = \underline{\sigma}_{\mu} and just solve the optimization problem:

(14)   \begin{align*} \hat{\mu}(r) = \arg \min_{\mu \in \mathrm{R}} \left\{ \, (r - \mu)^2 +  (\sfrac{1}{\underline{\sigma}_\mu^2}) \cdot (\mu - 0)^2 \, \right\}. \end{align*}

But, now the opposite logic holds. If your priors are really strong (\underline{\sigma}_\mu \ll 1.0{\scriptstyle \%}), then you should ignore r since \sfrac{\underline{\sigma}_\mu^2}{(1.0{\scriptstyle \%}^2 + \underline{\sigma}_\mu^2)} \approx 0. So, you’re going to set \hat{\mu}(r) \approx 0 whenever |r| \ll 1.0{\scriptstyle \%}. This is the LASSO’s dead zone!
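We can check both cases at once by brute-force maximizing the posterior in equation (12) on a grid. This is a rough numerical sketch (the names and the specific values \overline{\sigma}_\mu = 10{\scriptstyle \%}, \underline{\sigma}_\mu = 0.1{\scriptstyle \%}, \sigma_\epsilon = 1.0{\scriptstyle \%} are mine, with returns measured in percent):

```python
import math

def mixture_map(r, s_hi=10.0, s_lo=0.1, s_eps=1.0):
    """Grid-search the MAP estimate of mu under the two-point mixture prior."""
    def log_post(mu):
        like = -0.5 * ((r - mu) / s_eps) ** 2
        prior = (0.5 / s_hi) * math.exp(-0.5 * (mu / s_hi) ** 2) \
              + (0.5 / s_lo) * math.exp(-0.5 * (mu / s_lo) ** 2)
        return like + math.log(prior)
    best_mu, best_val = 0.0, log_post(0.0)
    n = 200000
    for i in range(n + 1):
        mu = -20.0 + 40.0 * i / n
        val = log_post(mu)
        if val > best_val:
            best_val, best_mu = val, mu
    return best_mu

big = mixture_map(6.0)    # |r| >> sigma_eps: keep nearly all of r
small = mixture_map(0.3)  # |r| << sigma_eps: shrink nearly to zero
```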

The figure below shows that, as the high and low standard deviations get more extreme, you’re going to behave more and more like someone using the LASSO when learning about Apple’s long-run expected return in this mixture model. But, the insight is more general than that. You’re going to behave like someone using the LASSO any time a small realized return, r, tells you that you should be using stronger priors about Apple’s long-run expected return, \mu^\star.

[Figure: Bayesian LASSO intuition in the mixture model]

4. Laplace Distribution

If Apple’s long-run expected return is drawn from a Laplace distribution, then you face an estimation problem just like the one in the mixture model above. Andrews and Mallows (1974) shows that a Laplace distribution can be re-written as the weighted average of Normal distributions with different standard deviations,

(15)   \begin{align*} {\textstyle \frac{\lambda}{2}}  \cdot  e^{- \lambda \cdot |\mu|} = \int_0^\infty \, \left\{ \, {\textstyle \frac{1}{\sigma_\mu \cdot \sqrt{2 \cdot \pi}}} \cdot e^{- \frac{1}{2 \cdot \sigma_{\mu}^2} \cdot (\mu - 0)^2} \, \right\} \times \left\{ \, {\textstyle \frac{\lambda^2}{2}} \cdot e^{- \frac{\lambda^2}{2} \cdot \sigma_{\mu}^2} \, \right\} \times \mathrm{d}(\sigma_\mu^2), \end{align*}

where the weights on the variance follow an Exponential distribution. The Exponential distribution has a really fat tail. If the variance of Apple’s long-run expected return is distributed \sigma_{\mu}^2 \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{Exponential}(\sfrac{\lambda^2}{2}), then the resulting standard deviations could be either really large or really small. We just saw that this is exactly what needs to happen for a LASSO-like estimation strategy to be optimal. There are lots of distributions for \sigma_{\mu} that have this property; we just saw another one above. But, if you use \sigma_{\mu}^2 \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{Exponential}(\sfrac{\lambda^2}{2}), then the probabilities of realizing large and small values of \sigma_\mu line up in such a way that it’s precisely optimal to use the LASSO.
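Here’s a numerical check of the scale-mixture identity (a midpoint-rule sketch; the names and grid sizes are mine), mixing Normal densities over the variance against an Exponential rate of \sfrac{\lambda^2}{2}:

```python
import math

def laplace_pdf(mu, lam):
    """Laplace density (lambda/2) * exp(-lambda * |mu|)."""
    return 0.5 * lam * math.exp(-lam * abs(mu))

def normal_mixture_pdf(mu, lam, n=100000, s_max=60.0):
    """Integrate N(mu; 0, s) against an Exponential(lam^2/2) density on the
    variance s, using a simple midpoint rule on (0, s_max]."""
    ds = s_max / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * ds
        normal = math.exp(-mu * mu / (2 * s)) / math.sqrt(2 * math.pi * s)
        weight = 0.5 * lam * lam * math.exp(-0.5 * lam * lam * s)
        total += normal * weight * ds
    return total

# The mixture should reproduce the Laplace density.
gap = abs(normal_mixture_pdf(1.0, 1.0) - laplace_pdf(1.0, 1.0))
```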

In the original paper, there are a ton of extra hyper-parameters. For example, \sigma_{\epsilon} is a random variable. This clearly isn’t necessary. You just need the standard deviation of Apple’s long-run expected return to fluctuate wildly around \sigma_{\epsilon}. You can get a situation where the LASSO is really close to being optimal with just \overline{\sigma}_{\mu} \gg \sigma_{\epsilon} \gg \underline{\sigma}_{\mu}.

Also, in the original paper, there’s a lengthy discussion about properly “conditioning on \sigma_{\epsilon}.” The authors include a bizarre example of how the posterior distribution of \hat{\mu}(r) might not be unimodal if you don’t condition on \sigma_{\epsilon}, an example that, for me anyway, always seems to come out of left field. And, textbooks typically brush this point under the rug, calling it a technical condition. But, the analysis above shows that it’s not just a technical condition. It’s actually really important!

To see why, consider estimating Apple’s long-run expected return in a mixture model with

(16)   \begin{align*} \overline{\sigma}_{\mu} = 10{\scriptstyle \%} \gg \sigma_{\epsilon} = \underline{\sigma}_{\mu} = 0.10{\scriptstyle \%}. \end{align*}

The only difference from before is that \sigma_{\epsilon} = 0.10{\scriptstyle \%} instead of \sigma_{\epsilon} = 1.0{\scriptstyle \%}. If \sigma_{\epsilon} isn’t sufficiently large relative to \underline{\sigma}_{\mu}, then you’re never going to ignore Apple’s realized return when |r| is small. With these new numbers, \hat{\mu}(0.50{\scriptstyle \%}) = \sfrac{0.10{\scriptstyle \%}^2}{(0.10{\scriptstyle \%}^2 + 0.10{\scriptstyle \%}^2)} \cdot 0.50{\scriptstyle \%} = 0.25{\scriptstyle \%} rather than 0.005{\scriptstyle \%}. When choosing a distribution for \sigma_\mu, you’ve got to make sure that the high standard-deviation outcomes are big enough and the low standard-deviation outcomes are small enough relative to \sigma_{\epsilon}. Otherwise, a LASSO-like estimation strategy can’t be optimal.
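The shrinkage arithmetic above is easy to verify numerically. Here’s a minimal sketch (the `shrinkage` helper is my own, not from the post’s code):

```python
def shrinkage(r, sig_mu, sig_eps):
    # Normal-Normal posterior mean: mu ~ N(0, sig_mu^2) and r | mu ~ N(mu, sig_eps^2)
    return sig_mu**2 / (sig_mu**2 + sig_eps**2) * r

# sig_eps = 0.10% is not large relative to the low sigma_mu = 0.10%,
# so a small realized return barely gets shrunk at all:
print(shrinkage(0.0050, 0.0010, 0.0010))   # 0.0025, i.e., 0.25%

# sig_eps = 1.0% as in the earlier calibration: the same small return
# gets shrunk nearly all the way to zero:
print(shrinkage(0.0050, 0.0010, 0.0100))   # ~0.00005, i.e., ~0.005%
```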

FYI: Here’s the code to create the figures.

Filed Under: Uncategorized

Inferring Trader Horizons from Trading Volume

July 13, 2016 by Alex

1. Motivating Example

This post shows that, if traders face convex transaction costs (i.e., it costs them more per share to buy 2 shares of stock than to buy 1 share of stock), then it is possible to infer traders’ investment horizons from trading-volume data. To see why, imagine you are a trader and you’ve just learned some positive news about Apple’s next earnings announcement, which is due out overnight. To take advantage of this revelation, you will need to buy shares of Apple stock at some point today. In order to minimize your transaction costs, you will want to space out your demand for Apple shares as much as possible. So, all else equal, the average demand for Apple’s shares will be slightly higher today than it was yesterday because of your earnings revelation. This same logic applies to information at other horizons. Thus, if a larger fraction of the variation in Apple’s trading volume comes from day-to-day differences, then more of Apple’s traders must be operating at the daily horizon. Whereas, if a larger fraction of the variation in Apple’s trading volume comes from week-to-week differences, then more of Apple’s traders must be operating at the weekly horizon.

In the past, when researchers have studied traders’ investment horizons, they have used data on portfolio positions rather than trading volume. Some have measured trading activity at a couple of investment horizons for a small number of stocks. e.g., Brogaard et al. (2014) use NASDAQ data on a randomly selected sample of 60 stocks that assigns each trader a typical investment horizon. Others have measured horizon-specific trading activity for a large number of stocks but only at a single horizon. e.g., Cella et al. (2013) sample the portfolio positions of institutional traders at the quarterly frequency using 13F filings. But, collecting data on traders’ portfolio positions is hard. While this approach works, it tends to restrict the analysis to only a handful of stocks (e.g., 60 randomly selected NASDAQ stocks) or to a single horizon (e.g., the quarterly horizon). Because we can use trading-volume data to infer traders’ investment horizons, we no longer face these data-collection restrictions since trading-volume data is publicly available. Broad cross-sectional studies of traders’ investment horizons are now possible.

2. Traders’ Problem

Let’s begin by outlining the data-generating process and describing the problem faced by traders with an H-period horizon. Suppose that returns are generated by a simple 1-factor model,

(1)   \begin{align*} r_{t+1} &= \phantom{-} \beta \cdot f_t + \epsilon_{t+1}, \end{align*}

where \epsilon_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\, \sigma_{\epsilon}^2) and the level of the factor evolves according to an \mathrm{AR}(1) model,

(2)   \begin{align*} \Delta f_{t+1} &= - \gamma \cdot f_t + \xi_{t+1}, \end{align*}

with \xi_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\, \sigma_{\xi}^2). Note that this is the exact same factor structure used in Garleanu and Pedersen (2013). The H-period returns in this model are:

(3)   \begin{align*} r_{t+H} &= \beta \cdot (1 - \gamma)^{H-1} \cdot f_t + \beta \cdot {\textstyle \sum_{h=1}^{H-1}} (1 - \gamma)^{(H-1) - h} \cdot \xi_{t+h} + \epsilon_{t+H}. \end{align*}

And, conditional on knowing the current level of the factor, f_t, it’s possible to compute the conditional mean and variance of these H-period-ahead returns, r_{t+H} \mid f_t \sim \mathrm{N}(\mu_{H,t}, \, \sigma_H^2):

(4)   \begin{align*} \mu_{H,t} &= \beta \cdot (1 - \gamma)^{H-1} \cdot f_t \\ \text{and} \quad \sigma_H^2 &= \sigma_{\epsilon}^2 + 1_{\{ H \geq 2 \}} \times \left\{ \, \beta^2 \cdot {\textstyle \sum_{h=1}^{H-1}} (1 - \gamma)^{2 \cdot \{(H-1) - h\}} \cdot \sigma_{\xi}^2 \, \right\}. \end{align*}
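A quick way to sanity-check Equations (3) and (4) is to roll the factor model forward and compare the sample moments of r_{t+H} against the closed-form ones. A sketch (the parameter values here are illustrative choices of mine, not the post’s calibration):

```python
import numpy as np

# Illustrative parameters (my choices, not the post's calibration)
beta, gamma, sig_eps, sig_xi = 0.90, 0.75, 1.0, 1.0
H, f0 = 4, 2.0

# Closed-form conditional moments from Equation (4)
mu_H = beta * (1 - gamma) ** (H - 1) * f0
var_H = sig_eps**2 + beta**2 * sum(
    (1 - gamma) ** (2 * ((H - 1) - h)) for h in range(1, H)
)

# Monte Carlo: roll the AR(1) factor forward (H-1) periods, then draw r_{t+H}
rng = np.random.default_rng(42)
n = 500_000
f = np.full(n, f0)
for _ in range(H - 1):
    f = (1 - gamma) * f + rng.normal(0.0, sig_xi, n)
r_H = beta * f + rng.normal(0.0, sig_eps, n)

print(mu_H, r_H.mean())   # both around 0.028
print(var_H, r_H.var())   # both around 1.86
```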

Traders at time t observe the current level of the factor, f_t, and choose how many shares to buy over the course of the next H periods in order to maximize their mean-variance utility. Let \Delta_H[x_t] = \sum_{h=0}^{H-1} \Delta_1[x_{t+h}] denote the change in a trader’s portfolio position over the period from t to (t+H), and let \gamma denote traders’ risk-aversion parameter (not to be confused with the factor’s rate of mean reversion in Equation (2)). Then, we can write the baseline utility function of a trader with no transaction costs as of time t as:

(5)   \begin{align*} v_t(\Delta_H[x_t]) &= \mu_{H,t} \cdot \Delta_H[x_t] - {\textstyle \frac{\gamma}{2}} \cdot \sigma_H^2 \cdot \Delta_H[x_t]^2. \end{align*}

The t subscript comes from the fact that traders can observe the level of the factor at time t, f_t, prior to making their investment decision. If we look at the trader’s decision at a different time, then he will have a different utility and will choose a different portfolio because the level of the factor will be different.

3. Convex Transaction Costs

The key assumption is that, on top of this baseline utility function, traders also face convex transaction costs. i.e., when a trader with an H-period horizon changes his position over the course of H periods, he pays a transaction cost,

(6)   \begin{align*} \mathit{tc}(\Delta_H[x_t]) &= \min_{\{\Delta_1[x_{t+h}] \}_{h=0}^{H-1}} \, \left\{ \, {\textstyle \frac{\kappa}{2} \cdot \sum_{h=0}^{H-1}} \Delta_1[x_{t+h}]^2 \, \middle| \, \Delta_H[x_t] \, \right\}, \end{align*}

where \kappa > 0 is a positive constant that captures the severity of these transaction costs. Thus, traders with an H-period investment horizon maximize the following objective function:

(7)   \begin{align*} \max_{\Delta_H[x_t]} \, \left\{ \, v_t(\Delta_H[x_t]) - \mathit{tc}(\Delta_H[x_t]) \, \right\}. \end{align*}

They choose the H-period change in portfolio positions that maximizes their mean-variance utility and they implement this change in a way that minimizes their transactions costs.

The convex transaction costs imply that traders smooth out their trading across periods over their H-period horizon. It’s easiest to see why this is the case when H = 2, since \Delta_1[x_{t+1}] = \Delta_2[x_t] - \Delta_1[x_t]. Suppose that traders know the optimal final position, \Delta_2[x_t]. Then, traders’ optimization problem from Equation (7) becomes:

(8)   \begin{align*} \max_{\Delta_1[x_t]} \left\{ \, \mu_{2,t} \cdot \Delta_2[x_t] - {\textstyle \frac{\gamma}{2}} \cdot \sigma_2^2 \cdot \Delta_2[x_t]^2 - \sfrac{\kappa}{2} \cdot \left\{ \, \Delta_1[x_t]^2 + (\Delta_2 [x_t] - \Delta_1 [x_t])^2 \, \right\} \, \right\}. \end{align*}

Taking the first-order condition with respect to \Delta_1[x_t],

(9)   \begin{align*} 0 &= - \kappa \cdot \left[ \, \Delta_1 [x_t] - (\Delta_2[x_t] - \Delta_1[x_t]) \, \right], \end{align*}

then implies that \Delta_1[x_t] = \sfrac{\Delta_2[x_t]}{2}. This simple exercise verifies that, when there are convex transaction costs, traders will want to split their orders up evenly across their investment horizon. In general, if traders have an H-period horizon, then they will choose \Delta_1[x_{t+h}] = \sfrac{\Delta_H[x_t]}{H}. Traders with an H-period investment horizon will have trading volume that is characterized by smooth, H-period-long intervals, like the ones described in the figure below.
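The even-split result generalizes beyond H = 2: under quadratic costs, any feasible schedule of one-period trades that adds up to the same total costs at least as much as the equal split. A quick numerical spot-check (my own sketch, with \kappa = 1):

```python
import numpy as np

rng = np.random.default_rng(7)
H, total = 4, 1.0
even = np.full(H, total / H)                 # the equal split, Delta_H[x_t] / H
even_cost = 0.5 * np.sum(even**2)            # transaction cost with kappa = 1

for _ in range(10_000):
    w = rng.normal(size=H)
    trades = even + (w - w.mean())           # random perturbation preserving the total
    assert np.isclose(trades.sum(), total)
    assert 0.5 * np.sum(trades**2) >= even_cost
print("equal split is cheapest:", even_cost)
```

Adding a mean-zero perturbation keeps the H-period position change fixed, so the loop checks every candidate against the constraint in Equation (6).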

[Figure: smooth-trading-at-different-horizons]

4. Fluctuations in Volume

If traders at horizon H are characterized by trading that’s smoothed over H periods, then we should be able to use inference tools like the wavelet-variance estimator to infer traders’ investment horizons. In a nutshell, this estimator computes the fraction of variation in a time series that comes from comparing successive H-period long intervals. See Percival and Walden (2000) for more information.
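Here’s a rough stand-in for that estimator, comparing the means of successive blocks at each dyadic scale rather than running a proper MODWT (the function name and implementation details are my own, not Percival and Walden’s):

```python
import numpy as np

def scale_variance_fractions(x, levels=6):
    """At each scale H = 2**j, compare the means of successive,
    non-overlapping H-period blocks of x. Return the share of the
    total across scales attributable to each scale."""
    x = np.asarray(x, dtype=float)
    variances = []
    for j in range(1, levels + 1):
        H = 2 ** j
        n_blocks = len(x) // H
        block_means = x[: n_blocks * H].reshape(n_blocks, H).mean(axis=1)
        diffs = np.diff(block_means)             # successive H-period comparisons
        variances.append(0.5 * np.mean(diffs ** 2))
    variances = np.array(variances)
    return variances / variances.sum()
```

For white noise (trading with no horizon structure), this puts the largest share of variance at the finest scale; smoothing trades over H periods should shift the mass toward \log_2(H).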

I run a pair of numerical experiments to show that this intuition is correct (code). First, using a data-generating process with \beta = 0.90, \gamma = 0.75, and \sigma_\epsilon = \sigma_\xi = 1, I simulate a long return time series. From Equation (7), it’s possible to compute the optimal portfolio position of a trader with horizon H,

(10)   \begin{align*} x_{H,t} &= \frac{\mu_{H,t}}{\gamma \cdot \sigma_H^2 + \sfrac{\kappa}{H}}. \end{align*}

For 5 different horizons, H^\star \in \{ \, 1, \, 4, \, 16, \, 64, \, 256 \,  \}, I then simulate the trading-volume time series that would occur if all traders operated at horizon H^\star. The figure below shows the fraction of the trading-volume variance occurring at each horizon according to the wavelet-variance estimator. Just as you’d expect, there is a spike in the fraction of trading-volume variance at the true horizon, H^\star… whatever that H^\star happens to be. i.e., there is a spike in the dashed green line, which corresponds to the trading-volume data where traders have a horizon of H^\star = 16, precisely at \log_2(H) = 4.

[Figure: plot--all-trading-at-single-horizon--12jul2016]

In addition to this single-horizon experiment, I also run a numerical experiment with traders operating at 2 different horizons. Specifically, using the same baseline parameters, I simulate a trading-volume time series where half of the volume comes from traders operating at the H^\star = 4-period horizon and half of the volume comes from traders operating at the H^\star = 64-period horizon. The figure below shows the fraction of the trading-volume variance occurring at each horizon according to the wavelet-variance estimator. Again, just as you’d expect, there is a spike in the fraction of trading-volume variance at both the H^\star = 4- and H^\star = 64-period horizons.

[Figure: plot--trading-at-two-different-horizons--12jul2016]


Investor Holdings, Naïve Beliefs, and Artificial Supply Constraints

June 24, 2016 by Alex

1. Motivation

In the standard model of house-price dynamics, there are two kinds of cities: supply constrained and supply unconstrained. In supply-constrained cities (e.g., New York, Boston, or San Francisco), it’s difficult and costly to build new houses because of geographic and regulatory hurdles. In supply-unconstrained cities (e.g., Las Vegas, Phoenix, or San Bernardino), these hurdles are much lower, and it’s much easier to build new houses. To get a sense of just how unconstrained places like Las Vegas are, take a look at this time-lapse video of Las Vegas from space. The number of houses balloons as housing demand in Las Vegas grows.

Now, suppose more people suddenly want to live in a particular city. If the city is supply constrained like New York, then people will have to outbid existing residents to move into that city, which will drive up the prices on existing houses in that city. More people want the same number of homes, so prices have to go up. By contrast, if the city is supply unconstrained like Las Vegas, then people who want to move into that city can just build new houses. In supply-unconstrained cities, the supply of housing adjusts to accommodate the additional demand. Why outbid an existing resident when you can just build the same house right next door? Thus, in the standard model, supply elasticity is a key determinant of house-price growth (Saiz, 2010).

But, during the mid-2000s, things got weird. Supply-unconstrained cities like Las Vegas realized huge spikes in transaction volumes and house prices (Chinco and Mayer, 2016). What changed? Why did Las Vegas suddenly look like a supply-constrained city? Is there some economic mechanism that might make supply-unconstrained cities behave like supply-constrained cities when there is a lot of trading activity? This post outlines how the combination of trading volume by house flippers (i.e., people who buy and then quickly resell houses without living in them) and naïve beliefs can generate artificial supply constraints in housing markets with lots of trading volume.

2. Everyday Example

To see how this mechanism works, it’s helpful to start with an example from everyday life. I love bagels. Imagine you’re at a bagel shop and you want to buy an everything bagel. The shop has lots of different kinds of bagels displayed in bins that contain 20 bagels each. So, at the start of each morning, there’s a bin of 20 plain bagels, a bin of 20 poppyseed bagels, a bin of 20 everything bagels, and so on… There’s a line, and it takes each person several seconds to order. Each time someone orders a bagel, the clerk takes it from one of the bins. Whenever one of the bins runs out, a second clerk takes it to the back of the shop and refills it with 20 more bagels, a task that takes 1 minute to complete.

[Figure: bagel-store-selection]

Without any sort of naïvety, supply and demand work exactly like you’d expect in this setup. If there is only 1 everything bagel left, then you might be willing to pay more than the price listed on the menu for that last bagel. But, if there were lots of bagels left, then you would never do this sort of thing. You’d just wait your turn in line and pay the listed price on the menu when you got to the counter. If a bin happened to run out when you were at the front of the line, then you’d recognize that it would be full in a minute and just wait until the second clerk got back.

Without any sort of naïvety, sales volume doesn’t have any effect on this equilibrium. To be sure, if there are lots of people in line and bagels are selling really quickly, then you’re more likely to find the everything-bagels bin empty when you get to the front of the line. It always takes the second clerk 1 minute to fill an empty bin. So, if there is a big line and more people pass by the front of the line per minute, then bins are more likely to run dry and more people arrive at the register when the second clerk is back in the kitchen. But, if bagel buyers are fully rational, then they’ll realize that each bin will be replenished in a minute and just wait till the fresh bagels come out before buying.

Introducing naïve beliefs changes things. If you don’t recognize that empty bins will be replenished in a minute, then you might be willing to outbid the guy in front of you for the 20th everything bagel in a bin—or, at the very least, try to talk him into a different order. And, when you get to the register during a busy time of morning, it’s going to look like the whole bagel shop is running low on supply since each individual bin is more likely to be in the process of being filled. If you could linger in the bagel shop for a while, then this naïvety wouldn’t matter. Any empty bins would get replenished while you were standing around making your decision. But, when there is a line out the door and you have to make a quick decision, your naïve beliefs make it look like there is an artificially low number of bagels available.

3. Simple Model

Investors play the role of the clerk that takes 1 minute to replace a bin of bagels. They take houses off the market for a short period of time. If trading volume is low or home buyers are fully rational, then they shouldn’t affect equilibrium house prices too much. But, if trading volume is very high and home buyers don’t realize that investor homes will come back on the market in 6 months to a year, then home buyers might get the impression that the supply of houses is getting low. I now outline a simple model to make these ideas more concrete.

How many different houses can a home buyer see on the market if he looks for h months? Let \textit{houses}_t denote the total number of houses, \textit{owner}_t denote the number of owner-occupied houses, \textit{investor}_t denote the number of investor-owned houses, and \textit{forSale}_t denote the number of houses that are currently for sale in month t:

(1)   \begin{align*} \textit{houses}_t = \textit{owner}_t + \textit{investor}_t + \textit{forSale}_t. \end{align*}

In any given month, home buyers can only visit houses that are currently for sale. Owner-occupied and investor-owned houses are off the market. A house might be owner occupied one month, for sale the next month, and owned by an investor several months later. I write the probability of transitioning from one state to another in matrix form:

(2)   \begin{align*} \begin{pmatrix} \textit{owner}_{t+1} \\ \textit{investor}_{t+1} \\ \textit{forSale}_{t+1} \end{pmatrix} = \begin{bmatrix} \gamma_{o \to o} & 0 & \gamma_{\textit{fs} \to o} \\  0 & \gamma_{i \to i} & \gamma_{\textit{fs} \to i} \\  \gamma_{o \to \textit{fs}} & \gamma_{i \to \textit{fs}} & \gamma_{\textit{fs} \to \textit{fs}} \end{bmatrix} \begin{pmatrix} \textit{owner}_t \\ \textit{investor}_t \\ \textit{forSale}_t \end{pmatrix}. \end{align*}

Each entry in this matrix represents the probability that a house transitions from one state to another. For example, \gamma_{o \to \textit{fs}} represents the probability a house goes from being owner occupied one month to for sale the next. And, \gamma_{i \to i} represents the probability that a house is investor owned in month (t+1) given that it was investor owned in month t. The columns of this matrix sum to 1 and \gamma_{o \to i} = \gamma_{i \to o} = 0 since a house has to be for sale before it can pass from one owner to the next. The diagram below gives an alternative way of representing these transition probabilities that doesn’t use matrix notation.

[Figure: state-diagram]

The number of houses that are always owner occupied after h months of looking is \gamma_{o \to o}^h \cdot \textit{owner}_t. So, when there aren’t any investors, the number of houses that a home buyer can choose from after looking for h months is \widetilde{\textit{supply}}_h = 1 - \gamma_{o \to o}^h \cdot \textit{owner}_t, where I’ve normalized \textit{houses}_t = 1. The number of houses that are always investor owned after h months is \gamma_{i \to i}^h \cdot \textit{investor}_t. So, when there are investors, the number of houses that a home buyer can choose from after looking for h months is given by:

(3)   \begin{align*} \textit{supply}_h = 1 - \gamma_{o \to o}^h \cdot \textit{owner}_t - \gamma_{i \to i}^h \cdot \textit{investor}_t. \end{align*}

Thus, the supply constraint imposed by investors on the number of houses that a home buyer can view in h months is given by:

(4)   \begin{align*} \textit{constraint}_h = \frac{ \gamma_{i \to i}^h \cdot \textit{investor}_t }{ 1 - \gamma_{o \to o}^h \cdot \textit{owner}_t }. \end{align*}

This term is just the change in the observed housing supply after h months due to the presence of investors, (1 - \textit{constraint}_h) \times \widetilde{\textit{supply}}_h = \textit{supply}_h. If \textit{constraint}_6 = 0.05, then investors decrease the number of houses for sale over the course of 6 months by 5{\scriptstyle \%}. If there would have been 100 houses for sale over the course of 6 months without investors, there are only 95 houses for sale with investors.

4. Plugging in Numbers

This model is nice because it’s easy to plug in numbers to see how investor holdings can affect the perceived housing supply for naïve home buyers. We can go back and forth between holding-period lengths and transition probabilities by using the geometric distribution. Investors typically hold onto their houses for 6 months, implying that \gamma_{i \to i} = \sfrac{6}{7}. Owners typically hold onto their houses for 10 years, implying that \gamma_{o \to o} = \sfrac{120}{121}. Owners and investors are equally likely to buy houses, \gamma_{\textit{fs} \to o} = \gamma_{\textit{fs} \to i}. Suppose that the typical house stays on the market for 1 year, implying that \gamma_{\textit{fs} \to \textit{fs}} = \sfrac{12}{13}.

[Figure: plot--investors-and-supply-constraints--11jun2016]

The figure above shows the fraction decrease in the housing supply perceived by naïve home buyers as a function of their search duration when 10{\scriptstyle \%} of the housing stock is for sale in any given month (code). e.g., if a home buyer would have seen 100 houses in 3 months in the absence of investors, then he sees only 91 houses in 3 months when 2{\scriptstyle \%} of houses are initially owned by investors. The dashed green line says that 2{\scriptstyle \%} investor holdings can lead to a 9{\scriptstyle \%} drop in the housing supply as perceived by naïve home buyers. As the number of months that a naïve home buyer searches drops, the impact of investor holdings rises sharply. When search durations are really short, like they were in Las Vegas during the mid 2000s, tiny amounts of investor ownership can have enormous impacts on the perceived housing supply.
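The 9{\scriptstyle \%} number behind the dashed green line is easy to reproduce from Equation (4). A minimal sketch (variable names are mine):

```python
g_ii = 6 / 7       # investors hold for ~6 months
g_oo = 120 / 121   # owner-occupants hold for ~10 years

def constraint(h, owner, investor):
    # Equation (4): share of the h-month perceived supply removed by investors
    return (g_ii ** h * investor) / (1 - g_oo ** h * owner)

# 10% of the stock for sale, 2% investor-owned, 3-month search:
print(round(100 * constraint(3, owner=0.88, investor=0.02), 1))  # ~8.9%
```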


Asset-Pricing Implications of Dimensional Analysis

May 14, 2016 by Alex

1. Motivation

I have been trying to use dimensional analysis to understand asset-pricing problems. In many hard physical problems, it is possible to gain some insight about the functional form of the solution by examining the dimensions of the relevant input variables. In the canonical example of this brand of analysis, G.I. Taylor was able to tell the yield of the Trinity Test nuclear explosion from a few photographs via dimensional analysis (see Barenblatt 2003 and earlier post). So, maybe it is possible to better understand, say, the price impact of informed trading by studying the dimensions of this problem?

However, none of the asset-pricing problems I have looked at via dimensional analysis have yielded pretty solutions. It could be that the fundamental asset-pricing equations aren’t dimensionally consistent. Such equations do exist and they can be very helpful. For instance, Dolbear’s Law says that you can tell the outdoor temperature on a summer evening by counting the frequency of cricket chirps,

(1)   \begin{align*} \text{temperature in degrees fahrenheit} = 40^{\circ} + \{ \, \text{cricket chirps every 15 seconds} \, \}. \end{align*}

But, I don’t think this is what’s going on. Instead, my sense is that, because asset-pricing models are built by researchers trying to convey economic intuition rather than dictated by the physical constraints of a particular real-world problem, there aren’t any interesting unexplored symmetries hiding in the asset-pricing models for dimensional analysis to uncover. A good economist never includes superfluous variables when constructing a model, but there is often unexpected redundancy in our initial formulations of hard physical problems that we find in nature. This post explains my (perhaps wrong) intuition in more detail.

2. Period of Pendulum

Let’s start by looking at a physical problem where dimensional analysis actually helps. Consider the problem of modeling the period of a pendulum with length \ell and mass m. Suppose that in order to get the pendulum swinging, I initially pull it a distance of a centimeters off to the side. In this setup, we can write the period of the pendulum as some function of these variables,

(2)   \begin{align*} p &= \mathrm{f}(\ell,m,a,g), \end{align*}

together with the acceleration due to gravity, g.

The key insight in dimensional analysis is that the pendulum shouldn’t behave differently if we measure its length in inches rather than centimeters. The marks on our ruler don’t matter. The period of the pendulum has dimensions of time, \mathrm{dim}[p] = T. The length of the pendulum and the amplitude of its swing have dimensions of length, \mathrm{dim}[\ell] = \mathrm{dim}[a] = L. The mass of the pendulum has dimensions of (wait for it…) mass, \mathrm{dim}[m] = M. And, the force of gravity has dimensions, \mathrm{dim}[g] = L \cdot T^{-2}. Suppose that we define new units of mass, length, and time so that 1 new unit of mass is equal to \mu old units of mass, 1 \overset{M}{\longrightarrow} \mu, 1 new unit of length is equal to \lambda old units of length, 1 \overset{L}{\longrightarrow} \lambda, and 1 new unit of time is equal to \tau old units of time, 1 \overset{T}{\longrightarrow} \tau. If our choice of units doesn’t affect the pendulum’s behavior, then we should be able to rewrite our old formula in these new units,

(3)   \begin{align*} {\textstyle \frac{p}{\tau}} &= \mathrm{f}\left( \, {\textstyle \frac{\ell}{\lambda}}, \, {\textstyle \frac{m}{\mu}}, \, {\textstyle \frac{a}{\lambda}}, \, {\textstyle \frac{g}{\sfrac{\lambda}{\tau^2}}} \, \right). \end{align*}

Now comes the trick. Notice that these new units can be anything we want. So, let’s get clever and pick \mu = m, \lambda = \ell, and \tau = \sqrt{\sfrac{\ell}{g}}. With these values the formula for the period of the pendulum becomes

(4)   \begin{align*} {\textstyle \frac{p}{\sqrt{\sfrac{\ell}{g}}}} &= \mathrm{f}(1,1,\sfrac{a}{\ell},1) = \mathrm{f}^\star(\sfrac{a}{\ell}), \end{align*}

where \mathrm{f}^\star(\cdot) is a new function of a dimensionless ratio, \sfrac{a}{\ell}. Thus, we know that the period of a pendulum is

(5)   \begin{align*} p &= \sqrt{\sfrac{\ell}{g}} \times \mathrm{f}^{\star}(\sfrac{a}{\ell}). \end{align*}

Without knowing anything except for the units that each variable is being measured in, we can see that 1) the period is unrelated to the mass and 2) the period of the pendulum is inversely proportional to the square root of the acceleration due to gravity, \sqrt{g}. Functional forms without physics! We now know how to compute the period of the same pendulum on Mars.
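For a concrete sense of the Mars computation: in the small-amplitude limit, \mathrm{f}^{\star}(\sfrac{a}{\ell}) \to 2 \cdot \pi (a standard physics result, not something dimensional analysis hands you), so the period is one line of arithmetic:

```python
import math

def period(length, g):
    # Small-amplitude pendulum period: p = sqrt(l/g) * f*(a/l), with f* -> 2*pi
    return 2 * math.pi * math.sqrt(length / g)

p_earth = period(1.0, 9.81)   # ~2.0 seconds
p_mars = period(1.0, 3.71)    # ~3.3 seconds

# The ratio depends only on gravity, exactly as Equation (5) predicts:
print(p_mars / p_earth, math.sqrt(9.81 / 3.71))
```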

3. Price Impact

What happens if we try to use this same trick to understand price impact in the stock market? That is, how much does the price of a stock move if traders demand an extra 100 shares on a particular day? Let’s use the standard terminology from information-based asset-pricing models and define price impact as a function of 3 variables,

(6)   \begin{align*} \lambda &= \mathrm{f}(\sigma_v, \sigma_z, \gamma), \end{align*}

where \mathrm{dim}[\lambda] = D \cdot S^{-1} with D denoting dollars and S denoting shares. Suppose that informed traders know the fundamental value of the stock, v, but uninformed traders don’t. Let \sigma_v denote the volatility of the stock’s value from the perspective of uninformed traders, \mathrm{dim}[\sigma_v] = D \cdot S^{-1}, and let \sigma_z denote the volatility of asset-supply noise, \mathrm{dim}[\sigma_z] = S. This is the noise term that keeps the asset’s price from being perfectly revealing. Finally, let \gamma denote the risk aversion of the informed traders, \mathrm{dim}[\gamma] = D^{-1}.

By the logic of dimensional analysis, it shouldn’t matter whether we measure a stock’s value in dollars or euros and it shouldn’t matter whether we measure changes in demand in shares or tens of shares. So, suppose that 1 new unit of value is equal to \delta old units of value, 1 \overset{D}{\longrightarrow} \delta and that 1 new unit of quantity is equal to \psi old units of quantity, 1 \overset{S}{\longrightarrow} \psi. If our choice of units doesn’t affect market behavior, then we should be able to rewrite our old formula for price impact in these new units,

(7)   \begin{align*} \frac{\lambda}{\sfrac{\delta}{\psi}} &= \mathrm{f}\left(\frac{\sigma_v}{\sfrac{\delta}{\psi}}, \frac{\sigma_z}{\psi}, \frac{\gamma}{\sfrac{1}{\delta}}\right), \end{align*}

just like before.

Now comes the trouble. If we get clever and choose our units to create a function of a single dimensionless variable, \psi = \sigma_z and \delta = \sfrac{1}{\gamma}, we find that:

(8)   \begin{align*} \lambda &= {\textstyle \frac{1}{\gamma \cdot \sigma_z}} \cdot \mathrm{f}^\star (\gamma \cdot \sigma_z \cdot \sigma_v). \end{align*}

In the pendulum problem above, the single dimensionless quantity only involved some of the relevant variables; however, in the price-impact problem the dimensionless quantity involves all 3 of the relevant variables. There is no progress. Before applying dimensional analysis we had an unknown function of 3 variables. After applying dimensional analysis we still have an unknown function of 3 variables. Dimensional analysis doesn’t provide any new insight about the functional form of the link between the quantity of interest (i.e., the price impact, \lambda) and any of the input parameters (i.e., the values \sigma_v, \sigma_z, or \gamma). I always seem to find this sort of non-result when applying dimensional analysis to asset-pricing problems.
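As a sanity check on Equation (8), note that it can still accommodate the risk-neutral benchmark. In the single-auction Kyle (1985) model, price impact is \lambda = \sfrac{\sigma_v}{(2 \cdot \sigma_z)}, which corresponds to \mathrm{f}^{\star}(x) = \sfrac{x}{2}; the risk-aversion parameter cancels, exactly as it must in a risk-neutral model:

```latex
\begin{align*}
\lambda
  &= \frac{1}{\gamma \cdot \sigma_z} \cdot \mathrm{f}^{\star}(\gamma \cdot \sigma_z \cdot \sigma_v)
   = \frac{1}{\gamma \cdot \sigma_z} \cdot \frac{\gamma \cdot \sigma_z \cdot \sigma_v}{2}
   = \frac{\sigma_v}{2 \cdot \sigma_z}.
\end{align*}
```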

4. Main Intuition

I think this particular non-result in the price-impact problem is suggestive of why dimensional analysis doesn’t help that much when trying to understand asset-pricing models more generally. What makes the canonical information-based asset-pricing papers great is that they pack a lot of economic intuition into a relatively simple model. There isn’t a lot of superfluous structure hanging around. When you look at the original formulation of the pendulum problem, there was a bunch of redundancy involved. The mass of the pendulum turned out to be irrelevant, and two of the variables, length and amplitude, turned out to have the exact same units. There is no such redundancy in the price-impact problem. As defined, the parameters \sigma_v, \sigma_z, and \gamma are all needed to define the dimensionless quantity. The elegance of models like Kyle (1985) makes them unsuited to dimensional analysis.

To illustrate, consider changing the original price-impact problem slightly. Suppose that we, as econometricians, could directly observe the inverse of the dollar-demand volatility from noise traders, \sfrac{1}{\sigma_y}, which has units of inverse dollars, \mathrm{dim}[\sfrac{1}{\sigma_y}] = D^{-1}, instead of the demand volatility, which has units of shares, \mathrm{dim}[\sigma_z] = S. This is a less elegant model because it is needlessly complex. Demand volatility in shares now depends on both the equilibrium price and the shares demanded by noise traders. But, let’s go with it. In this new setup, the price impact is still an unknown function of 3 variables,

(9)   \begin{align*} \lambda &= \mathrm{f}(\sigma_v, \sfrac{1}{\sigma_y}, \gamma), \end{align*}

but now, because there is redundancy, we can make progress via dimensional analysis.

Again, suppose that 1 new unit of value is equal to \delta old units of value, 1 \overset{D}{\longrightarrow} \delta and that 1 new unit of quantity is equal to \psi old units of quantity, 1 \overset{S}{\longrightarrow} \psi. If our choice of units doesn’t affect market behavior, then we should be able to rewrite our old formula for price impact in these new units:

(10)   \begin{align*} \frac{\lambda}{\sfrac{\delta}{\psi}} &= \mathrm{f}\left(\frac{\sigma_v}{\sfrac{\delta}{\psi}}, \frac{\sfrac{1}{\sigma_y}}{\sfrac{1}{\delta}}, \frac{\gamma}{\sfrac{1}{\delta}}\right). \end{align*}

If we choose our units to create a function of a single dimensionless variable, \delta = \sigma_y and \psi = \sfrac{\sigma_y}{\sigma_v}, we find that:

(11)   \begin{align*} \lambda &= \sigma_v \cdot \mathrm{f}^\star(\sigma_y \cdot \gamma). \end{align*}

Before we had an unknown function of 3 variables and now we have an unknown function of only 2 variables. Progress! If we found situations where the product of informed traders’ risk aversion and dollar-demand noise was constant, \sigma_y \cdot \gamma = \text{Const.}, then we could actually test this relationship with a regression,

(12)   \begin{align*} \log(\lambda) &= \alpha + \beta \times \log(\sigma_v) + \epsilon, \end{align*}

and check whether or not \beta = 1. But, the only reason that we could make progress in this alternative setting was that the model was written down clumsily in the first place.

