Research Notebook

Volatility Decomposition of a Typical Firm

May 3, 2013 by Alex

1. Introduction

This post reviews the analysis in Campbell, Lettau, Malkiel, and Xu (2001) who find that firm-level volatility has been rising over the period from July 1962 to December 1997. I’ve posted the code I used here. What does this mean? The authors look at the day-to-day variations in the stock returns of all publicly listed firms on the NYSE, Amex, and NASDAQ exchanges in each month, t, during this sample. They then show how to decompose the variance of the daily returns each month for a typical firm (e.g., a firm selected randomly with probability proportional to its market cap) into a market-specific component, an industry-specific component, and a firm-specific component. Campbell, Lettau, Malkiel, and Xu (2001) find that this firm-specific variance has been steadily rising as plotted in the figure below.

Annualized firm-specific variance.

Annualized variance within each month of daily firm returns relative to the firm’s industry’s value weighted return for the period from July 1962 to December 1997.

I begin by detailing how Campbell, Lettau, Malkiel, and Xu (2001) estimate their market-specific, industry-specific, and firm-specific (i.e., idiosyncratic) volatility components. A natural first approach would be to estimate I industry-level regressions:

(1)   \begin{align*} r_{i,t} &= \beta_{i,m} \cdot r_{m,t} + \tilde{\epsilon}_{i,t} \end{align*}

and J firm-level regressions:

(2)   \begin{align*} \begin{split} r_{j,t} &= \beta_{j,i} \cdot r_{i,t} + \tilde{\eta}_{j,t} \\ &= \beta_{j,i} \cdot \beta_{i,m} \cdot r_{m,t} + \beta_{j,i} \cdot \tilde{\epsilon}_{i,t} + \tilde{\eta}_{j,t} \end{split} \end{align*}

where r_{m,t} denotes the value-weighted excess return on the market, r_{i,t} denotes the value-weighted excess return on industry i, and r_{j,t} denotes the excess return on stock j. You could then just insert the realized \beta_{j,i} and \beta_{i,m} terms into expressions for the variation in r_{i,t} and r_{j,t} to get the desired result, no?

(3)   \begin{align*} \begin{split} \mathrm{Var}[r_{i,t}] &= \beta_{i,m}^2 \cdot \mathrm{Var}[r_{m,t}] + \mathrm{Var}[\tilde{\epsilon}_{i,t}] \\ \mathrm{Var}[r_{j,t}] &= \beta_{j,m}^2 \cdot \mathrm{Var}[r_{m,t}] + \beta_{j,i}^2 \cdot \mathrm{Var}[\tilde{\epsilon}_{i,t}] + \mathrm{Var}[\tilde{\eta}_{j,t}] \end{split} \end{align*}

The problem here is that \beta_{j,m} (= \beta_{j,i} \cdot \beta_{i,m}) and \beta_{j,i} are hard to estimate and may well vary over time. Thus, a \beta-independent procedure is necessary. After describing this procedure in Section 2, I then replicate the variance time series used in Campbell, Lettau, Malkiel, and Xu (2001) in Section 3. Finally, in Section 4 I conclude by extending the sample period to December 2012 and discussing the interpretation of the results.

The analysis in Campbell, Lettau, Malkiel, and Xu (2001) ties in closely with numerous other findings in asset pricing, macroeconomics, and behavioral finance. In the empirical asset pricing literature, Ang, Hodrick, Xing, and Zhang (2006) find a puzzling result that firms with high idiosyncratic volatility “have abysmally low average returns.” e.g., stocks with the highest idiosyncratic volatility earn excess returns that are 1.06{\scriptstyle \%/\mathrm{mo}} lower than stocks with the lowest idiosyncratic volatility. In a macroeconomic context, this analysis directly supports the granular origins theory of Gabaix (2011), which proposes that the key source of aggregate macroeconomic fluctuations is idiosyncratic firm-specific shocks to large firms. Finally, in a behavioral finance setting, the fact that market and industry models explain so little of the variation in firm-level stock returns (somewhere between 20{\scriptstyle \%} and 30{\scriptstyle \%}) suggests a new kind of problem for traders: scarce attention. “What this information consumes is rather obvious,” writes Herbert Simon. “It consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.” As highlighted in Chinco (2012), it takes time to sift through all of the competing information about each and every firm and subset of firms.

2. Statistical Model

In this section I explain how Campbell, Lettau, Malkiel, and Xu (2001) compute the market-wide, industry-level, and idiosyncratic contributions to the variation in a firm’s daily returns. The challenge in doing this is that empirical estimates of the firm-specific \betas are noisy and unreliable. To get around this challenge, Campbell, Lettau, Malkiel, and Xu (2001) decompose the daily return variance of a “typical” firm rather than every firm. e.g., they think about the determinants of the daily return variance of a firm selected at random each month from the market with probability proportional to its relative market capitalization.

To see why looking at a typical firm might be helpful here, define the relative market capitalization of an entire industry, w_{i,t}, and the relative market capitalization of a particular firm, w_{j,t}, in month t as follows:

(4)   \begin{align*} w_{i,t} &= \frac{\sum_{j \in J[i]} \mathrm{MCAP}_{j,t}}{\sum_{j \in J} \mathrm{MCAP}_{j,t}} \qquad \text{and} \qquad w_{j,t} = \frac{\mathrm{MCAP}_{j,t}}{\sum_{j \in J} \mathrm{MCAP}_{j,t}} \end{align*}

where \sum_{i \in I} w_{i,t} = 1 and \sum_{j \in J} w_{j,t} = 1. I use the notation that j denotes a particular firm in the market, and use J[i] \subset J to denote the subset of firms in industry i: J[i] = \{ j \in J \mid \mathrm{Industry}(j) = i \}. The key insight is that the value weighted sums of the industry-level and firm-specific \betas have to sum to unity:

(5)   \begin{align*} 1 &= \sum_{i \in I} w_{i,t} \cdot \beta_{i,m} \qquad \text{and} \qquad 1 = \sum_{j \in J[i]} \left( \frac{w_{j,t}}{w_{i,t}} \right) \cdot \beta_{j,i} \end{align*}

This observation follows mechanically from the fact that the market return is just the value-weighted sum of its industry constituents and the industry returns are just the value-weighted sums of their firm constituents:

(6)   \begin{align*} r_{m,t} &= \sum_{i \in I} w_{i,t} \cdot r_{i,t} \qquad \text{and} \qquad r_{i,t} = \sum_{j \in J[i]} \left( \frac{w_{j,t}}{w_{i,t}} \right) \cdot r_{j,t} \end{align*}

By sampling appropriately, you can eliminate the pesky \betas from Equation (3) by converting their weighted sums into 1s.

How would you go about decomposing the variance of a typical firm in practice? First, consider estimating the market-wide and industry-specific variance components in a \beta-independent fashion. Instead of running the regression in Equation (1), consider computing the difference between each industry’s value weighted return, r_{i,t}, and the value weighted return on the market, r_{m,t}:

(7)   \begin{align*} r_{i,t} &= r_{m,t} + \epsilon_{i,t} \end{align*}

where \epsilon_{i,t} now lacks the tilde and is connected to \tilde{\epsilon}_{i,t} via the relationship:

(8)   \begin{align*} \epsilon_{i,t} &= \tilde{\epsilon}_{i,t} + (\beta_{i,m} - 1) \cdot r_{m,t} \end{align*}

Since \epsilon_{i,t} is not a regression residual, it will not be orthogonal to the market return, \mathrm{Cov}[r_{m,t},\epsilon_{i,t}] \neq 0. Thus, when computing the variance of the value weighted industry return, r_{i,t}, we get:

(9)   \begin{align*} \mathrm{Var}[r_{i,t}] &= \mathrm{Var}[r_{m,t}] + \mathrm{Var}[\epsilon_{i,t}] + 2 \cdot \mathrm{Cov}[r_{m,t},\epsilon_{i,t}] \\ &= \mathrm{Var}[r_{m,t}] + \mathrm{Var}[\epsilon_{i,t}] + 2 \cdot (\beta_{i,m} - 1) \cdot \mathrm{Var}[r_{m,t}] \end{align*}

Applying the sampling trick described above then allows us to remove the \betas by averaging over all industries:

(10)   \begin{align*} \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[r_{i,t}] &= \sum_{i \in I} w_{i,t} \cdot \left\{ (2 \cdot \beta_{i,m} - 1) \cdot \mathrm{Var}[r_{m,t}] + \mathrm{Var}[\epsilon_{i,t}] \right\} \\  &= \mathrm{Var}[r_{m,t}] + \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[\epsilon_{i,t}] \end{align*}

This result says that if you select an industry i \in I with probability \mathrm{Pr}[i] = w_{i,t}, then the expected variance of this industry’s daily returns in month t will consist of a market component, \mathrm{Var}[r_{m,t}], and an industry-specific component, \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[\epsilon_{i,t}]. When the market component is big:

(11)   \begin{align*}  \frac{\mathrm{Var}[r_{m,t}]}{\mathrm{Var}[r_{m,t}] + \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[\epsilon_{i,t}]} \end{align*}

then most of the variation in value weighted industry returns tends to come from broad market-wide shocks. Conversely, if the industry-specific component is relatively large, then most of the variation in value weighted industry returns tends to come from different industry-specific shocks which are only felt in their particular corner of the market.

Next consider estimating the firm-specific variance component in a \beta-independent way using the same procedure. Instead of running the regression in Equation (2), I compute the difference between each firm’s excess returns and the value weighted excess returns on its industry:

(12)   \begin{align*} r_{j,t} &= r_{i,t} + \eta_{j,t} \\ \eta_{j,t} &= \tilde{\eta}_{j,t} + (\beta_{j,i} - 1) \cdot r_{i,t} \end{align*}

Since \eta_{j,t} is not a regression residual, it will no longer be orthogonal to the value weighted industry return, \mathrm{Cov}[r_{i,t},\eta_{j,t}] \neq 0, and thus:

(13)   \begin{align*} \mathrm{Var}[r_{j,t}] &= \mathrm{Var}[r_{i,t}] + \mathrm{Var}[\eta_{j,t}] + 2 \cdot \mathrm{Cov}[r_{i,t},\eta_{j,t}] \\ &= \mathrm{Var}[r_{i,t}] + \mathrm{Var}[\eta_{j,t}] + 2 \cdot (\beta_{j,i} - 1) \cdot \mathrm{Var}[r_{i,t}] \end{align*}

However, the same sampling trick means that the expression for the value weighted average variance over all stocks within each industry will be \beta-independent:

(14)   \begin{align*} \sum_{j \in J[i]} \left( \frac{w_{j,t}}{w_{i,t}} \right) \cdot \mathrm{Var}[r_{j,t}] &= \sum_{j \in J[i]} \left( \frac{w_{j,t}}{w_{i,t}} \right) \cdot \left\{ (2 \cdot \beta_{j,i} - 1) \cdot \mathrm{Var}[r_{i,t}] + \mathrm{Var}[\eta_{j,t}] \right\} \\  &= \mathrm{Var}[r_{i,t}] + \sum_{j \in J[i]} \left( \frac{w_{j,t}}{w_{i,t}} \right) \cdot \mathrm{Var}[\eta_{j,t}] \end{align*}

The interpretation of this equation is similar to the interpretation of the market-to-industry decomposition above. Putting both of these pieces together then gives the full decomposition as follows:

(15)   \begin{align*} \sum_{j \in J} w_{j,t} \cdot \mathrm{Var}[r_{j,t}] &= \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[r_{i,t}] + \sum_{j \in J} w_{j,t} \cdot \mathrm{Var}[\eta_{j,t}] \\ &= \mathrm{Var}[r_{m,t}] + \sum_{i \in I} w_{i,t} \cdot \mathrm{Var}[\epsilon_{i,t}] + \sum_{j \in J} w_{j,t} \cdot \mathrm{Var}[\eta_{j,t}] \\ &= \sigma_{\mathrm{Mkt},t}^2 + \sigma_{\mathrm{Ind},t}^2 + \sigma_{\mathrm{Firm},t}^2  \end{align*}

3. Trends in Volatility

I now replicate these 3 variance measures from Campbell, Lettau, Malkiel, and Xu (2001) using daily and monthly CRSP data on NYSE, AMEX, and NASDAQ stocks over the sample period from July 1962 to December 1997. I restrict the data to include only common stocks with share prices above \mathdollar 1. The riskless rate corresponds to the 30 day T-Bill rate. First, to compute the empirical analogue of the market variance component in Equation (15), I compute the mean and variance of daily excess returns on the value-weighted market:

(16)   \begin{align*} \hat{\mu}_{\mathrm{Mkt},t} &= \frac{1}{S} \cdot \sum_{s = 1}^S r_{m,t-s} \\ \hat{\sigma}_{\mathrm{Mkt},t}^2 &= \frac{1}{S} \cdot \sum_{s = 1}^S \left( r_{m,t-s} - \hat{\mu}_{\mathrm{Mkt},t} \right)^2 \end{align*}

where S denotes the number of days in month t. I plot the annualized market component of the variance of daily firm returns in the figure below.

Annualized market-wide variance.

Annualized variance within each month of the daily value weighted market return in excess of the 30 day T-bill rate for the period July 1962 to December 1997.
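
Here is a minimal pandas sketch of this calculation (it is not the code linked in the introduction); the input file name, the column names, and the 252-trading-day annualization convention are assumptions on my part.

```python
import pandas as pd

# daily: one row per trading day with (hypothetical) columns
#   'date'    -- trading date
#   'mkt_ret' -- value-weighted market return in excess of the 30 day T-bill rate
daily = pd.read_csv("market_daily.csv", parse_dates=["date"])  # hypothetical input file
daily["month"] = daily["date"].dt.to_period("M")

# Equation (16): within-month mean and variance of daily excess market returns
sigma2_mkt = (
    daily.groupby("month")["mkt_ret"]
    .agg(lambda r: ((r - r.mean()) ** 2).mean())  # (1/S) * sum of squared deviations
)

# Annualize by a conventional 252 trading days (an assumption; the paper's exact
# annualization convention may differ)
sigma2_mkt_annualized = 252 * sigma2_mkt
```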

Next, I compute the industry-specific contribution to the variance of a typical firm’s daily excess returns using the Fama and French (1997) industry classification codes. There are 48 industries in this classification system, and I code all stocks without a specified industry as their own group leading to 49 total industries. Clicking on the image below gives a plot of the number of firms in each of these industries.

Industry size distribution.

Click to embiggen. Number of firms in each industry from July 1962 to December 2012.

The industry-specific contribution to the variance in daily excess returns of a typical firm is then given by:

(17)   \begin{align*} \hat{\sigma}_{\mathrm{Ind},t}^2 &= \sum_{i \in I} w_{i,t} \cdot \left\{ \frac{1}{S} \cdot \sum_{s = 1}^S \left( r_{i,t-s} - r_{m,t-s} \right)^2 \right\} \end{align*}

I plot the resulting time series in the figure below.

Annualized industry-specific variance.

Annualized variance within each month of daily value weighted industry returns in excess of the value weighted market return for the period from July 1962 to December 1997.

Finally, I compute the idiosyncratic contribution to the variance of the daily excess returns of a typical firm as follows:

(18)   \begin{align*} \hat{\sigma}_{\mathrm{Firm},t}^2 &= \sum_{j \in J} w_{j,t} \cdot \left\{ \frac{1}{S} \cdot \sum_{s = 1}^S \left( r_{j,t-s} - r_{i,t-s} \right)^2 \right\} \end{align*}

This is the time series I plotted in the introduction. Empirically, it seems that the overwhelming majority of the daily variation in firm-level excess returns comes from idiosyncratic shocks. One way to quantify this statement is to look at the “model” fit each month:

(19)   \begin{align*} 1 - \mathrm{Err}_t &= \frac{\hat{\sigma}_{\mathrm{Mkt},t}^2 + \hat{\sigma}_{\mathrm{Ind},t}^2}{\hat{\sigma}_{\mathrm{Mkt},t}^2 + \hat{\sigma}_{\mathrm{Ind},t}^2 + \hat{\sigma}_{\mathrm{Firm},t}^2} \end{align*}

where the model corresponds to a market model with industry factors. The (1 - \mathrm{Err}_t) term captures the fraction of the daily variation in firm-level excess returns that is explained by the market and industry factors in each month and is consistently below 0.30. i.e., more than 70{\scriptstyle \%} of the daily variation comes from firm-specific shocks!

Model fit.

Fraction of the daily variation in firm-level excess returns that is explained by value-weighted market and industry factors in each month from July 1962 to December 1997.
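
Along the same lines, here is a minimal pandas sketch of Equations (17), (18), and (19); the input file and column names are assumptions, and, as a simplification of the monthly value weights w_{i,t} and w_{j,t}, I weight each stock by its average market capitalization within the month.

```python
import pandas as pd

# panel: one row per stock-day with (hypothetical) columns
#   'date', 'permno', 'industry', 'ret' (daily excess return), 'mcap' (market cap)
panel = pd.read_csv("firm_daily.csv", parse_dates=["date"])  # hypothetical input file
panel["month"] = panel["date"].dt.to_period("M")

def vw_mean(df, col):
    """Value-weighted average of `col` using market caps as weights."""
    return (df[col] * df["mcap"]).sum() / df["mcap"].sum()

# Daily value-weighted market and industry returns
mkt = panel.groupby("date").apply(vw_mean, "ret").rename("mkt_ret")
ind = panel.groupby(["date", "industry"]).apply(vw_mean, "ret").rename("ind_ret")
panel = panel.join(mkt, on="date").join(ind, on=["date", "industry"])

# Squared deviations that enter Equations (17) and (18)
panel["eps2"] = (panel["ind_ret"] - panel["mkt_ret"]) ** 2  # industry relative to market
panel["eta2"] = (panel["ret"] - panel["ind_ret"]) ** 2      # firm relative to industry

def monthly_component(df, col):
    """Within-month average of `col`, value weighted across stocks (and so industries)."""
    by_stock = df.groupby("permno").agg(x=(col, "mean"), w=("mcap", "mean"))
    return (by_stock["x"] * by_stock["w"]).sum() / by_stock["w"].sum()

sigma2_ind = panel.groupby("month").apply(monthly_component, "eps2")   # Equation (17)
sigma2_firm = panel.groupby("month").apply(monthly_component, "eta2")  # Equation (18)

# Equation (16) again, and the model fit in Equation (19)
daily_mkt = panel.groupby(["month", "date"])["mkt_ret"].first()
sigma2_mkt = daily_mkt.groupby(level="month").agg(lambda r: ((r - r.mean()) ** 2).mean())
one_minus_err = (sigma2_mkt + sigma2_ind) / (sigma2_mkt + sigma2_ind + sigma2_firm)
```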

4. Discussion

There are a couple of interesting takeaways from this analysis. First, the nature of the 3 variance time series changes dramatically after December 1997 when the original sample period in Campbell, Lettau, Malkiel, and Xu (2001) ends. Specifically, the post-1997 period is dominated by a pair of volatility spikes which are not firm-specific: the dot-com boom and the financial crisis. When compared to these events, the slow run up in the firm-specific variance component looks relatively minor.

Updated sample.

Top Panel: Annualized variance within each month of the daily value weighted market return in excess of the 30 day T-bill rate for the period from July 1962 to December 2012. Middle Panel: Annualized variance within each month of daily value weighted industry returns in excess of the value weighted market return for the period from July 1962 to December 2012. Bottom Panel: Annualized variance within each month of daily firm returns relative to the firm’s industry’s value weighted return for the period from July 1962 to December 2012.

Nevertheless, even with these large macroeconomic shocks, the majority of the variation in daily firm-level excess returns is driven by firm-specific information. e.g., even during this later period the model fit, (1 - \mathrm{Err}_t), scarcely crosses the 40{\scriptstyle \%} threshold in spite of all of the systemic risk in the market! What’s more, there is substantial cyclicality in the model fit at roughly the 5{\scriptstyle \mathrm{yr}} horizon. i.e., once every 5{\scriptstyle \mathrm{yr}} the predictive power of the value weighted market and industry factors grows and then shrinks by roughly 10{\scriptstyle \%}, or around 1/2-to-1/3 of its baseline. One way to interpret this finding is that there should be 5{\scriptstyle \mathrm{yr}} cycles in the profitability of technical analysis using market-wide factors.

Updated sample.

Fraction of the daily variation in firm-level excess returns that is explained by value-weighted market and industry factors in each month from July 1962 to December 2012.


Effective Financial Theories

April 26, 2013 by Alex

1. Introduction

One of the most astonishing things about financial markets is that there is interesting economics operating at so many different scales. Yet, no one would ever guess this fact by looking at standard asset pricing theory. To illustrate, take a look at the canonical Euler equation:

(1)   \begin{align*} p_{n,t} &= \mathrm{E}_t \left[ m_{t+1} \cdot \left(p_{n,t+1} + d_{n,t+1}\right) \right] \end{align*}

Here, p_{n,t} and d_{n,t} denote the ex-dividend price and dividend payout of the nth asset in the economy at time t, m_{t+1} denotes the prevailing stochastic discount factor, and \mathrm{E}_t(\cdot) denotes the conditional expectations operator given time t information. Equation (1) says that the price of the nth asset in the current period, t, is equal to the expected discounted value of the asset’s price and dividend payout in the following period, (t+1). At first glance this formulation seems perfectly sensible, but a closer look reveals two striking features:

  1. Time is dimensionless. i.e., Equation (1) is written in sequence time not wall clock time. Each period could equally well represent a millisecond, an hour, a year, a millennium, or anything in between. We usually think of the stochastic discount factor, m_{t+1}, as a function of traders’ utility from aggregate consumption. Thus, as Cochrane (2001) points out, if “stocks go up between 12:00 and 1:00, it must be because (on average) we all decided to have a big lunch…. this seems silly.”
  2. The total number of stocks doesn’t show up anywhere in Equation (1). Not only do traders have to know when there is a profitable arbitrage opportunity somewhere out there in the market, they also have to find out exactly where this opportunity is and deploy the necessary funds and expertise to exploit it. Where’s Waldo? puzzles are hard for a reason. Identifying and trading into arbitrage opportunities is a fundamentally different activity when searching through 10000 rather than 10 predictors. More is different. This is the key insight highlighted in Chinco (2012).

In this post, I start by writing down a simple statistical model of returns in Section 2 which allows for shocks at different time horizons and across asset groupings of various sizes. Then, in Sections 3 and 4, I show how shocks at vastly different scales are difficult for traders to spot (…let alone act on). Such shocks can look like noise to “distant” traders in a mathematically precise sense. In Section 5, I conclude with a discussion of these observations. The key takeaway is that financial theories do not necessarily need to be globally applicable to make effective local predictions. e.g., a theory governing the optimal behavior of a high frequency trader may not have any testable predictions at the quarterly investment horizon where institutional investors operate.

2. Statistical Model

I start by writing down a statistical model of returns that allows for shocks at different time scales and across asset groupings of different sizes. e.g., Apple’s stock returns might be simultaneously affected by not only bid-ask bounce at the 100{\scriptstyle \mathrm{ms}} investment horizon but also momentum at the 1{\scriptstyle \mathrm{mo}} investment horizon. Alternatively, at the 1{\scriptstyle \mathrm{qtr}} horizon, Apple might realize both an earnings announcement shock as well as a national economic shock felt by all US firms.

Let \hbar denote the smallest investment horizon, so that all other time scales are indexed by an integer A_h = 1,2,3,\ldots:

(2)   \begin{align*}   h &= A_h \cdot \hbar \end{align*}

For concreteness, you might think about \hbar = (\mathrm{something}) \times 10^{-3}{\scriptstyle \mathrm{sec}} in modern asset markets. Thus, for a monthly investment horizon A_{\mathrm{month}} = (\mathrm{something}) \times 10^9, meaning that asset market investment horizons span somewhere between 9 and 11 orders of magnitude from high frequency traders to buy and hold value investors. This is similar to the ratio of the height of a human to the diameter of the sun.

Click to embiggen. Source: Delphix.

Let r_n(t,h) denote the log price change of the nth stock from time t through time (t + h):

(3)   \begin{align*} r_n(t,h) &= \log p_n(t+h) - \log p_n(t) = \sum_{q=1}^Q \delta_q(t,h) \cdot x_{n,q} + \epsilon_n(t,h) \end{align*}

where x_{n,q} \in \{0,1\} denotes whether or not stock n has attribute q, \delta_q(t,h) denotes the mean growth rate in the price of all stocks with attribute q from time t through time (t+h), and \epsilon_n(t,h) denotes idiosyncratic noise in stock n‘s percent return from time t through time (t+h). e.g., suppose that the mean growth rate of all technology stocks from January 1st, 1999 through the end of January 31st, 1999 was 120{\scriptstyle \%/\mathrm{yr}} or 10{\scriptstyle \%/\mathrm{mo}}. Then, I would write that:

(4)   \begin{align*}  \delta_{\mathrm{technology}}(\mathrm{Jan}1999,1{\scriptstyle \mathrm{mo}}) &= 0.10 \end{align*}

and Intel, Inc. would realize a 10{\scriptstyle \%/\mathrm{mo}} boost in its January 1999 returns since:

(5)   \begin{align*} x_{\mathrm{INTL},\mathrm{technology}} = 1 \end{align*}

The price shocks, \delta_q(t,h), take on the form:

(6)   \begin{align*} \delta_q(t,h) &= \sum_{a=0}^{A_h-1} \delta_q(t + a \cdot \hbar,\hbar) \quad \text{with} \quad \delta_q(t,\hbar) =  \begin{cases} s_q &\text{w/ prob} \quad \frac{1}{2} \cdot \left( 1 - e^{- f_q \cdot \hbar} \right) \\ 0 &\text{w/ prob} \quad e^{- f_q \cdot \hbar} \\ - s_q &\text{w/ prob} \quad \frac{1}{2} \cdot \left( 1 - e^{- f_q \cdot \hbar} \right) \end{cases} \end{align*}

The summation captures the idea that all shocks occur in a particular instant and then cumulate over time. e.g., there is a particular time interval, \hbar, during which a news release hits the wire or a market order flashes across the screen. Changes over time intervals longer than \hbar reflect the accumulation of changes across these tiny time intervals. The parameters s_q and f_q control the size and frequency of the qth shock. Each attribute’s size parameter has units of percent per \hbar, and the bigger the s_q the bigger the impact of the qth shock on the returns of all stocks with that attribute. Each attribute’s frequency parameter has units of shocks per \hbar, and the bigger the f_q the more often all stocks with attribute q realize a shock of size s_q. The idiosyncratic return noise is the summation of Gaussian shocks at each \hbar interval:

(7)   \begin{align*} \epsilon_n(t,h) &= \sum_{a=0}^{A_h-1} \epsilon_n(t + a \cdot \hbar,\hbar) \quad \text{with} \quad \epsilon_n(t,\hbar) \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}\left( 0, \sigma_u \cdot \sqrt{\hbar}\right) \end{align*}
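
To make this data-generating process concrete, here is a minimal NumPy sketch that draws the \hbar-level pieces of Equations (6) and (7) and cumulates them up to a longer horizon h = A_h \cdot \hbar for a single attribute and a single stock; every parameter value below is a placeholder of my own choosing.

```python
import numpy as np

rng = np.random.default_rng(0)

hbar = 1.0                  # smallest time interval (placeholder units)
A_h = 10_000                # number of hbar-intervals in the longer horizon h
s_q, f_q = 0.001, 0.0005    # size and frequency of the attribute-q shock (placeholders)
sigma_eps = 0.0002          # scale of the idiosyncratic Gaussian noise (placeholder)

# Equation (6): at each hbar-interval the attribute-q shock is +s_q, 0, or -s_q
p_jump = 1.0 - np.exp(-f_q * hbar)
signs = rng.choice([1.0, 0.0, -1.0], size=A_h, p=[p_jump / 2, 1.0 - p_jump, p_jump / 2])
delta_q_hbar = s_q * signs

# Equation (7): idiosyncratic Gaussian noise at each hbar-interval
eps_hbar = rng.normal(0.0, sigma_eps * np.sqrt(hbar), size=A_h)

# Cumulating the hbar-level pieces gives the horizon-h shock, noise, and return
delta_q_h = delta_q_hbar.sum()
eps_h = eps_hbar.sum()
r_h = delta_q_h + eps_h     # Equation (3) for a stock with x_{n,q} = 1 and one attribute
```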

3. Time Series

Very different financial theories can operate at vastly different time scales. e.g., attributes that are relevant at the millisecond time horizon will completely wash out by the monthly horizon and vice versa. In this section, I look at only the time series properties of one stock, so I suppress the n subscript and write Equation (3) as:

(8)   \begin{align*} r(t,h) &= \sum_{a = 0}^{A_h-1} r(t+a \cdot \hbar,\hbar) = \sum_{q=1}^Q \delta_q(t,h) \cdot x_q + \epsilon(t,h) \end{align*}

To see why, consider the problem of a value investor, Alice, operating at the monthly investment horizon. Suppose that she wants to know whether or not her arch nemesis Bill, a high frequency trader operating at the millisecond investment horizon, is actively trading in her asset. e.g., suppose that she is worried that Bill might have found some really clever new predictor that flits in and out of existence before she can take advantage of it. From Alice’s point of view, the random variable \delta_q(t,\hbar) has the unconditional distribution:

(9)   \begin{align*} \begin{split} \mathrm{E}\left[ \delta_q(t,\hbar) \right] &= 0 \\ \mathrm{E}\left[ \delta_q(t,\hbar)^2 \right] &= \left( 1 - e^{- f_q \cdot \hbar} \right) \cdot s_q^2 = \sigma_q^2 \\ \mathrm{E}\left[ \left| \delta_q(t,\hbar) \right|^3 \right] &= \left( 1 - e^{- f_q \cdot \hbar} \right) \cdot s_q^3 = \rho_q \end{split} \end{align*}

Let F_{A_h}(x) denote the cumulative distribution function of \delta_q(t,h)/(\sigma_q \cdot \sqrt{A_h}). i.e., F_{A_h}(x) governs the distribution of the standardized sum of the \hbar-level shocks that Bill sees over the course of one of Alice’s periods. Then, via the Berry-Esseen theorem we have that at the monthly investment horizon:

(10)   \begin{align*} \left| F_{A_h}(x) - \Phi(x) \right| &\leq \frac{0.7655 \cdot \rho_q}{\sigma_q^3 \cdot \sqrt{A_h}} = \frac{1}{\sqrt{A_h}} \cdot \left( \frac{0.7655}{\sqrt{1 - e^{- f_q \cdot \hbar}}} \right) = (\mathrm{something}) \times 10^{-5}  \end{align*}

Equation (10) says that the maximum vertical distance between the CDF of the standardized monthly sum of a variable fluctuating at the \hbar time scale and the CDF of the standard normal distribution is less than one part in one hundred thousand.

Berry-Esseen CDF

Click to embiggen. This image shows the distance between the cumulative distribution functions of the standard normal distribution, \Phi(x), and the empirical distribution, F_{A_h}(x), as computed above.

There are a couple of ways to put this figure in perspective. First, note that trading strategies have to generate well above 0.5{\scriptstyle \%} abnormal returns per month in order to outpace trading costs. Second, note that Alice would need around 10^{10}{\scriptstyle \mathrm{mo}} of data to distinguish between a variable drawn from the standard normal distribution and F_{A_h}(x) at this level of granularity via the Kolmogorov–Smirnov test. Thus, Bill’s behavior at the \hbar investment horizon is effectively noise to Alice when looking only at monthly data. In order to figure out what Bill is doing, she has to stoop down to his investment horizon.
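
Here is a minimal numeric check of these orders of magnitude; the values of A_h and f_q \cdot \hbar below are placeholders chosen to match the rough magnitudes in the text rather than calibrated estimates.

```python
import numpy as np

A_h = 2.6e9     # roughly the number of milliseconds in a month (placeholder)
f_hbar = 1.0    # f_q * hbar, i.e., about one shock per hbar-interval (placeholder)

# Equation (10): Berry-Esseen bound on the distance between F_{A_h}(x) and Phi(x)
bound = 0.7655 / (np.sqrt(A_h) * np.sqrt(1.0 - np.exp(-f_hbar)))
print(f"max CDF distance <= {bound:.1e}")         # on the order of 1e-5

# Sample size at which a Kolmogorov-Smirnov test can resolve a gap of this size,
# using the asymptotic 5% critical value 1.36 / sqrt(n)
n_needed = (1.36 / bound) ** 2
print(f"months of data needed ~ {n_needed:.1e}")  # on the order of 1e9 to 1e10
```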

4. Cross Section

In the same way that different financial theories can operate at different time scales, different financial theories can also operate at vastly different levels of aggregation. On one hand, this statement is a bit obvious. After all, modern financial theory is built on the idea of risk minimization through portfolio diversification, and traders talk about strategies being “market neutral”. On the other hand, diversification is not the only force at work. Financial markets have many assets and traders use a vast number of predictors. What’s more, only a few of these predictors are useful at any point in time. As Warren Buffett says, “If you want to shoot rare, fast-moving elephants, you should always carry a loaded gun.” Pulling the trigger is easy. Finding the elephant is hard. Traders face a difficult search problem when trying to parse new shocks.

Suppose that Alice is a value investor specializing in oil and gas stocks and now wants to figure out where her other arch nemesis, Charlie, is trading in her market. Even if she knows that he is trading at roughly her investment horizon, it may still be hard for her to spot his price impact due to the vast number of possible strategies that he could be employing. In this section I study the 1{\scriptstyle \mathrm{mo}} returns of N stocks with Q=7 attributes:

(11)   \begin{align*} r_n &= \sum_{q=1}^7 \delta_q \cdot x_{n,q} + \epsilon_n \end{align*}

where I suppress all the time horizon arguments since I am concerned with the cross-section. For simplicity, suppose that Alice knows that Charlie is making a bet on only 1 of the 7 attributes so that:

(12)   \begin{align*} 1 &= \Vert {\boldsymbol \delta} \Vert_{\ell_0} = \sum_{q=1}^7 1_{\{\delta_q \neq 0\}} \end{align*}

where if \delta_q \neq 0, then \delta_q = s \gg \sigma_\epsilon for all q =1,2,\ldots,7. e.g., Alice is worried that Charlie’s spotted the one way of sorting all oil and gas stocks so that all the stocks with that attribute (e.g., operations in the Chilean Andes) have high returns and all of the stocks without the attribute have low returns. How many stocks does Alice have to follow in order for her to spot the sorting rule—i.e., the non-zero entry in ({\boldsymbol \delta})_{7 \times 1}?

It turns out that Alice only needs to examine 3 stocks so long as she gets to pick exactly which ones:

  1. Stock 1: Has attributes 1, 3, 5, 7
  2. Stock 2: Has attributes 2, 3, 6, 7
  3. Stock 3: Has attributes 4, 5, 6, 7

The fact that Alice can identify the correct attribute even though she has fewer observations than possible attributes, Q \gg N, is known as compressive sensing and was introduced by Candes and Tao (2005) and Donoho (2006). See Terry Tao’s blog post for an excellent introduction. For example, suppose that only the first stock had high returns of r_1 \approx s:

(13)   \begin{align*} \underbrace{\begin{bmatrix} s \\ 0 \\ 0 \end{bmatrix}}_{(\mathbf{r})_{3 \times 1}} &\approx \underbrace{\begin{bmatrix}  1 & 0 & 1 & 0 & 1 & 0 & 1  \\  0 & 1 & 1 & 0 & 0 & 1 & 1  \\  0 & 0 & 0 & 1 & 1 & 1 & 1  \end{bmatrix}}_{(\mathbf{X})_{3 \times 7}} \underbrace{\begin{bmatrix}  s \\ 0 \\ \vdots \\ 0  \end{bmatrix}}_{({\boldsymbol \delta})_{7 \times 1}} + \underbrace{\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{bmatrix}}_{({\boldsymbol \epsilon})_{3 \times 1}} \end{align*}

then Alice can be sure that Charlie has been sorting using the first of the 7 stock attributes. The interesting part is that Alice can’t identify Charlie’s strategy using fewer than N = 3 stocks since:

(14)   \begin{align*} 7 = 2^3 - 1 \end{align*}

e.g., 3 stocks gives Alice just enough combinations to answer 7 yes or no questions.
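
Here is a minimal numerical version of this deterministic example; the shock size, the noise level, and the choice of attribute 1 as the active one are placeholders. Each attribute q = 1, 2, \ldots, 7 is assigned the 3-bit binary code of q, so the pattern of which stocks move reads off the active attribute directly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Design from the text: stock n holds attribute q iff bit n of q's binary code is 1
X = np.array([[(q >> n) & 1 for q in range(1, 8)] for n in range(3)], dtype=float)
# X[0] -> attributes 1,3,5,7; X[1] -> attributes 2,3,6,7; X[2] -> attributes 4,5,6,7

s = 1.0                     # size of the attribute-specific shock (placeholder)
delta = np.zeros(7)
delta[0] = s                # Charlie bets on attribute 1 (hypothetical)

r = X @ delta + rng.normal(0.0, 0.05, size=3)   # Equation (13) with a little noise

# Which stocks moved? That 3-bit pattern is the binary code of the active attribute.
moved = (np.abs(r) > s / 2).astype(int)
q_hat = sum(bit << n for n, bit in enumerate(moved))
print(f"recovered attribute: {q_hat}")          # prints 1
```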

What’s more, this result generalizes to the case where the data matrix, \mathbf{X}, is stochastic rather than deterministic. i.e., in real life Alice can’t decide how many oil and gas stocks with each attribute are traded each period in order to make it easiest to decipher Charlie’s trading strategy. Donoho and Tanner (2009) show that in a world where \mathbf{X} is a random matrix with Gaussian entries, x_{n,q} \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,1/N), there is a maximum number of relevant attributes, K^*, that Alice can hope to recover: it is impossible for her to spot K > K^* relevant attributes from among Q possibilities using only N stocks, where K^* is given by:

(15)   \begin{align*} N &= 2 \cdot K^* \cdot \log(Q/N) \cdot (1 + \mathrm{o}(1)) \end{align*}

This threshold is summarized in the figure below, replicated from Donoho and Stodden (2006). The x-axis runs from 0 to 1 and gives values for N/Q, summarizing the relative amount of data available to Alice. The y-axis also runs from 0 to 1 and gives values for K/N, summarizing the level of sparsity in the model. The underlying model is:

(16)   \begin{align*} r_n = \mathbf{x}_n {\boldsymbol \delta} + \epsilon_n \end{align*}

where \epsilon_n \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0, 1/20), {\boldsymbol \delta} is zero everywhere except for K entries which are 1, and each x_{n,q} \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,1/\sqrt{Q}) with columns normalized to unit length. The forward stepwise regression procedure enters variables into the model sequentially, at each step adding the single regressor with the highest t-statistic, and stops once the largest remaining t-statistic falls below the \sqrt{2 \cdot \log Q} threshold (i.e., the Bonferroni threshold), which is roughly 3.25 when Q = 200. The K^* threshold given by Donoho and Tanner (2009) then corresponds to the white diagonal line cutting through the phase space above which the forward stepwise regression procedure fails and below which it succeeds.
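
For concreteness, here is a minimal sketch of a forward stepwise procedure of this kind; Q = 200 and the noise variance of 1/20 follow the description above, while N, K, and the random seed are placeholders of my own.

```python
import numpy as np

rng = np.random.default_rng(2)

Q, N, K = 200, 100, 10                  # attributes, stocks, active attributes (placeholders)
X = rng.normal(0.0, 1.0, size=(N, Q))
X /= np.linalg.norm(X, axis=0)          # normalize columns to unit length
delta = np.zeros(Q)
delta[rng.choice(Q, size=K, replace=False)] = 1.0
r = X @ delta + rng.normal(0.0, np.sqrt(1.0 / 20.0), size=N)

threshold = np.sqrt(2.0 * np.log(Q))    # Bonferroni-style cutoff, about 3.25 for Q = 200

def t_stat(y, Z):
    """t-statistic on the last column of Z in an OLS regression of y on Z."""
    beta, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return beta[-1] / np.sqrt(cov[-1, -1])

selected = []
while True:
    candidates = [q for q in range(Q) if q not in selected]
    t_vals = {q: t_stat(r, X[:, selected + [q]]) for q in candidates}
    q_best = max(t_vals, key=lambda q: abs(t_vals[q]))
    if abs(t_vals[q_best]) < threshold:
        break
    selected.append(q_best)

print(sorted(selected))                 # compare with np.flatnonzero(delta)
```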

Click to embiggen.

Click to embiggen. This figure shows the average prediction error \Vert {\boldsymbol \delta} - \hat{\boldsymbol \delta} \Vert_{\ell_2}^2/\Vert {\boldsymbol \delta} \Vert_{\ell_2}^2 from the forward stepwise regression procedure described above.

The interesting part about this result is that this bound on K^* comes from a deep theorem in high-dimensional geometry which relates compressive sensing and error-correcting codes, as suggested by the deterministic example above. It is not due to any nitty-gritty details of Alice’s search problem. Notice how the original bound in the Q=7 and N=3 example has an information theoretic interpretation! Thus, Charlie can hide behind the sheer number of possible explanations in the cross section in the same way that Bill can hide behind the sheer number of observations in the time series.

5. Discussion

The speed at which traders interact has greatly increased over the past decade. e.g., Spread Networks invested approximately \mathdollar 300{\scriptstyle \mathrm{mil}} in a new fiber optic cable linking New York and Chicago via the straightest possible route, saving about 100 miles and shaving 6{\scriptstyle ms} off their delay. Table 5 in Pagnotta and Philippon (2012) documents the many investments in speed made by exchanges around the world. What’s more, trading behavior at this time scale seems to be decoupled from asset fundamentals. i.e., it’s unlikely that a stock’s value truly follows any of the patterns found in one of Nanex’s crop-circle-of-the-day plots. Motivated by events such as the flash crash, there has been a great deal of discussion in recent years about the impact of high frequency trading on asset prices and welfare.

However, the rough calculations above suggest that traders with a monthly investment horizon might not even care about second-to-second fluctuations in asset prices. e.g., think of how high and low frequency bands of the same radio wave can carry rock and classical music to your FM radio receiver without interfering with one another. High frequency trading may be revealing nothing about the fundamental value of the companies in the market place, but just because these traders make short-run returns behave strangely doesn’t mean that they will ruin the market for institutional investors trading at a longer horizon. In this light, perhaps the canonical Euler equation needs to have some additional input parameters, N, Q and h:

(17)   \begin{align*} p_n(t) &= \mathrm{E}_t\left[ m_{N,Q}(t,h) \cdot \left\{ p_n(t+h) + d_n(t+h) \right\} \right] \end{align*}

which define the range over which the theory is effective?


Origins of Macroeconomic Fluctuations

January 10, 2013 by Alex

1. Introduction

Where do macroeconomic fluctuations come from? Does common variation in firms’ output necessarily come from a single source? In this post, I work through a model which suggests that productivity “factors” might be the result of weak interactions between lots of otherwise independent production decisions. I start with a real world example which has little to do with macroeconomics, then develop the model with this concrete example in mind, and finally conclude by relating this work back to the original questions.

Example (Millennium Bridge):
On June 10th 2000 the Millennium footbridge over the River Thames in London opened to thousands of excited visitors. While engineers carefully designed the bridge to support the total mass of people and survive high wind speeds, the bridge nevertheless started vibrating uncontrollably. The source of this vibration was clear. As shown in the video below at the 37 second mark, all of the people on the bridge locked step. Of course, the visitors didn’t do this on purpose, so how did this happen? How did many random footsteps turn into a coherent left-right-left-right march?

Here is the rough story:

  1. Oscillators: A person’s center of mass drifts back and forth from right to left as she walks.
  2. Scale: There were so many people that a small group was really likely to lock step by pure chance.
  3. Weak Interaction: Each step makes the bridge wobble slightly and forces others to compensate by shifting their weight.
  4. Positive Feedback: People respond to an unexpected bridge shift by moving their center of mass in the opposite direction.

With that many people on the bridge, it was really likely that a handful of people standing right next to each other would happen to lock step for a few strides by pure chance. When this happened, the entire bridge shifted a little bit from right to left. This effect was at first imperceptible. However, because the bridge moved a little bit to the left, the people around this initial cluster shifted their weight a bit to the right in the following instant to compensate. Thus, this initial imperceptible shift got amplified until the entire bridge was vibrating violently and everyone on the bridge was marching in lock step.

Strogatz (2005) shows how to study this phenomenon more generally as a group of weakly interacting oscillators using the Kuramoto Model. In Section 2, I begin by laying out the basic mathematical framework. Then, in Section 3, I solve this model up to a constant parameter. In Section 4, I then show how to solve for this constant. Finally, in Section 5, I conclude by discussing some potential applications of this idea to macrofinance.

2. Framework

How could I model this idea? In this section, I lay out the mathematical framework for the Kuramoto model which captures these 4 key elements. Consider a world with N oscillators, and let \Theta_n(t) denote the phase of the n^{th} oscillator at time t in units of radians. For instance, when people walk they shift their weight slightly from left to right and then back again. Thus, the side to side movement in a person’s center of mass while they walk can be thought of as a simple oscillator. \Theta_n(t) = 0 might then correspond to person n leaning all the way to the left while \Theta_n(t) = \pi might correspond to person n leaning all the way to the right. The time derivative of oscillator n‘s phase, d \Theta_n(t)/dt = \theta_n(t), is known as its frequency and has units of radians per second. I use \theta_n^* to denote the natural frequency at which the phase of the n^{th} oscillator changes in units of radians per second. For example, \theta_n^* corresponds to the rate at which person n would shift her weight from side to side in the absence of any other walkers.

I then look only at the following functional form for the interactions of each oscillator:

(1)   \begin{align*} \theta_n(t) &= \theta_n^* + \sum_{n'=1}^N f\left(\Theta_{n'}(t) - \Theta_n(t) \right) \\ &= \theta_n^* + \frac{K}{N} \cdot \sum_{n'=1}^N \sin\left(\Theta_{n'}(t) - \Theta_n(t) \right) \end{align*}

This formulation means that the frequency of the n^{th} oscillator at time t equals its natural frequency, \theta_n^*, plus a constant K \geq 0 which has units of radians per second times the average of the sine of the distances between the phase of oscillator n and the phase of every other oscillator. Thus, if most of the other (N-1) people on the bridge just got done taking their left step when your right foot hits, then you will slow down your walking speed. Conversely, if most of the other (N-1) people on the bridge just picked up their right foot when yours hits the ground, then you will increase your walking speed. The coupling constant, K, then determines exactly how much you are going to change your walking speed in each of these instances. I assume each oscillator’s natural frequency is drawn from a symmetric unimodal distribution, \theta_n^* \overset{\scriptscriptstyle \mathrm{iid}}{\sim} g(\theta^*), centered around \mu_{\theta^*}:

(2)   \begin{align*} g(\mu_{\theta^*} + \theta^*) &= g(\mu_{\theta^*} -\theta^*) \end{align*}

In summary, this section proposes a simple model which captures the 4 key components of the introductory story. The key state variables, \Theta_n(t), represent the phases of N oscillators. The rates at which each of the oscillators changes its phase are weakly interacting and governed by a coupling constant, K. The sinusoidal functional form implies a positive feedback effect. “Solving” the model means determining if there are values of the coupling constant K such that, for any collection of N oscillators with natural frequencies selected independently and identically from the distribution g(\theta^*), all of the oscillators will lump together and take on similar frequencies. i.e., if there are a bunch of people with different baseline walking speeds on a bridge, will they all end up locking step? This problem is hard because the phase of every single oscillator is pulling on the frequency of every other oscillator at the same time. It is not obvious that a coherent lump will emerge.

3. Solution Strategy

Is this model tractable? In this section, I now show how to “solve” this model up to the determination of a threshold value of the coupling constant K = \underline{K}. i.e., I show that if the bridge is sufficiently wobbly, all of the people on the bridge will lock step; whereas, if it is stiffer than this threshold value, no synchronization will occur. I leave the task of solving for this threshold value to the next section.

The trick is to look for a change of variables that simplifies the problem. I do this by introducing the order parameter Z(t):

(3)   \begin{align*} Z(t) &= \frac{1}{N} \cdot \sum_{n=1}^N e^{i \cdot \Theta_n(t)} = R(t) \cdot e^{i \cdot \Psi(t)} \end{align*}

What is this equation saying? \Theta_n(t) is a scalar phase. In the first line, the expression e^{i \cdot \Theta_n(t)} maps this value onto the circumference of the unit circle. Thus, the order parameter Z(t) is the average of these N unit vectors. i.e., Z(t) captures the average amount that each person on the bridge has shifted their weight. In the second line, R(t) denotes the phase coherence and \Psi(t) denotes the average phase—i.e., Z(t)‘s polar coordinates. Where does the name “phase coherence” come from? Intuitively, if all of the N unit vectors are pointing in the same direction, then R(t) = 1; conversely, if each of the N unit vectors is uniformly spaced around the circle, then R(t) = 0. Thus, for instance, at the 37 second mark in the video, everyone on the bridge is leaning in the same direction so that R(37) \approx 1. \Psi(t) then denotes the direction which people are leaning towards. By embedding the phases of each of the N oscillators on a unit circle, I can characterize their average phase with a pair of order statistics.


Graphical depiction of order statistics R and \Psi summarizing the average phase of the N oscillators.

I now show how to simplify the problem by using these order statistics and the sinusoidal functional form of the link between each pair of oscillators’ phases. First, I multiply both sides of Equation (3) by \exp \left\{ - i \cdot \Theta_n(t) \right\} to give:

(4)   \begin{align*} R(t) \cdot e^{i \cdot (\Psi(t) - \Theta_n(t) )} &= \frac{1}{N} \cdot \sum_{n'=1}^N e^{i \cdot \left\{ \Theta_{n'}(t) - \Theta_n(t) \right\}} \end{align*}

Then, I apply Euler’s formula to rewrite the expression:

(5)   \begin{align*} R(t) \cdot &\left\{ \cos\left(\Psi(t) - \Theta_n(t) \right) + i \cdot \sin\left(\Psi(t) - \Theta_n(t) \right) \right\}  \\ &= \frac{1}{N} \cdot \sum_{n'=1}^N \left\{ \cos\left(\Theta_{n'}(t) - \Theta_n(t)\right) + i \cdot \sin\left(\Theta_{n'}(t) - \Theta_n(t) \right) \right\} \end{align*}

Equating the imaginary components then yields:

(6)   \begin{align*} R(t) \cdot \sin\left( \Psi(t) - \Theta_n(t) \right) &= \frac{1}{N} \cdot \sum_{n'=1}^N \sin\left(\Theta_{n'}(t) - \Theta_n(t)\right) \end{align*}

Notice that the left hand side depends only on the 2 order statistics, R(t) and \Psi(t), as well as the phase of oscillator n. Importantly, it does not depend on the phase of any of the other (N-1) oscillators. Thus, I can substitute this simplified form into Equation (1) to get:

(7)   \begin{align*} \theta_n(t) &= \theta_n^* + K \cdot R(t) \cdot \sin\left(\Psi(t) - \Theta_n(t)\right) \end{align*}

This new equation says that rather than depending on the phase of every single other oscillator, the frequency of oscillator n depends on its natural frequency, its current phase, and order statistics characterizing the average phase of all of the other oscillators. In the language of the Millenium bridge example, this equation says that my walking speed is determined by my natural walking speed, my current weight distribution, and the average weight distribution of everyone else on the bridge. I don’t have to worry about whether Bob from Hertfordshire who is standing 100{\scriptstyle \mathrm{ft}} in front of me happens to be leaning a bit more to his left. All I care about is the average weight distribution for everyone on the bridge.

I then look for solutions where R(t)=R^* and \Psi(t)=\mu_{\theta^*} \cdot t. By choosing the right reference frame, I can then set \mu_{\theta^*} = 0 without loss of generality to get:

(8)   \begin{align*} \theta_n(t) &= \theta_n^* - K \cdot R^* \cdot \sin\left( \Theta_n(t) \right) \end{align*}

These 2 assumptions are necessary to solve the model in closed form, but are somewhat at odds with the introductory example. R(t) = R^* constant means that people on the bridge are not transitioning from incoherent steps (start of the video) to synchrony (the 37 second mark). Rather, their level of phase coherence is constant. \Psi(t) = \mu_{\theta^*} \cdot t means that their average weight distribution is evolving at the mean natural frequency. There will be some fast walkers who shift their weight quickly from left to right; there will be some slow walkers who shift their weight gradually from left to right; but, the average location of everyone’s weight must change according to the average natural walking speed. In this setting, the N oscillators can be split into 2 groups: locked and drifting. Oscillators in the first group are phase locked and rotate with the average phase, \Psi(t) = \mu_{\theta^*} \cdot t. Oscillators in the second group, on the other hand, drift around the unit circle in a non-uniform way.

(9)   \begin{align*} \begin{cases} |\theta_n^*| \leq K \cdot R^* &\text{ locked} \\ |\theta_n^*| > K \cdot R^* &\text{ drifting} \end{cases} \end{align*}

The conditions above show why I refer to the constant K as the model’s coupling strength. As K increases, more and more of the oscillators will become locked and rotate around the circle at the mean natural frequency, \mu_{\theta^*}. i.e., as the bridge becomes more wobbly, more and more people will start to deviate from their natural walking speed and synchronize their steps with the rest of the people on the bridge.

I then look only for solutions where the drifting oscillators form a stationary distribution on the circle. This stationarity condition means that the probability that a fast walker is leaning left or right does not change over time. Let \rho(\Theta,\theta^*) \cdot d\Theta denote the fraction of oscillators with the natural frequency \theta^* that lie in the interval [\Theta,\Theta + d\Theta). Then, stationarity implies that:

(10)   \begin{align*} \rho(\Theta,\theta^*) &= \frac{C(\theta^*)}{\left\vert \theta^* - K \cdot R^* \cdot \sin(\Theta) \right\vert} \end{align*}


Click image to view animation. Simulation parameters: N = 100, K=\pi/4, \mu_{\theta^*} = \pi/4, and \sigma_{\theta^*}=\pi/8.
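
For reference, here is a minimal Euler discretization of the mean-field form of the model in Equation (7), using the same N, K, \mu_{\theta^*}, and \sigma_{\theta^*} as the animation; the time step, number of steps, and random seed are my own choices.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 100
K = np.pi / 4
mu, sigma = np.pi / 4, np.pi / 8          # mean and std of the natural frequencies
dt, T = 0.05, 4000                        # Euler step size and number of steps (placeholders)

theta_star = rng.normal(mu, sigma, size=N)      # natural frequencies drawn from g(theta*)
Theta = rng.uniform(0.0, 2.0 * np.pi, size=N)   # initial phases scattered around the circle

R_path = np.empty(T)
for t in range(T):
    Z = np.exp(1j * Theta).mean()               # order parameter, Equation (3)
    R, Psi = np.abs(Z), np.angle(Z)
    R_path[t] = R
    # Equation (7): each oscillator responds only to the order statistics (R, Psi)
    freq = theta_star + K * R * np.sin(Psi - Theta)
    Theta = (Theta + freq * dt) % (2.0 * np.pi)

print(f"long-run phase coherence R* ~ {R_path[-500:].mean():.2f}")
```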

In summary, this section solves the model up to the determination of the coupling constant K. I first propose a pair of order parameters, R(t) and \Psi(t), and then show that the sinusoidal link between the frequencies of each pair of oscillators means that each oscillator’s frequency only depends on these 2 order statistics. The only free parameter that I have not used yet is the coupling constant, K. I tie my hands and propose that any solution must be such that both order parameters are constants, R(t) = R^* and \Psi(t) = 0, and also that the distribution of the drifting oscillators’ phases, \rho(\Theta,\theta^*), must be stationary.

4. Coupling Threshold

In this section, I ask the question: “Does there exist a K which solves the model above subject to these 3 constraints?” I do this in 3 steps. First, I solve for the stationary density. Then, I enforce the conditions R(t) = R^* and \Psi(t) = 0. Finally, I look for satisfactory solutions of K.

Step 1: What does \rho(\Theta,\theta^*) look like? C(\theta^*) is pinned down by the fact that all oscillators have to lie somewhere on the circle:

(11)   \begin{align*} 1 &= \int_{-\pi}^{\pi} \rho(\Theta,\theta^*) \cdot d\Theta = \int_{-\pi}^{\pi} \left( \frac{C(\theta^*)}{\left\vert \theta^* - K \cdot R^* \cdot \sin \Theta \right\vert} \right) \cdot d\Theta = \frac{(2 \cdot \pi) \cdot C(\theta^*)}{\sqrt{\left(\theta^*\right)^2 - \left( K \cdot R^*\right)^2}} \end{align*}

where \left(\theta^*\right)^2 > \left( K \cdot R^*\right)^2 by definition for all of the drifting oscillators. This constraint basically says that everyone who is walking on the bridge has to be leaning in some direction. It may be right; it may be left; it may be right down the center; but, it is somewhere. C(\theta^*) is then given by:

(12)   \begin{align*} C(\theta^*) &= \frac{1}{2 \cdot \pi} \cdot \sqrt{\left(\theta^*\right)^2 - \left( K \cdot R^*\right)^2} \end{align*}

Step 2: What are the consequences of imposing R(t) = R^* and \Psi(t) = 0? To answer this question, I return to Equation (3):

(13)   \begin{align*} R(t) \cdot e^{i \cdot \Psi(t)} &= \frac{1}{N} \cdot \sum_{n=1}^N e^{i \cdot \Theta_n(t)} = \frac{1}{N} \cdot \sum_{n \in \mathbf{N}_{\mathrm{Lock}}} e^{i \cdot \Theta_n(t)} + \frac{1}{N} \cdot \sum_{n \in \mathbf{N}_{\mathrm{Drift}}} e^{i \cdot \Theta_n(t)} \end{align*}

Since \Psi(t) = 0, then in order to have R(t) = R^* be constant it must be the case that:

(14)   \begin{align*} R^* &= \frac{1}{N} \cdot \sum_{n \in \mathbf{N}_{\mathrm{Lock}}} e^{i \cdot \Theta_n(t)} + \frac{1}{N} \cdot \sum_{n \in \mathbf{N}_{\mathrm{Drift}}} e^{i \cdot \Theta_n(t)} \end{align*}

So, in order for R^* to be constant, either both of these summations have to be constant, or their changes have to exactly offset each other as time marches forward. Let’s examine each of these summations in turn. Looking first at the locked state, I have that for all n such that |\theta_n^*| \leq K \cdot R^*:

(15)   \begin{align*} \sin (\Theta_n(t)) &= \frac{\theta_n^*}{K \cdot R^*} \end{align*}

Thus for all locked oscillators \Theta_n(t) is an implicit function of \theta_n^* rather than t. Since the distribution of locked phases is symmetric about 0 by assumption, I have that:

(16)   \begin{align*} \frac{1}{N} \cdot \sum_{n \in \mathbf{N}_{\mathrm{Lock}}} e^{i \cdot \Theta_n(\theta^*)} &= \frac{1}{N} \cdot \sum_{n \in \mathbf{N}_{\mathrm{Lock}}}\left\{ \cos\left(\Theta_n(\theta^*)\right) + i \cdot \sin\left(\Theta_n(\theta^*)\right) \right\} \\ &= \frac{1}{N} \cdot \sum_{n \in \mathbf{N}_{\mathrm{Lock}}}\left\{ \cos\left(\Theta_n(\theta^*)\right) \right\} \\ &= \int_{-K \cdot R^*}^{K \cdot R^*} \cos(\Theta_n(\theta^*)) \cdot g(\theta^*) \cdot d\theta^* \\ &= K \cdot R^* \cdot \int_{-\pi/2}^{\pi/2} \cos(\Theta)^2 \cdot g(K \cdot R^* \cdot \sin(\Theta)) \cdot d\Theta \end{align*}

Looking at the drifting state, I have that:

(17)   \begin{align*} \frac{1}{N} \cdot \sum_{n \in \mathbf{N}_{\mathrm{Drift}}} e^{i \cdot \Theta_n(t)} &= \int_{-\pi}^{\pi} \int_{|\theta^*| > K \cdot R^*} e^{i \cdot \Theta} \cdot \rho(\Theta,\theta^*) \cdot g(\theta^*) \cdot d\theta^* \cdot d\Theta \end{align*}

However, this integral has to vanish since g(\theta^*) = g(-\theta^*) and \rho(\Theta + \pi,-\theta^*) = \rho(\Theta,\theta^*). Thus, in order for R(t) = R^* to be a constant, I have to have that:

(18)   \begin{align*} R^* &= K \cdot R^* \cdot \int_{-\pi/2}^{\pi/2} \cos(\Theta)^2 \cdot g(K \cdot R^* \cdot \sin(\Theta)) \cdot d\Theta \end{align*}

Step 3: So, what values of K solve this equation? An obvious solution is that R^*=0. This solution corresponds to the completely incoherent state with \rho(\Theta,\theta^*) = 1/(2 \cdot \pi). A second set of solutions comes from the solution to the equation:

(19)   \begin{align*} 1 &= K \cdot \int_{-\pi/2}^{\pi/2} \cos(\Theta)^2 \cdot g(K \cdot R^* \cdot \sin(\Theta)) \cdot d\Theta \end{align*}

This equation bifurcates continuously from R^* = 0 at a value K = \underline{K} obtained by letting R^* \searrow 0. Thus, there is a critical value, \underline{K}, such that for all K \geq \underline{K} the N drifting oscillators lump together. I can compute \underline{K} as follows:

(20)   \begin{align*} \underline{K} &= \frac{2}{\pi \cdot g(0)} \end{align*}
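
For example (a worked case of my own, not from the original analysis): if the natural frequencies are Gaussian with standard deviation \sigma_{\theta^*}, then g(0) = 1/(\sigma_{\theta^*} \cdot \sqrt{2 \cdot \pi}), so that:

\begin{align*} \underline{K} &= \frac{2}{\pi \cdot g(0)} = \sqrt{\frac{8}{\pi}} \cdot \sigma_{\theta^*} \approx 1.60 \cdot \sigma_{\theta^*} \end{align*}

With the \sigma_{\theta^*} = \pi/8 used in the simulation above, this gives \underline{K} \approx 0.63, which sits below the simulated coupling constant K = \pi/4 \approx 0.79, so those parameters lie on the synchronized side of the bifurcation.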

In other words, when the bridge is sufficiently stiff there is only one outcome: incoherence. People with randomly selected natural walking speeds will place their steps at random. However, at some critical level of “wobbliness” there suddenly exists a second possibility. As a result of the weak interaction of each person’s leaning from left to right as they walk, people on the bridge might synchronize their steps. An aggregate behavior might emerge from seemingly independent private decisions.


Relationship between phase coherence, R^*, and the coupling constant, K, which contains a bifurcation point at \underline{K}.

5. Conclusion

What does any of this have to do with macroeconomics? Instead of N people leaning left to right on a bridge as they walk, think about N firms making lumpy investment decisions between 2 complementary inputs to their production technology. For instance, think about a firm hiring people (labor) and purchasing equipment (capital). Because these are complementary inputs, firms want to always have them in roughly equal proportions. Thus, they oscillate between increasing their stock of labor and capital.

First, think about a world where the firms’ decisions to invest are completely independent. Suppose that on average 10{\scriptstyle \%} of all firms increase either their labor or their capital stock in any given period. If each addition raises the firm’s output, y_{n,t}, by a fraction \delta, then growth in aggregate output, Y_t, is given by:

(21)   \begin{align*} \frac{\Delta Y_{t+1}}{Y_t} &= \frac{1}{Y_t} \cdot \sum_{n=1}^N \Delta y_{n,t+1} = \sum_{n=1}^N \frac{\delta \cdot \varepsilon_{n,t+1} \cdot y_{n,t}}{Y_t} \end{align*}

where \varepsilon_{n,t+1} is an indicator variable equal to 1 when firm n adds either labor or capital in period t+1. Thus, the standard deviation of output growth would be:

(22)   \begin{align*} \sigma_{\mathrm{GDP}} &= \left( \mathrm{Var}\left[ \frac{\Delta Y_{t+1}}{Y_t} \right] \right)^{1/2} = \frac{3 \cdot \delta}{10 \cdot \sqrt{N}} \end{align*}
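
To see where the 3 \cdot \delta/(10 \cdot \sqrt{N}) term comes from (my own arithmetic, under the added assumption that all N firms are roughly the same size so that y_{n,t}/Y_t \approx 1/N): each \delta \cdot \varepsilon_{n,t+1} is an independent scaled Bernoulli shock with success probability 1/10, so

\begin{align*} \mathrm{Var}\left[ \frac{\Delta Y_{t+1}}{Y_t} \right] &\approx \frac{1}{N^2} \cdot \sum_{n=1}^N \mathrm{Var}\left[ \delta \cdot \varepsilon_{n,t+1} \right] = \frac{\delta^2}{N} \cdot \frac{1}{10} \cdot \frac{9}{10} = \frac{1}{N} \cdot \left( \frac{3 \cdot \delta}{10} \right)^2 \end{align*}

and taking the square root gives Equation (22).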

Thus, if firm volatility is on the order of 0.30 \cdot \delta = 12{\scriptstyle \%} per year and there are 10^6 firms in the economy, the aggregate volatility of the economy should be on the order of 0.012{\scriptstyle \%} per year as discussed in Gabaix (2011).

With this benchmark in mind, now think about a different world where each firm’s investment decision’s weakly interact. If I am running a gas station and you are running a convenience store across the street, I will be quicker to add an extra pump to my station in the months after you add a few more aisles to your store. After all, this means that more people will be parked out front of your store, and each of these cars needs gas to run. This weak interaction will exist even if I don’t know about your extra capacity. I don’t need to understand exactly why more drivers are stopping by my station.

In this world, suppose that a fraction \phi of all firms are phase locked to make investments at the same time. Then, the standard deviation of output growth would be:

(23)   \begin{align*} \sigma_{\mathrm{GDP}} &\approx 0.30 \cdot \delta \cdot \sqrt{\phi} \end{align*}

where the approximation holds as N grows large. The idea is that, as the number of firms grows large, the dominant component of aggregate volatility will come from common investment decisions made by the \phi \cdot N phase locked firms. These decisions will happen in 1 out of every 10 periods. Nevertheless, none of these phase locked firms need to be actively trying to coordinate investment decisions in the exact same way that none of the people on the Millennium Bridge was trying to march in step.


Summary: Trading on Coincidences

December 24, 2012 by Alex

1. Motivating Example

This post gives a non-technical summary of the results in my job market paper, Trading on Coincidences (2012). I start with a simple example. Suppose you see Apple among the 10 stocks with the highest returns over the past quarter from October to December. Should you bother checking the tech industry more closely for a shock to fundamentals? Well, probably not. After all, there are always going to be 10 stocks in the list of 10 stocks with the highest returns. Investigating the tech industry would mean investigating 9 or 10 industries every single trading period.[1] What’s more, Apple’s extreme performance might not have anything to do with its industry. Perhaps a California-specific shock was the culprit? So, if you are going to investigate the tech industry, you are going to end up analyzing every single attribute of every single stock in the top 10 returns. That’s a lot of work!

However, if you see both Apple and Microsoft in the top 10 returns, this calculus changes. Now, you might actually want to investigate the tech industry. Coincidences like this one—i.e., 2 or more stocks with the same attribute in the top or bottom 10 returns—are much less likely to occur by pure chance.[2] In addition, using 2 firms allows you to narrow the list of possible explanations. For instance, you can rule out any California-specific explanation since Microsoft is headquartered in Washington. Pushing this logic further, if Apple, Microsoft, and Research in Motion all realize top 10 returns, then you should definitely look for a deeper explanation. Multi-stock coincidences like this are very rare, and these 3 companies only share a handful of common attributes.[3]
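The footnotes below quantify how rare these coincidences are. Here is a minimal simulation sketch of that back-of-the-envelope logic under the stylized assumption that the 10 highest-return stocks draw their industry labels uniformly at random from 50 equally likely industries; the footnotes count coincidences in either the top-10 or the bottom-10 list, so their probabilities sit above this single-list calculation.

```python
import numpy as np

# Minimal sketch: with 50 equally likely industries, how many distinct industries
# show up among the 10 highest-return stocks, and how often does some industry place
# 2 or more (or 3 or more) stocks in that single list by pure chance?
rng = np.random.default_rng(0)
n_industries, n_top, n_sims = 50, 10, 100_000

draws = rng.integers(n_industries, size=(n_sims, n_top))   # industry label of each top-10 stock
distinct = np.array([len(set(row)) for row in draws])
max_count = np.array([np.bincount(row).max() for row in draws])

print(f"mean distinct industries in the top 10:   {distinct.mean():.2f}")   # roughly 9.15
print(f"P(2-way coincidence in one top-10 list):  {(max_count >= 2).mean():.2f}")
print(f"P(3-way coincidence in one top-10 list):  {(max_count >= 3).mean():.3f}")
```

With these assumptions the first number comes out at roughly 9.15, matching footnote 1.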

2. Outline of Results

First, I start from the assumption that traders face an attention allocation problem. I build a model with N stocks that realize attribute-specific cash flow shocks. Traders can’t sort all N stocks over and over again on each of the H characteristics (e.g., industry, customer, country, accounting firm, etc…) to check for these attribute-specific shocks by brute force.[4] They must use an attention allocation heuristic. Second, I then propose one such heuristic: trading on coincidences. i.e., traders only update their beliefs about an attribute after observing a coincidence. I characterize the information content of a coincidence in this model and solve for asset prices. Third, I show that if traders use this heuristic, then stock returns will display post-coincidence comovement. i.e., regardless of when the attribute-specific shock to fundamentals occurs, asset prices will only respond after traders observe a coincidence. Fourth, I take this prediction to the data. I find that post-coincidence comovement at the industry level generates an 11{\scriptstyle \%/\mathrm{yr}} excess return that is not explained by any of the canonical factor models, other boundedly rational stories, or well known behavioral biases. Fifth and finally, I use computational complexity to give a physical basis for the scarcity of attention.

Roadmap:

  1. Assume traders face an attention allocation problem and build a consistent model.
  2. Propose that traders use coincidences to solve this problem.
  3. Show that returns will display post-coincidence comovement as a result.
  4. Give empirical evidence of post-coincidence comovement suggesting that traders:
    1. Use coincidences to direct their attention, and
    2. Face an attention allocation problem with first order asset pricing implications.
  5. Give an interpretation of scarce attention using computational bounds.

3. Detailed Description

I now discuss the nuts and bolts of each of these 5 steps.

3.1. Model

First, I propose that traders face an attention allocation problem. Here is how I model this problem:

Assets: I consider a discrete time, infinite horizon economy. There are N stocks and each stock has H different attributes. For example, Apple is headquartered in Cupertino, is in the technology industry, is a major customer of Gorilla Glass, etc… Stocks realize persistent, rare, attribute-specific cash flow shocks. For example, if an innovation suddenly makes Gorilla Glass less expensive to manufacture in October, then all tablet making tech firms will realize a positive cash flow shock starting in October and lasting for several months (e.g., October, November, December, etc…). Firms pay out all of their earnings as dividends. Thus, the payout space is a giant N \times H matrix with all the attributes of all the stocks.

Agents: Traders are risk neutral and have priors about whether or not any particular attribute has realized a cash flow shock. I refer to these priors as a “mental model.” Because there are so many different ways to sort and group stocks, I assume it is computationally infeasible for traders to check every single attribute-specific cluster of firms.[5][6] Instead, they must use some attention allocation heuristic.[7]
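Here is a minimal sketch of this payout structure; all of the dimensions and parameter values below are made-up placeholders rather than the paper’s calibration.

```python
import numpy as np

# Minimal sketch of the payout structure: N stocks, H binary attributes, and rare,
# persistent, attribute-specific cash flow shocks. Every parameter is an illustrative
# placeholder, not the paper's calibration.
rng = np.random.default_rng(0)
N, H, T = 500, 40, 120                               # stocks, attributes, months
p_shock, shock_size, persistence = 0.01, 0.02, 0.90

member = (rng.random((N, H)) < 0.10).astype(float)   # which stocks carry which attributes
shock = np.zeros((T, H))                             # attribute-level cash flow shocks
for t in range(1, T):
    arrivals = shock_size * (rng.random(H) < p_shock)     # rare new shocks...
    shock[t] = persistence * shock[t - 1] + arrivals      # ...that decay only slowly

noise = 0.05 * rng.standard_normal((T, N))           # firm-specific cash flow noise
cash_flows = shock @ member.T + noise                # T x N panel of dividend payouts
```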

3.2. Heuristic

Second, as one way to solve this problem, I show that traders can use coincidences as an attention allocation device.[8][9] If agents trade on coincidences, then they only update their mental model about attribute-specific cash flows after they observe 2 or more stocks with that attribute among the 10 firms with the highest or lowest past returns. For instance, if the tech industry realizes a positive cash flow shock in October, most traders won’t immediately notice this change. There are too many possible stories explaining market returns for traders to investigate them all. Even though most traders haven’t noticed this industry-specific cash flow shock, all tech stocks will have slightly higher returns in October, November, December, etc… because of higher dividend payouts or because a few constrained specialists have started to incorporate this information into prices. However, most traders will only update their beliefs about the tech industry in December when Apple and Microsoft realize top 10 returns and attract their attention. I solve for asset prices and give asymptotic expressions for the amount of information contained in coincidences.
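Here is a minimal sketch of the coincidence-detection step itself; the function name, the inputs (a cross-section of past returns and a stock-to-attribute table), and the top/bottom-10 convention applied to a single cross-section are illustrative assumptions rather than the paper’s implementation.

```python
import numpy as np
from collections import Counter

def detect_coincidences(returns, attributes, k=10, threshold=2):
    """Minimal sketch: flag attributes with `threshold` or more stocks among the
    `k` highest or `k` lowest past returns. `returns` is a 1-D array over stocks and
    `attributes[i]` is the set of attributes of stock i (hypothetical inputs)."""
    order = np.argsort(returns)
    flagged = {}
    for label, idx in (("top", order[-k:]), ("bottom", order[:k])):
        counts = Counter(a for i in idx for a in attributes[i])
        flagged[label] = {a for a, c in counts.items() if c >= threshold}
    return flagged

# Toy usage: 6 stocks, one "tech" coincidence at the top and one "auto" at the bottom.
rets = np.array([0.30, 0.25, -0.10, 0.02, -0.20, 0.01])
attrs = [{"tech", "CA"}, {"tech", "WA"}, {"auto"}, {"retail"}, {"auto"}, {"energy"}]
print(detect_coincidences(rets, attrs, k=3))
```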

3.3. Prediction

Third, I show that if traders use this heuristic, then stock returns will display post-coincidence comovement. For example, suppose again that the tech industry realizes a positive cash flow shock in October. Because there are so many things that could possibly affect stock returns at any one instant, most traders won’t immediately notice this event. The bomb has burst, but the blast wave hasn’t arrived yet. If people trade on coincidence, they will only notice this shock after Apple and Microsoft earn top 10 returns in December. Thus, the prices of all tech stocks will rise in January after this coincidence—i.e., stock returns will display post-coincidence comovement. Crucially, this prediction holds “pointwise” across all characteristics that traders consider. For instance, suppose that traders look for both industry-specific coincidences and country-specific coincidences. Evidence of post-coincidence comovement at either the industry level or the country level is evidence that people are trading on coincidences.

3.4. Empirics

Fourth, I find that post-coincidence comovement at the industry level generates an 11{\scriptstyle \%/\mathrm{yr}} excess return:

Is It Tradable? Suppose that Apple and Microsoft both earn top 10 returns in December while Ford, GM, and Toyota all realize bottom 10 returns in December. I show that a trading strategy that is long all tech stocks except for Apple and Microsoft in January and short all auto stocks except for Ford, GM, and Toyota in January generates an 11{\scriptstyle \%/\mathrm{yr}} excess return with an annualized Sharpe ratio of 0.6. The excess returns to this trading strategy are not explained by industry momentum or other canonical factor models.[10] Post-coincidence comovement is a necessary but not a sufficient explanation for the trading strategy returns. In the theoretical model, traders should update their beliefs about the tech industry in the first nanosecond of January. Thus, there would be no scope for trading profits. Any trading profits are due to unmodeled trading frictions or limits to arbitrage.
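Here is a minimal sketch of the corresponding portfolio-formation step; the equal weighting on each side and the toy ticker/attribute table are assumptions for illustration, not the paper’s exact construction.

```python
def coincidence_portfolio(stocks, attributes, top_hits, bottom_hits, long_attr, short_attr):
    """Minimal sketch: go long every stock carrying `long_attr` except the coincidence
    stocks themselves (`top_hits`), and short every stock carrying `short_attr` except
    `bottom_hits`. Equal weights on each side are an illustrative assumption."""
    longs = [s for s in stocks if long_attr in attributes[s] and s not in top_hits]
    shorts = [s for s in stocks if short_attr in attributes[s] and s not in bottom_hits]
    weights = {s: 1 / len(longs) for s in longs}
    weights.update({s: -1 / len(shorts) for s in shorts})
    return weights

# Toy usage: after AAPL/MSFT land in the December top 10 and F/GM/TM in the bottom 10,
# hold the rest of the tech stocks long and the rest of the auto stocks short in January.
attributes = {"AAPL": {"tech"}, "MSFT": {"tech"}, "IBM": {"tech"}, "ORCL": {"tech"},
              "F": {"auto"}, "GM": {"auto"}, "TM": {"auto"}, "HMC": {"auto"}}
print(coincidence_portfolio(attributes, attributes,
                            top_hits={"AAPL", "MSFT"}, bottom_hits={"F", "GM", "TM"},
                            long_attr="tech", short_attr="auto"))
```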

Is It Measurable? In response to this concern, I also use a panel regression specification. Specifically, I look at the returns to IBM in January after Apple and Microsoft realized top 10 returns and then also in, say, August after Oracle and Cisco realized bottom 10 returns. I then compute the average difference between these two numbers across all firms after taking out firm-specific and month-specific fixed effects. Again, I find a spread of 11{\scriptstyle \%/\mathrm{yr}} which is on the same order of magnitude as the trading strategy \alpha. This specification yields three additional results. First, I show that the size of this spread is largest after fresh coincidences. For instance, Apple and Microsoft might earn top 10 returns in December, January, and February. I find that all of the post-coincidence comovement occurs in January in the month immediately after the first coincidence in December. This evidence is consistent with the idea that traders update their mental model after a coincidence attracts their attention. Next, I show that the size of this post-coincidence spread is increasing in the size of the coincidence. For instance, Apple and Microsoft might end up in the top 10 returns in January by pure chance. However, if you see Apple, Microsoft, Research in Motion, Intel, and Cisco all in the top 10, something must have happened to the tech industry. Thus, coincidences involving more firms from an attribute are better signals and should yield a larger price reaction. Finally, I show that the cumulative abnormal returns following a coincidence are persistent out to 12 months. Perhaps traders see Apple and Microsoft in the top 10, think to themselves: “Oh, wow! The tech industry must be doing fantastic.”, and then bid up the price of all tech stocks way too high. If this were the case, then the cumulative abnormal returns following a coincidence should spike and then revert back. This is not what happens.
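Here is a minimal sketch of one way to operationalize this fixed-effects comparison on a synthetic panel; the variable names, the planted spread, and the use of statsmodels are illustrative assumptions, not the paper’s code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Minimal sketch: regress a stock's return on the sign of the coincidence its industry
# just experienced (+1 after a top-10 coincidence, -1 after a bottom-10 one, 0 otherwise)
# with firm and month fixed effects. The panel is synthetic, with a planted spread.
rng = np.random.default_rng(0)
n_firms, n_months = 50, 60
firm = np.repeat(np.arange(n_firms), n_months)
month = np.tile(np.arange(n_months), n_firms)
sign = rng.choice([-1, 0, 1], size=firm.size, p=[0.05, 0.90, 0.05])
ret = 0.005 * sign + 0.05 * rng.standard_normal(firm.size)   # planted half-spread of 0.5%/mo

panel = pd.DataFrame({"ret": ret, "sign": sign, "firm": firm, "month": month})
fit = smf.ols("ret ~ sign + C(firm) + C(month)", data=panel).fit()
print(fit.params["sign"])   # the post-top minus post-bottom spread is twice this coefficient
```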

3.5. Complexity

Finally, I give a physical interpretation for why traders’ attention is so scarce using tools from computational complexity. Suppose that you want to check every attribute-specific cluster of firms for a cash flow shock. I show that with 7000 stocks, this brute force search strategy would take over 22{\scriptstyle \mathrm{days}} to complete at 1{\scriptstyle \mathrm{MIPS}}. By contrast, I find that by trading on coincidences people can dramatically reduce their time costs and still uncover a large fraction of all attribute-specific cash flow shocks. For instance, using the same parameters, I find that trading on coincidences requires less than 1{\scriptstyle \mathrm{min}} of processing time. There are good reasons to be uneasy about the absolute level of the 22{\scriptstyle \mathrm{days}} estimate. Nevertheless, this estimate does suggest that traders’ attention allocation problem is non-trivial. After all, there are only 21 trading days in a month on average. What’s more, there is a gap of several orders of magnitude between the time cost of the brute force inference strategy and that of trading on coincidences.
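As a rough unit-conversion check on these figures (taking the quoted 22-day and 1-minute numbers as given rather than re-deriving them):

```python
# Taking the quoted figures as given: 22 days of processing at 1 MIPS versus under
# 1 minute for the coincidence heuristic, expressed in raw instruction counts.
MIPS = 1_000_000                          # instructions per second at 1 MIPS
brute_force = 22 * 24 * 3600 * MIPS       # about 1.9e12 instructions
coincidences = 1 * 60 * MIPS              # about 6.0e7 instructions
print(f"brute force:  {brute_force:.1e} instructions")
print(f"coincidence:  {coincidences:.1e} instructions")
print(f"ratio:        {brute_force / coincidences:,.0f}x")   # tens of thousands of times cheaper
```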

4. Key Implications

Let me now briefly outline 3 interesting takeaways from this paper. First, this paper highlights a completely new and empirically relevant layer to traders’ inference problem—i.e., how do traders direct their attention? After all, traders sit in front of 4+ computer monitors for a reason. Finding the right inference problem to solve is hard. For another example, Warren Buffett justified Berkshire Hathaway’s cash holdings in his 1987 Annual Letter to shareholders by writing: “Our basic principle is that if you want to shoot rare, fast-moving elephants, you should always carry a loaded gun.” The lesson is clear: Pulling the trigger is easy. Finding the elephant is hard.

Second, this paper suggests that the innate pattern recognition skills that make people good doctors, lawyers, and engineers from 9-to-5 can be used to uncover subtle changes in the market. Pattern recognition skills are hardcoded and universal, and coincidences are just one particularly salient pattern. The machinery developed in this paper can be used to analyze traders’ reactions to streaks or regular cycles. For instance, traders might ask themselves: “What are the odds that gold futures prices would rise for 6 straight months by pure chance? Perhaps I should investigate this contract further?”

Finally, this paper poses the question: “How often should we see extreme price patterns?” Lots of other papers have looked for explanations for particular extreme events. For example, “How could we possibly rationalize the tech boom of the late 1990s?”, or “Why did house prices rise so much in Las Vegas in 2004 but not in Austin or Albuquerque?” By contrast, this paper takes an entirely different approach and asks: “How often should traders see some asset with an extreme price path?” For instance: “How unlikely is it that traders cycled their attention from biotech stocks to junk bonds back to dotcom stocks then to housing and most recently to gold futures over the course of a 30 year period?”

  1. When there are 50 industries, you should expect to observe 9.15 distinct industries when looking at 10 randomly selected stocks. ↩
  2. When there are 50 industries, you should expect to see a 2-way coincidence in some industry 9 out of every 10 periods by pure chance. This is by no means rare, but it is an order of magnitude less frequent than when looking at every single industry represented in the top 10 returns. ↩
  3. When there are 50 industries, you should expect to see a 3-way coincidence in some industry once every 20 periods by pure chance. ↩
  4. i.e., traders face an “unsupervised learning problem” (see Hastie, Tibshirani, and Friedman (2009, Ch. 14)). They don’t just have to solve a really hard but well-defined inference problem; rather, they also have to figure out which inference problem to analyze in the first place. Punchline: Searching for the right problem to solve requires cognitive/computational resources. ↩
  5. This difficulty exists regardless of how easy it is to solve the resulting inference problems. e.g., the Sunday New York Times crossword puzzle is hard because it is difficult to search through all the words you know to come up with reasonable solutions to each clue. It is actually really easy to verify that the solution posted on Monday is indeed correct. Punchline: Search is harder than verification. ↩
  6. In Gabaix (2012), traders only pay a cost for thinking about the impact of Peruvian copper discoveries on Apple’s dividend if they actively include this particular factor in their predictive model. By contrast, in the current paper, the initial step of considering the impact of Peruvian copper discoveries on Apple’s future dividends and then deciding not to include this factor in a predictive model comes with a cost. Traders can’t do this preprocessing step for every single obscure factor that might possibly affect Apple’s future dividends. They have to limit their attention to a manageable subset of factors. ↩
  7. Contrary to popular belief, computer chess programs don’t use brute force. This is infeasible. Instead, they mine a huge database of past chess games known as “the book.” This database is big, but nowhere near as large as the universe of all possible games. Human players actually have the advantage when games “go off book.” For instance, Garry Kasparov famously went off book early in his games against Deep Blue. Punchline: Computers are really fast, but search is really really hard. ↩
  8. This particular heuristic is motivated by anecdotal evidence. Open up any news site and you will find quotes like: “half of the top 10 performers on the Nikkei 225 this year are domestic-oriented.” —The Wall Street Journal. Japan Commands New Respect. Jun 15, 2004. ↩
  9. The trading on coincidences heuristic that I propose is not optimized in any sense. If a trader wanted to optimize this heuristic, he would surely have to consider engineering details like variation in the speed of computers, the length of the trading period, and the dimensions of the market. However, my goal is not optimality. I simply propose a plausible heuristic, derive a unique empirical prediction, and then ask the data whether or not people actually take this approach. ↩
  10. Why is this not the same as industry momentum? The timing is different. To illustrate, let’s think about this tech stock example and compare the post-coincidence comovement trading strategy to an industry momentum trading strategy that is long/short the industries with the highest/lowest returns over the last 6 months. Tech stocks realize a positive cash flow shock in October. This shock raises the mean return of all tech stocks slightly, but not enough to push the tech stocks to the top of the industry return rankings. People trading on coincidences only notice this shock in December when Apple and Microsoft earn top 10 returns. As a result, the price of all tech stocks jumps up in January. An industry momentum trading strategy is then going to be long the tech industry in February at the earliest. Post-coincidence comovement actually triggers inclusion in an industry momentum portfolio. ↩


The Law of Small Numbers

May 9, 2012 by Alex

1. Introduction

The “law of small numbers” is the name given by Tversky and Kahneman (1971) to the well documented empirical regularity that people tend to overinfer from small samples. This post discusses a few of the results from Rabin (2002), which applies the law of small numbers to the beliefs of stock market traders. This paper is particularly nice because it captures this behavioral bias and its many interesting implications using only a small tweak to a simple Bayesian learning problem.

This post contains two parts: First, in Section 2, I characterize the biased beliefs of a trader who is suffering from the law of small numbers. For brevity, I refer to this trader as Bob in the text below. Then, in Section 3, I show how returns in a market populated by Bobs would display excess volatility.

2. The Core Idea

First, I define our hero’s problem. Suppose that Bob watches a sequence of signals s_t \in \{a,b\} for t = \{1, 2,\ldots\}. Each period’s signal is an iid draw from a Bernoulli distribution with intensity \theta:

(1)   \begin{align*} \mathtt{Pr}[s_t=a] &= \theta, \qquad \theta \in [0,1] \end{align*}

There are a finite number of possible \theta‘s and Bob doesn’t know which \theta governs the stream of signals he observes. Let \Theta denote the set of all rates that could occur with positive probability, \pi(\theta) > 0, so that \sum_{\theta \in \Theta}\pi(\theta) = 1. Bob’s challenge is to infer which \theta is governing the string of signals he is observing.

Next, I define Bob’s inference strategy in light of his bias due to the law of small numbers. Suppose that he has the correct prior \pi(\theta) and updates via Bayes’ rule; however, he mistakenly believes that there is some positive integer N such that signals are drawn without replacement from an urn containing \theta \cdot N signals of s_t = a and (1 - \theta) \cdot N signals of s_t = b. Finally, so that the game does not end after N periods, Bob thinks that this urn is refilled every two draws. Thus, while odd and even draws are correlated, pairs of draws are iid.

In order for this inference strategy to be well defined, Bob must believe that there is some \theta \in \Theta such that the urn contains at least two a signals and at least two b signals at each point in time. Thus, there exists \theta \in \Theta such that:

(2)   \begin{align*} \min \left\{ \theta \cdot N, (1 - \theta) \cdot N \right\} &\geq 2 \end{align*}

implying that N \geq 4. Let \pi_t^N(h_t) represent Bob’s posterior beliefs about the probability of each \theta \in \Theta governing his string of signals after a history of signals h_t = \{s_1, s_2,\ldots,s_t\}, given that he is a type-N sufferer of the law of small numbers. As a clarifying example, note that \pi_t^\infty(h_t) represents the beliefs of a fully rational agent. In the text below, I will call this fully rational agent Alice for concreteness.

With his problem and inference strategy in place, I now prove two results characterizing Bob’s beliefs. I first compute Bob’s beliefs immediately after observing either an a or a b as the first signal of a pair (i.e., in an odd period):

Proposition: For all N, \pi and \theta:

(3)   \begin{align*} \pi_1^N(\theta|s_1=a) &= \frac{\theta \cdot \pi(\theta)}{\sum_{\theta' \in \Theta} \theta' \cdot \pi(\theta')} \\ \pi_1^N(\theta|s_1=b) &= \frac{(1 - \theta) \cdot \pi(\theta)}{\sum_{\theta' \in \Theta} (1 - \theta') \cdot \pi(\theta')} \end{align*}

so that both \pi_1^N(s_2 = a|s_1 = a) and \pi_1^N(s_2 = b|s_1=b) are increasing in N.

Proof: The expressions for \pi_1^N(\theta|s_1=a) and \pi_1^N(\theta|s_1=b) follow immediately from Bayes’ rule as, for example:

(4)   \begin{align*} \pi_1^N(\theta|s_1=a) &= \frac{\mathtt{Pr}[s_1 = a|\theta] \cdot \mathtt{Pr}[\theta]}{\mathtt{Pr}[s_1 = a]} \\ &= \frac{\theta \cdot \pi(\theta)}{\sum_{\theta' \in \Theta} \theta' \cdot \pi(\theta')} \end{align*}

The fact that \pi_1^N(s_2 = a|s_1 = a) is increasing in N follows from a clever rewriting using the law of total probability:

(5)   \begin{align*} \pi_1^N(s_2=a|s_1=a) &= \sum_{\theta \in \Theta} \pi_1^N(\theta|s_1=a) \cdot \pi_1^N(s_2=a|\theta,s_1=a) \\ &= \sum_{\theta \in \Theta} \pi_1^N(\theta|s_1 = a) \cdot \left( \frac{\theta \cdot N - 1}{N-1} \right) \end{align*}

\pi_1^N(s_2=a|\theta,s_1=a) = (\theta \cdot N - 1)/(N-1) follows from the fact that Bob believes the signals are drawn without replacement from an urn N signals deep from which one a signal has already been removed. Since \pi_1^N(\theta|s_1=a) is independent of N and (\theta \cdot N - 1)/(N-1) is increasing in N, \pi_1^N(s_2=a|s_1=a) is increasing in N as well. The result for \pi_1^N(s_2 = b|s_1 = b) follows from symmetry.

There are two interesting features of this result. First, note that Bob’s beliefs are identical to those of a fully Bayesian agent after the first signal. Second, because he believes that the signals are drawn from an urn without replacement, Bob underestimates the probability of drawing two a‘s in a row or two b‘s in a row, and the size of this underestimate decreases in the size of the urn, N.
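Here is a minimal numerical check of this result; the two candidate rates and the uniform prior over them are purely illustrative.

```python
from fractions import Fraction as F

# Minimal numerical check: Bob's predicted probability of a second a after a first a,
# pi_1^N(s_2 = a | s_1 = a), rises toward the fully rational (N = infinity) value as the
# urn size N grows. The prior over Theta = {1/2, 3/4} is purely illustrative.
prior = {F(1, 2): F(1, 2), F(3, 4): F(1, 2)}

def prob_second_a_given_first_a(N=None):
    posterior = {th: th * p for th, p in prior.items()}          # Bayes update after s_1 = a
    z = sum(posterior.values())
    if N is None:                                                # Alice: draws iid given theta
        return sum(th * post / z for th, post in posterior.items())
    return sum(F(int(th * N) - 1, N - 1) * post / z              # Bob: urn short one a signal
               for th, post in posterior.items())

for N in (4, 8, 16, 100, None):
    label = "infinity" if N is None else N
    print(label, float(prob_second_a_given_first_a(N)))
```

With these illustrative inputs, the predicted probability rises from about 0.53 at N = 4 toward the rational benchmark of 0.65, consistent with the proposition.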

Next, I characterize Bob’s posterior beliefs about two different \theta‘s given an extreme set of signals:

Proposition: Let h_t^a be a history of a signals and let h_t^b be a history of b signals. For all t > 1 and \theta, \theta' \in \Theta such that \theta > \theta', both \pi_t^N(\theta|h_t^a)/\pi_t^N(\theta'|h_t^a) and \pi_t^N(\theta'|h_t^b)/\pi_t^N(\theta|h_t^b) are strictly decreasing in N.

Proof: For even t, note that:

(6)   \begin{align*} \frac{\pi_t^N(\theta|h_t^a)}{\pi_t^N(\theta'|h_t^a)} &= \frac{\pi(\theta)}{\pi(\theta')} \cdot \left( \frac{\theta \cdot (\theta \cdot N - 1)}{\theta' \cdot (\theta' \cdot N - 1)} \right)^{\frac{t}{2}} \end{align*}

Thus, this ratio is decreasing in N if and only if \theta > \theta'. Extending the argument to odd values of t only changes the counting convention, and symmetry yields the same result for \pi_t^N(\theta'|h_t^b)/\pi_t^N(\theta|h_t^b).

This proposition implies that, following an extreme sequence of signals, Bob overinfers that he is facing an extreme rate. Intuitively, if Bob thinks that the signals are drawn from an urn without replacement, then he is too surprised when he sees extreme signals because once a signal of s_t = a has been drawn in an odd period he believes that same signal is less likely to be drawn again in the following even period.

3. Excess Volatility

I now apply this reasoning to the behavior of returns in a market populated by Bobs. First, I describe the assets. Consider a market with countably infinitely many stocks indexed by i \in \{1,2,\ldots\}. Each month, every stock realizes either a positive return, denoted by r_{i,t} = a, or a negative return, denoted by r_{i,t} = b, drawn iid from a Bernoulli distribution with parameter \theta_i \in [0,1]. Thus, conditional on \theta_i, positive returns for stock i today do not predict positive returns tomorrow or vice versa. Suppose that a fraction \phi(1/2) = 5/7 of the stocks have \theta_i = 1/2, a fraction \phi(0) = 1/7 of the stocks have \theta_i = 0, and the remaining fraction \phi(1) = 1/7 of the stocks have \theta_i = 1.

Next, I describe the trading strategy of the Bobs which I index with j \in \{1,2,\ldots\}. Let z_t^{(j)} denote the list of stocks not chosen by Bob j from month 0 up to but not including month t. Each Bob then adheres to the following trading strategy:

  1. At month t=0, Bob j picks one stock i at random and holds onto one share for the next four months, t \in \{1,2,3,4\}.
  2. Then, in month t=4, Bob j sells this share and picks a new stock at random from z_4^{(j)}. He buys a share and holds onto it for the next four months, t \in \{5,6,7,8\}.
  3. Then, in month t=8, Bob j sells this share and picks a new stock at random from z_8^{(j)}. He buys a share and holds onto it for the next four months, t \in \{9,10,11,12\}.
  4. And so on\ldots

Thus, via the law of large numbers, each stock will have the same number of Bobs holding it at each point in time, with exactly 1/4 of the Bobs exchanging the stock for another each period.

I now consider the average beliefs of traders in a market populated by Bobs who suffer from the law of small numbers. First, in the left two columns of the table below, I compute the probability that Bayesian traders and traders suffering from the law of small numbers assign to stock i‘s return parameter being \theta_i = 1 after observing different strings of returns. Then, in the right two columns of the same table, I compute the probability that these two types of traders assign to the next return being r_{i,t}=a given these previous return realizations.

Table: posterior probability that \theta_i = 1 and predicted probability that r_{i,t} = a after various return histories, for Bayesian traders (Alices) and traders suffering from the law of small numbers (Bobs).
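To make the comparison concrete, here is a minimal sketch of the kind of calculation behind this table; the urn size N = 4 (the smallest admissible value) is an illustrative assumption and need not match the value used in the original table.

```python
from fractions import Fraction as F

# Minimal sketch: posterior probability that theta_i = 1 after strings of positive
# returns, for Alice (fully Bayesian) and for Bob (law of small numbers, urn size N = 4).
prior = {F(0): F(1, 7), F(1, 2): F(5, 7), F(1): F(1, 7)}
N = 4

def likelihood(history, theta, N=None):
    """Pr[history | theta]. If N is given, use Bob's model: within each pair of draws the
    urn (theta * N a-signals, (1 - theta) * N b-signals) is sampled without replacement,
    and the urn is refilled after every second draw."""
    like = F(1)
    for k, s in enumerate(history):
        if N is None or k % 2 == 0:                    # Alice, or the first draw of a pair
            like *= theta if s == "a" else 1 - theta
        else:                                          # second draw of a pair
            n_a = int(theta * N)
            prev_a = history[k - 1] == "a"
            left = (n_a - prev_a) if s == "a" else (N - n_a - (not prev_a))
            like *= F(left, N - 1)
    return like

def posterior_theta_one(history, N=None):
    post = {th: p * likelihood(history, th, N) for th, p in prior.items()}
    return post[F(1)] / sum(post.values())

for h in ("a", "aa", "aaa", "aaaa"):
    print(h, float(posterior_theta_one(h)), float(posterior_theta_one(h, N)))
```

In this sketch Bob’s posterior exceeds Alice’s once he has seen two or more positive returns in a row, which is the overinference at work in the table above.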

Consistent with the second proposition in Section 2 above, note that the Bobs overestimate the probability that an asset’s returns are generated by the parameter \theta_i=1 following a string of positive returns. Next, in the table below, I conclude by computing the average belief about the probability that r_{i,t}=a among both Bayesian traders (i.e., Alices) and traders suffering from the law of small numbers (i.e., Bobs), averaging over the four groups of traders who have seen no signals, one signal, two signals, and three signals for asset i, respectively. Again, this table reveals that for extremely positive return histories, the Bobs overestimate the probability of \theta_i = 1 and thus of r_{i,t}=a; however, for more balanced histories the Bobs underestimate the probability that r_{i,t} = a relative to the Bayesian Alices.

Table: average belief that r_{i,t} = a among Alices and Bobs who have seen no signals, one signal, two signals, and three signals for asset i.

Thus, if all traders were Bobs, they would overreact to strings of positive returns and generate excess volatility.

