Hong, Stein, and Yu (2007)

1. Motivation

It’s absolutely essential that people ignore most contingencies when making predictions in everyday life. Dennett (1984) makes this point quite colorfully by asking: “How is it that I can get myself a midnight snack? I suspect there is some leftover sliced turkey and mayonnaise in the fridge, and bread in the breadbox… and a bottle of beer in the fridge as well… I forthwith put the plan into action and it works! Big deal.” The punchline of the story is that in order to put the plan into action, Dennett actually needs to ignore a great number of hypotheses: “that mayonnaise doesn’t dissolve knives on contact, that a slice of bread is smaller than Mount Everest, and that opening the refrigerator doesn’t cause a nuclear holocaust in the kitchen.” If he didn’t ignore all of these possibilities, he’d never be able to get anything done.

In this note, I work through the asset-pricing model in Hong, Stein, and Yu (2007), which posits that traders use an overly simple model of the world to make predictions about future payouts. The model predicts that there will be sudden shifts in asset prices when traders switch mental models, in the same way that there would be a sudden shift in your midnight snacking behavior if you started believing that an open refrigerator door led to armageddon. Thus, the authors refer to this setup as a model of simple forecasts and paradigm shifts.

2. Asset Structure

There is a single asset which pays out a dividend, $D_t$ , at each point in time $t = 0,1,2\ldots$ . This dividend payout is the sum of $3$ components: component $A$ , component $B$ , and noise. Thus, I can write the dividend payout as:

(1) $\begin{align*} D_t &= A_t + B_t + \sigma_D \cdot \varepsilon_{D,t} \end{align*}$

where $\varepsilon_{D,t} \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,1)$ . For simplicity, suppose that components $A$ and $B$ both follow $\mathrm{AR}(1)$ processes:

(2) $\begin{align*} A_{t+1} = \rho \cdot A_t + \sigma_A \cdot \varepsilon_{A,t+1} \qquad \text{and} \qquad B_{t+1} = \rho \cdot B_t + \sigma_B \cdot \varepsilon_{B,t+1} \end{align*}$

with $\rho \in (0,1)$ and $\varepsilon_{A,t}, \varepsilon_{B,t} \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,1)$ . Thus, each of these variables has mean $0$ and variance given by:

(3) $\begin{align*} \mathrm{Var}[A_t] = \frac{\sigma_A^2}{1 - \rho^2} \qquad \text{and} \qquad \mathrm{Var}[B_t] = \frac{\sigma_B^2}{1 - \rho^2} \end{align*}$
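As a quick sanity check on Equation (3), the sketch below simulates one AR(1) component and compares its sample variance with the stationary formula. The function names and the parameter values ($\rho = 0.5$, $\sigma = 1$) are illustrative assumptions, not values from the paper.

```python
import random


def simulate_ar1(rho, sigma, n, seed=0):
    """Simulate x[t+1] = rho * x[t] + sigma * eps with eps ~ N(0, 1).

    The first draw comes from the stationary distribution so that no
    burn-in period is needed.
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma / (1.0 - rho ** 2) ** 0.5)
    path = [x]
    for _ in range(n - 1):
        x = rho * x + sigma * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


# Illustrative parameter values (assumptions, not taken from the paper).
rho, sigma = 0.5, 1.0
path = simulate_ar1(rho, sigma, n=200_000)
theoretical_var = sigma ** 2 / (1.0 - rho ** 2)
print(f"sample variance:      {sample_variance(path):.3f}")
print(f"theoretical variance: {theoretical_var:.3f}")
```

The two printed numbers should be close, and the gap shrinks as `n` grows.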

Crucially, in each period $t$ traders can see both $A_t$ and $B_t$ as well as $\varepsilon_{A,t+1}$ and $\varepsilon_{B,t+1}$. Thus, they know the next period’s realizations of $A_{t+1}$ and $B_{t+1}$ even if they choose not to use this information in their simple model. Let $\delta > 0$ denote the per-period discount rate, and define the parameter:

(4) $\begin{align*} \theta &= ( 1 + \delta - \rho )^{-1} \end{align*}$

Then, a fully rational trader—i.e., someone who takes into consideration both $A_t$ and $B_t$ —with risk neutral preferences would price this asset:

(5) $\begin{align*} P_t^R = V_t^R &= \theta \times (A_{t+1} + B_{t+1}) \end{align*}$

The price in this setting is just the discounted present value of the expected future dividend stream.
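To see where $\theta$ comes from, note that $\mathrm{E}_t[A_{t+k}] = \rho^{k-1} \cdot A_{t+1}$ for $k \geq 1$ since traders already know $A_{t+1}$, and similarly for $B$. Assuming the price excludes the current dividend and that future dividends are discounted at rate $\delta$ per period, the present value is a geometric series:

$\begin{align*} V_t^R &= \sum_{k=1}^{\infty} \frac{\mathrm{E}_t[D_{t+k}]}{(1+\delta)^k} = \sum_{k=1}^{\infty} \frac{\rho^{k-1} \cdot (A_{t+1} + B_{t+1})}{(1+\delta)^k} \\ &= \frac{A_{t+1} + B_{t+1}}{1+\delta} \cdot \frac{1}{1 - \frac{\rho}{1+\delta}} = \frac{A_{t+1} + B_{t+1}}{1 + \delta - \rho} = \theta \times (A_{t+1} + B_{t+1}) \end{align*}$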

3. Benchmark Model

Let’s now consider a benchmark model where traders use an overly simplified model, but never update this model. Specifically, assume traders believe that dividends are determined by only component $A$ and noise:

(6) $\begin{align*} D_t &= A_t + \sigma_D \cdot \varepsilon_{D,t} \end{align*}$

i.e., they ignore the fact that $B_t$ affects dividends at all. Let $M_t \in \{A,B\}$ denote the model that traders use to predict dividends. In this benchmark setting, traders are certain that the true model will remain in state $A$:

(7) $\begin{align*} \mathrm{Pr}\left[ \, M_{t+1} = A \, \middle| \, M_t = A \, \right] &= 1 \end{align*}$

Prices in this world are then given by:

(8) $\begin{align*} P_t^A &= V_t^A = \theta \times A_{t+1} \end{align*}$

They are the discounted present value of the dividends implied by only component $A$ .

This setup makes it easy to compute the dollar returns for the asset:

(9) $\begin{align*} R_t^A &= D_t + P_t^A - (1 + \delta) \cdot P_{t-1}^A \\ &= \theta \cdot \sigma_A \times \varepsilon_{A,t+1} + \left\{ \, B_t + \sigma_D \cdot \varepsilon_{D,t} \, \right\} \end{align*}$

If I define the variable $Z_t^A = B_t + \sigma_D \cdot \varepsilon_{D,t}$ representing the traders’ prediction error, then this formula becomes short and sweet:

(10) $\begin{align*} R_t^A &= \theta \cdot \sigma_A \times \varepsilon_{A,t+1} + Z_t^A \end{align*}$

i.e., the returns to holding this asset are the discounted present value of the future innovations to component $A$ plus the prediction error incurred by using only model $A$ instead of the full model.
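The step from the first to the second line of Equation (9) can be spelled out as follows. This is a sketch that takes the innovation moving $A_t$ to $A_{t+1}$ to be $\varepsilon_{A,t+1}$, matching the timing in Equation (9), and uses the true dividend $D_t = A_t + B_t + \sigma_D \cdot \varepsilon_{D,t}$:

$\begin{align*} R_t^A &= (A_t + B_t + \sigma_D \cdot \varepsilon_{D,t}) + \theta \cdot (\rho \cdot A_t + \sigma_A \cdot \varepsilon_{A,t+1}) - (1 + \delta) \cdot \theta \cdot A_t \\ &= \left\{ \, 1 - \theta \cdot (1 + \delta - \rho) \, \right\} \cdot A_t + \theta \cdot \sigma_A \times \varepsilon_{A,t+1} + \left\{ \, B_t + \sigma_D \cdot \varepsilon_{D,t} \, \right\} \end{align*}$

The coefficient on $A_t$ is zero because $\theta \cdot (1 + \delta - \rho) = 1$ by Equation (4), which leaves the expression in Equation (10).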

Asset returns will appear predictable to a more sophisticated trader who knows that both components $A$ and $B$ affect the asset’s dividends. The auto-covariance of the dollar returns is given by:

(11) $\begin{align*} \mathrm{Cov}\left[ R_t^A,R_{t-1}^A\right] &= \mathrm{Cov}\left[ B_t , B_{t-1}\right] = \rho \cdot \left( \frac{\sigma_B^2}{1 - \rho^2} \right) \end{align*}$

Thus, asset returns will be more persistent when traders’ prediction error from ignoring component $B$ is more persistent, i.e., when $\rho$ is closer to $1$.
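The auto-covariance in Equation (11) can be checked by simulation. The sketch below builds returns from the price definition in Equation (8) and compares the sample lag-1 auto-covariance with $\rho \cdot \sigma_B^2 / (1 - \rho^2)$. All parameter values and function names are illustrative assumptions, and the shocks to $A$, $B$, and dividends are assumed to be mutually independent.

```python
import random


def simulate_benchmark_returns(rho, delta, sig_a, sig_b, sig_d, n, seed=0):
    """Dollar returns R_t = D_t + P_t - (1 + delta) * P_{t-1}, P_t = theta * A_{t+1}."""
    rng = random.Random(seed)
    theta = 1.0 / (1.0 + delta - rho)
    # Start both components at their stationary distributions.
    a = rng.gauss(0.0, sig_a / (1.0 - rho ** 2) ** 0.5)
    b = rng.gauss(0.0, sig_b / (1.0 - rho ** 2) ** 0.5)
    returns = []
    for _ in range(n):
        a_next = rho * a + sig_a * rng.gauss(0.0, 1.0)
        b_next = rho * b + sig_b * rng.gauss(0.0, 1.0)
        dividend = a + b + sig_d * rng.gauss(0.0, 1.0)  # true D_t uses A and B
        price_prev = theta * a        # P_{t-1} = theta * A_t
        price_now = theta * a_next    # P_t = theta * A_{t+1}
        returns.append(dividend + price_now - (1.0 + delta) * price_prev)
        a, b = a_next, b_next
    return returns


def lag1_autocovariance(xs):
    mean = sum(xs) / len(xs)
    pairs = zip(xs[1:], xs[:-1])
    return sum((x - mean) * (y - mean) for x, y in pairs) / (len(xs) - 1)


# Illustrative parameter values (assumptions, not taken from the paper).
rho, delta, sig_a, sig_b, sig_d = 0.5, 0.05, 1.0, 1.0, 1.0
returns = simulate_benchmark_returns(rho, delta, sig_a, sig_b, sig_d, n=200_000)
theoretical_autocov = rho * sig_b ** 2 / (1.0 - rho ** 2)
print(f"sample auto-covariance:      {lag1_autocovariance(returns):.3f}")
print(f"theoretical auto-covariance: {theoretical_autocov:.3f}")
```

Raising `rho` toward $1$ in this sketch increases both numbers, which is the persistence effect described above.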

4. Belief Updating

Now, let’s move away from this benchmark model and consider the case where traders might switch between simple models. For example, they might start out exclusively using component $A$ to predict dividends, but then switch over to exclusively using component $B$ after model $A$ does a really bad job. Note that traders are wrong in both cases; however, switching models can still generate better predictions. For instance, think about switching over to model $B$ when component $B_t$ is really large and component $A_t$ is close to $0$. Because both $A_t$ and $B_t$ are positively auto-correlated, exclusively using model $B$ will give higher-fidelity predictions about the dividend level in the next few periods.

Let $\pi_A$ denote traders’ belief that the true model will remain in state $A$ next period given that it’s in state $A$ now:

(12) $\begin{align*} \mathrm{Pr}\left[ \, M_{t+1} = A \, \middle| \, M_t = A \, \right] &= \pi_A \end{align*}$

Similarly, let $\pi_B$ denote traders’ belief that the true model will remain in state $B$ next period given that it’s in state $B$ now:

(13) $\begin{align*} \mathrm{Pr}\left[ \, M_{t+1} = B \, \middle| \, M_t = B \, \right] &= \pi_B \end{align*}$

This setup means that, for instance, traders believe that the fraction of time the market spends in model $A$ is given by:

(14) $\begin{align*} \frac{1 - \pi_B}{2 - \pi_A - \pi_B} \end{align*}$
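This expression is the stationary distribution of the two-state Markov chain that traders have in mind. As a short derivation, let $p$ denote the long-run probability of state $A$. In a stationary distribution, the probability of being in $A$ next period equals the probability of being in $A$ today:

$\begin{align*} p &= \pi_A \cdot p + (1 - \pi_B) \cdot (1 - p) \quad \Longrightarrow \quad p = \frac{1 - \pi_B}{2 - \pi_A - \pi_B} \end{align*}$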

For simplicity, I assume a symmetric setting such that $\pi_A = \pi_B = \pi \in (\sfrac{1}{2},1)$ . This rule has to be consistent with the true transition probability of their beliefs in equilibrium; however, it’s important to emphasize that having any beliefs about $\pi$ is in some sense wrong since components $A$ and $B$ always contribute to dividend payouts.

While traders always exclusively use either component $A_t$ or component $B_t$ to predict dividend payouts, somewhere in the dark recesses of their mind they have beliefs about when they should switch mental models. For example, if you started making a midnight snack, you might not immediately know what to do when your first knife dissolved in the mayonnaise jar, but you wouldn’t ruin several knives in a row this way. Let $f^A(D_t)$ denote traders’ beliefs about the distribution of dividends in period $t$ given that they entered the period using only component $A_t$ to predict dividend payouts:

(15) $\begin{align*} f^A(D_t) &= \frac{1}{\sigma_D} \cdot \phi\left( \frac{D_t - A_t}{\sigma_D} \right) = \frac{1}{\sigma_D} \cdot \phi\left( \frac{1}{\sigma_D} \cdot Z_t^A \right) \end{align*}$

Traders’ Bayesian posterior going into period $(t + 1)$ about whether or not model $A$ is still the correct model is then given by:

(16) $\begin{align*} Q_{t+1} &= \sfrac{1}{2} + (2 \cdot \pi - 1) \cdot (X_{t+1} - \sfrac{1}{2}) \end{align*}$

The parameter $\pi$ is just traders’ prior probability that the current model persists from one period to the next. The variable $X_{t+1}$ is given by:

(17) $\begin{align*} X_{t+1} &= \frac{Q_t \cdot L_t}{1 - Q_t \cdot (1 - L_t)} \end{align*}$

where $L_t$ denotes the likelihood ratio of the two models, with $f^B$ and $Z_t^B = D_t - B_t$ defined analogously for model $B$:

(18) $\begin{align*} L_t &= \frac{f^A(D_t)}{f^B(D_t)} = \exp\left\{ \, - \, \frac{(Z_t^A)^2 - (Z_t^B)^2}{2 \cdot \sigma_D^2} \, \right\} \end{align*}$

Note that this ratio is always positive and is decreasing in the difference $(Z_t^A)^2 - (Z_t^B)^2$, i.e., traders tilt their beliefs toward model $A$ after seeing that $|Z_t^A|$ is smaller than $|Z_t^B|$ and vice versa.
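Equations (16) through (18) combine into a one-step belief update. The sketch below is a direct transcription; the function names and the numerical inputs are hypothetical, and $Z_t^B$ is taken to be $D_t - B_t$ by analogy with $Z_t^A$.

```python
import math


def likelihood_ratio(z_a, z_b, sigma_d):
    """L_t = f^A(D_t) / f^B(D_t) from Equation (18)."""
    return math.exp(-(z_a ** 2 - z_b ** 2) / (2.0 * sigma_d ** 2))


def update_belief(q_t, z_a, z_b, sigma_d, pi):
    """Return Q_{t+1}, the belief that model A holds next period."""
    l_t = likelihood_ratio(z_a, z_b, sigma_d)
    # Equation (17): Bayes' rule applied to the period-t dividend.
    x_next = q_t * l_t / (1.0 - q_t * (1.0 - l_t))
    # Equation (16): allow for the possibility that the model switches.
    return 0.5 + (2.0 * pi - 1.0) * (x_next - 0.5)


# Hypothetical inputs: model A's forecast error is small, model B's is large.
q_next = update_belief(q_t=0.8, z_a=0.1, z_b=1.5, sigma_d=1.0, pi=0.9)
print(f"Q_t = 0.80 -> Q_(t+1) = {q_next:.3f}")
```

Because $X_{t+1}$ lies between $0$ and $1$, the updated belief always stays inside $[1 - \pi, \pi]$ no matter how lopsided the evidence is.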

5. Model with Learning

From here on out, solving a model where traders learn from their past errors and switch between simplified mental models is quite straightforward. Without loss of generality, let’s consider the case where traders enter period $t$ using only component $A$ to predict dividends. Then, the model they use in period $(t+1)$ is given by:

(19) $\begin{align*} M_{t+1} &= \begin{cases} A &\text{if } Q_{t+1} \geq q \\ B &\text{else} \end{cases} \end{align*}$

for $q < \sfrac{1}{2}$. For example, if $q = 0.05$, then traders will continue to make forecasts exclusively with component $A$ until their posterior belief in model $A$ falls below $5{\scriptstyle \%}$. Once this happens, they will switch over to exclusively using component $B$. The smaller is $q$, the stronger is the degree of resistance to model change.
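Putting the belief update and the threshold rule together gives a simple simulation of paradigm shifts. The sketch below tracks the belief that model $A$ is correct. The rule for switching back from $B$ to $A$ once that belief exceeds $1 - q$ is an assumption made by symmetry, since the text only covers traders who start in model $A$. All names and parameter values are illustrative. Note that the updated belief is confined to $[1 - \pi, \pi]$, so the threshold can only bind when $q > 1 - \pi$.

```python
import math
import random


def next_model(model, q_next, q):
    """Threshold rule from Equation (19), mirrored for model B by assumption."""
    if model == "A":
        return "A" if q_next >= q else "B"
    return "B" if q_next <= 1.0 - q else "A"


def simulate_models(rho, sig_a, sig_b, sig_d, pi, q, n, seed=0):
    rng = random.Random(seed)
    a = b = 0.0
    belief_a, model = 0.5, "A"   # start agnostic, using model A
    models = []
    for _ in range(n):
        dividend = a + b + sig_d * rng.gauss(0.0, 1.0)
        z_a, z_b = dividend - a, dividend - b   # forecast errors of each model
        l_t = math.exp(-(z_a ** 2 - z_b ** 2) / (2.0 * sig_d ** 2))
        x_next = belief_a * l_t / (1.0 - belief_a * (1.0 - l_t))
        belief_a = 0.5 + (2.0 * pi - 1.0) * (x_next - 0.5)
        model = next_model(model, belief_a, q)
        models.append(model)
        a = rho * a + sig_a * rng.gauss(0.0, 1.0)
        b = rho * b + sig_b * rng.gauss(0.0, 1.0)
    return models


# Illustrative parameter values (assumptions, not taken from the paper).
models = simulate_models(rho=0.9, sig_a=1.0, sig_b=1.0, sig_d=1.0,
                         pi=0.98, q=0.05, n=2000)
switches = sum(m1 != m0 for m0, m1 in zip(models, models[1:]))
print(f"fraction of periods in model A: {models.count('A') / len(models):.2f}")
print(f"number of paradigm shifts: {switches}")
```

Lowering `q` in this sketch makes traders hold on to their current model for longer before switching.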

In this setup, there are then $2$ different regimes to consider when computing returns: i) no shift ( $\mathit{NS}$ ) and ii) shift ( $S$ ). The returns in the no shift regime are the exact same as before:

(20) $\begin{align*} R_t^{\mathit{NS}} &= Z_t^A + \theta \cdot \sigma_A \times \varepsilon_{A,t+1} \end{align*}$

since the traders ignore the possibility of there ever being another component $B$ when using model $A$ . The returns in the shift regime are more complicated:

(21) $\begin{align*} R_t^S &= Z_t^A + \theta \cdot \sigma_B \times \varepsilon_{B,t+1} + \rho \cdot \theta \times (B_t - A_t) \end{align*}$

The returns when traders shift from model $A$ to model $B$ differ from the no shift regime because traders purge all current and lagged model $A$ -information from prices and replace it with model $B$ -information.
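To see where the extra term comes from, suppose traders priced the asset with model $A$ at time $(t-1)$, so that $P_{t-1} = \theta \cdot A_t$, and with model $B$ at time $t$, so that $P_t = \theta \cdot B_{t+1}$. As a sketch, writing the true dividend as $D_t = A_t + Z_t^A$ and keeping the volatility $\sigma_B$ from Equation (2) explicit:

$\begin{align*} R_t^S &= D_t + \theta \cdot B_{t+1} - (1 + \delta) \cdot \theta \cdot A_t \\ &= Z_t^A + \theta \cdot (\rho \cdot B_t + \sigma_B \cdot \varepsilon_{B,t+1}) + \left\{ \, 1 - (1 + \delta) \cdot \theta \, \right\} \cdot A_t \\ &= Z_t^A + \theta \cdot \sigma_B \times \varepsilon_{B,t+1} + \rho \cdot \theta \times (B_t - A_t) \end{align*}$

The last line uses $1 - (1 + \delta) \cdot \theta = - \rho \cdot \theta$, which follows from Equation (4).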