Research Notebook

Digesting the Hansen and Scheinkman Multiplicative Decomposition of the SDF

July 12, 2011 by Alex

Introduction [1]

I give some intuition behind the multiplicative decomposition of the stochastic discount factor M_{t \to t+h} introduced in Hansen and Scheinkman (2009). The economics underlying the original Hansen and Scheinkman (2009) results was not clear to me during my initial readings. This post collects my efforts to interpret these mathematical ideas in a sensible way.

Below I formally state the decomposition.

Theorem (Hansen and Scheinkman Decomposition): Suppose that \phi_M is a principal eigenfunction with eigenvalue \lambda_M for the extended generator of the stochastic discount factor M. Then this multiplicative functional can be decomposed as:

    \begin{align*} M_{t \to t+h} \ &= \ e^{\lambda_M \cdot h} \cdot \left( \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \right) \cdot \hat{M}_{t \to t+h} \end{align*}

where \hat{M}_{t \to t+h} is a local martingale.

 

The stochastic discount factor M_{t \to t+h} dictates how to discount cashflows occurring h periods in the future in state X_{t+h}. Roughly speaking, Hansen and Scheinkman (2009) factors M_{t \to t+h} into 3 different pieces: a state independent component e^{\lambda_M \cdot h}, an investment horizon independent component \phi_M(X_t)/\phi_M(X_{t+h}), and a white noise component \hat{M}_{t \to t+h}.

Thus, you should think about \lambda_M as a generalized time preference parameter. \lambda_M will generally be negative, so e^{\lambda_M \cdot h} is the continuous time representation of the state independent discount rate dictated by an asset pricing model. The ratio \phi_M(X_t) / \phi_M(X_{t+h}) captures the rate at which I discount payments at time t+h given the state today at time t and the state at time t+h. This ratio is independent of h: if X_{t+h} = X_{t+h'} for two horizons h and h', then we have:

    \begin{align*} \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \ &= \ \frac{\phi_M(X_t)}{\phi_M(X_{t+h'})} \end{align*}

Finally, \hat{M}_{t \to t+h} represents a random noise component with \mathbb{E}\hat{M}_{t \to t+h} = 1 and independent increments.

 

Motivation

The Hansen and Scheinkman decomposition generalizes the binomial options pricing framework for use in standard asset pricing applications by allowing for more complicated state space features like jumps and time averaging. [2] The main advantages of casting the stochastic discount factor as a multiplicative functional are a) the use of the binomial pricing intuition to understand more complicated asset pricing models and b) the streamlining of the econometrics needed to compare excess returns at different horizons. [3]

To illustrate the basic intuition behind this analogy, I work through the Black, Derman and Toy (1990) model.

Example (Binomial Model): Consider a discrete time, binomial world with states X_t \in \{d,u\}, \ \forall t \geq 0 in which traders have an independent probability \pi(x) of entering state x in the next period regardless of the current state. In this world, the price P_{t \to t+1} at time t of a risk free bond that pays out $1 at time t+1 is given by the expression:

    \begin{align*} P_{t \to t+1} \ &= \ \frac{\pi(u) \cdot 1 + \pi(d) \cdot 1}{1 + r^f_{t+1}} \end{align*}

This 1 step ahead pricing rule applies at each and every starting date t. All pricing computations at longer horizons are built up from this local relationship based on the prevailing short rate r_{t+1}^f.

To solve the model, I need to assume that the short rate r_{t+1}^f process has independent log-normal increments. I could then use the volatility of this process to pin down the values of the short rate for the entire binomial tree.
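To get a feel for how the 1-step pricing rule builds up longer-horizon prices, here is a minimal sketch of a 2-period binomial bond pricing exercise with log-normal short-rate moves. The parameter values (r0, sigma, pi_u) are hypothetical and chosen purely for illustration.

```python
import numpy as np

# Hypothetical parameters for illustration (not calibrated to anything)
r0 = 0.05      # current short rate
sigma = 0.20   # volatility of log short-rate increments
pi_u = 0.5     # probability of the up state

# One period ahead, the short rate moves log-normally
r_u = r0 * np.exp(sigma)
r_d = r0 * np.exp(-sigma)

# Price a 2-period zero-coupon bond by backward induction,
# applying the 1-step pricing rule at every node of the tree
P1_u = 1.0 / (1.0 + r_u)   # time-1 bond value in the up state
P1_d = 1.0 / (1.0 + r_d)   # time-1 bond value in the down state
P0 = (pi_u * P1_u + (1 - pi_u) * P1_d) / (1.0 + r0)
```

The same backward induction, repeated node by node, extends to trees of any depth; every long-horizon price is a nested application of the local 1-step rule.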

 

In general, models of this sort are easy to solve analytically if the short rate process has log-normal increments. The recent papers Lettau and Wachter (2007), Van Binsbergen, Brandt and Koijen (2010) and Backus, Chernov and Zin (2011) adopt similar approaches and try to extend these insights to equity markets.

Nevertheless, most asset pricing models are not log-normal and do not admit pen and paper analysis of their term structure using existing methods. Thus, in order to use cross-horizon predictions to discriminate between alternative models, we must adopt new mathematical tools.

Example (Binomial Model, Ctd…): We use operator methods to factor the discount factor process M_{t \to t+h}, which deflates payments in state X_{t+h} at time horizon t+h back to time t, into 3 pieces: e^{\lambda_M \cdot h}, \tilde{\phi}_M(X_{t+h},X_t) and \hat{M}_{t \to t+h}. The first factor depends only on the investment horizon h, the second factor depends only on the realized states, and the third factor is noise, so that M_{t \to t+h} = e^{\lambda_M \cdot h} \cdot \tilde{\phi}_M(X_{t+h},X_t) \cdot \hat{M}_{t \to t+h}.

By analogy with the Black, Derman and Toy (1990) model, in a binomial world we can use this decomposition to rewrite the h=1 Euler equation as follows, where the dependence on X_t is left implicit:

    \begin{align*} 1 \ &= \ \mathbb{E}_t \left[ \ M_{t \to t+1} \cdot R_{t \to t+1} \ \right] \\ &= \ \frac{\pi(u) \cdot \tilde{\phi}_M(u) \cdot \varepsilon(u) \cdot R(u) + \pi(d) \cdot \tilde{\phi}_M(d) \cdot \varepsilon(d) \cdot R(d)}{1 - \lambda_M} \end{align*}

 

Thus, in the Hansen and Scheinkman (2009) decomposition, - \lambda_M serves as a synthetic risk free rate and the \pi(x) \cdot \tilde{\phi}_M(x) serve as the twisted martingale measure.

In my work with Anmol Bhandari [4] we look at a class of models for which \ln \tilde{\phi}_M(x) is affine [5] and show how to use this decomposition to compute a cross-horizon analogue of the Hansen and Jagannathan (1991) volatility bound. This new bound can be used to discriminate between different models which make identical predictions at a particular horizon. This exponentially affine structure is useful as it permits closed form solutions for the moments of M_{t \to t+h}:

    \begin{align*} \mathbb{E}_t[M_{t \to t+h}] \ &\approx \ e^{\lambda_M \cdot h} \cdot \mathbb{E}_t \left[ \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \right] \cdot 1 \\ \mathbb{E}_t[M_{t \to t+h}^2] \ &\approx \ e^{\lambda_{M^2} \cdot h} \cdot \mathbb{E}_t \left[ \frac{\phi_{M^2}(X_t)}{\phi_{M^2}(X_{t+h})} \right] \cdot 1 \end{align*}

In the next 2 sections, I walk through the economics governing the \lambda_M and \phi_M terms.

 

Time Preference

Where does \lambda_M come from? In the original article, the authors refer to \lambda_M as the principal eigenvalue of the extended generator of M; however, \lambda_M has a well defined meaning without ever subscribing to Perron-Frobenius theory. \lambda_M is a generalization of the time preference parameter dictated by an asset pricing model.

Consider the following thought experiment, which casts the \lambda_M term as the time preference parameter plus an extra Jensen's inequality term.

Example (Generalized Time Preference): Suppose that an agent has preferences over a stream of consumption C_1, C_2, C_3, \ldots and that in each period t, C_t = 100 with probability 0.95, while the remaining 5\% of the time C_t = 50 or C_t = 150 with equal probability. While \mathbb{E}_t[C_{t+1}] = 100, the certainty-equivalent utility satisfies \mathbb{E}_t^{c.e.}[C_{t+1}] < 100^{1-\gamma}, the utility of receiving the mean payout for sure.

In fact, with probability 0.05 the agent faces a lottery whose utility value is:

    \begin{align*} \mathbb{E}_t^{c.e.}[C_{t+1} \mid C_{t+1} \neq 100 ] \ &= \ \frac{50^{1-\gamma}}{2} + \frac{150^{1-\gamma}}{2} \end{align*}

Let’s call this certainty equivalent gap \delta:

    \begin{align*} \delta \ &= \ \mathbb{E}_t^{c.e.}[C_{t+1} \mid C_{t+1} \neq 100 ] \ - \ 100^{1-\gamma} \end{align*}

\lambda_M should then include both the time preference parameter, \rho, and the expected Jensen’s inequality loss:

    \begin{align*} \lambda_M \ &= \ \rho \ + \ 0.05 \cdot \delta \end{align*}
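The certainty-equivalent gap in the example above can be computed directly. The sketch below uses the standard CRRA normalization u(c) = c^{1-\gamma}/(1-\gamma), with \gamma = 2 as a hypothetical risk aversion value; it confirms that the 50/50 lottery over 50 and 150 is worth less in utility terms than a sure payout of 100, even though both have mean 100.

```python
# A minimal check of the Jensen's inequality loss in the consumption lottery
# above, using the standard CRRA normalization u(c) = c^(1-gamma)/(1-gamma).
# gamma = 2 is a hypothetical choice made purely for illustration.
gamma = 2.0

def u(c):
    return c ** (1.0 - gamma) / (1.0 - gamma)

# Conditional on not receiving 100, the agent faces a 50/50 lottery over 50 and 150
lottery_utility = 0.5 * u(50.0) + 0.5 * u(150.0)

# The lottery's expected consumption is 100, but its utility value is lower,
# so the certainty-equivalent gap is negative
delta = lottery_utility - u(100.0)
```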

 

Thus, in a more general framework, we should expect \lambda_M to have roughly the following form:

    \begin{align*} \lambda_M \ &= \ \rho \ + \ f(\sigma_M^2, \sigma_X^2, \sigma_{M \times X}) \end{align*}

where f is an affine function. Heuristically, the \sigma_X component will capture how volatile the state space is while the \sigma_M component will capture how badly I need to discount this consumption stream due to Jensen’s inequality.

 

State Dependence

Next, in order to capture the dependence of the discount factor M_{t \to t+h} on the current and future state (X_t,X_{t+h}), Hansen and Scheinkman (2009) move to continuous time and apply the Perron-Frobenius theorem to the infinitesimal generator of the discount factor. When applied to transition probability matrices, Perron-Frobenius theory implies that the largest eigen-pair dominates the behavior of a stochastic process as h \to \infty. Hansen and Scheinkman use this h \to \infty limiting result to argue that the ratio \phi_M(X_t)/\phi_M(X_{t+h}), where \phi_M is the principal eigenfunction of the generator of the discount factor M, is a good choice for the state dependent component of M_{t \to t+h}.

It is important to note that Perron-Frobenius theory is only a modeling tool in the Hansen and Scheinkman (2009) construction, not a critical feature of their results. There may well be other reasonable choices for the state dependent component of M_{t \to t+h}. In its simplest form [6], the result can be written as:

Theorem (Perron-Frobenius): The largest eigenvalue \lambda of a positive square matrix A is simple and positive, and its associated eigenvector \phi can be chosen strictly positive. All other eigenvalues are smaller in absolute value. [7]
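The theorem is easy to verify numerically. The sketch below extracts the Perron root and its strictly positive eigenvector from a positive 2 \times 2 matrix; the matrix entries are made up purely for illustration.

```python
import numpy as np

# A hypothetical positive 2x2 matrix standing in for a one-period pricing operator
A = np.array([[0.50, 0.45],
              [0.40, 0.55]])

vals, vecs = np.linalg.eig(A)
i = np.argmax(vals.real)     # the Perron root is real and largest in modulus
lam = vals.real[i]
phi = vecs[:, i].real
phi = phi / phi.sum()        # rescale so the eigenvector is strictly positive
```

For this matrix the eigenvalues are 0.95 and 0.10, so the Perron root is 0.95 with eigenvector proportional to (1, 1); the rescaling step fixes the arbitrary sign that a numerical eigensolver may return.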

 

In order to use this theorem, I need to have a positive square matrix to operate on. While strictly positive, M_{t \to t+h} is not a square matrix; however, its infinitesimal generator is. Heuristically, you can think about the infinitesimal generator as encoding the transition probability matrix under the equivalent martingale measure deflated by the time preference parameter.

Definition (Infinitesimal Generator): The infinitesimal generator \mathbb{A} of an Ito diffusion \{ X_t \} in \mathbb{R}^n is defined by:

    \begin{align*} \mathbb{A}[ f(x)] \ &= \ \lim_{h \searrow 0} \ \frac{\mathbb{E}_x[ f(X_h) ] - f(x)}{h}, \end{align*}

where \mathbb{E}_x denotes the expectation conditional on X_0 = x, and the set of functions f: \mathbb{R}^n \mapsto \mathbb{R} such that the limit exists at x is denoted by \mathcal{D}_A(x).

 

In words, the infinitesimal generator of the discount factor M_{t \to t+h} captures the rate at which my valuation of a $1 payment in, say, the up state u changes as I push the payment slightly further out into the future. To get a feel for what the infinitesimal generator captures, consider the following short example using a 2 state Markov chain. First, I define the physical transition intensity matrix for the Markov process X_t.

Example (Markov Process w/ 2 States): Consider a 2 state Markov chain with states X_t \in \{u,d\}. The physical evolution of the stochastic process X_t is governed by a 2 \times 2 intensity matrix \mathbb{T}, which encodes all of the transition probabilities. The matrix e^{h \cdot \mathbb{T}} is the matrix of transition probabilities over a horizon h. Since each row of the transition probability matrix e^{h \cdot \mathbb{T}} must sum to 1, each row of the transition intensity matrix \mathbb{T} must sum to 0.

    \begin{align*} \mathbb{T} \ &= \ \begin{bmatrix} \tau(u \mid u) & \tau(d \mid u) \\ \tau(u \mid d) & \tau(d \mid d) \end{bmatrix} \end{align*}

The diagonal entries are nonpositive and represent minus the intensity of jumping from the current state to a new one. The remaining row entries, appropriately scaled, represent the conditional probabilities of jumping to the respective states. For concreteness, the following parameter values would suffice:

    \begin{align*} \mathbb{T} \ &= \ \begin{bmatrix} -0.10 & 0.10 \\ 0.05 & -0.05 \end{bmatrix} \end{align*}
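A quick numerical sketch of the link between the intensity matrix and the transition probabilities, using the parameter values above. The matrix exponential is computed with a truncated Taylor series, which is adequate for a small, well-scaled matrix, and the generator is recovered as the short-horizon limit (e^{h \cdot \mathbb{T}} - I)/h, mirroring the definition of the infinitesimal generator.

```python
import numpy as np

# The 2-state intensity matrix from the example above
T = np.array([[-0.10,  0.10],
              [ 0.05, -0.05]])

def expm(A, n_terms=60):
    """Matrix exponential via a truncated Taylor series (fine for small matrices)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k
        out = out + term
    return out

P = expm(1.0 * T)          # transition probabilities over horizon h = 1

# Each row of exp(h*T) sums to 1 because each row of T sums to 0
row_sums = P.sum(axis=1)

# Recover the generator as the short-horizon limit (exp(h*T) - I) / h
h = 1e-6
T_approx = (expm(h * T) - np.eye(2)) / h
```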

 

Next, I want to show how to modify this transition intensity matrix \mathbb{T} to describe the local evolution of the discount factor process M_t. To do this, I first need to have an asset pricing model in mind, and I use a standard CRRA power utility model with risk aversion parameter \gamma as in Breeden (1979) where X_t is the log of the expected consumption growth.

Example (Markov Process w/ 2 States, Ctd…): Intuitively, I know that every period I push the payment out into the future, I will end up discounting the payment by an additional e^{\lambda_M}. However, I know that I will also have to twist \mathbb{T} from the physical measure over to the risk neutral measure. Thus, the resulting generator will look something like:

    \begin{align*} \mathbb{A} \ &= \ \begin{bmatrix} \tau(u \mid u) \cdot \tilde{\phi}_M(u \mid u) & \tau(d \mid u) \cdot \tilde{\phi}_M(d \mid u) \\ \tau(u \mid d) \cdot \tilde{\phi}_M(u \mid d) & \tau(d \mid d) \cdot \tilde{\phi}_M(d \mid d) \end{bmatrix} \ - \ \lambda_M \cdot \mathbb{I} \end{align*}

If we assume that \tilde{\phi}_M(s \mid s) = 1, so that there is no distortion when the chain stays in its current state, then the entries of \mathbb{A} are:

    \begin{align*} \alpha(s' \mid s) \ &= \ \begin{cases} \tau(s' \mid s) - \lambda_M &\text{ if } s' = s \\ \tau(s' \mid s) \cdot \tilde{\phi}_M(s' \mid s) &\text{ if } s' \neq s \end{cases} \end{align*}

Note that the rows of \mathbb{A} will in general not sum to 0 as in the physical transition intensity matrix \mathbb{T}.
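To see the claim about the row sums, here is a minimal sketch that twists the intensity matrix above entry by entry and discounts along the diagonal. The \tilde{\phi}_M values and \lambda_M below are made-up placeholders for illustration, not calibrated quantities.

```python
import numpy as np

# Physical intensity matrix from the example above
T = np.array([[-0.10,  0.10],
              [ 0.05, -0.05]])

# Hypothetical twisting terms and eigenvalue, chosen purely for illustration
phi_tilde = np.array([[1.0, 1.2],
                      [0.9, 1.0]])
lam_M = -0.03

# Twist the intensities entry by entry, then discount along the diagonal
A = T * phi_tilde - lam_M * np.eye(2)

# Unlike the rows of T, the rows of A generally do not sum to 0
row_sums = A.sum(axis=1)
```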

 

An Example

I conclude by working through an extended example showing how to solve for each of the terms in a simple model. Think of this as a Vasicek (1977) interest rate model. Let X_t be a risk factor that follows the scalar Ito diffusion below. I choose this model so that I can verify all of my solutions by hand using existing techniques.

    \begin{align*} dX_t \ &= \ \beta_X(X_t) \cdot dt \ + \ \sigma_X(X_t) \cdot dB_t \\ \beta_X(x) \ &= \ \bar{\beta}_X \ - \ \beta_X \cdot x \\ \sigma_X(x) \ &= \ \sigma_X \end{align*}

Let M_t = \exp\{A_t\}, where A_t solves the following Ito diffusion.

    \begin{align*} dA_t \ &= \ \beta_A(X_t) \cdot dt \ + \ \sigma_A(X_t) \cdot dB_t \\ \beta_A(x) \ &= \ \bar{\beta}_A \ - \ \beta_A \cdot x \\ \sigma_A(x) \ &= \ \sigma_A \end{align*}

Thus (X_t,M_t) are described by parameter vector \Theta:

    \begin{align*} \Theta \ &= \ \begin{bmatrix} \beta_X & \beta_A & \bar{\beta}_X & \bar{\beta}_A & \sigma_X & \sigma_A \end{bmatrix} \end{align*}
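The mean-reverting dynamics of X_t can be checked with a quick Euler-Maruyama simulation; the parameter values below are hypothetical, chosen only so that \beta_X > 0 and the process mean-reverts toward its stationary mean \bar{\beta}_X / \beta_X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values chosen purely for illustration (beta_X > 0)
beta_X_bar, beta_X, sigma_X = 0.02, 0.50, 0.10

# Euler-Maruyama simulation of dX = (beta_X_bar - beta_X * X) dt + sigma_X dB
dt, n_steps, n_paths = 0.01, 2000, 10000
X = np.zeros(n_paths)
for _ in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    X = X + (beta_X_bar - beta_X * X) * dt + sigma_X * dB

# After T = 20 the cross-sectional mean is near the stationary mean
stationary_mean = beta_X_bar / beta_X   # = 0.04
```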

We need to restrict \Theta to ensure stationarity (\beta_X > 0). Guessing an exponential-affine eigenfunction \phi_M(x) = e^{\kappa_M \cdot x} and matching coefficients so that \lambda_M does not move with x yields the following characterization of \kappa_M.

    \begin{align*} \kappa_M \ &= \ - \ \frac{\beta_A}{\beta_X} \end{align*}

Substituting \kappa_M back into the formula for \lambda_M yields:

    \begin{align*} \begin{split} \lambda_M \ &= \ \left( \ \bar{\beta}_A \ + \ \frac{\sigma_A^2}{2} \ \right) \\ &\qquad \qquad + \ \left( \ \bar{\beta}_X \ + \ \sigma_A \cdot \sigma_X \ \right) \cdot \kappa_M \\ &\qquad \qquad \qquad  + \ \left( \ \frac{\sigma_X^2}{2} \ \right) \cdot \kappa_M^2 \end{split} \end{align*}

Since M_t^2 = \exp\{2 \cdot A_t\}, repeating the same steps for M^2 yields:

    \begin{align*} \kappa_{M^2} \ &= \ - \ \frac{2 \cdot \beta_A}{\beta_X} \\ \lambda_{M^2} \ &= \ 2 \cdot \lambda_M \ + \ \sigma_A^2 \ + \ \left( \ \sigma_A \cdot \sigma_X \ \right) \cdot \kappa_{M^2} \ + \ \left( \ \frac{\sigma_X^2}{4} \ \right) \cdot \kappa_{M^2}^2 \end{align*}
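As a sanity check on the closed forms above, the sketch below computes (\kappa_M, \lambda_M) and (\kappa_{M^2}, \lambda_{M^2}) from a hypothetical parameter vector \Theta using the eigenvalue equation for the exponential-affine guess \phi_M(x) = e^{\kappa_M \cdot x}, and verifies the identity relating \lambda_{M^2} to \lambda_M. The parameter values are made up purely for illustration.

```python
# The eigenvalue equation for the exponential-affine guess reduces to
#   lambda = beta_A_bar + kappa * beta_X_bar + 0.5 * (sigma_A + kappa * sigma_X)**2
# with kappa chosen so that the x-terms cancel: kappa = -beta_A / beta_X.

def kappa_lambda(beta_A, beta_A_bar, beta_X, beta_X_bar, sigma_A, sigma_X):
    kappa = -beta_A / beta_X
    lam = beta_A_bar + kappa * beta_X_bar + 0.5 * (sigma_A + kappa * sigma_X) ** 2
    return kappa, lam

# Hypothetical parameter vector Theta
beta_X, beta_A = 0.50, 0.10
beta_X_bar, beta_A_bar = 0.02, -0.03
sigma_X, sigma_A = 0.10, 0.20

kappa_M, lam_M = kappa_lambda(beta_A, beta_A_bar, beta_X, beta_X_bar,
                              sigma_A, sigma_X)

# M^2 = exp(2*A), so double the A-coefficients and recompute
kappa_M2, lam_M2 = kappa_lambda(2 * beta_A, 2 * beta_A_bar, beta_X, beta_X_bar,
                                2 * sigma_A, sigma_X)

# Check the identity:
# lam_M2 = 2*lam_M + sigma_A^2 + sigma_A*sigma_X*kappa_M2 + (sigma_X^2/4)*kappa_M2^2
check = (2 * lam_M + sigma_A ** 2 + sigma_A * sigma_X * kappa_M2
         + (sigma_X ** 2 / 4.0) * kappa_M2 ** 2)
```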

Exercise (Offsetting Shocks): If \rho is the standard time preference parameter, when would \lambda_M = \rho?

Exercise (Stochastic Volatility): Add a Feller square-root term to allow for stochastic volatility à la the Cox, Ingersoll and Ross (1985) interest rate model.

    \begin{align*} dX_t \ &= \ \beta_X(X_t) \cdot dt \ + \ \sigma_X(X_t) \cdot dB_t \\ \beta_X(x) \ &= \ \bar{\beta}_X \ - \ \beta_X \cdot x \\ \sigma_X(x) \ &= \ \sigma_X \cdot \sqrt{x} \end{align*}

    \begin{align*} dA_t \ &= \ \beta_A(X_t) \cdot dt \ + \ \sigma_A(X_t) \cdot dB_t \\ \beta_A(x) \ &= \ \bar{\beta}_A \ - \ \beta_A \cdot x \\ \sigma_A(x) \ &= \ \sigma_A \cdot \sqrt{x} \end{align*}

What are \kappa_M and \lambda_M?

  1. Note: The results in this post stem from joint work I am conducting with Anmol Bhandari for our paper “Model Selection Using the Term Structure of Risk”. In this paper, we characterize the maximum Sharpe ratio allowed by an asset pricing model at each and every investment horizon. Using this cross-horizon bound, we develop a macro-finance model identification toolkit.
  2. E.g., think of the state space needed in the Campbell and Cochrane (1999) habit model.
  3. Investment horizon symmetry is an unexplored prediction of many asset pricing theories. Asset pricing models characterize how much a trader needs to be compensated in order to hold 1 unit of risk for 1 unit of time. The standard approach to testing these models is to fix the unit of time and then look for incorrectly priced packets of risk. For example, Roll (1981) looked at the spread in 1 month holding period returns on 10 portfolios of NYSE firms sorted by market cap and found that small firms earned abnormal excess returns relative to the CAPM. Yet, I could just as easily ask: Given a model, how much more does a trader need to be compensated to hold the same 1 unit of risk for an extra 1 unit of time? This inversion is well defined because asset pricing models possess investment horizon symmetry. Models hold at each and every investment horizon running from 1 second to 1 year to 1 century and everywhere in between. To illustrate this point via an absurd case, John Cochrane writes in his textbook (Asset Pricing (2005), Section 9.3) that according to the consumption CAPM ‘…if stocks go up between 12:00 and 1:00, it must be because (on average) we all decided to have a big lunch.’
  4. See Model Selection Using the Term Structure of Risk.
  5. This class of models allows for features such as rare disasters, recursive preferences and habit formation, among others.
  6. Really, this is just the Oskar Perron version of the theorem.
  7. For an introduction to Perron-Frobenius theory, see MacCluer (2000).
