Using the Cross-Section of Returns

1. Introduction

The empirical content of the discount factor view of asset pricing can all be derived from the equation below:

(1) $\begin{align*} 0 = \mathrm{E}[m \cdot r_n] \quad \text{for all } n=1,2,\ldots,N \end{align*}$

where $m$ denotes the prevailing stochastic discount factor and $r_n$ denotes an asset’s excess return. Equation (1) reads: “In the absence of margin requirements and transactions costs, it costs you $\mathdollar 0$ today to borrow at the riskless rate, buy a stock, and hold the position for $1$ period.” The question is then why average excess returns, $\mathrm{E}[r_n]$ , vary across the $N$ assets even though they all have the same price today by construction.

The answer hinges on the behavior of the stochastic discount factor, $m$, in Equation (1). What is this thing? Everyone knows that it is better to have $\mathdollar 1$ today than $\mathdollar 1$ tomorrow, and the present value of an asset that pays out $\mathdollar 1$ tomorrow is called the discount factor. Sometimes important stuff will happen in the next $24$ hours that changes how awesome it is to have an additional $\mathdollar 1$ tomorrow. As a result, the realized discount factor is a random variable each period (i.e., follows a stochastic process). For example, if agents have utility, $\mathrm{U}_0 = \mathrm{E}_0 \sum_{t \geq 0} e^{-\rho \cdot t} \cdot c_t^{1-\theta}$, then the stochastic discount factor is $m = e^{-\rho - \theta \cdot \Delta \log c}$ and the stuff (i.e., risk factor) is changes in log consumption.
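To make the power-utility example concrete, here is a minimal sketch of the realized discount factor $m = e^{-\rho - \theta \cdot \Delta \log c}$. The function name `crra_sdf` and the parameter values are my own illustrative assumptions, not calibrated numbers.

```python
import math

def crra_sdf(rho: float, theta: float, dlog_c: float) -> float:
    """Realized discount factor m = exp(-rho - theta * dlog_c)."""
    return math.exp(-rho - theta * dlog_c)

# Hypothetical parameter values, chosen only for illustration.
rho, theta = 0.02, 3.0

m_flat = crra_sdf(rho, theta, 0.00)   # consumption unchanged
m_boom = crra_sdf(rho, theta, 0.03)   # log consumption rises by 3%
m_bust = crra_sdf(rho, theta, -0.03)  # log consumption falls by 3%

# An extra dollar is worth more in a bust than in a boom.
print(m_bust, m_flat, m_boom)
```

With $\theta > 0$ the discount factor is high exactly when consumption growth is low, which is what makes consumption growth a risk factor in this example.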

An asset pricing model is a machine which takes as inputs a) each agent’s preferences, b) each agent’s information, and c) a list of the relevant risk factors affecting how agents discount the future and produces a stochastic discount factor as its output. In this post, I show how to test an asset pricing model using the cross-section of asset returns, i.e., by linking how average excess returns vary across assets to each asset’s exposure to the risk factors governing the behavior of the stochastic discount factor.

2. Theoretical Predictions

The key to massaging Equation (1) into a form that can be taken to the data is noticing that for any $2$ random variables $u$ and $v$ , the following identity holds:

(2) $\begin{equation*} \mathrm{E}[u\cdot v] = \mathrm{Cov}[u,v] + \mathrm{E}[u] \cdot \mathrm{E}[v] \end{equation*}$

Thus, if I let $u$ denote the stochastic discount factor and $v$ denote any of the $N$ excess returns, I can link the expected excess return to holding an asset to its covariance with the stochastic discount factor:

(3) $\begin{align*} \mathrm{E}[r_n] &= \frac{\mathrm{Cov}[m, r_n]}{\mathrm{Var}[m]} \cdot \left( - \frac{\mathrm{Var}[m]}{\mathrm{E}[m]} \right) \end{align*}$
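As a sanity check, Equation (3) can be verified on simulated data. The sketch below draws a made-up discount factor and return series, shifts the return so that the sample analogue of Equation (1) holds exactly, and then compares both sides of Equation (3) using sample moments. All of the numbers are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1_000

# Made-up discount factor draws and a raw return that co-moves with them.
m = 0.98 + 0.10 * rng.standard_normal(T)
r_raw = 0.05 * rng.standard_normal(T) - 0.20 * (m - m.mean())

# Shift the return by a constant so that the sample mean of m * r is zero,
# i.e., so that Equation (1) holds in this sample.
r = r_raw - np.mean(m * r_raw) / np.mean(m)
pricing_error = np.mean(m * r)

# Both sides of Equation (3), using population-style (ddof=0) sample moments.
cov_mr = np.mean((m - m.mean()) * (r - r.mean()))
lhs = r.mean()
rhs = (cov_mr / m.var()) * (-m.var() / m.mean())

print(pricing_error, lhs, rhs)
```

The two sides agree up to floating-point error because the relationship is an algebraic identity in sample moments once the pricing equation holds.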

The first term is dimensionless and represents the amount of exposure asset $n$ has to the risk factor $x$. The second term has dimension $\sfrac{1}{\Delta t}$, is common across all assets, and represents the price of exposure to the risk factor $x$ since it has the same units as the expected return $\mathrm{E}[r_n]$. Asset pricing theories say that each asset’s expected return should be proportional to the market-wide price of risk, where the constant of proportionality is the asset’s “exposure” to that risk factor.

What does “exposure” mean here? To answer this question I need to put a bit more structure on the stochastic discount factor, $m$, and the excess return, $r_n$. I remain agnostic about which asset pricing model actually governs returns and which risk factors affect discount rates, but to avoid writing out lots of messy matrices I do assume that there is only a single factor, $x$, with $\mathrm{E}[x] = \mu_x$ and $\mathrm{Var}[x] = \sigma_x^2$. I then write the stochastic discount factor as the sum of a function of $x$, $\mathrm{M}(x)$, and some noise, $y \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma_y^2)$:

(4) $\begin{align*} m &= \mathrm{M}(x) + y \\ &= \mathrm{M}(\mu_x) + \mathrm{M}'(\mu_x) \cdot (x - \mu_x) + \frac{1}{2} \cdot \mathrm{M}''(\mu_x) \cdot (x - \mu_x)^2 + \text{``h.o.t.''} + y \\ &\approx \phi + \chi \cdot (x - \mu_x) + \frac{\psi}{2} \cdot (x - \mu_x)^2 + y \end{align*}$

where I use a Taylor expansion to linearize the function $\mathrm{M}(x)$ around the point $x = \mu_x$ and assume terms of order $\mathrm{O}(x - \mu_x)^3$ are negligible so that $\mathrm{E}[m] = \phi + \sfrac{\psi}{2} \cdot \sigma_x^2$ and $\mathrm{Var}[m] = \chi^2 \cdot \sigma_x^2 + \sigma_y^2$ . This means that if the risk factor is $\sigma_x$ larger than expected, $(x - \mu_x) = \sigma_x$ , then agents value having an additional $\mathdollar 1$ tomorrow $\chi \cdot \sigma_x$ more than usual. Similarly, suppose each excess return is the sum of an asset-specific function of $x$ , $\mathrm{R}_n(x)$ , and some asset-specific noise, $z_n \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma_z^2)$ :

(5) $\begin{align*} r_n &= \mathrm{R}_n(x) + z_n \\ &= \mathrm{R}_n(\mu_x) + \mathrm{R}_n'(\mu_x) \cdot (x - \mu_x) + \frac{1}{2} \cdot \mathrm{R}_n''(\mu_x) \cdot (x - \mu_x)^2 + \text{``h.o.t.''} + z_n \\ &\approx \alpha_n + \beta_n \cdot (x - \mu_x) + \frac{\gamma_n}{2} \cdot (x - \mu_x)^2 + z_n \end{align*}$

where I use a Taylor expansion to linearize the function $\mathrm{R}_n(x)$ around the point $x = \mu_x$ and assume $\mathrm{O}(x - \mu_x)^3$ terms are negligible so that $\mathrm{E}[r_n] = \alpha_n + \sfrac{\gamma_n}{2} \cdot \sigma_x^2$ and $\mathrm{Var}[r_n] = \beta_n^2 \cdot \sigma_x^2 + \sigma_z^2$. This means that if the risk factor is $\sigma_x$ larger than expected, $(x - \mu_x) = \sigma_x$, then asset $n$’s realized excess return will be $\beta_n \cdot \sigma_x$ larger than average.

Plugging Equations (4) and (5) into Equation (3) then shows exactly what “exposure” to the risk factor means:

(6) $\begin{equation*} \begin{split} \mathrm{E}[r_n] &= \frac{\mathrm{Cov}[m,r_n]}{\mathrm{Var}[m]} \cdot \left( - \, \frac{\mathrm{Var}[m]}{\mathrm{E}[m]} \right) \\ &= \frac{\chi \cdot \beta_n \cdot \sigma_x^2}{\chi^2 \cdot \sigma_x^2 + \sigma_y^2} \cdot \left( - \, \frac{\chi^2 \cdot \sigma_x^2 + \sigma_y^2}{\phi + \frac{\psi}{2} \cdot \sigma_x^2} \right) \\ &= - \, \left( \frac{\chi \cdot \sigma_x^2}{\phi + \frac{\psi}{2} \cdot \sigma_x^2} \right) \cdot \beta_n \\ &= \text{Constant} \times \beta_n \end{split} \end{equation*}$

Each asset’s exposure to the risk factor $x$ is summarized by the coefficient $\beta_n$ . Assets which have higher realized returns when the risk factor is high (have a large $\beta_n$ ) will have lower average returns (high prices) since these assets are good hedges against the risk factor. i.e., these assets look like insurance. Equation (1)’s empirical content is then that an asset’s average excess returns, $\langle r_n \rangle$ , is proportional to its exposure to the risk factor, $\beta_n$ , where the constant of proportionality is the same for all assets:

(7) $\begin{align*} \mathrm{E}[r_n] = \underbrace{\alpha_n + \frac{\gamma_n}{2} \cdot \sigma_x^2}_{\text{Realized } \langle r_n \rangle} = \underbrace{- \, \left( \frac{\chi \cdot \sigma_x^2}{\phi + \frac{\psi}{2} \cdot \sigma_x^2} \right) \cdot \beta_n}_{\text{Predicted}} \end{align*}$

By letting $y,z_n \searrow 0$ we can interpret this relationship as a realization of the first Hansen-Jagannathan bound:

(8) $\begin{align*} \frac{\mathrm{StD}[m_{t+1}]}{\mathrm{E}[m_{t+1}]} = \frac{\chi \cdot \sigma_x}{\phi + \frac{\psi}{2} \cdot \sigma_x^2} = \frac{\alpha_n + \frac{\gamma_n}{2} \cdot \sigma_x^2}{\beta_n \cdot \sigma_x} = \left| \frac{\mathrm{E}[r_{n,t+1}]}{\mathrm{StD}[r_{n,t+1}]} \right| \end{align*}$
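The prediction in Equation (7) is easy to sketch numerically: compute the common constant from the discount-factor parameters and multiply by each asset's $\beta_n$. The helper name `price_of_risk` and every parameter value below are hypothetical choices of mine.

```python
import numpy as np

def price_of_risk(phi: float, chi: float, psi: float, sigma_x: float) -> float:
    """Common constant in Equation (7): -(chi * sigma_x^2) / (phi + psi/2 * sigma_x^2)."""
    return -(chi * sigma_x**2) / (phi + 0.5 * psi * sigma_x**2)

# Hypothetical discount-factor parameters (chi > 0: a high x makes a dollar more valuable).
phi, chi, psi, sigma_x = 0.98, 2.0, 0.0, 0.1

# Hypothetical factor exposures for three test assets.
betas = np.array([0.5, 1.0, 1.5])

lam = price_of_risk(phi, chi, psi, sigma_x)
predicted = lam * betas  # predicted average excess returns, one per asset

print(lam, predicted)
```

With $\chi > 0$ the constant is negative, so the assets with the largest $\beta_n$ have the lowest predicted average excess returns, which is the insurance logic described above.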

3. Empirical Strategy

To test Equation (7), an econometrician has to estimate $(2 \cdot N + 2)$ unknown parameters:

(9) $\begin{align*} \widehat{\boldsymbol \theta} = \begin{bmatrix} \widehat{\mu}_x & \widehat{\alpha}_1 & \cdots & \widehat{\alpha}_N & \widehat{\beta}_1 & \cdots & \widehat{\beta}_N & \widehat{\lambda} \end{bmatrix}^{\top} \end{align*}$

using $T$ periods of observations, i.e., $2$ parameters for each asset (its average excess returns and its factor exposure) as well as $2$ market-wide parameters (the risk factor mean and the market price of risk). There are $(3 \cdot N + 1)$ equations with which to estimate these parameters via GMM, so the system is over-identified whenever there are $N > 1$ assets:

(10) $\begin{align*} \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} &= \mathrm{E}[\mathrm{G}(\widehat{\boldsymbol \theta};\mathbf{r}_t,x_t)] = \mathrm{E} \begin{bmatrix} x_t - \widehat{\mu}_x \\ r_{1,t} - \left\{ \widehat{\alpha}_1 + \widehat{\beta}_1 \cdot (x_t - \widehat{\mu}_x) \right\} \\ \vdots \\ r_{N,t} - \left\{ \widehat{\alpha}_N + \widehat{\beta}_N \cdot (x_t - \widehat{\mu}_x) \right\} \\ \left( r_{1,t} - \left\{ \widehat{\alpha}_1 + \widehat{\beta}_1 \cdot (x_t - \widehat{\mu}_x) \right\} \right) \cdot (x_t - \widehat{\mu}_x) \\ \vdots \\ \left( r_{N,t} - \left\{ \widehat{\alpha}_N + \widehat{\beta}_N \cdot (x_t - \widehat{\mu}_x) \right\} \right) \cdot (x_t - \widehat{\mu}_x) \\ r_{1,t} - \widehat{\beta}_1 \cdot \widehat{\lambda} \\ \vdots \\ r_{N,t} - \widehat{\beta}_N \cdot \widehat{\lambda} \end{bmatrix} \end{align*}$

The first equation pins down the mean of the factor $x$ . The following $(2 \cdot N)$ equations identify the $\{\widehat{\alpha}_n,\widehat{\beta}_n\}_{n \in N}$ parameters governing the relationship between the risk factor and each asset’s excess returns. The final $N$ equations pin down the market price of risk, $\widehat{\lambda}$ , for exposure to the risk factor $x$ . A risk is “priced” if $\widehat{\lambda} \neq 0$ .
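Here is a rough sketch of how the moment conditions in Equation (10) can be taken to simulated data. It is a two-pass shortcut rather than full GMM with an optimal weighting matrix: the first $(2 \cdot N + 1)$ conditions are solved exactly by OLS, and $\widehat{\lambda}$ then minimizes the equally weighted sum of squared errors in the final $N$ conditions. The data-generating process and all parameter values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed data-generating process: r_{n,t} = beta_n * lam + beta_n * (x_t - mu_x) + z_{n,t}.
T, mu_x, sigma_x, sigma_z, lam_true = 20_000, 0.0, 1.0, 1.0, 0.5
beta_true = np.array([0.5, 1.0, 1.5, 2.0])
N = beta_true.size

x = mu_x + sigma_x * rng.standard_normal(T)
z = sigma_z * rng.standard_normal((T, N))
r = beta_true * lam_true + np.outer(x - mu_x, beta_true) + z  # shape (T, N)

# Pass 1: factor mean, then time-series OLS of each return on the demeaned factor.
mu_hat = x.mean()
xd = x - mu_hat
alpha_hat = r.mean(axis=0)                   # intercepts, shape (N,)
beta_hat = xd @ (r - alpha_hat) / (xd @ xd)  # slopes, shape (N,)

# Pass 2: cross-sectional regression (no intercept) of average returns on betas.
lam_hat = (beta_hat @ alpha_hat) / (beta_hat @ beta_hat)

print(beta_hat, lam_hat)
```

An efficient GMM implementation would weight the moments by the inverse of their covariance matrix and deliver standard errors for $\widehat{\lambda}$; this sketch only shows which sample moments identify which parameters.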

Note that this empirical strategy doesn’t pin down every single one of the parameters governing the relationship between the stochastic discount factor and each asset’s excess returns. e.g., the parameter estimates $\widehat{\alpha}_n$ and $\widehat{\lambda}$ are composites of several deep parameters:

(11) $\begin{align*} \widehat{\alpha}_n &= \alpha_n + \frac{\gamma_n}{2} \cdot \sigma_x^2 \\ \widehat{\lambda} &= - \, \left( \frac{\chi \cdot \sigma_x^2}{\phi + \frac{\psi}{2} \cdot \sigma_x^2} \right) \end{align*}$

The underlying parameters $\alpha_n$ and $\gamma_n$ as well as $\phi$ , $\chi$ , and $\psi$ are not identifiable from this approach since they satisfy conservation laws which leave the estimates for $\widehat{\alpha}_n$ and $\widehat{\lambda}$ unchanged:

(12) $\begin{align*} \frac{\partial \widehat{\alpha}_n}{\partial \alpha_n} \cdot \Delta \alpha_n + \frac{\partial \widehat{\alpha}_n}{\partial \gamma_n} \cdot \Delta \gamma_n = 0 &= \Delta \alpha_n + \frac{\sigma_x^2}{2} \cdot \Delta \gamma_n \\ \frac{\partial \widehat{\lambda}}{\partial \phi} \cdot \Delta \phi + \frac{\partial \widehat{\lambda}}{\partial \chi} \cdot \Delta \chi + \frac{\partial \widehat{\lambda}}{\partial \psi} \cdot \Delta \psi = 0 &= \left( \frac{\chi}{\phi + \frac{\psi}{2} \cdot \sigma_x^2} \right) \cdot \{\Delta \phi + \frac{\sigma_x^2}{2} \cdot \Delta \psi\} - \Delta \chi \end{align*}$

e.g., if you increase $\alpha_n$ by $\epsilon \approx 0^+$ and decrease $\gamma_n$ by $\frac{2}{\sigma_x^2} \cdot \epsilon$ , then the estimate of $\widehat{\alpha}_n$ remains unchanged.
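The identification problem can be checked numerically. The sketch below perturbs the deep parameters along directions that satisfy Equation (12) and confirms that the composites in Equation (11) do not move. Function names and parameter values are hypothetical.

```python
import math

def alpha_composite(alpha: float, gamma: float, sigma_x: float) -> float:
    """First line of Equation (11)."""
    return alpha + 0.5 * gamma * sigma_x**2

def lambda_composite(phi: float, chi: float, psi: float, sigma_x: float) -> float:
    """Second line of Equation (11)."""
    return -(chi * sigma_x**2) / (phi + 0.5 * psi * sigma_x**2)

# Hypothetical deep parameters.
alpha, gamma, sigma_x = 0.01, 0.4, 0.2
phi, chi, psi = 0.98, 2.0, 0.5
eps = 1e-3

# Raise alpha by eps and lower gamma by (2 / sigma_x^2) * eps.
a_base = alpha_composite(alpha, gamma, sigma_x)
a_shift = alpha_composite(alpha + eps, gamma - 2.0 * eps / sigma_x**2, sigma_x)

# Scaling phi, chi, and psi by a common factor is one direction that satisfies
# the second conservation law, so the price of risk is unchanged.
l_base = lambda_composite(phi, chi, psi, sigma_x)
l_shift = lambda_composite(1.1 * phi, 1.1 * chi, 1.1 * psi, sigma_x)

print(a_base, a_shift, l_base, l_shift)
```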

4. Time Scale Considerations

There is a hidden assumption floating around behind the empirical strategy outlined in Section $3$ above. Namely, that each asset’s factor exposure is constant and the market price of risk is constant. In practice, this is surely not the case as is documented in Jagannathan and Wang (1996) and Lewellen and Nagel (2006). OK… so constant factor exposures and prices of risk are an approximation. Fine. How good/bad an approximation is it? e.g., Fama and MacBeth (1973) use rolling $T = 60$ month windows to estimate each asset’s $\widehat{\beta}_n$. Is this too long a window relative to how much factor exposures vary over time? Alternatively, should we be using a longer window to more accurately pin down these parameters? It turns out that the estimation strategy gives some guidance about the relationship between the optimal estimation window and parameter persistence, which I discuss below.

First, I model the evolution of the true parameters. To test an asset pricing model using the cross-section of excess returns, we are interested in knowing whether or not $\widehat{\lambda} = 0$ . Suppose the true market price of risk, $\lambda$ , follows a random walk:

(13) $\begin{align*} \lambda_T = \lambda + \sum_{t=1}^T l_t \end{align*}$

where $l_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma_l^2)$ so that the final $\lambda_T$ is a random variable with distribution:

(14) $\begin{align*} \lambda_T \sim \mathrm{N}(\lambda, T \cdot \sigma_l^2) \end{align*}$

Second, I note that the estimation strategy outlined in Section $3$ above gives a signal, $\widehat{\lambda}$, about the average market price of risk with distribution:

(15) $\begin{align*} \widehat{\lambda} \sim \mathrm{N}\left(\lambda, \sfrac{\sigma_s^2}{T}\right) \end{align*}$

where $s_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma_s^2)$ denotes estimation error from the GMM procedure. There is an additional complication to consider. Namely, if the true market price of risk is floating around during the estimation period, it will add additional noise to the parameter estimates and increase $\sigma_s^2$. To keep things simple, suppose that nature sets the market price of risk to $\lambda$ at the beginning of the estimation sample and it remains constant during the estimation period. Then, $\lambda_T$ is revealed at the end of time $T$ and prevails afterwards. This means that the derivations below will be inequalities due to the underestimate of $\sigma_s^2$.

What I really care about is the distance between the true $\lambda_T$ at the end of the sample, which governs the market going forward, and the GMM estimate $\widehat{\lambda}$. Thus, I should choose my sample period length, $T$, to minimize:

(16) $\begin{align*} T = \arg \min_{T \geq 0} \mathrm{E}\left[ (\lambda_T - \widehat{\lambda})^2 \right] = \arg \min_{T \geq 0} \mathrm{E}\left[ (\lambda_T - \lambda)^2 + (\lambda - \widehat{\lambda})^2 \right] \end{align*}$

As a result, to find the optimal $T$ I take the first order condition:

(17) $\begin{align*} 0 = \frac{d}{dT} \left[ T \cdot \sigma_l^2 + \left(\frac{1}{\sigma_{\lambda}^2} + \frac{T}{\sigma_s^2} \right)^{-1} \right] \end{align*}$

where $\sigma_{\lambda}^2$ denotes the variance of my priors about the market price of risk governing the estimation sample $\lambda$ . The solution to this equation defines the window length, $T$ , which optimally trades off the benefit of getting a more precise estimate of $\lambda$ with the cost of decreasing the relevance of this estimate due to the evolution of $\lambda_T$ .

GMM maps $\sigma_s^2$ onto a parameter of the underlying model. To keep things simple, suppose there is only $1$ asset and $4$ unknown parameters:

(18) $\begin{align*} \widehat{\boldsymbol \theta} = \begin{bmatrix} \widehat{\mu}_x & \widehat{\alpha} & \widehat{\beta} & \widehat{\lambda} \end{bmatrix}^{\top} \end{align*}$

so that the system of estimation equations reduces to:

(19) $\begin{align*} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} &= \mathrm{E}[\mathrm{G}(\widehat{\boldsymbol \theta};r_t,x_t)] = \mathrm{E} \begin{bmatrix} x_t - \widehat{\mu}_x \\ r_t - \left\{ \widehat{\alpha} + \widehat{\beta} \cdot (x_t - \widehat{\mu}_x) \right\} \\ \left( r_t - \left\{ \widehat{\alpha} + \widehat{\beta} \cdot (x_t - \widehat{\mu}_x) \right\} \right) \cdot (x_t - \widehat{\mu}_x) \\ r_t - \widehat{\beta} \cdot \widehat{\lambda} \end{bmatrix} \end{align*}$

This assumption means that I don’t have to consider how learning about one asset affects my beliefs about another asset. In this world, if $x_t \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(\mu_x,\sigma_x^2)$ , then GMM reduces to OLS and $\sigma_s^2 = \sfrac{\sigma_z^2}{\beta_n^2}$ since:

(20) $\begin{align*} r_{n,t} = \beta_n \cdot \lambda + \beta_n \cdot (x_t - \mu_x) + z_{n,t} \end{align*}$

Evaluating the first order condition then gives:

(21) $\begin{align*} 0 = \sigma_l^2 - \left(\frac{1}{\sigma_{\lambda}^2} + \frac{T}{\sfrac{\sigma_z^2}{\beta_n^2}} \right)^{-2} \cdot \frac{1}{\sfrac{\sigma_z^2}{\beta_n^2}} \end{align*}$

Solving for $T$ yields:

(22) $\begin{align*} T &\geq \max\left\{ \, 0, \, \frac{\sigma_z}{\beta_n \cdot \sigma_l} - \frac{\sigma_z^2}{\beta_n^2 \cdot \sigma_{\lambda}^2} \, \right\} \end{align*}$

Let’s plug in some values to make sure this formula makes sense. First, notice that if the market price of risk is constant, $\lambda_T = \lambda$ , then $\sigma_l = 0$ and you should pick $T = \infty$ or as large as possible. Second, notice that if you already know the true $\lambda$ , then $\sigma_{\lambda}^2 = 0$ and you should pick $T = 0$ . Finally, notice that if the test asset has no exposure to the risk factor, $\beta_n = 0$ , then the equation is undefined since any window length gives you the same amount of information—i.e., none.
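These sanity checks can also be run numerically. The sketch below codes up the closed-form window length, floored at zero since a window can't be negative, and compares it with a brute-force grid search over the objective in Equation (17). The function names, the use of $|\beta_n|$, the handling of the edge cases, and the parameter values are my own assumptions.

```python
import math
import numpy as np

def optimal_window(sigma_z: float, beta: float, sigma_l: float, sigma_lambda: float) -> float:
    """Closed-form window length from the first order condition, floored at zero."""
    if beta == 0:
        raise ValueError("beta = 0: returns carry no information about lambda")
    if sigma_lambda == 0:
        return 0.0       # lambda is already known, so no data is needed
    if sigma_l == 0:
        return math.inf  # lambda never drifts, so use all available data
    sigma_s = sigma_z / abs(beta)  # assumption: only the magnitude of beta matters
    return max(0.0, sigma_s / sigma_l - sigma_s**2 / sigma_lambda**2)

def expected_sq_error(T, sigma_z, beta, sigma_l, sigma_lambda):
    """Objective inside Equation (17): drift variance plus posterior variance."""
    sigma_s2 = (sigma_z / beta) ** 2
    return T * sigma_l**2 + 1.0 / (1.0 / sigma_lambda**2 + T / sigma_s2)

# Hypothetical parameter values.
params = dict(sigma_z=1.0, beta=1.0, sigma_l=0.01, sigma_lambda=1.0)

T_star = optimal_window(**params)
grid = np.arange(0, 500)
T_grid = int(grid[np.argmin(expected_sq_error(grid, **params))])

print(T_star, T_grid)
```

For these values the closed form and the grid search pick the same window, and the two limiting cases in the text map onto the early returns in `optimal_window`.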