Alex – Page 13 – Research Notebook

Scaling Up “Iffy” Decisions

September 3, 2013 by Alex

1. Introduction

Imagine you are an algorithmic trader, and you have to set up a trading platform. How many signals should you try to process? How many assets should you trade? If you are like most people, your answer will be something like: “As many as I possibly can given my technological constraints.” Most people have the intuition that the only thing holding back algorithmic trading is computational speed. They think that left unchecked, computers will eventually uncover every trading opportunity no matter how small or obscure. e.g., as Scott Patterson writes in Dark Pools, these computerized trading platforms are just “tricked-out artificial intelligence systems designed to scope out hidden pockets in the market where they can ply their trades.”

This post highlights another concern. As you use computers to discover more and more “hidden pockets in the market”, the scale of these trades might grow faster than the usefulness of your information. Weak and diffuse signals might turn out to be really risky to trade on. How might this work? e.g., suppose that in order for your computers to recognize the impact of China’s GDP on US stock returns you need to feed them data for $500$ assets; however, in order for your computers to recognize the impact of recent copper discoveries in Guatemala on electronics manufacturing company returns, you need to feed your machines data on $5000$ assets. Even if the precision of the signal that your computers spit out is the same, its magnitude will be smaller. Copper discoveries in Guatemala just don’t matter as much. Thus, you would take on more risk by aggressively trading the second signal because it’s weaker and you would have to take on a position in $10$ times as many assets! Thus, this post suggests that even if you might want to maximize the number of inputs your machines get, you might want to limit how broadly you apply them.

First, in Section 2 I illustrate how the scale of a trading opportunity might increase faster than the precision of your signal via an example based on the well known Hodges’ Estimator. You should definitely check out Larry Wasserman’s excellent post on this topic for an introduction to the ideas. In Section 3 I then outline the basic decision problem I have in mind. In Section 4 I show how the risk associated with trading weak signals might explode as described above. Finally, in Section 5 I conclude by suggesting an empirical application of these ideas.

2. Hodges’ Estimator

Think about the following problem. Suppose that $m_1, m_2, \ldots, m_Q \overset{\scriptscriptstyle \mathrm{iid}}{\sim} N(\mu,1)$ are random variables denoting (say) innovations in the quarterly dividends of a group of $Q$ stocks:

(1) $\begin{align*} m_q &= d_{q,t+1} - \mathrm{E}_t[d_{q,t+1}] \end{align*}$

You want to know if the mean of the dividend changes is non-zero. i.e., has this group of stocks realized a shock? Define the estimator $\mathrm{Hdg}[Q]$ as follows:

(2) $\begin{align*} \mathrm{Hdg}[Q] &= \begin{cases} \mathrm{Avg}[Q] &\text{if } |\mathrm{Avg}[Q]| \geq Q^{-1/4} \\ 0 &\text{else } \end{cases} \end{align*}$

where $\mathrm{Avg}[Q] = \frac{1}{Q} \cdot \sum_{q=1}^Q m_q$ . This estimator says: “If the average of the dividend changes is sufficiently big, I’m going to assume there has been a shock of size $\mathrm{Avg}[Q]$ ; otherwise, I’ll assume that there’s been no shock.” This is an example of a Hodges-type estimator. $\mathrm{Hdg}[Q]$ is a consistent estimator of $\mu$ in the sense that:

(3) $\begin{align*} \sqrt{Q} \cdot (\mathrm{Hdg}[Q] - \mu) &\overset{\scriptscriptstyle \mathrm{Dist}}{\to} \begin{cases} N(0,1) &\text{if } \mu \neq 0 \\ 0 &\text{if } \mu = 0 \end{cases} \end{align*}$

Thus, as you examine more and more stocks from this group, you are guaranteed to discover the true mean. However, the worst case expected loss of the estimator, $\mathrm{Hdg}[Q]$ , is infinite!

(4) $\begin{align*} \sup_\mu \mathrm{E}\left[ Q \cdot (\mathrm{Hdg}[Q] - \mu)^2 \right] &\to \infty \end{align*}$

This is true even though the worst case loss for the sample mean, $\mathrm{Avg}[Q]$ , is flat:

(5) $\begin{align*} \sup_\mu \mathrm{E}\left[ Q \cdot (\mathrm{Avg}[Q] - \mu)^2 \right] &\to 1 \end{align*}$

I plot the risk associated with Hodges’ estimator in the figure below. What’s going on here? Well, as $Q$ gets bigger and bigger, there remains a region around $\mu = 0$ where you are quite certain that the mean is $0$ . However, at the edge of this region, there is a band of values of $\mu$ where if you are wrong and $\mu \neq 0$ , then your prediction error summed across every one of the $Q$ stocks you examined turns out to be quite big.
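To make the growing worst-case risk concrete, here is a minimal Python sketch that evaluates the scaled risk $\mathrm{E}[Q \cdot (\mathrm{Hdg}[Q] - \mu)^2]$ in closed form, using the fact that $\mathrm{Avg}[Q] \sim N(\mu, 1/Q)$ . It assumes the two-sided threshold $|\mathrm{Avg}[Q]| \geq Q^{-1/4}$ ; the function names and the grid of $\mu$ values are my own illustrative choices rather than anything from the original figure code.

```python
import math


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def hodges_scaled_risk(mu, Q):
    """Exact E[Q * (Hdg[Q] - mu)^2] when Avg[Q] ~ N(mu, 1/Q).

    Assumes Hdg[Q] = Avg[Q] if |Avg[Q]| >= Q^(-1/4) and 0 otherwise.
    """
    root_q = math.sqrt(Q)
    cutoff = Q ** -0.25
    # Hdg[Q] = 0 exactly when z = sqrt(Q) * (Avg[Q] - mu) lies in (lo, hi).
    lo = root_q * (-cutoff - mu)
    hi = root_q * (cutoff - mu)
    prob_zero = norm_cdf(hi) - norm_cdf(lo)
    # E[z^2 * 1{z outside (lo, hi)}] for a standard normal z.
    tail_second_moment = 1.0 - prob_zero + hi * norm_pdf(hi) - lo * norm_pdf(lo)
    return prob_zero * Q * mu ** 2 + tail_second_moment


def worst_case_risk(Q, grid_points=2001):
    """Largest scaled risk over a grid of mu values in [0, 2 * Q^(-1/4)]."""
    top = 2.0 * Q ** -0.25
    grid = [top * i / (grid_points - 1) for i in range(grid_points)]
    return max(hodges_scaled_risk(mu, Q) for mu in grid)


if __name__ == "__main__":
    for Q in (10**2, 10**4, 10**6):
        print(Q, round(worst_case_risk(Q), 2))
```

The same scaled risk for the sample mean is $1$ at every $\mu$ and every $Q$ , so any value well above $1$ in the output is the cost of thresholding near the edge of the "assume zero" region.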

3. Decision Problem

Now, think about a decision problem that generates a really similar decision rule—namely, try to figure out how many shares, $a$ , to purchase where the stock’s payout is determined by $N$ different attributes via the coefficients ${\boldsymbol \mu}$ :

(6) $\begin{align*} \max_a V(a;{\boldsymbol \mu},\mathbf{x}) &= \max_a V(a) = \max_a \left\{ - \frac{\gamma}{2} \cdot \left( a - \sum_{n=1}^N \mu_n \cdot x_n \right)^2 \right\} \end{align*}$

Here, you are trying to maximize your value, $V$ , by choosing the right number of shares to hold, i.e., the risky action. Ideally, you would take exactly the action $a = \sum_{n=1}^N \mu_n \cdot x_n$ ; however, it’s hard to figure out the exact loadings, ${\boldsymbol \mu}$ , for every single one of the $N$ relevant dimensions. As a result, you take an action $a$ based on your best guess at the loadings, $\mathbf{m}$ , which isn’t exactly perfect:

(7) $\begin{align*} a &= A(\mathbf{m};\mathbf{x}) = A(\mathbf{m}) = \sum_{n=1}^N m_n \cdot x_n \end{align*}$

I use the function $L(\cdot)$ to denote the loss in value you suffer by choosing a suboptimal asset holding:

(8) $\begin{align*} L(\mathbf{m};{\boldsymbol \mu}) = L(\mathbf{m}) &= \mathrm{E}\left[ \ V(A({\boldsymbol \mu}); {\boldsymbol \mu}, \mathbf{x}) - V(A(\mathbf{m}); {\boldsymbol \mu}, \mathbf{x}) \ \right] \\ &= \frac{\gamma}{2} \cdot \mathrm{E}\left[ \left( \sum_{n=1}^N (m_n - \mu_n) \cdot x_n \right)^2 \right] \\ &= \frac{\gamma}{2} \cdot \mathrm{E}\left[ \sum_{n=1}^N (m_n - \mu_n)^2 \right] \end{align*}$

where the $3$rd line follows from assuming that $x_n \overset{\scriptscriptstyle \mathrm{iid}}{\sim} N(0,1)$ , so that $\mathrm{E}[x_n \cdot x_{n'}] = 0$ for $n \neq n'$ and $\mathrm{E}[x_n^2] = 1$ .
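The step from the squared sum to the sum of squares in Equation (8) can be checked numerically. The sketch below is a quick Monte Carlo check under the stated assumption that the attributes are iid standard normal; the gap values standing in for $m_n - \mu_n$ and the function name are hypothetical.

```python
import random


def quadratic_gap_monte_carlo(gaps, draws=100_000, seed=0):
    """Monte Carlo estimate of E[(sum_n gap_n * x_n)^2] with x_n iid N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        inner = sum(g * rng.gauss(0.0, 1.0) for g in gaps)
        total += inner * inner
    return total / draws


if __name__ == "__main__":
    gaps = [0.3, -0.2, 0.5]  # hypothetical values of m_n - mu_n
    # The two printed numbers should be close: cross terms average out to zero.
    print(quadratic_gap_monte_carlo(gaps), sum(g * g for g in gaps))
```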

So which of the $N$ different details should you pay attention to? Which of them should you ignore? One way to frame this problem would be to look for a sparse solution and only ask your computers to trade on the signals that are sufficiently important. Gabaix (2012) shows how to do this using an $\ell_1$ -program; I outline the main ideas in an earlier post. Basically, you would try to minimize your loss from taking a suboptimal action subject to an $\ell_1$ -penalty:

(9) $\begin{align*} L(\mathbf{m}) &= \min_{\mathbf{m} \in \mathrm{R}^N} \left\{ \frac{\gamma}{2} \cdot \sum_{n=1}^N (m_n - \mu_n)^2 + \kappa \cdot \sum_{n=1}^N |m_n| \right\} \end{align*}$

so that your optimal choice of actions is given by:

(10) $\begin{align*} m_n &= \begin{cases} \mu_n &\text{if } |\mu_n| \geq \kappa \\ 0 &\text{if } |\mu_n| < \kappa \end{cases} \end{align*}$

That’s the decision problem. So far everything is old hat.
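As a small illustration of the rule in Equation (10) as written, the sketch below zeroes out the loadings that fall under the cutoff and reports the loss from the third line of Equation (8). The loadings, the value of $\kappa$ , and the function names are hypothetical.

```python
def threshold_loadings(mu, kappa):
    """Equation (10) as written: keep mu_n when |mu_n| >= kappa, else use 0."""
    return [mu_n if abs(mu_n) >= kappa else 0.0 for mu_n in mu]


def expected_loss(m, mu, gamma=1.0):
    """Third line of Equation (8): (gamma / 2) * sum_n (m_n - mu_n)^2."""
    return 0.5 * gamma * sum((m_n - mu_n) ** 2 for m_n, mu_n in zip(m, mu))


if __name__ == "__main__":
    mu = [0.9, -0.05, 0.3, 0.02]  # hypothetical true loadings
    m = threshold_loadings(mu, kappa=0.1)
    print(m, expected_loss(m, mu))
```

The only loss comes from the ignored dimensions, so it equals $\frac{\gamma}{2}$ times the sum of the squared loadings that sit below $\kappa$ .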

4. Maximum Risk

The key insight in this post is that these $\mu_n$ terms don’t come down from on high. As a trader, you don’t just know what these terms are from the outset. They weren’t stitched onto the forehead of your favorite teddy bear. Instead, you have to use data on lots of stocks to estimate them as you go. In this section, I think about a world where you feed data on $Q$ different stocks to your machines, and each of these assets has the appropriate action $\sum_{n=1}^N \mu_{n,q} \cdot x_{n,q}$ . I then investigate what happens when you use the estimator:

(11) $\begin{align*} \widetilde{m}_n &= \begin{cases} \mathrm{Avg}_n[Q] &\text{if } |\mathrm{Avg}_n[Q]| \geq \kappa \\ 0 &\text{if } |\mathrm{Avg}_n[Q]| < \kappa \end{cases} \end{align*}$

instead of the estimator in Equation (10) where $\mathrm{Avg}_n[Q] = \frac{1}{Q} \cdot \sum_{q=1}^Q \mu_{n,q}$ and $\kappa$ isn’t growing that fast as $Q$ gets larger and larger. i.e., we have that:

(12) $\begin{align*} \widetilde{m}_n &= \mathrm{Avg}_n[Q] \cdot 1_{\{ |\mathrm{Avg}_n[Q]| \geq \kappa \}} \quad \text{with} \quad \kappa = \mathrm{O}(\sqrt{2 \log Q}) \end{align*}$

In some senses, $\widetilde{\mathbf{m}}$ is still a really good estimator of ${\boldsymbol \mu}$ . To see this, let $\mu_n$ denote the true effect for a particular attribute. Then, clearly for $|\mu_n| \geq \kappa$ , we have that:

(13) $\begin{align*} \sqrt{Q} \cdot (\widetilde{m}_n - \mu_n) &\overset{\scriptscriptstyle \mathrm{Dist}}{\to} \mathrm{N}(0,1) \end{align*}$

Similarly, for $\mu_n = 0$ , we have that:

(14) $\begin{align*} \sqrt{Q} \cdot (\widetilde{m}_n - \mu_n) &\overset{\scriptscriptstyle \mathrm{Dist}}{\to} 0 \end{align*}$

Thus, at any fixed $\mu_n$ , $\widetilde{m}_n$ is asymptotically at least as good an estimator of $\mu_n$ as the sample average, and it is strictly better at $\mu_n = 0$ . Pretty cool.

However, in another sense, it’s a terrible estimator since the maximum risk associated with trading on $\widetilde{\mathbf{m}}$ is unbounded:

(15) $\begin{align*} \sup_{\mu_n} R_n(\widetilde{\mathbf{m}}) &\to \infty \quad \text{where} \quad R_n(\widetilde{\mathbf{m}}) = \mathrm{E}\left[ Q \cdot (\widetilde{m}_n - \mu_n)^2 \right] \end{align*}$

This result means that if you use this estimator to trade on, there are parameter values for $\mu_n$ which lead your computer to take on positions that are infinitely risky and make you really unhappy! How is this possible? Well, take a look at the following decomposition of the risk associated with the estimator $\widetilde{\mathbf{m}}$ :

(16) $\begin{align*} R_n(\widetilde{\mathbf{m}}) &= \mathrm{E}\left[ Q \cdot (\widetilde{m}_n - \mu_n)^2 \right] \\ &= \mathrm{E}\left[ Q \cdot (\mathrm{Avg}_n[Q] \cdot 1_{\{ |\mathrm{Avg}_n[Q]| \geq \kappa \}} - \mu_n)^2 \right] \\ &= \mathrm{Pr}\left[ |\mathrm{Avg}_n[Q]| \geq \kappa \right] \cdot \mathrm{E}\left[ Q \cdot (\mathrm{Avg}_n[Q] - \mu_n)^2 \right] + \mathrm{Pr}\left[ |\mathrm{Avg}_n[Q]| < \kappa \right] \cdot \mathrm{E}\left[ Q \cdot \mu_n^2 \right] \end{align*}$

It turns out that there are choices of $\mu_n$ for which $\mathrm{Pr}\left[ |\mathrm{Avg}_n[Q]| < \kappa \right] \to 1$ as $Q \to \infty$ , but at the same time $\mathrm{E}\left[ Q \cdot \mu_n^2 \right] \to \infty$ for the same process. In words, this means that there are choices of $\mu_n$ which make dimension $n$ always appear to your machines as an unimportant detail no matter how many stocks you look at, but this conclusion is still risky enough that if you applied it to every one of these $Q$ stocks you’d get an infinitely risky portfolio.

e.g., consider the case where:

(17) $\begin{align*} \mu_n &= \hbar \cdot Q^{-1/4} \quad \text{with} \quad 0 < \hbar < 1 \end{align*}$

Then, it’s easy to see that:

(18) $\begin{align*} \mathrm{Pr}_{\mu_n} \left[ \ |\mathrm{Avg}_n[Q]| < Q^{-1/4} \ \right] &= \mathrm{Pr}_{\mu_n} \left[ \ - Q^{-1/4} < \mathrm{Avg}_n[Q] < Q^{-1/4} \ \right] \\ &= \mathrm{Pr}_{\mu_n} \left[ \ \sqrt{Q} \cdot \left(- Q^{-1/4} - \mu_n \right) < z < \sqrt{Q} \cdot \left( Q^{-1/4} - \mu_n \right) \ \right] \\ &= \mathrm{Pr}_{\mu_n} \left[ \ - Q^{1/4} \cdot ( 1 + \hbar ) < z < Q^{1/4} \cdot ( 1 - \hbar ) \ \right] \end{align*}$

where $z \overset{\scriptscriptstyle \mathrm{iid}}{\sim} N(0,1)$ . But, this means that as $Q \to \infty$ , we have that:

(19) $\begin{align*} \mathrm{Pr}_{\mu_n} \left[ |\mathrm{Avg}_n[Q]| < \kappa \right] &\to 1 \quad \text{while} \quad \mathrm{E}\left[ Q \cdot \mu_n^2 \right] \to \infty \end{align*}$

Thus, we have a proof by explicit construction. The punchline is that unleashing your machines on the market might lead them into situations where the scale of the position grows faster than the precision of the signal.
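The two limits in this construction are easy to tabulate. The sketch below evaluates the last line of Equation (18) with the standard normal CDF and computes $Q \cdot \mu_n^2 = \hbar^2 \cdot \sqrt{Q}$ for $\mu_n = \hbar \cdot Q^{-1/4}$ , taking $\kappa = Q^{-1/4}$ as in Equation (18). The value $\hbar = 1/2$ and the function names are illustrative assumptions.

```python
import math


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ignore_probability(Q, hbar):
    """Pr[|Avg_n[Q]| < Q^(-1/4)] when mu_n = hbar * Q^(-1/4), via Equation (18)."""
    q4 = Q ** 0.25
    return norm_cdf(q4 * (1.0 - hbar)) - norm_cdf(-q4 * (1.0 + hbar))


def scaled_squared_bias(Q, hbar):
    """Q * mu_n^2 = hbar^2 * sqrt(Q) for the same choice of mu_n."""
    return hbar ** 2 * math.sqrt(Q)


if __name__ == "__main__":
    for Q in (10**2, 10**4, 10**6, 10**8):
        print(Q, round(ignore_probability(Q, 0.5), 6), scaled_squared_bias(Q, 0.5))
```

The first column of output heads to $1$ while the second grows like $\sqrt{Q}$ , which is the pair of limits in Equation (19).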

5. Empirical Prediction

What does this result mean in the real world? Well, the argument in the section above relied on the fact that when you used $Q$ different assets to derive a signal, you also then traded on all $Q$ assets. i.e., this is why the risk function has a factor of $Q$ in it. Thus, one way to get around this problem would be to commit to trading a smaller number of assets, $Q'$ , where:

(20) $\begin{align*} Q \gg Q' \end{align*}$

even though you had to use $Q$ assets to recover the signal. i.e., even if you have to grab signals from the $4$ corners of the market to create your trading strategy, you might nevertheless specialize in trading only a few assets so that in contrast to Equation (19) you would always have:

(21) $\begin{align*} \mathrm{Pr}_{\mu_n} \left[ |\mathrm{Avg}_n[Q]| < \kappa \right] &\to 1 \quad \text{while} \quad \mathrm{E}\left[ Q' \cdot \mu_n^2 \right] \to \mathrm{const} < \infty \end{align*}$

If traders are actually using signals from lots of assets, but only trading a few of them to avoid the infinite risk problem, then this theory would give a new motivation for excess comovement. i.e., you and I might feed the same data into our machines, get the same output, and choose to trade on entirely different subsets of stocks.
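For the particular construction in Section 4, where $\mu_n = \hbar \cdot Q^{-1/4}$ , the traded-position term is $Q' \cdot \mu_n^2 = \hbar^2 \cdot Q' / \sqrt{Q}$ , so it stays bounded whenever $Q'$ grows no faster than $\sqrt{Q}$ . The sketch below compares trading all $Q$ assets with trading only $Q' = \sqrt{Q}$ of them; that specific rule for $Q'$ , the value of $\hbar$ , and the function name are my own assumptions for illustration.

```python
import math


def position_risk(Q, Q_traded, hbar=0.5):
    """Q' * mu_n^2 when mu_n = hbar * Q^(-1/4) is estimated from Q assets."""
    mu_n = hbar * Q ** -0.25
    return Q_traded * mu_n ** 2


if __name__ == "__main__":
    for Q in (10**4, 10**6, 10**8):
        trade_all = position_risk(Q, Q)
        trade_few = position_risk(Q, math.isqrt(Q))  # assumed rule: Q' = sqrt(Q)
        print(Q, trade_all, trade_few)
```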

Identifying Relevant Asset Pricing Time Scales

August 21, 2013 by Alex

1. Introduction

Take a look at the figure below which displays the price level and trading volume of the S&P 500 SPDR over the trading year from July 2012 to July 2013. The solid black line in the top panel shows the price process for the ETF at a daily frequency. Look at how much within day variation in the trading volume there is. You can see that the number of shares traded per day ranges over several orders of magnitude from roughly $1 \times 10^6$ to $300 \times 10^6$ . What’s more, the red vertical lines in the top figure show the intraday range for the traded price, and these bands stretch $\mathdollar 10$ per share in some cases. People are trading this ETF at all sorts of different investment horizons. Any model fit to daily data will ignore really interesting economics operating at these shorter investment horizons. Vice versa, any model fit to higher frequency minute-by-minute data will miss out on some longer buy-and-hold decisions.

In this post, I ask a pair of questions: (a) “Is it possible to recover the most ‘important’ investment horizons from the time series of SPDR prices?” and (b) “What statistical techniques might you use to do this?”

I work in reverse order. After outlining a toy stochastic process with a clear time scale pattern in Section 2, I start the real work in Sections 3 and 4 by discussing a pair of different statistical tools you might use to uncover important time scales in this asset market. Here, when you read the word ‘important’ you should think ‘time scales where people are actually making decisions’. In Section 3, I outline the standard approach in asset pricing of using a time series regression with multiple lags. Then, in Section 4, I show how this technique has an equivalent waveform representation. After reading Sections 3 and 4 it may seem like it’s always possible to recover the relevant investment horizons from a financial time series. In Sections 5 and 6, I end on a down note by giving a counter example. The offending stochastic process is a workhorse in financial economics, namely the Ornstein-Uhlenbeck process. Thus, the answer to question (a) seems to be: No.

You can find all of the code to create the figures below here.

2. Toy Stochastic Process

The next $2$ sections discuss different ways of recovering the relevant time scales from a financial data series. In particular, I am interested in the time series of log prices as I don’t want to have to worry about the series going negative. I define returns, $r_{t + \Delta t}$ , as:

(1) $\begin{align*} r_{t + \Delta t} &= \log p_{t+\Delta t} - \log p_t \end{align*}$

and assume that both log prices and returns are wide-sense stationary so that:

(2) $\begin{align*} \mathrm{E}[x_t] &= 0 \quad \text{and} \quad \mathrm{E}[x_t \cdot x_{t - h \cdot \Delta t}] = \mathrm{C}(h) \cdot \sigma^2 \end{align*}$

for $x_t \in \{ \log p_t, r_t \}$ where $\mathrm{C}(h)$ denotes the $h$ -period ahead autocorrelation function. I use $+\Delta t$ instead of the usual $+1$ in the time subscripts above because I want to emphasize the fact that the log price and return time series are scale dependent. In the analysis below, I’m going to think about running the analysis at the daily horizon so that $\Delta t = 1{\scriptstyle \mathrm{day}}$ .

To make this problem concrete, I use a particular numerical example:

(3) $\begin{align*} \log p_t &= \frac{95}{100} \cdot \left( \frac{1}{7} \cdot \sum_{h=1}^7 \log p_{(t - h){\scriptscriptstyle \mathrm{days}}} \right) - \frac{95}{100} \cdot \left( \frac{1}{30} \cdot \sum_{h=1}^{30} \log p_{(t - h){\scriptscriptstyle \mathrm{days}}} \right) + \frac{1}{10} \cdot \varepsilon_t \end{align*}$

I plot a year’s worth of daily data from this process (err… I am using calendar time rather than market time… so think $365$ not $252$ ) in the plot above. This process says that the log price today will go up by $95$ cents whenever the average log price over the last week was $1$ unit higher, but it will go down by $95$ cents whenever the average log price over the last month was $1$ unit higher. It’s a nice example to work with because there is an obvious pattern in log prices with a period of just south of $2$ months.
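Here is a minimal sketch of how one might simulate Equation (3) at a daily frequency. It is not the original figure code: the standard normal shocks $\varepsilon_t$ , the zero starting history, the burn-in length, and the function name are all assumptions I have added.

```python
import random


def simulate_toy_log_price(n_days=365, burn_in=500, seed=0):
    """Simulate Equation (3) at a daily frequency, starting from zeros."""
    rng = random.Random(seed)
    history = [0.0] * 30  # most recent value last; zero start is an assumption
    path = []
    for _ in range(burn_in + n_days):
        week_avg = sum(history[-7:]) / 7.0
        month_avg = sum(history[-30:]) / 30.0
        # epsilon_t is assumed to be a standard normal shock.
        value = 0.95 * week_avg - 0.95 * month_avg + 0.1 * rng.gauss(0.0, 1.0)
        history.append(value)
        history.pop(0)
        path.append(value)
    return path[burn_in:]


if __name__ == "__main__":
    log_prices = simulate_toy_log_price()
    print(len(log_prices), min(log_prices), max(log_prices))
```

Dropping the burn-in period keeps the arbitrary zero starting values from showing up in the simulated year.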

3. Autoregressive Representation

The most common way of accounting for time series predictability in asset pricing is to use an autoregression. e.g., you might run a regression of the log price level today on the log price level yesterday, on the log price level the day before, on the log price level the day before that, and so on…

(4) $\begin{align*} \log p_t &= \sum_{h=1}^H \beta_h \cdot \log p_{t-h} + \xi_{h,t} \end{align*}$

Note that, because of the wide-sense stationarity of the log price process, the coefficient on each lag, when estimated one lag at a time, simplifies to just the horizon-specific autocorrelation:

(5) $\begin{align*} \beta_h &= \frac{\mathrm{C}(h) \cdot \mathrm{StD}[\log p_t] \cdot \mathrm{StD}[\log p_{t-h}]}{\mathrm{Var}[\log p_{t-h}]} \quad \text{and} \quad \mathrm{StD}[\log p_t] = \mathrm{StD}[\log p_{t-h}] \end{align*}$

Why might this approach make sense? First, for the process described in Equation (3), it’s obvious that the log price time series has an autoregressive representation since I constructed it that way. Second and more generally, this approach will hold due to Wold’s Theorem which states that every covariance stationary time series $x_t$ can be written as the sum of $2$ time series with the first time series completely deterministic and the second completely random:

(6) $\begin{align*} x_t &= \eta_t + \sum_{h=0}^{\infty} \mathrm{C}(h) \cdot \epsilon_{t-h}, \quad \sum_{h=1}^{\infty} |\mathrm{C}(h)|^2 < \infty \text{ and } \mathrm{C}(0) = 1 \end{align*}$

Here, $\eta_t$ is the completely deterministic time series and $\epsilon_t$ is the completely random white noise time series. The figure below shows the coefficient estimates, $\widehat{\mathrm{C}(h)}$ , from projecting the log price time series onto its past realizations for lags of anywhere from $h = 1{\scriptstyle \mathrm{day}}$ to $h = 3{\scriptstyle \mathrm{months}}$ .
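The estimate $\widehat{\mathrm{C}(h)}$ at a single lag can be computed directly from a series. The sketch below uses the usual sample autocorrelation (demeaned cross products over the full-sample sum of squares), which is one common convention and not necessarily the exact estimator behind the figure; the function name is hypothetical.

```python
def sample_autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    demeaned = [x - mean for x in series]
    variance_sum = sum(d * d for d in demeaned)
    cross_sum = sum(demeaned[t] * demeaned[t - lag] for t in range(lag, n))
    return cross_sum / variance_sum


if __name__ == "__main__":
    # A series that flips sign every day is almost perfectly negatively
    # autocorrelated at lag 1 and positively autocorrelated at lag 2.
    flips = [1.0, -1.0] * 50
    print(sample_autocorrelation(flips, 1), sample_autocorrelation(flips, 2))
```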

4. Waveform Representation

Fun fact: There is also a waveform representation of the same autocorrelation function:

(7) $\begin{align*} \mathrm{C}(h) &= \int_{f \geq 0} \mathrm{S}(f) \cdot e^{i \cdot f \cdot h \cdot \Delta t} \cdot df \end{align*}$

This representation will always exist whenever the data have translational symmetry. i.e., put yourself in the role of a trader thinking about buying a share of the S&P 500 SPDR again. If you had to make a prediction about tomorrow’s price level as a function of the log price level today, its value $1$ week ago, and its value $1$ month ago, you wouldn’t really care whether the current year was 1967, 1984, 1999, or 2013. This is just another way of saying that the autocorrelation coefficients only depend on the time gap.

Where does this alternative representation come from? Why translational symmetry? Plane waves turn out to be the eigenfunctions of the translation operator, $\mathrm{T}_\theta[\cdot]$ :

(8) $\begin{align*} \mathrm{T}_\theta[\mathrm{C}(h)] &= \mathrm{C}(h - \theta) \end{align*}$

In the context of this note, the translation operator eats autocorrelation functions and returns the value $\theta$ time periods to the right. i.e., if $\mathrm{C}(4{\scriptstyle \mathrm{days}})$ gave you the autocorrelation between the log price at any two points in time that are $4$ days apart, then $\mathrm{T}_{1{\scriptscriptstyle \mathrm{day}}}[\mathrm{C}(4{\scriptstyle \mathrm{days}})]$ would give you the autocorrelation between the log price at any two points in time that are $3$ days apart. Note that the translation operator is linear since translating the sum of functions is the same as the sum of translated functions. Thus, just as if $\mathrm{T}_\theta[\cdot]$ was a matrix, we can ask for the eigenfunctions of $\mathrm{T}_\theta[\cdot]$ written as $\mathrm{C}_{f}$ :

(9) $\begin{align*} \mathrm{T}_\theta[\mathrm{C}_f(h)] &= \mathrm{C}_f(h - \theta) = \lambda_{f,\theta} \cdot \mathrm{C}_f(h) \end{align*}$

These eigenfunctions are given by the complex plane waves with $\mathrm{C}_f(h) = e^{i \cdot f \cdot h \cdot \Delta t}$ and $\lambda_{f,\theta} = e^{-i \cdot f \cdot \theta \cdot \Delta t}$ , as you can check by substituting $h - \theta$ for $h$ in the exponent.

As a result, we can think about recovering all the information in the autocorrelation function at horizon $h$ by projecting it onto the eigenfunctions $\{ \mathrm{C}_f(h) \}_{f \geq 0}$ as depicted in the figure above known as a spectral density plot. This figure shows the results of $100$ regressions at frequencies in the range $[1/100{\scriptstyle \mathrm{days}},1/3{\scriptstyle \mathrm{days}}]$ :

(10) $\begin{align*} \log p_t &= \hat{a}_f \cdot \sin(f \cdot t) + \hat{b}_f \cdot \cos(f \cdot t) + \xi_{f,t} \end{align*}$

Roughly speaking, the coefficients $\hat{a}_f$ and $\hat{b}_f$ capture how predictive fluctuations at the frequency $f$ in units of $1/\mathrm{days}$ are of future log price movements. Thus, the summary statistic $(\hat{a}_f^2 + \hat{b}_f^2)/2$ captures how much of the variation in log prices is explained by historical movements at the frequency $f$ . This statistic is known as the power of the log price series at a particular frequency.
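One of the regressions in Equation (10) can be sketched as a two-regressor least squares fit solved through its normal equations. I am assuming here that $f$ is measured in cycles per day, so the regressors are $\sin(2\pi \cdot f \cdot t)$ and $\cos(2\pi \cdot f \cdot t)$ , and that the series is already mean zero as in Equation (2); the function name is hypothetical.

```python
import math


def power_at_frequency(series, cycles_per_day):
    """OLS fit of x_t on sin and cos at one frequency, as in Equation (10)."""
    omega = 2.0 * math.pi * cycles_per_day
    s = [math.sin(omega * t) for t in range(len(series))]
    c = [math.cos(omega * t) for t in range(len(series))]
    # Entries of the 2x2 normal equations.
    ss = sum(v * v for v in s)
    cc = sum(v * v for v in c)
    sc = sum(u * v for u, v in zip(s, c))
    sx = sum(u * x for u, x in zip(s, series))
    cx = sum(v * x for v, x in zip(c, series))
    det = ss * cc - sc * sc
    a_hat = (sx * cc - cx * sc) / det
    b_hat = (cx * ss - sx * sc) / det
    return a_hat, b_hat, 0.5 * (a_hat ** 2 + b_hat ** 2)


if __name__ == "__main__":
    # A pure sine wave with amplitude 3 and a 20 day period.
    wave = [3.0 * math.sin(2.0 * math.pi * t / 20.0) for t in range(200)]
    print(power_at_frequency(wave, 1.0 / 20.0))
```

Running this at each frequency on a grid and plotting the third return value against frequency gives a rough spectral density plot.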

The Wiener-Khintchine Theorem formally links these two different ways of looking at the same autocorrelation information:

(11) $\begin{align*} \mathrm{C}(h) &= \sum_{f \geq 0} \mathrm{S}(f) \cdot e^{i \cdot f \cdot h \cdot \Delta t} \cdot \Delta f \quad \text{and} \quad \mathrm{S}(f) = \sum_{h \geq 0} \mathrm{C}(h) \cdot e^{- i \cdot f \cdot h \cdot \Delta t} \cdot \Delta h \end{align*}$

Using Euler’s formula that $e^{i \cdot x} = \cos(x) + i \cdot \sin(x)$ and keeping only the real component yields the following mapping from frequency space to autocorrelation space:

(12) $\begin{align*} \mathrm{C}_{\mathrm{WK}}(h) &= \sum_{f=0}^F \left( \frac{\widehat{\mathrm{S}(f)}}{\sum_{f'=0}^F \widehat{S(f')}} \right) \cdot \cos(f \cdot h \cdot \Delta t) \end{align*}$

where I assume that the range $[0,F]$ covers a sufficient amount of the relevant frequency spectrum. The figure below verifies the mathematics by showing the close empirical fit between the two calculations.
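The mapping in Equation (12) is a power-weighted average of cosines. The sketch below assumes frequencies measured in cycles per day and $\Delta t = 1$ day, so the cosine argument becomes $2\pi \cdot f \cdot h$ ; the function name and the single-frequency example are hypothetical.

```python
import math


def autocorrelation_from_spectrum(freqs, powers, lag):
    """Equation (12): power-weighted average of cos(2 * pi * f * lag)."""
    total_power = sum(powers)
    return sum(
        (p / total_power) * math.cos(2.0 * math.pi * f * lag)
        for f, p in zip(freqs, powers)
    )


if __name__ == "__main__":
    # With all power at a 20 day period, the implied autocorrelation is a
    # cosine: 1 at lag 0, about 0 at lag 5, and -1 at lag 10.
    for lag in (0, 5, 10):
        print(lag, autocorrelation_from_spectrum([1.0 / 20.0], [4.5], lag))
```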

5. A Counter Example

After giving some tools to mine relevant time scales from financial time series in the previous sections, I conclude by giving an example of a simple stochastic process which thumbs its nose at these tools. Before actually looking at the example, it’s worthwhile to stop for a moment to think about the sort of process which might be hard to handle. You can see glimpses of it in the analysis above. Specifically, note how even though I created the time series in Equation (3) using a $7$ day moving average and a $30$ day moving average, there is no evidence of these $2$ time horizons in the sample autocorrelation coefficients. It’s not as if the figure shows a coefficient of:

(13) $\begin{align*} \widehat{\mathrm{C}(h)} &= \frac{95}{100} \cdot \left( \frac{1}{7} - \frac{1}{30} \right) \end{align*}$

for all lags $h \leq 7{\scriptstyle \mathrm{days}}$ . Likewise, the spectral density of the process shows a peak at somewhere between $1/7{\scriptstyle \mathrm{days}}$ and $1/30{\scriptstyle \mathrm{days}}$ . Thus, the time scale we see in the raw data is an emergent feature of the interaction of both the weekly and monthly effects. Intuitively, it would be very hard to identify the economically relevant time scale from a stochastic process where interesting features emerge at all time scales.

Ornstein and Uhlenbeck gave an example of just such a stochastic process. Take a look at the figure above which plots the following Ornstein-Uhlenbeck (OU) process:

(14) $\begin{align*} d \log p_t &= \theta \cdot \left( \mu - \log p_t \right) \cdot dt + \sigma \cdot d\xi_t, \quad \text{with} \quad \mu = 0, \ \theta = 1 - e^{- \log 2 / 15}, \ \sigma = 1/10 \end{align*}$

With $dt = 1$ day, the equation above reads: “Daily changes in the log price are $0$ on average. However, the log price realizes daily kicks on the order of $1/10$ th of a percent, and these kicks have a half life of $15$ days.” Thus, it’s natural to think about this OU process as having a relevant time scale on the order of $1$ month, and you can see this time scale in the sample log price path. The peaks and troughs in the green line all last somewhere around $1$ month.

Here’s the punchline. Even though the process was explicitly constructed to have a relevant monthly time scale, there is no obvious bump at the monthly horizon in either the autoregressive representation or the waveform representation. In fact, OU processes are well known to produce $1/f$ noise, i.e., noise which follows a power law decay pattern as shown in the figure below. Kicks which have a half life of $15$ days lead to emergent behavior at all time scales!
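The absence of a bump can be seen directly in the daily discretization. With $dt = 1$ day, Equation (14) is an AR(1) with persistence $1 - \theta = e^{-\log 2 / 15}$ , whose autocorrelation decays geometrically and whose power spectrum falls monotonically in frequency. The sketch below evaluates both using the standard AR(1) formulas; the function names and the frequency grid are my own choices.

```python
import math

PHI = math.exp(-math.log(2.0) / 15.0)  # daily persistence, 1 - theta in Equation (14)


def ou_autocorrelation(lag):
    """Autocorrelation of the daily-sampled process at a given lag in days."""
    return PHI ** lag


def ou_spectral_shape(cycles_per_day):
    """Unnormalized AR(1) power spectrum at a frequency in cycles per day."""
    omega = 2.0 * math.pi * cycles_per_day
    return 1.0 / (1.0 - 2.0 * PHI * math.cos(omega) + PHI ** 2)


if __name__ == "__main__":
    print([round(ou_autocorrelation(h), 3) for h in (1, 7, 15, 30, 60)])
    periods = (100, 60, 30, 15, 7, 3)  # in days
    print([round(ou_spectral_shape(1.0 / p), 2) for p in periods])
```

Both printed sequences decline smoothly, with nothing special happening at the $15$ day half life or at the $30$ day period.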

6. Uniqueness of Approximations

Of course, there is a mapping between the precise rate of decay in the figure below and the relevant time scale, but this is beside the point. You would have to know the exact stochastic process to know how to reverse engineer the mapping. What’s more, this problem isn’t an issue that will be solved with more advanced filtering techniques such as wavelets. It’s not that the filtering technology is too coarse to capture the real structure. It’s that the real time scale structure created by the OU process itself is incredibly smooth. If you see a price process whose power spectrum mirrors that of an OU process with $1/f$ decay, you can’t be sure if it’s an OU process with a monthly time scale as above or a process with economic decisions being made at each horizon.

This result has to do with the fact that even very well behaved approximations are only unique in a very narrow sense. What do I mean by this? Well, consider asymptotic approximations where the approximation error is smaller than the last term at each level of approximation. i.e., the approximation:

(15) $\begin{align*} f(\epsilon) &\sim \sum_{n=0}^N a_n \cdot f_n(\epsilon) \end{align*}$

is asymptotic to $f(\epsilon)$ as $\epsilon \to 0$ if for each $M \leq N$ :

(16) $\begin{align*} \frac{f(\epsilon) - \sum_{n=0}^M a_n \cdot f_n(\epsilon)}{f_M(\epsilon)} &\to 0 \quad \text{as} \quad \epsilon \to 0 \end{align*}$

Asymptotic approximations are well behaved in the sense that you can naively add, subtract, multiply, divide, etc. them just like they were numbers. What’s more, for a given choice of $\{ f_n \}_{n \geq 0}$ , all of the coefficients $\{ a_n \}_{n \geq 0}$ are unique.

At first this uniqueness result looks really promising! However, on closer inspection it’s clear that the result is rather finicky. e.g., the same function can have different asymptotic approximations:

(17) $\begin{align*} \text{as } \epsilon \to 0, \quad \tan(\epsilon) &\sim \epsilon + \frac{1}{3} \cdot \epsilon^3 + \frac{2}{15} \cdot \epsilon^5 \\ &\sim \sin(\epsilon) + \frac{1}{2} \cdot \sin(\epsilon)^3 + \frac{3}{8} \cdot \sin(\epsilon)^5 \\ &\sim \epsilon \cdot \cosh\left( \epsilon \cdot \sqrt{2/3} \right) + \frac{31}{270} \cdot \left( \epsilon \cdot \cosh\left( \epsilon \cdot \sqrt{2/3} \right) \right)^5 \end{align*}$

What’s more, different functions can have the same asymptotic approximations:

(18) $\begin{align*} e^{\epsilon} &\sim \sum_{n=0}^{\infty} \frac{\epsilon^n}{n!} \quad \text{as} \quad \epsilon \to 0 \\ e^{\epsilon} + e^{-1/\epsilon} &\sim \sum_{n=0}^{\infty} \frac{\epsilon^n}{n!} \quad \text{as} \quad \epsilon \searrow 0 \end{align*}$

What’s really interesting about this last example is that these $2$ functions have asymptotic approximations that share an infinite number of terms!
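Both claims are easy to check numerically for small $\epsilon$ . The sketch below evaluates the three expansions of $\tan(\epsilon)$ from Equation (17) and then compares $e^{-1/\epsilon}$ against a high power of $\epsilon$ to show why that extra term never shows up in the power series. The test values of $\epsilon$ and the function name are arbitrary choices of mine.

```python
import math


def tan_approximations(eps):
    """The three expansions of tan(eps) from Equation (17)."""
    s = math.sin(eps)
    u = eps * math.cosh(eps * math.sqrt(2.0 / 3.0))
    return (
        eps + eps ** 3 / 3.0 + 2.0 * eps ** 5 / 15.0,
        s + s ** 3 / 2.0 + 3.0 * s ** 5 / 8.0,
        u + 31.0 * u ** 5 / 270.0,
    )


if __name__ == "__main__":
    for eps in (0.2, 0.1, 0.05):
        errors = [abs(math.tan(eps) - a) for a in tan_approximations(eps)]
        print(eps, errors)
    # exp(-1 / eps) is eventually smaller than any fixed power of eps.
    for eps in (0.1, 0.01):
        print(eps, math.exp(-1.0 / eps), eps ** 20)
```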

To close the loop, consider these approximation results in the context of the econometric analysis above. What I was doing in these exercises was picking a collection of $\{f_n\}_{n \geq 0}$ and then empirically estimating $\{a_n\}_{n \geq 0}$ . For each choice of approximations, I got a unique set of coefficients out. However, the counter example above in Section 5 shows that data generating functions with very different time scales can have very similar approximations. The analysis in this section shows that perhaps this result is not too surprising. A different way of putting this idea is that by choosing an approximation to the data generating process, $f(\epsilon)$ , you are factoring the economic content of the series into $2$ different components: $\{a_n\}_{n \geq 0}$ and $\{f_n\}_{n \geq 0}$ . If you take a stand on the $\{f_n\}_{n \geq 0}$ terms, the corresponding $\{a_n\}_{n \geq 0}$ will certainly be unique; however, there is no guarantee that these coefficients carry all of the economic information that you want to recover from the data. e.g., the relevant time scale information might be buried in the $\{f_n\}_{n \geq 0}$ series rather than the coefficients $\{a_n\}_{n \geq 0}$ .

Sacrificing Noise Traders

July 1, 2013 by Alex

1. Introduction

One way to look at the stock market is as an information aggregation technology. For instance, imagine that you are the CEO of a pencil making company, and have to decide whether or not to stick with making old-fashioned wood pencils or to switch over to making mechanical pencils. If equity shares in lumber companies are publicly traded, you can pop open your laptop and look at their valuation online. Suppose you see that all lumber companies have low valuations and few other customers. In this world, you should really consider updating your product line. Note that it would be much harder to make this inference if all lumber company equity was privately held. No private equity shop is going to answer the phone and tell you that one of their investments is in the toilet. What’s more, (1) more analysts and (2) better informed analysts will study each lumber company’s business operations if there is publicly traded equity and no one has to know who these better informed analysts are ahead of time either. As I think Kevin Costner once said: “If there’s profits, they will come.”

The big question, though, is: Where do these profits come from? What entices informed traders to enter the market and push their information into prices? Asset pricing models such as Grossman and Stiglitz (1980) and Kyle (1985) give us the answer. Informed traders’ profits come directly from the stupidity of noise traders. These profits are transfer payments from the pockets of one group of citizens to the pockets of another. For a social planner, having prices that tell people about the fundamental values of important companies is a good thing. However, noise traders are people too, and sacrificing too many of them to get accurate prices is bad.

In this post, I use a simple, one period, Kyle (1985)-type model to ask the question: How many noise traders do you need to throw to the dogs in order to get accurate prices? Specifically, I think about a world with an asset that pays out $v \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma_v^2)$ and has price $p$ . If you are the social planner, you then try to maximize the benefits of having informative prices, $\mathrm{Cov}[p,v]$ , minus the costs of wasting lots of noise traders who could be doing other productive things, $c(\text{noise traders})$ , subject to the constraint that it has to be worth it for informed traders to enter into the market, $\mathrm{E}[\text{informed trader profit}] \geq \bar{\pi}$ :

(1) $\begin{align*} \max_{\text{noise traders} \geq 0} \left\{ \mathrm{Cov}[p,v] - c(\text{noise traders}) \right\} \quad &\text{subject to} \quad \mathrm{E}[\text{informed trader profit}] \geq \bar{\pi} \end{align*}$

The crazy thing about setting the problem up this way is that the number of noise traders in the market doesn’t affect how informative the prices are:

(2) $\begin{align*} \mathrm{Cov}[p,v] &= \text{constant} \times \sigma_v^2 \end{align*}$

Put differently, as the social planner, you need to sacrifice enough noise traders so that informed traders actually like being traders and won’t change careers. If there aren’t enough noise traders, informed traders won’t make their reservation wage, $\bar{\pi}$ , and will switch over to being butchers or bakers. However, pumping more and more noise traders into the market won’t make prices any more informative.

2. Economic Model

How does the model work? Imagine that Alice decides to be an informed trader rather than a butcher. For all of her time studying the markets, she gets rewarded with a signal, $s$ , about the fundamental value of a lumber company, Logs Inc:

(3) $\begin{align*} s &= v + \epsilon \quad \text{where} \quad \epsilon \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma_\epsilon^2) \end{align*}$

Since everything in the model is nice and normally distributed, her posterior beliefs about the fundamental value of Logs Inc will be normally distributed with variance $\mathrm{Var}[v|s] = (1/\sigma_v^2 + 1/\sigma_\epsilon^2)^{-1}$ and mean $\mathrm{E}[v|s] = \mathrm{SNR} \cdot s$ where $\mathrm{SNR}$ denotes Alice’s signal to noise ratio:

(4) $\begin{align*} \mathrm{SNR} &= \frac{\sigma_v^2}{\sigma_v^2 + \sigma_\epsilon^2} \end{align*}$

For instance, if Alice just saw the fundamental value, $v$ , directly then $\sigma_\epsilon = 0$ and her signal to noise ratio would be $\mathrm{SNR} = 1$ . Conversely, as $\sigma_\epsilon \to \infty$ her signal becomes meaningless and her signal to noise ratio tends to $\mathrm{SNR} \to 0$ .

There is a competitive market maker for Logs Inc stock, Bob, who observes aggregate demand, $y$ , and sets the price equal to his conditional expectation of its fundamental value:

(5) $\begin{align*} y &= x + z \quad \text{where} \quad z \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma_z^2) \\ p &= \mathrm{E}[v|y] \end{align*}$

Here $x$ denotes Alice’s informed demand for Logs Inc stock in units of shares and $z$ denotes noise trader demand for Logs Inc stock. When I say that Bob only observes aggregate demand, I mean that if Bob sees a buy order of $10$ shares, he has no idea whether (a) Alice wants to buy $20$ shares and the noise traders want to sell $10$ shares or (b) Alice wants to sell $10$ shares and the noise traders want to buy $20$ shares. The assumption of perfect competition for Bob means that he has to set the price equal to his conditional expectation. If he tried to hedge his bets and deviate from $p = \mathrm{E}[v|y]$ , someone else would step in and scoop his business.

If Alice is trying to maximize her expected profit, $\pi_x$ :

(6) $\begin{align*} \pi_x &= (v - p) \cdot x \end{align*}$

then an equilibrium would be a choice of demand for Alice, $x = \beta \cdot s$ , and a pricing rule for Bob, $p = \lambda \cdot y$ , such that:

Given Bob’s pricing rule, $p = \lambda \cdot y$ , Alice’s demand maximizes her expected profit.
Given Alice’s demand rule, $x = \beta \cdot s$ , Bob’s price equals his conditional expectation of Logs Inc’s fundamental value.

Taking Bob’s pricing rule as given and maximizing Alice’s expected profit conditional on what she observes, $\mathrm{E}[\pi_x|s]$ , yields an equation for her optimal demand given her realized signal, $s$ :

(7) $\begin{align*} x &= \left( \frac{\mathrm{SNR}}{2 \cdot \lambda} \right) \cdot s \end{align*}$
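The step from the objective to Equation (7) can be sketched as follows. The sketch assumes that noise trader demand, $z$ , is mean zero and independent of Alice’s signal, so that $\mathrm{E}[z|s] = 0$ , and it uses $\mathrm{E}[v|s] = \mathrm{SNR} \cdot s$ from above:

$\begin{align*} \mathrm{E}[\pi_x|s] &= \mathrm{E}\left[ \left. (v - \lambda \cdot \{x + z\}) \cdot x \right| s \right] = (\mathrm{SNR} \cdot s - \lambda \cdot x) \cdot x \\ 0 &= \mathrm{SNR} \cdot s - 2 \cdot \lambda \cdot x \end{align*}$

The second line is the first-order condition with respect to $x$ , and it rearranges to Equation (7).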

Substituting this demand rule into Bob’s conditional expectation then characterizes the equilibrium parameters $\beta$ and $\lambda$ that govern Alice and Bob’s demand and pricing rules:

(8) $\begin{align*} \lambda = \frac{\mathrm{Cov}[v,y]}{\mathrm{Var}[y]} &= \frac{\sqrt{\mathrm{SNR}}}{2} \cdot \frac{\sigma_v}{\sigma_z} \\ \beta = \frac{\mathrm{SNR}}{2 \cdot \lambda} &= \sqrt{\mathrm{SNR}} \cdot \frac{\sigma_z}{\sigma_v} \end{align*}$

In words, $\lambda \propto \sigma_v/\sigma_z$ means that the price of Logs Inc stock will be more responsive to aggregate demand shocks when there is more information to be revealed or when there are few noise traders to mask the information. Conversely, Alice’s demand will be more responsive to a strong private signal when there is more noise trading for her to hide behind or when she wasn’t expecting to discover much in the first place.
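A quick numerical check can confirm that the closed forms in Equation (8) really are a fixed point. The sketch below is only illustrative: the function names and the parameter values are made up for this example and are not from the original analysis. It plugs $\beta$ into Bob’s regression slope, $\mathrm{Cov}[v,y]/\mathrm{Var}[y]$ , and checks that the result matches $\lambda$ .

```python
import math

def kyle_equilibrium(sigma_v, sigma_eps, sigma_z):
    """Closed-form SNR, lambda, and beta from Equations (4) and (8)."""
    snr = sigma_v**2 / (sigma_v**2 + sigma_eps**2)
    lam = 0.5 * math.sqrt(snr) * sigma_v / sigma_z
    beta = math.sqrt(snr) * sigma_z / sigma_v
    return snr, lam, beta

def market_maker_slope(beta, sigma_v, sigma_eps, sigma_z):
    """Cov[v, y] / Var[y] when aggregate demand is y = beta * (v + eps) + z."""
    cov_vy = beta * sigma_v**2
    var_y = beta**2 * (sigma_v**2 + sigma_eps**2) + sigma_z**2
    return cov_vy / var_y

# Illustrative parameter values (assumptions, not taken from the post).
sigma_v, sigma_eps, sigma_z = 1.0, 0.5, 2.0

snr, lam, beta = kyle_equilibrium(sigma_v, sigma_eps, sigma_z)
lam_check = market_maker_slope(beta, sigma_v, sigma_eps, sigma_z)  # Bob's side
beta_check = snr / (2 * lam)                                       # Alice's side

print(f"SNR = {snr:.4f}, lambda = {lam:.4f}, beta = {beta:.4f}")
print(f"lambda from Bob's regression = {lam_check:.4f}")
print(f"beta from Alice's first-order condition = {beta_check:.4f}")
```

Both checks should agree with the closed forms up to floating point error, whatever positive values you pick for the three volatilities.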

3. Unconditional Moments

With the model in place, we can now get to the interesting part of the analysis. Namely, the number of noise traders in the market doesn’t affect how informative prices are. To see this, note that prices have the following functional form:

(9) $\begin{align*} p &= \frac{\mathrm{SNR}}{2} \cdot (v+\epsilon) + \frac{\sqrt{\mathrm{SNR}}}{2} \cdot \frac{\sigma_v}{\sigma_z} \cdot z \end{align*}$

The key piece of this equation is that the coefficient on the fundamental value of Logs Inc, $\mathrm{SNR}/2$ , doesn’t have any dependence on the number of noise traders in the market. i.e., if the fundamental value of Logs Inc goes up by $\mathdollar 1$ , then the price of Logs Inc will go up by $\mathdollar \mathrm{SNR}/2$ on average, and this relationship will be true whether the volatility of noise trader demand is $1{\scriptstyle \mathrm{mil}}$ shares per period or $1$ share per period. More precisely, we have that the covariance of Logs Inc’s price with its fundamental value is:

(10) $\begin{align*} \mathrm{Cov}[p,v] &= \frac{\mathrm{SNR}}{2} \cdot \sigma_v^2 \end{align*}$

No matter how much noise trader demand there is, the price is always equally informative about the fundamental value. This is a very strong prediction!
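One way to see this prediction in action is a small Monte Carlo exercise. The sketch below is illustrative only: the function name, the sample size, the seed, and the parameter values are all assumptions made for this example. It simulates the model for very different levels of noise trader demand volatility and compares the sample covariance of $p$ and $v$ with $\mathrm{SNR} \cdot \sigma_v^2 / 2$ .

```python
import math
import random

def simulate_cov_pv(sigma_v, sigma_eps, sigma_z, n=100_000, seed=0):
    """Sample covariance of price and fundamental value in the one-period model."""
    rng = random.Random(seed)
    snr = sigma_v**2 / (sigma_v**2 + sigma_eps**2)
    lam = 0.5 * math.sqrt(snr) * sigma_v / sigma_z   # Bob's price impact
    beta = math.sqrt(snr) * sigma_z / sigma_v        # Alice's demand coefficient
    sum_p = sum_v = sum_pv = 0.0
    for _ in range(n):
        v = rng.gauss(0.0, sigma_v)                  # fundamental value
        s = v + rng.gauss(0.0, sigma_eps)            # Alice's signal
        z = rng.gauss(0.0, sigma_z)                  # noise trader demand
        p = lam * (beta * s + z)                     # price = lambda * aggregate demand
        sum_p += p
        sum_v += v
        sum_pv += p * v
    return sum_pv / n - (sum_p / n) * (sum_v / n)

# Illustrative values (assumptions): same fundamentals, very different noise.
sigma_v, sigma_eps = 1.0, 0.5
theory = 0.5 * sigma_v**2 * sigma_v**2 / (sigma_v**2 + sigma_eps**2)
results = {sz: simulate_cov_pv(sigma_v, sigma_eps, sz) for sz in (1.0, 1e3, 1e6)}

print(f"theory: {theory:.3f}")
for sz, cov in results.items():
    print(f"sigma_z = {sz:>9.0f}: sample Cov[p, v] = {cov:.3f}")
```

The sample covariances should sit close to the theoretical value for every choice of `sigma_z`, up to simulation error.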

4. Planner’s Problem

OK. So, we’ve written down a really simple model, and this model says that the number of noise traders doesn’t impact how informative prices are about asset fundamentals. What does this say about the original question? How should you, the social planner, decide the number of noise traders to sacrifice so that everyone in the economy can use the resulting price signals?

Well, the first thing to see is that all of Alice’s profits from being an informed trader rather than a butcher come at the expense of noise traders:

(11) $\begin{align*} \mathrm{E}[\pi_x] &= - \mathrm{E}[\pi_z] = \frac{\sqrt{\mathrm{SNR}}}{2} \cdot \sigma_z \cdot \sigma_v \end{align*}$
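For completeness, here is a sketch of where Equation (11) comes from. It assumes that $z$ is mean zero and independent of both $v$ and $\epsilon$ , and it uses $p = \lambda \cdot (\beta \cdot s + z)$ together with $\lambda \cdot \beta = \mathrm{SNR}/2$ from Equation (8):

$\begin{align*} \mathrm{E}[\pi_x] &= \mathrm{E}[(v - p) \cdot \beta \cdot s] = \beta \cdot \left\{ \mathrm{Cov}[v,s] - \lambda \cdot \beta \cdot \mathrm{Var}[s] \right\} \\ &= \beta \cdot \left\{ \sigma_v^2 - \frac{\mathrm{SNR}}{2} \cdot (\sigma_v^2 + \sigma_\epsilon^2) \right\} = \frac{\beta \cdot \sigma_v^2}{2} = \frac{\sqrt{\mathrm{SNR}}}{2} \cdot \sigma_z \cdot \sigma_v \\ \mathrm{E}[\pi_z] &= \mathrm{E}[(v - p) \cdot z] = - \lambda \cdot \sigma_z^2 = - \frac{\sqrt{\mathrm{SNR}}}{2} \cdot \sigma_z \cdot \sigma_v \end{align*}$

The two expected profits sum to zero, which is just another way of saying that Bob breaks even on average.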

Essentially, the rest of the economy is subsidizing the financial market by an amount $\sqrt{\mathrm{SNR}} \cdot \sigma_z \cdot \sigma_v/2$ . It might be worth it if it’s really helpful for everyone in the economy to see an accurate valuation of Logs Inc. There are lots of transfer payments which people are happy to make (e.g., welfare, social security, etc…), but the key observation is that it’s a transfer payment. What’s more, since adding more noise traders doesn’t affect price informativeness, you are going to want to sacrifice the minimum number of noise traders required to make sure that Alice is a trader and not a butcher:

(12) $\begin{align*} \frac{\sqrt{\mathrm{SNR}}}{2} \cdot \sigma_z \cdot \sigma_v &\geq \bar{\pi} \end{align*}$

To get an answer in numbers of noise traders, suppose that each noise trader that you anoint contributes demand variance, $\hbar$ , so that total noise trader demand variance is given by $\sigma_z^2 = N \cdot \hbar$ .

In this world, you need to sacrifice $N$ noise traders to make sure Alice becomes an informed trader:

(13) $\begin{align*} N &= 4 \cdot \frac{\bar{\pi}^2}{\hbar} \cdot \left( \frac{\sigma_v^2 + \sigma_\epsilon^2}{\sigma_v^4} \right) \end{align*}$

This equation says that you need to sacrifice (a) more noise traders when Alice can make more money being a butcher, (b) fewer noise traders when each of them is willing to trade more wildly, (c) fewer noise traders when there is more information about Logs Inc to be discovered, and (d) more noise traders when Alice’s signal about Logs Inc is more noisy.
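To put some numbers on Equation (13), here is a minimal sketch. The function names and all of the parameter values are hypothetical, chosen only so that the arithmetic is easy to follow. It computes $N$ and then confirms that Alice’s expected profit from Equation (11) equals her reservation wage exactly when $\sigma_z^2 = N \cdot \hbar$ .

```python
import math

def noise_traders_needed(pi_bar, hbar, sigma_v, sigma_eps):
    """Minimum number of noise traders from Equation (13)."""
    return 4 * pi_bar**2 / hbar * (sigma_v**2 + sigma_eps**2) / sigma_v**4

def informed_profit(sigma_v, sigma_eps, sigma_z):
    """Alice's expected profit from Equation (11)."""
    snr = sigma_v**2 / (sigma_v**2 + sigma_eps**2)
    return 0.5 * math.sqrt(snr) * sigma_z * sigma_v

# Hypothetical inputs: reservation wage, per-trader demand variance, volatilities.
pi_bar, hbar, sigma_v, sigma_eps = 1.0, 0.25, 1.0, 0.5

n_required = noise_traders_needed(pi_bar, hbar, sigma_v, sigma_eps)
sigma_z = math.sqrt(n_required * hbar)
profit_at_n = informed_profit(sigma_v, sigma_eps, sigma_z)

print(f"N = {n_required:.1f} noise traders")
print(f"Alice's expected profit at N = {profit_at_n:.4f} (target {pi_bar})")

# Comparative statics (a) and (d): a better outside option or a noisier signal raises N.
n_richer_butcher = noise_traders_needed(2 * pi_bar, hbar, sigma_v, sigma_eps)
n_noisier_signal = noise_traders_needed(pi_bar, hbar, sigma_v, 2 * sigma_eps)
print(f"N with doubled reservation wage = {n_richer_butcher:.1f}")
print(f"N with doubled signal noise = {n_noisier_signal:.1f}")
```

Because $N$ scales with $\bar{\pi}^2$ , doubling Alice’s outside option quadruples the number of noise traders you need.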

5. Conclusion

Stock prices are useful signals that we pay for with noise trader demand. This post then used a Kyle (1985)-type model to answer a simple question: As a social planner, how many noise traders should you sacrifice? The interesting fact that pops out of this model is that noise trader demand volatility doesn’t affect price informativeness. It only affects informed trader profits. So as a social planner, you want to have just enough noise trader demand volatility in the market to get Alice to figure out the value of Logs Inc.

A natural question to conclude with is: How general is this result? Surely there are times when noise trader demand shocks affect price informativeness in the real world. In the model, these $2$ aspects of the economy are completely divorced due to the fact that the equilibrium price impact coefficient, $\lambda$ , and the equilibrium demand coefficient, $\beta$ , in some sense undo one another:

(14) $\begin{align*} \lambda \times \beta &= \text{constant} \end{align*}$

Put differently, an increase in noise trader demand will make Alice trade more aggressively since it will be harder for Bob to figure out whether or not changes in aggregate demand are due to Alice or the noise traders. However, as a result, Bob will respond by moving the price around less in response to equal sized changes in aggregate demand.

To see how delicate this canceling out actually is, imagine that Bob has beliefs about the volatility of the underlying asset that are off by $(100 \times \eta)\%$ . e.g., if $\eta = 0.05$ then he would believe that fundamental volatility was $\mathdollar 1.05$ when it was in fact $\mathdollar 1.00$ . When Alice gets a really strong signal about the fundamental value, $\mathrm{SNR} = 1$ , this canceling out seems to be quite robust:

(15) $\begin{align*} \mathrm{Cov}[p,v] &= \frac{1}{2} \cdot \sigma_v^2 \cdot \left\{ 1 + \eta + \eta^2 + \cdots \right\} \end{align*}$

Small errors in Bob’s beliefs decay pretty quickly. However, when $\mathrm{SNR} \searrow 0$ , problems can occur and the delicate balance between $\lambda$ and $\beta$ can break down:

(16) $\begin{align*} \mathrm{Cov}[p,v] &= \frac{\mathrm{SNR}}{2} \cdot \sigma_v^2 \cdot \left\{ 1 + \eta \cdot (2 - \mathrm{SNR}) + \cdots \right\} \end{align*}$

In percentage terms, small errors in Bob’s beliefs could really add up in situations where the $\mathrm{SNR}$ is low. Since $\mathrm{SNR} \leq 1$ , the factor multiplying $\eta$ , $(2 - \mathrm{SNR})$ , is at least unity and approaches $2$ as $\mathrm{SNR} \searrow 0$ . Thus, when Alice gets really weak signals about the fundamental value of Logs Inc, minor errors in Bob’s understanding of the market can lead to wildly incorrect pricing.

Spontaneous Cognition Equilibrium

June 24, 2013 by Alex

1. Motivation

This note develops an information-based asset pricing model based on Tirole (2009) where thinking through market contingencies is costly and fear of missing an important detail restrains trading behavior. For example, think about a statistical arbitrageur who decided not to release the throttle on an otherwise profitable trading strategy because she noticed that it had an unexplained industry $\beta$ . Alternatively, consider a value investor who didn’t fully invest in a seemingly undervalued conglomerate because of his unfamiliarity with every single one of its business lines. In both of these examples, traders weren’t necessarily afraid of releasing too much information to the market while building their position; instead, they were afraid of taking on a large position and then being held-up by “Mr. Market” after the fact.

To see how the model works, imagine you’re a market neutral statistical arbitrageur who usually puts together a position of type $\mathbf{a}$ to exploit the momentum anomaly at the monthly horizon. This position has a price of $p$ dollars, usually generates a payout of $v > 0$ dollars, and costs Mr. Market $c_{\mathrm{Transact}} > 0$ dollars to put together. You and Mr. Market split the gains to trade with you getting a fraction, $\theta_{\mathrm{Trader}}$ , and Mr. Market getting a fraction, $\theta_{\mathrm{Market}}$ , so that:

(1) $\begin{align*} 1 &= \theta_{\mathrm{Trader}} + \theta_{\mathrm{Market}} \end{align*}$

In a more general model the distribution of these bargaining positions would be an equilibrium object, but for now I take them as given.

Portfolio position $\mathbf{a}$ is the best position you could put together using the available information, but you realize that something may well go wrong. The problem is that you simply can’t write out every single possible contingency. There are just too many. With probability $\rho > 0$ , your boiler-plate portfolio position $\mathbf{a}$ will only deliver a value of $(v - \delta)$ dollars where $\delta > 0$ . In this situation, you actually need to put together the position $\mathbf{a}'$ to exploit momentum while staying market neutral. Portfolio $\mathbf{a}'$ is different from $\mathbf{a}$ but impossible to specify ahead of time. Rebalancing your position ex post will cost $c_{\mathrm{Rebal}} > 0$ dollars. For example, during the Quant Crisis of August 2007 the market suddenly and unexpectedly went sideways for quantitative traders in long-short equity positions. According to Khandani and Lo (2007), some of the most consistently profitable quant funds in the history of the industry reported “month-to-date losses ranging from $5{\scriptstyle \%}$ to $30{\scriptstyle \%}$ ” of assets under management.

Before trading, you can exert cognitive effort to find out about what may go wrong and how to put together your portfolio accordingly. I assume that once you start trading the position $\mathbf{a}'$ rather than $\mathbf{a}$ , Mr. Market immediately knows that $\mathbf{a}'$ rather than $\mathbf{a}$ is optimal. Put differently, changing your usual behavior is an eye-opener. The entire reason to search for the correct portfolio position is to avoid the ex post rebalancing costs. I denote this cognitive effort by:

(2) $\begin{align*} \mathrm{Eff}(\pi) &\geq 0 \end{align*}$

where $\pi$ denotes the probability that you discover the correct portfolio position $\mathbf{a}'$ conditional on $\mathbf{a}$ not being the right position. Thus, in contrast to the Veldkamp (2006) model, here you invest cognitive effort until your marginal thinking costs equal the change in your expected ex post hold up costs.

Model timing.

2. Agents and Assets

Suppose that you are thinking of putting together a portfolio position denoted by the $(N \times 1)$ -dimensional vector, $\mathbf{a}$ . To do this, you have to buy and sell stocks from Mr. Market subject to a fixed transaction cost of $c_{\mathrm{Transact}} > 0$ . For example, imagine you’re a market neutral statistical arbitrageur who usually puts together a position of type $\mathbf{a}$ to exploit the momentum anomaly. Initially, you believe that the price of this portfolio position is $p$ dollars:

(3) $\begin{align*} \widetilde{\mathrm{E}}[\mathbf{x}^{\top} \mathbf{a}] &= p \end{align*}$

where $\mathbf{x}$ denotes the $(N \times 1)$ -dimensional vector of random payouts of the $N$ assets in the market. For example, in the classic Jegadeesh and Titman (1993) setting, you would repurchase $1/6$ th of your portfolio holdings each month at a total price of $p$ dollars when using a momentum holding period of $6$ months.

However, with probability $\rho \in (0,1)$ your initial ideas about how the market will play out turn out to be wrong, and the portfolio position $\mathbf{a}$ won’t deliver the required payouts. Instead, the position will only be worth $(v - \delta)$ dollars where $\delta > 0$ . For example, perhaps lots of other statistical arbitrageurs are also putting on a similar position. In such a world, your seemingly well-hedged position would be anything but. You could easily lose a large chunk of your assets under management if you didn’t quickly rebalance your portfolio in the event of sudden fire sales as documented in Khandani and Lo (2007). There is a different portfolio $\mathbf{a}'$ that delivers your desired payout, but after you put on the initial position $\mathbf{a}$ it will take an adjustment cost of $c_{\mathrm{Rebal}} > 0$ dollars to switch over to the portfolio $\mathbf{a}'$ . I assume that it is worth it for you to enter into the market even if you know that you will never discover the correct portfolio position ahead of time:

(4) $\begin{align*} 0 &< (v - \rho \cdot c_{\mathrm{Rebal}}) - c_{\mathrm{Transact}} \end{align*}$

3. Information Structure

You and Mr. Market can agree to transact the portfolio position $\mathbf{a}$ . This portfolio position may or may not be what suits your needs as the buyer. If it doesn’t, an initially unknown portfolio position $\mathbf{a}'$ will deliver the desired payouts provided that you can return to the market and rebalance your position. At the initial stage, though, both you and Mr. Market are aware only of $\mathbf{a}$ , although you both know that it may not be the right one. You both may, before contracting, incur a cognitive cost to think about alternatives to $\mathbf{a}$ , and whoever finds out that portfolio position $\mathbf{a}'$ is the right one can decide whether to immediately trade on this information or not. The key idea here is that the discovery of the correct position is an “eye-opener”.

I assume that you have bargaining power $\theta_{\mathrm{Trader}}$ and Mr. Market has bargaining power $\theta_{\mathrm{Market}}$ where $\theta_{\mathrm{Trader}} + \theta_{\mathrm{Market}} = 1$ . For example, if you are the only trader in the market trying to sell the stocks required by portfolio $\mathbf{a}$ and there are lots of other traders lining up to buy them, then your bargaining power, $\theta_{\mathrm{Trader}}$ , will be close to $1$ . Conversely, if you are one of many traders trying to short a hard to locate stock, then your bargaining power, $\theta_{\mathrm{Trader}}$ , will be close to $0$ .

Before trading, you can incur thinking costs of $\mathrm{Eff}(\pi)$ . Here, $\pi$ denotes the probability that you will discover the correct portfolio position $\mathbf{a}'$ conditional on $\mathbf{a}$ not being the right one. For example, if you wanted to know when lots of other statistical arbitrageurs were also putting on a similar position and thus destroying your market neutrality $50{\scriptstyle \%}$ of the time, then $\pi = 0.50$ and you would pay $\mathrm{Eff}(0.50)$ dollars in cognitive costs to maintain this level of informativeness. I assume both that:

(5) $\begin{align*} \mathrm{Eff}(0) &= 0 \quad \text{and} \quad \mathrm{Eff}(1) = \infty \end{align*}$

as well as that:

(6) $\begin{align*} \frac{d^2\mathrm{Eff}}{(d\pi)^2} &> \frac{\rho^2 \cdot (1 - \rho) \cdot \theta_{\mathrm{Market}} \cdot \delta}{(1 - \rho \cdot \pi)^2} \end{align*}$

The first assumption reads that learning nothing costs you nothing while it is prohibitively expensive to always know when you need to deviate from the standard position. The second assumption guarantees a unique solution to the cognitive effort optimization problem described in Equation (14) below.

4. Asset Pricing

In this world, how much effort should you expend trying to figure out if $\mathbf{a}$ is the correct portfolio position? At what point is it worth it to just trade and then clean up any mess after the fact? Well, first let’s consider the case where you don’t discover the correct portfolio position ahead of time. If you search with intensity $\pi$ and don’t find an alternative portfolio, $\mathbf{a}'$ , then the posterior probability that $\mathbf{a}$ is not correct is given by:

(7) $\begin{align*} \hat{\rho}(\pi) &= \frac{\rho \cdot (1 - \pi)}{1 - \rho \cdot \pi} \end{align*}$

The numerator is the probability that $\mathbf{a}'$ is the correct portfolio, $\rho$ , times the probability that you didn’t discover this fact during your search, $(1 - \pi)$ . The denominator is the probability that you didn’t find an alternative portfolio, $1 - \rho \cdot \pi$ . If $\mathbf{a}'$ is the appropriate portfolio then Mr. Market captures a fraction $\theta_{\mathrm{Market}}$ of the renegotiation gain creating a hold-up:

(8) $\begin{align*} h &= \theta_{\mathrm{Market}} \cdot (\delta - c_{\mathrm{Rebal}}) \end{align*}$

where $(\delta - c_{\mathrm{Rebal}})$ dollars is the surplus available to be split after realizing ex post that $\mathbf{a}'$ is the correct portfolio. Let $\pi^*$ denote your equilibrium level of search. The ex ante price $p(\pi^*)$ for portfolio $\mathbf{a}$ accounts for the possible hold-up so that:

(9) $\begin{align*} p(\pi^*) - \left\{ c_{\mathrm{Transact}} - \hat{\rho}(\pi^*) \cdot h \right\} &= \theta_{\mathrm{Market}} \cdot \left\{ (v - c_{\mathrm{Transact}}) - \hat{\rho}(\pi^*) \cdot c_{\mathrm{Rebal}} \right\} \end{align*}$

This equation reads that the price you are willing to pay for the portfolio $\mathbf{a}$ given your equilibrium search intensity $\pi^*$ , minus the transaction cost net of the expected hold-up payment, must equal Mr. Market’s share of the gains to trade. Rewriting this equation to isolate the price function yields:

(10) $\begin{align*} p(\pi^*) &= c_{\mathrm{Transact}} + \theta_{\mathrm{Market}} \cdot \left\{ (v - c_{\mathrm{Transact}}) - \hat{\rho}(\pi^*) \cdot \delta \right\} \end{align*}$
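The pricing pieces in Equations (7) through (10) are easy to check numerically. The sketch below uses hypothetical function names and made-up parameter values, chosen to satisfy the participation assumption in Equation (4). It computes the posterior, the hold-up, and the price, and then verifies that the price from Equation (10) satisfies the sharing rule in Equation (9).

```python
def rho_hat(pi, rho):
    """Posterior probability that a is wrong after an unsuccessful search, Equation (7)."""
    return rho * (1 - pi) / (1 - rho * pi)

def hold_up(theta_market, delta, c_rebal):
    """Mr. Market's share of the renegotiation surplus, Equation (8)."""
    return theta_market * (delta - c_rebal)

def price(pi, rho, v, delta, c_transact, theta_market):
    """Ex ante price of portfolio a, Equation (10)."""
    return c_transact + theta_market * ((v - c_transact) - rho_hat(pi, rho) * delta)

# Illustrative values (assumptions, not from the post).
rho, v, delta = 0.3, 2.0, 1.0
c_transact, c_rebal, theta_market, pi = 0.5, 0.2, 0.5, 0.4

# Participation assumption from Equation (4).
participation_ok = (v - rho * c_rebal) - c_transact > 0

h = hold_up(theta_market, delta, c_rebal)
p = price(pi, rho, v, delta, c_transact, theta_market)

# Both sides of Equation (9), evaluated at the price from Equation (10).
lhs = p - (c_transact - rho_hat(pi, rho) * h)
rhs = theta_market * ((v - c_transact) - rho_hat(pi, rho) * c_rebal)

print(f"rho_hat = {rho_hat(pi, rho):.4f}, h = {h:.2f}, p = {p:.4f}")
print(f"Equation (9): lhs = {lhs:.6f}, rhs = {rhs:.6f}")
```

Note that `rho_hat(0, rho)` returns the prior, $\rho$ , and `rho_hat(1, rho)` returns zero: searching harder without finding anything makes you more confident that $\mathbf{a}$ is fine, which in turn raises the price.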

Next, let’s consider the case where you find out that $\mathbf{a}'$ is the correct portfolio after expending some cognitive effort. You now have $2$ choices:

You can trade portfolio $\mathbf{a}'$ and disclose its correctness to Mr. Market.
You can still trade portfolio $\mathbf{a}$ and then rebalance your position ex post.

By disclosing $\mathbf{a}'$ , you will realize a fraction $\theta_{\mathrm{Trader}}$ of the gains to trade, $\theta_{\mathrm{Trader}} \cdot (v - c_{\mathrm{Transact}})$ . If you conceal $\mathbf{a}'$ and continue to trade position $\mathbf{a}$ anyway, you will get $v - (c_{\mathrm{Rebal}} + h) - p(\pi^*)$ where the middle term comes from the fact that you know you will have to rebalance your portfolio and the price is given by Equation (10) above. Combining these two expressions and simplifying then says that revealing the correct portfolio position yields an efficiency gain of:

(11) $\begin{align*} \Delta U_{\mathrm{Trader}} &= \theta_{\mathrm{Trader}} \cdot c_{\mathrm{Rebal}} + \left\{ 1 - \hat{\rho}(\pi^*) \right\} \cdot \theta_{\mathrm{Market}} \cdot \delta > 0 \end{align*}$

The first term in this equation says that you avoid paying your share of the rebalancing costs. The second term in this equation says that you capture Mr. Market’s expected share of the gains to rebalancing. After all, if you start trading portfolio $\mathbf{a}'$ immediately, then the price no longer has to account for the fact that Mr. Market might hold you up later for his share of the gains to rebalancing, $\delta$ . The important thing about Equation (11) is that it’s always positive. Thus, you always want to trade on $\mathbf{a}'$ when you find out that this is the right portfolio.
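The simplification that leads to Equation (11) takes a few lines. Here is one way to do it, using only Equations (8) and (10) together with $\theta_{\mathrm{Trader}} + \theta_{\mathrm{Market}} = 1$ :

$\begin{align*} v - p(\pi^*) &= \theta_{\mathrm{Trader}} \cdot (v - c_{\mathrm{Transact}}) + \hat{\rho}(\pi^*) \cdot \theta_{\mathrm{Market}} \cdot \delta \\ \Delta U_{\mathrm{Trader}} &= \theta_{\mathrm{Trader}} \cdot (v - c_{\mathrm{Transact}}) - \left\{ v - (c_{\mathrm{Rebal}} + h) - p(\pi^*) \right\} \\ &= c_{\mathrm{Rebal}} + h - \hat{\rho}(\pi^*) \cdot \theta_{\mathrm{Market}} \cdot \delta \\ &= c_{\mathrm{Rebal}} + \theta_{\mathrm{Market}} \cdot (\delta - c_{\mathrm{Rebal}}) - \hat{\rho}(\pi^*) \cdot \theta_{\mathrm{Market}} \cdot \delta \\ &= \theta_{\mathrm{Trader}} \cdot c_{\mathrm{Rebal}} + \left\{ 1 - \hat{\rho}(\pi^*) \right\} \cdot \theta_{\mathrm{Market}} \cdot \delta \end{align*}$

Since $\hat{\rho}(\pi^*) < 1$ whenever $\rho < 1$ , both terms in the last line are positive.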

5. Cognitive Effort

Note that even before getting into any mathematical details, it’s clear that a social planner would ask you to look for new contingencies until the marginal cost of looking for the next important detail would just offset the expected rebalancing costs. Calling this socially optimal search intensity $\pi^{**}$ , we have:

(12) $\begin{align*} \left. \frac{d \mathrm{Eff}}{d\pi} \right|_{\pi^{**}} &= \rho \cdot c_{\mathrm{Rebal}} \end{align*}$

Moreover, in the absence of any rebalancing costs, $c_{\mathrm{Rebal}} = 0$ , any investment in cognition is purely rent-seeking!

So what happens in the model? You choose your level of cognitive effort by solving the optimization problem:

(13) $\begin{align*} U_{\mathrm{Trader}} &= \max_{\pi \in [0,1]} \Big\{ \rho \cdot \pi \cdot \theta_{\mathrm{Trader}} \cdot (v - c_{\mathrm{Transact}}) \\ &\qquad \qquad \qquad + \ \rho \cdot (1 - \pi) \cdot \left\{ v - (c_{\mathrm{Rebal}} + h) - p(\pi^*)\right\} \\ &\qquad \qquad \qquad \qquad + \ (1 - \rho) \cdot \left\{ v - p(\pi^*) \right\} \\ &\qquad \qquad \qquad \qquad \qquad - \ \mathrm{Eff}(\pi) \Big\} \end{align*}$

What are the terms in this equation? Well, there are $3$ possible outcomes: $\mathbf{a}'$ could be the right portfolio and you could discover it right away, $\mathbf{a}'$ could be the right portfolio and you might not discover it until it’s too late, and $\mathbf{a}$ could be the right portfolio all along. The first $3$ terms represent your payouts in each of these states weighted by the probabilities that they occur. The final term is just your cognitive costs.

Differentiating with respect to your cognitive effort level, $\pi$ , and observing that in equilibrium it has to be that $\pi = \pi^*$ yields:

(14) $\begin{align*} \left.\frac{d\mathrm{Eff}}{d\pi} \right|_{\pi^*} &= \rho \cdot c_{\mathrm{Rebal}} + \rho \cdot h - \rho \cdot \left( \frac{\rho \cdot (1 - \pi^*)}{1 - \rho \cdot \pi^*} \right) \cdot \theta_{\mathrm{Market}} \cdot \delta \end{align*}$

This equation is pretty interesting. It says that you will search until your marginal cost of mulling over your portfolio equals your expected rebalancing costs plus a pair of additional terms. The first extra term says that you will increase your cognitive efforts in order to avoid being held up by Mr. Market after the fact. The second additional term says that you won’t fully account for this hold up problem since you can adjust the price ex ante. The sum of these additional terms will always be greater than $0$ ; thus, you will always expend too much cognitive effort. Put differently, this model is populated by Woody Allen traders who neurotically search for negative contingencies.
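To get a feel for the size of this over-investment, you can solve Equations (12) and (14) numerically. The sketch below makes several assumptions that are not in the model above: a particular cost function, $\mathrm{Eff}(\pi) = -\kappa \cdot \{\log(1 - \pi) + \pi\}$ , which satisfies Equation (5) and also satisfies Equation (6) whenever $\kappa > \rho^2 \cdot (1 - \rho) \cdot \theta_{\mathrm{Market}} \cdot \delta$ ; a hand-rolled bisection routine; and made-up parameter values. The ranking of the two search intensities that it reports is for these particular values.

```python
def rho_hat(pi, rho):
    """Posterior probability that a is wrong after an unsuccessful search, Equation (7)."""
    return rho * (1 - pi) / (1 - rho * pi)

def marginal_effort(pi, kappa):
    """Eff'(pi) for the ASSUMED cost Eff(pi) = -kappa * (log(1 - pi) + pi)."""
    return kappa * pi / (1 - pi)

def bisect(f, lo, hi, tol=1e-12):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative values (assumptions, not from the post).
rho, delta, c_rebal, theta_market, kappa = 0.3, 1.0, 0.2, 0.5, 0.5
h = theta_market * (delta - c_rebal)  # hold-up from Equation (8)

def private_gap(pi):
    """Marginal cost minus the right-hand side of Equation (14)."""
    rhs = rho * c_rebal + rho * h - rho * rho_hat(pi, rho) * theta_market * delta
    return marginal_effort(pi, kappa) - rhs

def planner_gap(pi):
    """Marginal cost minus the right-hand side of Equation (12)."""
    return marginal_effort(pi, kappa) - rho * c_rebal

pi_star = bisect(private_gap, 0.0, 1 - 1e-9)     # trader's equilibrium search
pi_planner = bisect(planner_gap, 0.0, 1 - 1e-9)  # planner's preferred search

print(f"trader's search intensity  pi*  = {pi_star:.4f}")
print(f"planner's search intensity pi** = {pi_planner:.4f}")
```

With this cost function the planner’s problem has the closed form $\pi^{**} = \rho \cdot c_{\mathrm{Rebal}} / (\kappa + \rho \cdot c_{\mathrm{Rebal}})$ , which gives a handy check on the bisection.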

6. Discussion

This model delivers a couple of interesting implications. First, spontaneous cognition is a natural source of noise in financial markets. This is an attractive feature since noise traders are akin to theoretical dark matter. Without them, informed traders would be unable to exploit their informational advantage in existing noisy rational expectations models. Nevertheless, these traders are inherently difficult to identify in the data and make welfare analysis problematic. How should the social planner weight noise trader utility?

Second, traders spend too much time trying to identify the perfect portfolio position. If you are a statistical arbitrageur, you obviously can’t sit on the sidelines until you have a sure-fire strategy. You have to trade even when you are not completely certain about your strategy’s payouts. No one will pay you fees to sit on their money. Interestingly, I find that traders might well choose to expend too much cognitive effort looking for holes in their strategies in order not to be held-up by Mr. Market. I have never seen another model predict this.

Third and finally, traders specialize in identifying bad news for themselves. A trader has little motivation to look for confirming evidence that position $\mathbf{a}$ is correct. After all, this would be his portfolio position if he exerted no effort whatsoever and spent the morning in sweatpants on his couch. In fact, the adverse selection problem can be severe enough that traders will not go through with the trade unless they find something wrong with the boiler-plate portfolio position.

Factors vs. Characteristics

May 13, 2013 by Alex

1. Introduction

Fama and French (1993) found that both a firm’s size and its book-to-market ratio are highly correlated with its average excess return as illustrated in Figure 1 below. For instance, the center panel says that stocks with low book-to-market ratios (i.e., the $5$ portfolios at the bottom linked with an orange line) have too high a $\beta_{\mathrm{Mkt},n}$ on the market when considering their paltry realized excess returns. For some reason, it doesn’t take much to get traders to hold growth stocks.

FIGURE 1. Left Panel: Average excess returns vs. the market beta for $25$ portfolios sorted on the basis of size and book-to-market ratio using monthly data over the time period from July 1963 to December 1993. Center Panel: Same $25$ data points connected by book-to-market ratio with $\mathrm{BM}_{\mathrm{Low}}$ denoting the $5$ portfolios in the lowest book-to-market quintile. Right Panel: Same $25$ data points connected by size with $\mathrm{S}_{\mathrm{Low}}$ denoting the $5$ portfolios in the lowest size quintile. Plots correspond to Figures 20.9, 20.10, and 20.11 in Cochrane (2001).

This post reviews the analysis in Daniel and Titman (1997) which asks the natural follow up question: Why? The original explanation proposed in Fama and French (1993) was that these additional excess returns earned by small firms with high book-to-market ratios were due to exposures to latent risk factors. e.g., a stock with a high book-to-market ratio will tend to do poorly when the entire economy suffers from a financial crisis and precisely when you need cash the most. As a result, you are willing to pay less in order to hold this risk. However, Daniel and Titman (1997) suggest an alternative explanation: some omitted variable both causes value stocks to earn higher excess returns (i.e., have a high $\alpha_{n,t}$ ) and comove with one another (i.e., have a high $\beta_{\mathrm{HML},n}$ ).

Daniel and Titman (1998) highlight a nice parallel between the causal inference problem outlined above, and the inference problem facing an econometrician when trying to figure out the causal effect of going to college on a student’s future earnings. We all know that people with college degrees earn more over their lifetime than people without college degrees (e.g., see Card (1999)). Just as above, the main question is: Why? On one hand, it could be that the process of getting a degree raises your earning power (analogous to the “factor model”). However, it could also be that IQ really drives everyone’s lifetime earnings and on average people with higher IQs are more likely to get college degrees (analogous to the “characteristics model”). In this situation, finding that college graduates earn more than non-graduates says nothing about the relative value of person $n$ ’s IQ or her degree in determining her salary:

(1) $\begin{align*} \mathrm{salary}_n &= \mu + \lambda_{\mathrm{GRAD}} \cdot 1_{\{\mathrm{GRAD}_n = 1\}} + \xi_n, \quad \lambda_{\mathrm{GRAD}} > 0 \end{align*}$

Similarly, finding that stocks with high book-to-market ratios realize higher excess returns says nothing about where these excess returns are coming from. The only real conceptual difference between the two inference problems is that in the case of graduation vs. IQ, the inputs to the regression are data; by contrast, in the case of factors vs. characteristics, the inputs to the regression are estimated coefficients:

(2) $\begin{align*} \alpha_n &= \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} + \xi_n, \quad \lambda_{\mathrm{HML}} > 0 \end{align*}$

where $\alpha_n$ is the monthly abnormal return to holding stock/portfolio $n$ and $\beta_{\mathrm{HML},n}$ is stock $n$ ’s loading on the high-minus-low book-to-market factor from Fama and French (1993).

I begin in Section 2 by describing Fama and French (1993)‘s interpretation of the size and value premia. Then, in Section 3, I outline the alternative interpretation of these effects given by Daniel and Titman (1997). The authors propose a test to determine if some of the effect of the size and value premia flows through a channel other than the factor loadings. In Section 4, I describe this test and replicate their empirical analysis suggesting that there is indeed a component to the size and value premia that cannot be explained by factor loadings. Finally, in Section 5, I conclude with a short discussion of Daniel and Titman (1997)‘s results. All of the code used to create the figures in this post can be found on GitHub.

2. Distress Factor Loading

This section describes Fama and French (1993)‘s interpretation of the value premium, i.e., the higher excess returns earned by stocks with a high book-to-market ratio. A stock with a high book-to-market ratio has lots of tangible assets on its books in accounting terms (i.e., a high book value); however, the market does not value the equity in this company very highly (i.e., a low market capitalization). These stocks are in financial distress. Define $\tilde{r}_{n,t+1}$ as the abnormal return to stock $n$ after accounting for its comovement with the market return:

(3) $\begin{align*} \tilde{r}_{n,t+1} &= \left( r_{n,t+1} - r_{f,t+1} \right) - \beta_{\mathrm{Mkt},n} \cdot r_{\mathrm{Mkt},t+1} \end{align*}$
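As a rough illustration of Equation (3), here is a minimal sketch of how the market-adjusted return could be computed for a single stock. The function name is hypothetical, and estimating $\beta_{\mathrm{Mkt},n}$ by a full-sample OLS slope is my own simplifying assumption rather than something specified above.

```python
import numpy as np

def market_adjusted_return(r, r_f, r_mkt):
    """Return (r_tilde, beta_mkt) for one stock, following Equation (3).

    r, r_f, r_mkt are equal-length sequences of monthly returns.
    Assumption: beta_mkt is the full-sample OLS slope of the stock's
    excess return on the market return series.
    """
    r = np.asarray(r, dtype=float)
    r_f = np.asarray(r_f, dtype=float)
    r_mkt = np.asarray(r_mkt, dtype=float)

    excess = r - r_f
    # OLS slope = sample covariance / sample variance (both with ddof=1).
    beta_mkt = np.cov(excess, r_mkt)[0, 1] / np.var(r_mkt, ddof=1)
    r_tilde = excess - beta_mkt * r_mkt
    return r_tilde, beta_mkt
```

If a stock's excess return is an exact linear function of the market return, the function recovers the slope and leaves only the constant in $\tilde{r}_{n,t+1}$.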

Figure 2 below shows that firms with high book-to-market ratios have really high returns and firms with low book-to-market ratios have really low returns on average.

FIGURE 2. Monthly excess returns of $25$ portfolios sorted on size and book-to-market ratio using data from July 1963 to December 1993. e.g., the time series in the $\mathrm{BM}_{\mathrm{Low}} \times \mathrm{S}_{\mathrm{High}}$ panel in the lower left-hand corner corresponds to the monthly excess returns over the $30$ -day T-bill rate of a value weighted portfolio of stocks in the lowest book-to-market ratio quintile and the highest size quintile. The $\mu$ value reported in the lower right-hand corner of each panel represents the mean excess return over the sample period and corresponds to the values reported in Table 1(a) from Daniel and Titman (1997). The height of the shaded red region in each panel is $\mu$ which makes it easier to see how the mean excess returns vary across the $25$ portfolios.

If a financial crisis comes along it will hit all of the firms already in financial distress the hardest. Fama and French (1993) point out that the outsized excess returns earned by high book-to-market stocks are consistent with the idea that traders don’t want to find out that their stocks have become worthless in the middle of a financial crisis. Thus, in order to hold these stocks, they must be rewarded with higher average excess returns. If this story is true, then these higher average excess returns will result from a larger $\beta_{\mathrm{HML},n} \cdot \mathrm{E}_t[f_{\mathrm{HML},t+1}]$ term in the intercept to the regression equation:

(4) $\begin{align*} \tilde{r}_{n,t+1} &= \mathrm{E}_t[\tilde{r}_{n,t+1}] + \beta_{\mathrm{HML},n} \cdot f_{\mathrm{HML},t+1} + \varepsilon_{n,t+1} \\ &= \underbrace{\left( \mathrm{E}_t[\tilde{r}_{n,t+1}] - \beta_{\mathrm{HML},n} \cdot \mathrm{E}_t[f_{\mathrm{HML},t+1}] \right)}_{\alpha_{n,t}} + \beta_{\mathrm{HML},n} \cdot \left( f_{\mathrm{HML},t+1} - \mathrm{E}_t[f_{\mathrm{HML},t+1}] \right) + \varepsilon_{n,t+1} \end{align*}$

One way to test this hypothesis would be to create a group of $N$ test assets, run $N$ versions of the time series regression specified in Equation (4) above to collect the $\alpha_{n,t}$ and $\beta_{\mathrm{HML},n}$ coefficients, and test to see if a nice linear relationship holds between the realized excess returns and each stock/portfolio’s loading on the HML factor:

(5) $\begin{align*} \alpha_{n,t} &= \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} + \xi_{n,t} \end{align*}$
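The two-step procedure just described can be sketched as follows. This is only an illustration under my own assumptions: the function name is hypothetical, I demean the factor with its sample mean as a stand-in for $\mathrm{E}_t[f_{\mathrm{HML},t+1}]$, and I use plain OLS in both stages.

```python
import numpy as np

def two_pass(returns, factor):
    """Two-pass regression sketch.

    returns: (T, N) array of abnormal returns for N test assets.
    factor:  length-T array of HML factor realizations.
    Returns (alpha, beta, mu, lam): first-stage intercepts and slopes,
    then the second-stage intercept and price of risk from Equation (5).
    """
    returns = np.asarray(returns, dtype=float)
    factor = np.asarray(factor, dtype=float)
    T, N = returns.shape

    # First stage: N time series regressions on the demeaned factor.
    # With a demeaned regressor, each intercept equals the asset's mean return.
    X = np.column_stack([np.ones(T), factor - factor.mean()])
    coef = np.linalg.lstsq(X, returns, rcond=None)[0]  # shape (2, N)
    alpha, beta = coef[0], coef[1]

    # Second stage: cross-sectional regression of intercepts on loadings.
    Z = np.column_stack([np.ones(N), beta])
    mu, lam = np.linalg.lstsq(Z, alpha, rcond=None)[0]
    return alpha, beta, mu, lam
```

In a noise-free world where every asset's return is exactly its loading times the factor, the second stage returns $\mu = 0$ and a $\lambda_{\mathrm{HML}}$ equal to the factor's sample mean.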

Figure 3 below shows the causal diagram assumed in Fama and French (1993) linking each stock’s average excess returns, $\alpha_{n,t}$ , to its loading on the HML factor, $\beta_{\mathrm{HML},n}$ . Figure 4 then shows that controlling for exposure to size and book-to-market ratio explains away much of the residual variation in the excess returns of the $25$ test assets in Fama and French (1993) that isn’t explained by their comovement with the market.

FIGURE 3. Causal diagram linking the coefficients $\alpha_{n,t}$ and $\beta_{\mathrm{HML},n}$ assumed in Fama and French (1993).

FIGURE 4. Average excess return vs. excess return predicted by the Fama and French (1993) 3-factor model computed for $25$ portfolios sorted on the basis of size and book-to-market ratio using monthly data over the time period from July 1963 to December 1993. Left Panel: Data points connected by book-to-market ratio with $\mathrm{BM}_{\mathrm{Low}}$ denoting the $5$ portfolios in the lowest book-to-market quintile. Right Panel: Data points connected by size with $\mathrm{S}_{\mathrm{Low}}$ denoting the $5$ portfolios in the lowest size quintile. Plots correspond to Figures 20.12 and 20.13 in Cochrane (2001).

3. Characteristics-Based Pricing

In this section I describe Daniel and Titman (1997)‘s alternative interpretation of the value premium. These authors start with a similar first stage regression model:

(6) $\begin{align*} \tilde{r}_{n,t+1} &= \mathrm{E}_t[\tilde{r}_{n,t+1}|D_n] + \beta_{\mathrm{HML},n} \cdot f_{\mathrm{HML},t+1} + \varepsilon_{n,t+1} \end{align*}$

but replace the unconditional expectation $\mathrm{E}_t[\tilde{r}_{n,t+1}]$ with the conditional expectation $\mathrm{E}_t[\tilde{r}_{n,t+1}|D_n]$ . i.e., they propose that there is an omitted variable related to the fundamental “distressed-ness” of each firm $n$ . Under this hypothesis, as a firm gets more and more financially distressed, its average excess returns must rise by an amount $\lambda_D$ in order to induce traders to hold the stock. Thus, the time series regression in Equation (4) becomes:

(7) $\begin{align*} \tilde{r}_{n,t+1} &= \underbrace{\left( \mathrm{E}_t[\tilde{r}_{n,t+1}] - D_n \cdot \lambda_D - \beta_{\mathrm{HML},n} \cdot \mathrm{E}_t[f_{\mathrm{HML},t+1}] \right)}_{\alpha_{n,t}} \\ &\qquad \qquad + \ \beta_{\mathrm{HML},n} \cdot \left( f_{\mathrm{HML},t+1} - \mathrm{E}_t[f_{\mathrm{HML},t+1}] \right) + \varepsilon_{n,t+1} \end{align*}$

with the second stage regression:

(8) $\begin{align*} \alpha_{n,t} &= \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} + \lambda_D \cdot D_n + \xi_{n,t} \end{align*}$

Figure 5 below shows the causal diagram assumed in Daniel and Titman (1997) linking each stock’s average excess returns, $\alpha_{n,t}$ , to its loading on the HML factor, $\beta_{\mathrm{HML},n}$ , and its distressed-ness, $D_n$ . The dotted line linking $\beta_{\mathrm{HML},n}$ and $D_n$ captures the idea that distressed firms are likely to have larger loadings on the $\mathrm{HML}$ factor in the same way that people with higher IQs are more likely to go to college.

FIGURE 5. Causal diagram linking the coefficients $\alpha_{n,t}$ and $\beta_{\mathrm{HML},n}$ assumed in Daniel and Titman (1997).
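To make the omitted-variable concern concrete, here is a small simulation sketch. It is not taken from Daniel and Titman (1997): the function name and parameter values (`lam_d`, `rho`, the noise scale) are illustrative assumptions of mine. In the simulated world, only the characteristic $D_n$ drives $\alpha_{n,t}$ , and $\beta_{\mathrm{HML},n}$ is merely correlated with $D_n$ .

```python
import numpy as np

def simulate_second_stage(n_assets=5000, lam_d=0.5, rho=0.8, seed=0):
    """Simulate a world where ONLY the characteristic D_n is priced.

    Returns (lam_short, lam_beta, lam_char):
      lam_short: slope on beta when D_n is omitted, as in Equation (5)
      lam_beta, lam_char: slopes on beta and D_n, as in Equation (8)
    """
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(n_assets)
    # Loadings are correlated with distress (the dotted line in the diagram)
    # but have no effect of their own on alpha.
    beta = rho * d + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_assets)
    alpha = lam_d * d + 0.1 * rng.standard_normal(n_assets)

    ones = np.ones(n_assets)
    short = np.linalg.lstsq(np.column_stack([ones, beta]), alpha, rcond=None)[0]
    full = np.linalg.lstsq(np.column_stack([ones, beta, d]), alpha, rcond=None)[0]
    return short[1], full[1], full[2]
```

Under these assumptions the short regression should report a clearly positive $\lambda_{\mathrm{HML}}$ (roughly `lam_d * rho`) even though loadings have no causal effect, while the long regression assigns the effect to $D_n$ . Of course, the long regression is infeasible in practice because $D_n$ is not observed.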

The natural way to break this logjam and determine whether the value premium is due to a factor-loading or a characteristics-based explanation would be to use an instrument. e.g., find some variable that is correlated with each firm’s factor loading, $\beta_{\mathrm{HML},n}$ , but uncorrelated with its distress status, $D_n$ ; or, find some variable that is correlated with each firm’s distress status but uncorrelated with its factor loading. Similarly, to solve the graduation vs. IQ debate from the introduction, you would need either an instrument that randomly assigns people with the same IQ to college and non-college groups or an instrument that randomly shocks people’s IQs once they have made their college decision one way or another.

Daniel and Titman (1997) instrument for each firm’s level of distress, $D_n$ . Note that the analogy to the instrumental variables approach here is imprecise since we can’t actually observe each firm’s level of distress directly. e.g., it would be impossible to predict the variable $D_n$ in a regression. Within each size and book-to-market bucket, Daniel and Titman (1997) use a firm’s exposure to the $\mathrm{HML}$ factor prior to the portfolio formation period as the instrument:

(9) $\begin{align*} Z_n &= \{ z_L, z_2, z_3, z_4, z_H\} \end{align*}$

The logic behind this instrument is the following: If characteristics drive expected returns, there should be firms with characteristics that do not match their factor loadings. All the stocks in the same size and book-to-market bucket will have the same loading on the $\mathrm{HML}$ factor. However, within each of the size and book-to-market buckets, there will be firms whose returns have been highly correlated with the $\mathrm{HML}$ factor in the past as well as firms whose returns have been weakly correlated with the $\mathrm{HML}$ factor in the past. Daniel and Titman (1997) think about this within-group historical variation as exogenous and use it to instrument for each firm’s true level of distress.

I use $Z_n = z_H$ to denote the firms with the highest historical correlation with the $\mathrm{HML}$ factor and $Z_n = z_L$ to denote the firms with the lowest historical correlation. To empirically estimate whether or not more distressed firms earn higher average excess returns independent of their $\mathrm{HML}$ factor loading, Daniel and Titman (1997) first sort stocks into size and book-to-market buckets to create a residual $\tilde{\alpha}_{n,t}$ that captures the excess returns not explained by firms’ factor loadings:

(10) $\begin{align*} \tilde{\alpha}_{n,t} &= \alpha_{n,t} - \left( \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} \right) \end{align*}$

They then compute:

(11) $\begin{align*} \mathrm{E}[\tilde{\alpha}_{n,t} | Z_n = z_H] - \mathrm{E}[\tilde{\alpha}_{n,t} | Z_n = z_L] &= \lambda_D \cdot \left( \mathrm{E}[D_n | Z_n = z_H] - \mathrm{E}[D_n | Z_n = z_L ] \right) \\ &\qquad \qquad + \ \left( \mathrm{E}[\xi_{n,t} | Z_n = z_H] - \mathrm{E}[\xi_{n,t} | Z_n = z_L ] \right) \end{align*}$

which, provided the residual $\xi_{n,t}$ has the same mean in both groups, captures the mean effect of being more distressed, $\lambda_D$ , times the average level of additional distress experienced by firms with a high historical correlation with the $\mathrm{HML}$ factor:

(12) $\begin{align*} \mathrm{E}[D_n | Z_n = z_H] - \mathrm{E}[D_n | Z_n = z_L ] \end{align*}$
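The calculation in Equations (10) and (11) can be sketched in a few lines. The function name and the integer coding of $Z_n$ (with $1$ for $z_L$ and $5$ for $z_H$ ) are my own assumptions for illustration.

```python
import numpy as np

def high_minus_low_residual(alpha, beta_hml, z, mu, lam_hml, z_high=5, z_low=1):
    """Difference in mean residual alpha between the z_H and z_L groups.

    alpha, beta_hml, z: equal-length arrays, one entry per firm.
    mu, lam_hml: second-stage coefficients from the factor-only regression.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta_hml = np.asarray(beta_hml, dtype=float)
    z = np.asarray(z)

    # Equation (10): strip out the part of alpha explained by factor loadings.
    alpha_tilde = alpha - (mu + lam_hml * beta_hml)
    # Left-hand side of Equation (11): high-group mean minus low-group mean.
    return alpha_tilde[z == z_high].mean() - alpha_tilde[z == z_low].mean()
```

If the factor model were the whole story, this difference should be statistically indistinguishable from $0$ .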

4. Empirical Analysis

This section replicates the main empirical results in Daniel and Titman (1997). I calculate each stock’s book equity using COMPUSTAT data as stockholders’ equity plus any deferred taxes and any investment tax credit, minus the value of any preferred stock. I calculate each stock’s market equity using CRSP data as the number of shares outstanding times its share price. To compute the book-to-market ratio in year $t$ , I use the book equity value from any point in year $(t - 1)$ , and the market equity on the last trading day in year $(t - 1)$ . The market equity value used in forming the size portfolios is measured on the last trading day of June of year $t$ . I exclude firms that have been listed on COMPUSTAT for less than $2$ years or have a book-to-market ratio of less than $0$ . I require that firms have prices available on CRSP in both December of year $(t - 1)$ and June of year $t$ . See Figure 6 below for a summary of the timing.

FIGURE 6. Timing of the portfolio creation and holding periods associated with the size and book-to-market portfolios analyzed in Daniel and Titman (1997).
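The variable definitions and sample filters above can be written down as a short sketch. The function and argument names are hypothetical and do not correspond to actual COMPUSTAT or CRSP field names.

```python
def book_equity(stockholders_equity, deferred_taxes=0.0,
                investment_tax_credit=0.0, preferred_stock=0.0):
    """Book equity: stockholders' equity plus deferred taxes and
    investment tax credit, minus the value of preferred stock."""
    return (stockholders_equity + deferred_taxes
            + investment_tax_credit - preferred_stock)

def market_equity(shares_outstanding, price):
    """Market equity: shares outstanding times share price."""
    return shares_outstanding * price

def book_to_market(be_prior_year, me_december_prior_year):
    """Book-to-market ratio for year t from year (t - 1) inputs."""
    return be_prior_year / me_december_prior_year

def passes_filters(bm, years_on_compustat, has_december_price, has_june_price):
    """Keep firms with a non-negative book-to-market ratio, at least
    2 years on COMPUSTAT, and prices in both December and June."""
    return (bm >= 0 and years_on_compustat >= 2
            and has_december_price and has_june_price)
```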

I use the size and book-to-market ratio data to create the Fama and French (1993) $\mathrm{SMB}$ and $\mathrm{HML}$ factors as follows. For the $\mathrm{SMB}$ factor, big stocks $(B)$ are above the median market equity of NYSE firms and small stocks $(S)$ are below the median. For the $\mathrm{HML}$ factor, low book-to-market ratio stocks $(L)$ are below the $30$ th percentile of the book-to-market ratios of NYSE firms, medium book-to-market ratio stocks $(M)$ are in the middle $40{\scriptstyle \%}$ , and high book-to-market ratio stocks $(H)$ are in the top $30{\scriptstyle \%}$ . Using these buckets, I form $6$ value-weighted portfolios at the intersections of the size and book-to-market groups and then estimate the $\mathrm{SMB}$ and $\mathrm{HML}$ factors from these portfolio returns:

(13) $\begin{align*} f_{\mathrm{HML},t} &= \left( \frac{r_{S,H,t} + r_{B,H,t}}{2} \right) - \left( \frac{r_{S,L,t} + r_{B,L,t}}{2} \right) \\ f_{\mathrm{SMB},t} &= \left( \frac{r_{S,H,t} + r_{S,M,t} + r_{S,L,t}}{3} \right) - \left( \frac{r_{B,H,t} + r_{B,M,t} + r_{B,L,t}}{3} \right) \end{align*}$
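Equation (13) translates directly into code. In this sketch the function name and the dictionary layout, keyed by (size group, book-to-market group), are my own illustrative choices.

```python
def hml_smb(r):
    """Compute (f_HML, f_SMB) from the 6 portfolio returns in Equation (13).

    r is a dict keyed by (size, bm) with size in {'S', 'B'} and
    bm in {'H', 'M', 'L'}. Values can be floats for a single month
    or NumPy arrays for a full time series.
    """
    f_hml = ((r['S', 'H'] + r['B', 'H']) / 2
             - (r['S', 'L'] + r['B', 'L']) / 2)
    f_smb = ((r['S', 'H'] + r['S', 'M'] + r['S', 'L']) / 3
             - (r['B', 'H'] + r['B', 'M'] + r['B', 'L']) / 3)
    return f_hml, f_smb
```

Note that the medium book-to-market portfolios enter the $\mathrm{SMB}$ factor but not the $\mathrm{HML}$ factor.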

To create the $25$ size and book-to-market portfolio returns, I use cutoffs at $20{\scriptstyle \%}$ , $40{\scriptstyle \%}$ , $60{\scriptstyle \%}$ , and $80{\scriptstyle \%}$ for both the size and book-to-market ratio dimensions. To create the $9$ size and book-to-market portfolio returns, I use cutoffs at $33{\scriptstyle \%}$ and $66{\scriptstyle \%}$ for both the size and book-to-market ratio dimensions.
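Here is a minimal sketch of the bucketing step. I assume independent sorts along each dimension and compute the breakpoints from the same sample being sorted; both are assumptions on my part, and the function name is hypothetical.

```python
import numpy as np

def double_sort(size, bm, cutoffs=(20, 40, 60, 80)):
    """Assign each firm a (size bucket, book-to-market bucket) pair.

    Buckets are numbered from 1 (lowest) upward. Use the default cutoffs
    for the 25 portfolios and cutoffs=(33, 66) for the 9 portfolios.
    Assumption: the two sorts are independent of each other.
    """
    size = np.asarray(size, dtype=float)
    bm = np.asarray(bm, dtype=float)

    def bucket(x):
        breakpoints = np.percentile(x, cutoffs)
        # Number of breakpoints at or below x, plus 1, gives the bucket.
        return np.searchsorted(breakpoints, x, side="right") + 1

    return bucket(size), bucket(bm)
```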

To estimate a firm’s historical exposure to the $\mathrm{HML}$ factor, I take all of the firms in each of the $9$ size and book-to-market ratio buckets as of July each year $t$ . For each of these firms, I then estimate the following time series regression from January of $(t-3)$ to December of $(t-1)$ for a total of $36$ months:

(14) $\begin{align*} r_{n,t} &= \alpha_n + \beta_{\mathrm{Mkt},n} \cdot r_{\mathrm{Mkt},t} + \beta_{\mathrm{HML},n} \cdot f_{\mathrm{HML},t} + \beta_{\mathrm{SMB},n} \cdot f_{\mathrm{SMB},t} + \varepsilon_{n,t} \end{align*}$

I harvest the regression coefficients and sort the stocks into $5$ buckets based on the realized $\beta_{\mathrm{HML},n}$ loadings to assign a value of $Z_n$ to each firm using cutoffs at $20{\scriptstyle \%}$ , $40{\scriptstyle \%}$ , $60{\scriptstyle \%}$ , and $80{\scriptstyle \%}$ . Thus, a firm in the $Z_n = z_H$ bucket in July $2005$ had a $\beta_{\mathrm{HML},n}$ loading from January $2002$ to December $2004$ that was among the highest $20{\scriptstyle \%}$ within its size and book-to-market grouping. I drop the $6$ -month period between December $2004$ and July $2005$ because it appears that the returns to stocks in the $\mathrm{HML}$ portfolio behave abnormally over this window as illustrated in Figure 7 below.

FIGURE 7. Pre-formation returns to stocks in the HML portfolio for formation dates during the period from July 1963 to July 1993. The thick black line represents the mean value, the vertical bars represent the $95{\scriptstyle \%}$ confidence bounds around this mean in each month, and the $2$ -digit numbers label the realized returns to stocks in the HML portfolio $\tau$ months prior to portfolio formation in the year $19\mathrm{YY}$ . This figure corresponds to Figure 1 in Daniel and Titman (1997).
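The pre-formation step can be sketched as follows: estimate Equation (14) by OLS over the $36$ -month window for each firm, then rank firms on the estimated loading within a single size and book-to-market bucket. The function names and the integer coding of $Z_n$ (with $1$ for $z_L$ and $5$ for $z_H$ ) are illustrative assumptions.

```python
import numpy as np

def preformation_hml_loading(r, mkt, hml, smb):
    """OLS estimate of beta_HML from Equation (14) for one firm.

    All inputs are equal-length arrays covering the pre-formation window,
    e.g., the 36 months from January (t-3) to December (t-1).
    """
    r = np.asarray(r, dtype=float)
    X = np.column_stack([np.ones(len(r)), mkt, hml, smb])
    coef = np.linalg.lstsq(X, r, rcond=None)[0]
    return coef[2]  # column order: const, Mkt, HML, SMB

def assign_z(betas, cutoffs=(20, 40, 60, 80)):
    """Map the loadings of firms in ONE bucket to quintiles 1 (z_L) to 5 (z_H)."""
    betas = np.asarray(betas, dtype=float)
    breakpoints = np.percentile(betas, cutoffs)
    return np.searchsorted(breakpoints, betas, side="right") + 1
```

In a full replication, `assign_z` would be applied separately within each of the $9$ buckets every July.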

Now comes the punchline of the paper: a portfolio that is long firms in the high distress group, $z_H$ , and short firms in the low distress group, $z_L$ , within each of the $9$ size and book-to-market buckets generates abnormal returns relative to the Fama and French (1993) $3$ factor model. To see this, first take a look at Figure 8 below. Just as in Figure 2, it’s clear that a stock’s average excess returns rise as it becomes smaller and its book-to-market ratio gets larger. i.e., the average height of the numbers increases as you move northwest across the panels. However, Figure 8 also shows that, within each of the $9$ size and book-to-market portfolios, firms with higher historical loadings on the $\mathrm{HML}$ factor tend to earn higher excess returns. i.e., the average height of the numbers increases as you move from left to right within each of the panels. What’s more, moving to Figure 9 reveals that this effect is robust to the Fama and French (1993) $3$ factor model. Figure 9 plots the coefficient estimates and standard errors from the $9$ time series regressions:

(15) $\begin{align*} r_{z_H,t+1} - r_{z_L,t+1} &= \alpha + \beta_{\mathrm{Mkt},t+1} \cdot r_{\mathrm{Mkt},t+1} + \beta_{\mathrm{HML},t+1} \cdot f_{\mathrm{HML},t+1} + \beta_{\mathrm{SMB},t+1} \cdot f_{\mathrm{SMB},t+1} + \varepsilon_{t+1} \end{align*}$

All but $1$ of the estimated $\alpha$ s are positive; $2$ are statistically significant at the $5{\scriptstyle \%}$ level, and $2$ more are quite close to this threshold. By contrast, a purely factor model explanation would predict that all of these $\alpha$ s should be $0$ .
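For reference, here is a minimal sketch of how the intercept in Equation (15) and its $t$ -statistic could be computed for one bucket. The function name is hypothetical, and the classical homoskedastic OLS standard errors are my own assumption; they may differ from the standard errors used in Daniel and Titman (1997).

```python
import numpy as np

def alpha_with_tstat(spread, mkt, hml, smb):
    """Regress the z_H minus z_L return spread on the 3 factors.

    Returns (alpha, se_alpha, t_alpha).
    Assumption: classical (homoskedastic) OLS standard errors.
    """
    spread = np.asarray(spread, dtype=float)
    T = len(spread)
    X = np.column_stack([np.ones(T), mkt, hml, smb])

    coef = np.linalg.lstsq(X, spread, rcond=None)[0]
    resid = spread - X @ coef
    # Unbiased residual variance with T - 4 degrees of freedom.
    sigma2 = resid @ resid / (T - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se_alpha = np.sqrt(cov[0, 0])
    return coef[0], se_alpha, coef[0] / se_alpha
```

A $t$ -statistic above roughly $1.96$ in absolute value corresponds to significance at the $5{\scriptstyle \%}$ level in a two-sided test.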

FIGURE 8. Mean monthly excess returns of the $45$ portfolios sorted on size, book-to-market, and pre-formation HML factor loading using data from July 1973 to December 1993. The blue numbers labelled “Actual” correspond to the values reported in Table 3 of Daniel and Titman (1997). The red numbers labelled “Estimated” correspond to the values that I calculated. e.g., this figure reads that I estimate the average, value-weighted, monthly excess return of stocks in the lowest size tercile, highest book-to-market tercile, and lowest pre-formation HML factor loading quintile to be $0.906{\scriptstyle \%/\mathrm{mo}}$ while the value reported in Table 3 of Daniel and Titman (1997) is $1.211{\scriptstyle \%/\mathrm{mo}}$ .

FIGURE 9. Estimated coefficients and $R^2$ s from the regression in Equation (15) estimated within each of the $9$ size and book-to-market buckets. The dots represent the point estimates. The vertical lines represent the $95{\scriptstyle \%}$ confidence intervals. All statistically significant coefficients are flagged in red. e.g., this figure reads that within the group of stocks with the lowest book-to-market ratio and the highest market capitalization (i.e., the bottom left panel), firms with the highest historical loading on the $\mathrm{HML}$ factor (i.e., the most distressed firms) had excess returns that were $0.87{\scriptstyle \%/\mathrm{mo}}$ higher than firms with the lowest historical loading on the $\mathrm{HML}$ factor (i.e., the least distressed firms). The estimated values in this figure correspond to the values reported in Table 6 of Daniel and Titman (1997).

5. Discussion

Daniel and Titman (1997) is a really nice paper that makes a very simple and insightful point: factor loadings do not imply a causal relationship. They support this point by giving evidence that even after controlling for factor exposure, firms that are more distressed prior to portfolio formation (i.e., have a distress characteristic) earn higher returns. However, there is a big caveat that comes with the findings. Namely, any characteristics-based model of stock returns necessarily admits arbitrage. After all, a characteristics-based explanation for the value premium says that by choosing stocks with different characteristics, you can change your portfolio’s average return without adjusting its risk loadings. i.e., you can create an arbitrage opportunity. This fact makes it difficult to interpret the phrase “characteristics-based explanation.” As Arthur Eddington (1934) wrote, “it is a good rule not to put overmuch confidence in the observational results that are put forward until they have been confirmed by theory.”
