Factors vs. Characteristics

1. Introduction

Fama and French (1993) found that both a firm’s size and its book-to-market ratio are highly correlated with its average excess return, as illustrated in Figure 1 below. For instance, the center panel says that stocks with low book-to-market ratios (i.e., the $5$ portfolios at the bottom linked with an orange line) have too high a $\beta_{\mathrm{Mkt},n}$ on the market when considering their paltry realized excess returns. For some reason, it doesn’t take much to get traders to hold growth stocks.

FIGURE 1. Left Panel: Average excess returns vs. the market beta for $25$ portfolios sorted on the basis of size and book-to-market ratio using monthly data over the time period from July 1963 to December 1993. Center Panel: Same $25$ data points connected by book-to-market ratio with $\mathrm{BM}_{\mathrm{Low}}$ denoting the $5$ portfolios in the lowest book-to-market quintile. Right Panel: Same $25$ data points connected by size with $\mathrm{S}_{\mathrm{Low}}$ denoting the $5$ portfolios in the lowest size quintile. Plots correspond to Figures 20.9, 20.10, and 20.11 in Cochrane (2001).

This post reviews the analysis in Daniel and Titman (1997), which asks the natural follow-up question: Why? The original explanation proposed in Fama and French (1993) was that the additional excess returns earned by small firms with high book-to-market ratios were due to exposures to latent risk factors. e.g., a stock with a high book-to-market ratio will tend to do poorly when the entire economy suffers from a financial crisis, which is precisely when you need cash the most. As a result, you are willing to pay less in order to hold this risk. However, Daniel and Titman (1997) suggest an alternative explanation: some omitted variable causes value stocks both to earn higher excess returns (i.e., have a high $\alpha_{n,t}$) and to comove with one another (i.e., have a high $\beta_{\mathrm{HML},n}$).

Daniel and Titman (1998) highlight a nice parallel between the causal inference problem outlined above and the inference problem facing an econometrician when trying to figure out the causal effect of going to college on a student’s future earnings. We all know that people with college degrees earn more over their lifetime than people without college degrees (e.g., see Card (1999)). Just as above, the main question is: Why? On one hand, it could be that the process of getting a degree raises your earning power (analogous to the “factor model”). However, it could also be that IQ really drives everyone’s lifetime earnings and on average people with higher IQs are more likely to get college degrees (analogous to the “characteristics model”). In this situation, finding that college graduates earn more than non-graduates says nothing about the relative value of person $n$’s IQ or her degree in determining her salary:

(1) $\begin{align*} \mathrm{salary}_n &= \mu + \lambda_{\mathrm{GRAD}} \cdot 1_{\{\mathrm{GRAD}_n = 1\}} + \xi_n, \quad \lambda_{\mathrm{GRAD}} > 0 \end{align*}$

Similarly, finding that stocks with high book-to-market ratios realize higher excess returns says nothing about where these excess returns are coming from. The only real conceptual difference between the two inference problems is that, in the case of graduation vs. IQ, the inputs to the regression are data; by contrast, in the case of factors vs. characteristics, the inputs to the regression are estimated coefficients:

(2) $\begin{align*} \alpha_n &= \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} + \xi_n, \quad \lambda_{\mathrm{HML}} > 0 \end{align*}$

where $\alpha_n$ is the monthly abnormal return to holding stock/portfolio $n$ and $\beta_{\mathrm{HML},n}$ is stock $n$’s loading on the high-minus-low book-to-market factor from Fama and French (1993).

I begin in Section 2 by describing Fama and French (1993)’s interpretation of the size and value premia. Then, in Section 3, I outline the alternative interpretation of these effects given by Daniel and Titman (1997). The authors propose a test to determine if some of the effect of the size and value premia flows through a channel other than the factor loadings. In Section 4, I describe this test and replicate their empirical analysis suggesting that there is indeed a component to the size and value premia that cannot be explained by factor loadings. Finally, in Section 5, I conclude with a short discussion of Daniel and Titman (1997)’s results. All of the code used to create the figures in this post can be found on GitHub.

2. Distress Factor Loading

This section describes Fama and French (1993)’s interpretation of the value premium, i.e., the higher excess returns earned by stocks with a high book-to-market ratio. A stock with a high book-to-market ratio has lots of tangible assets on its books in accounting terms (i.e., a high book value); however, the market does not value the equity in this company very highly (i.e., a low market capitalization). These stocks are in financial distress. Define $\tilde{r}_{n,t+1}$ as the abnormal return to stock $n$ after accounting for its comovement with the market return:

(3) $\begin{align*} \tilde{r}_{n,t+1} &= \left( r_{n,t+1} - r_{f,t+1} \right) - \beta_{\mathrm{Mkt},n} \cdot r_{\mathrm{Mkt},t+1} \end{align*}$
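Equation (3) can be sketched in a few lines of code. This is a minimal illustration, not the code from the GitHub repository. It assumes that $\beta_{\mathrm{Mkt},n}$ is estimated by an OLS regression of the stock's excess return on the market's excess return, which the text above does not specify, and the function name `market_adjusted_returns` is hypothetical.

```python
import numpy as np

def market_adjusted_returns(r, r_f, r_mkt):
    """Return (beta_mkt, r_tilde) for one asset, following Equation (3).

    r     : asset returns, shape (T,)
    r_f   : riskless rate, shape (T,)
    r_mkt : market excess returns, shape (T,)
    """
    excess = np.asarray(r, dtype=float) - np.asarray(r_f, dtype=float)
    r_mkt = np.asarray(r_mkt, dtype=float)
    # Assumption: beta_Mkt is the OLS slope from a regression with an intercept.
    X = np.column_stack([np.ones_like(r_mkt), r_mkt])
    coef, *_ = np.linalg.lstsq(X, excess, rcond=None)
    beta_mkt = coef[1]
    # Strip out the part of the excess return explained by the market.
    r_tilde = excess - beta_mkt * r_mkt
    return beta_mkt, r_tilde
```

Note that the intercept is left inside `r_tilde`, so the average of `r_tilde` is the part of the mean excess return that the market loading fails to explain.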

Figure 2 below shows that, on average, firms with high book-to-market ratios have high returns and firms with low book-to-market ratios have low returns.

FIGURE 2. Monthly excess returns of $25$ portfolios sorted on size and book-to-market ratio using data from July 1963 to December 1993. e.g., the time series in the $\mathrm{BM}_{\mathrm{Low}} \times \mathrm{S}_{\mathrm{High}}$ panel in the lower left-hand corner corresponds to the monthly excess returns over the $30$ -day T-bill rate of a value weighted portfolio of stocks in the lowest book-to-market ratio quintile and the highest size quintile. The $\mu$ value reported in the lower right-hand corner of each panel represents the mean excess return over the sample period and corresponds to the values reported in Table 1(a) from Daniel and Titman (1997). The height of the shaded red region in each panel is $\mu$ which makes it easier to see how the mean excess returns vary across the $25$ portfolios.

If a financial crisis comes along, it will hit all of the firms already in financial distress the hardest. Fama and French (1993) point out that the outsized excess returns earned by high book-to-market stocks are consistent with the idea that traders don’t want to find out that their stocks have become worthless in the middle of a financial crisis. Thus, in order to hold these stocks, they must be rewarded with higher average excess returns. If this story is true, then these higher average excess returns will result from a larger $\beta_{\mathrm{HML},n} \cdot \mathrm{E}_t[f_{\mathrm{HML},t+1}]$ term in the intercept to the regression equation:

(4) $\begin{align*} \tilde{r}_{n,t+1} &= \mathrm{E}_t[\tilde{r}_{n,t+1}] + \beta_{\mathrm{HML},n} \cdot f_{\mathrm{HML},t+1} + \varepsilon_{n,t+1} \\ &= \underbrace{\left( \mathrm{E}_t[\tilde{r}_{n,t+1}] - \beta_{\mathrm{HML},n} \cdot \mathrm{E}_t[f_{\mathrm{HML},t+1}] \right)}_{\alpha_{n,t}} + \beta_{\mathrm{HML},n} \cdot \left( f_{\mathrm{HML},t+1} - \mathrm{E}_t[f_{\mathrm{HML},t+1}] \right) + \varepsilon_{n,t+1} \end{align*}$

One way to test this hypothesis would be to create a group of $N$ test assets, run $N$ versions of the time series regression specified in Equation (4) above to collect the $\alpha_{n,t}$ and $\beta_{\mathrm{HML},n}$ coefficients, and test to see if a nice linear relationship holds between the realized excess returns and each stock/portfolio’s loading on the HML factor:

(5) $\begin{align*} \alpha_{n,t} &= \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} + \xi_{n,t} \end{align*}$
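The two-step procedure just described can be sketched as follows. This is an illustration under my own reading of Equations (4) and (5) rather than the authors' implementation: the first pass regresses each asset's market-adjusted return on the demeaned HML factor, so that the intercept equals the asset's average market-adjusted return, and the second pass regresses those intercepts on the estimated loadings. The function name `two_pass` is hypothetical.

```python
import numpy as np

def two_pass(r_tilde, f_hml):
    """Two-pass regression sketch.

    r_tilde : market-adjusted returns, shape (T, N)
    f_hml   : HML factor realizations, shape (T,)
    Returns (betas, mu, lam) where lam is the second-pass slope on beta_HML.
    """
    r_tilde = np.asarray(r_tilde, dtype=float)
    f = np.asarray(f_hml, dtype=float)
    T = len(f)

    # First pass: N time series regressions, run in one call.
    # Assumption: the factor is demeaned so the intercept is the average return.
    X = np.column_stack([np.ones(T), f - f.mean()])
    coef, *_ = np.linalg.lstsq(X, r_tilde, rcond=None)  # shape (2, N)
    intercepts, betas = coef[0], coef[1]

    # Second pass: one cross-sectional regression of intercepts on loadings.
    Z = np.column_stack([np.ones(len(betas)), betas])
    (mu, lam), *_ = np.linalg.lstsq(Z, intercepts, rcond=None)
    return betas, mu, lam
```

If a pure factor model generated the data, the second-pass slope `lam` should be close to the average realization of the factor and `mu` should be close to zero.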

Figure 3 below shows the causal diagram assumed in Fama and French (1993) linking each stock’s average excess returns, $\alpha_{n,t}$, to its loading on the HML factor, $\beta_{\mathrm{HML},n}$. Figure 4 then shows that controlling for exposure to the size and book-to-market factors explains away much of the residual variation in the excess returns of the $25$ test assets in Fama and French (1993) that isn’t explained by their comovement with the market.

FIGURE 3. Causal diagram linking the coefficients $\alpha_{n,t}$ and $\beta_{\mathrm{HML},n}$ assumed in Fama and French (1993).

FIGURE 4. Average excess return vs. excess return predicted by the Fama and French (1993) 3-factor model computed for $25$ portfolios sorted on the basis of size and book-to-market ratio using monthly data over the time period from July 1963 to December 1993. Left Panel: Data points connected by book-to-market ratio with $\mathrm{BM}_{\mathrm{Low}}$ denoting the $5$ portfolios in the lowest book-to-market quintile. Right Panel: Data points connected by size with $\mathrm{S}_{\mathrm{Low}}$ denoting the $5$ portfolios in the lowest size quintile. Plots correspond to Figures 20.12 and 20.13 in Cochrane (2001).

3. Characteristics-Based Pricing

In this section I describe Daniel and Titman (1997)‘s alternative interpretation of the value premium. These authors start with a similar first stage regression model:

(6) $\begin{align*} \tilde{r}_{n,t+1} &= \mathrm{E}_t[\tilde{r}_{n,t+1}|D_n] + \beta_{\mathrm{HML},n} \cdot f_{\mathrm{HML},t+1} + \varepsilon_{n,t+1} \end{align*}$

but replace the unconditional expectation $\mathrm{E}_t[\tilde{r}_{n,t+1}]$ with the conditional expectation $\mathrm{E}_t[\tilde{r}_{n,t+1}|D_n]$ . i.e., they propose that there is an omitted variable related to the fundamental “distressed-ness” of each firm $n$ . Under this hypothesis, as a firm gets more and more financially distressed, its average excess returns must rise by an amount $\lambda_D$ in order to induce traders to hold the stock. Thus, the time series regression in Equation (4) becomes:

(7) $\begin{align*} \tilde{r}_{n,t+1} &= \underbrace{\left( \mathrm{E}_t[\tilde{r}_{n,t+1}] - D_n \cdot \lambda_D - \beta_{\mathrm{HML},n} \cdot \mathrm{E}_t[f_{\mathrm{HML},t+1}] \right)}_{\alpha_{n,t}} \\ &\qquad \qquad + \ \beta_{\mathrm{HML},n} \cdot \left( f_{\mathrm{HML},t+1} - \mathrm{E}_t[f_{\mathrm{HML},t+1}] \right) + \varepsilon_{n,t+1} \end{align*}$

with the second stage regression:

(8) $\begin{align*} \alpha_{n,t} &= \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} + \lambda_D \cdot D_n + \xi_{n,t} \end{align*}$

Figure 5 below shows the causal diagram assumed in Daniel and Titman (1997) linking each stock’s average excess returns, $\alpha_{n,t}$, to its loading on the HML factor, $\beta_{\mathrm{HML},n}$, and its distressed-ness, $D_n$. The dotted line linking $\beta_{\mathrm{HML},n}$ and $D_n$ captures the idea that distressed firms are likely to have larger loadings on the $\mathrm{HML}$ factor in the same way that people with higher IQs are more likely to go to college.

FIGURE 5. Causal diagram linking the coefficients $\alpha_{n,t}$ and $\beta_{\mathrm{HML},n}$ assumed in Daniel and Titman (1997).

The natural way to break this logjam and determine whether the value premium is due to a factor loadings or characteristic-based explanation would be to use an instrument. e.g., find some variable that is correlated with each firm’s factor loading, $\beta_{\mathrm{HML},n}$ , but uncorrelated with its distress status, $D_n$ ; or, find some variable that is correlated with each firm’s distress status but uncorrelated with its factor loading. Similarly, to solve the graduation vs. IQ debate from the introduction, you would need either an instrument that randomly assigns people with the same IQ to college and non-college groups or an instrument that randomly shocks people’s IQs once they have made their college decision one way or another.

Daniel and Titman (1997) instrument for each firm’s level of distress, $D_n$ . Note that the analogy to the instrumental variables approach here is imprecise since we can’t actually observe each firm’s level of distress directly. e.g., it would be impossible to predict the variable $D_n$ in a regression. Within each size and book-to-market bucket, Daniel and Titman (1997) use a firm’s exposure to the $\mathrm{HML}$ factor prior to the portfolio formation period as the instrument:

(9) $\begin{align*} Z_n &\in \{ z_L, z_2, z_3, z_4, z_H\} \end{align*}$

The logic behind this instrument is the following: If characteristics drive expected returns, there should be firms with characteristics that do not match their factor loadings. All the stocks in the same size and book-to-market bucket will have the same loading on the $\mathrm{HML}$ factor. However, within each of the size and book-to-market buckets, there will be firms whose returns have been highly correlated with the $\mathrm{HML}$ factor in the past as well as firms whose returns have been weakly correlated with the $\mathrm{HML}$ factor in the past. Daniel and Titman (1997) think about this within-group historical variation as exogenous and use it to instrument for each firm’s true level of distress.

I use $Z_n = z_H$ to denote the firms with the highest historical correlation with the $\mathrm{HML}$ factor and $Z_n = z_L$ to denote the firms with the lowest historical correlation. To empirically estimate whether or not more distressed firms earn higher average excess returns independent of their $\mathrm{HML}$ factor loading, Daniel and Titman (1997) first sort stocks into size and book-to-market buckets to create a residual $\tilde{\alpha}_{n,t}$ that captures the excess returns not explained by firms’ factor loadings:

(10) $\begin{align*} \tilde{\alpha}_{n,t} &= \alpha_{n,t} - \left( \mu + \lambda_{\mathrm{HML}} \cdot \beta_{\mathrm{HML},n} \right) \end{align*}$

They then compute:

(11) $\begin{align*} \mathrm{E}[\tilde{\alpha}_{n,t} | Z_n = z_H] - \mathrm{E}[\tilde{\alpha}_{n,t} | Z_n = z_L] &= \lambda_D \cdot \left( \mathrm{E}[D_n | Z_n = z_H] - \mathrm{E}[D_n | Z_n = z_L ] \right) \\ &\qquad \qquad + \ \left( \mathrm{E}[\xi_{n,t} | Z_n = z_H] - \mathrm{E}[\xi_{n,t} | Z_n = z_L ] \right) \end{align*}$

which captures the mean effect of being more distressed, $\lambda_D$, times the average level of additional distress experienced by firms with a high historical correlation with the $\mathrm{HML}$ factor:

(12) $\begin{align*} \mathrm{E}[D_n | Z_n = z_H] - \mathrm{E}[D_n | Z_n = z_L ] \end{align*}$
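A small simulation helps make this comparison of group means concrete. Everything in it is made up for illustration: distress $D_n$ is drawn from a normal distribution, the instrument is a noisy proxy for $D_n$, and the parameter values (`lam_D`, the noise scales, the sample size) are assumptions rather than estimates from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, lam_D = 5000, 0.5

D = rng.normal(size=N)                # unobserved distress (simulated)
z_score = D + rng.normal(size=N)      # noisy proxy, e.g. a past HML loading
xi = rng.normal(scale=0.1, size=N)    # second-stage noise
alpha_tilde = lam_D * D + xi          # residual return, as in Equations (8) and (10)

# z_L is the bottom quintile of the proxy and z_H is the top quintile.
cuts = np.quantile(z_score, [0.2, 0.8])
low, high = z_score <= cuts[0], z_score >= cuts[1]

spread = alpha_tilde[high].mean() - alpha_tilde[low].mean()
distress_gap = D[high].mean() - D[low].mean()

print(f"spread in residual returns : {spread:.3f}")
print(f"lam_D x distress gap       : {lam_D * distress_gap:.3f}")
```

In the simulation the two printed numbers differ only by the gap in average noise between the two groups, which is small when the noise is unrelated to the instrument. With real data the distress gap is unobservable, so only the spread on the first line can be computed.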

4. Empirical Analysis

This section replicates the main empirical results in Daniel and Titman (1997). I calculate each stock’s book equity using COMPUSTAT data as stockholders’ equity plus any deferred taxes and any investment tax credit, minus the value of any preferred stock. I calculate each stock’s market equity using CRSP data as the number of shares outstanding times the share price. To compute the book-to-market ratio in year $t$, I use the book equity value from any point in year $(t - 1)$ and the market equity on the last trading day in year $(t - 1)$. The market equity value used in forming the size portfolios is the value on the last trading day of June of year $t$. I exclude firms that have been listed on COMPUSTAT for less than $2$ years or have a book-to-market ratio of less than $0$. I require that firms have prices available on CRSP in both December of year $(t - 1)$ and June of year $t$. See Figure 6 below for a summary of the timing.
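The accounting definitions above amount to a couple of lines of arithmetic. The sketch below uses hypothetical parameter names rather than actual COMPUSTAT or CRSP field names, and it leaves out the listing-history and price-availability filters.

```python
def book_equity(stockholders_equity, deferred_taxes=0.0,
                investment_tax_credit=0.0, preferred_stock=0.0):
    """Book equity: stockholders' equity plus deferred taxes and
    investment tax credit, minus preferred stock."""
    return (stockholders_equity + deferred_taxes
            + investment_tax_credit - preferred_stock)

def book_to_market(be_prev_year, shares_dec_prev_year, price_dec_prev_year):
    """Book-to-market ratio for year t, built from year (t - 1) inputs."""
    market_equity = shares_dec_prev_year * price_dec_prev_year
    return be_prev_year / market_equity

def keep_firm(bm_ratio):
    """Drop firms with a negative book-to-market ratio."""
    return bm_ratio >= 0
```

For example, a firm with book equity of 100 and 50 shares priced at 4 at the end of December of year $(t - 1)$ has a book-to-market ratio of 0.5 for year $t$.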

FIGURE 6. Timing of the portfolio creation and holding periods associated with the size and book-to-market portfolios analyzed in Daniel and Titman (1997).

I use the size and book-to-market ratio data to create the Fama and French (1993) $\mathrm{SMB}$ and $\mathrm{HML}$ factors as follows. For the $\mathrm{SMB}$ factor, big stocks $(B)$ are above the median market equity of NYSE firms and small stocks $(S)$ are below the median. For the $\mathrm{HML}$ factor, low book-to-market ratio stocks $(L)$ are below the $30$th percentile of the book-to-market ratios of NYSE firms, medium book-to-market ratio stocks $(M)$ are in the middle $40{\scriptstyle \%}$, and high book-to-market ratio stocks $(H)$ are in the top $30{\scriptstyle \%}$. Using the intersections of these buckets, I form $6$ value-weighted portfolios and then compute the $\mathrm{SMB}$ and $\mathrm{HML}$ factors from these portfolio returns:

(13) $\begin{align*} f_{\mathrm{HML},t} &= \left( \frac{r_{S,H,t} + r_{B,H,t}}{2} \right) - \left( \frac{r_{S,L,t} + r_{B,L,t}}{2} \right) \\ f_{\mathrm{SMB},t} &= \left( \frac{r_{S,H,t} + r_{S,M,t} + r_{S,L,t}}{3} \right) - \left( \frac{r_{B,H,t} + r_{B,M,t} + r_{B,L,t}}{3} \right) \end{align*}$
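Equation (13) translates directly into code. In this sketch the six portfolio returns for a single month are stored in a dictionary keyed by (size, book-to-market) labels. That layout and the function name `ff_factors` are my own choices for illustration.

```python
def ff_factors(r):
    """Compute (f_HML, f_SMB) for one month from six portfolio returns.

    r maps (size, bm) -> return, with size in {'S', 'B'}
    and bm in {'L', 'M', 'H'}.
    """
    # HML: average of the two high-BM portfolios minus the two low-BM ones.
    f_hml = ((r['S', 'H'] + r['B', 'H']) / 2
             - (r['S', 'L'] + r['B', 'L']) / 2)
    # SMB: average of the three small portfolios minus the three big ones.
    f_smb = ((r['S', 'H'] + r['S', 'M'] + r['S', 'L']) / 3
             - (r['B', 'H'] + r['B', 'M'] + r['B', 'L']) / 3)
    return f_hml, f_smb
```

Averaging across size groups inside the $\mathrm{HML}$ factor, and across book-to-market groups inside the $\mathrm{SMB}$ factor, keeps each factor from simply picking up the other characteristic.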

To create the $25$ size and book-to-market portfolio returns, I use cutoffs at $20{\scriptstyle \%}$, $40{\scriptstyle \%}$, $60{\scriptstyle \%}$, and $80{\scriptstyle \%}$ for both the size and book-to-market ratio dimensions. To create the $9$ size and book-to-market portfolio returns, I use cutoffs at $33{\scriptstyle \%}$ and $66{\scriptstyle \%}$ for both the size and book-to-market ratio dimensions.
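Sorting firms into buckets with percentile cutoffs can be sketched as below. The sketch assumes that the breakpoints are computed from NYSE firms only and then applied to every firm. The text says this for the factor buckets, but applying it to the $25$ and $9$ portfolio sorts is my assumption. The function name `assign_buckets` is hypothetical.

```python
import numpy as np

def assign_buckets(values, is_nyse, cutoffs=(0.2, 0.4, 0.6, 0.8)):
    """Assign each firm a bucket index (0 = lowest) along one dimension.

    values  : size or book-to-market ratio for each firm
    is_nyse : boolean flag, True if the firm is listed on the NYSE
    cutoffs : percentile breakpoints expressed as fractions
    """
    values = np.asarray(values, dtype=float)
    is_nyse = np.asarray(is_nyse, dtype=bool)
    # Assumption: breakpoints come from the NYSE subsample only.
    breakpoints = np.quantile(values[is_nyse], cutoffs)
    # A value equal to a breakpoint lands in the higher bucket.
    return np.digitize(values, breakpoints)
```

Calling the function once on size and once on book-to-market ratio, and then pairing the two indices, gives the $5 \times 5$ grid. Passing `cutoffs=(0.33, 0.66)` gives the $3 \times 3$ grid.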

To estimate a firm’s historical exposure to the $\mathrm{HML}$ factor, I take all of the firms in each of the $9$ size and book-to-market ratio buckets as of July each year $t$ . For each of these firms, I then estimate the following time series regression from January of $(t-3)$ to December of $(t-1)$ for a total of $36$ months:

(14) $\begin{align*} r_{n,t} &= \alpha_n + \beta_{\mathrm{Mkt},n} \cdot r_{\mathrm{Mkt},t} + \beta_{\mathrm{HML},n} \cdot f_{\mathrm{HML},t} + \beta_{\mathrm{SMB},n} \cdot f_{\mathrm{SMB},t} + \varepsilon_{n,t} \end{align*}$

I harvest the regression coefficients and sort the stocks into $5$ buckets based on the realized $\beta_{\mathrm{HML},n}$ loadings to assign a value of $Z_n$ to each firm using cutoffs at $20{\scriptstyle \%}$, $40{\scriptstyle \%}$, $60{\scriptstyle \%}$, and $80{\scriptstyle \%}$. Thus, a firm in the $Z_n = z_H$ bucket in July $2005$ had a $\beta_{\mathrm{HML},n}$ loading from January $2002$ to December $2004$ that was among the highest $20{\scriptstyle \%}$ within its size and book-to-market grouping. I drop the $6$-month period between December $2004$ and July $2005$ (i.e., January $2005$ to June $2005$) because it appears that the returns to stocks in the $\mathrm{HML}$ portfolio behave abnormally over this window, as illustrated in Figure 7 below.
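The pre-formation regression and the within-bucket sort can be sketched as follows. This is a minimal illustration rather than the code behind the figures. The function names are hypothetical, and the inputs are assumed to be already restricted to the $36$-month estimation window.

```python
import numpy as np

def preformation_hml_beta(r, mkt, hml, smb):
    """OLS loading on HML over the estimation window, as in Equation (14)."""
    X = np.column_stack([np.ones(len(mkt)), mkt, hml, smb])
    coef, *_ = np.linalg.lstsq(X, np.asarray(r, dtype=float), rcond=None)
    return coef[2]  # columns are: intercept, Mkt, HML, SMB

def assign_z(betas, bucket_ids):
    """Sort firms into 5 loading groups within each size/BM bucket.

    Returns an integer per firm: 0 corresponds to z_L and 4 to z_H.
    """
    betas = np.asarray(betas, dtype=float)
    bucket_ids = np.asarray(bucket_ids)
    z = np.empty(len(betas), dtype=int)
    for b in np.unique(bucket_ids):
        mask = bucket_ids == b
        # Cutoffs are recomputed inside each bucket.
        cuts = np.quantile(betas[mask], [0.2, 0.4, 0.6, 0.8])
        z[mask] = np.digitize(betas[mask], cuts)
    return z
```

Recomputing the cutoffs inside each bucket is what makes the resulting groups comparable across buckets: a firm is in the top group only relative to firms of similar size and book-to-market ratio.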

FIGURE 7. Pre-formation returns to stocks in the HML portfolio for formation dates during the period from July 1963 to July 1993. The thick black line represents the mean value, the vertical bars represent the $95{\scriptstyle \%}$ confidence bounds around this mean in each month, and the $2$ -digit numbers label the realized returns to stocks in the HML portfolio $\tau$ months prior to portfolio formation in the year $19\mathrm{YY}$ . This figure corresponds to Figure 1 in Daniel and Titman (1997).

Now comes the punchline of the paper: a portfolio that is long firms in the high distress group, $z_H$, and short firms in the low distress group, $z_L$, within each of the $9$ size and book-to-market buckets generates abnormal returns relative to the Fama and French (1993) $3$-factor model. To see this, first take a look at Figure 8 below. Just as in Figure 2, it’s clear that a stock’s average excess returns rise as it becomes smaller and its book-to-market ratio gets larger. i.e., the average height of the numbers increases as you move northeast across the panels. However, Figure 8 also shows that, within each of the $9$ size and book-to-market portfolios, firms with higher historical loadings on the $\mathrm{HML}$ factor tend to earn higher excess returns. i.e., the average height of the numbers increases as you move from left to right within each of the panels. What’s more, moving to Figure 9 reveals that this effect is robust to controlling for the Fama and French (1993) $3$-factor model. Figure 9 plots the coefficient estimates and standard errors from the $9$ time series regressions:

(15) $\begin{align*} r_{z_H,t+1} - r_{z_L,t+1} &= \alpha + \beta_{\mathrm{Mkt}} \cdot r_{\mathrm{Mkt},t+1} + \beta_{\mathrm{HML}} \cdot f_{\mathrm{HML},t+1} + \beta_{\mathrm{SMB}} \cdot f_{\mathrm{SMB},t+1} + \varepsilon_{t+1} \end{align*}$

Of the $9$ estimated $\alpha$s, all but $1$ are positive, $2$ are statistically significant at the $5{\scriptstyle \%}$ level, and $2$ more are quite close to this threshold. By contrast, a pure factor-model explanation would predict that all of these $\alpha$s should be $0$.
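The regression in Equation (15) and the significance check on $\alpha$ can be sketched with plain OLS. The data below are simulated, and the coefficient values, factor volatilities, and noise level are all made up for illustration. The standard errors are the classical homoskedastic ones, which may differ from those used in the paper.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients and classical (homoskedastic) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * XtX_inv))
    return coef, se

rng = np.random.default_rng(1)
T = 246  # number of months from July 1973 to December 1993

# Simulated Mkt, HML, and SMB factor returns (made-up volatilities).
factors = rng.normal(scale=[4.5, 2.5, 2.5], size=(T, 3))
# Simulated z_H minus z_L spread with a made-up alpha of 0.3 per month.
spread = (0.3 + factors @ np.array([0.05, 0.4, -0.1])
          + rng.normal(scale=1.0, size=T))

X = np.column_stack([np.ones(T), factors])
coef, se = ols_with_se(spread, X)
t_alpha = coef[0] / se[0]
print(f"alpha = {coef[0]:.3f}, t-stat = {t_alpha:.2f}")
```

Under a pure factor model the intercept would be statistically indistinguishable from $0$, so a $t$-statistic on $\alpha$ above roughly $2$ in absolute value is the kind of evidence flagged in Figure 9.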

FIGURE 8. Mean monthly excess returns of the $45$ portfolios sorted on size, book-to-market, and pre-formation HML factor loading using data from July 1973 to December 1993. The blue numbers labelled “Actual” correspond to the values reported in Table 3 of Daniel and Titman (1997). The red numbers labelled “Estimated” correspond to the values that I calculated. e.g., this figure reads that I estimate the average, value-weighted, monthly excess return of stocks in the lowest size tercile, highest book-to-market tercile, and lowest pre-formation HML factor loading quintile to be $0.906{\scriptstyle \%/\mathrm{mo}}$ while the value reported in Table 3 of Daniel and Titman (1997) is $1.211{\scriptstyle \%/\mathrm{mo}}$ .

FIGURE 9. Estimated coefficients and $R^2$s from the regression in Equation (15) estimated within each of the $9$ size and book-to-market buckets. The dots represent the point estimates. The vertical lines represent the $95{\scriptstyle \%}$ confidence intervals. All statistically significant coefficients are flagged in red. e.g., this figure reads that within the group of stocks with the lowest book-to-market ratio and the highest market capitalization (i.e., the bottom left panel), firms with the highest historical loading on the $\mathrm{HML}$ factor (i.e., the most distressed firms) had excess returns that were $0.87{\scriptstyle \%/\mathrm{mo}}$ higher than firms with the lowest historical loading on the $\mathrm{HML}$ factor (i.e., the least distressed firms). The estimated values in this figure correspond to the values reported in Table 6 of Daniel and Titman (1997).

5. Discussion

Daniel and Titman (1997) is a really nice paper that makes a very simple and insightful point: factor loadings do not imply a causal relationship. They support this point by giving evidence that, even after controlling for factor exposure, firms that are more distressed prior to portfolio formation (i.e., have a distress characteristic) earn higher returns. However, there is a big caveat that comes with the findings. Namely, any characteristics-based model of stock returns necessarily admits arbitrage. After all, a characteristics-based explanation for the value premium says that by choosing stocks with different characteristics, you can change your portfolio’s average return without adjusting its risk loadings. i.e., you can create an arbitrage opportunity. This fact makes it difficult to interpret the phrase “characteristics-based explanation.” As Arthur Eddington (1934) wrote, “it is a good rule not to put overmuch confidence in the observational results that are put forward until they have been confirmed by theory.”