Research Notebook

Why do ‘as if’ critiques only apply to survey evidence?

November 10, 2020 by Alex

Milton Friedman laid out his methodological approach to doing economics in his 1953 essay, The Methodology of Positive Economics. This essay gives his answer to the question: What constitutes a good economic model? Or, put differently, how would you recognize a good economic model if you saw one?

According to Friedman, “the only relevant test of the validity of a hypothesis is the comparison of its predictions with experience. The hypothesis is rejected if its predictions are contradicted; it is accepted if its predictions are not contradicted.” All that matters is whether or not a model fits the data. Assumptions? Priors? Intuition? All that stuff is just moonshine. Empirical fit reigns supreme. This is an extreme view!

For example, in Friedman’s eyes, a good model of how leaves are distributed about the canopy of an oak tree is a model in which each leaf optimally chooses its position and orientation relative to its neighbors. Yes, we know that leaves don’t have brains. They can’t actually make decisions like this. But it is ‘as if’ they could. So a model in which each leaf strategically chooses where to grow is a good model of leaf placement.

A good model of how an expert billiards player makes difficult shots would be a model in which “he knew the complicated mathematical formulas that would give the optimum directions of travel, could estimate accurately by eye the angles, could make lightning calculations from the formulas, and could then make the balls travel in the direction indicated by the formulas.” So what if the player can’t do these things? We know he regularly makes difficult shots, so it’s ‘as if’ he can. Friedman tells us to just model him like that anyway.

In Friedman’s view, “a theory cannot be tested by comparing its ‘assumptions’ directly with ‘reality.’ Indeed, there is no meaningful way in which this can be done.” In fact, Friedman argues that insisting on reasonable assumptions can be misleading. “The more significant the theory, the more unrealistic the assumptions.”

Every economist knows about Friedman’s ‘as if’ approach to model evaluation. If asked, most economists will say that Friedman’s methodological approach is, if not correct, then at least reasonable. They will argue that it’s at least important to consider ‘as if’ justifications when evaluating a model.

But here’s the thing: no working economist actually evaluates models this way! Aside from one glaring exception, no economist actually thinks ‘as if’ models are helpful. Ask yourself: Is the factor zoo a problem for asset pricing? Yes. But what is a spurious factor? It’s a factor that fits the data for wrong reasons. It is ‘as if’ investors were using it to price assets even though they aren’t. And that’s precisely the problem!

The idea that we can’t test (or shouldn’t even bother testing) the assumptions behind our economic models is simply preposterous. It’s a claim that Steven Pinker would call a “conventional absurdity: a statement that goes against all common sense but that everyone believes because they dimly recall having heard it somewhere and because it is so pregnant with implications.” No economist does research this way!

Why not replace all economic models with uninterpretable machine-learning (ML) algorithms? ML algorithms can fit the data well precisely because they contain no economic assumptions. But TANSTAAFL! It is precisely the economic assumptions about what agents are trying to do that give us confidence a model’s predictions will hold up when conditions change. In other words, these assumptions are what allow economists to use the model for counterfactual analysis—i.e., to make predictions in new and as-yet-unseen environments. The right assumptions embedded in a good economic model are responsible for its robust predictions. If you’re going to ignore all such economic restrictions, then there’s no point in writing down an economic model in the first place. There are better ways to do pure prediction.

I’m by no means the first person to highlight these issues. They long predate the factor zoo and the popularity of ML algorithms. If I had to pick one person to judge the quality of an economic model, that person would be Paul Samuelson. And Samuelson strongly disagreed with Friedman’s ‘as if’ approach. Samuelson clearly recognized the importance of evaluating your assumptions, disparagingly referring to Friedman’s ‘as if’ methodology as the “F-Twist” in a 1963 discussion paper.

Moreover, in almost every context, economists approach research in a manner more consistent with Samuelson than with Friedman. They firmly believe it’s important to verify one’s assumptions. This is why we see papers with titles like Do Measures of Financial Constraints Measure Financial Constraints? getting hundreds of cites a year. This influential paper is entirely concerned with testing our working assumptions.

As far as I can tell, there is only one context in which economists actually use ‘as if’ reasoning to constrain the research process—namely, when interpreting survey data. Standard asset-pricing models assume investors are solving an optimization problem that looks something like

(1)   \begin{equation*} \begin{array}{rl} \text{maximize} & \Exp\left[ \, \sum_{t=0}^\infty \, \beta^t \cdot U_t \, \right] \\ \text{subject to} & \quad\,\;\; U_t = \mathrm{u}(C_t) \\ & \Delta W_{t+1} = \mathrm{f}(C_t, \, X_t; W_t) \\ & \qquad\,\, 0 \leq \mathrm{g}_n(C_t, \, X_t; W_t) \qquad \text{constraints } n=1, \ldots,\,N \end{array} \end{equation*}

Economists regularly test assumptions about investor preferences U = \mathrm{u}(C), the law of motion for wealth, \Delta W = \mathrm{f}(C, \, X; W), and various other kinds of economic constraints, 0 \leq \mathrm{g}_n(C, \, X; W). However, for some reason, it’s entirely taboo to ask investors whether they are actually trying to solve this problem in the first place.

Friedman directly calls out survey data in his 1953 essay, writing that “questionnaire studies of businessmen’s or others’ motives or beliefs about the forces affecting their behavior… seem to me almost entirely useless as a means of testing the validity of economic hypotheses.” However, he offers no concrete reasons why economists should think about the “maximize” part of investors’ optimization problem any differently than the “subject to” part. Both are assumptions. In Friedman’s eyes, both are untestable.

Yes, survey data can be misleading. Above I describe a situation where surveying economists about their views on ‘as if’ reasoning would yield specious evidence. But all data can be misleading. It’s not like NOT using survey data has resolved the factor zoo. Sometimes investors give uninformative answers, which might lead researchers down the wrong path. But this doesn’t mean that we can’t learn anything concrete about how investors price assets from a well-constructed survey. Not every regression result is informative. Some regression estimates can even be misleading. None of this implies that regression analysis is worthless.

Friedman’s 1953 essay outlines a bad approach to model evaluation. There’s more to a good model than R^2 = 100\%. Paul Samuelson knew this to be true. And, except when they’re looking at survey data, every other economist knows it to be true as well. There’s no reason for us to continue applying ‘as if’ reasoning only in this particular context. It’s just not a valid argument for dismissing survey evidence about a model.

Filed Under: Uncategorized

Consumption Risk In Modern Macro-Finance Models

October 8, 2020 by Alex

Stock returns are 8\% per year higher than bond returns on average. It’s hard to explain such a large equity premium using the standard consumption-based model because consumption growth isn’t risky enough. So, to fix this problem, modern macro-finance models introduce new state variables capturing other kinds of risk investors might care about, such as the surplus consumption ratio (habits; Campbell-Cochrane 1999) and news about long-run consumption growth (long-run risks; Bansal-Yaron 2004).

Because exposure to one of these new state variables typically explains most of the 8\% per year equity premium in modern macro-finance models, there’s a sense among researchers that it doesn’t matter whether investors are trying to insure themselves against shocks to consumption growth.

Not so!

This post shows why. The new state variables used in modern macro-finance models are not separate from consumption risk. They’re ways of amplifying the effects of consumption risk. Arguing that investors don’t care about consumption risk because exposure to the surplus consumption ratio explains most of the 8\% per year equity premium is like arguing that Hollywood doesn’t care about beauty because plastic surgery has a bigger effect on casting decisions than the face actors are born with. It’s nonsense.

Consumption CAPM

Investors in the consumption capital asset-pricing model (CCAPM; Lucas 1978) try to maximize their expected discounted utility by choosing how much to consume, C_t, how much to invest in the stock market, S_t, and how much to invest in riskless bonds, B_t:

(1)   \begin{equation*} \begin{array}{rl} \text{maximize} & \Exp\left[ \, \sum_{t=0}^\infty \, \beta^t \cdot U_t \, \right] \\ \text{subject to} & \quad U_t = C_t^{1-\gamma} / \, (1-\gamma) \\ & W_{t+1} = (W_t - C_t) \cdot (1 + R_f) + S_t \cdot (R_{t+1} - R_f) \\ & \,\,\,\,\,W_t = C_t + B_t + S_t \end{array} \end{equation*}

\beta \in (0, \, 1) is investors’ subjective time preference, U_t is their utility from consumption, W_t represents their wealth, and \gamma > 0 is their coefficient of risk aversion. (1+R_{t+1}) = (P_{t+1} + D_{t+1})/P_t is the gross return on the value-weighted stock market, and (1+R_f) is the return on a riskless bond.

The CCAPM predicts that the expected excess return on stocks, \Exp[R_{t+1}] - R_f, will be proportional to the covariance between consumption growth and stock returns:

(2)   \begin{equation*} \Exp[R_{t+1}] - R_f \approx \gamma \times \Cov[\Delta \log C_{t+1}, \, R_{t+1}] \end{equation*}

The price of stocks is inversely related to the expected market return, \Exp[1+R_{t+1}] = \Exp[P_{t+1} + D_{t+1}]/P_t. So, the CCAPM says that investors pay more for stocks when stock returns tend to offset negative consumption shocks, \Cov[\Delta \log C_{t+1}, \, R_{t+1}] < 0. In other words, Equation (2) says CCAPM investors view the stock market as a way to insure future consumption shocks.

Equity Premium Puzzle

Unfortunately for the CCAPM, investors’ desire to hedge future consumption shocks can’t explain the entire \Exp[R_{t+1}] - R_f \approx 8\% per year equity premium that we observe in the data on its own. To see why, consider rewriting the right-hand side of Equation (2) using the definition of a covariance:

(3)   \begin{equation*} \gamma \times \Cov[\Delta \log C_{t+1}, \, R_{t+1}]  =  \gamma \times \big( \, \rho \cdot \sigma_{\Delta \log C} \cdot \sigma_R \, \big) \end{equation*}

The parameter \rho = \Corr[\Delta \log C_{t+1}, \, R_{t+1}] is the correlation between consumption growth and stock returns, \sigma_{\Delta \log C} = \Sd[\Delta \log C_{t+1}] is the volatility of consumption growth, and \sigma_R = \Sd[R_{t+1}] is the volatility of stock returns. In the data, we observe values of roughly \rho \approx 0.20, \sigma_{\Delta \log C} \approx 1\% per year, and \sigma_R \approx 16\% per year. Thus, investors would need a risk aversion of \gamma = 250 for the CCAPM to explain an 8\% equity premium.

A risk aversion of \gamma = 250 seems too high. But, if we assume a lower risk aversion of merely \gamma = 10, the equity premium should only be 10 \times (0.20 \cdot 1\% \cdot 16\%) \approx 0.32\% per year according to the CCAPM. Given the power of compounding to increase investor wealth over long horizons, a difference of 8\% - 0.32\% = 7.68\% per year is a big deal. CCAPM investors with a \gamma = 10 should be putting much more money in stocks than they actually are, which would drive up stock prices and thereby lower the expected returns. So, if the basic logic of the CCAPM is correct, there must be something else about stock returns scaring investors away.
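The arithmetic behind the puzzle is short enough to verify directly. Here is a quick sketch, using the illustrative numbers quoted above (\rho \approx 0.20, \sigma_{\Delta \log C} \approx 1\%, \sigma_R \approx 16\%):

```python
# Back-of-the-envelope check of the equity-premium arithmetic above.
# All inputs are the illustrative values quoted in the text.

rho = 0.20        # Corr[consumption growth, stock returns]
sigma_c = 0.01    # Sd[consumption growth], per year
sigma_r = 0.16    # Sd[stock returns], per year
premium = 0.08    # observed equity premium, per year

# Risk aversion needed for the CCAPM (Eq. 2-3) to match the 8% premium
gamma_implied = premium / (rho * sigma_c * sigma_r)
print(round(gamma_implied))          # 250

# Premium implied by a more palatable gamma = 10
gamma = 10
implied_premium = gamma * rho * sigma_c * sigma_r
print(round(implied_premium, 4))     # 0.0032, i.e. 0.32% per year
```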

External Habit

Campbell-Cochrane (1999) argue that the something else is captured by a variable called the surplus consumption ratio. Their model starts with Problem (1) and plugs in a modified utility function:

(4)   \begin{equation*}  U_t = (C_t-X_t)^{1-\gamma} / \, (1-\gamma) \end{equation*}

In this specification, investors care about their consumption in excess of the level of consumption they have become accustomed to, X_t. This level corresponds to a weighted average of past consumption:

(5)   \begin{equation*} \log X_t  = \lambda \cdot {\textstyle \sum_{\ell=0}^{\infty}} \, \phi^{\ell} \cdot \log C_{t-\ell} \qquad \qquad \lambda > 0, \, \phi \in (0, \, 1) \end{equation*}

So, drops in consumption following prolonged periods of high consumption are extremely painful for investors. Conversely, an increase in consumption following a long hungry spell will be really enjoyable.
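To make Equation (5) concrete, here is a minimal sketch of computing the habit level from a consumption path. The path and the \lambda value are made up for illustration; \phi = 0.87 matches the calibration value quoted later in the post:

```python
# A minimal sketch of the habit level in Eq. (5): log X_t is an
# exponentially-weighted sum of past log consumption.
# The consumption path and lambda are made-up illustrative values.

phi, lam = 0.87, 0.13
log_C = [0.0, 0.01, 0.03, 0.02, 0.05]   # log consumption, oldest first

# log X_t = lam * sum_{l >= 0} phi^l * log C_{t-l}
log_X = lam * sum(phi**l * log_C[-1 - l] for l in range(len(log_C)))
print(round(log_X, 5))   # recent consumption gets the largest weight
```

Because recent consumption carries the most weight, the habit X_t ratchets up during a boom, which is exactly why a subsequent drop in C_t is so painful.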

The surplus consumption ratio is Z_t = (C_t - X_t) / C_t. The model says expected stock returns could be high either because stock returns covary with consumption growth or because they covary with growth in the surplus consumption ratio:

(6)   \begin{equation*} \Exp[R_{t+1}] - R_f  \approx  \gamma \times \Cov[\Delta \log C_{t+1}, \, R_{t+1}]  +  \gamma \times \Cov[\Delta \log Z_{t+1}, \, R_{t+1}] \end{equation*}

Yes, the covariance between consumption growth and stock returns isn’t strong enough to explain the 8\% equity premium on its own. But, stock market crashes tend to sucker punch investors, occurring just when investors’ consumption falls following a prolonged boom. \Cov[\Delta \log Z_{t+1}, \, R_{t+1}] > 0 explains nearly all of the 8\% per year equity premium according to this habit-formation model.

Long-Run Risk

Bansal-Yaron (2004) add a different state variable to the CCAPM. This model also starts with Problem (1) and plugs in a new utility function based on Epstein-Zin (1989) recursive preferences rather than habit formation:

(7)   \begin{equation*} U_t = \left\{ \, (1 - \beta) \cdot C_t^{1-\alpha} + \beta \cdot \big(\Exp_t[U_{t+1}^{1-\gamma}]^{\frac{1}{1-\gamma}}\big)^{1-\alpha} \, \right\}^{\frac{1}{1-\alpha}} \end{equation*}

These preferences are recursive because they indicate that investors care not only about their consumption today, (1 - \beta) \cdot C_t^{1-\alpha}, but also the present value of their expected future consumption, \beta \cdot \big(\Exp_t[U_{t+1}^{1-\gamma}]^{\frac{1}{1-\gamma}}\big)^{1-\alpha}.

\beta and \gamma represent investors’ time preferences and risk aversion just like in the original power utility specification. The only new parameter is \alpha > 1. The ratio 1/\alpha represents investors’ elasticity of intertemporal substitution (EIS). This parameter captures how much investors want to resolve future uncertainty about consumption, not because they want to do something with the information but because resolving uncertainty as soon as possible makes them happy. The guy on the subway platform who’s leaning dangerously out onto the tracks staring down the tunnel so that he can be the first to spot the next train is someone with a very high EIS. He wants to know as soon as possible if the next train is imminent, not because it will allow him to board sooner (everyone boards at the same time) but because knowing the train is about to arrive makes him happy. For a long-run risk model to work, we need \gamma \cdot (1/\alpha) > 1.

Let P_t denote the current price of an asset whose payout is aggregate consumption in the following period. The key new state variable in the long-run risk model is \log Z_{t+1} = \log (P/C)_{t+1}. The model says that the equity premium will be determined as follows:

(8)   \begin{equation*} \Exp[R_{t+1}] - R_f  \approx \gamma \times \Cov[ \, \Delta \log C_{t+1}, \, R_{t+1} \, ] + \mathrm{f}(\gamma, \, \alpha) \times \Cov[ \, \log Z_{t+1}, \, R_{t+1} \, ] \end{equation*}

\mathrm{f}(\gamma, \, \alpha) \leq 0 is a function that stems from a Campbell-Shiller (1988) approximation of Z_t. Thus, the long-run risk model says expected stock returns could be high either because future stock returns tend to covary with consumption growth or because these stock returns tend to covary with the future price-to-dividend ratio of the aggregate consumption claim. This price-to-dividend ratio will partly reflect current changes in consumption, \Delta \log C_{t+1}. But, since consumption growth is persistent and investors have recursive preferences, it will also reflect future consumption shocks. In calibrations, most of the 8\% per year equity premium is explained by variation in \log Z_{t+1} coming from consumption shocks far off in the future.

Source Of Confusion

What would it mean for investors not to care about consumption risk in one of these models? \rho = \Corr[\Delta \log C_{t+1}, \, R_{t+1}] captures the stock market’s exposure to consumption risk. When \rho \approx 1, stock market booms always coincide with increases in consumption. When \rho = 0, knowing that the stock market is booming tells you nothing about whether aggregate consumption is increasing or decreasing. So, investors in a particular model would be indifferent to changes in consumption risk if

(9)   \begin{equation*} \partial_{\rho} (\Exp[R_{t+1}] - R_f) = 0 \end{equation*}

In other words, they wouldn’t care about consumption risk if increasing the amount of consumption risk had no effect on their demand and thus no effect on equilibrium prices.

We saw above that, if we assume a risk aversion coefficient of \gamma = 10, then the first term in Equations (6) and (8) is very small. \gamma \times \Cov[\Delta \log C_{t+1}, \, R_{t+1}] \approx 0.3\%. And, as a result, the effect of an increase in consumption risk on asset prices coming from this first term is quite small as well:

(10)   \begin{equation*} \partial_{\rho} (\gamma \times \Cov[\Delta \log C_{t+1}, \, R_{t+1}]) = \gamma \times \sigma_{\Delta \log C} \cdot \sigma_R \approx 0.016 \end{equation*}

Judged only by the effect of this initial term, an increase in consumption risk from \rho = 0.00 to \rho = 0.40 would only increase the expected excess return on the stock market by 0.016 \times 0.40 = 0.64\% per year. We observe a correlation between stock returns and consumption growth of \rho = 0.20 in the data. So, these numbers imply that a 2\times swing around the mean \rho would explain less than a tenth of the total 8\% per year equity premium puzzle if consumption risk only affected asset prices via the \gamma \times \Cov[\Delta \log C_{t+1}, \, R_{t+1}] term.
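The contribution of this first term is a one-line calculation, repeated here with the same illustrative numbers used throughout the post:

```python
# Marginal effect of consumption risk via the first term alone (Eq. 10),
# using the same illustrative numbers as earlier in the post.

gamma, sigma_c, sigma_r = 10, 0.01, 0.16

# d/d_rho of gamma * rho * sigma_c * sigma_r is just this constant
d_premium_d_rho = gamma * sigma_c * sigma_r
print(round(d_premium_d_rho, 3))     # 0.016

# Effect of moving rho from 0.00 to 0.40 through this term only
delta = d_premium_d_rho * 0.40
print(round(delta, 4))               # 0.0064, i.e. 0.64% per year
```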

However, consumption risk doesn’t only affect asset prices via the \gamma \times \Cov[\Delta \log C_{t+1}, \, R_{t+1}] term in Equations (6) and (8). Therefore

(!!!)   \begin{equation*} \partial_{\rho} (\gamma \times \Cov[\Delta \log C_{t+1}, \, R_{t+1}]) \approx 0 \qquad \text{does \underline{\textbf{not}} imply} \qquad \partial_{\rho} (\Exp[R_{t+1}] - R_f) \approx 0 \end{equation*}

Such a conclusion would only be valid if the new state variables introduced in Campbell-Cochrane (1999) and Bansal-Yaron (2004) happened to be unrelated to consumption growth. This is absolutely not the case! Changes in the surplus consumption ratio are highly correlated with consumption growth. And, the long-run risk model assumes that consumption growth is very persistent, so price changes due to anticipated consumption shocks in the far distant future will be highly correlated with consumption growth today too.

Plugging In Numbers

How much does the Campbell-Cochrane (1999) model suggest expected excess returns should increase in response to a move from \rho = 0 to \rho = 0.40? Campbell-Cochrane (1999) describe habit formation as an “amplification mechanism for consumption risks in marginal utility” (page 240). Mathematically, this shows up as a scaling up of the risk-aversion coefficient from \gamma to \gamma / \Exp[Z_t]. The authors use \phi = 0.87. With \sigma_{\Delta \log C} = 1\% per year and \gamma = 10, the average surplus consumption ratio is \Exp[Z_t] = \sigma_{\Delta \log C} \cdot \sqrt{\frac{\gamma}{1 - \phi}} \approx 0.088. So, in the external habit model, the effect of consumption risk on asset prices will be:

(11)   \begin{equation*} \partial_{\rho} (\Exp[R_{t+1}] - R_f) = (\gamma / 0.088) \times \sigma_{\Delta \log C} \cdot \sigma_R \approx 0.18 \end{equation*}

Because increasing the stock market’s correlation with consumption growth must also increase its correlation with growth in the surplus consumption ratio, a \Delta \rho = 0.40 increase in consumption risk will increase the annual expected excess return on the stock market by 0.18 \times 0.40 \approx 7.3\% in a habit model.
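The habit-model calculation can be reproduced in a few lines, plugging in the calibration values quoted above:

```python
# The Campbell-Cochrane amplification, using the calibration values
# quoted in the text (phi = 0.87, gamma = 10, sigma_c = 1%, sigma_r = 16%).
import math

gamma, phi = 10, 0.87
sigma_c, sigma_r = 0.01, 0.16

# Steady-state surplus consumption ratio, E[Z] = sigma_c * sqrt(gamma/(1-phi))
E_Z = sigma_c * math.sqrt(gamma / (1 - phi))
print(round(E_Z, 3))                 # 0.088

# Effect of consumption risk with the amplified risk aversion gamma / E[Z]
d_premium_d_rho = (gamma / E_Z) * sigma_c * sigma_r
print(round(d_premium_d_rho, 2))     # 0.18

# A rho move of 0.40 then raises the premium by about 7.3% per year
print(round(d_premium_d_rho * 0.40, 3))
```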

How much does the Bansal-Yaron (2004) model suggest expected excess returns should increase in response to a \Delta \rho = 0.40 increase in consumption risk? Cochrane (2017) describes how this model “ties its extra state variables… to observables by the assumption of a time-series process in which short-run consumption growth is correlated with… long-run news.” When 1/\alpha \approx 1, the function \mathrm{f}(\gamma, \, \alpha) = 1 - \gamma and Equation (8) can be re-written as:

(12)   \begin{equation*} \Exp[R_{t+1}] - R_f  \approx \gamma \times \Cov[ \, \Delta \log C_{t+1}, \, R_{t+1} \, ] + (1 - \gamma) \times \Cov[ \, \log (P/C)_{t+1}, \, R_{t+1} \, ] \end{equation*}

Changing an asset’s correlation with consumption growth also changes its correlation with the future log price-to-consumption ratio. I estimate \log (P/C)_{t+1} = 3.61 - 30.26 \cdot \Delta \log C_{t+1} + \varepsilon_{t+1}, which would imply that \partial_{\rho} (\Exp[R_{t+1}] - R_f) = [\gamma - (1-\gamma) \cdot 30.26] \cdot \sigma_{\Delta \log C} \cdot \sigma_R \approx 0.45. Thus, a \Delta \rho = 0.40 increase in consumption risk will increase annual expected excess returns by 0.45 \times 0.40 \approx 18\% in the long-run risk model!
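The long-run-risk arithmetic works the same way, substituting the estimated regression slope into Equation (12):

```python
# The long-run-risk back-of-the-envelope, using the regression slope
# quoted in the text: log(P/C) loads on consumption growth with -30.26.

gamma = 10
slope = -30.26                   # estimated loading from the text
sigma_c, sigma_r = 0.01, 0.16

# Eq. (12): premium = gamma*Cov[dc, R] + (1 - gamma)*Cov[log(P/C), R],
# and Cov[log(P/C), R] = slope * Cov[dc, R], so the total loading is:
loading = gamma + (1 - gamma) * slope
print(round(loading, 2))             # 282.34

d_premium_d_rho = loading * sigma_c * sigma_r
print(round(d_premium_d_rho, 2))     # 0.45

# A rho move of 0.40 raises the premium by roughly 18% per year
print(round(d_premium_d_rho * 0.40, 2))   # 0.18
```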


Factor Models, Little Green Men, And Machine Learning

June 28, 2019 by Alex

Economists use machine learning (ML) to study asset prices in two different ways. Approach #1: use these techniques to predict the cross-section of expected returns—i.e., to predict which stocks are most likely to have high or low future returns. e.g., see here, here, or here. Approach #2: use them to try to uncover the “true asset-pricing model”—a.k.a., the “set of priced risk factors”.

Many economists dismiss approach #1, arguing that predicting future stock returns is a job for traders not academics. Instead, it’s much more common for researchers to adopt approach #2. The conventional wisdom is that we, as researchers, will learn something deep and fundamental about how financial markets work if one of these new ML techniques uncovers a factor model that perfectly explains the cross-section of expected returns. There’s a widely held view that doing empirical asset-pricing research means attributing differences in expected returns to some risk-return tradeoff with an intuitive story attached to it.

But… not so fast. There’s actually something paradoxical about the logic of approach #2, a problem with the conventional wisdom. And, the goal of this post is to explain what that paradox is.

Factor Models

But first: factor models. What are economists talking about when they say they’re trying to find the “true asset-pricing model” or the “set of priced risk factors”? To get a handle on this terminology, consider regressing the returns of each stock, R_{n,t}, on lagged values of some predictive variable, X_{n,t-1}:

    \begin{equation*} R_{n,t} = \hat{a} + \hat{b} \cdot X_{n,t-1} + \hat{e}_{n,t} \end{equation*}

The results of a predictive regression like this one can be interpreted as trading-strategy returns. You can read the estimated \hat{b} as the return to a zero-cost portfolio that’s long high-X stocks and short low-X stocks:

    \begin{equation*} \hat{b} \propto {\textstyle \frac{1}{N} \cdot \sum_n} \, (R_{n,t} - \bar{R}_t) \cdot (X_{n,t-1} - \bar{X}_{t-1}) \end{equation*}

Thus, \hat{b} > 0 implies both that stocks with high predictor values yesterday, (X_{n,t-1} - \bar{X}_{t-1}) > 0, tended to have high excess returns today, (R_{n,t} - \bar{R}_t) > 0, and also that it would have been profitable to trade on X today.
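This equivalence between a regression slope and a trading-strategy return is easy to demonstrate numerically. The following sketch uses made-up data; the point is only that \hat{b} and the zero-cost portfolio return differ by a fixed scaling factor:

```python
# A small numerical demonstration (with made-up data) that the OLS
# slope from the predictive regression is proportional to the return
# on a zero-cost portfolio with weights (X - mean(X)).
import numpy as np

rng = np.random.default_rng(0)
N = 500
X = rng.normal(size=N)                        # lagged predictor values
R = 0.02 * X + rng.normal(scale=0.1, size=N)  # returns, made-up DGP

# OLS slope of R on X
b_hat = np.cov(R, X, bias=True)[0, 1] / np.var(X)

# Return on the zero-cost portfolio long high-X, short low-X stocks
weights = X - X.mean()
portfolio_return = (weights * (R - R.mean())).mean()

# The two differ only by the scaling factor Var[X]
assert np.isclose(b_hat * np.var(X), portfolio_return)
print(b_hat, portfolio_return)
```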

It could be that an estimated \hat{b} > 0 represents arbitrage profits. But, maybe trading on X is only profitable because it requires investors to bear lots of non-diversifiable risk? Imagine that investors are all really worried about not having enough money during future market crashes, R_{\mathit{Mkt},t} \ll 0. Then, if the predictive variable X_{n,t-1} turned out to be capturing exposure to market risk,

    \begin{equation*} X_{n,t-1} = {\textstyle \frac{\mathrm{Cov}[R_{n,t}, \, R_{\mathit{Mkt},t}]}{\mathrm{Var}[R_{\mathit{Mkt},t}]}} \end{equation*}

the profits earned by trading on X would represent compensation for holding a portfolio that will deliver terrible returns during market crashes—i.e., at the worst possible time as far as investors are concerned. And, when economists think this is what’s going on, they typically write the predictive variable as \beta_{n,t-1}^{(\mathit{Mkt})} rather than X_{n,t-1}. This is what they’re talking about when they speak of “market beta”.

So far so good. Now, for the final step. Notice that this compensation-for-risk logic doesn’t just apply when the risk factor is market returns. You can replace R_{\mathit{Mkt},t} with any variable so long as the variable defines some sort of bad aggregate outcome in investors’ eyes. e.g., think about something like a drop in market liquidity. So, looking for the “true asset-pricing model” or the “set of priced risk factors” means looking for a collection of K \geq 1 variables \{R_{1,t}, \ldots,\,R_{K,t} \} such that, if we assume investors are worried about not having enough money when these risk factors are negative, then every difference in expected returns is perfectly explained by differences in exposure to these K priced risk factors:

    \begin{equation*} \mathrm{E}_{t-1}[R_{n,t}] = {\textstyle \sum_{k=1}^K} \, \lambda_t^{(k)} \cdot \beta_{n,t-1}^{(k)} \end{equation*}

Above, each \lambda_t^{(k)} > 0 is a market-wide constant called the price of risk associated with the kth factor.
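The pricing equation is just a matrix product of exposures and risk prices. Here is a toy illustration with made-up numbers (two hypothetical factors, three hypothetical stocks):

```python
# A toy illustration (made-up numbers) of the pricing equation:
# each stock's expected return is its betas times the prices of risk.
import numpy as np

lambdas = np.array([0.05, 0.02])       # prices of risk, K = 2 factors
betas = np.array([[1.2,  0.3],         # stock 1's factor exposures
                  [0.8, -0.5],         # stock 2
                  [0.0,  1.0]])        # stock 3

# E[R_n] = sum_k lambda_k * beta_{n,k}
expected_returns = betas @ lambdas
print(expected_returns)
```

Stock 1, with the largest exposure to the expensive first factor, earns the highest expected return; stock 3, exposed only to the cheap second factor, earns the lowest.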

I really want to emphasize the logic here. When an economist says a factor model explains the cross-section of expected returns, he’s saying that investors all have the same K \geq 1 risk factors in mind when making their respective portfolio choices. If one of these risk factors were to go negative, investors would consider it a bad state of the world; if all of them were to go negative, it’d be apocalyptic. The claim is that investors are all really worried about having enough money when these various kinds of bad outcomes occur. So, as a result, they’re willing to pay extra for assets whose returns are less correlated with these K risk factors—i.e., for assets that are more likely to have positive returns when risk factors are negative. Therefore, in equilibrium, these assets will have higher prices today and thus lower expected future returns.

Little Green Men

By now, researchers have proposed lots of different candidate factor models. Some might even say there’s a “factor zoo”. Each model makes its own claim about a specific set of risk factors that all investors are worried about. And yet, there’s no general consensus among researchers (let alone investors) about which is correct. This disagreement should already give you pause, but now ask yourself this: If you have to use an ML algorithm to identify the correct “set of priced risk factors” in investors’ “true asset-pricing model”, how did investors find these variables in the first place? A few investors certainly understand the ML toolkit today, but most certainly do not. And, no one was aware of these ideas twenty something years ago.

As a thought experiment, suppose that tomorrow while doing other research you encounter an ML algorithm, first discovered in 2010, that always outputs a factor model which perfectly explains the cross-section of expected returns. Does it make sense to claim that this ML algorithm is able to find the “true asset-pricing model” at work in, say, 1985? By assumption, when you feed data from 1985 into the algorithm, the output will be a “set of priced risk factors” that perfectly explains the cross-section of expected returns in 1985. But, could these risk factors possibly reflect how Madonna-loving 1985 investors were thinking about risk and return? No. Of course not. If the algorithm wasn’t discovered until 2010, 1985 investors couldn’t possibly have known about this “set of priced risk factors”.

Let’s make the thought experiment even more extreme. Suppose that little green men come to earth tomorrow and secretly give you an alien computer that operates based on principles never before seen by humans. There’s absolutely nothing like it here on earth. And, this advanced computer comes pre-programmed with correspondingly advanced ML algorithms. And, imagine that one of these algorithms works like the algorithm described above. It always outputs a set of K \geq 1 risk factors that perfectly explain the cross-section of expected returns. Do these risk factors tell us anything about how human investors view risk in earthly markets? Again: No. Of course not. To discover them you had to use an advanced alien technology with absolutely no analog here on earth. So, how could these risk factors capture earthly investors’ views about risk and return? The algorithm simply produces an excellent set of predictive variables that take the form of partial correlations with each asset’s returns—i.e., that take the form of \beta_{n,t-1}^{(k)}s.

Machine Learning

I’m quite bullish about the prospects of ML in asset pricing. I think researchers have barely scratched the surface. I just don’t think that approach #2—i.e., searching for the “true asset-pricing model”/”set of priced risk factors”—is a sensible way to apply the ML toolkit. Although academics tend to pooh-pooh approach #1 as lacking in economic content, that charge simply isn’t true. There are lots of situations where we’re perfectly happy to have good return predictions at the price of not understanding where this fit comes from. Traders are obviously OK with this Faustian bargain. But, so too are researchers. It’s not like the Fama-French 3-factor model is popular because we have an economic understanding of what the size and value factors represent.

Financial economists like to think about the market and its investors as something separate. But, it’s just not so. We are the investors in our asset-pricing models. There’s no separation. And, this fact should be reflected in our models. For me, this is the most interesting economic insight that comes with applying ML algorithms to study asset prices. If the tools that we use to find predictors change, then the predictors that our theoretical investors find should change, too. In his AFA presidential address, John Cochrane writes that, “to address these questions in the zoo of new variables, I suspect we will have to use different methods… For one variable, portfolio sorts and regressions both work. But we cannot chop portfolios 27 ways… so, I do not see how to do it by a high-dimensional portfolio sort.” Whatever those different methods end up being (ML or otherwise), we’d better not be modeling asset-pricing equilibria the same way after they get introduced.


Risk-Factor Identification: A Critique

May 26, 2019 by Alex

In standard cross-sectional asset-pricing models, expected returns are governed by exposure to aggregate risk factors in a market populated by fully rational investors. Here’s how these models work. Because investors are fully rational, they correctly anticipate which assets are most likely to have low returns in especially inconvenient future states of the world—i.e., returns that are highly correlated with aggregate risk factors. They won’t be willing to pay as much for the high risk-exposure assets today. So, the price of high risk-exposure assets will drop in equilibrium, giving these assets high expected returns going forward.

With this standard framework in mind, financial economists are constantly on the lookout for assets with similar risk exposures but different average returns. e.g., in a CAPM world, value and growth stocks would have similar average returns after adjusting for market beta; however, in the real world, there’s a 4%-per-year value premium. If investors are fully rational, this finding suggests that they’re worried about more than just aggregate market risk when pricing assets. It suggests they’re also paying attention to one or more as-yet-unknown risk factors. The central challenge in this literature is to figure out which one(s).

Unfortunately, after decades of work, there’s still no general consensus about which aggregate risk factors matter to real-world investors. Instead, the academic literature contains a zoo of candidate risk factors. Correlation with any of these factors will help predict an asset’s expected returns. But, it’s hard to believe that all of these aggregate risk factors actually matter to real-world investors, especially when they “have little in common economically with each other”.

Lax econometric standards are certainly one explanation for this factor zoo. The goal of this post is to suggest another: full rationality. Notice that full rationality plays two different roles in the discussion above. The first is to make sure that investors correctly anticipate the correlation between each asset’s future returns and the aggregate risk factors. If investors are fully rational, then changes in an asset’s risk exposure must be due to changes in fundamentals. The second role is to remove any logical limits on what these aggregate risk factors might be. If investors are fully rational, then they might potentially be worried about any future state of the world a researcher might dream up… and more! The whole premise of learning about the true risk factors requires real-world investors to know things that researchers haven’t yet noticed. And, if investors are fully rational, this additional knowledge might be arbitrarily subtle.

Below I show that, if researchers assume that investors are fully rational in both of the above senses, then identifying the true set of aggregate risk factors used by real-world investors is an impossible goal.

RCT Protocol

Economists think about randomized controlled trials (RCTs) as the gold standard for identification. Here’s how the RCT protocol works. Imagine you’re a medical researcher who’s just discovered a new cancer-treatment drug. You think your new discovery has promise, but the only way to know if it actually works is to give it to cancer patients and see whether they’re more likely to recover. But, how should you do this?

You could just distribute flyers advertising your new drug at the nearest hospital, give your drug to all the cancer patients who respond to the flyers, and then compare the recovery rate of the patients who took your drug to that of the remaining cancer patients. However, this is a bad idea. People try to make the best decision possible given all available knowledge about their current circumstances. So, we should expect that the cancer patients who respond to your flyer will be different from those who do not. We should expect them to be sicker, having exhausted all other treatment options. This means that any difference in recovery rates could be due to your new drug or to underlying differences in patient populations.

What’s more, if patients are optimizing based on information that’s unobservable to you (the researcher), then it doesn’t help to control for the differences in patient populations that you can see. Suppose you found two cancer patients, one who took your drug and one who decided not to, that looked identical in every conceivable way you could measure: both male, both white, both 43 years old, same height and weight, etc… If you really believed that these patients were making fully rational choices based on all the available information they had, then you must be missing something about each of their respective situations. Two identical fully rational people wouldn’t make two radically different life choices given the same information.

In short, to learn whether your new drug works, you have to break the link between drug treatment and patients’ optimal decisions based on (potentially) unobservable information. And, the RCT protocol does this by randomizing which cancer patients get your new drug and which get a sugar pill. You need to find a bunch of patients willing to participate in your study knowing that they have only a 50:50 chance of receiving the new experimental treatment. Then, with enough patients, the law of large numbers makes it very unlikely that the treated patient population will systematically differ from the untreated population. Thus, any difference in the recovery rates of these two groups must be due to your drug regimen.
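The logic above can be seen in a toy simulation (every number here is invented for illustration): an unobserved severity variable drives both who responds to the flyer and who recovers, so the flyer-based comparison is biased, while the coin-flip comparison recovers the true treatment effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Latent severity: unobserved by the researcher, known to each patient.
severity = rng.standard_normal(n)

# Assume the drug truly raises the recovery probability by 10 points.
true_effect = 0.10

def recovery(drug):
    # Sicker patients (higher severity) recover less often.
    p = np.clip(0.5 - 0.15 * severity + true_effect * drug, 0, 1)
    return rng.random(n) < p

# Self-selection: sicker patients are more likely to respond to the flyer.
took_flyer = rng.random(n) < np.clip(0.3 + 0.2 * severity, 0, 1)
rec_flyer = recovery(took_flyer.astype(float))
naive = rec_flyer[took_flyer].mean() - rec_flyer[~took_flyer].mean()

# RCT: a coin flip decides treatment, independent of severity.
assigned = rng.random(n) < 0.5
rec_rct = recovery(assigned.astype(float))
rct = rec_rct[assigned].mean() - rec_rct[~assigned].mean()

# The RCT estimate lands near the true 0.10; the flyer-based estimate
# is dragged down because the treated group is systematically sicker.
print(naive, rct)
```

The naive estimate here actually comes out negative: the drug looks harmful purely because of who chose to take it.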

Model Testing

Now, think about what’s going on when we test a cross-sectional asset-pricing model. A model is just a list of K \geq 1 aggregate risk factors. A fully rational investor will anticipate which assets have returns that are highly correlated with these K aggregate risk factors. So, if the model is correct, differences in expected returns across assets will be explained by differences in exposure to these K aggregate risk factors.

This logic suggests a straightforward empirical approach. To test a cross-sectional asset-pricing model, first separately regress the excess returns of each asset n = 1,\ldots,\,N on the K aggregate risk factors:

(1)   \begin{equation*} \mathit{rx}_{n,t} = \bar{a}_n + {\textstyle \sum_{k=1}^K} \, \bar{b}_{n,k} \cdot f_{k,t} + e_{n,t} \end{equation*}

Run a time-series regression involving t=1,\ldots,\,T observations for each asset. Then, take the estimated slope coefficients from these N regressions, which capture each asset’s exposure to the K aggregate risk factors, \bar{b}_{n,k} \overset{\scriptscriptstyle \text{def}}{=} \overline{\mathrm{Cov}}[\mathit{rx}_{n,t}, \, f_{k,t}] \, \big/ \, \overline{\mathrm{Var}}[f_{k,t}], and test whether differences in risk-factor exposure across assets explain differences in expected returns across assets:

(2)   \begin{equation*} \overline{\mathit{rx}}_n = \hat{\alpha} + {\textstyle \sum_{k=1}^K} \, \hat{\lambda}_k \cdot \bar{b}_{n,k} + \varepsilon_n \end{equation*}

Run one cross-sectional regression involving n=1,\ldots,\,N observations. If you’ve found the true factor model that real-world investors are using, then i) \hat{\lambda}_k > 0 for all k=1,\ldots,\,K, ii) \hat{\alpha} \approx 0, and iii) \widehat{\mathrm{Var}}[\varepsilon_n] \approx 0.
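Here’s a minimal numpy sketch of this two-pass procedure on simulated data where a K = 2 factor model holds exactly by construction (the exposures, prices of risk, and noise levels are all made-up illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 100, 600, 2

lam_true = np.array([0.4, 0.2])               # prices of risk (illustrative)
b_true = rng.uniform(0.5, 1.5, size=(N, K))   # true factor exposures
f = rng.standard_normal((T, K))
f -= f.mean(axis=0)                           # demean so realized factor means are zero
e = 0.5 * rng.standard_normal((T, N))         # idiosyncratic shocks

# Zero-alpha returns: E[rx_n] = sum_k lam_k * b_{n,k} by construction.
rx = b_true @ lam_true + f @ b_true.T + e     # shape (T, N)

# Pass 1: one time-series regression per asset recovers the exposures.
X = np.column_stack([np.ones(T), f])
b_hat = np.linalg.lstsq(X, rx, rcond=None)[0][1:].T   # (N, K) slopes

# Pass 2: one cross-sectional regression of mean returns on exposures.
Z = np.column_stack([np.ones(N), b_hat])
coef = np.linalg.lstsq(Z, rx.mean(axis=0), rcond=None)[0]
alpha_hat, lam_hat = coef[0], coef[1:]

# alpha_hat should be near zero and lam_hat near lam_true.
print(alpha_hat, lam_hat)
```

Since the model is true in this simulation, all three criteria are satisfied: positive risk prices, a near-zero intercept, and tiny cross-sectional residuals.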

But, satisfying these three criteria is only a necessary condition. It’s not sufficient for proving you’ve got the right model. Even if a cross-sectional asset-pricing model passes these hurdles, real-world investors might not be using those K aggregate risk factors to price assets. Exposure to the K aggregate risk factors could be the result of correlations with other omitted variables that real-world investors really care about.

This is a question about identification. And, the RCT protocol suggests we can solve it by looking for random variation in an asset’s exposure to each of the K risk factors that has nothing to do with changes in fundamentals. The whole point of using an RCT is to make sure that patient decisions based on unobserved information aren’t causing a spurious link between drug treatment and recovery. And, we want to make sure that investor decisions based on unobserved fundamentals aren’t causing a spurious link between risk exposure and expected returns. We need to block any possibility of an unobserved link between risk-factor exposure and asset fundamentals.

So, imagine that investors perceive a noisy version of each asset’s exposure to the kth risk factor:

(3)   \begin{equation*} \bar{b}_{n,k} = \bar{b}_{n,k}^{\star} + \tilde{b}_{n,k} \end{equation*}

Above, \bar{b}_{n,k}^{\star} denotes the nth asset’s true risk exposure and \tilde{b}_{n,k} denotes noise that’s unrelated to fundamentals. The only way to know that investors are using a particular set of K aggregate risk factors and not some other correlated set of factors is to study how \tilde{b}_{n,k} predicts expected returns. After all, differences in expected returns that are associated with estimation errors, \tilde{b}_{n,k}, can’t be attributed to investors acting strategically based on unobserved information about asset fundamentals.
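A toy simulation makes the identification argument concrete (all parameter values are invented). Expected returns are generated under two hypotheses—investors price the perceived exposure, or they price a different factor that’s merely correlated with the true exposure—and regressing expected returns on the non-fundamental component separates the two:

```python
import numpy as np

rng = np.random.default_rng(0)
N, lam = 20_000, 0.4                        # assets, price of risk (illustrative)

b_star = rng.standard_normal(N)             # true exposure, tied to fundamentals
b_tilde = 0.3 * rng.standard_normal(N)      # non-fundamental perception error
b_perceived = b_star + b_tilde

# Hypothesis A: investors actually price this factor's perceived exposure.
er_a = lam * b_perceived
# Hypothesis B: investors price some other factor correlated with b_star only.
other = 0.9 * b_star + np.sqrt(1 - 0.9**2) * rng.standard_normal(N)
er_b = lam * other

def slope_on_error(er):
    # Cross-sectional regression of expected returns on the error component.
    return np.cov(er, b_tilde)[0, 1] / np.var(b_tilde)

# Under A the error component predicts returns with slope near lam;
# under B it predicts nothing.
print(slope_on_error(er_a), slope_on_error(er_b))
```

Only variation in expected returns that tracks the error component can’t be explained away by investors reacting to unobserved fundamentals.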

Impossible Goal

By now, you probably see the logical trap that’s been laid. A fully rational investor might potentially be reacting to any piece of unobserved information about an asset’s fundamentals. So, non-fundamental variation in their perception of risk exposure is crucial to identifying the model they’re using. But, non-fundamental variation in perceived risk exposure would represent an error. And, fully rational investors don’t make errors. Thus, if we are adamant that real-world investors are fully rational, then we must give up any hope of identifying the cross-sectional asset-pricing model they’re using.

Note that this impossibility result doesn’t say that investors need to be completely irrational… far from it. The true \bar{b}_{n,k}^{\star} has to have some bearing on investors’ perceived \bar{b}_{n,k}. If investors aren’t strategically adjusting their demand today in response to actual future risks, then cross-sectional asset-pricing models have no content. Rather, the impossibility result says that, for researchers to identify the cross-sectional asset-pricing model that real-world investors are using, these perceptions can’t be perfectly accurate. For a useful analogy, think about every spy thriller with a canary trap that you’ve ever seen. In order for one spy to figure out what the other knows, he’s got to see how his adversary reacts to planted fake intel. If his foe always sees through the ploy (i.e., if his foe is “fully rational” in High Economyan), then there’s no hope of any success.

This impossibility result also suggests a new use for many of the cognitive errors documented by behavioral economists: as tools for testing whether or not real-world investors care about exposure to particular risk factors. The existing behavioral-finance literature contains a ready supply of \tilde{b}_{n,k}s.


The Basic Recipe For Rationalizing Errors In Belief

February 3, 2019 by Alex

Behavioral-finance models are often written down so that, although each individual trader holds incorrect beliefs, market events nevertheless unfold in such a way that traders can rationalize their own errors. e.g., consider the model in Scheinkman and Xiong (2003). In this model, each individual trader knows that every other trader is over-confident, and he knows that every other trader thinks that he himself is over-confident. He just doesn’t think that they’re correct. He pig-headedly insists that he’s the only unbiased trader. And yet, in spite of this error, the model is set up so that he can interpret the realized price path in his own internally consistent way. Each trader thinks the price distortion caused by his own over-confidence is actually coming from the value of the option to resell at a later date to some other over-confident trader.

There’s a good reason why researchers write down models this way. The idea is to write down a model that’s exactly one step away from a rational benchmark. That way, any new predictions made by the model can be attributed to the behavioral bias. In this post, I first outline the basic recipe for rationalizing traders’ errors in beliefs. Then, I point out something slightly paradoxical about this recipe—namely, it requires fine-tuning the model parameters. And, while a researcher can do this fine-tuning in a theoretical model, it’s not clear who can turn the appropriate knobs in the real world. These models are like stage magic. And, while we can learn about which cognitive biases people suffer from by studying a good magician’s sleight of hand, most missing coins wind up between the couch cushions rather than in The Amazing Randi‘s pocket.

Errors In Belief

Here’s a simple framework for digesting errors in beliefs. To start with, consider a market where a trader has correct beliefs. i.e., suppose that a trader receives a noiseless signal:

    \begin{equation*} \mathit{Signal} = \mathit{News} \qquad \text{where} \qquad \mathit{News} \overset{\scriptscriptstyle \text{iid}}{\sim} \mathrm{N}(0, \, 1) \end{equation*}

Then, given the trader’s optimal demand in response to this noiseless signal, suppose that the structural relationship between the trader’s noiseless signal and realized returns is given by:

    \begin{equation*} \mathit{Return} = \beta^{\star} \cdot \mathit{Signal} + \varepsilon^{\star} \qquad \text{where} \qquad \varepsilon^{\star} \overset{\scriptscriptstyle \text{iid}}{\sim} \mathrm{N}(0, \, 1) \end{equation*}

We typically think that \beta^\star \in (0, \, 1) with larger values of \beta^\star indicating more informative prices. I’m using the term “structural relationship” for \beta^{\star} \overset{\scriptscriptstyle \mathrm{def}}{=} {\textstyle \frac{\partial \phantom{s}}{\partial s}} \mathrm{E}^{\star}[ \, \mathit{Return} \, | \, \mathrm{do}(\mathit{Signal} = s) \, ] because this parameter reflects the expected change in returns due to an exogenous shift in the trader’s signal. Note that this structural relationship could reflect other traders’ errors in belief, as was the case in Scheinkman and Xiong (2003).

But, in reality, suppose that the trader is over-confident about the precision of his signal. While he thinks it’s noiseless, his signal actually contains noise:

    \begin{equation*} \mathit{Signal} = \alpha \cdot \mathit{Noise} + \sqrt{1 - \alpha^2} \cdot \mathit{News} \qquad \text{where} \qquad \mathit{Noise} \overset{\scriptscriptstyle \text{iid}}{\sim} \mathrm{N}(0, \, 1) \end{equation*}

And, the parameter \alpha \in [0, \, 1] governs the relative contribution of noise to the trader’s signal: \alpha = 0 corresponds to correct beliefs; whereas, \alpha = 1 corresponds to a signal that is pure noise. Then, given the trader’s optimal demand, suppose that the structural relationship between the trader’s noisy signal and realized returns is actually given by:

    \begin{equation*} \mathit{Return} = \beta \cdot \mathit{Signal} + \varepsilon \qquad \text{where} \qquad \varepsilon \sim \mathrm{N}(0, \, 1) \end{equation*}

Notice that, in reality, idiosyncratic-return shocks are no longer independent of the news in the trader’s signal. Let \rho \overset{\scriptscriptstyle \mathrm{def}}{=} \mathrm{Cor}[\mathit{News}, \, \varepsilon] denote the correlation between the news about fundamentals in the trader’s signal and idiosyncratic-return shocks. e.g., in a model of disagreement, you might expect \rho < 0 due to the existence of another trader whose disagreement stems from negatively correlated signals or negatively correlated mistakes.

The Basic Recipe

Suppose that the trader, who doesn’t realize that he’s getting a noisy signal, is still carefully monitoring price informativeness. i.e., he’s carefully monitoring the relationship between his signal and realized returns. Here’s what it would take for this trader to rationalize his error in beliefs. Notice that the covariance of the trader’s signal and market returns is given by:

    \begin{align*} \mathrm{Cov}[\mathit{Return}, \, \mathit{Signal}] &= \mathrm{Cov}[\beta \cdot \mathit{Signal} + \varepsilon, \, \mathit{Signal}] \\ &= \mathrm{Cov}[\beta \cdot \mathit{Signal}, \,\mathit{Signal}] + \mathrm{Cov}[\varepsilon, \, \mathit{Signal}] \\ &= \beta + \mathrm{Cov}\big[ \, \varepsilon, \, \alpha \cdot \mathit{Noise} + \sqrt{1-\alpha^2} \cdot \mathit{News} \, \big] \\ &= \beta + \rho \cdot \sqrt{1-\alpha^2} \end{align*}

So, since the variance of his signal is \mathrm{Var}[\mathit{Signal}] = 1, if the trader regresses realized returns on his signal, he’ll find a slope coefficient of

    \begin{equation*} \hat{\beta}^{\text{OLS}} \overset{\scriptscriptstyle \mathrm{def}}{=} {\textstyle \frac{\mathrm{Cov}[\mathit{Return}, \, \mathit{Signal}]}{\mathrm{Var}[\mathit{Signal}]}} = \beta + \rho \cdot \sqrt{1-\alpha^2} \end{equation*}

Thus, if a researcher chooses the values of \rho and \alpha so that \beta^{\star} = \hat{\beta}^{\text{OLS}} = \beta + \rho \cdot \sqrt{1-\alpha^2}, then the trader will see data that’s consistent with his erroneous belief about his signal being noiseless.

It’s important to emphasize that, when a researcher chooses \alpha and \rho so that \beta^{\star} - \beta = \rho \cdot \sqrt{1-\alpha^2}, he’s not giving the trader correct beliefs, though. Although price informativeness will look correct to the trader, his error in beliefs will still cause returns to respond to pure noise. The covariance of noise and returns will be:

    \begin{align*} \mathrm{Cov}[\mathit{Return}, \, \mathit{Noise}] &= \mathrm{Cov}[\beta \cdot \mathit{Signal} + \varepsilon, \, \mathit{Noise}] \\ &= \mathrm{Cov}\big[ \, \beta \cdot \big( \, \alpha \cdot \mathit{Noise} + \sqrt{1-\alpha^2} \cdot \mathit{News} \, \big), \, \mathit{Noise} \, \big] + \mathrm{Cov}[\varepsilon, \, \mathit{Noise}] \\ &= \beta \cdot \alpha \end{align*}

So, returns will react to pure noise whenever \alpha > 0. And, in principle, the trader could notice this fact if he cared to inspect \mathrm{Cov}[\mathit{Return}, \, \mathit{Noise}] rather than just \mathrm{Cov}[\mathit{Return}, \, \mathit{Signal}].
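A quick Monte Carlo (with made-up parameter values) confirms both moments: the trader’s OLS regression recovers \beta^{\star} even though his signal is noisy, while returns still covary with pure noise at rate \beta \cdot \alpha.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

beta_star, beta, alpha = 0.8, 0.5, 0.6            # illustrative values
rho = (beta_star - beta) / np.sqrt(1 - alpha**2)  # fine-tuned per the recipe

news, noise, u = rng.standard_normal((3, n))
signal = alpha * noise + np.sqrt(1 - alpha**2) * news
eps = rho * news + np.sqrt(1 - rho**2) * u        # Cor[news, eps] = rho
ret = beta * signal + eps

# The trader's regression of returns on his signal looks consistent
# with his erroneous belief that the signal is noiseless...
beta_ols = np.cov(ret, signal)[0, 1] / np.var(signal, ddof=1)

# ...but returns still respond to pure noise at rate beta * alpha.
cov_ret_noise = np.cov(ret, noise)[0, 1]

print(beta_ols, cov_ret_noise)
```

So the illusion survives the trader’s own diagnostic check, yet an observer who could see the noise directly would catch it immediately.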

Like Stage Magic

That’s how you write down a model where biased traders can rationalize their own errors in belief. The basic recipe is simple enough. Just introduce a hidden correlation into the information structure of the model (i.e., the parameter \rho) and then fine-tune this correlation so that it cancels out the effects of the trader’s behavioral bias (the parameter \alpha). There’s something really pretty about models that pull off this sort of cancellation, such as Scheinkman and Xiong (2003). But, this approach raises an obvious question: in the real world, why should we expect \alpha and \rho to take on the precise values needed to hide a trader’s error? Where does the required fine-tuning come from? What’s the underlying mechanism at work?

These models are like stage magic. They’re expertly scripted illusions that demonstrate how behavioral biases can go undetected… even by traders who are actively trying to detect them. And, this is not a slight. This is really informative in the same way that going to a good magic show is really informative. It teaches you something useful about the limits of human perception, about how your attention can be managed, about how you can be deceived. But, you don’t leave magic shows thinking that the next deck of cards you open will contain 52 copies of the 6\clubsuit because you happened to be thinking of that card when you opened the box. No one expects everyday situations to operate by the rules of stage magic. Most of the time, there’s no magician to carefully script the illusion. And, the same logic applies to financial markets. It’s useful to know that you can fine-tune parameters to hide an error, but we shouldn’t assume that markets typically operate with the parameters dialed in this way. Why should we? Who exactly would be the one turning the knobs?

