Research Notebook

Generalizing from lab experiments to real-world markets

May 20, 2022 by Alex

When you ask people to trade an asset with an unknown terminal payout in a lab experiment, it’s really common to observe a boom in the asset’s price followed by a sudden crash right before the trading session ends. In other words, it’s very common to observe price paths that resemble a bubble. Bubbles are a regular occurrence in the lab even when participants are given no additional info about the assets they are trading and are not allowed to talk with one another.

By contrast, the recent literature on real-world asset bubbles tends to emphasize the role played by specific asset characteristics and social interactions. For example, Greenwood, Shleifer, and You (2019) document that price booms in industries with lots of young firms are more likely to be followed by a crash. In my recent paper, The Ex Ante Likelihood of Bubbles, I show that speculative bubbles are more likely in assets where small price increases make excited speculators much more persuasive to their friends.

What are we to make of these two contradictory sets of results? I have a hunch that the boom/bust price paths observed in lab experiments are not the same phenomenon as the asset bubbles observed out in the wild. But this is just a hunch. And, as highlighted above, I’m definitely not an impartial judge on this topic.

So how might you check if the trading phenomenon that you’re studying in the lab is the same as the one you observe in real-world financial markets? It’s common to hear asset-pricing researchers say something like: “There’s *ALWAYS* a chance that a lab experiment has missed some important real-world consideration.” It’s certainly true that researchers should worry about omitted variables. But is it really the case that this problem is totally unsolvable? The *ALWAYS* part of the claim strikes me as too strong.

In this post, I describe an analogous problem from the physical sciences and show how you might solve it using dimensional analysis. My point is not that asset-pricing researchers should use dimensional analysis (but that ain’t a terrible idea). My point is that you can theoretically analyze the relationship between the lab and real-world asset markets. Asset-pricing researchers should be doing more of that. Is trading in the lab governed by the same economic forces as trading in real life? This need not be a philosophical question.

An analogous problem

Imagine you’re an engineer who works at a company that makes personal submarines, and your team’s most recent prototype turned out to be much slower in field tests than expected. Something about the craft must be generating excess drag, and you want to figure out what.

You have the data on the prototype submarine’s performance during the recent field tests done out in the ocean. But this craft is 18ft long and weighs several tons. It’s entirely impractical to tinker with. So you create a 1ft mock-up that you can quickly and cheaply experiment on in a wave pool.

But how can you be sure that these results will apply to the full-sized submarine? How can you be sure that your lab experiment isn’t omitting an important real-world consideration? And what is the correct way to scale up the results in the wave pool to results in the ocean?

This is a problem analogous to the one outlined above in the introduction. Instead of worrying about the drag force experienced by a submarine, asset-pricing researchers are interested in some property of speculative bubbles (e.g., the timing/severity of the crash, the likelihood of future occurrence, etc). And, instead of using a 1ft model in a wave pool, they run market simulations in university labs. But the resulting questions are identical: are the findings in the lab informative about the real world?

Here are the relevant variables for determining the submarine’s drag, F: the length of the ship, L, the speed at which the ship is traveling, S, the density of the liquid it is traveling through, D, and the viscosity of this liquid, V. The functional form of the relationship between these variables

(1)   \begin{equation*} F = \mathrm{g}( L, \, S, \, D, \, V ) \end{equation*}

will be determined by the nitty-gritty details of the submarine design. You don’t know this function. But, for your wave-pool experiments to be informative about the drag experienced by a full-sized submarine, this function needs to be the same for both. Any differences between the drag experienced by an 18ft submarine in the ocean and the drag experienced by a 1ft model in your lab should be determined by the inputs L, S, D, and V.

Consider the alternative. Suppose there are other important factors in determining drag in the open ocean besides L, S, D, and V. Or suppose your wave pool is introducing a new variable that is irrelevant to drag force in the open ocean. In either case, your analysis of a toy 1ft submarine will not generalize to the real-world 18ft object. If it turns out to be the case that

(2)   \begin{equation*} \mathrm{g}(\cdot) = \begin{cases} \mathrm{g}_{\text{sea}}(\cdot) &\text{for 18ft sub in the ocean} \\ \mathrm{g}_{\text{lab}}(\cdot) &\text{for 1ft sub in a wave pool} \end{cases} \end{equation*}

then your lab experiments will not be informative.

Dimensional analysis

You want to know when/how/whether there is any way to address this concern. Dimensional analysis offers one way to do this. “We are allowed to add, or subtract, various quantities together only if they are expressed in the same units. Thus, the left hand and right hand sides of an equation… must have the same physical dimensions.” And it turns out that it’s possible to leverage this seemingly banal observation to test whether \mathrm{g}_{\text{sea}}(\cdot) = \mathrm{g}_{\text{lab}}(\cdot). Here’s how.

I will use [\cdot] to denote the dimension of a variable. The length of the ship, L, has dimension [L] = \mathit{length}. It doesn’t matter whether this length has units of feet, meters, or parsecs. The speed at which the submarine is traveling, S, has dimensions of length per unit of time, [S] = \mathit{length} / \mathit{time}. Drag, F, is a force, and Newton taught us that force equals mass times acceleration:

(3)   \begin{equation*} [F] = \mathit{mass} \cdot \underbrace{(\mathit{length} / \mathit{time}^2)}_{\substack{\text{dimension of} \\ \text{acceleration}}} \end{equation*}

Density, D, measures how much stuff there is per unit of volume, [D] = \mathit{mass} / \mathit{length}^3. Finally, viscosity, V, measures how much force is needed to deform a given area per unit of time, [V] = \mathit{mass} / (\mathit{length} \cdot \mathit{time}).

Suppose it’s possible to input L = \ell_0, S = s_0, D = d_0, and V = v_0. When you do this, the output is \mathrm{g}(\ell_0, \, s_0, \, d_0, \, v_0) = f_0. If the functional form in Equation (1) holds across a wide range of values, then it should also hold in the special case where L = \ell_0, S = s_0, D = d_0/d_0 = 1, and V = v_0/d_0. The original values of L = \ell_0, S = s_0, D = d_0, and V = v_0 were just arbitrary choices from the domain of \mathrm{g}(\cdot).

But notice the problem that this creates. By setting D = d_0/d_0 = 1 and V = v_0/d_0, we’ve just stripped the dimension of \mathit{mass} from the inputs to \mathrm{g}(\cdot): [L] = \mathit{length}, [S] = \mathit{length} / \mathit{time}, and [V/D] = \mathit{length}^2 / \mathit{time}. So there is no way for us to balance the dimensions on the right-hand and left-hand sides of Equation (1) without making an additional change.

The simplest possible change would be to multiply by d_0. If we do that, we get:

(4)   \begin{equation*} d_0 \times \mathrm{g}\Big( \, \ell_0, \, s_0, \, 1, \, \frac{v_0}{d_0} \, \Big) \end{equation*}

That way, we would have [d_0] = \mathit{mass} / \mathit{length}^3 times [\mathrm{g}( \, \ell_0, \, s_0, \, 1, \, v_0/d_0 \, )] = \mathit{length}^4 / \mathit{time}^2. This yields a combined dimension of \mathit{mass} \cdot (\mathit{length} / \mathit{time}^2) for the right-hand side, which is exactly what we’re after.

Of course, there’s nothing special about the density input to \mathrm{g}(\cdot). We could do a similar trick for length, L, and speed, S. If we set L = \ell_0/\ell_0, S = s_0/s_0, and V = v_0 / (d_0 \cdot s_0 \cdot \ell_0), then we would need to multiply the right-hand side by an additional factor of (s_0^2 \cdot \ell_0^2) to preserve dimensional consistency:

(5)   \begin{equation*} d_0 \cdot (s_0^2 \cdot \ell_0^2) \times \mathrm{g}\Big( \, 1, \, 1, \, 1, \, \frac{v_0}{d_0 \cdot s_0 \cdot \ell_0} \, \Big) \end{equation*}

Again, s_0^2 and \ell_0^2 are chosen to balance the dimensions on either side of the equal sign in Equation (1).

What’s more, there was nothing special about our original choice of inputs: L = \ell_0, S = s_0, D = d_0, and V = v_0. These could have been any values in the domain of \mathrm{g}(\cdot) so long as we used the same values throughout. So, if we drop the ones as inputs to \mathrm{g}(\cdot) and rearrange things a bit, we get:

(6)   \begin{equation*} \underbrace{\frac{F}{D \cdot S^2 \cdot L^2}}_{\text{Drag coefficient}} = \mathrm{G}\Big( \, \underbrace{L \cdot \Big\{\frac{D \cdot S}{V} \Big\}}_{\substack{\text{Reynolds} \\ \text{number}}} \, \Big) \end{equation*}

The left-hand side is dimensionless, [F/ (D \cdot S^2 \cdot L^2)] = 1, and commonly called the “drag coefficient”. The input to \mathrm{G}(\cdot) is called the Reynolds number, \mathit{Re}, and is also dimensionless, [L \cdot \{(D \cdot S) / V\}] = 1.
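
To see the collapse in Equation (6) in action, here’s a minimal numerical sketch in Python. It uses Stokes drag on a sphere, F = 3 \pi \cdot V \cdot S \cdot L, as a stand-in for the unknown \mathrm{g}(\cdot) (any dimensionally consistent drag law would do), and the specific numbers are made up. Two very different combinations of L, S, D, and V that share the same Reynolds number produce exactly the same drag coefficient.

import numpy as np

# Stand-in for the unknown drag law g(L, S, D, V). Stokes drag on a sphere,
# F = 3*pi*V*S*L, is one dimensionally consistent possibility:
# [V*S*L] = mass*length/time^2, the dimension of a force.
def g(L, S, D, V):
    return 3 * np.pi * V * S * L

def reynolds(L, S, D, V):
    return L * (D * S / V)

def drag_coefficient(L, S, D, V):
    return g(L, S, D, V) / (D * S**2 * L**2)

# Two very different input vectors that happen to share the same Reynolds
# number (the numbers are arbitrary, in consistent units).
full_scale = dict(L=18.0, S=2.0, D=1025.0, V=1.2e-3)
lab_scale  = dict(L=1.0,  S=2.0, D=1025.0, V=1.2e-3 / 18)

print(reynolds(**full_scale), reynolds(**lab_scale))                  # same Reynolds number
print(drag_coefficient(**full_scale), drag_coefficient(**lab_scale))  # same drag coefficient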

Why this helps

We didn’t know anything about the functional form of \mathrm{g}(\cdot) when we started our analysis. And we still don’t know anything about the functional form of \mathrm{G}(\cdot). We have to empirically estimate both functions based on observed data. So it might not be immediately obvious how dimensional analysis has improved things.

What’s more, every textbook I’ve read has talked about the benefits of dimensional analysis in terms of variable elimination. \mathrm{g}(L, \, S, \, D, \, V) was a function of four variables. \mathrm{G}(\mathit{Re}) is a function of a single variable. These textbooks argue that it’s easier to estimate \mathrm{G}(\cdot) than \mathrm{g}(\cdot) because L, S, D, and V have been combined into a single variable, \mathit{Re}. This is true. But it’s not why dimensional analysis is helpful here.

Dimensional analysis is helpful for assessing the external validity of your wave-pool experiments because you know the way that the single variable, \mathit{Re} = L \cdot \{(D \cdot S) / V\}, combines the four other inputs L, S, D, and V. As a result, you can compare predictions made using Reynolds numbers that were arrived at in very different empirical settings.

For example, suppose that you observe a particular drag coefficient in the data from the recent open-ocean field tests involving the full-sized 18ft submarine:

(7)   \begin{equation*} \frac{F_{\text{18ft, Sea water}}}{D \cdot S^2 \cdot \{18\mathrm{ft}\}^2} = \mathrm{G}\Big( \, \{ 18\mathrm{ft}\} \cdot \Big\{\frac{D \cdot S}{V_{\text{Sea water}}} \Big\} \, \Big) \end{equation*}

If you fill your wave pool with a liquid that is one-eighteenth as viscous as sea water, V_{\text{Lab}} = V_{\text{Sea water}} / 18, so that the 1ft model’s Reynolds number matches the full-sized submarine’s, do you observe a similar drag coefficient on your 1ft model submarine?

(8)   \begin{equation*} \frac{F_{\text{1ft, Lab fluid}}}{D \cdot S^2 \cdot \{1\mathrm{ft}\}^2} = \mathrm{G}\Big( \, \{ 1\mathrm{ft}\} \cdot \Big\{\frac{D \cdot S}{V_{\text{Lab}}} \Big\} \, \Big) \end{equation*}

The first calculation comes from real-world data. The second comes from your lab.

Do they match? If yes: great!! Your lab experiment captures the main forces at work in your open-ocean field test, \mathrm{g}_{\text{sea}}(\cdot) = \mathrm{g}_{\text{lab}}(\cdot). If no: bummer. Either your lab experiment is introducing unwanted variables or it is missing an important consideration. Either way, you should not assume your lab results can be applied to the next full-sized prototype submarine your company turns out.
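
Here’s a toy version of that comparison with made-up functional forms. Pretend the true open-ocean drag law is a Stokes term plus a pressure-drag term, so its drag coefficient depends on the Reynolds number alone. And suppose the wave pool introduces an extra wall-interference effect that is irrelevant in the open ocean, so \mathrm{g}_{\text{lab}}(\cdot) \neq \mathrm{g}_{\text{sea}}(\cdot). Even at a matched Reynolds number, the two drag coefficients then disagree, which is exactly the red flag described above.

import numpy as np

def reynolds(L, S, D, V):
    return L * D * S / V

def cd_sea(L, S, D, V):
    # Pretend "true" open-ocean drag law: Stokes term plus pressure-drag term.
    F = 3 * np.pi * V * S * L + 0.4 * D * S**2 * L**2
    return F / (D * S**2 * L**2)

def cd_lab(L, S, D, V, wall_gap=0.5):
    # Hypothetical wave-pool drag law: same terms plus a wall-interference
    # term that depends on a new variable (distance to the pool wall), so
    # g_lab is not the same function as g_sea.
    F = 3 * np.pi * V * S * L + 0.4 * D * S**2 * L**2 + 0.2 * (L / wall_gap) * D * S**2 * L**2
    return F / (D * S**2 * L**2)

sea = dict(L=18.0, S=2.0, D=1025.0, V=1.2e-3)       # full-sized sub in sea water (illustrative numbers)
lab = dict(L=1.0,  S=2.0, D=1025.0, V=1.2e-3 / 18)  # 1ft model in a fluid 1/18th as viscous

print(reynolds(**sea), reynolds(**lab))   # matched Reynolds numbers
print(cd_sea(**sea), cd_lab(**lab))       # drag coefficients do NOT match => the lab is adding something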

The original problem

Asset-pricing researchers regularly see prices that boom and bust in trading experiments. These episodes look a whole helluva lot like speculative bubbles. But are they? Are the forces that explain these episodes the same ones that produced the Dotcom bubble? Or are these two different phenomena? I really want to know the answer to this question.

The point of this post is not to argue that dimensional analysis is the obvious way forward. I wrote about an “analogous problem” rather than the problem I’m actually interested in because it’s not immediately obvious how to apply dimensional analysis to study speculative bubbles. It might be possible. But I haven’t managed to select the right variables. If I knew how to do this, this would be a paper and not a blog post.

Instead, my goal is to argue that it’s possible to assess whether the phenomenon observed in the lab is the same as the one observed out in the real world. This is true when the lab experiment involves a 1ft submarine and a wave pool. It’s also true when the lab experiment involves undergrads buying and selling a fictitious stock. Lab experiments are a major source of knowledge in the social sciences. Rather than complaining that a given experiment might be missing something important about the real world, asset-pricing researchers should work towards developing methods for verifying external validity. The application of dimensional analysis to the submarine example above is a proof of concept. It can be done. It is done in other fields.


Causal inference as a tool for publishing robust results

December 17, 2021 by Alex

Imagine you’re an asset-pricing researcher. You’ve just thought up a new variable, X, that might predict the cross-section of returns. And you’ve regressed returns on X in a market environment e of your choosing (i.e., using data on some specific time period, country, asset class, set of test assets, etc):

(1)   \begin{equation*} R(i) = \alpha_e + \beta_e \cdot X(i) + \epsilon_e(i) \qquad  \text{for assets } i=1,\ldots,\,I \end{equation*}

If differences in X predict differences in returns in your chosen market environment e, the estimated slope coefficient will be large, |\beta_e| \gg 0. It would’ve been profitable to trade on the predictor in sample.

Suppose you find \beta_e \gg 0. Assets with higher X values today tend to have higher returns tomorrow. You now face a choice about whether to publish this finding. If you do, then other researchers will read your paper and try to replicate it in other market environments you haven’t yet looked at, e' \neq e. Let \mathsf{OoS}_e denote the collection of all out-of-sample market environments that your colleagues might examine.

Obviously, you shouldn’t publish if X isn’t a good cross-sectional predictor in most of these out-of-sample environments—i.e., you shouldn’t publish if \text{average}_{e' \in \mathsf{OoS}_e} \, \beta_{e'} = \frac{1}{|\mathsf{OoS}_e|} \cdot \sum_{e' \in \mathsf{OoS}_e} \, \beta_{e'} < 0. But, even if X is a good predictor on average, you still worry about worst-case scenarios. If there’s one market environment e' \in \mathsf{OoS}_e where \beta_{e'} \ll 0, then one of your colleagues will surely discover it and you’ll look utterly foolish when he tells the world. You only want to publish if X robustly predicts returns out-of-sample:

(2)   \begin{equation*} (1-\lambda)  \cdot  \underset{e' \in \mathsf{OoS}_e}{\text{average}} \, \beta_{e'} + \lambda  \cdot \underset{e' \in \mathsf{OoS}_e}{\text{minimum}\phantom{j}} \!\beta_{e'} \geq  0 \end{equation*}

\lambda \in (0, \, 1] captures the relative importance of these two considerations to your publication decision. The larger the \lambda, the more you care about saving face by not publishing any really bad predictions.
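
In code, this publication rule is a one-liner. Here’s a minimal sketch of Equation (2); the function name and the example \beta_{e'} values are made up for illustration.

import numpy as np

def publish(oos_betas, lam):
    # Equation (2): publish only if the lambda-weighted combination of the
    # average and the worst-case out-of-sample slope is non-negative.
    oos_betas = np.asarray(oos_betas)
    return (1 - lam) * oos_betas.mean() + lam * oos_betas.min() >= 0

print(publish([0.4, 0.5, 0.3], lam=0.5))   # True: good on average and in the worst case
print(publish([0.9, 0.8, -1.0], lam=0.5))  # False: one disastrous environment sinks the result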

Importantly, let’s assume that all you care about when doing research is solving this robust out-of-sample prediction problem. You don’t care at all about whether investors actually price assets based on X. All that matters is whether X reliably predicts returns out-of-sample. You’re completely drunk on Friedman’s “as if” Kool-Aid. Before deciding whether to publish, you have a choice as to which market environment to examine. What sort of environment should you choose? What should your empirical strategy be?

The key insight in this post is that, even if all you care about is robust out-of-sample performance, causal inference still turns out to be a useful tool for achieving this goal. If investors always use the same model to price assets, then understanding this model will allow you to always make good predictions. Your empirical strategy should be to choose an empirical environment e that identifies the causal effect of X on returns.

Investors’ model

I begin by defining investors’ model. Suppose that, in every market environment, investors price each asset so that its returns are governed by the following linear structural model:

(3)   \begin{equation*} R \leftarrow  \theta_{\star} \cdot X + \vartheta_{\star} \cdot U + \sigma_{\star} \cdot N \end{equation*}

Moreover, assume that the parameters, (\theta_{\star}, \, \vartheta_{\star}, \, \sigma_{\star}), are the same in every market environment. X is the cross-sectional predictor that you’re working on, and U is an omitted variable. This is a variable that investors might be using to price assets but researchers have yet to discover. If it’s 1981 and X is firm size, then U might be liquidity. N is a noise term and \sigma_{\star} > 0 captures its effect on returns.

Crucially, either X affects returns, \theta_{\star} \neq 0, or U affects returns, \vartheta_{\star} \neq 0, but not both in investors’ model. If \theta_{\star} \neq 0, then X reliably predicts the cross-section of returns since \theta_{\star} is the same in every environment—i.e., in every time period, country, etc. If \vartheta_{\star} \neq 0, then any predictability associated with X is spurious. Let

(4)   \begin{equation*} \mathsf{\Theta} = \{ \, (\theta, \, \vartheta) \in [-1, \, 1]^2 \, | \, 1_{\{\theta \neq 0\}} + 1_{\{ \vartheta \neq 0 \}} = 1 \, \} \end{equation*}

denote the entire range of possible values that \theta_{\star} and \vartheta_{\star} might take on.

To keep things simple, suppose that the realized values of X, U, and N for each asset in a given market environment are drawn IID normal:

(5)   \begin{equation*} \begin{pmatrix} X \\ U \\ N \end{pmatrix} \overset{\text{IID}}{\sim}  \mathrm{Normal} \left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \rho_e & 0 \\ \rho_e & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right) \end{equation*}

X, U, and N all have mean zero and unit variance. The noise term N is uncorrelated with both X and U in every market environment, \Corr_e[X, \, N] = \Corr_e[U, \, N] = 0. However, X and U may be correlated across stocks, \Corr_e[X, \, U] = \rho_e \neq 0. Moreover, this correlation can differ across market environments. In other words, X and U may be highly correlated in one time period/country/asset class/etc but not in another.

Note that Equations (3) and (5) imply asset returns are zero on average, \Exp_e[R] = 0, in every market environment. I’m making this assumption to keep the math simple. If it really bothers you, just think about R as an asset’s residual return that is left unexplained by other trading signals. Let’s also assume that \Var_e[R] = 1 in every market environment for the same reasons. This assumption implies that \sigma_{\star} = \sqrt{1 - (\theta_{\star}^2 + \vartheta_{\star}^2)}.

Two explanations

When you regressed the cross-section of returns on X in your chosen market environment e, you found that \beta_e \gg 0. Given the structure of investors’ model, we know that either X predicts the cross-section of returns or U predicts the cross-section of returns but not both:

(6)   \begin{equation*} \beta_e = \begin{cases} \theta_{\star} &\text{if } \theta_{\star} \neq 0 \\ \vartheta_{\star} \cdot \rho_e &\text{if } \theta_{\star} = 0 \end{cases} \end{equation*}

It might be that you estimated \beta_e \gg 0 in market environment e because \theta_{\star} \gg 0 in every environment. Or it might be that you estimated \beta_e \gg 0 in market environment e because X happened to be correlated with an omitted variable in that environment, \vartheta_{\star} \cdot \rho_e \gg 0. These are the two possible explanations.

Since you’re focused on robust out-of-sample performance in other market environments e' \neq e, the reason why \beta_e \gg 0 in-sample is very important. If \beta_e \gg 0 merely because \vartheta_{\star} \cdot \rho_e \gg 0, then X will only be a good cross-sectional predictor in other market environments e' \in \mathsf{OoS}_e where X and U are similarly correlated, \mathbb{S}\mathrm{ign}[\rho_e] = \mathbb{S}\mathrm{ign}[\rho_{e'}]. Under this explanation, it’s possible to imagine market environments where X is an abysmal predictor. Just look for environments where \mathbb{S}\mathrm{ign}[\rho_e] \neq \mathbb{S}\mathrm{ign}[\rho_{e'}].
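
A quick simulation makes the distinction concrete. Both data-generating processes below are rigged (with arbitrary parameter values) to deliver the same in-sample slope of roughly 0.30 via Equation (6): one has \theta_{\star} = 0.30, the other has \theta_{\star} = 0 and a spurious \vartheta_{\star} \cdot \rho_e = 0.6 \times 0.5. Out-of-sample, flipping the sign of \rho only wrecks the spurious version.

import numpy as np

def simulate_beta(theta, vartheta, rho, n_assets=200_000, seed=0):
    # Draw (X, U, N) as in Equation (5), build R as in Equation (3), and
    # return the slope from regressing R on X, as in Equation (1).
    rng = np.random.default_rng(seed)
    cov = [[1, rho, 0], [rho, 1, 0], [0, 0, 1]]
    X, U, N = rng.multivariate_normal([0, 0, 0], cov, size=n_assets).T
    sigma = np.sqrt(1 - (theta**2 + vartheta**2))
    R = theta * X + vartheta * U + sigma * N
    return np.polyfit(X, R, 1)[0]

# Two explanations for the same in-sample estimate, beta_e ~ 0.30:
print(simulate_beta(theta=0.3, vartheta=0.0, rho=0.5))   # causal: beta_e = theta_star
print(simulate_beta(theta=0.0, vartheta=0.6, rho=0.5))   # spurious: beta_e = vartheta_star * rho_e

# In an out-of-sample environment where the correlation flips sign:
print(simulate_beta(theta=0.3, vartheta=0.0, rho=-0.5))  # still ~ +0.30
print(simulate_beta(theta=0.0, vartheta=0.6, rho=-0.5))  # now ~ -0.30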

Causal inference

What needs to be true about market environment e if you want to be able to distinguish between these two explanations? The answer boils down to an identifying assumption about the range of values that \Corr_e[X, \, U] = \rho_e might take on:

(7)   \begin{equation*} \mathsf{P}_e = \{ \, \rho \in [-1, \, 1] \, | \, \text{it could be that $\Corr_e[X, \, U] = \rho$ in market environment $e$}  \, \} \end{equation*}

A market environment, e = \{ \theta, \, \vartheta; \mathsf{P}, \, \rho \}, consists of a set of structural parameters, (\theta, \, \vartheta); a range of possible values for the correlation between X and U, \mathsf{P}; and, a particular choice for this value, \rho \in \mathsf{P}.

If market environments e and e' have the same structural parameters, (\theta,\,\vartheta,\,\rho_e) = (\theta,\,\vartheta,\,\rho_{e'}), then the cross-sectional slope coefficient will be the same in both environments, \beta_e = \beta_{e'}. Yet, you will interpret the slope coefficient differently in each environment if \mathsf{P}_e \neq \mathsf{P}_{e'}. By analogy, medical researchers will draw different conclusions about a drug’s efficacy from an RCT than from an observational study even if the joint distribution of patient outcomes and observable characteristics is the same in both datasets. If \{ 0 \} = \mathsf{P}_e, then \beta_e \gg 0 identifies \theta_{\star} \gg 0 as the correct explanation. There’s no way to have \beta_e = \vartheta_{\star} \cdot 0 \gg 0 in such an environment. By contrast, if \{ 0\} \subset \mathsf{P}_{e'}, then \beta_{e'} \gg 0 could be explained either by \theta_{\star} \gg 0 or by \vartheta_{\star} \cdot \rho_{e'} \gg 0.

Note that it isn’t possible to choose a market environment where \mathsf{P}_{e'} consists of an arbitrarily small neighborhood around zero. The omitted variable U can explain no more than 100\% of the variation in returns across assets. That would occur if |\vartheta_{\star}| = 1 since we are assuming \Var_{e'}[R] = 1. Hence, if \theta_{\star} = 0 and \beta_{e'} = \vartheta_{\star} \cdot \rho_{e'} \gg 0 due to a spurious correlation, then this correlation must be bounded away from zero:

(8)   \begin{equation*} \beta_{e'} =  \vartheta_{\star} \cdot \rho_{e'} \leq  1 \cdot \rho_{e'} \qquad \Rightarrow \qquad (-\beta_{e'}, \, 0) \cup (0, \, \beta_{e'}) \not\subset \mathsf{P}_{e'} \end{equation*}

This binary zero/non-zero distinction is why it’s possible to map out causal effects using path diagrams. A path between two variables must be contemplated whenever they could have a non-zero correlation.

Out-of-sample environments

When you regressed the cross-section of returns on X in market environment e, you found \beta_e \gg 0. We can now give a precise definition for the set of all out-of-sample market environments that other researchers might try to replicate this finding in. Let

(9)   \begin{equation*} \mathsf{\Theta}_e = \{ \, (\theta, \, \vartheta) \in \mathsf{\Theta} \, | \, \text{$\beta_e = \theta + \vartheta \cdot \rho$ for some $\rho \in \mathsf{P}_e$}  \, \} \end{equation*}

denote the range of possible values for \theta_{\star} and \vartheta_{\star} that are consistent with your initial estimate \beta_e \gg 0 given \mathsf{P}_e. If X is guaranteed to be uncorrelated with the omitted variable, \{ 0 \} = \mathsf{P}_e, then \{ (\beta_e, \, 0) \} = \mathsf{\Theta}_e and we say that market environment e identifies changes in X as having a causal effect on the cross-section of returns.

Given the set of all \theta_{\star} and \vartheta_{\star} values that are consistent with your initial result \beta_e \gg 0, the range of potential out-of-sample market environments is defined as follows:

(10)   \begin{equation*} \mathsf{OoS}_e = \{  \,  e' = (\theta, \, \vartheta; \mathsf{P}, \, \rho )  \, | \,  (\theta, \, \vartheta) \in \mathsf{\Theta}_e \text{ and }  \rho \in [-1, \, 1] = \mathsf{P} \,  \} \end{equation*}

This collection of market environments consists of any environment which could be generated by some \theta_{\star} and \vartheta_{\star} that’s consistent with your initial result combined with any possible value of \rho \in [-1, \, 1].

Research strategy

If you chose a market environment e that identified the causal effect of X on the cross-section of returns for your initial test, then your estimate of \beta_e \gg 0 would imply that \beta_{e'} = \theta_{\star} \gg 0 in every out-of-sample market environment. The left-hand side of Equation (2) would reduce to:

(11)   \begin{equation*} \begin{split} (1-\lambda)  \cdot  \underset{e' \in \mathsf{OoS}_e}{\text{average}} \, \beta_{e'} + \lambda  \cdot \underset{e' \in \mathsf{OoS}_e}{\text{minimum}\phantom{j}} \!\beta_{e'} &= (1 - \lambda) \cdot \beta_e + \lambda \cdot \beta_e \\ &=  \beta_e \end{split} \end{equation*}

The finding would be robust out-of-sample, and you should publish it.

By contrast, if you chose a market environment e that did not identify the causal effect of X on the cross-section of returns, then your estimate of \beta_e \gg 0 would be harder to interpret. It could be that \beta_e = \theta_{\star} \gg 0 or it could be that \beta_e = \vartheta_{\star} \cdot \rho_e \gg 0. If the latter is true, then we would say that \beta_e reflects a spurious correlation. And to make this spurious correlation look as bad as possible out-of-sample, other researchers should look for a market environment e' \in \mathsf{OoS}_e where \rho_{e'} = - 1:

(12)   \begin{equation*} \underset{e' \in \mathsf{OoS}_e}{\text{minimum}\phantom{j}} \!\beta_{e'} = -1 \end{equation*}

Absent identification, you have to entertain this possibility. So you may refuse to publish strong results with good average-case out-of-sample performance for fear of being embarrassed by worst-case predictions.

Thus, as outlined at the beginning, even if all you care about as a researcher is publishing results that have robust out-of-sample performance, causal inference still turns out to be relevant. It’s a very useful tool for achieving this goal. If investors are always using the same model to price assets, then understanding this model will allow you to always make good predictions. So you should consider adopting a research strategy whereby you insist on testing each new predictor X in an identified market environment e.
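
Here’s a schematic version of that comparison. In the identified case, \mathsf{\Theta}_e collapses to \{ (\beta_e, \, 0) \}, so \beta_{e'} = \beta_e in every out-of-sample environment. In the non-identified case, spurious pairs (0, \, \vartheta) with |\vartheta| \geq \beta_e are also consistent with the initial estimate, and the worst environment in \mathsf{OoS}_e sets \rho_{e'} = -1. The grids below are just an illustrative discretization.

import numpy as np

beta_e, lam = 0.3, 0.5
rho_grid = np.linspace(-1, 1, 201)   # illustrative discretization of rho in [-1, 1]

def criterion(oos_betas, lam):
    # Left-hand side of Equation (2).
    oos_betas = np.asarray(oos_betas)
    return (1 - lam) * oos_betas.mean() + lam * oos_betas.min()

# Identified environment ({0} = P_e): only (theta, vartheta) = (beta_e, 0) is
# consistent with the estimate, so beta_e' = beta_e everywhere (Equation 11).
identified_oos = np.full_like(rho_grid, beta_e)

# Non-identified environment (P_e = [-1, 1]): spurious pairs (0, vartheta) with
# |vartheta| >= beta_e are also consistent, and rho_e' can be anything in [-1, 1].
vartheta_grid = np.linspace(beta_e, 1, 201)
spurious_oos = np.concatenate([v * rho_grid for v in vartheta_grid])
non_identified_oos = np.concatenate([identified_oos, spurious_oos])

print(criterion(identified_oos, lam))       # = beta_e = 0.3  => publish
print(criterion(non_identified_oos, lam))   # < 0 (worst case hits -1, Equation 12) => don't publish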

No free lunch

I recognize that identifying causal effects is hard. Running RCTs is hard. Finding valid instrumental variables is hard. It’s hard to find a market environment that identifies the causal effect of a change in X on returns—i.e., to find a market environment e where it’s reasonable to assume that \{ 0 \} = \mathsf{P}_e.

So you might be thinking: “Can’t I just get around the problem by checking lots of different market environments before publishing? If \beta_{e'} = \beta_{e} \gg 0 in lots of different market environments e' \in \mathsf{OoS}_e, then shouldn’t I be more confident in X’s out-of-sample performance? After all, in real life, no researcher would (or could!) publish a result about cross-sectional predictability based on one regression.”

It’s absolutely true that you do learn something about X’s out-of-sample performance when you verify that \beta_{e'} = \beta_e \gg 0 in many different market environments e' \in \mathsf{OoS}_e. Unfortunately, the something that you learn only applies to X’s average-case performance, \text{average}_{e' \in \mathsf{OoS}_e} \, \beta_{e'}. For example, if \beta_{e'} = 0.99 in the vast majority of possible out-of-sample environments, then there’s no way for \text{average}_{e' \in \mathsf{OoS}_e} \, \beta_{e'} < 0 since we know that \beta_{e'} \geq -1 in every remaining market environment as we saw in Equation (12).

Yet, until you check every imaginable out-of-sample environment, you can say nothing new about the worst-case outcome. No matter how many environments you check in \mathsf{OoS}_e, you can never be certain that \beta_{e'} \neq -1 in one of the remaining environments in \mathsf{OoS}_e that you haven’t checked. Thus, if you care about never publishing a result that makes an embarrassingly bad out-of-sample prediction in some situation, \lambda > 0, then simply doing lots of in-sample checks isn’t a viable research strategy on its own. It certainly doesn’t hurt. But it DOES NOT tell you anything about out-of-sample robustness in the setup described thus far.
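
To see why, here’s a toy numerical example with made-up numbers. Suppose \mathsf{OoS}_e contains 10,000 environments, \beta_{e'} = 0.99 in all but one of them, and \beta_{e'} = -1 in the one environment you never get around to checking.

import numpy as np

all_betas = np.full(10_000, 0.99)
all_betas[-1] = -1.0                      # one disastrous environment you never check

checked = all_betas[:1_000]               # the 1,000 environments you do check
print(checked.mean(), checked.min())      # 0.99 and 0.99: every in-sample check looks perfect
print(all_betas.mean(), all_betas.min())  # the true average is still ~0.99, but the true minimum is -1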

The thing that makes causal inference difficult is that it requires making a strong assumption about the joint distribution of X and an unobserved/unknown/omitted variable U in a particular market environment e. You have to assume that \{ 0 \} = \mathsf{P}. Such identifying assumptions can be hard to stomach. However, the assumption that you would need to make for in-sample robustness to guarantee out-of-sample robustness is even less palatable. TANSTAAFL. Instead of requiring that \rho_e = 0 in some specific market environment, you would need to assume that \vartheta_{\star} \cdot \rho_{e'} \gg 0 in all out-of-sample environments e' \in \mathsf{OoS}_e. Such an assumption is tantamount to simply assuming the result you’re after—namely, out-of-sample robustness.


Market data, investor surveys, and lab experiments

December 2, 2021 by Alex

An asset-pricing model is a claim about which optimization problem people are solving when they choose their investment portfolios. One way to make such a claim testable is to derive a condition that should hold if people were actually solving this optimization problem. And the standard approach to testing whether an asset-pricing model is correct involves using market data to estimate the key parameters in this condition, plugging in the results, and checking if it does hold.

For example, the consumption capital asset-pricing model (CCAPM) argues that people view the stock market as a way to insure consumption shocks. If this is correct, then the expected return on the market should be equal to the riskfree rate plus a multiple of the market’s consumption beta, \beta = \frac{\Cov[R_{t+1}, \, \Delta \log C_{t+1}]}{\Var[\Delta \log C_{t+1}]}:

(1)   \begin{equation*} \Exp[R_{t+1}] = R_f + \lambda \cdot \beta \end{equation*}

The standard test of the CCAPM involves computing average stock returns and the market’s consumption beta using historical data, plugging in these values to Equation (1), and checking whether it holds.
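
Here’s roughly what that plug-in test looks like in code. The return and consumption-growth series below are simulated placeholders that you would swap out for historical data, and the last step uses the usual lognormal approximation, \lambda \approx \gamma \cdot \Var[\Delta \log C_{t+1}], to translate the estimated price of consumption risk into an implied risk-aversion coefficient.

import numpy as np

def ccapm_check(market_returns, cons_growth, r_f):
    # Plug-in ingredients for Equation (1): the market's average excess return,
    # its consumption beta, and the price of consumption risk they jointly imply.
    avg_excess = np.mean(market_returns) - r_f
    beta = np.cov(market_returns, cons_growth)[0, 1] / np.var(cons_growth, ddof=1)
    implied_lambda = avg_excess / beta
    # Lognormal approximation: lambda ~ gamma * Var[dlogC], so back out gamma.
    implied_gamma = implied_lambda / np.var(cons_growth, ddof=1)
    return avg_excess, beta, implied_lambda, implied_gamma

# Placeholder annual series (swap in historical data before drawing conclusions).
rng = np.random.default_rng(0)
cons_growth = 0.02 + 0.02 * rng.standard_normal(80)
market_returns = 0.06 + 1.5 * (cons_growth - 0.02) + 0.15 * rng.standard_normal(80)

print(ccapm_check(market_returns, cons_growth, r_f=0.01))
# In historical U.S. data, consumption growth is so smooth that the implied
# gamma comes out implausibly large (the equity-premium problem discussed below).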

Of course, this isn’t the only way to test an asset-pricing model. Researchers can use surveys to learn about which optimization problem people are solving (see here, here, and here). They can also run lab experiments where participants trade with one another in a simulated market environment (see here, here, and here). These two alternative approaches have gained popularity in recent years (here, here, and here).

Unfortunately, many researchers view investor surveys and lab experiments as just another way of generating the data needed to test the moment conditions implied by an asset-pricing model. The prevailing view is that the data produced by these two empirical approaches is just a poor substitute for market data. Not so! These empirical approaches are not substitutes. They are each good for doing different things. Investor surveys and lab experiments can answer questions that market data can’t address.

What sorts of empirical questions are best suited to being answered via sample statistics computed using market data? Where do survey responses have a comparative advantage? When can we learn the most from a well-designed lab experiment? This post outlines how I think about the answers to these questions.

Canonical asset-pricing model

To make things concrete, I’m going to talk about testing a particular asset-pricing model—namely, the CCAPM. This model says that a representative investor solves the following optimization problem:

(2)   \begin{equation*} \begin{split} \text{maximize} &\quad \Exp_0\left[ \, {\textstyle \sum_{t=0}^{\infty}} \, e^{-\rho \cdot t} \cdot \mathrm{U}(C_t) \, \right] \\ \text{subject to} &\quad \parbox[b]{1.00cm}{\raggedleft $\mathrm{U}(C)$} = {\textstyle \frac{C^{1-\gamma} - 1}{1-\gamma}} \\ &\quad \parbox[b]{1.00cm}{\raggedleft $W_t$} \geq C_t + P_t \cdot S_t + B_t \\ &\quad \parbox[b]{1.00cm}{\raggedleft $W_{t+1}$} = (P_{t+1} + D_{t+1}) \cdot S_t + (1+R_f) \cdot B_t \\ &\quad \parbox[b]{1.00cm}{\raggedleft $D_{t+1}$} = D_t \cdot e^{\mu + \sigma \cdot \varepsilon_{t+1}}  \quad  \varepsilon_{t+1} \overset{\scriptscriptstyle \textnormal{IID}}{\sim}  \mathrm{Normal}(0, \, 1) \\ \text{on inputs} &\quad \parbox[b]{1.00cm}{\raggedleft $W_0$}, \, D_0 \! > \! 0, \, \mu \! > \! \rho \! > \! 0, \, \sigma \! > \! 0, \, \gamma \! > \! 1, \, R_f \! > \! 0 \end{split} \end{equation*}

Here’s a line-by-line breakdown of what all this means. (Line 1; Objective) The investor tries to maximize the present discounted value of his future utility from consumption given time preference parameter \rho > 0. (Line 2; Preferences) The investor has power utility with risk aversion parameter, \gamma > 1. (Line 3; Budget Constraint) At time t, the investor must have enough wealth W_t to cover his consumption C_t AND to pay for the S_t shares of the market that he buys at a price of P_t dollars per share AND to pay for the B_t dollars he invests in riskless bonds. (Line 4; Wealth Evolution) The investor will receive a D_{t+1} dollar dividend for each share of the market he bought at time t AND he will be able to sell the share for P_{t+1} dollars AND he will earn the riskfree rate of R_f > 0 on each dollar he invested in bonds at time t. (Line 5; Dividend Process) The total market dividend grows at an average rate of \mu > 0 per period, and it has a per period volatility of \sigma > 0.

The solution to this model reveals that:

(3)   \begin{equation*} P_t = \Exp \left[ \, e^{-(\rho + \gamma \cdot \Delta \log C_{t+1})} \times (D_{t+1} + P_{t+1}) \,  \right] \end{equation*}

The price an investor is willing to pay at time t for a share of the stock market (think: level of the S&P 500) is equal to his expectation of the total future payout to owning this share at time t+1 (total payout = the dividend D_{t+1} plus the sale price P_{t+1}) discounted back at a rate that accounts for both his preference for getting paid earlier rather than later, \rho, and his aversion to fluctuations in consumption, \gamma \cdot \Delta \log C_{t+1}.

According to the CCAPM, the investor should be willing to pay more for the aggregate stock market if it provides insurance against drops in consumption, \Delta \log C_{t+1} < 0. And, if we define the return to holding a share of the market portfolio as 1 + R_{t+1} = (D_{t+1} + P_{t+1}) \, / \, P_t, then standard manipulations of Equation (3) result in the moment condition in Equation (1). Thus, if investors are solving the optimization problem in Equation (2), then the expected return on the market, \Exp[R_{t+1}], should be equal to the riskfree rate plus a term proportional to the market’s consumption beta, \beta = \frac{\Cov[R_{t+1}, \, \Delta \log C_{t+1}]}{\Var[\Delta \log C_{t+1}]}.
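
Here’s a minimal Monte Carlo check of Equation (3). It assumes, as in the usual Lucas-tree equilibrium, that the representative investor ends up consuming the aggregate dividend each period, C_t = D_t. Plugging the guess P_t = k \cdot D_t into Equation (3) then gives a constant price-dividend ratio, k = \kappa / (1 - \kappa) with \kappa = e^{-\rho + (1 - \gamma) \cdot \mu + \frac{1}{2} \cdot (1-\gamma)^2 \cdot \sigma^2}. The parameter values below are arbitrary choices that satisfy the restrictions in Equation (2).

import numpy as np

rho, gamma, mu, sigma = 0.01, 3.0, 0.02, 0.02      # made-up values with mu > rho > 0, gamma > 1
rng = np.random.default_rng(0)
eps = rng.standard_normal(1_000_000)

# Constant price-dividend ratio implied by the guess P_t = k * D_t.
kappa = np.exp(-rho + (1 - gamma) * mu + 0.5 * (1 - gamma) ** 2 * sigma ** 2)
k = kappa / (1 - kappa)

D0 = 1.0
D1 = D0 * np.exp(mu + sigma * eps)                 # dividend process from Equation (2)
sdf = np.exp(-(rho + gamma * (mu + sigma * eps)))  # e^{-(rho + gamma * dlog C)} with C_t = D_t
lhs = k * D0                                       # P_t
rhs = np.mean(sdf * (D1 + k * D1))                 # E[ SDF * (D_{t+1} + P_{t+1}) ]
print(lhs, rhs)                                    # the two sides agree up to simulation error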

Why we care if investors are using this model

Physicists use the principle of least action to model the path that an object will take through a field. Under this approach, it is as if the object chooses the route that minimizes the total difference between its kinetic and potential energies through time. What will the arc of a baseball look like as it travels from the center fielder’s hand to the catcher’s mitt? We know that a baseball cannot actually CHOOSE anything. Yet the predictions made by acting as if it could choose its path describe the ball’s trajectory so well that physicists find it productive to pretend that it does.

You can imagine a world where the same reasoning applies to the CCAPM. It could be that the CCAPM makes predictions that fit the data so well that it’s useful to pretend that investors solve the optimization problem in Equation (2) regardless of whether or not they actually do. We do not live in such a world. The central empirical prediction of the CCAPM (Equation 1) is not true on average. And, even if it were, most of the variation in expected returns over time and across assets cannot be explained by differences in consumption betas. Consumption growth is nowhere near volatile enough.

In general, even when an asset-pricing model’s predictions are correct on average, the fit is poor. Financial markets are complex. Asset-pricing models are simple caricatures of the forces at work. The goal is to write down a model that captures some small kernel of truth about these forces. An asset-pricing model is only useful to the extent that we can count on its noisy predictions holding up in novel as-yet-unseen time periods, countries, asset classes, etc. What would asset prices look like if X changed? Researchers write down asset-pricing models so that we can analyze such counterfactual settings. This is the point of a model.

Researchers care about whether investors are actually solving the optimization problem associated with a given asset-pricing model because this is what gives us confidence that we can count on the model’s noisy predictions in those settings. If investors are usually solving the optimization problem in Equation (2), then we can trust the model’s predictions to be roughly correct in a new market setting where the input parameters have changed. Why? Investors are usually solving that problem. A researcher who knows which optimization problem investors are solving can make robust out-of-sample predictions.

With this background in place, I can now describe the sort of question that’s best suited to be answered by each kind of empirical approach: market data, investor surveys, and lab experiments.

Are the model’s empirical predictions correct?

An asset-pricing model is a claim about which optimization problem investors are solving. The resulting first-order conditions imply that certain moment conditions should hold in the data. If your goal is to test whether these predictions are true in a given market setting, then start by looking at market data. For instance, if you want to test whether the expected return on the market is equal to the riskfree rate plus a multiple of the market’s consumption beta as predicted by the CCAPM in Equation (1), then it makes sense to estimate \hat{\Exp}[R_{t+1}] and \hat{\beta} using historical data on market returns and aggregate consumption.

Of course, the \Exp[R_{t+1}] in Equation (1) represents INVESTORS’ beliefs about what future returns will look like on average. Likewise, the \beta = \frac{\Cov[R_{t+1}, \, \Delta \log C_{t+1}]}{\Var[\Delta \log C_{t+1}]} represents INVESTORS’ beliefs about what the covariance between market returns and consumption growth will be. So you should also be able to ask investors questions about these two statistical objects. And the results should also satisfy Equation (1).

This is the usual way that researchers use investor surveys to test asset-pricing models (see here, here, and here). They ask investors about the key parameters in some moment condition. Then, they plug the results into this condition and check whether it holds. This is a perfectly fine thing to do. But it’s not an application where surveys have a comparative advantage over other empirical approaches.

Are they correct for the right reasons?

An asset-pricing model’s predictions can be correct on average for reasons that have nothing to do with the ones in the model. We know that differences in consumption betas do not explain differences in expected returns as predicted by the CCAPM. But, even if they did, it would not imply investors were actually solving the optimization problem in Equation (2) when choosing investment portfolios. Any observed correlation between expected returns and consumption betas could be a spurious correlation.

As I point out in my recent JF paper with Sam Hartzmark and Abby Sussman, investor surveys give researchers a way to investigate why a model’s predictions do or don’t hold in the data. Surveys give investors an opportunity to show their work—to describe how they arrived at the outcomes we observe in the data. This is something that you can’t examine using market data alone. This is where investor surveys have a comparative advantage over other empirical approaches (see this post for more details).

At this point, you might be thinking: “But prices can move for reasons that no individual investor can understand.” I totally agree. But this is beside the point when testing an asset-pricing model. Asset prices can move for lots of reasons. An asset-pricing model makes a claim about what one of those reasons is. The fact that there are also other reasons is neither here nor there.

We expect the set of moment conditions implied by an asset-pricing model to hold in the data because investors are maximizing the objective function in the model. So there should be some evidence that investors are actually trying to do this. And surveys are the best way to gather such evidence. Every investor doesn’t need to be thinking exactly like the agents in a model. But it’s a problem if no investor thinks that way.

For what parameters does the model apply?

This brings us to the last empirical approach: lab experiments. The main concern with this sort of approach is that the experimental setup in a lab might be missing something important about how real-world markets operate, and it’s hard to know what that missing something is. The only evidence might be that your results from the lab don’t line up with what you observe in the real world.

This is a feature not a bug.

Suppose you think that some optimization problem like the one in Equation (2) captures the essence of some important market phenomenon. The resulting moment conditions are satisfied when you plug in parameter estimates using market data. And, when you ask traders about how they are trading, you get responses that are consistent with the logic of your model. But, when you set up a trading game where your model should govern participant behavior, you find no evidence of the phenomenon you’re interested in. You’ve just learned that you’re missing something. At the very least, you need to tweak the input parameters.

When you run a trading game, do you get results that are consistent with the model for sensible input parameters? Do you tend to get similar results when you run lab experiments with the same input parameters? When you adjust these input parameters, at what point do the predictions of the asset-pricing model start to diverge from the results of the experiments? These are the right sorts of questions to be asking via lab experiments. A lab experiment SHOULD NOT be the first test of an asset-pricing model. In epidemiology, “Check whether a healthy person gets sick if you inject him with the cultured microorganism.” is the third of Koch’s postulates not the first.

If you show that some outcome can be generated by a trading game, you have no idea whether the conditions responsible for the outcome occur in the wild. Lab experiments are useful when working in the other direction. Suppose you think you fully understand the conditions that lead to a certain outcome in real-world financial markets. Ok, then you should be able to reproduce this outcome by simulating those conditions in the lab. Once you do that, you can also use the experiment to see how far you can adjust the input parameters and still get valid predictions. In addition to verifying your understanding, lab experiments also offer a way to assess the effective range of an asset-pricing model.


Many explanations for the same fact

June 18, 2021 by Alex

Asset-pricing research consistently produces many different explanations for the same empirical facts. As a rule of thumb, you should expect asset-pricing researchers to wildly overachieve. Behavioral researchers can typically point to several psychological biases which might explain the same anomaly. e.g., it is possible to argue that the excess trading puzzle is due to a preference for gambling, the disposition effect, and social interactions among other things. In cross-sectional asset-pricing, there is an entire zoo of different explanations for the size and value effects, most of which “have little in common economically with each other.” In a 2017 review article, John Cochrane lists ten different explanations for the equity premium puzzle.

Unfortunately, the existence of so many different explanations for the same few facts is a real problem for the field. With so many to choose from, which explanation should a researcher use when evaluating counterfactuals or doing policy analysis? While each explanation is consistent with observed market data, different explanations will generally make different predictions out-of-sample in novel market environments. e.g., even if firm size and liquidity were perfectly correlated in all past data, researchers would still care which was the correct explanation for the size effect because they’d want to know what to expect if they ever encountered a bunch of large illiquid stocks.

So what is it about the asset-pricing research process that consistently produces multiple explanations for the same set of facts? That’s the topic of this post.

The Research Process

To answer this question, I need a working model of what it means to do asset-pricing research. What does the process entail? What counts as a new empirical fact? When asset-pricing researchers encounter one, how do they go about constructing a model to explain it? What primitives do they start with? What’s the general form of the model they eventually arrive at? How do different models differ from one another?

Asset-pricing models study the behavior of investors who can take actions today and want things tomorrow. I will use a to denote an action that investors can take today. e.g., think about allocating a portfolio, deciding how much to consume, or making a capital improvement. I will use w to denote a future outcome that investors want. e.g., think about consumption, wealth, or eternal happiness in the afterlife.

An asset-pricing model is a constrained optimization problem of the form

(1)   \begin{equation*} \begin{split} {\textstyle \max_a} &\phantom{=} \Exp[ \, \mathrm{Utility}(w) \, ] \\ \text{s.t.} &\phantom{=} 0 \geq \mathrm{Constraints}(a) \end{split} \end{equation*}

The \max_a(\cdot) operator captures the idea that investors strategically choose which actions to take today, a. They realize that their choice of actions today, a, affects how much of the thing they want they get to enjoy tomorrow, w = \mathrm{W}(a). The function \mathrm{Utility}(w) denotes the utility that investors get from having a given amount of w tomorrow. The expectation operator \Exp[w] = \int \, w \cdot \mathrm{pdf}(w|a) \cdot \mathrm{d}w represents their conditional beliefs about the likelihood of w tomorrow given their choice of actions today with \mathrm{pdf}(w|a) denoting their subjective probability distribution function. This is the distribution that investors have in their heads, and because it is a subjective distribution, it might not be objectively correct. Investors also realize that they face various constraints, which I write as the requirement 0 \geq \mathrm{Constraints}(a). Not all actions are possible today.

All asset-pricing models agree on this basic setup. Different asset-pricing models just make different claims about the functional forms of investor preferences, \mathrm{Utility}(w); beliefs, \mathrm{pdf}(w|a); and, constraints, \mathrm{Constraints}(a). Models based on habit formation, recursive utility, hyperbolic discounting, and loss aversion all monkey around with investor preferences. When a model says that investors have rational expectations, that they extrapolate, or that they seek sparsity, it is making a choice about the functional form of investors’ beliefs. The limits-to-arbitrage literature gives a taxonomy of investor constraints (short sale, margin, etc).

Researchers test whether an asset-pricing model’s specific choices for \mathrm{Utility}(w), \mathrm{pdf}(w|a), and \mathrm{Constraints}(a) fit the data by examining the model’s first-order conditions (FOCs). e.g., the stochastic discount factor (SDF) implied by an asset-pricing model comes from the FOC with respect to consumption. See Chapter 1 verse 1 of the Book of John. The function \mathrm{f}(x) = 1 - x^2 is an upside down parabola, which has a maximum value of \mathrm{f}(0)=1. You could double check that x=0 maximizes \mathrm{f}(x)=1-x^2 by verifying that \mathrm{f}'(0) = 0. Likewise, you can double check whether investors’ actions are optimal by verifying that

(2)   \begin{equation*} {\textstyle \frac{\mathrm{d}\phantom{a}}{\mathrm{d}a}} \big\{ \, \Exp[ \mathrm{Utility}(w) ] - \lambda \cdot \mathrm{Constraints}(a) \, \big\} = 0 \end{equation*}

If an asset-pricing model’s FOCs aren’t satisfied in the data, then the model’s choice of functional forms must be missing something about investor behavior. The new parameter \lambda in the equation above is a Lagrange multiplier, which captures how much a particular constraint distorts investors’ optimal choice of actions.

Here’s the asset-pricing research process in action. At time t, the literature contains a bunch of models and data on various empirical settings (different countries, time periods, asset classes, etc). Each asset-pricing model makes different assumptions about \mathrm{Utility}(w), \mathrm{pdf}(w|a), and \mathrm{Constraints}(a). But they all take the general form outlined in Equation (1). Asset-pricing researchers check whether the FOCs implied by each existing model hold in the various empirical settings observed at time t. If no model adequately explains market data in some important empirical setting, then we call it a new empirical fact. Researchers then work backwards from the non-zero values of the FOCs in Equation (2) to try and guess the correct functional forms for \mathrm{Utility}(w), \mathrm{pdf}(w|a), and \mathrm{Constraints}(a). If successful, we add a new asset-pricing model to the literature and the process repeats at time (t+1), perhaps with data on a few more empirical settings.

Ill-Posed Inverse Problem

The whole goal of the asset-pricing research process is to guess the correct functional forms for \mathrm{Utility}(w), \mathrm{pdf}(w|a), and \mathrm{Constraints}(a) to plug into Equation (1). Researchers do this by working backwards from violations of known models’ FOCs in Equation (2). If researchers guess correctly, they will find that the resulting model’s FOCs fit the observed market data in all empirical settings. But this excellent empirical fit is just a signature verifying that the model is correct. The point of guessing the true asset-pricing model is not to maximize empirical fit. With enough free parameters, any model can fit the data arbitrarily well.

Researchers care about guessing the true model (i.e., the optimization problem that investors are actually trying to solve) because the true model will allow them to predict how investors will behave in novel market environments that they haven’t encountered before. Put differently, knowing the correct functional forms for \mathrm{Utility}(w), \mathrm{pdf}(w|a), and \mathrm{Constraints}(a) allows researchers to evaluate counterfactual scenarios and do policy analysis. Knowing the correct asset-pricing model allows you to answer questions like, ‘What should the expected return of a large illiquid stock be?’, even if you have never encountered such a stock in the past.

This is an inverse problem. Asset-pricing research doesn’t involve finding the maximum of some known widely-agreed-upon optimization program. It involves staring at a bunch of data that is assumed to maximize some optimization problem and trying to figure out the details of that unknown problem. Put another way, asset-pricing researchers aren’t in the business of finding the maximum of a known function, \mathrm{f}(x) = 1 - x^2. They observe (x, \, y) = (0, \, 1) and assume this data is the maximum of some unknown function, \mathrm{f}(x) = y. Then they try to guess the details of this unknown function.

The study of inverse optimization problems represents an entire mathematical field of inquiry. But there’s one detail in particular that’s relevant here: inverse problems are typically ill-posed. Put simply, many different functions can share the same derivative values. Functions with quite different global behavior can have the same slope locally. e.g., \mathrm{f}(x) = 1 - x^2, \mathrm{g}(x) = e^{-x^2}, and \mathrm{h}(x) = \cos(\pi/2 \cdot x) all achieve a maximum value of 1 at x=0. These curves are all roughly the same when |x| < 1/2. Yet, they have very different behavior globally: \mathrm{f}(4) = -15, \mathrm{g}(4) \approx 0, and \mathrm{h}(4) = 1.

[Figure: the three functions plotted in a neighborhood of x=0 and over a wider range showing their global behavior.]
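
A few lines of code reproduce the comparison in the figure: the three candidate ‘solutions’ are nearly indistinguishable near x=0 but wildly different at x=4.

import numpy as np

def f(x): return 1 - x**2
def g(x): return np.exp(-x**2)
def h(x): return np.cos(np.pi / 2 * x)

x_local = np.array([-0.5, -0.25, 0.0, 0.25, 0.5])
print(np.round(f(x_local), 2))   # [0.75 0.94 1.   0.94 0.75]
print(np.round(g(x_local), 2))   # [0.78 0.94 1.   0.94 0.78]
print(np.round(h(x_local), 2))   # [0.71 0.92 1.   0.92 0.71]

print(f(4.0), g(4.0), h(4.0))    # roughly -15, 0, and 1: very different global behavior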

Asset-pricing researchers are solving an inverse problem. They are trying to reverse engineer an entire optimization problem by studying finite data about its first-order conditions. And inverse problems are typically ill-posed. This is why they consistently produce multiple explanations for the same facts. We should expect many different optimization problems to produce equivalent FOCs in the observed market data. Moreover, we should expect this to be true even though the optimization problems, which look equivalent locally, will generally display quite different global behavior in novel as-yet-unseen market environments. And these global differences are what matter when evaluating counterfactuals and doing policy analysis.

A Potential Solution

If asset-pricing researchers generate multiple explanations for the same fact because they’re solving an ill-posed inverse problem, then how might we make this problem well-posed? The mathematical literature on inverse optimization problems makes one suggestion: severely limit the class of functions you are willing to consider. If you see derivative data in the neighborhood of x=0 that was generated by either \mathrm{f}(x) = 1 - x^2, \mathrm{g}(x) = e^{-x^2}, or \mathrm{h}(x) = \cos(\pi/2 \cdot x), you must find some grounds for ruling out two of the three options. e.g., if you were only willing to consider polynomial solutions, then \mathrm{g}(x) = e^{-x^2} and \mathrm{h}(x) = \cos(\pi/2 \cdot x) would be off limits. Your only remaining choice would be \mathrm{f}(x) = 1 - x^2, making the inverse problem well-posed.

Unfortunately, this isn’t a particularly promising route for asset-pricing researchers. There aren’t good economic grounds for ruling out functional forms for \mathrm{Utility}(w), \mathrm{pdf}(w|a), and \mathrm{Constraints}(a) that yield similar FOCs in the observed data. In fact, some of the most cited papers in asset pricing involve dreaming up new and exotic functional forms for these objects. e.g., think about models involving recursive preferences, which argue that investors’ utility takes the functional form:

(3)   \begin{equation*} \mathrm{Utility}_t(w_t) = \Big\{ \, (1 - \delta) \cdot w_t^{1 - 1/\psi} + \delta \cdot \big(\Exp_t[\mathrm{Utility}_{t+1}(w_{t+1})^{1-\gamma}]\big)^{\frac{1-1/\psi}{1-\gamma}} \, \Big\}^{\frac{1}{1 - 1/\psi}} \end{equation*}

There’s simply no way this heifer would have been accepted into the asset-pricing canon if researchers had decided to severely restrict the kinds of utility functions they were willing to consider sometime back in the early 1980s. If you think the discovery of recursive utility was progress, then this sort of restriction is a problem.

Luckily, this isn’t the only way forward. Asset-pricing models do more than just make predictions about which FOCs should hold in the observed market data. They also say why these FOCs should be satisfied: because investors are optimizing with a specific tradeoff in mind. There is economic content in the \max_a(\cdot) operator as well as in the FOCs this operator produces. An asset-pricing model doesn’t say that a set of FOCs should just happen to hold in the data. It says that these FOCs should hold because investors are optimizing the tradeoff embodied by the model’s choice of \mathrm{Utility}(w), \mathrm{pdf}(w|a), and \mathrm{Constraints}(a). So in most cases it should be possible to ask investors whether they are thinking about this tradeoff.

This represents a practical step we could take towards converting the asset-pricing research process into a well-posed inverse problem. It is a straightforward way of ruling out lots of spurious solutions. e.g., cross-sectional differences in expected returns can’t be explained by investors demanding a risk premium for holding assets with exposure to X if real-world investors show no desire to hedge their exposure to X when given the chance. Likewise, if the excess trading puzzle is due to a widespread preference for gambling among retail investors, then it should be easy to find retail investors who express a preference for gambling. Yes, it is possible for markets to move in ways that no individual investor understands. But asset-pricing models don’t explain those kinds of market fluctuations. Asset-pricing models make predictions about how investors strategically respond to market fluctuations they do understand, so it should be possible to ask them about these fluctuations and the logic behind their responses.


Why do ‘as if’ critiques only apply to survey evidence?

November 10, 2020 by Alex

Milton Friedman laid out his methodological approach to doing economics in his 1953 essay, The Methodology of Positive Economics. This essay gives his answer to the question: What constitutes a good economic model? Or, put differently, how would you recognize a good economic model if you saw one?

According to Friedman, “the only relevant test of the validity of a hypothesis is the comparison of its predictions with experience. The hypothesis is rejected if its predictions are contradicted; it is accepted if its predictions are not contradicted.” All that matters is whether or not a model fits the data. Assumptions? Priors? Intuition? All that stuff is just moonshine. Empirical fit reigns supreme. This is an extreme view!

For example, in Friedman’s eyes, a good model of how leaves are distributed about the canopy of an oak tree is a model in which each leaf optimally chooses its position and orientation relative to its neighbors. Yes, we know that leaves don’t have brains. They can’t actually make decisions like this. But it is ‘as if’ they could. So a model in which each leaf strategically chooses where to grow is a good model of leaf placement.

A good model of how an expert billiards player makes difficult shots would be a model in which “he knew the complicated mathematical formulas that would give the optimum directions of travel, could estimate accurately by eye the angles, could make lightning calculations from the formulas, and could then make the balls travel in the direction indicated by the formulas.” So what if the player can’t do these things? We know he regularly makes difficult shots, so it’s ‘as if’ he can. Friedman tells us to just model him like that anyway.

In Friedman’s view, “a theory cannot be tested by comparing its ‘assumptions’ directly with ‘reality.’ Indeed, there is no meaningful way in which this can be done.” In fact, Friedman argues that insisting on reasonable assumptions can be misleading. “The more significant the theory, the more unrealistic the assumptions.”

Every economist knows about Friedman’s ‘as if’ approach to model evaluation. If asked, most economists will say that Friedman’s methodological approach is, if not correct, then at least reasonable. They will argue that it’s at least important to consider ‘as if’ justifications when evaluating a model.

But here’s the thing: no working economist actually evaluates models this way! Aside from one glaring exception, no economist actually thinks ‘as if’ models are helpful. Ask yourself: Is the factor zoo a problem for asset pricing? Yes. But what is a spurious factor? It’s a factor that fits the data for the wrong reasons. It is ‘as if’ investors were using it to price assets even though they aren’t. And that’s precisely the problem!

The idea that we can’t test (or shouldn’t even bother testing) the assumptions behind our economic models is simply preposterous. It’s a claim that Steven Pinker would call a “conventional absurdity: a statement that goes against all common sense but that everyone believes because they dimly recall having heard it somewhere and because it is so pregnant with implications.” No economist does research this way!

Why not replace all economic models with uninterpretable machine-learning (ML) algorithms? ML algorithms can fit the data well precisely because they contain no economic assumptions. But TANSTAAFL! It is precisely the economic assumptions about what agents are trying to do that give us confidence a model’s predictions will hold up when conditions change. In other words, these assumptions are what allow economists to use the model for counterfactual analysis—i.e., to make predictions in new and as-yet-unseen environments. The right assumptions embedded in a good economic model are responsible for its robust predictions. If you’re going to ignore all such economic restrictions, then there’s no point in writing down an economic model in the first place. There are better ways to do pure prediction.

I’m by no means the first person to highlight these issues. They long predate the factor zoo and the popularity of ML algorithms. If I had to pick one person to judge the quality of an economic model, that person would be Paul Samuelson. And Samuelson strongly disagreed with Friedman’s ‘as if’ approach. Samuelson clearly recognized the importance of evaluating your assumptions, disparagingly referring to Friedman’s ‘as if’ methodology as the “F-Twist” in a 1963 discussion paper.

Moreover, in almost every context, economists approach research in a manner more consistent with Samuelson than with Friedman. They firmly believe it’s important to verify one’s assumptions. This is why we see papers with titles like Do Measures of Financial Constraints Measure Financial Constraints? getting hundreds of cites a year. This influential paper is entirely concerned with testing our working assumptions.

As far as I can tell, there is only one context in which economists actually use ‘as if’ reasoning to constrain the research process—namely, when interpreting survey data. Standard asset-pricing models assume investors are solving an optimization problem that looks something like

(1)   \begin{equation*} \begin{array}{rl} \text{maximize} & \Exp\left[ \, \sum_{t=0}^\infty \, \beta^t \cdot U_t \, \right] \\ \text{subject to} & \quad\,\;\; U_t = \mathrm{u}(C_t) \\ & \Delta W_{t+1} = \mathrm{f}(C_t, \, X_t; W_t) \\ & \qquad\,\, 0 \leq \mathrm{g}_n(C_t, \, X_t; W_t) \qquad \text{constraints } n=1, \ldots,\,N \end{array} \end{equation*}

Economists regularly test assumptions about investor preferences U = \mathrm{u}(C), the law of motion for wealth, \Delta W = \mathrm{f}(C, \, X; W), and various other kinds of economic constraints, 0 \leq \mathrm{g}_n(C, \, X; W). However, for some reason, it’s entirely taboo to ask investors whether they are actually trying to solve this problem in the first place.

Friedman directly calls out survey data in his 1953 essay, writing that “questionnaire studies of businessmen’s or others’ motives or beliefs about the forces affecting their behavior… seem to me almost entirely useless as a means of testing the validity of economic hypotheses.” However, he offers no concrete reasons why economists should think about the “maximize” part of investors’ optimization problem any differently than the “subject to” part. Both are assumptions. In Friedman’s eyes, both are untestable.

Yes, survey data can be misleading. Above, I describe a situation where surveying economists about their views on ‘as if’ reasoning would yield specious evidence. And sometimes investors give uninformative answers that lead researchers down the wrong path. But all data can be misleading. It’s not like avoiding survey data has resolved the factor zoo. Not every regression result is informative, and some regression estimates are downright misleading; none of this implies that regression analysis is worthless. By the same token, none of the above means we can’t learn something concrete about how investors price assets from a well-constructed survey.

Friedman’s 1953 essay outlines a bad approach to model evaluation. There’s more to a good model than R^2 = 100\%. Paul Samuelson knew this to be true. And, except when they’re looking at survey data, every other economist knows it to be true as well. There’s no reason for us to continue applying ‘as if’ reasoning only in this particular context. It’s just not a valid argument for dismissing survey evidence about a model.

