Randomized Market Trials

1. Motivation

How much can traders learn from past price signals? It depends on what kind of assets sell. Suppose that returns are (in part) a function of $K = \Vert {\boldsymbol \alpha} \Vert_{\ell_0}$ different feature-specific shocks:

(1) $\begin{align*} r_n &= \sum_{q=1}^Q \alpha_q \cdot x_{n,q} + \epsilon_n \qquad \text{with} \qquad \epsilon_n \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sigma_{\epsilon}^2) \end{align*}$

If ${\boldsymbol \alpha}$ is identifiable, then different values of ${\boldsymbol \alpha}$ have to produce different values of $r_n$. This is only the case if assets are sufficiently different from one another. For example, consider the analogy to randomized controlled trials. In an RCT, randomizing which subjects get assigned to the treatment and control groups makes it exceptionally unlikely that, say, everyone in the treatment group will happen to share some other trait that actually explains their outcomes. Similarly, randomizing which assets get sold makes it exceptionally unlikely that two different choices, ${\boldsymbol \alpha}$ and ${\boldsymbol \alpha}'$, can both explain the observed returns.

This post sketches a quick model relating this problem to housing prices. To illustrate, imagine $N = 4$ houses have sold at a discount in a neighborhood that looks like this:

The shock might reflect a structural change in the vacation home market whereby there is less disposable income to buy high-end units (i.e., a permanent shift). Alternatively, the shock might have been due to a couple of out-of-town second-home buyers needing to sell quickly (i.e., a transient effect). The houses in the picture above are all vacation homes of a similar quality with owners living in LA. Since there is so little variation across units, these two explanations are observationally equivalent. Thus, the asset composition affects how informative prices are in an important way. The main empirical prediction is that in places with less variation in housing amenities, there should be more price momentum since it’s harder to distinguish between noise and amenity-specific value shocks.

2. Toy Model

Suppose you’ve seen $N$ sales in the area. Most of the prices looked just about right, but some of the houses sold for a bit more than you would have expected and some sold for a bit less than you would have expected. You’re trying to decide whether or not to buy the $(N+1)$ th house if the transaction costs are $\mathdollar c$ today:

(2) $\begin{align*} U &= \max_{\{\text{Buy},\text{Don't}\}} \left\{ \, \mathrm{E}\left[ r_{N+1} \right] - \frac{\gamma}{2} \cdot \mathrm{Var}\left[ r_{N+1} \right] - c, \, 0 \, \right\} \end{align*}$

You will buy the house if your risk-adjusted expectation of its future returns exceeds the transaction costs, $\mathrm{E}[r_{N+1}] - \sfrac{\gamma}{2} \cdot \mathrm{Var}[r_{N+1}] \geq c$.

This problem hinges on your ability to estimate ${\boldsymbol \alpha}$ . What’s the best you could ever hope to do? Well, suppose you knew which $K$ features mattered ahead of time and the elements of $\mathbf{X}$ were given by $x_{n,q} \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sfrac{1}{K})$ . In this setting, your average estimation error per relevant feature is given by:

(3) $\begin{align*} \Omega^\star = \mathrm{E}\left[ \, \frac{1}{K} \cdot \sum_{q=1}^Q \left( \widehat{\alpha}_q - \alpha_q \right)^2 \, \right] &= \frac{K \cdot \sigma_{\epsilon}^2}{N} \end{align*}$

i.e., it’s as if you ran an OLS regression of the $N$ price changes on the $K$ relevant columns of $\mathbf{X}$ . You will buy the house if:

(4) $\begin{align*} \mathbf{x}_{N+1}^{\top} \widehat{\boldsymbol \alpha} - \frac{\gamma}{2} \cdot \left( \frac{K + N}{N} \right) \cdot \sigma_{\epsilon}^2 &\geq c \end{align*}$
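The decision rule in (4) is easy to check numerically. Below is a minimal sketch, assuming the oracle setting described above, where the predictive variance splits into estimation noise, $(\sfrac{K}{N}) \cdot \sigma_{\epsilon}^2$, plus fresh noise, $\sigma_{\epsilon}^2$. The function names `predictive_variance` and `should_buy` and all of the numeric inputs are hypothetical choices made for illustration.

```python
def predictive_variance(K, N, sigma_eps):
    """Var[r_{N+1}] in eq. (4): (K/N) * sigma^2 from estimating alpha
    plus sigma^2 from the new house's own noise term."""
    return (K + N) / N * sigma_eps ** 2


def should_buy(x_next, alpha_hat, gamma, K, N, sigma_eps, c):
    """Return True when the risk-adjusted expected return covers the cost c."""
    expected_return = sum(x * a for x, a in zip(x_next, alpha_hat))
    risk_penalty = 0.5 * gamma * predictive_variance(K, N, sigma_eps)
    return expected_return - risk_penalty >= c


# Illustrative (made-up) inputs: K = 2 relevant features, N = 50 past sales.
x_next = [0.5, -0.2]      # features of the (N+1)th house
alpha_hat = [0.10, 0.05]  # estimated feature-specific shocks
print(should_buy(x_next, alpha_hat, gamma=4.0, K=2, N=50, sigma_eps=0.05, c=0.03))
print(should_buy(x_next, alpha_hat, gamma=4.0, K=2, N=50, sigma_eps=0.05, c=0.04))
```

With these inputs the risk-adjusted return is about $0.035$, so the first call clears the cost threshold and the second does not.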

In the real world, however, you generally don’t know which $K$ features are important ahead of time and each house’s amenities are not iid draws. Instead, you must solve an $\ell_1$-type inference problem:

(5) $\begin{align*} \widehat{\boldsymbol \alpha} &= \arg \min_{\boldsymbol \alpha} \sum_{n=1}^N \left( r_n - \mathbf{x}_n^{\top} {\boldsymbol \alpha} \right)^2 \qquad \text{s.t.} \qquad \left\Vert {\boldsymbol \alpha} \right\Vert_{\ell_1} \leq \lambda \cdot \sigma_{\epsilon} \end{align*}$

with a correlated measurement matrix, $\mathbf{X}$, using something like LASSO. In this setting, you face feature selection risk, i.e., you might focus on the wrong causal explanation for the past price movements. If $\Omega^{\perp}$ denotes your estimation error when each element $x_{n,q}$ is drawn independently and $\Omega$ denotes your estimation error in the general case when $\rho(x_{n,q},x_{n',q}) \neq 0$, then:

(6) $\begin{align*} \Omega^{\star} \leq \Omega^{\perp} \leq \Omega \end{align*}$

Since your estimate $\widehat{\boldsymbol \alpha}$ is unbiased, feature selection risk will simply increase $\mathrm{Var}[r_{N+1}]$, making it less likely that you will buy the house in this stylized model:

(7) $\begin{align*} \mathbf{x}_{N+1}^{\top} \widehat{\boldsymbol \alpha} - \frac{\gamma}{2} \cdot \left( K \cdot \Omega + \sigma_{\epsilon}^2 \right) &\geq c \end{align*}$

More generally, it will make prices slower to respond to shocks and allow for momentum.
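To make the $\ell_1$ problem in (5) concrete, here is a minimal sketch of a LASSO solver using cyclic coordinate descent. Note the swap: it solves the penalized (Lagrangian) form, $\sfrac{1}{2} \cdot \sum_n (r_n - \mathbf{x}_n^{\top} {\boldsymbol \alpha})^2 + \text{penalty} \cdot \Vert {\boldsymbol \alpha} \Vert_{\ell_1}$, rather than the constrained form written in (5). The two trace out the same solution path, but the mapping between `penalty` and $\lambda \cdot \sigma_{\epsilon}$ is not computed here. The function names and the tiny data set are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, r, penalty, n_sweeps=200):
    """Minimise 0.5 * sum_n (r_n - x_n' alpha)^2 + penalty * ||alpha||_1
    by updating one coefficient at a time (cyclic coordinate descent)."""
    N, Q = len(X), len(X[0])
    alpha = [0.0] * Q
    residual = list(r)  # equals r - X alpha, and alpha starts at zero
    col_sq = [sum(X[n][q] ** 2 for n in range(N)) for q in range(Q)]
    for _ in range(n_sweeps):
        for q in range(Q):
            if col_sq[q] == 0.0:
                continue  # an all-zero feature carries no information
            # Inner product of column q with the residual excluding feature q.
            rho = sum(X[n][q] * (residual[n] + X[n][q] * alpha[q]) for n in range(N))
            new_alpha = soft_threshold(rho, penalty) / col_sq[q]
            delta = new_alpha - alpha[q]
            if delta != 0.0:
                for n in range(N):
                    residual[n] -= X[n][q] * delta
                alpha[q] = new_alpha
    return alpha


# Orthonormal toy design: each "house" loads on exactly one feature, so the
# LASSO solution is just the soft-thresholded returns.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
r = [3.0, 0.5, -2.0]
print(lasso_coordinate_descent(X, r, penalty=1.0))  # [2.0, 0.0, -1.0]
```

In the orthonormal case the small return of $0.5$ is zeroed out entirely, which is the feature selection step. With correlated columns the same shrinkage can zero out the wrong feature, which is the selection risk discussed above.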

3. Matrix Coherence

Feature selection risk is worst when assets all have really correlated features. Let $\mathbf{X}$ denote the $(N \times Q)$ -dimensional measurement matrix containing all the features of the $N$ houses that have already sold in the market:

(8) $\begin{align*} \mathbf{X} &= \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,Q} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,Q} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,Q} \\ \end{bmatrix} \end{align*}$

Each row represents all of the features of the $n$ th house, and each column represents the level to which the $N$ assets display a single feature. Let $\widetilde{\mathbf{x}}_q$ denote a unit-normed column from this measurement matrix:

(9) $\begin{align*} \widetilde{\mathbf{x}}_q &= \frac{\mathbf{x}_q}{\sqrt{\sum_{n=1}^N x_{n,q}^2}} \end{align*}$

I use a measure of the coherence of $\mathbf{X}$ to quantify the extent to which all of the assets in a market have similar features.

(10) $\begin{align*} \mu(\mathbf{X}) &= \max_{q \neq q'} \left\vert \left\langle \widetilde{\mathbf{x}}_q, \widetilde{\mathbf{x}}_{q'} \right\rangle \right\vert \end{align*}$

For example, the coherence of a matrix with $x_{n,q} \overset{\scriptscriptstyle \mathrm{iid}}{\sim} \mathrm{N}(0,\sfrac{1}{N})$ is roughly $\sqrt{2 \cdot \log(Q)/N}$, corresponding to the red line in the figure below. As the correlation between elements in the same column increases, the coherence increases since different terms in the above inner product are less likely to cancel out.
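The coherence in (10) can be computed directly from its definition. The sketch below uses only the standard library and assumes every column of $\mathbf{X}$ has at least one non-zero entry (otherwise the normalization in (9) is undefined). The helper names `coherence` and `gaussian_matrix`, the seed, and the sizes $N = 200$ and $Q = 50$ are illustrative assumptions.

```python
import math
import random


def coherence(X):
    """Largest absolute inner product between distinct unit-normed columns, eq. (10)."""
    n_rows, n_cols = len(X), len(X[0])
    unit_cols = []
    for q in range(n_cols):
        col = [X[n][q] for n in range(n_rows)]
        norm = math.sqrt(sum(v * v for v in col))  # assumes norm > 0
        unit_cols.append([v / norm for v in col])
    return max(
        abs(sum(a * b for a, b in zip(unit_cols[q], unit_cols[p])))
        for q in range(n_cols)
        for p in range(q + 1, n_cols)
    )


def gaussian_matrix(N, Q, rng):
    """N x Q matrix with iid N(0, 1/N) entries."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(N)) for _ in range(Q)] for _ in range(N)]


rng = random.Random(0)
N, Q = 200, 50
mu = coherence(gaussian_matrix(N, Q, rng))
benchmark = math.sqrt(2 * math.log(Q) / N)
print(f"simulated coherence: {mu:.3f}, sqrt(2 log(Q) / N): {benchmark:.3f}")
```

Two identical columns give a coherence of $1$ and two orthogonal columns give $0$, so the measure runs from "every house differs along independent dimensions" to "some pair of amenities always shows up together".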

4. Selection Risk

There is a tight link between the severity of the selection risk and how correlated asset features are. Specifically, Ben-Haim, Eldar, and Elad (2010) show that if

(11) $\begin{align*} \alpha_{\min} \cdot \left( 1 - \{2 \cdot K - 1\} \cdot \mu(\mathbf{X}) \right) &\geq 2 \cdot \sigma_{\epsilon} \cdot \sqrt{2 \cdot (1 + \xi) \cdot \log(Q)} \end{align*}$

for some $\xi > 0$ , then:

(12) $\begin{align*} \sum_{q=1}^Q \left( \widehat{\alpha}_q - \alpha_q \right)^2 &\leq \frac{2 \cdot (1 + \xi)}{(1 - (K-1)\cdot \mu(\mathbf{X}))^2} \times K \cdot \sigma_{\epsilon}^2 \cdot \log(Q) = \Omega \end{align*}$

with probability at least:

(13) $\begin{align*} 1 - Q^{-\xi} \cdot \left( \, \pi \cdot (1 + \xi) \cdot \log(Q) \, \right)^{-\sfrac{1}{2}} \end{align*}$

where $\alpha_{\min} = \min_{q \in \mathcal{K}} |\alpha_q|$ and $\mathcal{K}$ denotes the set of $K$ features with $\alpha_q \neq 0$. Let’s plug in some numbers. If $\alpha_{\min} = 0.10$ and $\sigma_{\epsilon} = 0.05$, then the result means that $\Vert \widehat{\boldsymbol \alpha} - {\boldsymbol \alpha} \Vert_{\ell_2}^2$ is less than $0.185 \times K \cdot \log(Q)$ with probability $\sfrac{3}{4}$.
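For readers who want to experiment with other parameter values, the three pieces of the result, (11) through (13), can be evaluated together. This is a minimal sketch that simply transcribes the formulas. The function name `recovery_guarantee` and the inputs in the usage lines are hypothetical and were chosen so that condition (11) holds.

```python
import math


def recovery_guarantee(alpha_min, sigma_eps, K, Q, mu, xi):
    """Evaluate eqs. (11)-(13).

    Returns (condition_holds, error_bound, probability), where error_bound
    is the right-hand side of (12) and probability is the expression in (13).
    The bound and probability are only meaningful when condition_holds is True.
    """
    log_q = math.log(Q)
    lhs = alpha_min * (1 - (2 * K - 1) * mu)
    rhs = 2 * sigma_eps * math.sqrt(2 * (1 + xi) * log_q)
    condition_holds = lhs >= rhs
    error_bound = 2 * (1 + xi) / (1 - (K - 1) * mu) ** 2 * K * sigma_eps ** 2 * log_q
    probability = 1 - Q ** (-xi) / math.sqrt(math.pi * (1 + xi) * log_q)
    return condition_holds, error_bound, probability


# Illustrative (made-up) inputs: 2 relevant features out of 10, low coherence.
holds, bound, prob = recovery_guarantee(
    alpha_min=0.50, sigma_eps=0.05, K=2, Q=10, mu=0.05, xi=0.5
)
print(holds, round(bound, 4), round(prob, 3))
```

Raising `mu` toward $\sfrac{1}{(2 \cdot K - 1)}$ shrinks the left-hand side of (11) to zero, so the condition eventually fails no matter how large the smallest shock is.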

There are a couple of things worth pointing out here. First, the recovery bounds only hold when $\mathbf{X}$ is sufficiently incoherent:

(14) $\begin{align*} \mu(\mathbf{X}) < \frac{1}{2 \cdot K - 1} \end{align*}$

i.e., when the assets are too similar, we can’t learn anything concrete about which amenity-specific shocks are driving the returns. Second, the free parameter $\xi > 0$ links the probability of seeing an error rate outside the bounds, $p$ , to the number of amenities that houses have:

(15) $\begin{align*} \xi &\approx \frac{\log(\sfrac{1}{p}) - \frac{1}{2} \cdot \log\left[ \pi \cdot \log Q \right]}{\sfrac{1}{2} + \log(Q)} \end{align*}$

If you want to lower this probability, you need to either use a larger constant $\xi$ or decrease the number of amenities. For $\xi$ large enough we can effectively regard the error bounds as the variance. Importantly, this quantity is increasing in the coherence of the measurement matrix, i.e., when assets are more similar, I am less sure that I am drawing the correct conclusion from past returns.
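The approximation in (15) comes from setting the failure probability in (13) equal to $p$, taking logs, and using $\log(1 + \xi) \approx \xi$. A quick way to see how good it is: compute $\xi$ from (15) and plug it back into (13). The sketch below does this. The function names and the choices $p = 0.05$ and $Q = 100$ are illustrative assumptions, and the formula is only useful when it returns $\xi > 0$.

```python
import math


def xi_from_failure_prob(p, Q):
    """Approximate xi implied by a target failure probability p, eq. (15)."""
    log_q = math.log(Q)
    return (math.log(1 / p) - 0.5 * math.log(math.pi * log_q)) / (0.5 + log_q)


def failure_prob(xi, Q):
    """Exact failure probability from eq. (13): Q^(-xi) / sqrt(pi (1 + xi) log Q)."""
    return Q ** (-xi) / math.sqrt(math.pi * (1 + xi) * math.log(Q))


p_target, Q = 0.05, 100
xi = xi_from_failure_prob(p_target, Q)
print(f"xi = {xi:.3f}, implied failure probability = {failure_prob(xi, Q):.4f}")
```

The implied probability lands close to, but not exactly on, the target because $\log(1 + \xi) \approx \xi$ is only a first-order approximation.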

5. Empirical Predictions

The main empirical prediction is that in places with less variation in housing amenities, there should be more price momentum since it’s harder to distinguish between noise and amenity-specific value shocks. For example, imagine studying the price paths of two neighborhoods, $A$ and $B$, which have houses of the exact same value, $\mathdollar v$. In neighborhood $A$, each of the houses has a very different collection of amenities whose values sum to $\mathdollar v$, whereas in neighborhood $B$, each of the houses has the exact same amenities whose values sum to $\mathdollar v$. You can think about neighborhood $A$ as pre-war and neighborhood $B$ as tract housing. The theory says that the price of houses in neighborhood $B$ should respond more slowly to amenity-specific value shocks because houses have more correlated amenities (i.e., $\Omega$ is larger). As a result, home prices in neighborhood $B$ should also display more momentum… though this is not in the toy model above.