Research Notebook

Correct Prices Are Not Free

July 16, 2011 by Alex

1. Introduction

It takes hard work to maintain prices at their fundamental values. Accurate, responsive and informative prices do not occur by magic. Analysts have to diligently monitor firms' prospects and security prices. Market makers have to sustain a trustworthy and relatively liquid trading environment. Firms have to issue and honor publicly traded shares.

Markets with publicly posted prices1 are a form of institutional capital that needs to be operated and maintained just like any other form of capital. Given these properties, what is the optimal amount of this institutional capital to use in our production function?

2. Nickel and Dimed

A common misperception is that, as technology improves and the number of market participants increases, the costs of maintaining markets drop to a negligible level. For example, Eugene Fama and Ken French write on their blog2 that:

“If some informed active investors turn passive, prices tend to become less efficient. But the effect can be small if there is sufficient competition among remaining informed active investors. The answer also depends on the costs of uncovering and evaluating relevant knowable information. If the costs are low, then not much active investing is needed to get efficient prices.” — Eugene Fama and Ken French

However, even if it only takes each analyst a couple of seconds to look at his portfolio every day, due to the size of modern financial markets this effort will still add up to a meaningful total. Consider the example below that illustrates this point:

Example (Google's Pac-Man Homage): Analysts at the software firm RescueTime studied the browsing habits and Google usage of roughly 11K users in May 2010. The firm makes time tracking software that keeps an eye on what workers do and where they go online.

On a typical day, people in the sample conducted roughly 22 searches on the Google page, with each search lasting about 11 seconds. Putting Pac-Man on the Google homepage increased the average time spent on the page by about 36 seconds.3

Extrapolating across the 504 million unique users who visit the main Google page each day, this represents an increase of 4.8 million hours – equal to about 549 years! In dollar terms, assuming people are paid roughly $25/hr, this equates to about $120M in lost productivity.
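For concreteness, here is a minimal Python version of this back-of-envelope calculation. The wage is the $25/hr assumption from above; note that straight multiplication lands slightly above the quoted totals, which were presumably based on somewhat more conservative inputs:

```python
# Back-of-envelope cost of the Pac-Man doodle, using the figures above.
extra_seconds_per_user = 36       # extra time spent on the Google homepage
unique_users = 504e6              # daily unique visitors to the main page
wage = 25.0                       # assumed average wage in $/hr

extra_hours = unique_users * extra_seconds_per_user / 3600
extra_years = extra_hours / (24 * 365)
lost_output = extra_hours * wage

print(f"{extra_hours / 1e6:.1f}M hours")    # ~5.0M hours (post quotes 4.8M)
print(f"{extra_years:,.0f} years")          # ~575 years (post quotes 549)
print(f"${lost_output / 1e6:,.0f}M lost")   # ~$126M (post quotes $120M)
```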

Screen shot of the playable version of Pac-Man posted on Google's front page on 5/21/10 to celebrate 30 years since the launch of Pac-Man in Japan.

3. The Brain Drain

There are a few papers like Abel, Eberly and Panageas (2007) which address this problem of optimal inattention, but I do not know of any papers which explicitly examine the welfare effects of too much or too little attention to asset markets.

To my knowledge, the closest paper to this line of analysis is Philippon (2007), which posits a simple general equilibrium model to digest the welfare effects of the recent growth in the output of the US financial sector. The plot below shows that by 2006, the financial sector was generating roughly 8% of US GDP. However, this study focuses on expenditures on the corporate finance side of the financial industry.

Financial industry fraction of US GDP. Source: Philippon (2007). Estimation based on U.S. Annual Industry Accounts, Kuznets (1941), Martin (1939), U.S. Census, and Historical Statistics of the U.S. (2006).

I am interested in a slightly different question; namely, how much time should we spend worrying about markets? To see the importance of this trade-off, consider the example below. Lots of very smart physicists have left academic and industrial engineering positions to work on Wall Street over the course of the last 20 years4:

Example (Physics Brain Drain): Suppose that all these skilled analysts were making markets more efficient by pinning asset prices closer to their “correct” values. How much better off are we due to this marginal improvement in asset pricing accuracy? How much more productive would our world be if these people had foregone their finance careers and focused on the materials physics behind computer hardware or the complex programming problems that underpin parallel computing?

4. Why Do Public Markets Persist?

In spite of their operating and maintenance costs, markets with publicly posted prices remain a common method of allocating physical goods, control rights and risk. For instance, the total value of U.S. equity markets exceeds the total U.S. GDP by a factor of at least 1.5 as shown below5:

Total market capitalization of the NYSE-NASDAQ as a % of U.S. GDP.

There must be some positive externality to having fairly accurate and publicly displayed asset prices; otherwise, these types of exchanges wouldn't be so popular.6 I think this insight, that accurate public prices confer a positive externality, is the key to understanding how making asset prices more accurate will improve welfare.

  1. Alternative methods include procedures such as over-the-counter exchanges or dark pools. ↩
  2. This post relates to their 2007 paper: Disagreement, Tastes and Asset Prices. ↩
  3. These figures will form the basis of a back of the envelope calculation which may be biased either up or down. On one hand, the figure could over estimate the number of Google searches done per day as the software is voluntarily used by technically savvy employees. On the other hand, the estimates could be biased downward as only a minority of users realised that the logo was playable. To play, people had to click on the “insert coin” button which replaced the more familiar “I’m Feeling Lucky” button. ↩
  4. See Steve Hsu’s (2010) review of The Quants for Physics World. ↩
  5. Source: The Big Picture. ↩
  6. …even  in prison! ↩

Filed Under: Uncategorized

Dyson Brownian Motion

July 15, 2011 by Alex

1. Introduction

I outline the construction of Dyson Brownian motion which governs the evolution of the eigen-values of an (N \times N)-dimensional stochastic process of Hermitian matrices. For instance, if A(t) is such a process, then:

(1)   \begin{align*} A(t+dt) \ &= \ A(t) \ + \ (dt)^{1/2} \cdot G, \end{align*}

where G is an (N \times N)-dimensional random Hermitian matrix drawn from the Gaussian Unitary Ensemble (\mathtt{GUE}_N).

Why study the eigen-values of a stream of Hermitian matrices? At first glance, this seems like a rather obscure mathematical object. Before I can answer this question, in Section 2 I first define what a Hermitian matrix is and discuss how I would select this random matrix G. Then, in Section 3 I can give some practical examples in which Dyson Brownian motion would be a useful construction. I also give an alternative interpretation of Dyson Brownian motion related to non-intersecting Brownian processes and explain the use of complex matrices in an economic context. Finally, in Section 4, I construct Dyson Brownian motion.

The main source for the material in this post is Terry Tao's set of lecture notes on Random Matrix Theory, though I also used Mehta (2004) and Anderson, Guionnet and Zeitouni (2009) as references.

2. Mathematical Foundation

First things first: “What is a Hermitian matrix?”

Definition (Hermitian Matrix): A square N \times N matrix A is called Hermitian if it is self-adjoint:

(2)   \begin{align*} A \ &= \ A^* \end{align*}

Each element (i,j) of a Hermitian matrix A is the complex conjugate of element (j,i) in A. Thus, the diagonal elements have to be real. Consider an example Hermitian matrix A':

(3)   \begin{align*} A' \ &= \ \begin{bmatrix} 1 & - i \\ i & 2 \end{bmatrix} \end{align*}

Hermitian matrices are just the complex extension of real symmetric matrices. For instance, the matrix A'' below is a real instance of a Hermitian matrix:

(4)   \begin{align*} A'' \ &= \ \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \end{align*}

Next, when I defined the stochastic process A(t) above, I characterized each lurch forward by the addition of a random matrix G scaled by the square root of the time interval. I pull this random matrix from the Gaussian Unitary Ensemble.

Definition (Gaussian Unitary Ensemble): The Gaussian Unitary Ensemble \mathtt{GUE}(N) is a probability space over the vector space of N \times N dimensional Hermitian matrices governed by the measure \mu defined as:

(5)   \begin{align*} \mu(A) \ &= \ \frac{1}{2^{\frac{N}{2}}\cdot \pi^{\frac{N^2}{2}}} \cdot e^{-\frac{N}{2} \cdot \mathtt{Tr}(A^2)} \end{align*}

So, each upper triangular element a_{i,j} will be drawn from \mathcal{N}(0,1)_{\mathbb{C}} while each element a_{i,i} along the diagonal will be drawn from \mathcal{N}(0,1)_{\mathbb{R}}. To get a feel for what this definition really means, consider a concrete example. Suppose that we want to know the probability \mu(A') for the example A' above. Using these parameters, I find that:

(6)   \begin{align*} \begin{split} (A')^2 \ &= \ \begin{bmatrix} 2 & - 3 \cdot i \\ 3 \cdot i & 5 \end{bmatrix} \\ \mathtt{Tr}[(A')^2] \ &= \ 2 \ + \ 5 \ = \ 7 \\ \mu(A') \ &= \ \frac{1}{2^{\frac{2}{2}} \cdot \pi^{\frac{2^2}{2}}} \cdot e^{-\frac{2}{2} \cdot 7} \\ &= \ \frac{1}{2 \cdot \pi^2} \cdot e^{-7} \\ &= \ 0.0000462 \end{split} \end{align*}
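As a quick numerical check of this computation, the snippet below evaluates the density in (5) at A' with numpy:

```python
import numpy as np

# Evaluate the GUE(N) density from (5) at the example matrix A'.
N = 2
A = np.array([[1, -1j],
              [1j, 2]])

A_sq = A @ A                               # [[2, -3i], [3i, 5]]
trace = np.trace(A_sq).real                # 2 + 5 = 7
mu = np.exp(-(N / 2) * trace) / (2 ** (N / 2) * np.pi ** (N ** 2 / 2))

print(trace)   # 7.0
print(mu)      # ~0.0000462
```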

The analysis below goes through if you instead add random matrices G drawn from the ensemble of real symmetric matrices. This ensemble is known as the Gaussian Orthogonal Ensemble.

Note that the elements of the matrices A(t) do not each follow their own independent Brownian motion processes; in a loose sense, that would be “too little” structure. The main thrust of the Dyson Brownian motion construction is that the eigen-values of this process follow a Brownian motion plus a twist term. The eigen-values are an attractive summary statistic for 2 reasons. First, because A(t) is a Hermitian matrix at each point in time, its spectrum is real and almost surely simple. What's more, the Ky Fan inequality tells us that the eigen-values should have a smooth transition function over time.

3. Motivating Examples

With these terms in hand, I can now ask: “Why worry about matrix valued stochastic processes?”

First, consider some financial applications. Financial theory is built around \beta-pricing models which measure the correlation between the returns of assets and various risk factors. Models which assume that these \beta measures remain constant for long periods of time do poorly in empirical tests.1 It would be nice to characterize the evolution of these variance-covariance matrices.2 Alternatively, suppose that you were in the business of identifying the principal components of stock returns–either explicitly or by hunting for additional factors.3 Here, you might want to work out how likely it is that the largest principal component has moved by d\lambda in order to test your model.

In an entirely different context, Dyson Brownian motion can also be thought of as characterizing the evolution of N Brownian motions \begin{bmatrix} \lambda_1(t) & \ldots & \lambda_N(t) \end{bmatrix} that have been restricted to never intersect. The problems of modelling the eigen-values of Hermitian matrices and of modelling non-intersecting Brownian processes are not linked a priori. However, constructing these non-intersecting processes is hard, and an elegant solution emerges via solving the eigen-value process problem for Hermitian matrices, which have a simple spectrum. For an example of the economic usefulness of such a trick, consider modelling the real option of a worker to switch jobs.4 At each point in time, he has a next best option, but the exact nature of that next best option will change over time. Rather than keeping track of all possibilities, you could just model the evolution of the best option via Dyson Brownian motion.

Finally, I want to make a quick note about the use of complex valued rather than real matrices. Physicists note that real symmetric matrices preserve time reversal symmetry while complex Hermitian matrices do not. For instance, in the financial application above where each of the entries in the variance-covariance matrix process has to be real, you can always undo the last step of the stochastic process by hitting A(t+dt) with a well chosen inverse. However, when using complex valued matrices, this inversion is no longer possible as the powers of i are periodic; i.e.,

(7)   \begin{align*} \begin{split} i \ &= \ \sqrt{-1} \\ i^2 \ &= \ -1 \\ i^3 \ &= \ - i \\ i^4 \ &= \ 1 \end{split} \end{align*}

To see the implications of this fact in a macroeconomic setting, consider a complex valued extension of a Leontief production model as follows. Suppose that prices are local5 and form the real part of each a_{i,j} entry while the size of the transaction forms the imaginary part. So, for instance, a transaction of 3 tons of steel between a builder i and a steel maker j at a price of 5 dollars per ton would manifest itself as a_{i,j} = 5 + 3 \cdot i and \bar{a}_{i,j} = a_{j,i} = 5 - 3 \cdot i. Thus, by introducing an additional dimension to the Leontief matrix, the physical properties of the process it represents change dramatically.

4. Construction

Finally, I actually get around to defining Dyson Brownian motion. Below, I state the result:

Theorem (Dyson Brownian Motion): Let t > 0, dt > 0 and \begin{bmatrix} \lambda_1(t) & \ldots & \lambda_N(t) \end{bmatrix} be the spectrum of eigen-values of the N \times N Hermitian matrix valued process \{A(t)\}_{t \geq 0}. Then, we have:

(8)   \begin{align*} d \lambda_i(t) \ &= \ d B_i(t) \ + \ \sum_{1 \leq j \leq N: j \neq i} \ \frac{d t}{\lambda_i(t) - \lambda_j(t)} \end{align*}

for all 1 \leq i \leq N, where d \lambda_i(t) = \lambda_i(t + dt) - \lambda_i(t) and \begin{bmatrix} dB_1 & \ldots & dB_N \end{bmatrix} are independent Brownian motion processes.

In words, this theorem says that the eigen-values of a stochastic process of Hermitian matrices behave like independent Brownian motions plus a repulsion force which is inversely proportional to the distance between any 2 eigen-values. What's more, this repulsive force is non-local. Each eigen-value \lambda_i(t) is pushed and pulled not just by \lambda_{i-1}(t) and \lambda_{i+1}(t) but also by \lambda_1(t) and \lambda_N(t) as well as each and every eigen-value in between.

To formulate the construction, I use a lemma due to Hadamard, given below:

Lemma (Hadamard Operator): The eigen-values of A have the following first and second derivatives with respect to time t:

(9)   \begin{align*} \dot{\lambda}_i \ &= \ u_i^* \ \dot{A} \ u_i \\ \ddot{\lambda}_i \ &= \ u^*_i \ \ddot{A} \ u_i \ + \ 2 \cdot \sum_{j \neq i} \ \frac{\left\vert u_j^* \ \dot{A} \ u_i \right\vert^2}{\lambda_i - \lambda_j} \end{align*}

Proof (Hadamard Operator):

Start from the eigen-value equation and the normalization of the eigen-vectors:

(10)   \begin{align*} \begin{split} A \ u_i \ &= \lambda_i \ u_i \\ u_i^* \ u_i \ &= \ 1 \end{split} \end{align*}

Differentiating both conditions with respect to t gives:

(11)   \begin{align*} \begin{split} \dot{A} \ u_i \ + \ A \ \dot{u}_i \ &= \dot{\lambda}_i \cdot u_i \ + \ \lambda_i \cdot \dot{u}_i \\ \dot{u}_i^* \ u_i \ + \ u_i^* \ \dot{u}_i \ &= \ 0 \end{split} \end{align*}

Pre-multiplying the first equation in (11) by u_i^* and using u_i^* A = \lambda_i u_i^* to cancel the \dot{u}_i terms yields the first result:

(12)   \begin{align*} \dot{\lambda}_i \ &= \ u_i^* \ \dot{A} \ u_i \end{align*}

Differentiating (12) once more gives the second derivative, while projecting the differentiated eigen-value equation onto u_j for j \neq i pins down the u_j^* \, \dot{u}_i cross terms:

(13)   \begin{align*} \begin{split} \ddot{\lambda}_i \ &= \ \dot{u}^*_i \ \dot{A} \ u_i \ + \ u^*_i \ \ddot{A} \ u_i \ + \ u^*_i \ \dot{A} \ \dot{u}_i \\ 0 \ &= \ u_j^* \ \dot{A} \ u_i \ + \ (\lambda_j - \lambda_i) \cdot u_j^* \ \dot{u}_i \end{split} \end{align*}

Finally, I use this lemma together with the properties of Hermitian matrices to finish the construction of Dyson Brownian motion.

Proof (Dyson Brownian Motion):

Recall the law of motion of the matrix process from (1):

(14)   \begin{align*} A(t + dt) \ &= \ A(t) \ + \ (dt)^{1/2} \cdot G \end{align*}

Taylor expand each eigen-value in the direction of the shock G:

(15)   \begin{align*} \lambda_i(t + dt) \ &= \ \lambda_i(t) \ + \ (dt)^{1/2} \cdot \nabla_G \lambda_i(t) \ + \ \frac{dt}{2} \cdot \nabla_G^2 \lambda_i(t) \ + \ \ldots \end{align*}

By the Hadamard lemma, the first and second directional derivatives are:

(16)   \begin{align*} \begin{split} \nabla_G \lambda_i(t) \ &= \ u_i^* \ G \ u_i \\ \nabla_G^2 \lambda_i(t) \ &= \ 2 \cdot \sum_{i \neq j} \ \frac{\left\vert u_j^* \ G \ u_i \right\vert^2}{\lambda_i(t) - \lambda_j(t)} \end{split} \end{align*}

Substituting these derivatives into the expansion and dropping the higher order terms leaves:

(17)   \begin{align*} d \lambda_i(t) \ &= \ (dt)^{1/2} \cdot \left\{ \ u_i^* \ G \ u_i \ \right\} \ + \ dt \cdot \left\{ \ \sum_{i \neq j} \ \frac{\left\vert u_j^* \ G \ u_i \right\vert^2}{\lambda_i(t) - \lambda_j(t)} \ \right\} \end{align*}

Finally, since G is drawn from \mathtt{GUE}(N), each projection u_j^* \, G \, u_i is itself a Gaussian random variable \varepsilon_{i,j}, which gives the result:

(18)   \begin{align*} d \lambda_i(t) \ &= \ (dt)^{1/2} \cdot \varepsilon_{i,i} \ + \ dt \cdot \left\{ \ \sum_{i \neq j} \ \frac{\left\vert \varepsilon_{i,j} \right\vert^2}{\lambda_i(t) - \lambda_j(t)} \ \right\} \end{align*}
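To see the repulsion at work, here is a small simulation sketch of the construction: add scaled Hermitian Gaussian increments as in (14) and track the sorted eigen-values. The entry normalization below is a convenience assumption; any fixed choice displays the same qualitative repulsion:

```python
import numpy as np

rng = np.random.default_rng(0)

def hermitian_gaussian(n, rng):
    """Random Hermitian matrix with complex Gaussian entries,
    symmetrized so that G = G^*."""
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (X + X.conj().T) / 2

N, dt, steps = 5, 1e-3, 2000
A = np.zeros((N, N), dtype=complex)
paths = np.empty((steps, N))

for t in range(steps):
    A = A + np.sqrt(dt) * hermitian_gaussian(N, rng)   # equation (14)
    paths[t] = np.linalg.eigvalsh(A)                   # sorted eigen-values

# The twist term in (8) keeps the ordered eigen-value paths apart: the
# smallest gap between neighboring eigen-values stays strictly positive.
print(np.diff(paths, axis=1).min())
```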

  1. See Lettau and Ludvigson (2001). ↩
  2. See Engle and Kelly (2010). ↩
  3. See Onatski (2009). ↩
  4. See McCall (1991). ↩
  5. See Rubinstein and Wolinsky (1990). ↩

Filed Under: Uncategorized

Business Cycle Patterns

July 15, 2011 by Alex

The chart below contains macroeconomic and financial data on the US economy from 1775 to 1943. The chart is from the St. Louis Federal Reserve FRASER archives. I've run across this chart several times over the course of the past year, and I find new and interesting trends each time I look at it. The only thing I find a bit annoying about the chart is its size: it is hard to look at such a wide image online as an embedded PNG. To solve this problem, I've uploaded the image to Zoom.it and posted the resulting zoomable image here.

For me, the most striking part of this plot is the correlation between spikes in commodity prices and the start of wars. Most economic models of rare disasters focus on the stock and bond pricing implications of rare drops in GDP due to wars and depressions. However, this infographic shows that commodity prices for goods like energy, housing, oil and metals are the most dramatically impacted by these rare events.

Be sure to expand the image to full screen mode if you want a more detailed look.

Filed Under: Uncategorized

Using the Front Door Criterion

July 14, 2011 by Alex

1. Introduction1

I show how to use the front door criterion rather than an instrumental variables approach to identify causal effects in non-experimental settings.

Motivation

Every econometrician is familiar with the experimental ideal2: in order to test a hypothesis a scientist should collect a large group of identical subjects, split them into 2 groups, administer a treatment to only 1 of the groups, and then quantify the difference in outcomes between the groups. For instance, consider an application of this experimental design to test the hypothesis that speculative traders destabilize asset prices:

Example (Price Impact of Speculative Trading): Does speculative trading destabilize prices? To answer this question in an idealized world, I would execute the following experiment. First, I would find 1000 identical asset markets for which I knew the true value of the asset. Then, in 500 of these markets, I would air-drop speculative traders into the market. This half of the markets would be the treatment group, while the remaining half of the markets would be the control group.

After allowing the speculators to trade, I would then compare the average volatility of the asset prices centered around their true values in each of the groups. If the asset prices in the treatment group were measurably farther from their fundamental value, I would conclude that the introduction of speculators into an asset market causes mispricing.

Although this is a nice benchmark, econometricians generally don't have the luxury of executing this idealized experimental design due to physical, financial or ethical constraints.3 The setting above often breaks down, leaving analysts to account for confounding effects like dissimilar treatment and control groups.

For instance, if the treatment and control groups are observably different, an econometrician can include control variables in a regression framework in order to account for this variation. If the treatment and control groups are unobservably different, i.e. there is endogenous treatment selection, the problem is more challenging. Here, the standard approach in the economics literature has been to look for an instrumental variable which co-varies with treatment assignment but not with any confounding effects.

In this post, I provide an alternative route, called the front door criterion, to identifying a causal effect in the presence of unobservable variation in the treatment and control groups. This approach was introduced by Judea Pearl in the mid-1990's4. Rather than focusing on exogenous variation in treatment selection, this approach exploits exogenous variation in the strength of the treatment. Thus, even if agents endogenously selected their broad treatment groups, if there is exogenous within-group variation in how intensely each agent was treated, we can use this variation to identify the causal effect of treatment.

Outline

I proceed as follows. First, to build intuition for the approach, in the next section I give 2 examples of how the front door criterion has been used in the economics literature without being explicitly named. I also use these examples to formally define the causal inference problem facing the econometrician in the language of directed graphs outlined in Pearl (2000).

Then, in section 3 I define the front door criterion approach to causal inference explicitly and show how it applies to the problem of identifying the price impact of introducing speculative traders to a market. I also illustrate how to implement this identification strategy using an OLS regression framework.

2. Terminology and Examples

In this section I illustrate how the front door criterion is fundamentally different from the instrumental variables approach and also introduce some additional terminology in the context of 2 examples.

An Introductory Example

The front door criterion has been used without a name in the economics literature since at least the early 1990's in the form of Blanchard, Katz, Hall and Eichengreen (1992)'s work on macro-labor economics. Cohen and Malloy (2010) execute one of the cleanest quasi-experiments using this approach. These authors are interested in the effect of social ties on congressional voting outcomes. In the example below, I outline their experimental design and discuss their results:

Example (Social Ties and Congressional Voting): Do social ties between U.S. senators affect their voting behavior? For example, when a senator from another state has an important bill to pass, are congressmen who attended the same college more likely to vote in favor of the bill? Cohen and Malloy (2010) find this exact result.

More precisely, the authors find that the % of senators in congressman i's alumni network that vote for a given bill predicts congressman i's voting behavior on the bill. So, for example, consider the voting behavior of senator i who attended Harvard. The result reads that senator i is more likely to vote “yes” on a bill when 90% of the other congressmen who have a tie to Harvard vote “yes” than when only 10% do.

What's more, this social network effect on congressman i's voting decisions is increasing in the strength of the network. So, for instance, senators that attended the same school at the same time have more correlated votes than senators that just went to the same school. Finally, the results seem to be robust to school, ideology, time period and senator fixed effects.

The trouble with interpreting the results above is that school choice is not an exogenous variable. For instance, students who chose to go to UC Berkeley in the 1960's were very different from those who chose to go to the University of Alabama during the same period. It could be that some omitted variable related to each senator's upbringing is driving both school choice and voting decisions. In order to solve this problem, Cohen and Malloy (2010) use the front door criterion by exploiting the fact that social ties between congressmen work through discussion on the floor of the senate chamber (shown below).

Example (Social Ties and Congressional Voting, Ctd…): In order to get a few extra votes for an important bill, a congressman might turn to the people sitting just to his left and right and try to convince them to vote his way. This persuasion becomes increasingly hard the longer a senator must travel to get in a few quiet words with his colleague. Importantly, the seating is strictly assigned, with senior senators getting the best seats and rookie senators getting whatever seats are left over. Thus, the seating of rookie senators is as good as random.

Cohen and Malloy (2010) find that within the group of rookie senators the congressmen who were randomly assigned a seat which was closer to school mates had more correlated voting outcomes.

Map of the seating arrangement of the Senate chamber.

The key idea embedded in this example is that Cohen and Malloy (2010) exploited exogenous variation in the treatment intensity rather than the group assignment. If the authors had followed an instrumental variables approach and looked for exogenous variation in subject group assignment, they would have needed to find an instrument that randomly assigned future senators to different colleges at the age of 18–a near impossible task. The authors instead identify the pathway through which the social network treatment effect travels (i.e., quiet discussions in the congressional chamber during senate recesses) and look for exogenous variation in how wide this channel is.

The identification comes from the following comparison. Consider a situation in which 2 rookie senators, a and b, each went to the same school, Faber College, but senator a gets randomly assigned a seat next to 3 other Faber College grads while senator b is isolated from any Faber alumni. Even though both senators selected into the same school treatment group, only senator a has a valid mechanism through which the social network treatment effect can travel. Intuitively, the within school vote correlation experienced by senator b should be due only to background effects, while the correlation experienced by senator a should be due to both background effects and the social network treatment.

Some Terminology

In order to make the above intuition more concrete, I need to introduce some new notation from Pearl (2000). My goal is to be able to extend the natural intuition in the example above to more complicated settings.

Cohen and Malloy (2010) critically rely on defining the precise channel through which the treatment effect travels. If I want to extend their intuition, I first need a more precise definition of what constitutes a causal model (…and thus a causal channel).

Definition (Causal Model): A causal model M is a triple (U,V,F) where:

  1. U = \begin{bmatrix} U_1, U_2, \ldots, U_K \end{bmatrix} is a vector of variables that are determined by factors outside the model,
  2. V = \begin{bmatrix} V_1, V_2, \ldots, V_N \end{bmatrix} is a set of variables that are endogenously determined within the model, and
  3. F = \begin{bmatrix} F_1, F_2, \ldots, F_N \end{bmatrix} is a set of functions

        \begin{align*}f_n: U \cup \{ V \setminus V_n \}  \mapsto V_n\end{align*}

    such that the entire set F forms an acyclic mapping from U to V.

For simplicity, I will denote the \sigma-algebra of all realizations of U \cup V as \Omega and the associated probability space as (\Omega, p). So, for example, p(\omega) is the probability of observing a particular realization \omega:

    \begin{align*} \omega &= \{\bar{U}_1, \bar{U}_2, \ldots, \bar{U}_K, \bar{V}_1, \bar{V}_2, \ldots, \bar{V}_N\} \end{align*}

This definition has 2 nice features. First, as Pearl (2000) emphasizes, it allows for the representation of causal models as graphs where each node is a random variable and each directed edge is an effect. For instance, in panel a) of the figure below, I show how to graph a causal chain in which the outcome variable y \in V is affected by an endogenous variable x \in V which is in turn affected by an exogenous variable z \in U that has no direct effect on y. In panel b) of the figure, I show a graph of a causal model that is not well defined. Here, there are no exogenous variables on which to stand. The values of y are determined by the values of x, which are determined by the values of e, which are in turn determined by the values of y. Finally, in panel c) of the figure, I show a graph of a causal model which would admit identification via an instrumental variables approach. Here, even though x has a direct effect on y, there exists a confounding variable e which jointly determines both x and y. The instrumental variables approach suggests that an analyst use the exogenous variation in x predicted by z to circumvent the effect of e.

Directed graphs proposed in Pearl (2000) as convenient representations of causal models.

Second, this definition explicitly models the channels through which different variables interact in the form of the functions F. This allows us to think about adjusting not just which nodes are connected in the graphs above, but also the strength and nature of the connections.

With this definition in hand, I now need to explicitly define how to quantify a causal effect:

Definition (Causal Effect): Let M be a causal model, X be a particular variable of the causal model M, and \bar{x} be a particular realization of this variable. Then, the effect \Delta of taking the action \mathtt{do}(X = \bar{x}_a) rather than \mathtt{do}(X = \bar{x}_o) can be written as a distance metric over the elements of the probability space (\Omega,p):

    \begin{align*} \Delta &= \sum_{B \in \mathcal{B}} \left\{ \delta_B(M \mid \mathtt{do}(X = \bar{x}_a)) - \delta_B(M \mid \mathtt{do}(X = \bar{x}_o)) \right\} \cdot \tilde{p}(B) \end{align*}

where \mathcal{B} \subset \Omega is a collection of events in the \sigma-algebra, \delta_B: B \mapsto \mathcal{R} and \tilde{p} is defined as,

    \begin{align*} \tilde{p}(B) &= \frac{p(B)}{\sum_{B' \in \mathcal{B}} p(B')} \end{align*}

Perhaps the best way to parse this definition is to walk through a few examples. First, I consider how to represent an average treatment effect using this definition. This effect is the standard estimator used throughout much of the economics literature as well as in other fields such as pharmaceutical testing. Below I walk through this implementation:

Example (Average Treatment Effect): Consider a setting where \delta is just the conditional expectation operator for variable V_n, i.e.:

    \begin{align*} \delta_B(M \mid \mathtt{do}(X = \bar{x})) &= \mathbb{E}[V_n \mid X = \bar{x}, \omega \in B] \end{align*}

Under these conditions, if we let \mathcal{B} be the entire \sigma-algebra \Omega, we get the standard average treatment effect:

    \begin{align*} \Delta^{\mathtt{ATE}} &= \mathbb{E}[V_n \mid X = \bar{x}_a] - \mathbb{E}[V_n \mid X = \bar{x}_o] \end{align*}

Heckman and Vytlacil (2005) argue that analysts should think hard about whether or not the commonly used \Delta^{\mathtt{ATE}} estimator is appropriate for their purposes. As an alternative, they suggest estimating marginal treatment effects:

Example (Marginal Treatment Effect): Consider a setting where \delta is still the conditional expectation operator, but restrict \mathcal{B} to be the subsets of the event space over which agents would be indifferent between the treatment and control assignments:

    \begin{align*} \Delta^{\mathtt{MTE}} &= \sum_{B \in \mathcal{B}} \left\{ \mathbb{E}_B[V_n \mid X = \bar{x}_a] - \mathbb{E}_B[V_n \mid X = \bar{x}_o] \right\} \cdot \tilde{p}(B) \end{align*}

We can interpret \Delta^{\mathtt{MTE}} as the mean gain in outcomes for subjects who would be indifferent between treatment or not. Finally, consider transforming the rough explanation of the causal effect given in the introductory example into more formal language:

Example (Price Impact of Speculative Trading, Ctd…): Consider the example in the introduction concerning the potential destabilizing effects of speculative trading. The effect outlined in this introductory example can be written more formally as follows. Suppose that V_n is the deviation of the observed price from the fundamental value in a market and the variable X is binary representing the existence or absence of speculators in a market. Define \delta as:

    \begin{align*} \delta_B &= \sqrt{\mathbb{V}_B[V_n \mid X = \bar{x}]} \end{align*}

Then the estimator described heuristically in the introductory example would be given by \Delta = \sqrt{\mathbb{V}[V_n \mid X = \bar{x}_a]} - \sqrt{\mathbb{V}[V_n \mid X = \bar{x}_o]} where \mathcal{B} is the entire \sigma-algebra \Omega.

An Additional Example

Finally, with this new terminology in hand, I want to visit a second, more complicated application of the front door criterion by Blanchard, Katz, Hall and Eichengreen (1992) to study the hypothesis that, at the state level, an increase in the immigration rate decreases the unemployment rate. Below, I describe a stylized version of their results:

Example (Immigration Choice and the Unemployment Rate): States with the lowest levels of unemployment tend to enjoy the largest immigrant populations. Is this relationship causal? In this example, I consider the alternative hypothesis that a 1% increase in a state's immigrant population makes its economy more efficient and lowers its unemployment rate against the null hypothesis that some omitted variable co-determines both a state's immigrant population percentage and its unemployment rate. For instance, immigrants might rationally choose to move to the states with the highest labor demand.

The key insight to identifying the causal effect of a state's population make-up on its unemployment rate is to pin down the mechanism through which this effect flows. Specifically, observe that, if % changes in a state's immigrant population have a causal effect on its unemployment rate, then this effect should be more pronounced in states where immigrants form a larger fraction of the population to start with. E.g., if there is a causal link, a 1% change in the immigrant population of California ought to have a larger effect than the same 1% change in Iowa; whereas, if there is no causal link and some omitted variable is co-determining both the immigrant population percentage and the unemployment rate, this link to the absolute number of immigrants added need not exist. Indeed, this is roughly what Blanchard, Katz, Hall and Eichengreen (1992) find, as shown in figure 8 of the original paper.

Using the definition from above, we can now cast this identification strategy as instrumenting for an exogenous shift in the function F_n which maps the immigrant population percentage to the unemployment rate. Let \bar{f}_{\mathtt{CA}} be the function F_n in California and \bar{f}_{\mathtt{IA}} be the function F_n in Iowa. Then, roughly speaking, I can write the causal estimator \Delta^{\mathtt{FDC}} as:

    \begin{align*} \begin{split} \Delta^{\mathtt{FDC}} &= \left( \mathbb{E}[V_n \mid F_n = \bar{f}_{\mathtt{CA}}, X = \bar{x}_a] - \mathbb{E}[V_n \mid F_n = \bar{f}_{\mathtt{CA}}, X = \bar{x}_o] \right) \\ &\qquad - \left( \mathbb{E}[V_n \mid F_n = \bar{f}_{\mathtt{IA}}, X = \bar{x}_a] - \mathbb{E}[V_n \mid F_n = \bar{f}_{\mathtt{IA}}, X = \bar{x}_o] \right) \end{split} \end{align*}

In words, \Delta^{\mathtt{FDC}} captures how much more the unemployment rate V_n shifts when the % change in a state's immigrant population is large (X = \bar{x}_a) versus when it is small (X = \bar{x}_o) in the California regime as compared to the Iowa regime. Note that this is the exact same intuition as in the social ties example above.

3. The Front Door Criterion

In this section I link the front door criterion to a regression based estimation strategy.

Identifying Speculative Price Impact

Why is identifying the price impact of speculators hard? What is the basic inference problem here? Consider the following example, and ask yourself: “Is it right to conclude that speculators caused the massive run-up in prices?”

Example (Price Impact of Speculative Trading, Ctd…): Suppose that you look at the stock market, and you observe Amazon’s stock price sky-rocketing from around 1 dollar at the beginning of the year to over 100 dollars at the end of the year.

What's more, after mulling it over, suppose that you also conduct a survey of all tech-stock traders and discover that a large fraction of them were speculators. When asked, they responded that they were buying the stock just to resell it later at a higher price and placed no weight on any dividend payment concerns. To many outside observers, this seems like an air-tight case that speculators are driving up the stock price of Amazon.com; however, it turns out that you can't be so sure.

Speculators only show up when prices are out of whack. For instance, if I am a speculator making my living off of shorting over-priced assets, buying under-priced assets and riding waves of excessive price changes, I would never enter a market with correct prices. There would be no money to be made. As a speculator, I could well be having a stabilizing effect on prices. Thus, the core challenge is to determine whether or not speculators are showing up in response to mis-pricing or instead causing mis-pricing via their trading behavior. This is the argument that Milton Friedman put forth in his 1953 book, Essays in Positive Economics.

Price of Amazon.com, Inc. during the Dot-Com bubble on a log scale. Source: Yahoo Finance.

Regression Framework

To make this intuition a bit more concrete, I now walk through a simple numerical example.5 My goal in this section is to lay out the simplest possible model in which it is feasible to study the inference problem above.

Consider a world with M>0 markets in which the price p_m \in \{0,1\} in each market m \in \{1,2, \ldots, M\} is either correct or too high. There are no shades of grey, and prices can never be too low. What's more, suppose that speculators s_m \in \{0,1\} either abstain or enter the market. Thus, the price in market m can be written as below, where (p_m^0,p_m^1) represent the 2 counterfactual states of the world in which speculators either abstain or enter market m:

    \begin{align*} p_m = p_m^0 \cdot (1-s_m) + p_m^1 \cdot s_m \end{align*}

This formulation allows me to ask questions about counterfactuals. Even though in empirically observed data, I will only ever see either p_m^0 or p_m^1, I want an econometric framework in which I can think about both these observations. For instance, I am interested in questions like: “Suppose speculators entered market m but not market m'. Would the prices in market m have been the same as they are in market m' if no speculators had entered?”

From here, I can derive an OLS specification which maps this binary inference framework into a regression specification:

Proposition: (OLS Regression) Let P be a random variable representing the price in an arbitrary market, S be a random variable representing the existence of speculators in an arbitrary market, and let the ({}^0,{}^1) superscripts denote the values in an economy which does not/does contain speculators. Then an OLS regression has the components:

    \begin{align*} P = \mu^0 + (\mu^1 - \mu^0) \cdot S + \left( \nu^0 + (\nu^1 - \nu^0) \cdot S \right) \end{align*}

where \mathbb{E}P^j = \mu^j and P^j - \mathbb{E}P^j = \nu^j.

This proposition says that an OLS regression has an intercept which is the expected price in a market that has not been treated with speculators and a slope which is the change in the expected price in a market due to treatment with speculators.

However, the most helpful part of this proposition is actually the factorization of the pricing error into 2 components. The first component, \nu^0, is the difference between the observed prices in untreated markets and the expected price in an untreated market. This difference will be non-0 if, for example, markets with correct prices are less likely to attract speculators. Conversely, the second component, (\nu^1 - \nu^0) \cdot S, depends on whether or not the treated markets are differentially more likely to be over-priced relative to their expected levels.

Proof: (OLS Regression) To derive the formulation above, start with the simple decomposition:

    \begin{align*} P &= P^0 \cdot (1 - S) + P^1 \cdot S \\ &= P^0 + (P^1 - P^0) \cdot S \\ &= \mathbb{E}P^0 + (P^1 - P^0) \cdot S + (P^0 - \mathbb{E}P^0) \\ &= \mu^0 + (\mu^1 - \mu^0) \cdot S + \left\{ \nu^0 + (\nu^1 - \nu^0) \cdot S \right\} \end{align*}

This decomposition is nice because it tells us exactly where the identification problem will show up in the naive OLS regression framework. If speculators are more likely to show up in markets with excessively high prices, we should expect to see a distorted \nu^0 + (\nu^1 - \nu^0) \cdot S term. The standard way to get around this problem is to use an instrumental variables approach. However, instruments are hard to come by. In the section below, I show how to use a new approach to identify the price impact of adding speculators to a market.

Regression Estimate of \Delta^{\mathtt{FDC}}

The last section illustrated how a naive OLS regression specification will deliver biased estimates of the price impact of adding speculators to a financial market if the speculators endogenously choose which markets to enter. What's more, the decomposition in the proposition above details exactly how this bias will manifest itself as either a negative \nu^0 term or a positive (\nu^1 - \nu^0) \cdot S term. In this section, I introduce the front door criterion as a way to circumvent this identification problem and estimate this price impact.

Specifically, I show that the causal effect can be calculated as follows:

Proposition: (Causal Effect Estimator) The causal effect estimator using the front door criterion in a 2-state system with outcome variable P, mechanism Z and explanatory variable S can be written as:

    \begin{align*} \begin{split} \Delta^{\mathtt{FDC}} &= \left\{ \mathbb{E}[P=1 \mid S=1, Z=1] - \mathbb{E}[P=1 \mid S=0, Z=1] \right\} \\ & - \left\{ \mathbb{E}[P=1 \mid S=1, Z=0] - \mathbb{E}[P=1 \mid S=0, Z=0] \right\} \end{split} \end{align*}

In practical terms, what is this proposition saying? Well, suppose that I estimated \Delta^{\mathtt{FDC}} = 0.10. This proposition would then read that the probability of over-pricing in market m in a world where S=1 is 10 percentage points higher than in the exact same market in a counterfactual world where S=0. Before I can explain the proposition in more detail, I need to define the variable Z:

Definition: (Mechanism) A variable Z is a mechanism relative to the ordered pair of variables (P,S) if 1) S only affects P through Z, and 2) \tilde{Z} = (Z - \mathtt{Proj}[Z \mid S]) is independent of any confounding variables affecting both S and P.

Z is called a “mechanism” because all of the effect of the explanatory variable S on the outcome variable P travels through Z. This is where the name front door criterion comes from as well. For example, I use exogenous variation in the amount of funds available to speculators as my mechanism, where Z \in \{0,1\} means that speculators either have little free cash or have a ton of free cash. So, while speculators can still choose which markets to enter, they may or may not have sufficient funds to really affect the market equilibrium. Thus, Z acts like an instrument for the intensity of a (…perhaps endogenously selected…) treatment effect.

To make these ideas more tangible, consider the fake data table below to confirm the estimator computation. This table lays out the 4 different states of the world that we can possibly observe with respect to the pair of variables (S,Z) as well as their relative frequency and the probability of observing over-pricing in each of these states.

    \begin{equation*} \begin{array}{cc|cc} s & z & \mathbb{E}[S = s,Z=z] & \mathbb{E}[P = 1 \mid S=s,Z=z] \\ \hline \hline 0 & 0 & 0.45 & 0.10 \\ 0 & 1 & 0.05 & 0.50 \\ 1 & 0 & 0.05 & 0.20 \\ 1 & 1 & 0.45 & 0.70 \end{array} \end{equation*}
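Plugging the table into the proposition confirms the \Delta^{\mathtt{FDC}} = 0.10 figure used above:

```python
# E[P=1 | S=s, Z=z] from the fake data table, keyed by (s, z).
p = {(0, 0): 0.10, (0, 1): 0.50, (1, 0): 0.20, (1, 1): 0.70}

# First line of the proposition (Z=1) minus the second line (Z=0).
delta_fdc = (p[1, 1] - p[0, 1]) - (p[1, 0] - p[0, 0])
print(delta_fdc)   # (0.70 - 0.50) - (0.20 - 0.10) = 0.10
```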

Not every channel is a valid mechanism though. Z needs the additional property that any residual variation in Z not explained by variation in S is uncorrelated with any confounding variables.6 Let me make this idea clearer via an example of how Z might violate this requirement. Consider some confounding variable N that makes speculators enter a market and makes prices over-heat. For instance, think of N \in \{0,1\} as a dummy variable for whether or not the New York Times wrote a news article about a company.7 For Z to be an invalid mechanism, it would have to be the case that speculators tend to have unexpectedly more funds precisely when the New York Times is most likely to write an article about an industry; i.e., an invalid mechanism would yield:

    \begin{align*} 0 \neq \mathbb{E} \left[ \ \left( Z - \mathtt{Proj}[Z \mid S] \right) \cdot  N \ \right] \end{align*}

Having outlined the basic elements of the proposition, I now give an intuitive explanation and refer any interested readers to Pearl (2000):

Intuition: (Causal Effect Estimator) Consider the first line. This difference captures the increase in the likelihood of over-pricing (i.e., \mathbb{E}[P=1 \mid \cdot]) due to speculators entering a market (i.e., S=1 rather than S=0) when they have a lot of capital and should have a larger effect (i.e., Z=1). Now, consider the second line. This difference captures the increase in the likelihood of over-pricing due to speculators entering a market when they do not have very much capital and should have a smaller effect (i.e., Z=0).

Now, suppose that the correlation between the existence of speculators in market m and over-pricing in market m is purely spurious and due to some confounding variable N that drives up prices and speculator demand at the same time. In this world, we should expect to see the differences in both lines be the same. Opening and closing the nozzle on an unconnected hose should have no effect on the amount of water coming out of it.

On the other hand, suppose that the effect is not entirely spurious. Then unexpectedly giving speculators more funds will cause prices to rise relatively more leading to an estimate of \Delta^{\mathtt{FDC}}>0.

Note that this approach does not rule out the possibility that speculators are still endogenously choosing to invest in over-priced markets to some degree. For instance, consider a world where:

    \begin{align*} 0 \neq \mathbb{E}[P=1 \mid S=1, Z=0] - \mathbb{E}[P=1 \mid S=0, Z=0] \end{align*}

In this world, even when speculators have little price impact, they are still good predictors of over-pricing. In the remainder of this section, I show how to implement this estimator using a 2 stage regression. This step is an immediate extension of the previous section, and I give the main result below:

Proposition: (2-Stage Regression) Consider a system of variables (P,S,Z) as outlined above. The following 2-stage regression procedure is an unbiased and consistent estimator of \Delta^{\mathtt{FDC}} if Z is a valid mechanism.

Stage 1:

    \begin{align*} Z &= \alpha + \beta \cdot S + \tilde{Z} \end{align*}

Stage 2:

    \begin{align*} P &= \gamma + \Delta^{\mathtt{FDC}} \cdot \tilde{Z} + \varepsilon \end{align*}

This result follows directly from reading the first difference in the 2-state framework as a reduced form projection.
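As an illustration, the sketch below simulates a linear version of the setup and runs the 2-stage procedure. The data generating process and every parameter value are assumptions made up for the example; the point is only that the naive regression is contaminated by the confounder while the front door stages recover the causal slope:

```python
import numpy as np

rng = np.random.default_rng(42)
M = 100_000   # number of simulated markets

# Assumed data generating process:
#   n : unobserved confounder (raises both speculator entry and prices)
#   s : speculator entry, endogenous because it loads on n
#   z : speculator funding, the mechanism: loads on s plus exogenous noise
#   p : over-pricing, affected by s ONLY through z, and directly by n
n = rng.normal(size=M)
s = (n + rng.normal(size=M) > 0).astype(float)
z = 0.8 * s + rng.normal(size=M)
p = 0.5 * z + n + rng.normal(size=M)      # true causal slope on z is 0.5

# Naive OLS of p on s picks up the confounder n:
naive_slope = np.polyfit(s, p, 1)[0]

# Stage 1: strip out the part of the mechanism predicted by s.
beta, alpha = np.polyfit(s, z, 1)
z_tilde = z - (alpha + beta * s)

# Stage 2: regress prices on the residual variation in the mechanism.
fdc_slope = np.polyfit(z_tilde, p, 1)[0]

print(f"naive: {naive_slope:.2f}")   # well above the causal 0.8 * 0.5
print(f"fdc  : {fdc_slope:.2f}")     # ~0.50 per unit of funding
```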

4. Conclusion

This identification strategy is new, and as a result there are a lot of places where using the front door criterion might yield new results for tough econometric problems. The key advantage is the flexibility to randomize the intensity of the treatment rather than the treatment assignment as in the standard IV framework.

  1. I am currently working on a paper with Chris Mayer in which we use this identification strategy to parse the causal effect of introducing out of town second home buyers into a housing market on local house prices. In this setting we use relative city size rather than funding constraints as our mechanism. We find that air-dropping out of town speculators into a housing market causes house price appreciation. ↩
  2. See Angrist and Pischke (2010). ↩
  3. There are some settings such as development economics (e.g., See Banerjee and Duflo (2008).) where this natural experiment approach is feasible. However, for the majority of econometric questions this approach is difficult to implement. Much of the econometrics research is an effort to bridge this gap with various levels of success (e.g., see Lalonde (1986)). ↩
  4. See Pearl (1995) and Pearl (2000). ↩
  5. The analytical framework in this section comes from Ch. 3 in Morgan and Winship (2007). ↩
  6. This is admittedly a very ragged statement; for a more detailed treatment of this idea, read through Ch. 3 of Pearl (2000). For brevity's sake, I trimmed much of my original discussion on the nuts and bolts of graphical models of causality. This is the cleanest way of looking at causal inference in my opinion and I really recommend this text. ↩
  7. See Huberman and Regev (2001) for a real world example in the pharmaceutical industry. ↩

Filed Under: Uncategorized

Digesting the Hansen and Scheinkman Multiplicative Decomposition of the SDF

July 12, 2011 by Alex

Introduction1

I give some intuition behind the multiplicative decomposition of the stochastic discount factor M_{t \to t+h} introduced in Hansen and Scheinkman (2009). The economics underlying the original Hansen and Scheinkman (2009) results was not clear to me during my initial readings. This post collects my efforts to interpret these mathematical ideas in a sensible way.

Below I formally state the decomposition.

Theorem (Hansen and Scheinkman Decomposition): Suppose that \phi_M is a principal eigenfunction with eigenvalue \lambda_M for the extended generator of the stochastic discount factor M. Then this multiplicative functional can be decomposed as:

    \begin{align*} M_{t \to t+h} \ &= \ e^{\lambda_M \cdot h} \cdot \left( \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \right) \cdot \hat{M}_{t \to t+h} \end{align*}

where \hat{M}_{t \to t+h} is a local martingale.


The stochastic discount factor M_{t \to t+h} dictates how to discount cashflows occurring h periods in the future in state X_{t+h}. Roughly speaking, Hansen and Scheinkman (2009) factors M_{t \to t+h} into 3 different pieces: a state independent component e^{\lambda_M \cdot h}, an investment horizon independent component \phi_M(X_t)/\phi_M(X_{t+h}), and a white noise component \hat{M}_{t \to t+h}.

Thus, you should think about \lambda_M as a generalized time preference parameter. \lambda_M will generally be negative, so e^{\lambda_M \cdot h} is the continuous time representation of the state independent discount rate dictated by an asset pricing model. The ratio \phi_M(X_t) / \phi_M(X_{t+h}) captures the rate at which I discount payments at time t+h given the state today at time t and the state at time t+h. This ratio is independent of h meaning that if X_{t+h} = X_{t+h'}, then for any h and h' we have:

    \begin{align*} \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \ &= \ \frac{\phi_M(X_t)}{\phi_M(X_{t+h'})} \end{align*}

Finally, \hat{M}_{t \to t+h} represents a random noise component with \mathbb{E}[\hat{M}_{t \to t+h}] = 1 and independent increments.


Motivation

The Hansen and Scheinkman decomposition generalizes the binomial options pricing framework for use in standard asset pricing applications by allowing for more complicated state space features like jumps and time averaging.2 The main advantages of casting the stochastic discount factor as a multiplicative functional are a) the use of the binomial pricing intuition to understand more complicated asset pricing models and b) the streamlining of the econometrics needed to compare excess returns at different horizons.3

To illustrate the basic intuition behind this analogy, I work through the Black, Derman and Toy (1990) model.

Example (Binomial Model): Consider a discrete time, binomial world with states X_t \in \{d,u\}, \ \forall t \geq 0 in which traders have an independent probability \pi(x) of entering state x in the next period regardless of the current state. In this world, the price P_{t \to t+1} at time t of a risk free bond that pays out $1 at time t+1 is given by the expression:

    \begin{align*} P_{t \to t+1} \ &= \ \frac{\pi(u) \cdot 1 + \pi(d) \cdot 1}{1 + r^f_{t+1}} \end{align*}

This 1 step ahead pricing rule applies at each and every starting date t. All pricing computations at longer horizons are built up from this local relationship based on the prevailing short rate r_{t+1}^f.

To solve the model, I need to assume that the short rate r_{t+1}^f process has independent log-normal increments. I could then use the volatility of this process to pin down the values of the short rate for the entire binomial tree.
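To illustrate how longer horizon prices are built up from the local 1 step rule, here is a sketch that prices an h-period risk free bond by backward induction on a recombining tree of log-normal short rates. All parameter values below are placeholders; the actual Black, Derman and Toy (1990) procedure would instead calibrate them to an observed yield and volatility curve:

```python
import numpy as np

pi_u = pi_d = 0.5               # assumed state probabilities
r0, sigma, h = 0.03, 0.2, 5     # assumed short rate level, vol, horizon

# Short rate at step t after j up-moves: log-normal in the move count.
rates = [[r0 * np.exp(sigma * (2 * j - t)) for j in range(t + 1)]
         for t in range(h)]

values = np.ones(h + 1)         # the bond pays $1 in every terminal node
for t in reversed(range(h)):    # apply the 1 step rule, date by date
    values = np.array([(pi_u * values[j + 1] + pi_d * values[j])
                       / (1 + rates[t][j]) for j in range(t + 1)])

print(float(values[0]))         # time-0 price of the h-period bond
```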


In general, models of this sort are easy to solve analytically if the short rate process has log-normal increments. The recent papers Lettau and Wachter (2007), Van Binsbergen, Brandt and Koijen (2010) and Backus, Chernov and Zin (2011) adopt similar approaches and try to extend these insights to equity markets.

Nevertheless, most asset pricing models are not log-normal and do not admit pen and paper analysis of their term structure using existing methods. Thus, in order to use cross-horizon predictions to discriminate between alternative models, we must adopt new mathematical tools.

Example (Binomial Model, Ctd…): We use operator methods to factor the discount factor process M_{t \to t+h}, which deflates payments in state X_{t+h} at time horizon t+h back to time t, into 3 pieces, e^{\lambda_M \cdot h}, \tilde{\phi}_M(X_{t+h},X_t) and \hat{M}_{t \to t+h}, where the first factor only depends on the investment horizon h, the second factor only depends on the realized states, and the third factor is noise, so that M_{t \to t+h} = e^{\lambda_M \cdot h} \cdot \tilde{\phi}_M(X_{t+h},X_t) \cdot \hat{M}_{t \to t+h}.

By visual analogy to the Black, Derman and Toy (1990) model, in a binomial world we can use this decomposition to rewrite the h=1 Euler equation below where the dependence on X_t is implicit:

    \begin{align*} 1 \ &= \ \mathbb{E}_t \left[ \ M_{t \to t+1} \cdot R_{t \to t+1} \ \right] \\ &= \ \frac{\pi(u) \cdot \tilde{\phi}_M(u) \cdot \varepsilon(u) \cdot R(u) + \pi(d) \cdot \tilde{\phi}_M(d) \cdot \varepsilon(d) \cdot R(d)}{1 - \lambda_M} \end{align*}


Thus, in the Hansen and Scheinkman (2009) decomposition, - \lambda_M serves as a synthetic risk free rate and the \pi(x) \cdot \tilde{\phi}_M(x) serve as the twisted martingale measure.

In my work with Anmol Bhandari4 we look at a class of models for which \ln \tilde{\phi}_M(x) is affine5 and show how to use this decomposition to compute a cross-horizon analogue to the Hansen and Jagannathan (1991) volatility bound. This new bound can be used to discriminate between different models which make identical predictions at a particular horizon. This exponentially affine structure is useful as it permits closed form solutions for the moments of M_{t \to t+h}:

    \begin{align*} \mathbb{E}_t[M_{t \to t+h}] \ &\approx \ e^{\lambda_M \cdot h} \cdot \mathbb{E}_0 \left[ \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \right] \cdot 1 \\ \mathbb{E}_t[M_{t \to t+h}^2] \ &\approx \ e^{\lambda_{M^2} \cdot h} \cdot \mathbb{E}_0 \left[ \frac{\phi_{M^2}(X_t)}{\phi_{M^2}(X_{t+h})} \right] \cdot 1 \end{align*}

In the next 2 sections, I walk through the economics governing the \lambda_M and \phi_M terms.


Time Preference

Where does \lambda_M come from? In the original article, the authors refer to \lambda_M as the principal eigen-value of the extended generator of M; however, \lambda_M has a well defined meaning without ever subscribing to Perron-Frobenius theory. \lambda_M is a generalization of the time preference parameter dictated by an asset pricing model.

Consider the following thought experiment which casts the \lambda_M term as the time preference parameter plus an extra Jensen inequality term.

Example (Generalized Time Preference): Suppose that an agent has preferences over a stream of consumption C_1, C_2, C_3, ... and that for each period t, C_t = 100 with probability 0.95 and the remaining 5\% of the time C_t = 50 or C_t = 150 with equal probability. While \mathbb{E}_t[C_{t+1}] = 100, the certainty equivalent is \mathbb{E}_t^{c.e.}[C_{t+1}] < 100^{1-\gamma} = \mathbb{E}_t^*[C_{t+1}].

In fact, with probability 0.05 the agent will get a payout worth:

    \begin{align*} \mathbb{E}_t^{c.e.}[C_{t+1} \mid C_{t+1} \neq 100 ] \ &= \ \frac{50^{1-\gamma}}{2} + \frac{150^{1-\gamma}}{2} \end{align*}

Let’s call this certainty equivalent gap \delta:

    \begin{align*} \delta \ &= \ \mathbb{E}_t^{c.e.}[C_{t+1} \mid C_{t+1} \neq 100 ] \ - \ 100^{1-\gamma} \end{align*}

\lambda_M should then include both the time preference parameter, \rho, and the expected Jensen’s inequality loss:

    \begin{align*} \lambda_M \ &= \ \rho \ + \ 0.05 \cdot \delta \end{align*}
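
As a concrete numerical check, here is a short Python sketch of this calculation. The values \gamma = 0.5 and \rho = 0.02 are assumptions of mine, not numbers from the example:

    # Generalized time preference: lambda_M = rho + 0.05 * delta.
    gamma = 0.5   # chosen in (0, 1) so that u(c) = c**(1 - gamma) is increasing and concave
    rho = 0.02    # assumed pure time preference parameter

    u = lambda c: c ** (1.0 - gamma)

    # Expected utility conditional on landing in one of the 5% non-100 states.
    ce_jump = 0.5 * u(50.0) + 0.5 * u(150.0)

    # Certainty equivalent gap relative to the sure payout of 100.
    delta = ce_jump - u(100.0)

    lambda_M = rho + 0.05 * delta
    print(delta, lambda_M)   # delta < 0: the Jensen's inequality loss drags on lambda_M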

 

Thus, in a more general framework, we should expect \lambda_M to have roughly the following form:

    \begin{align*} \lambda_M \ &= \ \rho \ + \ f(\sigma_M^2, \sigma_X^2, \sigma_{M \times X}) \end{align*}

where f is an affine function. Heuristically, the \sigma_X component captures how volatile the state variable is, while the \sigma_M component captures how heavily I need to discount this consumption stream due to Jensen’s inequality.

 

State Dependence

Next, in order to capture the dependence of the discount factor M_{t \to t+h} on the current and future states (X_t,X_{t+h}), Hansen and Scheinkman (2009) downshift to continuous time and apply the Perron-Frobenius theorem to the infinitesimal generator of the discount factor. When applied to transition probability matrices, Perron-Frobenius theory implies that the largest eigen-pair dominates the behavior of a stochastic process as h \to \infty. Hansen and Scheinkman use this h \to \infty limiting result to argue that the ratio \phi_M(X_t)/\phi_M(X_{t+h}), where \phi_M is the principal eigen-function of the generator of the discount factor M, is a good choice for the state dependent component of M_{t \to t+h}.

It is important to note that Perron-Frobenius theory is only a modeling tool in the Hansen and Scheinkman (2009) construction, not a critical feature of their results. There may well be other reasonable choices for the state dependent component of M_{t \to t+h}. In its simplest form6, the result can be written as:

Theorem (Perron-Frobenius): The largest eigen-value \lambda of a positive square matrix A is both simple and positive and belongs to a positive eigenvector \phi. All other eigen-values are smaller in absolute value.7
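
The theorem’s claims are easy to verify numerically. The following sketch checks them for an arbitrary strictly positive 2 \times 2 matrix of my choosing:

    import numpy as np

    A = np.array([[0.5, 0.3],
                  [0.2, 0.6]])    # an arbitrary strictly positive square matrix

    eigvals, eigvecs = np.linalg.eig(A)
    i = np.argmax(eigvals.real)

    lam = eigvals[i].real         # dominant eigen-value: simple and positive (0.8 here)
    phi = eigvecs[:, i].real
    phi = phi / phi.sum()         # rescaled so the eigen-vector is strictly positive

    print(lam, phi)                              # 0.8, [0.5, 0.5]
    print(np.abs(np.delete(eigvals, i)) < lam)   # all other eigen-values smaller in modulus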

 

In order to use this theorem, I need to have a positive square matrix to operate on. M_{t \to t+h} is strictly positive, but it is not a square matrix; its infinitesimal generator, however, is. Heuristically, you can think about the infinitesimal generator as encoding the transition probability matrix under the equivalent martingale measure, deflated by the time preference parameter.

Definition (Infinitesimal Generator): The infinitesimal generator \mathbb{A} of an Ito diffusion \{ X_t \} in \mathcal{R}^n is defined by:

    \begin{align*} \mathbb{A}[ f(x)] \ &= \ \lim_{h \searrow 0} \ \frac{\mathbb{E}_x[ f(X_h) ] - f(x)}{h}, \end{align*}

where \mathbb{E}_x denotes the expectation given X_0 = x, and the set of functions f: \mathcal{R}^n \mapsto \mathcal{R} such that the limit exists at x is denoted by \mathcal{D}_A(x).

 

In words, the infinitesimal generator of the discount factor M_{t \to t+h} captures how my valuation of a $1 payment in, say, the up state u will change if I move the payment from h=1 period in the future to h=2 periods in the future. To get a feel for what the infinitesimal generator captures, consider the following short example using a 2 state Markov chain. First, I define the physical transition intensity matrix for the Markov process X_t.

Example (Markov Process w/ 2 States): Consider a 2 state Markov chain with states X_t \in \{u,d\}. The physical evolution of the stochastic process X_t is governed by a 2 \times 2 intensity matrix \mathbb{T}, which encodes all of the transition probabilities: e^{h \cdot \mathbb{T}} is the matrix of transition probabilities over a horizon h. Since each row of the transition probability matrix e^{h \cdot \mathbb{T}} must sum to 1, each row of the transition intensity matrix \mathbb{T} must sum to 0.

    \begin{align*} \mathbb{T} \ &= \ \begin{bmatrix} \tau(u \mid u) & \tau(d \mid u) \\ \tau(u \mid d) & \tau(d \mid d) \end{bmatrix} \end{align*}

The diagonal entries are nonpositive and represent minus the intensity of jumping from the current state to a new one. The remaining row entries, appropriately scaled, represent the conditional probabilities of jumping to the respective states. For concreteness, the following parameter values suffice:

    \begin{align*} \mathbb{T} \ &= \ \begin{bmatrix} -0.10 & 0.10 \\ 0.05 & -0.05 \end{bmatrix} \end{align*}
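
To connect this example back to the definition above, the sketch below (using SciPy’s expm routine and an arbitrary test function f of my choosing) confirms that e^{h \cdot \mathbb{T}} is a proper transition probability matrix and that the difference quotient from the definition of the infinitesimal generator converges to \mathbb{T} f as h \searrow 0:

    import numpy as np
    from scipy.linalg import expm

    T = np.array([[-0.10,  0.10],
                  [ 0.05, -0.05]])
    f = np.array([1.0, 2.0])              # an arbitrary test function on the states {u, d}

    P = expm(1.0 * T)                     # transition probabilities over horizon h = 1
    print(P.min() >= 0, P.sum(axis=1))    # nonnegative entries; each row sums to 1

    for h in (1.0, 0.1, 0.001):
        print((expm(h * T) @ f - f) / h)  # converges to T @ f = [0.10, -0.05]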

 

Next, I want to show how to modify this transition intensity matrix \mathbb{T} to describe the local evolution of the discount factor process M_t. To do this, I first need to have an asset pricing model in mind, and I use a standard CRRA power utility model with risk aversion parameter \gamma as in Breeden (1979) where X_t is the log of the expected consumption growth.

Example (Markov Process w/ 2 States, Ctd…): Intuitively, I know that every period I push the payment out into the future, I will end up discounting the payment by an additional e^{\lambda_M}. However, I know that I will also have to twist \mathbb{T} from the physical measure over to the risk neutral measure. Thus, the resulting generator will look something like:

    \begin{align*} \mathbb{A} \ &= \ \begin{bmatrix} \tau(u \mid u) \cdot \tilde{\phi}_M(u \mid u) & \tau(d \mid u) \cdot \tilde{\phi}_M(d \mid u) \\ \tau(u \mid d) \cdot \tilde{\phi}_M(u \mid d) & \tau(d \mid d) \cdot \tilde{\phi}_M(d \mid d) \end{bmatrix} \ - \ \lambda_M \end{align*}

Since the state does not change along the diagonal, there is no twisting there, and we can normalize \tilde{\phi}_M(s \mid s) = 1. The entries of the generator are then:

    \begin{align*} \alpha(s' \mid s) \ &= \ \begin{cases} \tau(s' \mid s) - \lambda_M &\text{ if } s' = s \\ \tau(s' \mid s) \cdot \tilde{\phi}_M(s' \mid s) - \lambda_M &\text{ if } s' \neq s \end{cases} \end{align*}

Note that the rows of \mathbb{A} will in general not sum to 0, unlike the rows of the physical transition intensity matrix \mathbb{T}.
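
To see the mechanics, the sketch below builds \mathbb{A} from the \mathbb{T} above; the off-diagonal twist values and the \lambda_M = 0.02 are made up purely for illustration:

    import numpy as np

    T = np.array([[-0.10,  0.10],
                  [ 0.05, -0.05]])
    phi_twist = np.array([[1.00, 0.90],   # phi_twist[s, s'], diagonal normalized to 1;
                          [1.10, 1.00]])  # off-diagonal values are purely illustrative
    lam_M = 0.02                          # illustrative lambda_M

    A = T * phi_twist - lam_M             # lambda_M is subtracted from every entry

    print(A.sum(axis=1))                  # row sums are no longer 0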

 

An Example

I conclude by working through an extended example showing how to solve for each of the terms in a simple model. Think about a Vasicek (1977) interest rate model: let X_t be a risk factor governed by the following scalar Ito diffusion. I choose this model so that I can verify all of my solutions by hand using existing techniques.

    \begin{align*} dX_t \ &= \ \beta_X(X_t) \cdot dt \ + \ \sigma_X(X_t) \cdot dB_t \\ \beta_X(x) \ &= \ \bar{\beta}_X \ - \ \beta_X \cdot x \\ \sigma_X(x) \ &= \ \sigma_X \end{align*}

Let M_t=\exp \{A_t\}, where A_t solves the following Ito diffusion.

    \begin{align*} dA_t \ &= \ \beta_A(X_t) \cdot dt \ + \ \sigma_A(X_t) \cdot dB_t \\ \beta_A(x) \ &= \ \bar{\beta}_A \ - \ \beta_A \cdot x \\ \sigma_A(x) \ &= \ \sigma_A \end{align*}

Thus (X_t,M_t) are described by parameter vector \Theta:

    \begin{align*} \Theta \ &= \ \begin{bmatrix} \beta_X & \beta_A & \bar{\beta}_X & \bar{\beta}_A & \sigma_X & \sigma_A \end{bmatrix} \end{align*}

We need to restrict \Theta to ensure stationarity; in particular, \beta_X > 0 so that X_t mean reverts. Guessing an exponentially affine eigen-function \phi_M(x) = e^{\kappa_M \cdot x} and matching coefficients to ensure that \lambda_M does not move with x yields the following characterization of \kappa_M.

    \begin{align*} \kappa_M \ &= \ - \ \frac{\beta_A}{\beta_X} \end{align*}

Substituting \kappa_M back into the formula for \lambda_M yields:

    \begin{align*} \begin{split} \lambda_M \ &= \ \left( \ \bar{\beta}_A \ + \ \frac{\sigma_A^2}{2} \ \right) \\ &\qquad \qquad + \ \left( \ \bar{\beta}_X \ + \ \sigma_A \cdot \sigma_X \ \right) \cdot \kappa_M \\ &\qquad \qquad \qquad  + \ \left( \ \frac{\sigma_X^2}{2} \ \right) \cdot \kappa_M^2 \end{split} \end{align*}

Repeating the same steps for M_t^2 =\exp\{2 \cdot A_t\} yields:

    \begin{align*} \kappa_{M^2} \ &= \ - \ \frac{2 \cdot \beta_A}{\beta_X} \\ \lambda_{M^2} \ &= \ 2 \cdot \lambda_M \ + \ \sigma_A^2 \ + \ \left( \ \sigma_A \cdot \sigma_X \ \right) \cdot \kappa_{M^2} \ + \ \left( \ \frac{\sigma_X^2}{4} \ \right) \cdot \kappa_{M^2}^2 \end{align*}
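
Since \kappa_M and \lambda_M are available in closed form, they are easy to verify by simulation. The Python sketch below checks the identity implied by the decomposition, \mathbb{E}[M_{0 \to h} \cdot e^{\kappa_M \cdot X_h}] = e^{\lambda_M \cdot h + \kappa_M \cdot X_0}, along with its M^2 analogue, under a parameterization that I made up for the purpose:

    import numpy as np

    bbar_X, b_X, s_X = 0.05, 0.50, 0.10   # state: dX = (bbar_X - b_X * X) dt + s_X dB
    bbar_A, b_A, s_A = -0.03, 0.20, 0.15  # log SDF: dA = (bbar_A - b_A * X) dt + s_A dB

    # Closed-form solutions from the text.
    kappa_M = -b_A / b_X
    lambda_M = (bbar_A + 0.5 * s_A**2
                + (bbar_X + s_A * s_X) * kappa_M
                + 0.5 * s_X**2 * kappa_M**2)
    kappa_M2 = -2.0 * b_A / b_X
    lambda_M2 = (2.0 * lambda_M + s_A**2
                 + (s_A * s_X) * kappa_M2
                 + 0.25 * s_X**2 * kappa_M2**2)

    # Euler discretization of (X_t, A_t), both driven by the same Brownian motion.
    rng = np.random.default_rng(0)
    n, steps, h, x0 = 500_000, 200, 1.0, 0.05
    dt = h / steps
    x = np.full(n, x0)
    a = np.zeros(n)                       # A_0 = 0, so M_0 = 1
    for _ in range(steps):
        dB = np.sqrt(dt) * rng.standard_normal(n)
        a += (bbar_A - b_A * x) * dt + s_A * dB
        x += (bbar_X - b_X * x) * dt + s_X * dB

    # Each pair should agree up to Monte Carlo and discretization error.
    print(np.exp(a + kappa_M * x).mean(), np.exp(lambda_M * h + kappa_M * x0))
    print(np.exp(2 * a + kappa_M2 * x).mean(), np.exp(lambda_M2 * h + kappa_M2 * x0))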

Exercise (Offsetting Shocks): If \rho is the standard time preference parameter, when would \lambda_M = \rho?

Exercise (Stochastic Volatility): Add a Feller square root term to allow for stochastic volatility, à la the Cox, Ingersoll and Ross (1985) interest rate model.

    \begin{align*} dX_t \ &= \ \beta_X(X_t) \cdot dt \ + \ \sigma_X(X_t) \cdot dB_t \\ \beta_X(x) \ &= \ \bar{\beta}_X \ - \ \beta_X \cdot x \\ \sigma_X(x) \ &= \ \sigma_X \cdot \sqrt{x} \end{align*}

    \begin{align*} dA_t \ &= \ \beta_A(X_t) \cdot dt \ + \ \sigma_A(X_t) \cdot dB_t \\ \beta_A(x) \ &= \ \bar{\beta}_A \ - \ \beta_A \cdot x \\ \sigma_A(x) \ &= \ \sigma_A \cdot \sqrt{x} \end{align*}

What are \kappa_M and \lambda_M?

  1. Note: The results in this post stem from joint work I am conducting with Anmol Bhandari for our paper “Model Selection Using the Term Structure of Risk”. In this paper, we characterize the maximum Sharpe ratio allowed by an asset pricing model at each and every investment horizon. Using this cross-horizon bound, we develop a macro-finance model identification toolkit. ↩
  2. e.g., think of the state space needed in the Campbell and Cochrane (1999) habit model. ↩
  3. Investment horizon symmetry is an unexplored prediction of many asset pricing theories. Asset pricing models characterize how much a trader needs to be compensated in order to hold 1 unit of risk for 1 unit of time. The standard approach to testing these models is to fix the unit of time and then look for incorrectly priced packets of risk. e.g., Roll (1981) looked at the spread in 1 month holding period returns on 10 portfolios of NYSE firms sorted by market cap and found that small firms earned abnormal excess returns relative to the CAPM. Yet, I could just as easily ask the question: Given a model, how much more does a trader need to be compensated for her to hold the same 1 unit of risk for an extra 1 unit of time? This inversion is well defined as asset pricing models possess investment horizon symmetry. Models hold at each and every investment horizon running from 1 second to 1 year to 1 century and everywhere in between. To illustrate this point via an absurd case, John Cochrane writes in his textbook (Asset Pricing (2005), Section 9.3.) that according to the consumption CAPM ‘…if stocks go up between 12:00 and 1:00, it must be because (on average) we all decided to have a big lunch.’ ↩
  4. See Model Selection Using the Term Structure of Risk. ↩
  5. This class of models allows for features such as rare disasters, recursive preferences and habit formation among others… ↩
  6. Really, this is just the Oskar Perron version of the theorem. ↩
  7. For an introduction to Perron-Frobenius theory, see MacCluer (2000). ↩
