Research Notebook

Digesting the Hansen and Scheinkman Multiplicative Decomposition of the SDF

July 12, 2011 by Alex

Introduction [1]

I give some intuition behind the multiplicative decomposition of the stochastic discount factor M_{t \to t+h} introduced in Hansen and Scheinkman (2009). The economics underlying the original Hansen and Scheinkman (2009) results was not clear to me during my initial readings. This post collects my efforts to interpret these mathematical ideas in a sensible way.

Below I formally state the decomposition.

Theorem (Hansen and Scheinkman Decomposition): Suppose that \phi_M is a principal eigenfunction with eigenvalue \lambda_M for the extended generator of the stochastic discount factor M. Then this multiplicative functional can be decomposed as:

    \begin{align*} M_{t \to t+h} \ &= \ e^{\lambda_M \cdot h} \cdot \left( \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \right) \cdot \hat{M}_{t \to t+h} \end{align*}

where \hat{M}_{t \to t+h} is a local martingale.

 

The stochastic discount factor M_{t \to t+h} dictates how to discount cashflows occurring h periods in the future in state X_{t+h}. Roughly speaking, Hansen and Scheinkman (2009) factors M_{t \to t+h} into 3 different pieces: a state independent component e^{\lambda_M \cdot h}, an investment horizon independent component \phi_M(X_t)/\phi_M(X_{t+h}), and a white noise component \hat{M}_{t \to t+h}.

Thus, you should think of \lambda_M as a generalized time preference parameter. \lambda_M will generally be negative, so e^{\lambda_M \cdot h} is the continuous time representation of the state independent discount rate dictated by an asset pricing model. The ratio \phi_M(X_t) / \phi_M(X_{t+h}) captures the rate at which I discount payments at time t+h given the state today at time t and the state at time t+h. This ratio is independent of h, meaning that for any horizons h and h' with X_{t+h} = X_{t+h'} we have:

    \begin{align*} \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \ &= \ \frac{\phi_M(X_t)}{\phi_M(X_{t+h'})} \end{align*}

Finally, \hat{M}_{t \to t+h} represents a random noise component with \mathbb{E}[\hat{M}_{t \to t+h}] = 1 and independent increments.

 

Motivation

The Hansen and Scheinkman decomposition generalizes the binomial options pricing framework for use in standard asset pricing applications by allowing for more complicated state space features like jumps and time averaging. [2] The main advantages of casting the stochastic discount factor as a multiplicative functional are a) the use of the binomial pricing intuition to understand more complicated asset pricing models and b) the streamlining of the econometrics needed to compare excess returns at different horizons. [3]

To illustrate the basic intuition behind this analogy, I work through the Black, Derman and Toy (1990) model.

Example (Binomial Model): Consider a discrete time, binomial world with states X_t \in \{d,u\}, \ \forall t \geq 0 in which traders have an independent probability \pi(x) of entering state x in the next period regardless of the current state. In this world, the price P_{t \to t+1} at time t of a risk free bond that pays out $1 at time t+1 is given by the expression:

    \begin{align*} P_{t \to t+1} \ &= \ \frac{\pi(u) \cdot 1 + \pi(d) \cdot 1}{1 + r^f_{t+1}} \end{align*}

This 1 step ahead pricing rule applies at each and every starting date t. All pricing computations at longer horizons are built up from this local relationship based on the prevailing short rate r_{t+1}^f.

To solve the model, I need to assume that the short rate r_{t+1}^f process has independent log-normal increments. I could then use the volatility of this process to pin down the values of the short rate for the entire binomial tree.
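To make this concrete, here is a minimal Python sketch of the one-step pricing rule rolled backward through a recombining tree of log-normal short rates. The values r0, sigma and pi_u and the rate rule short_rate are illustrative assumptions of mine, not a calibrated Black, Derman and Toy (1990) model.

```python
import math

# Assumed illustrative inputs (not calibrated): a recombining binomial
# tree where the short rate has log-normal moves around a base rate r0.
r0, sigma, pi_u = 0.05, 0.2, 0.5

def short_rate(t, j):
    """Short rate in node (t, j), where j counts up-moves out of t steps."""
    return r0 * math.exp(sigma * (2 * j - t))

def bond_price(n):
    """Price a zero-coupon bond paying $1 at time n by backward induction,
    applying the one-step rule P = (pi(u)*P_u + pi(d)*P_d) / (1 + r) at each node."""
    values = [1.0] * (n + 1)  # payoff at maturity in every terminal node
    for t in range(n - 1, -1, -1):
        values = [
            (pi_u * values[j + 1] + (1 - pi_u) * values[j]) / (1 + short_rate(t, j))
            for j in range(t + 1)
        ]
    return values[0]

print(bond_price(1), bond_price(3))
```

Every longer-horizon price is built from the same local one-step relationship, which is exactly the intuition the decomposition carries over to richer settings.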

 

In general, models of this sort are easy to solve analytically if the short rate process has log-normal increments. The recent papers Lettau and Wachter (2007), Van Binsbergen, Brandt and Koijen (2010) and Backus, Chernov and Zin (2011) adopt similar approaches and try to extend these insights to equity markets.

Nevertheless, most asset pricing models are not log-normal and do not admit pen-and-paper analysis of their term structure using existing methods. Thus, in order to use cross-horizon predictions to discriminate between alternative models, we must adopt new mathematical tools.

Example (Binomial Model, Ctd…): We use operator methods to factor the discount factor process M_{t \to t+h}, which deflates payments in state X_{t+h} at time horizon t+h back to time t, into 3 pieces: e^{\lambda_M \cdot h}, \tilde{\phi}_M(X_{t+h},X_t) and \hat{M}_{t \to t+h}. The first factor depends only on the investment horizon h, the second factor depends only on the realized states and the third factor is noise, so that M_{t \to t+h} = e^{\lambda_M \cdot h} \cdot \tilde{\phi}_M(X_{t+h},X_t) \cdot \hat{M}_{t \to t+h}.

By visual analogy to the Black, Derman and Toy (1990) model, in a binomial world we can use this decomposition to rewrite the h=1 Euler equation below where the dependence on X_t is implicit:

    \begin{align*} 1 \ &= \ \mathbb{E}_t \left[ \ M_{t \to t+1} \cdot R_{t \to t+1} \ \right] \\ &= \ \frac{\pi(u) \cdot \tilde{\phi}_M(u) \cdot \varepsilon(u) \cdot R(u) + \pi(d) \cdot \tilde{\phi}_M(d) \cdot \varepsilon(d) \cdot R(d)}{1 - \lambda_M} \end{align*}

 

Thus, in the Hansen and Scheinkman (2009) decomposition, - \lambda_M serves as a synthetic risk free rate and the \pi(x) \cdot \tilde{\phi}_M(x) serve as the twisted martingale measure.
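To see the synthetic risk free rate interpretation numerically, here is a toy check with made-up values for \pi, \tilde{\phi}_M and \lambda_M (and the noise terms set to 1): backing the risk free gross return out of the twisted measure makes the h=1 Euler equation hold by construction.

```python
# Hypothetical numbers for illustration only: physical probabilities pi,
# state twists phi_tilde, a synthetic rate lambda_M < 0, and unit noise eps.
pi = {'u': 0.6, 'd': 0.4}
phi_tilde = {'u': 0.9, 'd': 1.2}
lam_M = -0.04
eps = {'u': 1.0, 'd': 1.0}

# The pi(x) * phi_tilde(x) terms act as the twisted (unnormalized) measure.
twisted_mass = sum(pi[s] * phi_tilde[s] for s in pi)

# Back out the state-independent gross return on the risk free bond, then
# check the one-step Euler equation written with the decomposition.
R_f = (1 - lam_M) / twisted_mass
euler = sum(pi[s] * phi_tilde[s] * eps[s] * R_f for s in pi) / (1 - lam_M)
print(R_f, euler)
```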

In my work with Anmol Bhandari [4] we look at a class of models for which \ln \tilde{\phi}_M(x) is affine [5] and show how to use this decomposition to compute a cross-horizon analogue of the Hansen and Jagannathan (1991) volatility bound. This new bound can be used to discriminate between different models which make identical predictions at a particular horizon. The exponentially affine structure is useful as it permits closed form solutions for the moments of M_{t \to t+h}:

    \begin{align*} \mathbb{E}_t[M_{t \to t+h}] \ &\approx \ e^{\lambda_M \cdot h} \cdot \mathbb{E}_t \left[ \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \right] \cdot 1 \\ \mathbb{E}_t[M_{t \to t+h}^2] \ &\approx \ e^{\lambda_{M^2} \cdot h} \cdot \mathbb{E}_t \left[ \frac{\phi_{M^2}(X_t)}{\phi_{M^2}(X_{t+h})} \right] \cdot 1 \end{align*}

In the next 2 sections, I walk through the economics governing the \lambda_M and \phi_M terms.

 

Time Preference

Where does \lambda_M come from? In the original article, the authors refer to \lambda_M as the principal eigenvalue of the extended generator of M; however, \lambda_M has a well-defined meaning without ever subscribing to Perron-Frobenius theory. \lambda_M is a generalization of the time preference parameter dictated by an asset pricing model.

Consider the following thought experiment which casts the \lambda_M term as the time preference parameter plus an extra Jensen inequality term.

Example (Generalized Time Preference): Suppose that an agent with power utility u(c) = c^{1-\gamma} has preferences over a stream of consumption C_1, C_2, C_3, \ldots and that for each period t, C_t = 100 with probability 0.95, while the remaining 5\% of the time C_t = 50 or C_t = 150 with equal probability. While \mathbb{E}_t[C_{t+1}] = 100, the certainty equivalent satisfies \mathbb{E}_t^{c.e.}[C_{t+1}] < 100^{1-\gamma} = u(\mathbb{E}_t[C_{t+1}]).

In fact, with probability 0.05 the agent will get a payout worth:

    \begin{align*} \mathbb{E}_t^{c.e.}[C_{t+1} \mid C_{t+1} \neq 100 ] \ &= \ \frac{50^{1-\gamma}}{2} + \frac{150^{1-\gamma}}{2} \end{align*}

Let’s call this certainty equivalent gap \delta:

    \begin{align*} \delta \ &= \ \mathbb{E}_t^{c.e.}[C_{t+1} \mid C_{t+1} \neq 100 ] \ - \ 100^{1-\gamma} \end{align*}

\lambda_M should then include both time preference, \rho, and also the expected Jensen’s inequality loss:

    \begin{align*} \lambda_M \ &= \ \rho \ + \ 0.05 \cdot \delta \end{align*}
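Plugging in the numbers from the example (with assumed values for \gamma and \rho; \gamma < 1 keeps u(c) = c^{1-\gamma} concave, so the gap \delta is negative and \lambda_M sits below \rho):

```python
# gamma and rho are assumed values for illustration, not taken from the text.
gamma, rho = 0.5, -0.02

def u(c):
    return c ** (1 - gamma)   # power utility

# Certainty-equivalent utility of the tail outcomes (C = 50 or 150, equally likely)
ce_tail = 0.5 * u(50) + 0.5 * u(150)
delta = ce_tail - u(100)      # the certainty equivalent gap
lam_M = rho + 0.05 * delta    # time preference plus expected Jensen loss
print(delta, lam_M)
```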

 

Thus, in a more general framework, we should expect \lambda_M to have roughly the following form:

    \begin{align*} \lambda_M \ &= \ \rho \ + \ f(\sigma_M^2, \sigma_X^2, \sigma_{M \times X}) \end{align*}

where f is an affine function. Heuristically, the \sigma_X component will capture how volatile the state space is, while the \sigma_M component will capture how heavily I need to discount this consumption stream due to Jensen’s inequality.

 

State Dependence

Next, in order to capture the dependence of the discount factor M_{t \to t+h} on the current and future state (X_t,X_{t+h}), Hansen and Scheinkman (2009) downshift to continuous time and apply the Perron-Frobenius theorem to the infinitesimal generator of the discount factor. When applied to transition probability matrices, Perron-Frobenius theory implies that the largest eigen-pair dominates the behavior of a stochastic process as h \to \infty. Hansen and Scheinkman use this h \to \infty limiting result to argue that the ratio \phi_M(X_t)/\phi_M(X_{t+h}), built from the principal eigenfunction of the generator of the discount factor M, is a good choice for the state dependent component of M_{t \to t+h}.

It is important to note that Perron-Frobenius theory is only a modeling tool in the Hansen and Scheinkman (2009) construction, not a critical feature of their results. There may well be other reasonable choices for the state dependent component of M_{t \to t+h}. In its simplest form [6], the result can be written as:

Theorem (Perron-Frobenius): The largest eigenvalue \lambda of a positive square matrix A is both simple and positive and corresponds to a positive eigenvector \phi. All other eigenvalues are smaller in absolute value. [7]
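As a quick numerical illustration of the theorem (not part of the original construction), power iteration on an arbitrary strictly positive matrix of my choosing recovers the dominant eigen-pair, and the eigenvector comes out strictly positive, as Perron-Frobenius promises:

```python
import numpy as np

# A strictly positive matrix (entries chosen arbitrarily for illustration).
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.4],
              [0.1, 0.6, 0.2]])

# Power iteration: repeatedly apply A and renormalize; the iterate converges
# to the eigenvector of the largest (Perron) eigenvalue.
phi = np.ones(3)
for _ in range(200):
    phi = A @ phi
    phi /= np.linalg.norm(phi)
lam = phi @ A @ phi   # Rayleigh quotient approximates the Perron eigenvalue

print(lam, phi)
```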

 

In order to use this theorem, I need to have a positive square matrix to operate on. While strictly positive, M_{t \to t+h} is not a square matrix; however, its infinitesimal generator is. Heuristically, you can think about the infinitesimal generator as encoding the transition probability matrix under the equivalent martingale measure deflated by the time preference parameter.

Definition (Infinitesimal Generator): The infinitesimal generator \mathbb{A} of an Ito diffusion \{ X_t \} in \mathcal{R}^n is defined by:

    \begin{align*} \mathbb{A}[ f(x)] \ &= \ \lim_{h \searrow 0} \ \frac{\mathbb{E}_x[ f(X_h) ] - f(x)}{h}, \end{align*}

where the set of functions f: \mathcal{R}^n \mapsto \mathcal{R} such that the limit exists at x is denoted by \mathcal{D}_A(x).

 

In words, the infinitesimal generator of the discount factor M_{t \to t+h} captures how my valuation of a $1 payment in, say, the up state u will change if I move the payment from h=1 period in the future to h=2 periods in the future. To get a feel for what the infinitesimal generator captures, consider the following short example using a 2 state Markov chain. First, I define the physical transition intensity matrix for the Markov process X_t.

Example (Markov Process w/ 2 States): Consider a 2 state Markov chain with states X_t \in \{u,d\}. First, consider the physical evolution of the stochastic process X_t, which is governed by a 2 \times 2 intensity matrix \mathbb{T}. An intensity matrix encodes all of the transition probabilities: the matrix e^{h \cdot \mathbb{T}} is the matrix of transition probabilities over a horizon h. Since each row of the transition probability matrix e^{h \cdot \mathbb{T}} must sum to 1, each row of the transition intensity matrix \mathbb{T} must sum to 0.

    \begin{align*} \mathbb{T} \ &= \ \begin{bmatrix} \tau(u \mid u) & \tau(d \mid u) \\ \tau(u \mid d) & \tau(d \mid d) \end{bmatrix} \end{align*}

The diagonal entries are nonpositive and represent minus the intensity of jumping from the current state to a new one. The remaining row entries, appropriately scaled, represent the conditional probabilities of jumping to the respective states. For concreteness, the following parameter values would suffice:

    \begin{align*} \mathbb{T} \ &= \ \begin{bmatrix} -0.10 & 0.10 \\ 0.05 & -0.05 \end{bmatrix} \end{align*}
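Using this concrete \mathbb{T}, a short Python check (with a hand-rolled truncated-series matrix exponential, fine for such small matrices) confirms both claims: each row of e^{h \cdot \mathbb{T}} sums to 1, and (e^{h \cdot \mathbb{T}} - I)/h recovers \mathbb{T} as h shrinks, matching the generator definition above.

```python
import numpy as np

# The intensity matrix from the example.
T_mat = np.array([[-0.10,  0.10],
                  [ 0.05, -0.05]])

def expm(M, terms=40):
    """Truncated Taylor series for the matrix exponential (adequate for small M)."""
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

h = 1.0
P_h = expm(h * T_mat)          # transition probabilities over horizon h
print(P_h, P_h.sum(axis=1))    # rows sum to 1

# The generator is the h -> 0 derivative of the transition operator.
h_small = 1e-4
approx = (expm(h_small * T_mat) - np.eye(2)) / h_small
print(approx)
```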

 

Next, I want to show how to modify this transition intensity matrix \mathbb{T} to describe the local evolution of the discount factor process M_t. To do this, I first need to have an asset pricing model in mind, and I use a standard CRRA power utility model with risk aversion parameter \gamma as in Breeden (1979) where X_t is the log of the expected consumption growth.

Example (Markov Process w/ 2 States, Ctd…): Intuitively, I know that every period I push the payment out into the future, I will end up discounting the payment by an additional e^{\lambda_M}. However, I know that I will also have to twist \mathbb{T} from the physical measure over to the risk neutral measure. Thus, the resulting generator will look something like:

    \begin{align*} \mathbb{A} \ &= \ \begin{bmatrix} \tau(u \mid u) \cdot \tilde{\phi}_M(u \mid u) & \tau(d \mid u) \cdot \tilde{\phi}_M(d \mid u) \\ \tau(u \mid d) \cdot \tilde{\phi}_M(u \mid d) & \tau(d \mid d) \cdot \tilde{\phi}_M(d \mid d) \end{bmatrix} \ - \ \lambda_M \end{align*}

If we assume that \tilde{\phi}_M(s \mid s) = 1, so that there is no twisting when the state does not change, then we have:

    \begin{align*} \alpha(s' \mid s) \ &= \ \begin{cases} \tau(s' \mid s) - \lambda_M &\text{ if } s' = s \\ \tau(s' \mid s) \cdot \tilde{\phi}_M(s' \mid s) - \lambda_M &\text{ if } s' \neq s \end{cases} \end{align*}

Note that the rows of \mathbb{A} will in general not sum to 0, unlike those of the physical transition intensity matrix \mathbb{T}.

 

An Example

I conclude by working through an extended example showing how to solve for each of the terms in a simple model. Think about a Vasicek (1977) interest rate model: let X_t be a risk factor governed by the following scalar Ito diffusion. I choose this model so that I can verify all of my solutions by hand using existing techniques.

    \begin{align*} dX_t \ &= \ \beta_X(X_t) \cdot dt \ + \ \sigma_X(X_t) \cdot dB_t \\ \beta_X(x) \ &= \ \bar{\beta}_X \ - \ \beta_X \cdot x \\ \sigma_X(x) \ &= \ \sigma_X \end{align*}

Let M_t=\exp \{A_t\}, where A_t solves the following Ito diffusion:

    \begin{align*} dA_t \ &= \ \beta_A(X_t) \cdot dt \ + \ \sigma_A(X_t) \cdot dB_t \\ \beta_A(x) \ &= \ \bar{\beta}_A \ - \ \beta_A \cdot x \\ \sigma_A(x) \ &= \ \sigma_A \end{align*}

Thus (X_t,M_t) are described by parameter vector \Theta:

    \begin{align*} \Theta \ &= \ \begin{bmatrix} \beta_X & \beta_A & \bar{\beta}_X & \bar{\beta}_A & \sigma_X & \sigma_A \end{bmatrix} \end{align*}
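Before solving for \kappa_M and \lambda_M, it can help to see the state dynamics in action. The following Euler-Maruyama sketch simulates the mean reverting factor X_t; the parameter values below are made up for illustration. The long sample average settles near the stationary mean \bar{\beta}_X / \beta_X.

```python
import math
import random

# Assumed parameter values for illustration only (beta_X > 0 for stationarity).
beta_X_bar, beta_X, sigma_X = 0.02, 0.5, 0.1
dt, n_steps = 0.01, 200_000

random.seed(0)
x, path_sum = 0.0, 0.0
for _ in range(n_steps):
    # Euler-Maruyama step for dX = (beta_X_bar - beta_X * X) dt + sigma_X dB
    x += (beta_X_bar - beta_X * x) * dt + sigma_X * math.sqrt(dt) * random.gauss(0, 1)
    path_sum += x

mean_x = path_sum / n_steps
print(mean_x)   # should hover near beta_X_bar / beta_X = 0.04
```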

We need to restrict \Theta to ensure stationarity. Guessing an exponentially affine eigenfunction \phi_M(x) = e^{\kappa_M \cdot x} and matching coefficients to ensure that \lambda_M does not move with x yields the following characterization of \kappa_M.

    \begin{align*} \kappa_M \ &= \ - \ \frac{\beta_A}{\beta_X} \end{align*}

Substituting back into the formula for \lambda_M yields.

    \begin{align*} \begin{split} \lambda_M \ &= \ \left( \ \bar{\beta}_A \ + \ \frac{\sigma_A^2}{2} \ \right) \\ &\qquad \qquad + \ \left( \ \bar{\beta}_X \ + \ \sigma_A \cdot \sigma_X \ \right) \cdot \kappa_M \\ &\qquad \qquad \qquad  + \ \left( \ \frac{\sigma_X^2}{2} \ \right) \cdot \kappa_M^2 \end{split} \end{align*}

We know that M_t^2 =\exp\{2 \cdot A_t\}, so repeating the same matching argument with the drift and volatility of A doubled gives:

    \begin{align*} \kappa_{M^2} \ &= \ - \ \frac{2 \cdot \beta_A}{\beta_X} \\ \lambda_{M^2} \ &= \ 2 \cdot \lambda_M \ + \ \sigma_A^2 \ + \ \left( \ \sigma_A \cdot \sigma_X \ \right) \cdot \kappa_{M^2} \ + \ \left( \ \frac{\sigma_X^2}{4} \ \right) \cdot \kappa_{M^2}^2 \end{align*}
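A small numerical check, with assumed values for \Theta, that the shortcut formula for \lambda_{M^2} above agrees with applying the matching-coefficients construction directly to M^2 (drift and volatility of A doubled):

```python
# Assumed parameter values for illustration, with beta_X > 0 for stationarity.
beta_X, beta_A = 0.5, 0.3
beta_X_bar, beta_A_bar = 0.02, -0.04
sigma_X, sigma_A = 0.1, 0.2

kappa_M = -beta_A / beta_X
lam_M = (beta_A_bar + sigma_A**2 / 2
         + (beta_X_bar + sigma_A * sigma_X) * kappa_M
         + (sigma_X**2 / 2) * kappa_M**2)

# Direct construction for M^2 = exp(2 * A): double the drift and volatility
# of A and match coefficients again.
kappa_M2 = -2 * beta_A / beta_X
lam_M2_direct = (2 * beta_A_bar + (2 * sigma_A)**2 / 2
                 + (beta_X_bar + 2 * sigma_A * sigma_X) * kappa_M2
                 + (sigma_X**2 / 2) * kappa_M2**2)

# The shortcut formula quoted in the text.
lam_M2 = (2 * lam_M + sigma_A**2
          + sigma_A * sigma_X * kappa_M2
          + (sigma_X**2 / 4) * kappa_M2**2)

print(lam_M, lam_M2, lam_M2_direct)
```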

Exercise (Offsetting Shocks): If \rho is the standard time preference parameter, when would \lambda_M = \rho?

Exercise (Stochastic Volatility): Add a Feller square root term to allow for stochastic volatility à la the Cox, Ingersoll and Ross (1985) interest rate model.

    \begin{align*} dX_t \ &= \ \beta_X(X_t) \cdot dt \ + \ \sigma_X(X_t) \cdot dB_t \\ \beta_X(x) \ &= \ \bar{\beta}_X \ - \ \beta_X \cdot x \\ \sigma_X(x) \ &= \ \sigma_X \cdot \sqrt{x} \end{align*}

    \begin{align*} dA_t \ &= \ \beta_A(X_t) \cdot dt \ + \ \sigma_A(X_t) \cdot dB_t \\ \beta_A(x) \ &= \ \bar{\beta}_A \ - \ \beta_A \cdot x \\ \sigma_A(x) \ &= \ \sigma_A \cdot \sqrt{x} \end{align*}

What are \kappa_M and \lambda_M?

  1. Note: The results in this post stem from joint work I am conducting with Anmol Bhandari for our paper “Model Selection Using the Term Structure of Risk”. In this paper, we characterize the maximum Sharpe ratio allowed by an asset pricing model at each and every investment horizon. Using this cross-horizon bound, we develop a macro-finance model identification toolkit.
  2. e.g., think of the state space needed in the Campbell and Cochrane (1999) habit model.
  3. Investment horizon symmetry is an unexplored prediction of many asset pricing theories. Asset pricing models characterize how much a trader needs to be compensated in order to hold 1 unit of risk for 1 unit of time. The standard approach to testing these models is to fix the unit of time and then look for incorrectly priced packets of risk. e.g., Roll (1981) looked at the spread in 1 month holding period returns on 10 portfolios of NYSE firms sorted by market cap and found that small firms earned abnormal excess returns relative to the CAPM. Yet, I could just as easily ask the question: Given a model, how much more does a trader need to be compensated for her to hold the same 1 unit of risk for an extra 1 unit of time? This inversion is well defined as asset pricing models possess investment horizon symmetry. Models hold at each and every investment horizon running from 1 second to 1 year to 1 century and everywhere in between. To illustrate this point via an absurd case, John Cochrane writes in his textbook (Asset Pricing (2005), Section 9.3.) that according to the consumption CAPM ‘…if stocks go up between 12:00 and 1:00, it must be because (on average) we all decided to have a big lunch.’
  4. See Model Selection Using the Term Structure of Risk.
  5. This class of models allows for features such as rare disasters, recursive preferences and habit formation among others…
  6. Really, this is just the Oskar Perron version of the theorem.
  7. For an introduction to Perron-Frobenius theory, see MacCluer (2000).

Filed Under: Uncategorized

Plotting Geographic Densities in R

July 11, 2011 by Alex

I show how (here) to create a heat map of the intensity of home purchases from 2000 to 2008 in Los Angeles County, CA using a random sample of 5,000 observations from the county deeds records. I build off of the code created by David Kahle for Hadley Wickham’s GGPlot2 Case study competition. I use the results of the geocoding procedure that I outline here as the input data.

Filed Under: Uncategorized

How to Geocode Addresses Using the Yahoo! PlaceFinder API

July 11, 2011 by Alex

This post contains a link (here) to a python program which geocodes a large number of addresses using the Yahoo! PlaceFinder API. This program manages both the use of the API IDs as well as which files have been completed. The code can also be easily parallelized. The code makes use of earlier work I had done in R to accomplish the same task.

Filed Under: Uncategorized

Random Effects Decomposition

June 27, 2011 by Alex

Motivation

I work through the error components econometric model outlined in Amemiya (1985). I use Hayashi (2000) as a reference text. I work through this example because I use this model in my working paper with Chris Mayer on bubble identification and I would like to work out the details, as I didn’t spend much time on these sorts of models in my core econometrics courses.

In my paper with Chris, I develop a method of identifying relative mispricings between city specific markets in the US residential housing market using flows of speculative buyers between cities and assuming that city sizes are exogenous. Previously, analysts suspected that the housing bubble was due to credit supply factors. I use a random effects model to gauge the relative importance of 1) aggregate credit supply factors and 2) cross-city speculator flows in explaining mispricing in the housing market in our sample.

 

Econometric Framework

I characterize the random effects error components estimator outlined in Amemiya (1985, Ch. 6). Consider a balanced panel with N panels and T observations per panel. I study a regression specification of the following type:

(1)   \begin{align*} y_{n,t} \ &= \ \langle X_{n,t} \mid \beta \rangle \ + \ \mu_n \ + \ \lambda_t \ + \ \varepsilon_{n,t} \end{align*}

 

I can vectorize this specification by stacking each of these N \times T equations:

(2)   \begin{align*} \begin{split} \mathcal{U} \ &= \ \langle I_N \otimes 1_T \mid \mu \rangle \ + \ \langle 1_N \otimes I_T \mid \lambda \rangle \ + \ \mathcal{E} \\ Y \ &= \ \langle X \mid \beta \rangle \ + \ \mathcal{U} \end{split} \end{align*}

 

Assumptions

I make the following assumptions about the shape of the errors:

Assumption: (Error Structure) I assume that:

1) Unbiased-ness: \langle \mu_n \rangle = 0, \langle \lambda_t \rangle = 0 and \langle \varepsilon_{n,t} \rangle = 0

2) White-Noise: \langle \mu_n \mid \lambda_t \rangle = 0,  \langle \lambda_t \mid \varepsilon_{n,t} \rangle = 0 and \langle \varepsilon_{n,t} \mid \mu_n \rangle = 0

3) Homoskedasticity: \vert \mu \rangle \langle \mu \vert = I_N \cdot \sigma^2_\mu, \vert \lambda \rangle \langle \lambda \vert = I_T \cdot \sigma^2_\lambda and \vert \varepsilon \rangle \langle \varepsilon \vert = I_{N \times T} \cdot \sigma^2_{\varepsilon}

 

What are the key take-aways from these assumptions? First, assumption 1) means that there is a constant term in the explanatory X variables. Assumption 2) is just the standard white noise assumption. Assumption 3) is the key restriction. This assumption says that the within and between effects are independent across time and panels respectively. The estimator I define below allows me to learn the values of \sigma_\mu^2, \sigma_\lambda^2 and \sigma_\varepsilon^2.

 

Estimation

How do I go about estimating these 3 objects? First, I define some notation to make my life a bit easier and stave off carpal tunnel for a few more semesters:

(3)   \begin{align*} \begin{split} F \ &= \ \vert I_N \otimes 1_T \rangle \langle I_N \otimes 1_T \vert \\ G \ &= \ \vert 1_N \otimes I_T \rangle \langle 1_N \otimes I_T \vert \end{split} \end{align*}

 

Also, let H be an (N \cdot T) \times (N \cdot T - N - T + 1) unit matrix. I name the error covariance matrix \Omega, and then characterize it as a linear function of the 3 variance terms of interest:

(4)   \begin{align*} \begin{split} \Omega \ &= \ \vert \mathcal{U} \rangle \langle \mathcal{U} \vert \\ &= \ \sigma_\mu^2 \cdot F \ + \ \sigma^2_\lambda \cdot G \ + \ \sigma_\varepsilon^2 \cdot I_{N \times T} \end{split} \end{align*}

 

I can write out the inverse of the error covariance matrix \Omega as follows:

(5)   \begin{align*} \begin{split} \Omega^{-1} \ &= \ \frac{1}{\sigma_\varepsilon^2} \cdot \left( I_{N \times T} - \gamma_1 \cdot F + \gamma_2 \cdot G + \gamma_3 \cdot H \right) \\ \gamma_1 \ &= \ \frac{\sigma_\mu^2}{\sigma_\varepsilon^2 + T \cdot \sigma_\mu^2} \\ \gamma_2 \ &= \ \frac{\sigma_\lambda^2}{\sigma_\varepsilon^2 + N \cdot \sigma_\lambda^2} \\ \gamma_3 \ &= \ \gamma_1 \cdot \gamma_2 \cdot \left( \ \frac{2 \cdot \sigma_\varepsilon^2 + T \cdot \sigma_\mu^2 + N \cdot \sigma_\lambda^2}{\sigma_\varepsilon^2 + T \cdot \sigma_\mu^2 + N \cdot \sigma_\lambda^2} \ \right) \end{split} \end{align*}

 

This formulation shows that the sample error covariance matrix will provide unbiased and consistent estimates if both N \to \infty and T \to \infty. In this note, I am not going to worry about finding the most efficient estimator for the parameters. Next, I want to decompose the error covariance matrix into within, between and idiosyncratic components. To do this I need 1 last piece of notation:

(6)   \begin{align*} Q \ &= \ I \ - \ \frac{F}{T} \ - \ \frac{G}{N} \ + \ \frac{H}{N \cdot T} \end{align*}

 

Think about this as an orthogonal decomposition of a unitary error covariance matrix into each of the 3 components: within, between and idiosyncratic. Then, using this term, Amemiya (1971) shows that the following estimators for the parameter vector \begin{bmatrix} \sigma_\mu^2 & \sigma_\lambda^2 & \sigma_\varepsilon^2 \end{bmatrix} are consistent:

(7)   \begin{align*} \begin{split} \hat{\mathcal{U}} \ &= \ Y \ - \ \langle X \mid \hat{\beta} \rangle \\ \hat{\sigma}_{\varepsilon}^2 \ &= \ \frac{\langle \hat{\mathcal{U}} \mid \langle Q \mid \hat{\mathcal{U}} \rangle \rangle}{(N-1) \cdot (T-1)} \\ \hat{\sigma}_{\mu}^2 \ &= \ \frac{\langle \hat{\mathcal{U}} \mid \langle \frac{T-1}{T} \cdot F - \frac{T-1}{N \cdot T} \cdot H - Q \mid \hat{\mathcal{U}} \rangle \rangle}{T \cdot (N-1) \cdot (T-1)} \\ \hat{\sigma}_{\lambda}^2 \ &= \ \frac{\langle \hat{\mathcal{U}} \mid \langle \frac{N-1}{N} \cdot G - \frac{N-1}{N \cdot T} \cdot H - Q \mid \hat{\mathcal{U}} \rangle \rangle}{T \cdot (N-1) \cdot (T-1)} \end{split} \end{align*}
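To see these projection matrices in action, the sketch below builds F, G and Q with Kronecker products for a small panel. One caveat: I read the H appearing in the formula for Q as the (N \cdot T) \times (N \cdot T) all-ones matrix (the grand-mean term of the within transformation); that reading is my assumption about the notation. With it, Q is an orthogonal projection with rank (N-1) \cdot (T-1), matching the degrees of freedom in the \hat{\sigma}_\varepsilon^2 estimator.

```python
import numpy as np

N, T = 4, 3  # small balanced panel for illustration

# F/T projects onto panel means, G/N onto time means.
v_F = np.kron(np.eye(N), np.ones((T, 1)))   # I_N (x) 1_T
v_G = np.kron(np.ones((N, 1)), np.eye(T))   # 1_N (x) I_T
F = v_F @ v_F.T
G = v_G @ v_G.T

# Assumption: the H in the Q formula is the all-ones NT x NT matrix,
# so H / (N*T) projects onto the grand mean.
H = np.ones((N * T, N * T))

Q = np.eye(N * T) - F / T - G / N + H / (N * T)

print(np.allclose(Q @ Q, Q), np.trace(Q))   # idempotent, rank (N-1)(T-1)
```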

Filed Under: Uncategorized

Recurrence in 1D, 2D and 3D Brownian Motion

June 26, 2011 by Alex

Introduction

I show that Brownian motion is recurrent for dimensions d=1 and d=2 but transient for dimensions d \geq 3. Below, I give the technical definition of a recurrent stochastic process:

Definition: (Recurrent Stochastic Process) Let X(t) be a stochastic process. We say that X(t) is recurrent if for any \varepsilon > 0 and any point \bar{x} \in \mathtt{Dom}(X) we have that:

(1)   \begin{align*} \infty \ &= \ \int_0^\infty \ \mathtt{Pr} \left[ \ \left\Vert X(t) - \bar{x} \right\Vert < \varepsilon  \mid X(0) = \bar{x} \ \right] \cdot dt \end{align*}

In words, this definition says that if the stochastic process X(t) starts out at a point \bar{x}, then if we watch the process forever it will return to within some tiny region of \bar{x} an infinite number of times.

Motivating Example

Before I go about proving that Brownian motion is recurrent or transient in different dimensions, I first want to nail down the intuition of what it means for a stochastic process to be recurrent in a more physical sense. To do this, I use the standard real world example for random walks: a drunk leaving a bar.

(Figure: Arnold’s lattice world for the case of 2 dimensions.)

Example: (A Drunkard’s Flight) Suppose that Arnold is drunk and leaving his local bar. What’s more, Arnold is really inebriated and can only muster enough coordination to move 1 step backwards or 1 step forwards each second. Because he is so drunk, he doesn’t have any control over which direction he stumbles, so you can think about him moving backwards and forwards each second with equal probability \pi = 1/2. Thus, Arnold’s position relative to the door of the bar is a stochastic process with independent \pm 1 increments. This process is recurrent if Arnold returns to the bar an infinite number of times as we allow him to stumble around all night. Put differently, if Arnold ever has a last drink for the evening and exits the bar for good, then his stumbling process will be transient.

In the context of this toy example, I show that as I allow Arnold to stumble in more and more different directions (backwards vs. forwards, left vs. right, up vs. down, etc…), his probability of returning to the bar decreases. Namely, if Arnold can only move backwards and forwards, then his stumbling will lead him back to his bar an infinite number of times. If he can move backwards and forwards as well as left and right, he will still wander back to the bar an infinite number of times. However, if Arnold either suddenly grows wings (i.e., can move up or down) or happens to be the Terminator (i.e., can time travel to the future or past), at some point his wandering will lead him away from the bar forever.

 

Outline

First, I state and prove Polya’s Theorem, which characterizes whether or not a random walk on a lattice is recurrent in each dimension d=1,2,3,\ldots. Then, I show how to extend this result to continuous time Brownian motion using the Central Limit Theorem. I attack this recurrence result for continuous time Brownian motion via Polya’s Recurrence Theorem because I think the intuition is much clearer along this route. I find the direct proof in continuous time, which relies on Dynkin’s lemma, a bit obscure; whereas, I have a very good feel for what it means to count paths (i.e., possible random walk trajectories) on a grid.

 

Polya’s Recurrence Theorem

Below, I formulate and prove Polya’s Recurrence Theorem for dimensions d \in \{1,2,3\}.

Theorem: (Polya Recurrence Theorem) Let p(d) be the probability that a random walk on a d dimensional lattice ever returns to the origin. Then, we have that p(1) = p(2) = 1 while p(3) < 1.
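The theorem concerns infinite walks, but a Monte Carlo sketch already suggests the pattern: the fraction of finite lattice walks that find their way back to the origin falls as the dimension grows. The walk lengths, sample sizes and seed below are arbitrary choices of mine, and a finite simulation is suggestive only, not a proof.

```python
import random

def return_fraction(d, walks=2000, max_steps=500, seed=42):
    """Fraction of d-dimensional lattice walks returning to the origin
    within max_steps steps (each step moves +/-1 along one random axis)."""
    rng = random.Random(seed)
    returned = 0
    for _ in range(walks):
        pos = [0] * d
        for _ in range(max_steps):
            axis = rng.randrange(d)           # pick a coordinate direction
            pos[axis] += rng.choice((-1, 1))  # step +/-1 along it
            if all(c == 0 for c in pos):
                returned += 1
                break
    return returned / walks

fracs = [return_fraction(d) for d in (1, 2, 3)]
print(fracs)
```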

 

Intuition

Before I go any further into the maths, I walk through the physical intuition behind the result. First, imagine the case where drunk Arnold can only move forwards and backwards. In order for Arnold to return to the bar door in 2 \cdot s steps [1], he must take the exact same number of forward and backwards steps; i.e., he has to choose a sequence of 2 \cdot s steps such that exactly s of them are forward. There are 2 \cdot s choose s ways he could do this:

(2)   \begin{align*} \mathtt{\# \ returning \ paths} \ &= \ \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \end{align*}

What’s more, I know that the probability of each of the paths Arnold could take is just 1 divided by the total number of paths 2^{2 \cdot s}:

(3)   \begin{align*} \mathtt{Pr[each \ path]} \ &= \ \frac{1}{2^{2 \cdot s}} \end{align*}

Now consider drunk Arnold’s situation in 2 dimensions. Here, he must take the exact same number of steps forwards and backwards as well as the exact same number of steps left and right. Thus, summing over the possible numbers k of left and right steps, the number of ways for Arnold to return to the bar is:

(4)   \begin{align*} \mathtt{\# \ returning \ paths} \ &= \ \sum_{k=0}^s \ \begin{pmatrix} 2 \cdot s \\ k,k,(s-k),(s-k) \end{pmatrix} \end{align*}

What is this sum computing in words? First, suppose that Arnold takes no steps in the left or right directions; then set k=0 and the number of paths he could take back to the bar is equal to the number in the 1-dimensional case. Conversely, if Arnold takes no steps forwards or backwards, set k=s and again you get the 1-dimensional case. Thus, the number of possible paths Arnold can take back to the bar in 2 dimensions is strictly larger than in 1 dimension. However, Arnold can also take paths which move along both axes. This sum first counts up the number of ways he can end up back at his starting point in the left and right directions. Then, it takes the remaining number of steps and counts the number of ways he can use those steps to return to the starting point in the forwards and backwards direction.

Note that this process doesn’t add that many new returning paths for each new dimension. Every time I add a new dimension, I’m certainly adding fewer than 2^s new paths as:

(5)   \begin{align*} m^n \ &= \ \sum_{k_1 + k_2 + \ldots + k_m = n} \ \begin{pmatrix} n \\ k_1, k_2, \ldots, k_m \end{pmatrix} \end{align*}

However, each path now only happens with probability 4^{-2 \cdot s}. The probability of realizing each possible path shrinks geometrically in the number of available directions:

(6)   \begin{align*} \mathtt{Pr[each \ path]}(d) \ &= \ \left(\frac{1}{2 \cdot d}\right)^{2 \cdot s} \end{align*}

Thus, Polya’s Recurrence Theorem stems from the fact that the number of possible paths back to the origin is growing at a rate that is less than the number of all paths; i.e., the wilderness of paths that do not loop back to the origin is increasing faster than the set of paths which do loop back as we add dimensions.

 

Proof

Below, I prove this result 1 dimension at a time:

Proof: (d=1) The probability that Arnold will return to the origin in 2 \cdot s steps is the number of possible paths times the probability that each 1 of those paths occurs:

(7)   \begin{align*} p_{2 \cdot s}(1) \ &= \ \left( \frac{1}{2} \right)^{2 \cdot s} \cdot \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \end{align*}

Next, in order to derive an analytical characterization of this   probability, I use Stirling’s approximation to handle the factorial   terms in the binomial coefficient:

(8)   \begin{align*}   s! \ &\approx \ \sqrt{2 \cdot \pi \cdot s} \cdot e^{-s} \cdot s^s \end{align*}

Using this approximation and simplifying, I find that:

(9)   \begin{align*} \begin{split} p_{2 \cdot s}(1) \ &= \ \left( \frac{1}{2} \right)^{2 \cdot s} \cdot \frac{(2 \cdot s)!}{s! \cdot (2 \cdot s - s)!} \\ &\approx \ \frac{1}{(\pi \cdot s)^{1/2}} \end{split} \end{align*}
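To see how good this approximation is, the sketch below (function names are my own) computes the exact return probability from equation (7) alongside the Stirling approximation from equation (9); the relative error shrinks roughly like 1/(8 \cdot s):

```python
from math import comb, pi, sqrt

def p_exact(s):
    # exact 1-dimensional return probability after 2*s steps, eq. (7)
    return comb(2 * s, s) / 4 ** s

def p_stirling(s):
    # Stirling approximation from eq. (9)
    return 1 / sqrt(pi * s)

# the two values converge as s grows
for s in (10, 100, 1000):
    print(s, p_exact(s), p_stirling(s))
```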

Thus, if I sum over all possible periods, I get the expected number   of times that drunk Arnold will return to the bar for another night   cap. I find that this infinite sum diverges:

(10)   \begin{align*} \begin{split} p(1) \ &= \ \sum_{s=1}^\infty \ p_{2 \cdot s}(1) \\ &\approx \ \sum_{s=1}^\infty \ \frac{1}{(\pi \cdot s)^{1/2}} \\ &= \ \infty \end{split} \end{align*}
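The divergence in equation (10) can be seen numerically: partial sums of the exact return probabilities keep growing without bound, roughly like 2 \cdot \sqrt{s/\pi}. A minimal sketch (my own illustration):

```python
from math import comb

def p_exact(s):
    # exact 1-dimensional return probability after 2*s steps, eq. (7)
    return comb(2 * s, s) / 4 ** s

def expected_returns(max_s):
    # partial sum of eq. (10): expected number of returns
    # within the first 2*max_s steps
    return sum(p_exact(s) for s in range(1, max_s + 1))

# the partial sums grow without bound
print(expected_returns(100), expected_returns(400))
```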

 

Proof: (d=2) Next, I follow all of the same steps through for the d=2   dimensional case:

(11)   \begin{align*} \begin{split} p_{2 \cdot s}(2) \ &= \ \left( \frac{1}{4} \right)^{2 \cdot s} \cdot \sum_{k=0}^s \ \begin{pmatrix} 2 \cdot s \\ k,k,(s-k),(s-k) \end{pmatrix} \\ &= \ \left( \frac{1}{4} \right)^{2 \cdot s} \cdot \sum_{k=0}^s \ \frac{(2 \cdot s)!}{k! \cdot k! \cdot (s - k)! \cdot (s - k)!} \\ &= \ \left( \frac{1}{4} \right)^{2 \cdot s} \cdot \sum_{k=0}^s \ \frac{(2 \cdot s)!}{s! \cdot s!} \cdot \frac{s! \cdot s!}{k! \cdot k! \cdot (s - k)! \cdot (s - k)!} \\ &= \ \left( \frac{1}{4} \right)^{2 \cdot s} \cdot \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \cdot \sum_{k=0}^s \ \begin{pmatrix} s \\ k \end{pmatrix}^2 \\ &= \ \left[ \left( \frac{1}{2} \right)^{2 \cdot s} \cdot \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \right]^2 \\ &= \ \left[ p_{2 \cdot s}(1) \right]^2 \end{split} \end{align*}
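The collapse of the raw multinomial sum into the square of the 1-dimensional probability is easy to verify numerically. A minimal check (my own illustration, using exact integer arithmetic):

```python
from math import comb, factorial

def p1(s):
    # 1-dimensional return probability after 2*s steps, eq. (7)
    return comb(2 * s, s) / 4 ** s

def p2(s):
    # 2-dimensional return probability via the raw multinomial
    # sum in the first line of eq. (11)
    total = sum(
        factorial(2 * s) // (factorial(k) ** 2 * factorial(s - k) ** 2)
        for k in range(s + 1)
    )
    return total / 4 ** (2 * s)

# the sum collapses to the square of the 1-dimensional probability
for s in (1, 2, 5, 10):
    assert abs(p2(s) - p1(s) ** 2) < 1e-12
```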

The next-to-last equality uses the identity \sum_{k=0}^s \begin{pmatrix} s \\ k \end{pmatrix}^2 = \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix}. Summing over all possible path lengths yields a divergent series:

(12)   \begin{align*} \begin{split} p(2) \ &= \sum_{s=1}^\infty \ p_{2 \cdot s}(2) \\ &\approx \ \sum_{s=1}^\infty \ \frac{1}{\pi \cdot s} \\ &= \ \infty \end{split} \end{align*}

 

Proof: (d=3) The result for d=3 is a bit more complicated as there isn’t a   nice closed form expression for each of the p_{2 \cdot s}(3)   terms. I start by simplifying as far as I can:

(13)   \begin{align*} \begin{split} p_{2 \cdot s}(3) \ &= \ \left( \frac{1}{6} \right)^{2 \cdot s} \cdot \sum_{j,k \mid j+k \leq s} \ \begin{pmatrix} 2 \cdot s \\ k,k, j,j,  (s-k-j),(s-k-j) \end{pmatrix} \\ &= \ \left( \frac{1}{6} \right)^{2 \cdot s} \cdot \sum_{j,k \mid j+k \leq s} \ \frac{(2 \cdot s)!}{k! \cdot k! \cdot j! \cdot j! \cdot (s-j-k)! \cdot (s-j-k)!} \\ &= \ \left( \frac{1}{2} \right)^{2 \cdot s} \cdot \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \cdot \sum_{j,k \mid j+k \leq s} \ \left( \frac{1}{3^s} \cdot \frac{s!}{k! \cdot j! \cdot (s-j-k)!} \right)^2 \end{split} \end{align*}

Next, I apply the Multinomial Theorem: the terms \frac{1}{3^s} \cdot \frac{s!}{k! \cdot j! \cdot (s-j-k)!} sum to 1, and each term is maximized when j=k=s/3. The sum of their squares is therefore bounded by this maximum term, so substituting it in gives an upper bound on the probability p_{2 \cdot s}(3):

(14)   \begin{align*} \begin{split} p_{2 \cdot s}(3) \ &\leq \ \left( \frac{1}{2} \right)^{2 \cdot s} \cdot \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \cdot \left( \frac{1}{3^s} \cdot \frac{s!}{\left[ \left( \frac{s}{3} \right)! \right]^3} \right) \\ &\leq \ \frac{C}{(\pi \cdot s)^{3/2}} \end{split} \end{align*}

where C is a constant that does not depend on s. Summing over all possible path lengths now leads to a convergent series, so the expected number of returns is finite and drunk Arnold will, at some point during the evening, have a final night cap:

(15)   \begin{align*} \begin{split} p(3) \ &= \ \sum_{s=0}^\infty \ p_{2 \cdot s}(3) \\ &< \ \infty \end{split} \end{align*}
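Since there is no closed form here, a numerical sketch helps (my own illustration): computing p_{2 \cdot s}(3) exactly from the multinomial sum in equation (13) shows the partial sums of equation (15) increasing by less and less, consistent with convergence.

```python
from math import factorial

def p3(s):
    # exact 3-dimensional return probability after 2*s steps,
    # via the multinomial sum in eq. (13)
    total = sum(
        factorial(2 * s)
        // (factorial(j) ** 2 * factorial(k) ** 2 * factorial(s - j - k) ** 2)
        for j in range(s + 1)
        for k in range(s + 1 - j)
    )
    return total / 6 ** (2 * s)

# successive partial sums increase by shrinking amounts: the series converges
partials = [sum(p3(s) for s in range(1, n + 1)) for n in (20, 40, 80)]
print(partials)
```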

 

Extension to Brownian Motion

Below, I define Brownian motion in d>1 dimensions and then show how to extend Polya’s Recurrence Theorem from random walks on a lattice to continuous time Brownian motion.

Brownian motion for d>1 dimensions is a natural extension of the  d=1 dimensional case. I give the formal definition below:

Definition: (Multi-Dimensional Brownian Motion) Brownian motion in \mathcal{R}^d is the vector valued process whose coordinates B_1(t), B_2(t), \ldots, B_d(t) are independent 1-dimensional Brownian motions:

(16)   \begin{align*} \mathbf{B}(t) \ &= \ \begin{bmatrix} B_1(t) & B_2(t) & \ldots & B_d(t) \end{bmatrix} \end{align*}
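A minimal simulation sketch of this definition (function names are my own): each coordinate of the path accumulates its own independent Gaussian increments with standard deviation \sqrt{dt}.

```python
import random

def brownian_path(d, n_steps, dt, seed=0):
    # d-dimensional Brownian motion sampled on a grid of spacing dt:
    # each coordinate accumulates independent Norm(0, sqrt(dt)) increments
    rng = random.Random(seed)
    path = [[0.0] * d]
    for _ in range(n_steps):
        path.append([x + rng.gauss(0.0, dt ** 0.5) for x in path[-1]])
    return path

path = brownian_path(d=3, n_steps=1000, dt=0.01)
print(path[-1])  # location of B(10) for this seed
```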

To extend Polya’s Recurrence Theorem to continuous time Brownian  motion, I just need to apply the Central Limit Theorem and then  construct the Brownian motion from the resulting independent  Gaussian increments:

Theorem: (deMoivre-Laplace) Let k_s be the number of successful draws from a binomial distribution in s tries, each succeeding with probability \pi, so that \mathbb{E}(k_s) = s \cdot \pi. Then we can approximate the binomial distribution with a Gaussian distribution, with the approximation becoming exact as s \to \infty:

(17)   \begin{align*} \mathtt{Bin}(s,\pi) \ &\sim \ \mathtt{Norm}\left(s \cdot \pi, \sqrt{s \cdot \pi \cdot (1-\pi)}\right) \end{align*}
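The approximation in equation (17) can be checked directly (my own illustration; the second \mathtt{Norm} argument is treated as a standard deviation, matching the equation): near the mean s \cdot \pi, the binomial probability mass and the Gaussian density are nearly indistinguishable for large s.

```python
from math import comb, exp, pi, sqrt

def binom_pmf(s, prob, k):
    # probability of exactly k successes in s independent tries
    return comb(s, k) * prob ** k * (1 - prob) ** (s - k)

def normal_pdf(mu, sigma, x):
    # Gaussian density with mean mu and standard deviation sigma
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

s, prob = 1000, 0.5
mu, sigma = s * prob, sqrt(s * prob * (1 - prob))
for k in (480, 500, 520):
    print(k, binom_pmf(s, prob, k), normal_pdf(mu, sigma, k))
```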

Lemma: (Levy’s Selector) Suppose that s<t and X(s) and X(t) are random variables  defined on the same sample space such that X(t) - X(s) has a  distribution which is \mathtt{Norm}(0,t-s). Then there exists a  random variable X(\frac{t+s}{2}) such that X(\frac{t+s}{2}) -  X(s) and X(t) - X(\frac{t+s}{2}) are independent with a common  \mathtt{Norm}(0,\frac{t-s}{2}) distribution.
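The lemma can be used constructively: conditional on the two endpoints, the midpoint is Gaussian with mean equal to the average of the endpoints and variance (t-s)/4, which makes the two half-increments independent with common variance (t-s)/2. A minimal sketch (my own illustration; the lemma's \mathtt{Norm}(0,\frac{t-s}{2}) is read as specifying a variance):

```python
import random

def levy_midpoint(x_s, x_t, s, t, rng):
    # Conditional on X(s) = x_s and X(t) = x_t, draw X((s+t)/2):
    # it is Gaussian with mean (x_s + x_t)/2 and variance (t - s)/4,
    # so the two half-increments come out independent Norm(0, (t-s)/2).
    mean = 0.5 * (x_s + x_t)
    std = ((t - s) / 4) ** 0.5
    return rng.gauss(mean, std)

rng = random.Random(42)
print(levy_midpoint(0.0, 1.0, 0.0, 1.0, rng))
```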

  1. Sanity Check: Why 2 \cdot s and not just s here?
