Research Notebook

Digesting the Hansen and Scheinkman Multiplicative Decomposition of the SDF

July 12, 2011 by Alex

Introduction [1]

I give some intuition behind the multiplicative decomposition of the stochastic discount factor M_{t \to t+h} introduced in Hansen and Scheinkman (2009). The economics underlying the original Hansen and Scheinkman (2009) results was not clear to me during my initial readings. This post collects my efforts to interpret these mathematical ideas in a sensible way.

Below I formally state the decomposition.

Theorem (Hansen and Scheinkman Decomposition): Suppose that \phi_M is a principal eigenfunction with eigenvalue \lambda_M for the extended generator of the stochastic discount factor M. Then this multiplicative functional can be decomposed as:

    \begin{align*} M_{t \to t+h} \ &= \ e^{\lambda_M \cdot h} \cdot \left( \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \right) \cdot \hat{M}_{t \to t+h} \end{align*}

where \hat{M}_{t \to t+h} is a local martingale.

 

The stochastic discount factor M_{t \to t+h} dictates how to discount cashflows occurring h periods in the future in state X_{t+h}. Roughly speaking, Hansen and Scheinkman (2009) factors M_{t \to t+h} into 3 different pieces: a state independent component e^{\lambda_M \cdot h}, an investment horizon independent component \phi_M(X_t)/\phi_M(X_{t+h}), and a white noise component \hat{M}_{t \to t+h}.

Thus, you should think of \lambda_M as a generalized time preference parameter. \lambda_M will generally be negative, so e^{\lambda_M \cdot h} is the continuous time representation of the state independent discount rate dictated by an asset pricing model. The ratio \phi_M(X_t) / \phi_M(X_{t+h}) captures the rate at which I discount payments at time t+h given the state today at time t and the state at time t+h. This ratio is independent of h, meaning that for any horizons h and h' with X_{t+h} = X_{t+h'} we have:

    \begin{align*} \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \ &= \ \frac{\phi_M(X_t)}{\phi_M(X_{t+h'})} \end{align*}

Finally, \hat{M}_{t \to t+h} represents a random noise component with \mathbb{E}[\hat{M}_{t \to t+h}] = 1 and independent increments.

 

Motivation

The Hansen and Scheinkman decomposition generalizes the binomial options pricing framework for use in standard asset pricing applications by allowing for more complicated state space features like jumps and time averaging. [2] The main advantages of casting the stochastic discount factor as a multiplicative functional are a) the use of the binomial pricing intuition to understand more complicated asset pricing models and b) the streamlining of the econometrics needed to compare excess returns at different horizons. [3]

To illustrate the basic intuition behind this analogy, I work through the Black, Derman and Toy (1990) model.

Example (Binomial Model): Consider a discrete time, binomial world with states X_t \in \{d,u\}, \ \forall t \geq 0 in which traders have an independent probability \pi(x) of entering state x in the next period regardless of the current state. In this world, the price P_{t \to t+1} at time t of a risk free bond that pays out $1 at time t+1 is given by the expression:

    \begin{align*} P_{t \to t+1} \ &= \ \frac{\pi(u) \cdot 1 + \pi(d) \cdot 1}{1 + r^f_{t+1}} \end{align*}

This 1 step ahead pricing rule applies at each and every starting date t. All pricing computations at longer horizons are built up from this local relationship based on the prevailing short rate r_{t+1}^f.

To solve the model, I need to assume that the short rate r_{t+1}^f process has independent log-normal increments. I could then use the volatility of this process to pin down the values of the short rate for the entire binomial tree.
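To make this concrete, here is a minimal Python sketch of the one-step pricing rule rolled backward through a recombining tree of log-normal short rates. The values r0, sigma and pi_u and the rate rule short_rate are illustrative assumptions of mine, not a calibrated Black, Derman and Toy (1990) model.

```python
import math

# Assumed illustrative inputs (not calibrated): a recombining binomial
# tree where the short rate has log-normal moves around a base rate r0.
r0, sigma, pi_u = 0.05, 0.2, 0.5

def short_rate(t, j):
    """Short rate in node (t, j), where j counts up-moves out of t steps."""
    return r0 * math.exp(sigma * (2 * j - t))

def bond_price(n):
    """Price a zero-coupon bond paying $1 at time n by backward induction,
    applying the one-step rule P = (pi(u)*P_u + pi(d)*P_d) / (1 + r) at each node."""
    values = [1.0] * (n + 1)  # payoff at maturity in every terminal node
    for t in range(n - 1, -1, -1):
        values = [
            (pi_u * values[j + 1] + (1 - pi_u) * values[j]) / (1 + short_rate(t, j))
            for j in range(t + 1)
        ]
    return values[0]

print(bond_price(1), bond_price(3))
```

Every longer-horizon price is built from the same local one-step relationship, which is exactly the intuition the decomposition carries over to richer settings.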

 

In general, models of this sort are easy to solve analytically if the short rate process has log-normal increments. The recent papers Lettau and Wachter (2007), Van Binsbergen, Brandt and Koijen (2010) and Backus, Chernov and Zin (2011) adopt similar approaches and try to extend these insights to equity markets.

Nevertheless, most asset pricing models are not log-normal and do not admit pen-and-paper analysis of their term structure using existing methods. Thus, in order to use cross-horizon predictions to discriminate between alternative models, we must adopt new mathematical tools.

Example (Binomial Model, Ctd…): We use operator methods to factor the discount factor process M_{t \to t+h}, which deflates payments in state X_{t+h} at time horizon t+h back to time t, into 3 pieces: e^{\lambda_M \cdot h}, \tilde{\phi}_M(X_{t+h},X_t) and \hat{M}_{t \to t+h}. The first factor depends only on the investment horizon h, the second factor depends only on the realized states and the third factor is noise, so that M_{t \to t+h} = e^{\lambda_M \cdot h} \cdot \tilde{\phi}_M(X_{t+h},X_t) \cdot \hat{M}_{t \to t+h}.

By visual analogy to the Black, Derman and Toy (1990) model, in a binomial world we can use this decomposition to rewrite the h=1 Euler equation below where the dependence on X_t is implicit:

    \begin{align*} 1 \ &= \ \mathbb{E}_t \left[ \ M_{t \to t+1} \cdot R_{t \to t+1} \ \right] \\ &= \ \frac{\pi(u) \cdot \tilde{\phi}_M(u) \cdot \varepsilon(u) \cdot R(u) + \pi(d) \cdot \tilde{\phi}_M(d) \cdot \varepsilon(d) \cdot R(d)}{1 - \lambda_M} \end{align*}

 

Thus, in the Hansen and Scheinkman (2009) decomposition, - \lambda_M serves as a synthetic risk free rate and the \pi(x) \cdot \tilde{\phi}_M(x) serve as the twisted martingale measure.
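To see the synthetic risk free rate interpretation numerically, here is a toy check with made-up values for \pi, \tilde{\phi}_M and \lambda_M (and the noise terms set to 1): backing the risk free gross return out of the twisted measure makes the h=1 Euler equation hold by construction.

```python
# Hypothetical numbers for illustration only: physical probabilities pi,
# state twists phi_tilde, a synthetic rate lambda_M < 0, and unit noise eps.
pi = {'u': 0.6, 'd': 0.4}
phi_tilde = {'u': 0.9, 'd': 1.2}
lam_M = -0.04
eps = {'u': 1.0, 'd': 1.0}

# The pi(x) * phi_tilde(x) terms act as the twisted (unnormalized) measure.
twisted_mass = sum(pi[s] * phi_tilde[s] for s in pi)

# Back out the state-independent gross return on the risk free bond, then
# check the one-step Euler equation written with the decomposition.
R_f = (1 - lam_M) / twisted_mass
euler = sum(pi[s] * phi_tilde[s] * eps[s] * R_f for s in pi) / (1 - lam_M)
print(R_f, euler)
```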

In my work with Anmol Bhandari [4] we look at a class of models for which \ln \tilde{\phi}_M(x) is affine [5] and show how to use this decomposition to compute a cross-horizon analogue of the Hansen and Jagannathan (1991) volatility bound. This new bound can be used to discriminate between different models which make identical predictions at a particular horizon. The exponentially affine structure is useful as it permits closed form solutions for the moments of M_{t \to t+h}:

    \begin{align*} \mathbb{E}_t[M_{t \to t+h}] \ &\approx \ e^{\lambda_M \cdot h} \cdot \mathbb{E}_t \left[ \frac{\phi_M(X_t)}{\phi_M(X_{t+h})} \right] \cdot 1 \\ \mathbb{E}_t[M_{t \to t+h}^2] \ &\approx \ e^{\lambda_{M^2} \cdot h} \cdot \mathbb{E}_t \left[ \frac{\phi_{M^2}(X_t)}{\phi_{M^2}(X_{t+h})} \right] \cdot 1 \end{align*}

In the next 2 sections, I walk through the economics governing the \lambda_M and \phi_M terms.

 

Time Preference

Where does \lambda_M come from? In the original article, the authors refer to \lambda_M as the principal eigenvalue of the extended generator of M; however, \lambda_M has a well-defined meaning without ever subscribing to Perron-Frobenius theory. \lambda_M is a generalization of the time preference parameter dictated by an asset pricing model.

Consider the following thought experiment which casts the \lambda_M term as the time preference parameter plus an extra Jensen inequality term.

Example (Generalized Time Preference): Suppose that an agent with power utility u(c) = c^{1-\gamma} has preferences over a stream of consumption C_1, C_2, C_3, \ldots and that for each period t, C_t = 100 with probability 0.95, while the remaining 5\% of the time C_t = 50 or C_t = 150 with equal probability. While \mathbb{E}_t[C_{t+1}] = 100, the certainty equivalent satisfies \mathbb{E}_t^{c.e.}[C_{t+1}] < 100^{1-\gamma} = u(\mathbb{E}_t[C_{t+1}]).

In fact, with probability 0.05 the agent will get a payout worth:

    \begin{align*} \mathbb{E}_t^{c.e.}[C_{t+1} \mid C_{t+1} \neq 100 ] \ &= \ \frac{50^{1-\gamma}}{2} + \frac{150^{1-\gamma}}{2} \end{align*}

Let’s call this certainty equivalent gap \delta:

    \begin{align*} \delta \ &= \ \mathbb{E}_t^{c.e.}[C_{t+1} \mid C_{t+1} \neq 100 ] \ - \ 100^{1-\gamma} \end{align*}

\lambda_M should then include both time preference, \rho, and also the expected Jensen’s inequality loss:

    \begin{align*} \lambda_M \ &= \ \rho \ + \ 0.05 \cdot \delta \end{align*}
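Plugging in the numbers from the example (with assumed values for \gamma and \rho; \gamma < 1 keeps u(c) = c^{1-\gamma} concave, so the gap \delta is negative and \lambda_M sits below \rho):

```python
# gamma and rho are assumed values for illustration, not taken from the text.
gamma, rho = 0.5, -0.02

def u(c):
    return c ** (1 - gamma)   # power utility

# Certainty-equivalent utility of the tail outcomes (C = 50 or 150, equally likely)
ce_tail = 0.5 * u(50) + 0.5 * u(150)
delta = ce_tail - u(100)      # the certainty equivalent gap
lam_M = rho + 0.05 * delta    # time preference plus expected Jensen loss
print(delta, lam_M)
```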

 

Thus, in a more general framework, we should expect \lambda_M to have roughly the following form:

    \begin{align*} \lambda_M \ &= \ \rho \ + \ f(\sigma_M^2, \sigma_X^2, \sigma_{M \times X}) \end{align*}

where f is an affine function. Heuristically, the \sigma_X component will capture how volatile the state space is, while the \sigma_M component will capture how heavily I need to discount this consumption stream due to Jensen’s inequality.

 

State Dependence

Next, in order to capture the dependence of the discount factor M_{t \to t+h} on the current and future state (X_t,X_{t+h}), Hansen and Scheinkman (2009) downshift to continuous time and apply the Perron-Frobenius theorem to the infinitesimal generator of the discount factor. When applied to transition probability matrices, Perron-Frobenius theory implies that the largest eigen-pair dominates the behavior of a stochastic process as h \to \infty. Hansen and Scheinkman use this h \to \infty limiting result to argue that the ratio \phi_M(X_t)/\phi_M(X_{t+h}), built from the principal eigenfunction of the generator of the discount factor M, is a good choice for the state dependent component of M_{t \to t+h}.

It is important to note that Perron-Frobenius theory is only a modeling tool in the Hansen and Scheinkman (2009) construction, not a critical feature of their results. There may well be other reasonable choices for the state dependent component of M_{t \to t+h}. In its simplest form [6], the result can be written as:

Theorem (Perron-Frobenius): The largest eigenvalue \lambda of a positive square matrix A is both simple and positive and corresponds to a positive eigenvector \phi. All other eigenvalues are smaller in absolute value. [7]
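As a quick numerical illustration of the theorem (not part of the original construction), power iteration on an arbitrary strictly positive matrix of my choosing recovers the dominant eigen-pair, and the eigenvector comes out strictly positive, as Perron-Frobenius promises:

```python
import numpy as np

# A strictly positive matrix (entries chosen arbitrarily for illustration).
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.4],
              [0.1, 0.6, 0.2]])

# Power iteration: repeatedly apply A and renormalize; the iterate converges
# to the eigenvector of the largest (Perron) eigenvalue.
phi = np.ones(3)
for _ in range(200):
    phi = A @ phi
    phi /= np.linalg.norm(phi)
lam = phi @ A @ phi   # Rayleigh quotient approximates the Perron eigenvalue

print(lam, phi)
```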

 

In order to use this theorem, I need to have a positive square matrix to operate on. While strictly positive, M_{t \to t+h} is not a square matrix; however, its infinitesimal generator is. Heuristically, you can think about the infinitesimal generator as encoding the transition probability matrix under the equivalent martingale measure deflated by the time preference parameter.

Definition (Infinitesimal Generator): The infinitesimal generator \mathbb{A} of an Ito diffusion \{ X_t \} in \mathcal{R}^n is defined by:

    \begin{align*} \mathbb{A}[ f(x)] \ &= \ \lim_{h \searrow 0} \ \frac{\mathbb{E}_x[ f(X_h) ] - f(x)}{h}, \end{align*}

where the set of functions f: \mathcal{R}^n \mapsto \mathcal{R} such that the limit exists at x is denoted by \mathcal{D}_A(x).

 

In words, the infinitesimal generator of the discount factor M_{t \to t+h} captures how my valuation of a $1 payment in, say, the up state u will change if I move the payment from h=1 period in the future to h=2 periods in the future. To get a feel for what the infinitesimal generator captures, consider the following short example using a 2 state Markov chain. First, I define the physical transition intensity matrix for the Markov process X_t.

Example (Markov Process w/ 2 States): Consider a 2 state Markov chain with states X_t \in \{u,d\}. First, consider the physical evolution of the stochastic process X_t, which is governed by a 2 \times 2 intensity matrix \mathbb{T}. An intensity matrix encodes all of the transition probabilities: the matrix e^{h \cdot \mathbb{T}} is the matrix of transition probabilities over a horizon h. Since each row of the transition probability matrix e^{h \cdot \mathbb{T}} must sum to 1, each row of the transition intensity matrix \mathbb{T} must sum to 0.

    \begin{align*} \mathbb{T} \ &= \ \begin{bmatrix} \tau(u \mid u) & \tau(d \mid u) \\ \tau(u \mid d) & \tau(d \mid d) \end{bmatrix} \end{align*}

The diagonal entries are nonpositive and represent minus the intensity of jumping from the current state to a new one. The remaining row entries, appropriately scaled, represent the conditional probabilities of jumping to the respective states. For concreteness, the following parameter values would suffice:

    \begin{align*} \mathbb{T} \ &= \ \begin{bmatrix} -0.10 & 0.10 \\ 0.05 & -0.05 \end{bmatrix} \end{align*}
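Using this concrete \mathbb{T}, a short Python check (with a hand-rolled truncated-series matrix exponential, fine for such small matrices) confirms both claims: each row of e^{h \cdot \mathbb{T}} sums to 1, and (e^{h \cdot \mathbb{T}} - I)/h recovers \mathbb{T} as h shrinks, matching the generator definition above.

```python
import numpy as np

# The intensity matrix from the example.
T_mat = np.array([[-0.10,  0.10],
                  [ 0.05, -0.05]])

def expm(M, terms=40):
    """Truncated Taylor series for the matrix exponential (adequate for small M)."""
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

h = 1.0
P_h = expm(h * T_mat)          # transition probabilities over horizon h
print(P_h, P_h.sum(axis=1))    # rows sum to 1

# The generator is the h -> 0 derivative of the transition operator.
h_small = 1e-4
approx = (expm(h_small * T_mat) - np.eye(2)) / h_small
print(approx)
```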

 

Next, I want to show how to modify this transition intensity matrix \mathbb{T} to describe the local evolution of the discount factor process M_t. To do this, I first need to have an asset pricing model in mind, and I use a standard CRRA power utility model with risk aversion parameter \gamma as in Breeden (1979) where X_t is the log of the expected consumption growth.

Example (Markov Process w/ 2 States, Ctd…): Intuitively, I know that every period I push the payment out into the future, I will end up discounting the payment by an additional e^{\lambda_M}. However, I know that I will also have to twist \mathbb{T} from the physical measure over to the risk neutral measure. Thus, the resulting generator will look something like:

    \begin{align*} \mathbb{A} \ &= \ \begin{bmatrix} \tau(u \mid u) \cdot \tilde{\phi}_M(u \mid u) & \tau(d \mid u) \cdot \tilde{\phi}_M(d \mid u) \\ \tau(u \mid d) \cdot \tilde{\phi}_M(u \mid d) & \tau(d \mid d) \cdot \tilde{\phi}_M(d \mid d) \end{bmatrix} \ - \ \lambda_M \end{align*}

If we assume that \tilde{\phi}_M(s \mid s) = 1, so that there is no twisting when the state does not change, then we have:

    \begin{align*} \alpha(s' \mid s) \ &= \ \begin{cases} \tau(s' \mid s) - \lambda_M &\text{ if } s' = s \\ \tau(s' \mid s) \cdot \tilde{\phi}_M(s' \mid s) - \lambda_M &\text{ if } s' \neq s \end{cases} \end{align*}

Note that the rows of \mathbb{A} will in general not sum to 0, unlike those of the physical transition intensity matrix \mathbb{T}.

 

An Example

I conclude by working through an extended example showing how to solve for each of the terms in a simple model. Think about a Vasicek (1977) interest rate model: let X_t be a risk factor governed by the following scalar Ito diffusion. I choose this model so that I can verify all of my solutions by hand using existing techniques.

    \begin{align*} dX_t \ &= \ \beta_X(X_t) \cdot dt \ + \ \sigma_X(X_t) \cdot dB_t \\ \beta_X(x) \ &= \ \bar{\beta}_X \ - \ \beta_X \cdot x \\ \sigma_X(x) \ &= \ \sigma_X \end{align*}

Let M_t=\exp \{A_t\}, where A_t solves the following Ito diffusion:

    \begin{align*} dA_t \ &= \ \beta_A(X_t) \cdot dt \ + \ \sigma_A(X_t) \cdot dB_t \\ \beta_A(x) \ &= \ \bar{\beta}_A \ - \ \beta_A \cdot x \\ \sigma_A(x) \ &= \ \sigma_A \end{align*}

Thus (X_t,M_t) are described by parameter vector \Theta:

    \begin{align*} \Theta \ &= \ \begin{bmatrix} \beta_X & \beta_A & \bar{\beta}_X & \bar{\beta}_A & \sigma_X & \sigma_A \end{bmatrix} \end{align*}
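Before solving for \kappa_M and \lambda_M, it can help to see the state dynamics in action. The following Euler-Maruyama sketch simulates the mean reverting factor X_t; the parameter values below are made up for illustration. The long sample average settles near the stationary mean \bar{\beta}_X / \beta_X.

```python
import math
import random

# Assumed parameter values for illustration only (beta_X > 0 for stationarity).
beta_X_bar, beta_X, sigma_X = 0.02, 0.5, 0.1
dt, n_steps = 0.01, 200_000

random.seed(0)
x, path_sum = 0.0, 0.0
for _ in range(n_steps):
    # Euler-Maruyama step for dX = (beta_X_bar - beta_X * X) dt + sigma_X dB
    x += (beta_X_bar - beta_X * x) * dt + sigma_X * math.sqrt(dt) * random.gauss(0, 1)
    path_sum += x

mean_x = path_sum / n_steps
print(mean_x)   # should hover near beta_X_bar / beta_X = 0.04
```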

We need to restrict \Theta to ensure stationarity. Guessing an exponentially affine eigenfunction \phi_M(x) = e^{\kappa_M \cdot x} and matching coefficients to ensure that \lambda_M does not move with x yields the following characterization of \kappa_M.

    \begin{align*} \kappa_M \ &= \ - \ \frac{\beta_A}{\beta_X} \end{align*}

Substituting back into the formula for \lambda_M yields.

    \begin{align*} \begin{split} \lambda_M \ &= \ \left( \ \bar{\beta}_A \ + \ \frac{\sigma_A^2}{2} \ \right) \\ &\qquad \qquad + \ \left( \ \bar{\beta}_X \ + \ \sigma_A \cdot \sigma_X \ \right) \cdot \kappa_M \\ &\qquad \qquad \qquad  + \ \left( \ \frac{\sigma_X^2}{2} \ \right) \cdot \kappa_M^2 \end{split} \end{align*}

We know that M_t^2 =\exp\{2 \cdot A_t\}, so repeating the same matching argument with the drift and volatility of A doubled gives:

    \begin{align*} \kappa_{M^2} \ &= \ - \ \frac{2 \cdot \beta_A}{\beta_X} \\ \lambda_{M^2} \ &= \ 2 \cdot \lambda_M \ + \ \sigma_A^2 \ + \ \left( \ \sigma_A \cdot \sigma_X \ \right) \cdot \kappa_{M^2} \ + \ \left( \ \frac{\sigma_X^2}{4} \ \right) \cdot \kappa_{M^2}^2 \end{align*}
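A small numerical check, with assumed values for \Theta, that the shortcut formula for \lambda_{M^2} above agrees with applying the matching-coefficients construction directly to M^2 (drift and volatility of A doubled):

```python
# Assumed parameter values for illustration, with beta_X > 0 for stationarity.
beta_X, beta_A = 0.5, 0.3
beta_X_bar, beta_A_bar = 0.02, -0.04
sigma_X, sigma_A = 0.1, 0.2

kappa_M = -beta_A / beta_X
lam_M = (beta_A_bar + sigma_A**2 / 2
         + (beta_X_bar + sigma_A * sigma_X) * kappa_M
         + (sigma_X**2 / 2) * kappa_M**2)

# Direct construction for M^2 = exp(2 * A): double the drift and volatility
# of A and match coefficients again.
kappa_M2 = -2 * beta_A / beta_X
lam_M2_direct = (2 * beta_A_bar + (2 * sigma_A)**2 / 2
                 + (beta_X_bar + 2 * sigma_A * sigma_X) * kappa_M2
                 + (sigma_X**2 / 2) * kappa_M2**2)

# The shortcut formula quoted in the text.
lam_M2 = (2 * lam_M + sigma_A**2
          + sigma_A * sigma_X * kappa_M2
          + (sigma_X**2 / 4) * kappa_M2**2)

print(lam_M, lam_M2, lam_M2_direct)
```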

Exercise (Offsetting Shocks): If \rho is the standard time preference parameter, when would \lambda_M = \rho?

Exercise (Stochastic Volatility): Add a Feller square root term to allow for stochastic volatility à la the Cox, Ingersoll and Ross (1985) interest rate model.

    \begin{align*} dX_t \ &= \ \beta_X(X_t) \cdot dt \ + \ \sigma_X(X_t) \cdot dB_t \\ \beta_X(x) \ &= \ \bar{\beta}_X \ - \ \beta_X \cdot x \\ \sigma_X(x) \ &= \ \sigma_X \cdot \sqrt{x} \end{align*}

    \begin{align*} dA_t \ &= \ \beta_A(X_t) \cdot dt \ + \ \sigma_A(X_t) \cdot dB_t \\ \beta_A(x) \ &= \ \bar{\beta}_A \ - \ \beta_A \cdot x \\ \sigma_A(x) \ &= \ \sigma_A \cdot \sqrt{x} \end{align*}

What are \kappa_M and \lambda_M?

  1. Note: The results in this post stem from joint work I am conducting with Anmol Bhandari for our paper “Model Selection Using the Term Structure of Risk”. In this paper, we characterize the maximum Sharpe ratio allowed by an asset pricing model at each and every investment horizon. Using this cross-horizon bound, we develop a macro-finance model identification toolkit.
  2. e.g., think of the state space needed in the Campbell and Cochrane (1999) habit model.
  3. Investment horizon symmetry is an unexplored prediction of many asset pricing theories. Asset pricing models characterize how much a trader needs to be compensated in order to hold 1 unit of risk for 1 unit of time. The standard approach to testing these models is to fix the unit of time and then look for incorrectly priced packets of risk. e.g., Roll (1981) looked at the spread in 1 month holding period returns on 10 portfolios of NYSE firms sorted by market cap and found that small firms earned abnormal excess returns relative to the CAPM. Yet, I could just as easily ask the question: Given a model, how much more does a trader need to be compensated for her to hold the same 1 unit of risk for an extra 1 unit of time? This inversion is well defined as asset pricing models possess investment horizon symmetry. Models hold at each and every investment horizon running from 1 second to 1 year to 1 century and everywhere in between. To illustrate this point via an absurd case, John Cochrane writes in his textbook (Asset Pricing (2005), Section 9.3.) that according to the consumption CAPM ‘…if stocks go up between 12:00 and 1:00, it must be because (on average) we all decided to have a big lunch.’
  4. See Model Selection Using the Term Structure of Risk.
  5. This class of models allows for features such as rare disasters, recursive preferences and habit formation among others…
  6. Really, this is just the Oskar Perron version of the theorem.
  7. For an introduction to Perron-Frobenius theory, see MacCluer (2000).

Filed Under: Uncategorized

Plotting Geographic Densities in R

July 11, 2011 by Alex

I show how (here) to create a heat map of the intensity of home purchases from 2000 to 2008 in Los Angeles County, CA using a random sample of 5,000 observations from the county deeds records. I build off of the code created by David Kahle for Hadley Wickham’s GGPlot2 Case study competition. I use the results of the geocoding procedure that I outline here as the input data.

Filed Under: Uncategorized

How to Geocode Addresses Using the Yahoo! PlaceFinder API

July 11, 2011 by Alex

This post contains a link (here) to a python program which geocodes a large number of addresses using the Yahoo! PlaceFinder API. This program manages both the use of the API IDs as well as which files have been completed. The code can also be easily parallelized. The code makes use of earlier work I had done in R to accomplish the same task.

Filed Under: Uncategorized

Random Effects Decomposition

June 27, 2011 by Alex

Motivation

I work through the error components econometric model outlined in Amemiya (1985). I use Hayashi (2000) as a reference text. I work through this example because I use this model in my working paper with Chris Mayer on bubble identification and I would like to work out the details, as I didn’t spend much time on these sorts of models in my core econometrics courses.

In my paper with Chris, I develop a method of identifying relative mispricings between city specific markets in the US residential housing market using flows of speculative buyers between cities and assuming that city sizes are exogenous. Previously, analysts suspected that the housing bubble was due to credit supply factors. I use a random effects model to gauge the relative importance of 1) aggregate credit supply factors and 2) cross-city speculator flows in explaining mispricing in the housing market in our sample.

 

Econometric Framework

I characterize the random effects error components estimator outlined in Amemiya (1985, Ch. 6). Consider a balanced panel with N panels and T observations per panel. I study a regression specification of the following type:

(1)   \begin{align*} y_{n,t} \ &= \ \langle X_{n,t} \mid \beta \rangle \ + \ \mu_n \ + \ \lambda_t \ + \ \varepsilon_{n,t} \end{align*}

 

I can vectorize this specification by stacking each of these N \times T equations:

(2)   \begin{align*} \begin{split} \mathcal{U} \ &= \ \langle I_N \otimes 1_T \mid \mu \rangle \ + \ \langle 1_N \otimes I_T \mid \lambda \rangle \ + \ \mathcal{E} \\ Y \ &= \ \langle X \mid \beta \rangle \ + \ \mathcal{U} \end{split} \end{align*}

 

Assumptions

I make the following assumptions about the shape of the errors:

Assumption: (Error Structure) I assume that:

1) Unbiased-ness: \langle \mu_n \rangle = 0, \langle \lambda_t \rangle = 0 and \langle \varepsilon_{n,t} \rangle = 0

2) White-Noise: \langle \mu_n \mid \lambda_t \rangle = 0,  \langle \lambda_t \mid \varepsilon_{n,t} \rangle = 0 and \langle \varepsilon_{n,t} \mid \mu_n \rangle = 0

3) Homoskedasticity: \vert \mu \rangle \langle \mu \vert = I_N \cdot \sigma^2_\mu, \vert \lambda \rangle \langle \lambda \vert = I_T \cdot \sigma^2_\lambda and \vert \varepsilon \rangle \langle \varepsilon \vert = I_{N \times T} \cdot \sigma^2_{\varepsilon}

 

What are the key take-aways from these assumptions? First, assumption 1) means that there is a constant term in the explanatory X variables. Assumption 2) is just the standard white noise assumption. Assumption 3) is the key restriction. This assumption says that the within and between effects are independent across time and panels respectively. The estimator I define below allows me to learn the values of \sigma_\mu^2, \sigma_\lambda^2 and \sigma_\varepsilon^2.

 

Estimation

How do I go about estimating these 3 objects? First, I define some notation to make my life a bit easier and stave off carpal tunnel for a few more semesters:

(3)   \begin{align*} \begin{split} F \ &= \ \vert I_N \otimes 1_T \rangle \langle I_N \otimes 1_T \vert \\ G \ &= \ \vert 1_N \otimes I_T \rangle \langle 1_N \otimes I_T \vert \end{split} \end{align*}

 

Also, let H be an (N \cdot T) \times (N \cdot T - N - T + 1) unit matrix. I name the error covariance matrix \Omega, and then characterize it as a linear function of the 3 variance terms of interest:

(4)   \begin{align*} \begin{split} \Omega \ &= \ \vert \mathcal{U} \rangle \langle \mathcal{U} \vert \\ &= \ \sigma_\mu^2 \cdot F \ + \ \sigma^2_\lambda \cdot G \ + \ \sigma_\varepsilon^2 \cdot I_{N \times T} \end{split} \end{align*}

 

I can write out the inverse of the error covariance matrix \Omega as follows:

(5)   \begin{align*} \begin{split} \Omega^{-1} \ &= \ \frac{1}{\sigma_\varepsilon^2} \cdot \left( I_{N \times T} - \gamma_1 \cdot F + \gamma_2 \cdot G + \gamma_3 \cdot H \right) \\ \gamma_1 \ &= \ \frac{\sigma_\mu^2}{\sigma_\varepsilon^2 + T \cdot \sigma_\mu^2} \\ \gamma_2 \ &= \ \frac{\sigma_\lambda^2}{\sigma_\varepsilon^2 + N \cdot \sigma_\lambda^2} \\ \gamma_3 \ &= \ \gamma_1 \cdot \gamma_2 \cdot \left( \ \frac{2 \cdot \sigma_\varepsilon^2 + T \cdot \sigma_\mu^2 + N \cdot \sigma_\lambda^2}{\sigma_\varepsilon^2 + T \cdot \sigma_\mu^2 + N \cdot \sigma_\lambda^2} \ \right) \end{split} \end{align*}

 

This formulation shows that the sample error covariance matrix will provide unbiased and consistent estimates if both N \to \infty and T \to \infty. In this note, I am not going to worry about finding the most efficient estimator for the parameters. Next, I want to decompose the error covariance matrix into within, between and idiosyncratic components. To do this I need 1 last piece of notation:

(6)   \begin{align*} Q \ &= \ I \ - \ \frac{F}{T} \ - \ \frac{G}{N} \ + \ \frac{H}{N \cdot T} \end{align*}

 

Think about this as an orthogonal decomposition of a unitary error covariance matrix into each of the 3 components: within, between and idiosyncratic. Then, using this term, Amemiya (1971) shows that the following estimators for the parameter vector \begin{bmatrix} \sigma_\mu^2 & \sigma_\lambda^2 & \sigma_\varepsilon^2 \end{bmatrix} are consistent:

(7)   \begin{align*} \begin{split} \hat{\mathcal{U}} \ &= \ Y \ - \ \langle X \mid \hat{\beta} \rangle \\ \hat{\sigma}_{\varepsilon}^2 \ &= \ \frac{\langle \hat{\mathcal{U}} \mid \langle Q \mid \hat{\mathcal{U}} \rangle \rangle}{(N-1) \cdot (T-1)} \\ \hat{\sigma}_{\mu}^2 \ &= \ \frac{\langle \hat{\mathcal{U}} \mid \langle \frac{T-1}{T} \cdot F - \frac{T-1}{N \cdot T} \cdot H - Q \mid \hat{\mathcal{U}} \rangle \rangle}{T \cdot (N-1) \cdot (T-1)} \\ \hat{\sigma}_{\lambda}^2 \ &= \ \frac{\langle \hat{\mathcal{U}} \mid \langle \frac{N-1}{N} \cdot G - \frac{N-1}{N \cdot T} \cdot H - Q \mid \hat{\mathcal{U}} \rangle \rangle}{T \cdot (N-1) \cdot (T-1)} \end{split} \end{align*}
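To see these projection matrices in action, the sketch below builds F, G and Q with Kronecker products for a small panel. One caveat: I read the H appearing in the formula for Q as the (N \cdot T) \times (N \cdot T) all-ones matrix (the grand-mean term of the within transformation); that reading is my assumption about the notation. With it, Q is an orthogonal projection with rank (N-1) \cdot (T-1), matching the degrees of freedom in the \hat{\sigma}_\varepsilon^2 estimator.

```python
import numpy as np

N, T = 4, 3  # small balanced panel for illustration

# F/T projects onto panel means, G/N onto time means.
v_F = np.kron(np.eye(N), np.ones((T, 1)))   # I_N (x) 1_T
v_G = np.kron(np.ones((N, 1)), np.eye(T))   # 1_N (x) I_T
F = v_F @ v_F.T
G = v_G @ v_G.T

# Assumption: the H in the Q formula is the all-ones NT x NT matrix,
# so H / (N*T) projects onto the grand mean.
H = np.ones((N * T, N * T))

Q = np.eye(N * T) - F / T - G / N + H / (N * T)

print(np.allclose(Q @ Q, Q), np.trace(Q))   # idempotent, rank (N-1)(T-1)
```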

Filed Under: Uncategorized

Recurrence in 1D, 2D and 3D Brownian Motion

June 26, 2011 by Alex

Introduction

I show that Brownian motion is recurrent for dimensions d=1 and d=2 but transient for dimensions d \geq 3. Below, I give the technical definition of a recurrent stochastic process:

Definition: (Recurrent Stochastic Process) Let X(t) be a stochastic process. We say that X(t) is recurrent if for any \varepsilon > 0 and any point \bar{x} \in \mathtt{Dom}(X) we have that:

(1)   \begin{align*} \infty \ &= \ \int_0^\infty \ \mathtt{Pr} \left[ \ \left\Vert X(t) - \bar{x} \right\Vert < \varepsilon  \mid X(0) = \bar{x} \ \right] \cdot dt \end{align*}

In words, this definition says that if the stochastic process X(t) starts out at a point \bar{x}, then if we watch the process forever it will return to within some tiny region of \bar{x} an infinite number of times.

Motivating Example

Before I go about proving that Brownian motion is recurrent or transient in different dimensions, I first want to nail down the intuition of what it means for a stochastic process to be recurrent in a more physical sense. To do this, I use the standard real world example for random walks: a drunk leaving a bar.

(Figure: Arnold’s lattice world for the case of 2 dimensions.)

Example: (A Drunkard’s Flight) Suppose that Arnold is drunk and leaving his local bar. What’s more, Arnold is really inebriated and can only muster enough coordination to move 1 step backwards or 1 step forwards each second. Because he is so drunk, he doesn’t have any control over which direction he stumbles, so you can think about him moving backwards and forwards each second with equal probability \pi = 1/2. Thus, Arnold’s position relative to the door of the bar is a stochastic process with independent \pm 1 increments. This process is recurrent if Arnold returns to the bar an infinite number of times as we allow him to stumble around all night. Put differently, if Arnold ever has a last drink for the evening and exits the bar for good, then his stumbling process will be transient.

In the context of this toy example, I show that as I allow Arnold to stumble in more and more different directions (backwards vs. forwards, left vs. right, up vs. down, etc…), his probability of returning to the bar decreases. Namely, if Arnold can only move backwards and forwards, then his stumbling will lead him back to his bar an infinite number of times. If he can move backwards and forwards as well as left and right, he will still wander back to the bar an infinite number of times. However, if Arnold either suddenly grows wings (i.e., can move up or down) or happens to be the Terminator (i.e., can time travel to the future or past), at some point his wandering will lead him away from the bar forever.

 

Outline

First, I state and prove Polya’s Theorem, which characterizes whether or not a random walk on a lattice is recurrent in each dimension d=1,2,3,\ldots. Then, I show how to extend this result to continuous time Brownian motion using the Central Limit Theorem. I attack this recurrence result for continuous time Brownian motion via Polya’s Recurrence Theorem because I think the intuition is much clearer along this route. I find the direct proof in continuous time, which relies on Dynkin’s lemma, a bit obscure; whereas, I have a very good feel for what it means to count paths (i.e., possible random walk trajectories) on a grid.

 

Polya’s Recurrence Theorem

Below, I formulate and prove Polya’s Recurrence Theorem for dimensions d \in \{1,2,3\}.

Theorem: (Polya Recurrence Theorem) Let p(d) be the probability that a random walk on a d dimensional lattice ever returns to the origin. Then, we have that p(1) = p(2) = 1 while p(3) < 1.
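The theorem concerns infinite walks, but a Monte Carlo sketch already suggests the pattern: the fraction of finite lattice walks that find their way back to the origin falls as the dimension grows. The walk lengths, sample sizes and seed below are arbitrary choices of mine, and a finite simulation is suggestive only, not a proof.

```python
import random

def return_fraction(d, walks=2000, max_steps=500, seed=42):
    """Fraction of d-dimensional lattice walks returning to the origin
    within max_steps steps (each step moves +/-1 along one random axis)."""
    rng = random.Random(seed)
    returned = 0
    for _ in range(walks):
        pos = [0] * d
        for _ in range(max_steps):
            axis = rng.randrange(d)           # pick a coordinate direction
            pos[axis] += rng.choice((-1, 1))  # step +/-1 along it
            if all(c == 0 for c in pos):
                returned += 1
                break
    return returned / walks

fracs = [return_fraction(d) for d in (1, 2, 3)]
print(fracs)
```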

 

Intuition

Before I go any further into the maths, I walk through the physical intuition behind the result. First, imagine the case where drunk Arnold can only move forwards and backwards. In order for Arnold to return to the bar door in 2 \cdot s steps [1], he must take the exact same number of forward and backwards steps; i.e., he has to choose a sequence of 2 \cdot s steps such that exactly s of them are forward. There are 2 \cdot s choose s ways he could do this:

(2)   \begin{align*} \mathtt{\# \ returning \ paths} \ &= \ \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \end{align*}

What’s more, I know that the probability of each of the paths Arnold could take is just 1 divided by the total number of paths 2^{2 \cdot s}:

(3)   \begin{align*} \mathtt{Pr[each \ path]} \ &= \ \frac{1}{2^{2 \cdot s}} \end{align*}

Now consider drunk Arnold’s situation in 2 dimensions. Here, he must take the exact same number of steps forwards and backwards as well as the exact same number of steps left and right. Thus, summing over the possible numbers k of left and right steps, the number of ways for Arnold to return to the bar is:

(4)   \begin{align*} \mathtt{\# \ returning \ paths} \ &= \ \sum_{k=0}^s \ \begin{pmatrix} 2 \cdot s \\ k,k,(s-k),(s-k) \end{pmatrix} \end{align*}

What is this sum computing in words? First, suppose that Arnold takes no steps in the left or right directions; then set k=0 and the number of paths he could take back to the bar is equal to the number in the 1-dimensional case. Conversely, if Arnold takes no steps forwards or backwards, set k=s and again you get the 1-dimensional case. Thus, the number of possible paths Arnold can take back to the bar in 2 dimensions is strictly larger than in 1 dimension. However, Arnold can also take paths which move along both axes. This sum first counts up the number of ways he can end up back at his starting point in the left and right directions. Then, it takes the remaining number of steps and counts the number of ways he can use those steps to return to the starting point in the forwards and backwards direction.

Note that this process doesn’t add that many new returning paths for each new dimension. Every time I add a new dimension, I’m certainly adding fewer than 2^s new paths as:

(5)   \begin{align*} m^n \ &= \ \sum_{k_1 + k_2 + \ldots + k_m = n} \ \begin{pmatrix} n \\ k_1, k_2, \ldots, k_m \end{pmatrix} \end{align*}

However, each path now only happens with probability 4^{-2 \cdot s}. The probability of realizing each possible path shrinks geometrically in the number of available directions:

(6)   \begin{align*} \mathtt{Pr[each \ path]}(d) \ &= \ \left(\frac{1}{2 \cdot d}\right)^{2 \cdot s} \end{align*}

Thus, Polya’s Recurrence Theorem stems from the fact that the number of possible paths back to the origin is growing at a rate that is less than the number of all paths; i.e., the wilderness of paths that do not loop back to the origin is increasing faster than the set of paths which do loop back as we add dimensions.

 

Proof

Below, I prove this result 1 dimension at a time:

Proof: (d=1) The probability that Arnold will return to the origin in 2 \cdot s steps is the number of possible paths times the probability that each 1 of those paths occurs:

(7)   \begin{align*} p_{2 \cdot s}(1) \ &= \ \left( \frac{1}{2} \right)^{2 \cdot s} \cdot \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \end{align*}

Next, in order to derive an analytical characterization of this   probability, I use Stirling’s approximation to handle the factorial   terms in the binomial coefficient:

(8)   \begin{align*}   s! \ &\approx \ \sqrt{2 \cdot \pi \cdot s} \cdot e^{-s} \cdot s^s \end{align*}

Using this approximation and simplifying, I find that:

(9)   \begin{align*} \begin{split} p_{2 \cdot s}(1) \ &= \ \left( \frac{1}{2} \right)^{2 \cdot s} \cdot \frac{(2 \cdot s)!}{s! \cdot (2 \cdot s - s)!} \\ &\approx \ \frac{1}{(\pi \cdot s)^{1/2}} \end{split} \end{align*}
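To see how good this approximation is, the sketch below (function names are my own) computes the exact return probability from equation (7) alongside the Stirling approximation from equation (9); the relative error shrinks roughly like 1/(8 \cdot s):

```python
from math import comb, pi, sqrt

def p_exact(s):
    # exact 1-dimensional return probability after 2*s steps, eq. (7)
    return comb(2 * s, s) / 4 ** s

def p_stirling(s):
    # Stirling approximation from eq. (9)
    return 1 / sqrt(pi * s)

# the two values converge as s grows
for s in (10, 100, 1000):
    print(s, p_exact(s), p_stirling(s))
```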

Thus, if I sum over all possible periods, I get the expected number   of times that drunk Arnold will return to the bar for another night   cap. I find that this infinite sum diverges:

(10)   \begin{align*} \begin{split} p(1) \ &= \ \sum_{s=1}^\infty \ p_{2 \cdot s}(1) \\ &\approx \ \sum_{s=1}^\infty \ \frac{1}{(\pi \cdot s)^{1/2}} \\ &= \ \infty \end{split} \end{align*}
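The divergence in equation (10) can be seen numerically: partial sums of the exact return probabilities keep growing without bound, roughly like 2 \cdot \sqrt{s/\pi}. A minimal sketch (my own illustration):

```python
from math import comb

def p_exact(s):
    # exact 1-dimensional return probability after 2*s steps, eq. (7)
    return comb(2 * s, s) / 4 ** s

def expected_returns(max_s):
    # partial sum of eq. (10): expected number of returns
    # within the first 2*max_s steps
    return sum(p_exact(s) for s in range(1, max_s + 1))

# the partial sums grow without bound
print(expected_returns(100), expected_returns(400))
```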

 

Proof: (d=2) Next, I follow all of the same steps through for the d=2   dimensional case:

(11)   \begin{align*} \begin{split} p_{2 \cdot s}(2) \ &= \ \left( \frac{1}{4} \right)^{2 \cdot s} \cdot \sum_{k=0}^s \ \begin{pmatrix} 2 \cdot s \\ k,k,(s-k),(s-k) \end{pmatrix} \\ &= \ \left( \frac{1}{4} \right)^{2 \cdot s} \cdot \sum_{k=0}^s \ \frac{(2 \cdot s)!}{k! \cdot k! \cdot (s - k)! \cdot (s - k)!} \\ &= \ \left( \frac{1}{4} \right)^{2 \cdot s} \cdot \sum_{k=0}^s \ \frac{(2 \cdot s)!}{s! \cdot s!} \cdot \frac{s! \cdot s!}{k! \cdot k! \cdot (s - k)! \cdot (s - k)!} \\ &= \ \left( \frac{1}{4} \right)^{2 \cdot s} \cdot \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \cdot \sum_{k=0}^s \ \begin{pmatrix} s \\ k \end{pmatrix}^2 \\ &= \ \left[ \left( \frac{1}{2} \right)^{2 \cdot s} \cdot \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \right]^2 \\ &= \ \left[ p_{2 \cdot s}(1) \right]^2 \end{split} \end{align*}
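The collapse of the raw multinomial sum into the square of the 1-dimensional probability is easy to verify numerically. A minimal check (my own illustration, using exact integer arithmetic):

```python
from math import comb, factorial

def p1(s):
    # 1-dimensional return probability after 2*s steps, eq. (7)
    return comb(2 * s, s) / 4 ** s

def p2(s):
    # 2-dimensional return probability via the raw multinomial
    # sum in the first line of eq. (11)
    total = sum(
        factorial(2 * s) // (factorial(k) ** 2 * factorial(s - k) ** 2)
        for k in range(s + 1)
    )
    return total / 4 ** (2 * s)

# the sum collapses to the square of the 1-dimensional probability
for s in (1, 2, 5, 10):
    assert abs(p2(s) - p1(s) ** 2) < 1e-12
```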

The next-to-last equality uses the identity \sum_{k=0}^s \begin{pmatrix} s \\ k \end{pmatrix}^2 = \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix}. Summing over all possible path lengths yields a divergent series:

(12)   \begin{align*} \begin{split} p(2) \ &= \sum_{s=1}^\infty \ p_{2 \cdot s}(2) \\ &\approx \ \sum_{s=1}^\infty \ \frac{1}{\pi \cdot s} \\ &= \ \infty \end{split} \end{align*}

 

Proof: (d=3) The result for d=3 is a bit more complicated as there isn’t a   nice closed form expression for each of the p_{2 \cdot s}(3)   terms. I start by simplifying as far as I can:

(13)   \begin{align*} \begin{split} p_{2 \cdot s}(3) \ &= \ \left( \frac{1}{6} \right)^{2 \cdot s} \cdot \sum_{j,k \mid j+k \leq s} \ \begin{pmatrix} 2 \cdot s \\ k,k, j,j,  (s-k-j),(s-k-j) \end{pmatrix} \\ &= \ \left( \frac{1}{6} \right)^{2 \cdot s} \cdot \sum_{j,k \mid j+k \leq s} \ \frac{(2 \cdot s)!}{k! \cdot k! \cdot j! \cdot j! \cdot (s-j-k)! \cdot (s-j-k)!} \\ &= \ \left( \frac{1}{2} \right)^{2 \cdot s} \cdot \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \cdot \sum_{j,k \mid j+k \leq s} \ \left( \frac{1}{3^s} \cdot \frac{s!}{k! \cdot j! \cdot (s-j-k)!} \right)^2 \end{split} \end{align*}

Next, I apply the Multinomial Theorem: the terms \frac{1}{3^s} \cdot \frac{s!}{k! \cdot j! \cdot (s-j-k)!} sum to 1, and each term is maximized when j=k=s/3. The sum of their squares is therefore bounded by this maximum term, so substituting it in gives an upper bound on the probability p_{2 \cdot s}(3):

(14)   \begin{align*} \begin{split} p_{2 \cdot s}(3) \ &\leq \ \left( \frac{1}{2} \right)^{2 \cdot s} \cdot \begin{pmatrix} 2 \cdot s \\ s \end{pmatrix} \cdot \left( \frac{1}{3^s} \cdot \frac{s!}{\left[ \left( \frac{s}{3} \right)! \right]^3} \right) \\ &\leq \ \frac{C}{(\pi \cdot s)^{3/2}} \end{split} \end{align*}

where C is a constant that does not depend on s. Summing over all possible path lengths now leads to a convergent series, so the expected number of returns is finite and drunk Arnold will, at some point during the evening, have a final night cap:

(15)   \begin{align*} \begin{split} p(3) \ &= \ \sum_{s=0}^\infty \ p_{2 \cdot s}(3) \\ &< \ \infty \end{split} \end{align*}
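Since there is no closed form here, a numerical sketch helps (my own illustration): computing p_{2 \cdot s}(3) exactly from the multinomial sum in equation (13) shows the partial sums of equation (15) increasing by less and less, consistent with convergence.

```python
from math import factorial

def p3(s):
    # exact 3-dimensional return probability after 2*s steps,
    # via the multinomial sum in eq. (13)
    total = sum(
        factorial(2 * s)
        // (factorial(j) ** 2 * factorial(k) ** 2 * factorial(s - j - k) ** 2)
        for j in range(s + 1)
        for k in range(s + 1 - j)
    )
    return total / 6 ** (2 * s)

# successive partial sums increase by shrinking amounts: the series converges
partials = [sum(p3(s) for s in range(1, n + 1)) for n in (20, 40, 80)]
print(partials)
```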

 

Extension to Brownian Motion

Below, I define Brownian motion in d>1 dimensions and then show how to extend Polya’s Recurrence Theorem from random walks on a lattice to continuous time Brownian motion.

Brownian motion for d>1 dimensions is a natural extension of the  d=1 dimensional case. I give the formal definition below:

Definition: (Multi-Dimensional Brownian Motion) Brownian motion in \mathcal{R}^d is the vector valued process whose coordinates B_1(t), B_2(t), \ldots, B_d(t) are independent 1-dimensional Brownian motions:

(16)   \begin{align*} \mathbf{B}(t) \ &= \ \begin{bmatrix} B_1(t) & B_2(t) & \ldots & B_d(t) \end{bmatrix} \end{align*}
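A minimal simulation sketch of this definition (function names are my own): each coordinate of the path accumulates its own independent Gaussian increments with standard deviation \sqrt{dt}.

```python
import random

def brownian_path(d, n_steps, dt, seed=0):
    # d-dimensional Brownian motion sampled on a grid of spacing dt:
    # each coordinate accumulates independent Norm(0, sqrt(dt)) increments
    rng = random.Random(seed)
    path = [[0.0] * d]
    for _ in range(n_steps):
        path.append([x + rng.gauss(0.0, dt ** 0.5) for x in path[-1]])
    return path

path = brownian_path(d=3, n_steps=1000, dt=0.01)
print(path[-1])  # location of B(10) for this seed
```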

To extend Polya’s Recurrence Theorem to continuous time Brownian  motion, I just need to apply the Central Limit Theorem and then  construct the Brownian motion from the resulting independent  Gaussian increments:

Theorem: (deMoivre-Laplace) Let k_s be the number of successful draws from a binomial distribution in s tries, each succeeding with probability \pi, so that \mathbb{E}(k_s) = s \cdot \pi. Then we can approximate the binomial distribution with a Gaussian distribution, with the approximation becoming exact as s \to \infty:

(17)   \begin{align*} \mathtt{Bin}(s,\pi) \ &\sim \ \mathtt{Norm}\left(s \cdot \pi, \sqrt{s \cdot \pi \cdot (1-\pi)}\right) \end{align*}
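The approximation in equation (17) can be checked directly (my own illustration; the second \mathtt{Norm} argument is treated as a standard deviation, matching the equation): near the mean s \cdot \pi, the binomial probability mass and the Gaussian density are nearly indistinguishable for large s.

```python
from math import comb, exp, pi, sqrt

def binom_pmf(s, prob, k):
    # probability of exactly k successes in s independent tries
    return comb(s, k) * prob ** k * (1 - prob) ** (s - k)

def normal_pdf(mu, sigma, x):
    # Gaussian density with mean mu and standard deviation sigma
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

s, prob = 1000, 0.5
mu, sigma = s * prob, sqrt(s * prob * (1 - prob))
for k in (480, 500, 520):
    print(k, binom_pmf(s, prob, k), normal_pdf(mu, sigma, k))
```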

Lemma: (Levy’s Selector) Suppose that s<t and X(s) and X(t) are random variables  defined on the same sample space such that X(t) - X(s) has a  distribution which is \mathtt{Norm}(0,t-s). Then there exists a  random variable X(\frac{t+s}{2}) such that X(\frac{t+s}{2}) -  X(s) and X(t) - X(\frac{t+s}{2}) are independent with a common  \mathtt{Norm}(0,\frac{t-s}{2}) distribution.
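The lemma can be used constructively: conditional on the two endpoints, the midpoint is Gaussian with mean equal to the average of the endpoints and variance (t-s)/4, which makes the two half-increments independent with common variance (t-s)/2. A minimal sketch (my own illustration; the lemma's \mathtt{Norm}(0,\frac{t-s}{2}) is read as specifying a variance):

```python
import random

def levy_midpoint(x_s, x_t, s, t, rng):
    # Conditional on X(s) = x_s and X(t) = x_t, draw X((s+t)/2):
    # it is Gaussian with mean (x_s + x_t)/2 and variance (t - s)/4,
    # so the two half-increments come out independent Norm(0, (t-s)/2).
    mean = 0.5 * (x_s + x_t)
    std = ((t - s) / 4) ** 0.5
    return rng.gauss(mean, std)

rng = random.Random(42)
print(levy_midpoint(0.0, 1.0, 0.0, 1.0, rng))
```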

  1. Sanity Check: Why 2 \cdot s and not just s here?
