Research Notebook

Standard Error Estimation with Overlapping Samples

August 13, 2011 by Alex

1. Introduction

In this post, I show how to compute corrected standard errors for a predictive regression with overlapping samples as in Hodrick (1992). First, in Section 2, I walk through a simple example which outlines the general empirical setting and illustrates why we need to correct the standard errors on the coefficient estimates when faced with overlapping samples. Then, in Section 3, I derive the estimator for the standard errors proposed in Hodrick (1992). I conclude in Section 4 with a numerical simulation verifying that the mathematics below in fact yields a sensible estimate of the standard deviation of \hat{\theta}_z.

2. An Illustrative Example

Suppose that you are a mutual fund manager who has to allocate capital among stocks, and you want to know which stocks will earn the highest returns over the next H months, where H denotes your investment horizon. To start with, you might consider H=1 and run a bunch of regressions of the form below, where r_{t \to (t+1)} is the log 1-month excess return, z_t is a current state variable, and \varepsilon_{t \to (t+1)} is the residual:

(1)   \begin{align*} r_{t \to (t+1)} &= \theta_1 + \theta_z \cdot z_t + \varepsilon_{t \to (t + 1)} \end{align*}

For example, Fama and French (1988) pick z_t to be the log price-to-dividend ratio, while Jegadeesh and Titman (1993) pick z_t to be a dummy variable for a stock’s inclusion in or exclusion from a momentum portfolio. We can vectorize the expression above to clean up the algebra and obtain the regression equation below:

(2)   \begin{align*} \underbrace{\begin{bmatrix} r_{1 \to 2} \\ r_{2 \to 3} \\ r_{3 \to 4} \\ \vdots \\ r_{(T-1) \to T} \end{bmatrix}}_{Y_{T-1}(1)} &= \underbrace{\begin{bmatrix} 1 & z_1 \\  1 & z_2 \\ 1 & z_3 \\ \vdots & \vdots \\ 1 & z_{T-1} \end{bmatrix}}_{X_{T-1}} \underbrace{\begin{pmatrix} \theta_1 \\ \theta_z \end{pmatrix}}_{\Theta(1)} + \underbrace{\begin{bmatrix} \varepsilon_{1 \to 2} \\ \varepsilon_{2 \to 3} \\ \varepsilon_{3 \to 4} \\ \vdots \\ \varepsilon_{(T-1) \to T} \end{bmatrix}}_{\mathcal{E}_{T-1}(1)} \end{align*}
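To make the setup concrete, here is a minimal sketch in Python/NumPy of how one might stack Y_{T-H}(H) and X_{T-H} for an arbitrary horizon H and compute the OLS point estimate. The function names are my own for illustration, not those in the linked code:

    import numpy as np

    def build_regression(r, z, H):
        """Stack Y_{T-H}(H) = X_{T-H} Theta(H) + E_{T-H}(H).

        r : array of 1-month log excess returns, r[t] = r_{t -> (t+1)}
        z : array of predictor values, z[t] = z_t
        H : investment horizon in months
        """
        n = len(r) - H + 1                         # number of usable rows
        # An H-month return is the sum of H consecutive 1-month returns (eq. 3).
        Y = np.array([r[t:t + H].sum() for t in range(n)])
        X = np.column_stack([np.ones(n), z[:n]])   # each row is [1, z_t]
        return Y, X

    def ols(Y, X):
        """OLS point estimate: Theta_hat = (X'X)^{-1} X'Y."""
        return np.linalg.solve(X.T @ X, X.T @ Y)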

However, just as Fama and French (1988) and Jegadeesh and Titman (1993) are interested in investment horizons of H>1, you could also set up the regression above for H>1 by making the adjustments below:

(3)   \begin{align*} r_{t \to (t+H)} &= \sum_{h=1}^H r_{(t+h-1) \to (t+h)} \\ \varepsilon_{t \to (t+H)} &= \sum_{h=1}^H \varepsilon_{(t+h-1) \to (t+h)} \end{align*}

Here, the expression for \varepsilon_{t \to (t+H)} comes from the null hypothesis that z_t has no predictive power, together with the assumption that each of the \varepsilon_{t \to (t+1)} terms is \mathtt{iid} white noise. Thus, if you run a new set of regressions at the H=2 month investment horizon, you would have the vectorized regression equation:

(4)   \begin{align*} Y_{T-2}(2) &= X_{T-2} \ \Theta(2) + \mathcal{E}_{T-2}(2) \end{align*}
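In terms of the hypothetical sketch above, this H=2 system is just:

    Y2, X2 = build_regression(r, z, H=2)   # stacks equation (4)
    theta2 = ols(Y2, X2)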

However, while estimating this equation and trying to compute the standard errors for \theta_z, you notice something troubling: even though each of the \varepsilon_{t \to (t+1)} terms is distributed \mathtt{iid} and acts as white noise, the \varepsilon_{t \to (t+2)} and \varepsilon_{(t+1) \to (t+3)} terms both contain the \varepsilon_{(t+1) \to (t+2)} shock. Thus, while the step-by-step shocks are white noise, the regression residuals are autocorrelated in a nontrivial way, since under the null:

(5)   \begin{align*} r_{t \to (t+2)} &= 2 \cdot \theta_1 + \varepsilon_{t \to (t + 1)}  + \varepsilon_{(t+1) \to (t + 2)} \end{align*}

Concretely, \mathrm{Cov}(\varepsilon_{t \to (t+2)}, \varepsilon_{(t+1) \to (t+3)}) = \mathrm{Var}(\varepsilon_{(t+1) \to (t+2)}) > 0 even though the 1-month shocks are \mathtt{iid}. Thus, in order to properly account for the variability of your estimate of \theta_z, you will have to compute standard errors for the regression that take this autocorrelation and conditional heteroskedasticity into account.
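A quick numerical check of this overlap effect (a throwaway sketch, not part of the linked code): overlapping 2-period sums of \mathtt{iid} shocks have a lag-1 autocorrelation of roughly 0.5 rather than 0.

    rng = np.random.default_rng(0)
    eps = rng.standard_normal(100_000)              # iid 1-month shocks
    eps2 = eps[:-1] + eps[1:]                       # eps_{t -> (t+2)} under the null
    print(np.corrcoef(eps2[:-1], eps2[1:])[0, 1])   # roughly 0.5, not 0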

3. Hodrick (1992) Solution

Standard econometric theory tells us that we can estimate \Theta(H) using GMM, yielding the asymptotic distributional result:

(6)   \begin{align*} \hat{\Theta}(H) - \Theta(H) &\sim \mathtt{N}\left( 0, V(H) \right) \end{align*}

with the variance-covariance matrix given by the expression:

(7)   \begin{align*} V(H) &= \frac{1}{T-H} \cdot \mathbb{E}[x_t x_t^{\top}]^{-1} S_{T-H} \mathbb{E}[x_t x_t^{\top}]^{-1} \end{align*}

where x_t^{\top} = \begin{bmatrix} 1 & z_t \end{bmatrix} denotes a row of X_{T-H}.
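In code, once an estimate of S_{T-H} is in hand, the sandwich formula in equation (7) is a one-liner. Here is a sketch (my own helper, assuming the conventions above) that replaces \mathbb{E}[x_t x_t^{\top}] with its sample analogue:

    def sandwich_variance(X, S):
        """V(H) = (1/n) * E[x x']^{-1} S E[x x']^{-1}, eq. (7), with
        E[x x'] estimated by X'X / n where n = T - H."""
        n = X.shape[0]
        Sxx_inv = np.linalg.inv(X.T @ X / n)
        return Sxx_inv @ S @ Sxx_inv / n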

Thus, the autocovariance of the regression residuals will be captured by the S_{T-H} or spectral density term. A natural way to account for this persistence in the errors would be to estimate S_{T-H} as something like the sum of the autocovariances:

(8)   \begin{align*} S_{T-H} &= \sum_{j=-H+1}^{H-1} \mathbb{E} \left[ \left( \varepsilon_{t+H} \cdot \begin{bmatrix} 1 & z_t \end{bmatrix}^{\top} \right) \left( \varepsilon_{t+H-j} \cdot \begin{bmatrix} 1 & z_{t-j} \end{bmatrix}^{\top} \right)^{\top} \right] \end{align*}
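A sample analogue of equation (8) might look like the sketch below, which sums the lag-j autocovariances of the moments g_t = \varepsilon_{t \to (t+H)} \cdot \begin{bmatrix} 1 & z_t \end{bmatrix}^{\top} for |j| < H (again a sketch under my naming conventions, not the linked code):

    def spectral_density_naive(X, eps, H):
        """Sum of sample autocovariances of g_t = eps_t * x_t for |j| < H."""
        n, k = X.shape
        g = eps[:, None] * X                  # one row of g_t per observation
        S = np.zeros((k, k))
        for j in range(-(H - 1), H):
            if j >= 0:
                S += g[j:].T @ g[:n - j] / n  # E[g_t g_{t-j}']
            else:
                S += g[:n + j].T @ g[-j:] / n
        return S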

However, this estimator for the spectral density has poor small-sample properties, as autocovariance matrices are only guaranteed to be positive semi-definite, leading to large amounts of noise as your computer attempts to invert a nearly singular matrix. The insight in Hodrick (1992) is to use the stationarity of the time series Y_{T-H}(H) and X_{T-H} to switch from summing autocovariances to computing a single variance:

(9)   \begin{align*} S_{T-H} &= \mathbb{E} \left[ \varepsilon_{t+1}^2 \cdot \left( \sum_{h=0}^{H-1} \begin{bmatrix} 1 & z_{t - h} \end{bmatrix}^{\top} \right) \left( \sum_{h=0}^{H-1} \begin{bmatrix} 1 & z_{t - h} \end{bmatrix}^{\top} \right)^{\top} \right] \end{align*}
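Its sample analogue replaces the summed autocovariances with the squared 1-period residual times the outer product of a backward rolling sum of the regressors. A minimal sketch, assuming eps1 holds the 1-period residuals \varepsilon_{t \to (t+1)}:

    def spectral_density_hodrick(X, eps1, H):
        """Sample analogue of eq. (9), following Hodrick (1992)."""
        n, k = X.shape
        S = np.zeros((k, k))
        for t in range(H - 1, n):
            # sum_{h=0}^{H-1} [1, z_{t-h}]'
            xsum = X[t - H + 1:t + 1].sum(axis=0)
            S += eps1[t] ** 2 * np.outer(xsum, xsum)
        return S / (n - H + 1)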

4. Simulation Results

In this section, I conclude by verifying my derivations with a simulation (code). First, I simulate a data set of 1-month returns using a discretized version of an Ornstein-Uhlenbeck process with \Delta t = 1/12:

(10)   \begin{align*} r_{t \to (t+1)} &= r_{(t-1) \to t} + \theta \cdot (\mu - r_{(t-1) \to t}) \cdot \Delta t + \sigma \cdot \sqrt{\Delta t} \cdot \varsigma_{t \to (t+1)} \end{align*}

with \varsigma_{t \to (t+1)} an \mathtt{iid} standard normal variable. I use the annualized parameter values below, taken from Cochrane (2005):

(11)   \begin{align*} \begin{array}{l|c} & \textit{Value} \\ \hline \hline \mu & 0.08 \\ \theta & 0.70 \\ \sigma & 0.16 \end{array} \end{align*}
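A sketch of this simulation step, with the annualized parameters above as defaults (a hypothetical helper, not the linked code):

    def simulate_returns(T, mu=0.08, theta=0.70, sigma=0.16, dt=1.0 / 12.0, seed=None):
        """Euler-discretized Ornstein-Uhlenbeck draw of T 1-month returns, eq. (10)."""
        rng = np.random.default_rng(seed)
        r = np.empty(T)
        r[0] = mu * dt                      # arbitrary starting value
        for t in range(1, T):
            r[t] = (r[t - 1] + theta * (mu - r[t - 1]) * dt
                    + sigma * np.sqrt(dt) * rng.standard_normal())
        return r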

I also simulate a completely unrelated process z_t, which consists of T \mathtt{iid} draws from a standard normal distribution, so that I am checking my computations under the null hypothesis that z_t has no predictive power. I then run 500 simulations in which I generate the data series above, estimate the regression:

(12)   \begin{align*} Y_{T-6}(6) &= X_{T-6} \ \Theta(6) + \mathcal{E}_{T-6}(6) \end{align*}

and then report the distribution of \theta_z, as well as the naive and Hodrick (1992) implied standard errors:

[Figure: Estimated coefficients for 500 simulated draws.]

[Figure: Estimated standard errors for 500 simulated draws using both the naive and Hodrick (1992) approaches.]

I report the mean values from the simulations below:

(13)   \begin{align*} \begin{array}{l|c} & \textit{Mean Value} \\ \hline \hline \hat{\sigma}_{\mathtt{Naive}} & 0.00312 \\ \hline \hat{\sigma}_{\mathtt{H1992}} & 0.00326 \end{array} \end{align*}
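For completeness, here is a condensed sketch of the Monte Carlo loop, tying together the hypothetical helpers from the earlier sections; the horizon and sample length here are illustrative choices, not necessarily those in the linked code. The last line prints the mean naive and Hodrick (1992) standard errors, the analogues of the values in (13).

    H, T, n_sims = 6, 600, 500
    se_naive, se_hodrick = [], []
    for s in range(n_sims):
        r = simulate_returns(T, seed=s)
        z = np.random.default_rng(10_000 + s).standard_normal(T)
        Y, X = build_regression(r, z, H)
        eps = Y - X @ ols(Y, X)               # H-period residuals
        Y1, X1 = build_regression(r, z, 1)
        eps1 = Y1 - X1 @ ols(Y1, X1)          # 1-period residuals
        V_n = sandwich_variance(X, spectral_density_naive(X, eps, H))
        V_h = sandwich_variance(X, spectral_density_hodrick(X1, eps1, H))
        se_naive.append(np.sqrt(V_n[1, 1]))
        se_hodrick.append(np.sqrt(V_h[1, 1]))
    print(np.mean(se_naive), np.mean(se_hodrick))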
