1. Introduction
In this post, I show how to compute corrected standard errors for a predictive regression with overlapping samples as in Hodrick (1992). First, in Section 2, I walk through a simple example that outlines the general empirical setting and illustrates why we need to correct the standard errors on the coefficient estimates when faced with overlapping samples. Then, in Section 3, I compute the estimator for the standard errors proposed in Hodrick (1992). I conclude in Section 4 with a numerical simulation to verify that the mathematics below in fact computes a sensible estimate of the standard deviation of $\hat{\beta}$.
2. An Illustrative Example
Suppose that you are a mutual fund manager who has to allocate capital amongst stocks, and you want to know which stocks will earn the highest returns over the next $H$ months, where $H$ stands for your investment horizon. To start with, you might consider $H = 1$ and run a bunch of regressions of the form below, where $r_{n,t+1}$ is stock $n$'s log $1$-month excess return, $x_{n,t}$ is a current state variable, and $\varepsilon_{n,t+1}$ is the residual:

$$r_{n,t+1} = \alpha + \beta \, x_{n,t} + \varepsilon_{n,t+1} \tag{1}$$
For example, Fama and French (1988) pick $x_{n,t}$ to be the log price-to-dividend ratio, while Jegadeesh and Titman (1993) pick $x_{n,t}$ to be a dummy variable for a stock's inclusion in or exclusion from a momentum portfolio. We can vectorize the expression above to clean up the algebra and obtain the regression equation below, where $r_{t+1}$, $x_t$, and $\varepsilon_{t+1}$ stack the stock-level terms:

$$r_{t+1} = \alpha + \beta \, x_t + \varepsilon_{t+1} \tag{2}$$
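As a concrete illustration, the one-month regression in equation (2) can be estimated by ordinary least squares. The sketch below is my own, using hypothetical simulated data under the null that $\beta = 0$; the sample size and seed are arbitrary:

```python
import numpy as np

# Hypothetical simulated data under the null that beta = 0: the state
# variable x_t has no power to predict next month's return r_{t+1}.
rng = np.random.default_rng(0)
T = 600
x = rng.standard_normal(T)          # state variable x_t
r = rng.standard_normal(T)          # log 1-month excess return r_{t+1}

# OLS estimate of (alpha, beta) in the one-month regression (2):
# regress r_{t+1} on a constant and the lagged predictor x_t
Z = np.column_stack([np.ones(T - 1), x[:-1]])
alpha_hat, beta_hat = np.linalg.lstsq(Z, r[1:], rcond=None)[0]
```

With no true predictability, both estimates should hover near zero at this sample size.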
However, just as Fama and French (1988) and Jegadeesh and Titman (1993) are interested in investment horizons of $H > 1$, you could also set up the regression from above at a longer horizon $H$ by making the adjustments:

$$r_{t \to t+H} = \sum_{h=1}^{H} r_{t+h}, \qquad \varepsilon_{t \to t+H} = \sum_{h=1}^{H} \varepsilon_{t+h} \tag{3}$$

Here, the expression for $\varepsilon_{t \to t+H}$ comes from the null hypothesis that $x_t$ has no predictive power, assuming that each of the $\varepsilon_{t+h}$ terms is $N(0, \sigma^2)$ white noise. Thus, if you run a new set of regressions at the $H$-month investment horizon, you would have the vectorized regression equation:

$$r_{t \to t+H} = \alpha + \beta \, x_t + \varepsilon_{t \to t+H} \tag{4}$$
However, while estimating this equation and trying to compute the standard errors for $\hat{\beta}$, you notice something troubling: even though each of the $\varepsilon_{t+h}$ terms is distributed $N(0, \sigma^2)$ and acts as white noise, the $\varepsilon_{t \to t+H}$ and $\varepsilon_{t+1 \to t+1+H}$ terms both contain the shocks $\varepsilon_{t+2}, \ldots, \varepsilon_{t+H}$. Thus, while the period-by-period shocks are white noise, the regression residuals are autocorrelated in a non-trivial way:

$$\mathrm{E}\!\left[\varepsilon_{t \to t+H} \, \varepsilon_{t+s \to t+s+H}\right] = (H - s) \, \sigma^2 \quad \text{for } 0 \leq s < H \tag{5}$$
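To see this autocorrelation pattern at work, the sketch below builds overlapping $H$-period residuals from simulated white-noise shocks and checks that their sample autocovariances line up with $(H - s)\,\sigma^2$; the sample size and seed are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
T, H, sigma = 200_000, 3, 1.0
eps = sigma * rng.standard_normal(T)             # one-period white noise

# Overlapping H-period residuals: eps_{t -> t+H} = eps_{t+1} + ... + eps_{t+H}
e_H = np.convolve(eps, np.ones(H), mode="valid")

def autocov(e, s):
    """Sample autocovariance of e at lag s (mean-zero by construction)."""
    return np.mean(e[: len(e) - s] * e[s:])

# Should be close to (H - s) * sigma^2 for s < H and zero for s >= H
covs = [autocov(e_H, s) for s in range(H + 1)]   # approx. [3, 2, 1, 0]
```

Only at lags $s \geq H$ do the residuals stop sharing common shocks and the autocovariance die out.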
Thus, in order to properly account for the variability of your estimate of $\beta$, you will have to compute standard errors for the regression that take this autocorrelation and conditional heteroskedasticity into account.
3. Hodrick (1992) Solution
Standard econometric theory tells us that we can estimate the coefficients $b = (\alpha, \beta)'$ in equation (4) using GMM, yielding the distributional result:

$$\sqrt{T} \left( \hat{b} - b \right) \xrightarrow{d} N(0, V) \tag{6}$$

with the variance-covariance matrix given by the sandwich expression below, where $z_t = (1, x_t)'$:

$$V = \mathrm{E}\!\left[ z_t z_t' \right]^{-1} S \, \mathrm{E}\!\left[ z_t z_t' \right]^{-1} \tag{7}$$
Thus, the autocovariance of the regression residuals will be captured by the $S$, or spectral density, term. A natural way to account for this persistence in the errors would be to compute something like the sum of the sample autocovariances of the moment conditions $g_t = z_t \, \varepsilon_{t \to t+H}$:

$$\hat{S} = \sum_{j=-(H-1)}^{H-1} \frac{1}{T} \sum_{t} \hat{g}_t \, \hat{g}_{t-j}' \tag{8}$$
However, this estimator for the spectral density has bad small-sample properties, as the estimated autocovariance matrices are only guaranteed to be positive semi-definite, leading to large amounts of noise when your computer attempts to invert a nearly singular matrix. The insight in Hodrick (1992) is to use the stationarity of the time series and the null hypothesis that $x_t$ has no predictive power to switch from summing autocovariances to computing a single variance:

$$\hat{S} = \frac{1}{T} \sum_{t} \hat{\varepsilon}_{t+1}^2 \left( \sum_{j=0}^{H-1} z_{t-j} \right) \left( \sum_{j=0}^{H-1} z_{t-j} \right)' \tag{9}$$
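This variance-of-sums estimator can be sketched in a few lines of numpy. To be clear, this is a minimal illustration and not the code linked in Section 4: the function name and alignment conventions are my own, and the one-period residuals are formed under the null that $x_t$ has no predictive power, so the residual is just the demeaned one-period return.

```python
import numpy as np

def hodrick_se(r1, x, H):
    """Sketch of Hodrick (1992) standard errors for an H-period
    overlapping predictive regression; r1[t] is the one-period
    return r_{t+1} that follows the predictor observation x[t]."""
    T = len(x)
    Z = np.column_stack([np.ones(T), x])           # z_t = (1, x_t)'

    # H-period overlapping returns r_{t -> t+H} and OLS point estimates
    rH = np.convolve(r1, np.ones(H), mode="valid")
    b = np.linalg.lstsq(Z[: T - H + 1], rH, rcond=None)[0]

    # One-period residuals under the null of no predictability
    e1 = r1 - r1.mean()

    # Spectral density: wrap the horizon sum around z instead of the
    # residuals, with q_t = sum_{j=0}^{H-1} z_{t-j}
    q = np.array([Z[t - H + 1 : t + 1].sum(axis=0) for t in range(H - 1, T)])
    e = e1[H - 1 :]
    S = (q * (e ** 2)[:, None]).T @ q / len(e)

    # Sandwich variance-covariance matrix, scaled back by 1/T
    D = np.linalg.inv(Z.T @ Z / T)
    V = D @ S @ D / T
    return b, np.sqrt(np.diag(V))
```

Because this $\hat{S}$ is an average of outer products, it is positive semi-definite by construction, which is exactly its small-sample advantage over summing autocovariances.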
4. Simulation Results
In this section, I conclude by verifying my derivations with a simulation (code). First, I simulate a data set of $1$-month returns using a discretized version of an Ornstein-Uhlenbeck process with time step $\Delta t$:

$$r_{t+1} = r_t + \kappa \left( \bar{r} - r_t \right) \Delta t + \sigma \sqrt{\Delta t} \, \xi_{t+1} \tag{10}$$
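A discretized simulation of this process might look as follows; note that the parameter values here are illustrative placeholders of mine, not the Cochrane (2005) moments used in the actual simulation:

```python
import numpy as np

# Euler discretization of an Ornstein-Uhlenbeck process for monthly
# returns; kappa, rbar, sigma, and dt are illustrative placeholders.
rng = np.random.default_rng(3)
T, dt = 600, 1.0 / 12.0
kappa, rbar, sigma = 0.9, 0.06, 0.17

r = np.empty(T)
r[0] = rbar
for t in range(T - 1):
    xi = rng.standard_normal()     # xi ~ N(0, 1)
    r[t + 1] = r[t] + kappa * (rbar - r[t]) * dt + sigma * np.sqrt(dt) * xi
```

The mean-reversion term pulls the series back toward $\bar{r}$, so long sample averages stay close to that level.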
with $\xi_{t+1}$ an i.i.d. standard normal variable. I use the annualized moments below, taken from Cochrane (2005):
(11)
I also simulate a completely unrelated process $x_t$ whose values are i.i.d. draws from a standard normal distribution. Thus, I check my computations under the null hypothesis that $x_t$ has no predictive power. I then run $500$ simulations in which I compute the data series above, estimate the regression:

$$r_{t \to t+H} = \alpha + \beta \, x_t + \varepsilon_{t \to t+H} \tag{12}$$

and then report the distribution of $\hat{\beta}$, as well as the naive and Hodrick (1992) implied standard errors:

Estimated standard errors for 500 simulated draws using both the naive and Hodrick (1992) approaches.
I report the mean values from the simulations below:
(13)