1. Introduction
In this post, I show how to compute corrected standard errors for a predictive regression with overlapping samples as in Hodrick (1992). First, in Section 2, I walk through a simple example which outlines the general empirical setting and illustrates why we need to correct the standard errors on the coefficient estimates when faced with overlapping samples. Then, in Section 3, I compute the estimator for the standard errors proposed in Hodrick (1992). I conclude in Section 4 with a numerical simulation to verify that the mathematics below in fact computes a sensible estimate of the standard deviation of the estimated regression coefficient.
2. An Illustrative Example
Suppose that you are a mutual fund manager who has to allocate capital amongst stocks and you want to know which stocks will earn the highest returns over the next $h$ months, where $h$ stands for your investment horizon. To start with, you might consider $h = 1$ and run a bunch of regressions with the form below, where $r_{t+1}$ is the log 1 month excess return, $x_t$ is a current state variable and $\varepsilon_{t+1}$ is the residual:
$$r_{t+1} = \alpha + \beta \, x_t + \varepsilon_{t+1} \tag{1}$$
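For example, Fama and French (1988) pick $x_t$ to be the log price to dividend ratio while Jegadeesh and Titman (1993) pick $x_t$ to be a dummy variable for a stock's inclusion in or exclusion from a momentum portfolio. We can vectorize the expression above to clean up the algebra by stacking $\mathbf{x}_t = [1, \ x_t]^\top$ and $\boldsymbol{\beta} = [\alpha, \ \beta]^\top$ to obtain the regression equation below:
$$r_{t+1} = \mathbf{x}_t^\top \boldsymbol{\beta} + \varepsilon_{t+1} \tag{2}$$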
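However, just as Fama and French (1988) and Jegadeesh and Titman (1993) are interested in investment horizons of $h > 1$, you could also set up the regression from above with $h > 1$ by making the adjustments:
$$r_{t \to t+h} = \sum_{j=1}^{h} r_{t+j}, \qquad \varepsilon_{t \to t+h} = \sum_{j=1}^{h} \varepsilon_{t+j} \tag{3}$$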
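Here, the expression for $\varepsilon_{t \to t+h}$ comes from the null hypothesis that $x_t$ has no predictive power, assuming that each of the $\varepsilon_{t+j}$ terms is white noise. Thus, if you run a new set of regressions at the $h$ month investment horizon, you would have the vectorized regression equation:
$$r_{t \to t+h} = \mathbf{x}_t^\top \boldsymbol{\beta}_h + \varepsilon_{t \to t+h} \tag{4}$$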
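However, while estimating this equation and trying to compute the standard errors for $\hat{\boldsymbol{\beta}}_h$, you notice something troubling: even though each of the $\varepsilon_{t+1}$ terms has mean zero and acts as white noise, the $\varepsilon_{t \to t+h}$ and $\varepsilon_{t+1 \to t+h+1}$ terms each contain the $\varepsilon_{t+2}$ shock whenever $h > 1$. Thus, while the step by step shocks are white noise, the regression residuals are autocorrelated in a non-trivial way. Writing $\sigma^2 = \mathrm{Var}(\varepsilon_{t+1})$, two residuals that start $j$ months apart share $h - |j|$ shocks:
$$\mathrm{Cov}\left( \varepsilon_{t \to t+h}, \ \varepsilon_{t+j \to t+j+h} \right) = \begin{cases} (h - |j|) \, \sigma^2 & \text{if } |j| < h \\ 0 & \text{otherwise} \end{cases} \tag{5}$$
Thus, in order to properly account for the variability of your estimate of the regression coefficients, you will have to compute standard errors for the regression that take this autocorrelation and conditional heteroskedasticity into account.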
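To make the overlap problem concrete, here is a minimal sketch that builds $h$-month sums of simulated white noise and compares their sample autocovariances with the count of shared shocks, $(h - j) \sigma^2$. The function names, the horizon of 6 months, the seed and the sample size are my own illustrative choices rather than anything from Hodrick (1992).

```python
import numpy as np

def overlapping_sums(eps, h):
    """Return the h-period rolling sums eps[t] + ... + eps[t+h-1]."""
    csum = np.concatenate(([0.0], np.cumsum(eps)))
    return csum[h:] - csum[:-h]

def sample_autocov(u, lag):
    """Sample autocovariance of u at a non-negative lag."""
    u = u - u.mean()
    n = len(u)
    return float(np.dot(u[lag:], u[:n - lag]) / n)

# Illustrative settings (assumptions, not taken from the post)
rng = np.random.default_rng(0)
h, sigma = 6, 1.0
eps = rng.normal(0.0, sigma, size=200_000)   # one-period white noise shocks
u = overlapping_sums(eps, h)                 # overlapping h-period residuals

for lag in range(h + 2):
    theory = max(h - lag, 0) * sigma**2      # number of shared shocks times sigma^2
    print(lag, round(sample_autocov(u, lag), 2), theory)
```

The sample autocovariances should decline roughly linearly in the lag and vanish once the two windows no longer share any shocks.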
3. Hodrick (1992) Solution
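Standard econometric theory tells us that we can estimate $\boldsymbol{\beta}_h$ using GMM, yielding the distributional result:
$$\sqrt{T} \left( \hat{\boldsymbol{\beta}}_h - \boldsymbol{\beta}_h \right) \xrightarrow{d} \mathcal{N}\left( 0, \ \mathbf{V} \right) \tag{6}$$
with the variance covariance matrix given by the expression:
$$\mathbf{V} = \mathbf{Z}^{-1} \, \mathbf{S} \, \mathbf{Z}^{-1}, \qquad \mathbf{Z} = \mathbb{E}\left[ \mathbf{x}_t \mathbf{x}_t^\top \right], \qquad \mathbf{S} = \sum_{j=-\infty}^{\infty} \mathbb{E}\left[ \mathbf{w}_t \mathbf{w}_{t-j}^\top \right], \qquad \mathbf{w}_t = \varepsilon_{t \to t+h} \, \mathbf{x}_t \tag{7}$$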
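Thus, the autocovariance of the regression residuals will be captured by the $\mathbf{S}$ or spectral density term. A natural way to account for this persistence in the errors would be to compute something like the sum of the sample autocovariances of $\mathbf{w}_t = \hat{\varepsilon}_{t \to t+h} \, \mathbf{x}_t$ out to lag $h - 1$:
$$\hat{\mathbf{S}} = \sum_{j=-h+1}^{h-1} \hat{\mathbf{C}}(j), \qquad \hat{\mathbf{C}}(j) = \frac{1}{T} \sum_{t} \mathbf{w}_t \mathbf{w}_{t-j}^\top \tag{8}$$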
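As a rough sketch of this truncated sum of sample autocovariances, the helper below takes a matrix of regressors and a vector of $h$-period residuals and adds up the lag $-(h-1)$ through $h-1$ terms. The name `autocov_sum_S` and the tiny alternating-sign input are hypothetical and only there to exercise the arithmetic.

```python
import numpy as np

def autocov_sum_S(X, u, h):
    """Sum of sample autocovariances of w_t = u_t * x_t over lags -(h-1)..(h-1).

    X : (T, k) array of regressors, one row per date
    u : (T,) array of h-period regression residuals
    """
    w = X * u[:, None]               # w_t = u_t * x_t, one row per date
    T = w.shape[0]
    S = w.T @ w / T                  # lag-0 term
    for j in range(1, h):
        C_j = w[j:].T @ w[:-j] / T   # (1/T) * sum_t w_t w_{t-j}'
        S += C_j + C_j.T             # lag j and lag -j terms
    return S

# Toy input: a constant regressor and residuals that flip sign every period
X_toy = np.ones((4, 1))
u_toy = np.array([1.0, -1.0, 1.0, -1.0])
S_toy = autocov_sum_S(X_toy, u_toy, h=2)
print(S_toy)   # -> [[-0.5]]
```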
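However, this estimator for the spectral density has bad small sample properties: a truncated sum of sample autocovariance matrices is not guaranteed to be positive definite, which leads to large amounts of noise when the estimate comes out nearly singular. The insight in Hodrick (1992) is to use stationarity of the time series and to switch from summing autocovariances to variances:
$$\hat{\mathbf{S}} = \frac{1}{T} \sum_{t} \tilde{\mathbf{w}}_t \tilde{\mathbf{w}}_t^\top, \qquad \tilde{\mathbf{w}}_t = \hat{\varepsilon}_{t+1} \left( \sum_{i=0}^{h-1} \mathbf{x}_{t-i} \right) \tag{9}$$
Here $\hat{\varepsilon}_{t+1}$ is the one-month residual under the null of no predictability, so each term pairs a single shock with the sum of the last $h$ values of the regressors instead of pairing an overlapping sum of shocks with a single value of the regressors. Because the estimate is an average of outer products, it is positive semi-definite by construction.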
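The sketch below shows one way this estimator could be coded up for a regression on a constant and a single predictor. It assumes that `r[t]` is the one-month return realized at date `t`, that `x[t]` is the predictor observed at date `t`, and that the one-month residual under the null is simply the demeaned return; the function name `hodrick_1b_se` and the simulated inputs are hypothetical.

```python
import numpy as np

def hodrick_1b_se(r, x, h):
    """Standard errors for regressing r[t+1] + ... + r[t+h] on [1, x[t]],
    using variances of (one-period residual) x (sum of last h regressors)."""
    r = np.asarray(r, dtype=float)
    x = np.asarray(x, dtype=float)
    T = len(r)
    X = np.column_stack([np.ones(T), x])            # X[t] = [1, x_t]

    # Rolling sums X[t] + X[t-1] + ... + X[t-h+1] for t = h-1, ..., T-1
    csum = np.vstack([np.zeros((1, 2)), np.cumsum(X, axis=0)])
    X_sum = csum[h:] - csum[:-h]

    e = r - r.mean()                                # one-period residuals under the null
    w = X_sum[:-1] * e[h:, None]                    # pair the sum at t with the shock at t+1
    n = w.shape[0]

    S = w.T @ w / n                                 # average of outer products
    Z_inv = np.linalg.inv(X.T @ X / T)
    V = Z_inv @ S @ Z_inv
    return np.sqrt(np.diag(V) / n)                  # [se(intercept), se(slope)]

# Illustrative check on simulated data where the null holds (assumed settings)
rng = np.random.default_rng(1)
T_demo = 5000
r_demo = rng.normal(size=T_demo)
x_demo = rng.normal(size=T_demo)
se_h4 = hodrick_1b_se(r_demo, x_demo, h=4)
print(se_h4)
```

With $h = 1$ the rolling sum collapses to $\mathbf{x}_t$ itself, so the calculation reduces to the usual heteroskedasticity-robust standard errors with the null imposed.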
4. Simulation Results
In this section, I conclude by verifying my derivations with a simulation. First, I simulate a data set of monthly returns using a discretized version of an Ornstein-Uhlenbeck process:
(10)
where the shock is a standard normal random variable. I use the annualized moments below, taken from Cochrane (2005):
(11)
I also simulate a completely unrelated process $x_t$ which represents draws from a standard normal distribution. Thus, I check my computations under the null hypothesis that $x_t$ has no predictive power. I then run repeated simulations in which I compute the data series above, estimate the regression:
$$r_{t \to t+h} = \alpha_h + \beta_h \, x_t + \varepsilon_{t \to t+h} \tag{12}$$
and then report the distribution of $\hat{\beta}_h$, as well as the naive and Hodrick (1992) implied standard errors.
I report the mean values from the simulations below:
(13)
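For readers who want to run an experiment of this kind themselves, here is a self-contained sketch of the Monte Carlo loop. It is a stand-in under my own assumptions, not the calibration above: returns are i.i.d. standard normal rather than Ornstein-Uhlenbeck, and the predictor is an AR(1) with a hypothetical persistence parameter `rho` (setting `rho = 0` gives the i.i.d. standard normal predictor described in the text). The number of simulations, sample length and horizon are also illustrative.

```python
import numpy as np

def rolling_sum(a, h):
    """Sums a[t] + ... + a[t+h-1] along axis 0 for every valid t."""
    zero = np.zeros((1,) + a.shape[1:])
    csum = np.concatenate([zero, np.cumsum(a, axis=0)])
    return csum[h:] - csum[:-h]

def simulate_once(rng, T, h, rho):
    r = rng.normal(size=T)                        # one-period returns, null holds
    shocks = rng.normal(size=T)
    x = np.empty(T)
    x[0] = shocks[0] / np.sqrt(1.0 - rho**2)      # start from the stationary distribution
    for t in range(1, T):
        x[t] = rho * x[t - 1] + shocks[t]         # AR(1) predictor, unrelated to r
    X = np.column_stack([np.ones(T), x])

    y = rolling_sum(r, h)[1:]                     # y[t] = r[t+1] + ... + r[t+h]
    Xr = X[: T - h]                               # regressors aligned with y
    n = len(y)
    XtX_inv = np.linalg.inv(Xr.T @ Xr)
    b = XtX_inv @ Xr.T @ y                        # OLS coefficients
    u = y - Xr @ b

    # Naive OLS standard errors that ignore the overlap
    naive = np.sqrt(np.diag(XtX_inv) * (u @ u) / (n - 2))

    # Variance-based standard errors: one-period shock times summed regressors
    e = r - r.mean()
    w = rolling_sum(X, h)[:-1] * e[h:, None]
    Z_inv = np.linalg.inv(Xr.T @ Xr / n)
    V = Z_inv @ (w.T @ w / n) @ Z_inv
    hodrick = np.sqrt(np.diag(V) / n)
    return b[1], naive[1], hodrick[1]

def run_simulation(n_sims=200, T=600, h=12, rho=0.9, seed=0):
    rng = np.random.default_rng(seed)
    draws = np.array([simulate_once(rng, T, h, rho) for _ in range(n_sims)])
    return {
        "sd_beta": draws[:, 0].std(ddof=1),       # actual dispersion of the slope estimate
        "mean_naive": draws[:, 1].mean(),
        "mean_hodrick": draws[:, 2].mean(),
    }

results = run_simulation()
print(results)
```

One design note: when the predictor is i.i.d., the products $x_t \, \varepsilon_{t \to t+h}$ are serially uncorrelated even though the residuals overlap, so the naive and corrected slope standard errors nearly coincide. A persistent predictor is what makes the gap between the two visible.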