Research Notebook

Why max EPS Persists: Ba (AER 2026) Interpretation

March 29, 2026 by Alex

The question

Researchers currently take it for granted that CEOs should maximize PV[Shareholder Payouts]. Suppose we agree. Under this premise, max EPS is the wrong objective. It’s a misspecified model for how to run a firm. In work with Itzhak Ben-David, I show that EPS maximization solves the 3 core problems in corporate finance: capital structure, real investment, and payout policy.

How does max EPS survive? CEOs have access to textbooks, MBA programs, consultants, and analysts—all of which teach present-value logic. The data generated by decades of corporate decisions are available for everyone to examine. If max EPS is misspecified, why hasn’t it been abandoned?

Ba (2026) provides a formal theory of exactly this phenomenon: when and why misspecified models persist, even when decision-makers are open to switching and have access to infinite data. This note spells out the connection to EPS maximization.

Ba’s framework

An agent uses a subjective model \theta to guide decisions. Each period t, she chooses an action a_t from a finite set \mathcal{A} and observes an outcome y_t drawn from the true (unknown) data-generating process Q^*(\cdot | a_t). Her model \theta is a parametric family of predicted DGPs, \{Q^\theta(\cdot | a, \omega)\}_{a \in \mathcal{A}, \omega \in \Omega^\theta}, where \omega indexes the parameter space \Omega^\theta. The model is correctly specified if some \omega recovers Q^*; it is misspecified otherwise.

The agent holds a prior \pi_0^\theta over \Omega^\theta and updates beliefs via Bayes’ rule within the model. Crucially, she also considers a competing model \theta' with its own parameter space \Omega^{\theta'} and prior \pi_0^{\theta'}. She compares models using the Bayes factor

(1)   \begin{equation*} \lambda_t = \frac{\ell_t(\theta')}{\ell_t(\theta)} \end{equation*}

where \ell_t(\theta) = \sum_{\omega \in \Omega^\theta} \pi_0^\theta(\omega) \ell_t(\theta, \omega) is the marginal likelihood of the data under model \theta, and \ell_t(\theta, \omega) = \prod_{\tau=0}^{t} q^\theta(y_\tau | a_\tau, \omega) is the likelihood conditional on parameter \omega.

As new data arrive, the agent updates her Bayes factor recursively

(2)   \begin{equation*}  \lambda_t = \lambda_{t-1} \cdot \frac{\sum_{\omega' \in \Omega^{\theta'}} \pi_t^{\theta'}(\omega') \, q^{\theta'}(y_t | a_t, \omega')}{\sum_{\omega \in \Omega^\theta} \pi_t^\theta(\omega) \, q^\theta(y_t | a_t, \omega)} \end{equation*}

If \lambda_t > \alpha where \alpha \geq 1 is a switching threshold, the agent switches to \theta'. If \lambda_t < 1/\alpha, she switches back. The threshold \alpha controls switching stickiness. A larger \alpha requires stronger evidence to switch.
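To make the recursion in equation (2) and the switching rule concrete, here is a minimal sketch. It assumes binary outcomes, a single action, and a fixed stream of data. In Ba's framework the actions would depend on the model currently in use, so the fixed stream is a simplification. The function names, the two toy models, and all probabilities are hypothetical and chosen only for illustration.

```python
from math import prod

def bernoulli(p):
    """Likelihood function q(y) for a binary outcome with P(y = 1) = p."""
    return lambda y: p if y == 1 else 1.0 - p

def marginal_likelihood(model, prior, data):
    """Direct formula: sum over omega of prior(omega) times the product of q(y | a, omega)."""
    return sum(prior[w] * prod(model[w][a](y) for a, y in data) for w in prior)

def run_bayes_factor(model_0, prior_0, model_1, prior_1, data, alpha):
    """Recursive Bayes-factor update plus sticky switching with threshold alpha.

    model_k[omega][a] is a likelihood function for the outcome given action a.
    Returns the path of Bayes factors and the model in use after each period.
    """
    post_0, post_1 = dict(prior_0), dict(prior_1)
    lam, current = 1.0, "theta"
    path, in_use = [], []
    for a, y in data:
        # One-step-ahead predictive probability of y under each model.
        pred_0 = sum(post_0[w] * model_0[w][a](y) for w in post_0)
        pred_1 = sum(post_1[w] * model_1[w][a](y) for w in post_1)
        lam *= pred_1 / pred_0
        # Bayes' rule within each model.
        post_0 = {w: post_0[w] * model_0[w][a](y) / pred_0 for w in post_0}
        post_1 = {w: post_1[w] * model_1[w][a](y) / pred_1 for w in post_1}
        # Switch to theta' above alpha, switch back below 1/alpha.
        if current == "theta" and lam > alpha:
            current = "theta_prime"
        elif current == "theta_prime" and lam < 1.0 / alpha:
            current = "theta"
        path.append(lam)
        in_use.append(current)
    return path, in_use

# Hypothetical two-parameter models over a single action "a".
theta = {"lo": {"a": bernoulli(0.3)}, "hi": {"a": bernoulli(0.5)}}
theta_prime = {"lo": {"a": bernoulli(0.6)}, "hi": {"a": bernoulli(0.8)}}
prior = {"lo": 0.5, "hi": 0.5}
data = [("a", 1), ("a", 1), ("a", 0), ("a", 1), ("a", 1), ("a", 1)]

path, in_use = run_bayes_factor(theta, prior, theta_prime, prior, data, alpha=2.0)
direct = (marginal_likelihood(theta_prime, prior, data)
          / marginal_likelihood(theta, prior, data))
```

The recursive path ends at the same value as the ratio of marginal likelihoods in equation (1), which is the point of the recursion. In this toy stream the agent starts with \theta and switches once the Bayes factor first exceeds \alpha = 2.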

| Ba (2026) notation | EPS vs. PV application |
| --- | --- |
| Initial model \theta | Max EPS |
| Competing model \theta' | Max PV[Shareholder Payouts] |
| Action set \mathcal{A} | Corporate decisions: leverage choice, project selection, payout policy |
| Outcome y_t | Observable corporate outcomes: EPS level, EPS growth, stock-price reaction, analyst response |
| True DGP Q^*(\cdot \mid a) | The actual mapping from corporate decisions to outcomes (determined by the full economic environment) |
| Parameter space \Omega^\theta | Parameters of the EPS model (earnings yield, interest rates) |
| Parameter space \Omega^{\theta'} | Parameters of the PV model (discount rates, growth rates, terminal values, risk premia, payout schedules) |
| Switching threshold \alpha | Institutional friction: retraining costs, compensation redesign, regulatory reporting norms, board inertia |
| Bayes factor \lambda_t | Cumulative evidence that PV logic fits the data better than EPS logic |

Result 1: Endogenous data lets misspecified models survive forever

The theorem

Theorem 1 (Ba 2026, p. 16). Suppose \alpha > 1. The following are equivalent:

  1. Model \theta is globally robust for at least one full-support prior.
  2. Model \theta is locally robust for at least one full-support prior.
  3. There exists a p-absorbing self-confirming equilibrium (SCE) under model \theta.

A self-confirming equilibrium under \theta is a strategy \sigma supported by a belief \pi^\theta such that (i) \sigma is myopically optimal given \pi^\theta, and (ii) the model’s prediction matches the true DGP on the equilibrium path

(3)   \begin{equation*} q^\theta(\cdot \mid a, \omega) \equiv q^*(\cdot \mid a) \qquad \forall \, a \in \text{supp}(\sigma), \; \forall \, \omega \in \text{supp}(\pi^\theta) \end{equation*}

The strategy is p-absorbing if a dogmatic \theta-modeler eventually plays only actions in \text{supp}(\sigma).

The key insight: a misspecified model need not be globally correct. It only needs to be correct on the equilibrium path—for the actions it actually induces. Off-path misspecification is never revealed because the agent’s own actions determine which data are generated. This is why endogenous data is essential: Ba (2026, p. 16, fn. 19) notes that “in an exogenous-data environment, Theorem 1 implies that the sufficient and necessary condition for both local robustness and global robustness is that the model is correctly specified.”
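A toy calculation shows why on-path accuracy is all that matters. The sketch below assumes two actions, a binary outcome, and single-parameter models (so priors play no role). All probabilities and names are hypothetical. It computes the expected one-period change in \log \lambda_t when a given action is played, which is the Kullback-Leibler gap between the two models' predictions under the true DGP.

```python
from math import log

# Hypothetical probabilities that the observable outcome is "good" (y = 1).
TRUE_DGP = {"accretive": 0.7, "dilutive": 0.6}
EPS_MODEL = {"accretive": 0.7, "dilutive": 0.2}  # right on path, wrong off path
PV_MODEL = dict(TRUE_DGP)                        # correct at every action

def prob(model, action, y):
    """Probability the model assigns to outcome y after the given action."""
    return model[action] if y == 1 else 1 - model[action]

def expected_log_bf_drift(action):
    """Expected one-period change in log(lambda_t) when `action` is played."""
    return sum(
        prob(TRUE_DGP, action, y)
        * log(prob(PV_MODEL, action, y) / prob(EPS_MODEL, action, y))
        for y in (0, 1)
    )

on_path = expected_log_bf_drift("accretive")   # the action max EPS induces
off_path = expected_log_bf_drift("dilutive")   # an action max EPS never takes
```

On the path the two models make identical predictions, so the Bayes factor never moves and no amount of data favors the correct model. The evidence against the misspecified model only accumulates if the off-path action is actually played.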

P-absorbingness adds a dynamic requirement on top of the static SCE condition. It is not enough for an SCE to exist; the agent’s belief dynamics must actually converge to it. Ba’s Section 5.2 (Proposition 2, p. 24) shows that this convergence property depends on the direction of belief reinforcement. When beliefs and actions are complements (so that the bias feeds on itself), the dynamics are positively reinforcing and convergence to the SCE is guaranteed. Hence, the SCE is p-absorbing. When beliefs and actions are substitutes, the bias is self-correcting. Dynamics may oscillate and fail to converge. An SCE exists but is not p-absorbing, and the misspecified model is not robust.

Application to EPS

When a CEO maximizes EPS, her decisions shape the observable corporate outcomes. The EPS model’s predictions are tested only against data generated by EPS-driven actions. Predictions about actions a CEO never takes are never tested.

Leverage. An EPS maximizer borrows when \mathrm{EY} > \mathrm{i} (earnings yield exceeds the interest rate) and uses the proceeds to retire shares. This raises EPS mechanically. The outcome the CEO observes is: EPS went up, the stock price did not collapse, analysts applauded the “accretive” transaction. The EPS model’s prediction that the transaction would be good because it is accretive is confirmed by the data the decision itself generated. The CEO does not observe the counterfactual: what would’ve happened under the PV-optimal leverage choice given frictions.

Investment. An EPS maximizer uses \mathrm{HR} = \min\{\mathrm{EY}, \, \mathrm{i}, \, \mathrm{rf}\} as the hurdle rate, not the WACC. She rejects projects with positive NPV but negative first-year EPS impact (dilutive projects) and accepts projects with negative NPV but positive first-year EPS impact (accretive projects). The observed outcome: EPS did not fall, the project looks like it “worked.” The NPV of rejected projects is never observed.

Payout. An EPS maximizer buys back stock whenever buybacks offer a higher yield than investing cash (\mathrm{EY} > \mathrm{CY}) rather than evaluating the NPV of the buyback. The observed outcome: EPS went up, the market reacted positively to the announcement. The PV counterfactual (could the cash have been better deployed elsewhere?) is off-path.
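The three rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the model from the paper: a project is treated as a growing perpetuity whose earnings equal its cash flows, "accretive" is read as a first-year yield above the hurdle rate, and every number and function name is hypothetical.

```python
def hurdle_rate(ey, i, rf):
    """EPS hurdle rate: the cheapest of earnings yield, interest rate, risk-free rate."""
    return min(ey, i, rf)

def eps_accepts(first_year_earnings, cost, ey, i, rf):
    """EPS rule (assumed form): accept iff the first-year yield beats the hurdle rate."""
    return first_year_earnings / cost > hurdle_rate(ey, i, rf)

def npv(first_year_earnings, cost, r, g):
    """Growing-perpetuity NPV, treating earnings as cash flows; requires r > g."""
    return first_year_earnings / (r - g) - cost

# Hypothetical firm-level inputs.
EY, I, RF, CY = 0.06, 0.05, 0.04, 0.03

borrow_to_retire_shares = EY > I   # leverage rule
buy_back_with_cash = EY > CY       # payout rule

# Slow start: low first-year earnings, high growth.
slow_start = dict(first_year_earnings=3.0, cost=100.0)
slow_start_npv = npv(r=0.10, g=0.08, **slow_start)
slow_start_ok = eps_accepts(ey=EY, i=I, rf=RF, **slow_start)

# Quick win: high first-year earnings, no growth.
quick_win = dict(first_year_earnings=6.0, cost=100.0)
quick_win_npv = npv(r=0.10, g=0.0, **quick_win)
quick_win_ok = eps_accepts(ey=EY, i=I, rf=RF, **quick_win)
```

With these inputs the EPS rule rejects the slow-start project even though its NPV is positive, and accepts the quick-win project even though its NPV is negative. Only the accepted project ever generates data.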

Let \sigma^{\mathrm{EPS}} be the strategy induced by max EPS, and let \hat{\omega} be a parameter value in the EPS model under which the predicted outcome distribution matches Q^*(\cdot | a) for all a \in \text{supp}(\sigma^{\mathrm{EPS}}). Then \sigma^{\mathrm{EPS}} is an SCE under the EPS model. This is plausible because the EPS model does not mispredict the direction of EPS changes from leverage, buybacks, or accretive acquisitions. It correctly predicts that borrowing at \mathrm{i} < \mathrm{EY} raises EPS, that using cash for buybacks at \mathrm{EY} > \mathrm{CY} raises EPS, and so on. What it gets wrong is the welfare interpretation: whether these EPS changes correspond to value creation. But welfare is not directly observed in y_t; what is observed are EPS changes, stock-price reactions, and analyst ratings, all of which are consistent with the EPS model’s on-path predictions.

Moreover, the EPS model’s feedback dynamics are positively reinforcing in the sense of Ba’s Proposition 2: EPS-driven decisions raise EPS, which validates the model, which strengthens conviction, which leads to more EPS-driven decisions. This positive feedback ensures that the SCE is p-absorbing. Contrast this with the dynamics facing a CEO who switches to PV logic: she accepts a dilutive acquisition, EPS falls in the short run, analysts downgrade, the stock price drops, and the PV model appears to have failed, creating pressure to revert. The transition to the correct model generates short-run data that seem to disconfirm it. By Theorem 1, the existence of a p-absorbing SCE under the EPS model is sufficient for it to be globally robust for at least one full-support prior. Hence, EPS maximization can persist against any competitor, including PV logic, with infinite data.

Result 2: Concise models can be more robust than correct ones

The theorem

Theorem 2 (Ba 2026, p. 19). Suppose \alpha > 1 and model \theta has no traps. Then:

  1. Model \theta is globally robust at prior \pi_0^\theta if and only if \pi_0^\theta(C^\theta) \geq 1/\alpha.
  2. Model \theta is locally robust at all full-support priors if and only if C^\theta \neq \emptyset.

C^\theta is the set of consistent parameters: those \omega for which the pure belief \delta_\omega supports a p-absorbing SCE. The model’s prediction under \omega matches the true DGP at every action in the equilibrium strategy’s support.

The condition \pi_0^\theta(C^\theta) \geq 1/\alpha links three forces: the model’s structure (which determines C^\theta), the agent’s prior (which determines how much mass falls on C^\theta), and the switching threshold (which sets the bar). Prior tightness and switching stickiness are substitutes: a higher \alpha lowers the bar for prior concentration, and a tighter prior lowers the bar for stickiness. Any asymptotically accurate model can be globally robust at a given prior, provided switching is sufficiently sticky.

The critical implication: correctly specified models are not necessarily more robust than misspecified ones. A misspecified model with a smaller parameter space |\Omega^\theta| can satisfy the tightness condition more easily. Under an ignorance prior (uniform over \Omega^\theta), each parameter receives weight 1/|\Omega^\theta|. For a model where every parameter is consistent (C^\theta = \Omega^\theta), the tightness condition is automatically satisfied at any \alpha > 1, regardless of the prior—the model is unconditionally globally robust. But a correctly specified model with a large parameter space needs a correspondingly tight prior to be globally robust, and under a uniform prior it may fail. Ba (2026, p. 4): “simple misspecified models equipped with entrenched priors can be more robust than complex correctly specified models.”
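The arithmetic behind this comparison is short enough to check by hand. The sketch below takes the condition in part 1 of Theorem 2 as stated in this note (so it presumes \alpha > 1 and no traps) and evaluates it under a uniform prior. The parameter counts and function names are hypothetical.

```python
from fractions import Fraction

def uniform_mass_on_consistent(n_params, n_consistent):
    """Prior mass on C under a uniform prior over a finite parameter space."""
    return Fraction(n_consistent, n_params)

def globally_robust(mass_on_consistent, alpha):
    """Tightness condition pi_0(C) >= 1/alpha (assumes alpha > 1 and no traps)."""
    return mass_on_consistent >= 1 / Fraction(alpha)

# Hypothetical concise model: 2 parameters, both consistent.
concise = uniform_mass_on_consistent(n_params=2, n_consistent=2)

# Hypothetical rich model: 50 parameters, only 1 consistent.
rich = uniform_mass_on_consistent(n_params=50, n_consistent=1)

concise_at_low_friction = globally_robust(concise, Fraction(101, 100))
rich_at_alpha_2 = globally_robust(rich, 2)
rich_at_alpha_50 = globally_robust(rich, 50)
min_alpha_for_rich = 1 / rich
```

Under these made-up counts the concise model passes at \alpha = 1.01, while the rich model needs \alpha of at least 50 before a uniform prior is tight enough.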

In the media-bias application (Section 5.1, Proposition 1), Ba makes this concrete: a two-state misspecified model \hat{\theta} is globally robust at all priors and all \alpha \geq 1, while the correctly specified three-state model \theta is globally robust only if \pi_0^\theta(\omega^M) \geq 1/\alpha. The misspecified model permanently replaces the correct one with arbitrarily high probability as the prior on the extreme states increases.

Application to EPS

The EPS model has a small parameter space. For any given decision (borrow or not, invest or not, buy back or not), it requires the CEO to know essentially two things: the earnings yield \mathrm{EY} = \frac{\mathbb{E}[\mathrm{EPS}]}{\mathrm{Price}} and the relevant financing cost (interest rate \mathrm{i} or risk-free rate \mathrm{rf}). The decision rule is a direct comparison: act if and only if \mathrm{EY} exceeds the relevant financing cost (for example, borrow to retire shares if and only if \mathrm{EY} > \mathrm{i}). For investment, the same comparison sets the hurdle rate \mathrm{HR} = \min\{\mathrm{EY}, \, \mathrm{i}, \, \mathrm{rf}\}.

The PV model requires knowledge of a much larger parameter space \Omega^{\theta'}: the risk-free rate, market risk premium, firm beta (or multi-factor betas), the project-specific risk adjustment, the terminal growth rate, the expected path of future cash flows, and the probability distribution over states of the world.

Under Theorem 2, the prior tightness condition for global robustness is \pi_0^\theta(C^\theta) \geq 1/\alpha. For the EPS model, if C^\theta = \Omega^\theta (because the model is consistent for the small set of parameters it uses), then \pi_0^\theta(C^\theta) = 1 and the condition is satisfied for any \alpha > 1: the EPS model is unconditionally globally robust. If C^\theta covers most but not all of \Omega^\theta, then \pi_0^\theta(C^\theta) is close to 1 and the condition holds for all but the values of \alpha closest to 1.

For the PV model, even though it is correctly specified (C^{\theta'} \neq \emptyset), the prior mass is spread across a large parameter space. Under a diffuse prior, \pi_0^{\theta'}(C^{\theta'}) may be small. The PV model is locally robust at all full-support priors (by part 2 of Theorem 2), but it is globally robust only if \pi_0^{\theta'}(C^{\theta'}) \geq 1/\alpha. With a large |\Omega^{\theta'}| and a diffuse prior, this can fail.

Moreover, switching stickiness in corporate settings is very large. Compensation contracts are tied to EPS targets. Analyst coverage is organized around EPS estimates, consensus forecasts, and PE multiples. Regulatory reporting (GAAP earnings) makes EPS the most salient and auditable metric, while PV calculations involve subjective inputs (discount rates, growth assumptions) that are harder to audit and verify. Board education is required to shift from a direct comparison (“is this accretive?”) to a multi-parameter model (“what is the NPV at the appropriate risk-adjusted discount rate?”). All of this amounts to a very high \alpha, which further lowers the bar for the prior tightness that the EPS model must satisfy.

Result 3: Even slight switching friction is enough

The theorem

Theorem 3 (Ba 2026, p. 21). Suppose model \theta has no traps and \alpha = 1. Then model \theta is locally or globally robust at any full-support prior \pi_0^\theta if and only if C^\theta = \Omega^\theta.

When switching is non-sticky (\alpha = 1), local and global robustness coincide, robustness at some prior is equivalent to robustness at all priors, and both hold only when every parameter in the model is consistent. This is an extreme demand: the model must be correct for every DGP it entertains, not just on the equilibrium path. Only a model with full prior tightness (C^\theta = \Omega^\theta) can survive frictionless comparison.

The set of robust models shrinks discontinuously at \alpha = 1. For any \alpha > 1, models with C^\theta \neq \Omega^\theta can be robust (provided the prior tightness condition is met). At \alpha = 1, they cannot. Ba (2026, p. 21): “the set of locally robust models and supporting priors shrinks discontinuously at \alpha = 1, which highlights how stickiness helps more misspecified models persist.”

The mechanism: at \alpha = 1, there always exists a nearby competing model that fits the data slightly better than the initial model on some dimension. Because there is no switching friction, this marginal improvement is sufficient to trigger a switch. The proof constructs such a competing model by preserving most DGPs in \theta while slightly improving the accuracy of one DGP associated with a parameter in \Omega^\theta \setminus C^\theta.
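The discontinuity is easy to see once the two theorems are written side by side. The sketch below simply encodes the local-robustness conditions as stated in this note (part 2 of Theorem 2 for \alpha > 1, Theorem 3 for \alpha = 1), presuming no traps. It is not a reconstruction of Ba's proofs, and the parameter labels are hypothetical.

```python
def locally_robust_at_all_full_support_priors(consistent, parameter_space, alpha):
    """Local robustness at every full-support prior, per the statements above."""
    if alpha < 1:
        raise ValueError("the switching threshold must satisfy alpha >= 1")
    if alpha == 1:
        # Theorem 3: every parameter must be consistent.
        return set(consistent) == set(parameter_space)
    # Theorem 2, part 2: one consistent parameter is enough.
    return len(set(consistent)) > 0

# Hypothetical model: three parameters, only one of them consistent.
omega = {"w1", "w2", "w3"}
consistent = {"w2"}

verdicts = {
    alpha: locally_robust_at_all_full_support_priors(consistent, omega, alpha)
    for alpha in (2.0, 1.1, 1.001, 1.0)
}
```

The verdict is the same for every \alpha above 1, however close, and flips only at \alpha = 1 exactly.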

Application to EPS

Theorem 3 clarifies that the persistence of max EPS depends on switching friction being positive—but the required friction can be arbitrarily small. For any \alpha > 1 (even \alpha = 1.01), the EPS model can be globally robust provided the prior tightness condition is met. The discontinuity at \alpha = 1 means that even minimal institutional friction (a small cost of retraining, a slight reluctance to abandon a familiar framework) is qualitatively different from zero friction.

This matters because it addresses a potential objection: “surely CEOs could switch to PV logic if they wanted to; there’s no real barrier.” Ba’s result says that even a negligible barrier is enough, as long as it is positive. The EPS model does not need an enormous moat to survive. It needs (i) a p-absorbing SCE (Result 1), (ii) sufficient prior concentration on consistent parameters (Result 2), and (iii) any positive switching friction at all (Result 3). The first two conditions are structural properties of the EPS model. The third is almost trivially satisfied in any real institution.

Conversely, Theorem 3 identifies the knife-edge case where EPS would be displaced: a world with literally zero switching costs (\alpha = 1) and a PV model that is a slight local improvement. In practice, this would correspond to an environment where CEOs face no career risk from short-term EPS misses, no analyst pressure around quarterly earnings, and no cognitive cost of estimating multi-parameter discount rates. These conditions do not describe any real-world setting.

Summary

Ba (2026) provides a formal framework for understanding when misspecified models persist despite competition from correctly specified alternatives. Applied to the EPS-vs.-PV question, the theory identifies three reinforcing mechanisms, one per main result.

| Result | Mechanism | Application to EPS |
| --- | --- | --- |
| Theorem 1 | Misspecified model is robust iff it admits p-absorbing SCE; endogenous data insulates on-path predictions from off-path errors | Accretive actions generate data that confirm max EPS model; positive feedback dynamics ensure convergence to SCE |
| Theorem 2 | Global robustness requires \pi_0^\theta(C^\theta) \!\geq\! 1/\alpha; concise models concentrate priors; stickiness and prior tightness are substitutes | EPS model’s small parameter space makes tightness condition easy to satisfy; institutional frictions and reporting norms raise \alpha |
| Theorem 3 | Set of robust models shrinks discontinuously at \alpha = 1; any positive friction qualitatively expands what can persist | Even minimal switching costs suffice for EPS to survive; knife-edge \alpha = 1 case (zero friction) does not describe real world |

The punchline: even if we take as given that maximizing PV[Shareholder Payouts] is the correct objective, the Ba (2026) framework gives formal reasons—grounded in Bayesian learning theory—for why maximizing EPS can persist indefinitely. The misspecified model is not merely sticky due to inertia or ignorance. It is robust in a precise sense: it admits a self-confirming equilibrium, its directness concentrates prior beliefs, the data it generates through the CEO’s own actions provide continuous apparent validation, and even minimal institutional friction is enough to protect it. These forces can be strong enough that the correct model is permanently abandoned.

Caveat: Is max PV[Shareholder Payouts] actually the correct model?

Everything above assumes that maximizing PV[Shareholder Payouts] is the correctly specified model. It’s the true DGP against which max EPS is judged misspecified. Ba’s framework requires us to designate one model as correct and ask whether the other persists. We chose PV as the correct model because that is what finance theory prescribes. But this assumption deserves scrutiny.

Shareholders do not get to spend corporate earnings. Earnings are an accounting construct; they accrue to the firm, not to the shareholder’s bank account. A dollar of EPS that is retained and reinvested never reaches the shareholder at all. In that sense, maximizing EPS is maximizing a fiction—a number that does not correspond to any cash flow the shareholder actually receives.

But PV[Shareholder Payouts] has a parallel problem. Shareholders do not get to spend the present discounted value of a dollar they expect to receive in 20 years. You cannot eat risk-adjusted returns. The “present value” of a distant payout is a mathematical object, not cash in hand. And yet these distant, heavily discounted payouts are the primary drivers of valuation in the PV framework.

The Gordon growth model makes this concrete. Under the standard formulation, an asset’s price equals expected cash flow next year times a forward-looking multiple

(4)   \begin{equation*} \mathrm{Price} = \mathbb{E}[\mathrm{CF}] \times \bigg( \frac{1}{\mathrm{r} - \mathrm{g}} \bigg) \end{equation*}

For typical parameter values (\mathrm{r} \approx 10\%, \mathrm{g} \approx 5\%), the multiple is roughly \big(\frac{1}{10\%-5\%}\big) = 20\times. But \big(\frac{1}{\mathrm{r}-\mathrm{g}}\big) is also, to a close approximation, the Macaulay duration of the cash flow stream in years. With annual compounding the exact value is \big(\frac{1+\mathrm{r}}{\mathrm{r}-\mathrm{g}}\big) = 22 years, and the two expressions coincide under continuous compounding. So the “typical” dollar of present value corresponds to a payout roughly two decades in the future. The PV framework asks the CEO to make decisions today based on the risk-adjusted value of money that shareholders will not receive for 20 years. This is money that never appears on a financial statement, whose value depends on estimates of discount rates and growth rates that are themselves deeply uncertain.
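The multiple and the duration can both be checked by brute force. The sketch below sums the discounted cash flows of a growing perpetuity directly, assuming annual payments starting one year out and a long but finite horizon as a stand-in for infinity. Under annual compounding the Macaulay duration comes out to (1 + r)/(r - g), a little above the 1/(r - g) multiple. The function name and the horizon are choices made here for illustration.

```python
def growing_perpetuity(r, g, cf1=1.0, horizon=2000):
    """Return (price, Macaulay duration in years) by summing annual cash flows."""
    price, time_weighted = 0.0, 0.0
    for t in range(1, horizon + 1):
        # Present value of the year-t cash flow, which grows at rate g from cf1.
        pv = cf1 * (1 + g) ** (t - 1) / (1 + r) ** t
        price += pv
        time_weighted += t * pv
    return price, time_weighted / price

price, duration = growing_perpetuity(r=0.10, g=0.05)
multiple = 1 / (0.10 - 0.05)                 # Gordon growth multiple
closed_form_duration = 1.10 / (0.10 - 0.05)  # (1 + r) / (r - g)
```

At r = 10% and g = 5% the price per dollar of next-year cash flow is 20 and the duration is 22 years, so the "two decades" reading holds either way.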

This does not mean PV logic is wrong. It means that both models involve abstractions, and the question of which abstraction is “correct” is less obvious than textbook finance suggests. EPS is a fiction because earnings are not payouts. PV is a fiction because present values are not cash. The Ba (2026) framework shows that even if we grant the PV model the status of correct specification, the EPS model can persist indefinitely. But if we take seriously the possibility that neither model is unambiguously correctly specified, then the persistence of max EPS becomes even less surprising. In Ba’s terms, we may not be in a world where a correctly specified competitor exists at all, in which case the question is not whether EPS will be abandoned but which misspecified model proves more robust. Regardless, history has shown that EPS wins.
