Fano’s Inequality and Resource Allocation

1. Motivation

This post describes Fano’s inequality. It’s not a particularly complicated result. After all, it first shows up on page 33 of Cover and Thomas (1991). However, I recently ran across the result again for the first time in a while, and I realized it had an interesting asset pricing implication.

Roughly speaking, what does the inequality say? Suppose I need to make some decision, and you give me some news that helps me decide. Fano’s inequality gives a lower bound on the probability that I end up making the wrong choice as a function of my initial uncertainty and how informative your news was. What’s cool about the result is that it doesn’t place any restrictions on how I make my decision; i.e., it gives a lower bound on my best-case error probability. If the bound is negative, then in principle I might be able to eliminate my decision error. If the bound is positive (i.e., binds), then there is no way for me to use the news you gave me to always make the right decision.

Now, back to asset pricing. We want accurate prices so that, in the words of Fama (1970), they can serve as “signals for resource allocation.” If we treat resource allocation as a discrete choice problem and prices as news, then Fano’s inequality applies and gives bounds on how effectively decision makers can use this information.

2. Notation

I start by laying out the notation. Imagine that a decision maker wants to predict the value of a random variable $\widetilde{X}$ that can take on $N$ possible values:

(1) $\begin{align*} \widetilde{X} \in \{ x_1,x_2,\ldots,x_N \} \end{align*}$

e.g., you might think about the decision maker as a farmer and $\widetilde{X}$ as the most profitable crop he can plant next fall. The probability that $\widetilde{X}$ takes on each of the $N$ values is given by:

(2) $\begin{align*} \mathrm{Pr}[\widetilde{X} = x_n] = p_n \end{align*}$

Finally, I use the $\mathrm{H}[\cdot]$ operator to denote the entropy of a random variable:

(3) $\begin{align*} \mathrm{H}[\widetilde{X}] &= - \sum_{n=1}^N p_n \cdot \log_2(p_n) \end{align*}$
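To make definition (3) concrete, here is a minimal Python sketch. The function name `entropy_bits` and the two example distributions are my own illustrations, not part of the original notation.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits, as in (3); outcomes with p_n = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equally likely crops: the farmer is maximally uncertain.
uniform_four = entropy_bits([0.25, 0.25, 0.25, 0.25])

# One crop is much more likely than the others: less uncertainty.
skewed_four = entropy_bits([0.5, 0.25, 0.125, 0.125])

print(uniform_four)  # 2.0
print(skewed_four)   # 1.75
```

With $N = 4$ equally likely crops the entropy is $\log_2(4) = 2$ bits, which is the most uncertainty a $4$-outcome variable can have.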

3. Main Result

Now, imagine that the farmer knows which crop currently has the highest futures price, $\widetilde{Y}$ , and that this price signal is correlated with the correct choice of which crop to plant:

(4) $\begin{align*} \mathrm{Cor}[\widetilde{X},\widetilde{Y}] \neq 0 \end{align*}$

The farmer could use this information to make an educated guess about the right crop to plant:

(5) $\begin{align*} f(\widetilde{Y}) \in \{ x_1, x_2, \ldots, x_N\} \end{align*}$

e.g., his rule might be something simple like, “Plant the crop with the highest futures price today.” Or, it might be something more complicated like, “Plant the crop with the highest futures price today unless it’s corn, in which case plant soybeans.” I am agnostic about what function $f(\cdot)$ the farmer uses to turn price signals into crop decisions. Whatever rule he uses, let $\widetilde{Z}$ denote whether or not he got the decision wrong:

(6) $\begin{align*} \widetilde{Z} &= \begin{cases} 0 &\text{if } f(\widetilde{Y}) = \widetilde{X} \\ 1 &\text{else } \end{cases} \end{align*}$

Fano’s inequality links the probability that the farmer makes the wrong crop choice, $\mathrm{E}[\widetilde{Z}]$ , to his remaining entropy after seeing the price signals, $\mathrm{H}[\widetilde{X}|\widetilde{Y}]$ :

(7) $\begin{align*} 1 + \mathrm{E}[\widetilde{Z}] \cdot \log_2(N) \geq \mathrm{H}[\widetilde{X}|\widetilde{Y}] \end{align*}$
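Rearranging (7) to isolate the error probability gives $\mathrm{E}[\widetilde{Z}] \geq (\mathrm{H}[\widetilde{X}|\widetilde{Y}] - 1) / \log_2(N)$. A small sketch of that rearrangement in Python follows; the function name `fano_error_lower_bound` and the example inputs are hypothetical, chosen only for illustration.

```python
import math

def fano_error_lower_bound(cond_entropy_bits, n_outcomes):
    """Lower bound on E[Z] implied by (7): (H[X|Y] - 1) / log2(N).

    Assumes n_outcomes >= 2. A negative return value means the bound
    places no restriction on the error probability.
    """
    return (cond_entropy_bits - 1.0) / math.log2(n_outcomes)

# Hypothetical inputs: 8 possible outcomes, 2.5 bits of uncertainty left after the news.
binding = fano_error_lower_bound(2.5, 8)

# Hypothetical inputs: same 8 outcomes, but only 0.5 bits of uncertainty left.
slack = fano_error_lower_bound(0.5, 8)

print(binding)  # 0.5
print(slack)
```

In the first case the decision maker is wrong at least half the time no matter what rule he uses; in the second the bound is negative and says nothing.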

4. Quick Proof

The result follows from applying the entropy chain rule in $2$ different ways. Let’s think about the entropy of the joint distribution of errors and crop choices, $(\widetilde{Z},\widetilde{X})$ , after the farmer sees the price signal, $\widetilde{Y}$ . The entropy chain rule says that we can rewrite this quantity as:

(8) $\begin{align*} \mathrm{H}[\widetilde{Z},\widetilde{X}|\widetilde{Y}] &= \mathrm{H}[\widetilde{X}|\widetilde{Y}] + \underbrace{\mathrm{H}[\widetilde{Z}|\widetilde{X},\widetilde{Y}]}_{=0} \end{align*}$

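where the second term on the right-hand side is $0$ since, once you know both the correct crop choice, $\widetilde{X}$ , and the price signal, $\widetilde{Y}$ , that feeds the farmer’s rule, there is no uncertainty left about whether he made an error. Yet, we can also rewrite $\mathrm{H}[\widetilde{Z},\widetilde{X}|\widetilde{Y}]$ as follows using the exact same chain rule: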

(9) $\begin{align*} \mathrm{H}[\widetilde{Z},\widetilde{X}|\widetilde{Y}] &= \mathrm{H}[\widetilde{Z}|\widetilde{Y}] + \mathrm{H}[\widetilde{X}|\widetilde{Z},\widetilde{Y}] \end{align*}$

It’s not like either $\widetilde{Z}$ or $\widetilde{X}$ has a privileged position in the joint distribution $(\widetilde{Z},\widetilde{X})$ !

Applying the chain rule in $2$ ways then leaves us with the equation:

(10) $\begin{align*} \mathrm{H}[\widetilde{Z}|\widetilde{Y}] + \mathrm{H}[\widetilde{X}|\widetilde{Z},\widetilde{Y}] & = \mathrm{H}[\widetilde{X}|\widetilde{Y}] \end{align*}$

The first term on the left-hand side is bounded above by:

(11) $\begin{align*} \mathrm{H}[\widetilde{Z}|\widetilde{Y}] \leq \mathrm{H}[\widetilde{Z}] \leq 1 \end{align*}$

since conditioning on a random variable weakly lowers entropy and a binary choice variable has at most $1$ bit of information. Rewriting the second term on the left-hand side as follows:

(12) $\begin{align*} \mathrm{H}[\widetilde{X}|\widetilde{Z},\widetilde{Y}] &= \mathrm{Pr}[\widetilde{Z} = 0] \cdot \underbrace{\mathrm{H}[\widetilde{X}|\widetilde{Z} = 0,\widetilde{Y}]}_{=0} + \mathrm{Pr}[\widetilde{Z} = 1] \cdot \mathrm{H}[\widetilde{X}|\widetilde{Z} = 1,\widetilde{Y}] \end{align*}$

then gives the desired result since the uniform distribution maximizes a discrete variable’s entropy:

(13) $\begin{align*} \mathrm{H}[\widetilde{X}|\widetilde{Z} = 1,\widetilde{Y}] \leq \log_2(N - 1) \leq \log_2(N) \end{align*}$
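
To spell out the last step, here is a sketch of how the pieces combine. It only substitutes (11), (12), and (13) into (10) and uses the fact that $\mathrm{Pr}[\widetilde{Z} = 1] = \mathrm{E}[\widetilde{Z}]$ for a binary variable:

$\begin{align*} \mathrm{H}[\widetilde{X}|\widetilde{Y}] &= \mathrm{H}[\widetilde{Z}|\widetilde{Y}] + \mathrm{H}[\widetilde{X}|\widetilde{Z},\widetilde{Y}] \\ &\leq 1 + \mathrm{E}[\widetilde{Z}] \cdot \mathrm{H}[\widetilde{X}|\widetilde{Z} = 1,\widetilde{Y}] \\ &\leq 1 + \mathrm{E}[\widetilde{Z}] \cdot \log_2(N) \end{align*}$

The final line is exactly the inequality in (7).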

5. Application

Now let’s consider an application. Suppose that the farmer can plant $N=4$ different crops: 1) corn, 2) wheat, 3) soy, and 4) rice. Let $\widetilde{X}$ denote the most profitable of these crops to plant, and let $\widetilde{Y}$ denote the crop with the highest current futures price. Suppose the choice and price variables have the following joint distribution:

(14) $\begin{align*} \bordermatrix{~ & x_1 & x_2 & x_3 & x_4 \cr y_1 & \sfrac{1}{8} & \sfrac{1}{16} & \sfrac{1}{32} & \sfrac{1}{32} \cr y_2 & \sfrac{1}{16} & \sfrac{1}{8} & \sfrac{1}{32} & \sfrac{1}{32} \cr y_3 & \sfrac{1}{16} & \sfrac{1}{16} & \sfrac{1}{16} & \sfrac{1}{16} \cr y_4 & \sfrac{1}{4} & 0 & 0 & 0 \cr} \end{align*}$

e.g., this table says that $25{\scriptstyle \%}$ of the time rice has the highest futures price, and conditional on rice having the highest futures price the farmer should always plant corn the following year. In this world, the conditional entropy of the farmer’s decision after seeing the price signal is given by:

(15) $\begin{align*} \mathrm{H}[\widetilde{X}|\widetilde{Y}] &= \sum_{n=1}^4 \mathrm{Pr}[\widetilde{Y} = y_n] \cdot \mathrm{H}[\widetilde{X}|\widetilde{Y} = y_n] = \sfrac{11}{8} \end{align*}$

in units of bits.
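
The $\sfrac{11}{8}$ figure can be checked numerically. Below is a minimal Python sketch that recomputes (15) from the joint distribution in (14); the names `JOINT` and `conditional_entropy_bits` are my own, and the rows are assumed to index price signals while the columns index crops, as in the table.

```python
import math

# Joint distribution from (14): rows are y_1..y_4, columns are x_1..x_4.
JOINT = [
    [1/8,  1/16, 1/32, 1/32],
    [1/16, 1/8,  1/32, 1/32],
    [1/16, 1/16, 1/16, 1/16],
    [1/4,  0.0,  0.0,  0.0],
]

def conditional_entropy_bits(joint):
    """H[X|Y] = sum over y of Pr[Y = y] * H[X | Y = y], in bits."""
    total = 0.0
    for row in joint:
        pr_y = sum(row)  # marginal probability of this price signal
        if pr_y == 0:
            continue     # a signal that never occurs contributes nothing
        # Entropy of the crop distribution conditional on this signal.
        h_row = -sum((p / pr_y) * math.log2(p / pr_y) for p in row if p > 0)
        total += pr_y * h_row
    return total

h_x_given_y = conditional_entropy_bits(JOINT)
print(h_x_given_y)  # 1.375
```

Row by row, the conditional entropies are $\sfrac{7}{4}$, $\sfrac{7}{4}$, $2$, and $0$ bits, and each signal occurs with probability $\sfrac{1}{4}$, which averages to $\sfrac{11}{8}$.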

Here’s the punchline. If we rearrange Fano’s inequality to isolate the error rate on the left-hand side, we see that there is no way for the farmer to plant the right crop more than $\sfrac{13}{16} \approx 81{\scriptstyle \%}$ of the time:

(16) $\begin{align*} \mathrm{E}[\widetilde{Z}] \geq \frac{\mathrm{H}[\widetilde{X}|\widetilde{Y}] - 1}{\log_2(N)} = \frac{\sfrac{11}{8} - 1}{\log_2(4)} = \frac{3}{16} \end{align*}$

What’s more, this result is independent of how the farmer incorporates the price information.
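
As a sanity check on the bound, the sketch below compares it with the error rate of one particular rule: plant the crop that is most likely given each price signal. That rule is my own illustration and is not discussed above; it is the standard choice for minimizing error probability, so it makes a natural benchmark. The names `JOINT`, `fano_bound`, and `most_likely_crop_error` are hypothetical.

```python
import math

# Joint distribution from (14): rows are y_1..y_4, columns are x_1..x_4.
JOINT = [
    [1/8,  1/16, 1/32, 1/32],
    [1/16, 1/8,  1/32, 1/32],
    [1/16, 1/16, 1/16, 1/16],
    [1/4,  0.0,  0.0,  0.0],
]

def conditional_entropy_bits(joint):
    """H[X|Y] in bits, averaging the row-conditional entropies as in (15)."""
    total = 0.0
    for row in joint:
        pr_y = sum(row)
        if pr_y == 0:
            continue
        total -= pr_y * sum((p / pr_y) * math.log2(p / pr_y) for p in row if p > 0)
    return total

def fano_bound(joint):
    """Lower bound on the error rate from (16)."""
    n_crops = len(joint[0])
    return (conditional_entropy_bits(joint) - 1.0) / math.log2(n_crops)

def most_likely_crop_error(joint):
    """Error rate of the rule f(y) = the crop with the largest Pr[X = x, Y = y]."""
    prob_correct = sum(max(row) for row in joint)
    return 1.0 - prob_correct

bound = fano_bound(JOINT)
rule_error = most_likely_crop_error(JOINT)
print(bound)       # 0.1875
print(rule_error)  # 0.4375
```

Under this rule the farmer is wrong $\sfrac{7}{16}$ of the time, comfortably above the $\sfrac{3}{16}$ floor, so in this example the inequality holds with room to spare.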