1. Introduction
Imagine trying to fill up a bucket using a hand-powered water pump. You might end up with a half-full bucket after an hour of work for either of two reasons. First, the spigot might be too narrow. i.e., even though you are doing enough work to pull a full bucket's worth of water out of the ground each hour, only half of that water can actually flow through the spigot during the allotted time. This is a bandwidth constraint. Alternatively, the pump handle might be too short. i.e., you have to crank the handle twice as many times to pull each gallon of water out of the ground. This is an effort constraint.
Existing information-based asset pricing models a la Grossman and Stiglitz (1980) restrict the bandwidth of arbitrageurs’ flow of information. The market just produces too much information per trading period. i.e., people’s minds have narrow spigots. However, traders also face restrictions on how much work they can do each period. Sometimes it’s hard to crank the pump handle often enough to produce the relevant information. e.g., think of a binary signal such as knowing if a cancer drug has completed a phase of testing as in Huberman and Regev (2001). It doesn’t take much bandwidth to convey this news. After all, people immediately recognized its significance when the New York Times wrote about it. Yet, arbitrageurs must have faced a restriction on the number of scientific articles they could read each year since Nature reported this exact same news months earlier and no one batted an eye! These traders left money on the table because they anticipated having to pump the handle too many times in order to uncover a really simple signal.
This post proposes algorithmic flop counts as a way of quantifying how much work traders have to do in order to uncover profitable information. I illustrate the idea via a short example and some computations.
2. Effort via Flop Counts
One way of quantifying how much effort it takes to discover a piece of information is to count the number of floating-point operations (i.e., flops) that a computer has to do in order to estimate the relevant number. I take my discussion of flop counts primarily from Boyd and Vandenberghe and define a flop as an addition, subtraction, multiplication, or division of floating-point numbers. e.g., I use flops as a unit of effort so that:

(1)   $\text{Effort(calculation)} = \text{number of flops needed to execute the calculation}$

in the same way that the cost of a Snickers bar might be quoted in dollars. I then count the total number of flops needed to calculate a number as a proxy for the effort needed to find out the associated piece of information. e.g., if it took fewer flops to compute the average return of all technology stocks than to arrive at the median return on assets for all value stocks, then I would say that it is easier (i.e., took less effort) to know the mean return. The key thing here is that this measure is independent of the amount of entropy that either of these two calculations resolves.
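To make the unit concrete, here is a minimal sketch (mine, not from the original post) that tallies flops by hand for two simple calculations under the Boyd-and-Vandenberghe convention above.

```python
# Minimal sketch (not from the original post): tallying flops by hand for two
# simple calculations, using the convention that each addition, subtraction,
# multiplication, or division of floating-point numbers counts as one flop.

def flops_for_mean(n):
    """Averaging n numbers: (n - 1) additions plus 1 division."""
    return (n - 1) + 1

def flops_for_dot_product(n):
    """Dotting two length-n vectors: n multiplications plus (n - 1) additions."""
    return n + (n - 1)

if __name__ == "__main__":
    n = 1_000
    print(flops_for_mean(n))         # 1000 -> roughly n flops
    print(flops_for_dot_product(n))  # 1999 -> roughly 2n flops
```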
I write flop counts as a polynomial function of the dimensions of the matrices and vectors involved. Moreover, I always simplify the expression by removing all but the highest-order terms. e.g., suppose that an algorithm required:

(2)   $m^3 + 3 \cdot m^2 \cdot n + 5 \cdot m \cdot n + 22$ flops

In this case, I would write the flop count as:

(3)   $m^3 + 3 \cdot m^2 \cdot n$ flops

since both of these terms are of order $3$. Finally, if I also knew that $m \ll n$, I might further simplify the count to $3 \cdot m^2 \cdot n$ flops. Below, I am going to be thinking about high-dimensional matrices and vectors (i.e., where $m$ and $n$ are big), so these simplifications are sensible.
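As a quick numerical check (again mine, not the post's), the snippet below uses the illustrative polynomial from Equations (2) and (3) and shows that dropping the lower-order terms barely changes the count once the dimensions are large.

```python
# Minimal sketch (not from the original post): checking that dropping the
# lower-order terms barely changes a flop count once the dimensions are big.
# The polynomial below is just an illustrative example, not a specific algorithm.

def full_count(m, n):
    return m**3 + 3 * m**2 * n + 5 * m * n + 22

def simplified_count(m, n):
    return m**3 + 3 * m**2 * n   # keep only the order-3 terms

for m, n in [(10, 10), (100, 1_000), (1_000, 100_000)]:
    ratio = simplified_count(m, n) / full_count(m, n)
    print(f"m={m:>5}, n={n:>7}: simplified/full = {ratio:.4f}")
```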
Let’s look at a couple of examples to fix ideas. First, consider the task of matrix-to-vector multiplication. i.e., suppose that there is an $(N \times K)$-dimensional matrix $\mathbf{X}$ and we want to calculate:

(4)   $\mathbf{y} = \mathbf{X} \cdot \boldsymbol{\beta}$

where we know both $\mathbf{X}$ and $\boldsymbol{\beta}$ and want to figure out $\mathbf{y}$. This task takes an effort of roughly $2 \cdot N \cdot K$ flops. There are $N$ elements in the vector $\mathbf{y}$, and to compute each one of these elements, we have to multiply $2$ numbers together $K$ times and then add up the results as:

(5)   $y_n = x_{n,1} \cdot \beta_1 + x_{n,2} \cdot \beta_2 + \cdots + x_{n,K} \cdot \beta_K$

This setup is analogous to having a dataset with $N$ observations on $K$ different variables where each variable $k$ has a linear effect $\beta_k$ on the outcome variable $y_n$.
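Here is a minimal sketch (not from the original post) of a hand-rolled matrix-to-vector product that tallies its own flops and confirms the roughly $2 \cdot N \cdot K$ count; the dimensions are illustrative.

```python
# Minimal sketch (not from the original post): a hand-rolled matrix-vector
# product that tallies its own flops and confirms the roughly 2*N*K count.
import numpy as np

def matvec_with_flop_count(X, beta):
    N, K = X.shape
    y = np.zeros(N)
    flops = 0
    for n in range(N):
        total = 0.0
        for k in range(K):
            total += X[n, k] * beta[k]  # one multiplication + one addition
            flops += 2
        y[n] = total
    return y, flops

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 200))
beta = rng.normal(size=200)
y, flops = matvec_with_flop_count(X, beta)
print(flops, 2 * 500 * 200)       # 200000 vs 200000
print(np.allclose(y, X @ beta))   # True
```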
Next, let’s turn the tables and look at the case when we know the outcome variable $\mathbf{y}$ and want to solve for $\boldsymbol{\beta}$ when $\mathbf{X}$ is square (i.e., $N = K$) and non-singular. A standard approach here would be to use the factor-solve method whereby we first factor the data matrix into the product of simpler components, $\mathbf{X} = \mathbf{X}_1 \cdot \mathbf{X}_2 \cdots \mathbf{X}_J$, and then use these components to iteratively compute $\boldsymbol{\beta}$ as:

(6)   $\boldsymbol{\beta} = \mathbf{X}^{-1} \cdot \mathbf{y} = \mathbf{X}_J^{-1} \cdots \mathbf{X}_2^{-1} \cdot \mathbf{X}_1^{-1} \cdot \mathbf{y}$

We call the process of computing the factors the factorization step and the process of solving the resulting sequence of easy linear systems the solve step. The total flop count of a solution strategy is then the sum of the flop counts for each of these steps. In many cases the cost of the factorization step is the leading-order term.
e.g., consider the Cholesky factorization method that is commonly used in statistical software. We know that for every symmetric, positive-definite matrix $\mathbf{X}$ there exists a factorization:

(7)   $\mathbf{X} = \mathbf{L} \cdot \mathbf{L}^{\top}$

where $\mathbf{L}$ is lower triangular and non-singular with positive diagonal elements. The cost of computing these Cholesky factors is $\tfrac{1}{3} \cdot N^3$ flops. By contrast, the resulting solve steps of $\mathbf{L} \cdot \mathbf{w} = \mathbf{y}$ and $\mathbf{L}^{\top} \cdot \boldsymbol{\beta} = \mathbf{w}$ each have flop counts of $N^2$ flops, bringing the total flop count to roughly $\tfrac{1}{3} \cdot N^3 + 2 \cdot N^2 \approx \tfrac{1}{3} \cdot N^3$ flops. In the general case, the effort involved in solving a linear system of $N$ equations for $\boldsymbol{\beta}$ when $N = K$ grows with $N^3$. Boyd and Vandenberghe argue that “for $n$ more than a thousand or so, generic methods… become less practical,” and financial markets definitely have more than “a thousand or so” trading opportunities to check!
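The following is a minimal sketch (not from the original post) of the factor-solve strategy using SciPy's Cholesky routines; the matrix is a randomly generated symmetric, positive-definite stand-in for real data.

```python
# Minimal sketch (not from the original post): the factor-solve strategy using a
# Cholesky factorization via SciPy. The (1/3)*N**3 factorization cost dwarfs the
# roughly 2*N**2 cost of the two triangular solve steps once N gets large.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
N = 2_000
G = rng.normal(size=(N, N))
X = G @ G.T + N * np.eye(N)          # symmetric, positive-definite matrix
beta_true = rng.normal(size=N)
y = X @ beta_true

c, lower = cho_factor(X)             # factorization step: ~ (1/3) * N**3 flops
beta_hat = cho_solve((c, lower), y)  # solve step: ~ 2 * N**2 flops

print(np.allclose(beta_hat, beta_true))  # True
print((N**3 / 3) / (2 * N**2))           # factorization is ~333x costlier here
```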
3. Asset Structure
Consider constraining traders’ cognitive effort in an information-based asset pricing model a la Kyle (1985) but with many assets and attribute-specific shocks. Specifically, suppose that there are $N$ stocks that each have $C$ different payout-relevant characteristics. Every characteristic can take on $L$ distinct levels. I call a (characteristic, level) pairing an ‘attribute’ and use the indicator variable $x_{n,k} \in \{0, 1\}$ to denote whether or not stock $n$ has attribute $k$. Think about attributes as sitting in an $(N \times K)$-dimensional matrix, $\mathbf{X}$, with $K = C \times L$ columns, as illustrated in Equation (8) below:

(8)   [the $(N \times K)$ attribute matrix $\mathbf{X}$: one row per stock, one column per (characteristic, level) attribute]

I’ve highlighted the attributes for Micron Technology. e.g., we have that $x_{\text{Micron}, \text{Boise}} = 1$ while $x_{\text{Western Digital}, \text{Boise}} = 0$ since Micron Technology is based in Boise, ID while Western Digital is based in SoCal.
Further, suppose that each stock’s value is then the sum of a collection of attribute-specific shocks:

(9)   $v_n = \sum_{k=1}^{K} x_{n,k} \cdot \beta_k$ with $\beta_k = \alpha \cdot z_k$

where the shocks are distributed according to the rule:

(10)   $z_k = \begin{cases} +1 & \text{with probability } p/2 \\ \phantom{+}0 & \text{with probability } 1 - p \\ -1 & \text{with probability } p/2 \end{cases}$

Each of the $z_k$ indicates whether or not the attribute $k$ happened to realize a shock. The $\alpha$ term represents the amplitude of all shocks in units of dollars per share, and the $p$ term represents the probability of either a positive or negative shock to attribute $k$ each period.
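Here is a minimal sketch (not from the original post) that simulates this asset structure; the dimensions, the attribute density, the shock size, and the shock probability are all illustrative assumptions.

```python
# Minimal sketch (not from the original post): simulating the asset structure,
# an (N x K) attribute indicator matrix X and sparse attribute-level shocks
# beta_k = alpha * z_k with z_k in {-1, 0, +1}. All parameter values are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, K = 1_000, 5_000       # stocks and (characteristic, level) attributes
alpha, p = 1.00, 0.01     # shock size ($/share) and shock probability

X = (rng.random(size=(N, K)) < 0.05).astype(float)   # which stocks have which attributes (5% density, illustrative)
z = rng.choice([-1, 0, 1], size=K, p=[p / 2, 1 - p, p / 2])
beta = alpha * z                                     # attribute-specific shocks
v = X @ beta                                         # each stock's value, as in Equation (9)

print(int((z != 0).sum()), "shocked attributes out of", K)   # roughly p * K of them
print(v[:5])
```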
If value investors learn asset-specific information and Kyle (1985)-type market makers price each individual stock using only that stock’s own order flow in a dynamic setting, then each individual stock will be priced correctly given its own order flow:

(11)   $p_n = \mathrm{E}[\, v_n \mid y_n \,]$

where $y_n$ denotes the aggregate order flow for stock $n$. Yet, the high dimensionality of the market would mean that there still could be groups of mispriced stocks:

(12)   $\bar{p}_k = \tfrac{1}{N_k} \cdot \sum_{\{n :\, x_{n,k} = 1\}} p_n \neq \mathrm{E}[\, v_n \mid z_k \,]$

where $\bar{p}_k$ denotes the sample average price for the $N_k$ stocks with a particular attribute, $k$. This is a case of more is different. If an oracle told you that $z_k = +1$ for some attribute $k$, then you would know that the average price of stocks with attribute $k$ would be too low:

(13)   $\bar{p}_k < \alpha$

since, in a dynamic Kyle (1985) model, informed traders have an incentive to trade less aggressively today (i.e., decrease their demand and thus how much of their information gets impounded into today’s price) in order to act on their information again tomorrow. In this setting, $\bar{p}_k$ will be less than its fundamental value of $\alpha$ even though it will be easy to see that $\bar{p}_k > 0$ as $N_k \to \infty$.
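To see the “more is different” logic numerically, here is a stylized sketch of my own (not the Kyle (1985) equilibrium and not from the original post): each price impounds only an assumed fraction theta of the attribute shock plus idiosyncratic noise, so no single stock looks obviously mispriced, but the attribute-level average does.

```python
# Stylized sketch (not from the original post, and not the Kyle (1985) equilibrium):
# suppose each stock's price only impounds a fraction theta of an attribute-level
# shock of size alpha, plus stock-specific noise. Any single stock's mispricing
# hides inside the noise, but the average across the N_k stocks sharing the
# attribute does not. theta and noise_sd are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
alpha, theta = 1.00, 0.60          # shock size and the fraction impounded today
noise_sd = 5.00                    # idiosyncratic price noise, much bigger than alpha

for N_k in [10, 100, 10_000]:
    prices = theta * alpha + noise_sd * rng.normal(size=N_k)
    print(f"N_k={N_k:>6}: average price = {prices.mean():5.2f} "
          f"(fundamental ~ {alpha:.2f}, impounded ~ {theta * alpha:.2f})")
```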
4. Arbitrageurs’ Inference Problem
So how much effort does it take to discover the set of shocked attributes,

(14)   $\mathcal{K}^{\star} = \{\, k : z_k \neq 0 \,\}$

given their price impact? What’s stopping arbitrageurs from trading away these attribute-specific pricing errors? Well, the problem of finding the attributes in $\mathcal{K}^{\star}$ boils down to solving:

(15)   $\mathbf{X} \cdot \boldsymbol{\beta} = \mathbf{v}$

for $\boldsymbol{\beta}$, where $\mathbf{X}$ is the $(N \times K)$-dimensional attribute matrix, $\boldsymbol{\beta}$ is the $(K \times 1)$-dimensional vector of attribute-specific shocks, and $\mathbf{v}$ is the $(N \times 1)$-dimensional vector of stock values. i.e., this is a similar problem to the linear solve in Section 2 above, but with two additional complications. First, the system is underdetermined in the sense that there are many more payout-relevant attributes than stocks, $K > N$. Second, arbitrageurs don’t know exactly how many attributes are in $\mathcal{K}^{\star}$. They know that, on average, there will be $p \cdot K$ of them; however, the realized number itself is a random variable.
It’s easy enough to extend the solution strategy in Section 2 to the case of an underdetermined system, where a solution is any member of the set:

(16)   $\{\, \boldsymbol{\beta} : \mathbf{X} \cdot \boldsymbol{\beta} = \mathbf{v} \,\} = \{\, \hat{\boldsymbol{\beta}} + \mathbf{F} \cdot \mathbf{w} : \mathbf{w} \in \mathbf{R}^{K - N} \,\}$

where $\mathbf{F}$ is a matrix whose column vectors are a basis for the null space of $\mathbf{X}$ and $\hat{\boldsymbol{\beta}}$ is any one particular solution. Suppose that we partition $\mathbf{X} = [\, \mathbf{X}_1 \;\; \mathbf{X}_2 \,]$ so that $\mathbf{X}_1$ is $(N \times N)$-dimensional and non-singular; then:

(17)   $\mathbf{X} \cdot \boldsymbol{\beta} = \mathbf{X}_1 \cdot \boldsymbol{\beta}_1 + \mathbf{X}_2 \cdot \boldsymbol{\beta}_2 = \mathbf{v}$

Obviously, setting $\hat{\boldsymbol{\beta}} = (\mathbf{X}_1^{-1} \cdot \mathbf{v}, \, \mathbf{0})$ is one solution. The full set of solutions defining the null space of $\mathbf{X}$ is then given by:

(18)   $\mathbf{F} = \begin{bmatrix} -\mathbf{X}_1^{-1} \cdot \mathbf{X}_2 \\ \mathbf{I}_{K - N} \end{bmatrix}$

Thus, if it takes $f$ flops to factor $\mathbf{X}_1$ into its components and $s$ flops to solve each linear system of the form $\mathbf{X}_1 \cdot \mathbf{b} = \mathbf{c}$, then the total cost of parameterizing all the solutions is:

(19)   $f + (K - N + 1) \cdot s$ flops
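Here is a minimal sketch (not from the original post) of this parameterization in NumPy; it assumes, for simplicity, that the first $N$ columns of the attribute matrix happen to form a non-singular block.

```python
# Minimal sketch (not from the original post): parameterizing every solution of an
# underdetermined system X*beta = v as beta_hat + F*w, by partitioning X = [X1 X2]
# with X1 square and non-singular, as in Equations (16)-(18).
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 200
X = rng.normal(size=(N, K))
beta_true = np.zeros(K)
beta_true[rng.choice(K, size=3, replace=False)] = 1.0   # a few shocked attributes
v = X @ beta_true

X1, X2 = X[:, :N], X[:, N:]                  # assumes the first N columns are non-singular
beta_hat = np.concatenate([np.linalg.solve(X1, v), np.zeros(K - N)])   # one particular solution
F = np.vstack([-np.linalg.solve(X1, X2), np.eye(K - N)])               # null-space basis

w = rng.normal(size=K - N)                   # any w gives another valid solution
beta_other = beta_hat + F @ w
print(np.allclose(X @ beta_hat, v), np.allclose(X @ beta_other, v))    # True True
```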
Via the LU factorization method, I know that the factorization step will cost roughly:

(20)   $f \approx \tfrac{2}{3} \cdot N^3$ flops

Moreover, we know from Section 2 that the cost of each solve step will be on the order of $s \approx 2 \cdot N^2$ flops. However, there is still one detail left to consider. Namely, arbitrageurs don’t know how many attributes are in $\mathcal{K}^{\star}$. Thus, they have to solve for both $\boldsymbol{\beta}$ and the size of $\mathcal{K}^{\star}$ by starting from an initial guess and iterating on the above process until the columns of $\mathbf{F}$ actually represent a basis for the null space of $\mathbf{X}$. Thus, the total effort needed is roughly:

(21)   $\gamma \cdot \left\{ \tfrac{2}{3} \cdot N^3 + 2 \cdot N^2 \cdot (K - N + 1) \right\}$ flops

where $\gamma$ is the convergence rate of this search and the calculation is dominated by the effort spent searching through the null space to be sure that the guess is correct.
More broadly, this step is just one way of capturing the deeper idea that knowing where to look is hard. e.g., Warren Buffett says that he can “say no in 10 seconds or so to 90% or more of all the [investment opportunities] that come along.” This is great… until you consider how many investment opportunities Buffett runs across every single day. Saying no in 10 seconds flat then turns out to be quite a chore! Alternatively, as the figure above highlights, this is why traders use personalized multi-monitor computing setups that make it easy to spot patterns instead of a shared supercomputer with minimal output.
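For a rough sense of just how hard “knowing where to look” can be, here is a small sketch of my own (the attribute count is an illustrative assumption): even a modest number of shocked attributes generates an astronomical number of candidate sets to check.

```python
# Minimal sketch (not from the original post): a rough sense of why "knowing where
# to look is hard". Even with only K attributes and a handful of shocked ones, the
# number of candidate shocked-attribute sets explodes combinatorially.
from math import comb

K = 10_000                     # payout-relevant attributes (illustrative)
for n_shocked in [1, 2, 3, 5]:
    print(f"{n_shocked} shocked attribute(s): {comb(K, n_shocked):,} candidate sets")
```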
5. Clock Time Interpretation
Is a flop count like the one in Equation (21) a big number? Is it a small number? Flop counts were originally used when floating-point operations were the main computing bottleneck. Now, things relating to how data are stored, such as cache boundaries and reference locality, have first-order effects on computation time as well. Nevertheless, flop counts can still give a good back-of-the-envelope estimate of the relative amount of time it would take to execute a procedure, and such a calculation is helpful in trying to interpret the unit of measurement “flops” on a human scale. e.g., on one hand, arbitrageur effort would be a silly constraint to worry about if the time it took to execute real-world calculations were infinitesimally small. On the other hand, flops might be a poor unit of measure for arbitrageurs’ effort if the time it took to carry out reasonable calculations were on the order of millennia since arbitrageurs clearly don’t wait this long to act! Actually doing a quick computation can allay these fears.
Suppose that a computer can execute some number of millions of operations per second. Millions of instructions per second (i.e., MIPS) is a standard unit of computational speed. I can then calculate the amount of time it would take to execute a given number of flops at a speed of $S$ MIPS as:

(22)   $\text{Time in seconds} = \dfrac{\text{number of flops}}{S \times 10^6}$

Thus, plugging realistically large values for the number of characteristics, the number of levels per characteristic, and the fraction of attributes that realizes a shock each period into Equation (21), even if arbitrageurs guess exactly right on the number of shocked attributes (i.e., so that the search only has to be run once), a brute-force search would take days to complete. Clearly, a brute-force search strategy just isn’t feasible. There just isn’t enough time to physically do all of the calculations.
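Here is a minimal sketch (not from the original post) of the conversion in Equation (22); both the flop count and the machine speed below are purely illustrative assumptions rather than the figures used in the text above.

```python
# Minimal sketch (not from the original post): converting a flop count into clock
# time at a given speed in MIPS, as in Equation (22). Both numbers below are
# purely illustrative assumptions, not the figures used in the text above.
def flops_to_seconds(flops, speed_mips):
    """Equation (22): time in seconds = flops / (speed_mips * 10**6)."""
    return flops / (speed_mips * 1e6)

flops = 1e15          # an illustrative flop count for a large brute-force search
speed_mips = 1_000    # an illustrative machine speed in MIPS

seconds = flops_to_seconds(flops, speed_mips)
print(f"{flops:.0e} flops at {speed_mips:,} MIPS -> {seconds:,.0f} seconds "
      f"(~{seconds / 86_400:.1f} days)")
```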
6. A Persistent Problem
I conclude by addressing a common question. You might ask: “Won’t really fast computers make cognitive control irrelevant?” No. Progress in computer storage has actually outstripped progress in processing speeds by a wide margin; this rapid, exponential decline in the cost of storage is known as Kryder’s Law. Over the past few decades the cost of processing has dropped at the exponential rate described by Moore’s Law, but the cost of storage has dropped even faster over the same period. e.g., take a look at the figure below, made using data from www.mkomo.com, which shows that the cost of disk space falls by a roughly constant percentage each year. What does this mean in practice? Well, as late as 1980 even a small hard drive cost thousands of dollars, implying that a drive with today’s standard capacity would have cost millions. These days you can pick up such a drive for less than the price of a nice dinner! We have so much storage that finding things is now an important challenge. This is why we find Google so helpful. Instead of being eliminated by computational power, cognitive control turns out to be a distinctly modern problem.