von Neumann and Morgenstern’s “Theory of Games and Economic Behavior” is the famous basis for game theory. One of the central accomplishments is the rigorous proof that comparative “preference methods” over fairly complicated “event spaces” are no more expressive than numeric (real number valued) utilities. That is: for a very wide class of event spaces and comparison functions “>” there is a utility function u() such that:
a > b (“>” representing the arbitrary comparison or preference for the event space) if and only if u(a) > u(b) (this time “>” representing the standard order on the reals).
However, an active reading of sections 1 through 3 and even the 2nd edition's axiomatic appendix shows that the concept of "events" (what preferences and utilities are defined over) is deliberately left undefined. There is math and objects and spaces, but not all of them are explicitly defined in terms of known structures (are they points in R^n, sets, multi-sets, sums over sets, or what?). The word "event" is used early in the book and does not appear in the index. Axiomatic treatments often rely on intentionally leaving ground concepts undefined, but we are going to work a concrete example through von Neumann and Morgenstern to try and illustrate a bit more of the required intuition and the deep nature of their formal notions of events and utility. I will also illustrate how, at least in discussion, von Neumann and Morgenstern may have held on to a naive "single outcome" intuition of events and a naive "direct dollars" intuition of utility despite erecting a theory carefully designed to support much more structure. This is possible because they never have to calculate in the general event space: they prove access to the preference allows them to construct the utility function u() and then work over the real numbers. Sections 1 through 3 are designed to eliminate the need for a theory of preference or utility and allow von Neumann and Morgenstern to work with real numbers (while achieving full generality). They never need to make the translations explicit, because soon after showing the translations are possible they assume they have already been applied.
First we introduce some terminology: preference and utility.
We will leave undefined, for now, our space E of events. But all the items we are going to talk about (a, b, c, …) should be thought of as coming from this space. This, and the slightly clunky pre-order definitions, are to stay closer to the sequence of presentation in von Neumann and Morgenstern. Unlike von Neumann and Morgenstern we will explicitly work through a positive general example of an event space later in this writeup.
A preference is a symbol such as ">" where "a > b" is interpreted as "a is preferred to b." "b < a" is just shorthand for "a > b." A preference is also called a pre-order and is transitive and acyclic. Transitive means if a > b and b > c then we also have a > c. Acyclic means that for any k ≥ 1 we can never find a_1, a_2, …, a_k such that a_1 > a_2 > … > a_k > a_1. In particular this implies a preference is non-reflexive (we can't have a_1 > a_1, by the k=1 case) and non-symmetric (we can not have both a_1 > a_2 and a_2 > a_1, by the k=2 case). We call a pre-ordering complete if it obeys the following extra laws:
- b > c and not(b > a) implies a > c
- c > b and not(a > b) implies c > a
That is: elements that have neither a > b nor b > a are considered equivalent in preference, not incomparable. By "not(b > a)" we do not mean "a ≥ b" but instead that "b > a" is not a fact in our pre-ordering (i.e. this relation is absent). Notice this is weaker than the rule of total ordering, which insists that for every a not equal to b we have a > b or b > a.
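As a small concrete illustration of these definitions (my own toy example, not from the book), here is a Python sketch that stores a finite preference as a set of ordered pairs and checks the acyclicity and completeness conditions above.

```python
from itertools import permutations

# A toy finite preference: the pair (a, b) records the fact "a > b" (a is preferred to b).
prefs = {("win", "draw"), ("draw", "lose"), ("win", "lose")}
items = {x for pair in prefs for x in pair}

def acyclic(prefs):
    # No chain a_1 > a_2 > ... > a_k > a_1 may close back on its start.
    for start in items:
        frontier = {b for (a, b) in prefs if a == start}
        seen = set()
        while frontier:
            nxt = frontier.pop()
            if nxt == start:
                return False
            if nxt not in seen:
                seen.add(nxt)
                frontier |= {b for (a, b) in prefs if a == nxt}
    return True

def complete(prefs):
    # b > c and not(b > a) implies a > c; and c > b and not(a > b) implies c > a.
    for a, b, c in permutations(items, 3):
        if (b, c) in prefs and (b, a) not in prefs and (a, c) not in prefs:
            return False
        if (c, b) in prefs and (a, b) not in prefs and (c, a) not in prefs:
            return False
    return True

print(acyclic(prefs), complete(prefs))  # True True for this toy ordering
```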
A utility is a function u() from our space of events into the real numbers that is monotone or order preserving. That is, u(a) > u(b) (in the real numbers) if and only if a > b (in preference). von Neumann and Morgenstern further insist on a form of linearity, which is the ability to interpolate (eq. 3:5:b):
u( z a + (1-z) b ) = z u(a) + (1-z) u(b)
where z is a number from 0 to 1. We call this the interpolation equation. The requirement ensures the utility is compatible with probability theory in that one can compute expected values. Take care to note that while the right-hand side is standard arithmetic over the real numbers, we have not yet defined multiplication by real numbers or addition in our space of events (used on the left-hand side of the equation). This is in fact the tricky point to look out for.
von Neumann and Morgenstern claim that prior to their work utility was considered less expressive than preferences (i.e. there were important natural preferences that seemed to not correspond to a useful utility function):
Many economists will feel that we are assuming far too much (cf. the enumeration of the properties we postulated in 2.1.1), and that our standpoint is a retrogression from the more cautious modern technique of "indifference curves."
(“indifference curves” being a level-set method for dealing directly with preferences without assuming a utility function).
Utility theory should not be confused with "naive utility theory." Naive utility theory defines the space of events E as a set of simple outcomes and defines utility directly in expected dollars. For example, take each element of E to be a number of dollars with the usual multiplication and addition of real numbers. In this case the only possible utility functions compatible with probability theory are of the form u(x) = s*x + t (s > 0). But this naive utility theory is soundly rejected for failures such as: being forced to value a one-half chance at two million dollars as having the same utility as a certain one million dollars. There are some direct fixes for pricing risk and dealing with the diminishing marginal utility of dollars, such as the Sharpe ratio (risk adjusted returns) and heuristics like log-utility (valuing dollars at the logarithm of the total, an idea going back to Daniel Bernoulli and used in the Kelly criterion). But neither of these is simultaneously linear in dollars and probabilities (so they will cause difficulties when trying to compute expected values, hence "not compatible with probability theory"). Also note that the trivial preference (value equals dollars) and the log-dollar preference are indistinguishable on simple outcomes (log() being a monotone function). So the poor expressiveness of point-wise or completely determined outcomes does not hurt just utility functions, it damages preferences before we even attempt a conversion. We should be suspicious of whether an event space E made up of only outcomes is rich enough to support even useful preferences.
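To make that rejection concrete, here is a small Python sketch (my own illustration; I assume the other half of the gamble pays nothing) comparing the naive dollars-as-utility rule to a Bernoulli-style log-dollar rule on the gamble just described.

```python
import math

# One-half chance of $2,000,000 (otherwise nothing) versus a certain $1,000,000.
gamble = [(0.5, 2_000_000.0), (0.5, 0.0)]
certain = [(1.0, 1_000_000.0)]

def expected_utility(outcomes, u):
    # Sum of probability times utility of each dollar outcome.
    return sum(p * u(x) for p, x in outcomes)

naive = lambda x: x                      # u(x) = s*x + t with s = 1, t = 0
log_u = lambda x: math.log10(x + 1.0)    # log-dollar utility (offset by $1 to handle $0)

print(expected_utility(gamble, naive), expected_utility(certain, naive))  # 1000000.0 1000000.0
print(expected_utility(gamble, log_u), expected_utility(certain, log_u))  # ~3.15 vs ~6.0
```

The naive rule is indifferent between the two, while the log-dollar rule prefers the sure million; but the log rule is no longer linear in dollars, which is exactly the incompatibility noted above.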
What von Neumann and Morgenstern knew is that you can solve the problem by introducing a richer event space E. As I said, they intentionally leave the concept of "event" undefined (in some sense leaving the concept as rich as possible). But it is easier to reason about if we restrict the concept of event with some more structure. Anything von Neumann and Morgenstern claim will continue to be true (since they worked in purely abstract terms, which is why they worked in those terms) and we can see a bit more of the intuition. We have a few competing clues as to what events might mean:
- The suggestive choice of the word "event." This is likely referring to a measure-theoretic use of the word a la Kolmogorov's "Foundations of the Theory of Probability" (1933), which would have certainly been current at the time of "Theory of Games and Economic Behavior" (1944, research started in 1928).
In this usage an event is (roughly) a possible future state of the whole universe under consideration and a random variable is a deterministic function from this state to measurements. We point this out as this differs from the standard English meaning where an event is an individual measurement. For example if the entire universe of possibilities is me getting paid $10 or $100 and also you being paid $20 or $200 a measure-theoretic event is the simultaneous complete determination of both outcomes at the same time (say I get $100 and you get $200). The individual outcomes (for example what I get) are then merely measurements from this larger event.
- The concept von Neumann and Morgenstern decided to axiomatize is allowing addition and multiplication by real numbers on events.
We think of dollars as adding (which is in fact a red herring, we don't want to use addition directly on dollars), but other kinds of events (like complementary goods) are not obviously naturally additive or scalable.
Our naive event space of real numbers representing dollars did obey the axioms, but it did not have enough structure to represent any non-trivial utility of dollars (not even the logarithm of the number of dollars). We want to move up to something with a bit more expressiveness: portfolios, distributions, formal sums, maps, sets or vector spaces.
The simplest example I can think of that fits the axioms in a meaningful way and can encode meaningful preferences is: E is the set of finite formal weighted sums of complete outcomes/events (states of all the world, encoded as strings for convenience). We define E to be the collection of all e:
e = w_1 S_1 + … + w_k S_k
(for some k) where w_1,…,w_k are positive real numbers no greater than 1 and S_1,…,S_k are distinct total outcomes (which we will represent as strings). These sums are "formal" in that we have no interpretation of the multiplication w_i S_i (it is just a convenient notation for a real number and a string to be associated together). Remember: formal sums are just a convenient way to write vectors or maps from the S-terms to real numbers.
- Two elements e = v_1 U_1 + … + v_j U_j and f = w_1 S_1 + … + w_k S_k are considered equivalent (as the same event) only when they are permutations of each other's terms (that is, only when j = k and there is a one-to-one/onto function σ : {1,…,j} -> {1,…,k} such that w_σ(i) = v_i and S_σ(i) = U_i for all i). So we are really working with sums modulo this standard equivalence (this abuse of notation should not cause undue confusion).
- Multiplication of an event by a real number r with 0 < r ≤ 1 is defined by:
r * ( w_1 S_1 + … + w_k S_k) = (r w_1) S_1 + … + (r w_k) S_k
and defined as the empty sum for r=0.
- Addition of two events e = v_1 U_1 + … + v_j U_j and f = w_1 S_1 + … + w_k S_k (written e + f) is defined term by term: the sum has terms v_i U_i for each U_i not equal to any S_h, terms w_i S_i for each S_i not equal to any U_h, and terms (v_i + w_h) U_i wherever U_i = S_h.
We only need multiplication and addition for use in the interpolation equation, so it comes for free that we will never form coefficients outside of the range [0,1] when adding or multiplying elements. In fact we will actually be restricted to the subset of E with a non-zero number of terms, with the positive coefficients summing to exactly 1, and with a reasonable subset of the possible strings. Our formal sums are really just a notation for writing down distributions.
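Here is a minimal Python sketch of this event space. Representing a formal sum w_1 S_1 + … + w_k S_k as a map from strings to coefficients is my own choice; the operations just follow the definitions above.

```python
# A formal sum w_1 S_1 + ... + w_k S_k stored as a dict {S_i: w_i} (the equivalence
# modulo permutation of terms is automatic in this representation).
def scale(r, e):
    # r * (w_1 S_1 + ... + w_k S_k) = (r w_1) S_1 + ... + (r w_k) S_k; the empty sum for r = 0.
    return {} if r == 0 else {s: r * w for s, w in e.items()}

def add(e, f):
    # Term-by-term addition: coefficients of shared strings are added.
    out = dict(e)
    for s, w in f.items():
        out[s] = out.get(s, 0.0) + w
    return out

def interpolate(z, a, b):
    # z a + (1 - z) b, the only combination the interpolation equation needs.
    return add(scale(z, a), scale(1.0 - z, b))

print(interpolate(0.5, {"$10.00": 1.0}, {"$100.00": 1.0}))  # {'$10.00': 0.5, '$100.00': 0.5}
```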
This already sounds complicated (as it is fairly abstract) and does not capture the full theory (as we could imagine wanting to handle non-discrete systems). In my opinion this E is the simplest event space that can support a good part of the intended interpretation. To evaluate preferences we read an element e = w_1 S_1 + … + w_k S_k as "you have a probability w_1 of being in situation S_1, or a probability w_2 of being in situation S_2, … or a probability w_k of being in situation S_k" where all probabilities are considered disjoint (so exactly one of the situations S_1 through S_k occurs). We then say a complete system of preferences over E is given by resolving e > f as: do you prefer to be in situation e or situation f? von Neumann and Morgenstern insisted anything claiming to be a preference be able to answer questions of this nature:
It is a very natural extension of this picture to permit such an individual to compare not only events, but even combinations of events with stated probabilities.
This is a fairly rich system. Different users can express quite varied preferences through which set of strings they use and what meaning they associate with the strings. For example we could choose strings that represent dollars in decimal format: "$1.00", "$0.50", "-$10.00" and so on. Then different interpretations like the earlier unsatisfactory naive utility could be implemented just as preferring strings with higher dollar value. We can implement the log-utility (for non-negative dollar amounts) by preferring strings with higher log-dollar value (notice that as a preference on strings representing positive dollar values this gives the same set of decisions as the naive utility). An "at least $10" preference could be implemented by preferring any string encoding at least $10 to any string encoding a smaller value. A crude Sharpe ratio could be (roughly, ignoring the risk-free return) implemented with strings of the form "expectation=$A,stddeviation=$B" and preferring strings with higher A/B.
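A few of these string-level valuations written out (a sketch; the dollar-string parsing and the names are my own conventions):

```python
import math

def dollars(s):
    # Parse decimal dollar strings such as "$1.00", "$0.50" or "-$10.00".
    sign = -1.0 if s.startswith("-") else 1.0
    return sign * float(s.lstrip("-$"))

naive_u  = lambda s: dollars(s)                           # value equals dollars
log_u    = lambda s: math.log10(dollars(s))               # log-dollar (positive amounts only)
at_least = lambda s: 1.0 if dollars(s) >= 10.0 else 0.0   # "at least $10" as an indicator

# On pure strings the naive and log-dollar rules rank positive amounts identically:
print(naive_u("$10.00") > naive_u("$0.50"), log_u("$10.00") > log_u("$0.50"))  # True True
```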
Notice we have only specified the preferences on pure strings (that is, e in E of the form 1 * S where S is a string). We are going to drop the fiction that we like working directly with preferences (since there are so few operations we can perform with them) and move to utilities. For each of the preference functions defined in the last paragraph we clearly also defined a utility on strings. We will now define the utility of a full expression as the obvious linearization of the utility on strings to formal sums. So define, for e = w_1 S_1 + … + w_k S_k:
u(e) = w_1 u(S_1) + … + w_k u(S_k) .
Now we can finally distinguish the trivial utility from the log-dollar utility, as trivial(0.5*"$10" + 0.5*"$100") = $55 and log-dollar(0.5*"$10" + 0.5*"$100") = 1.5 (assuming log base 10; this is log(31.62), not log(55)). Why this is at all interesting follows from Jensen's inequality. Jensen's inequality states that for a concave function f() (like log()) we have f(E[x]) ≥ E[f(x)]. So the mixed solutions represented by log-dollar(0.5*"$10" + 0.5*"$100") come out conservatively, as in our notation we are taking the expectation outside of the string utility function.
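A quick check of that arithmetic (a sketch; it re-derives the numbers above from the linearization rule):

```python
import math

def linearized(u_string):
    # u(w_1 S_1 + ... + w_k S_k) = w_1 u(S_1) + ... + w_k u(S_k)
    return lambda e: sum(w * u_string(s) for s, w in e.items())

dollars    = lambda s: float(s.lstrip("$"))
trivial    = linearized(dollars)
log_dollar = linearized(lambda s: math.log10(dollars(s)))

e = {"$10": 0.5, "$100": 0.5}
print(trivial(e))              # 55.0
print(log_dollar(e))           # 1.5
print(math.log10(trivial(e)))  # ~1.74 >= 1.5, as Jensen's inequality promises for concave log()
```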
We were a little fast. There are 11 axioms in section 3.6 we claim to obey. Some of these are no trouble. For example axiom 3:A states that the preference you are trying to encode is complete. But if we are trying to show utility functions are rich enough to encode complete preferences this is not a hindrance (it is something we want to assume in the proof, not a real limitation). Other axioms are about distributivity and monotonicity (also natural for preferences). The real issues are axioms 3:B:c and 3:B:d, which encode an Archimedean or intermediate-value type property (standing in for continuity). These axioms are:
- (3:B:c) u < w < v implies there exists a z (0 < z < 1) such that z u + (1-z) v < w
- (3:B:d) u > w > v implies there exists a z (0 < z < 1) such that z u + (1-z) v > w
These axioms are severe additional restrictions on what might be considered an acceptable preference. If you want to attack the von Neumann and Morgenstern utility theory a good counterexample would be a preference somebody cares about that meets most of the section 3.6 axioms (most of these are central to the idea of utility) and violates the 3:B:c/d axioms (thus establishing the 3:B:c/d axioms violate some intuition). And von Neumann and Morgenstern do exactly this in their appendix dealing with axiomatic utility. They introduce an event space E that is very close to our construction but give it a lexicographic order to force gaps (whereas we implicitly gave a pre-order arising from the additive structure). A lexicographic order is exactly the type of ordering you need to express a preference like "the most expected return of all the returns that are certain of at least $10." They admit that this preference can not be encoded as a utility (their statement is "these utilities are clearly non-numerical," to be read as: the algebra of their complex objects does not have a homomorphism to the real numbers). However they end with:
Such a non-Archimedean ordering is clearly in conflict with our normal ideas concerning the nature of utility and preference.
I disagree, as I feel there are natural situations where you may want to maximize expectation subject to the certain return being above a given bound. It is true that enforcing certainty is very expensive (you tend to give up a lot of positive expectation to do so, which is why this is not how we do our retirement planning), but it is still a natural ask to consider and price. I also find it telling that von Neumann and Morgenstern carefully lay out the algebra for this complicated example (how addition, multiplication and comparison work), implying they may not have intuitively felt they needed to work directly over spaces as complicated as our E prior to this counterexample.
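For concreteness, here is a sketch (my own encoding, with plans written as probability/dollar pairs) of the kind of lexicographic comparison just discussed: first demand the certain return clears a floor, then maximize expectation. No mixture that includes a below-floor outcome can ever beat a plan that keeps the floor, which is exactly the behavior the 3:B:c/d axioms forbid.

```python
# A plan is a list of (probability, dollar outcome) pairs.
def lex_key(plan):
    worst = min(x for _, x in plan)        # the return we are certain of
    expect = sum(p * x for p, x in plan)   # the expected return
    return (worst >= 10.0, expect)         # lexicographic: the $10 floor first, expectation second

def prefer(a, b):
    # True when plan a is preferred to plan b under the lexicographic rule.
    return lex_key(a) > lex_key(b)

safe  = [(1.0, 12.0)]                      # certain $12: clears the floor
risky = [(0.5, 0.0), (0.5, 1000.0)]        # expectation $500, but no floor
print(prefer(safe, risky))                 # True: the floor dominates any amount of expectation
```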
As I mentioned earlier, von Neumann and Morgenstern never have to calculate in an outcome space like our E (prior to working the counterexample in their appendix) because they essentially use the theorem that any preference can be mapped to real numbers to avoid worrying about the structure of preferences. They assume the transformation to real numbers has already been applied and happily work there for most of the book. Their stated intuition (despite having proven general theorems) keeps utility close to the notion of dollars. Indeed they say:
This difficulty [ … likely the risk of different equivalent preferences/currencies labeling different production/consumption plans as optimal … ] indeed has been ploughed under by our assuming in 2.1.2 a quantitative and even monetary notion of utility.
([…] comment mine.)
The 3:B:c/d axioms essentially disallow preferences that have gaps (which would require discontinuous jumps that a continuous utility just can not encode). These axioms also tell us once we have assigned utility values to a pair a > b we can use bisection search on the interpolation equation to approximate all other utilities.
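Here is a minimal sketch of that bisection idea, using the dict representation of formal sums from earlier. Given a preference oracle and anchor utilities u(a) and u(b) for events a > c > b, we hunt for the mixing weight z at which z a + (1-z) b crosses c; the oracle below (largest expected dollars) is a stand-in I invented for illustration.

```python
def bisect_utility(prefers, a, b, c, u_a=1.0, u_b=0.0, iters=40):
    # Assumes a > c > b. Find z with z a + (1-z) b indifferent to c,
    # then report u(c) = z * u_a + (1 - z) * u_b.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        z = (lo + hi) / 2.0
        mix = {}
        for s, w in a.items():
            mix[s] = mix.get(s, 0.0) + z * w
        for s, w in b.items():
            mix[s] = mix.get(s, 0.0) + (1.0 - z) * w
        if prefers(mix, c):
            hi = z   # the mixture is too good: put less weight on a
        else:
            lo = z   # the mixture is not good enough: put more weight on a
    z = (lo + hi) / 2.0
    return z * u_a + (1.0 - z) * u_b

# Stand-in oracle: prefer the event with the larger expected dollar amount.
expected = lambda e: sum(w * float(s.lstrip("$")) for s, w in e.items())
prefers = lambda e, f: expected(e) > expected(f)

a, b, c = {"$100": 1.0}, {"$0": 1.0}, {"$25": 1.0}
print(bisect_utility(prefers, a, b, c))  # ~0.25: u(c) sits a quarter of the way from u(b) to u(a)
```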
You can express some non-linearities and higher order interactions in an Archimedean system, but they have to live in the strings, not the coefficients. For example you could use the strings to encode sets of items: "{bottle-opener}", "{hammer}", "{wine}", "{hammer,wine}", "{bottle-opener,wine}" … . You can then create utility functions that only value certain combinations (complementary goods, like the wine and the opener; or substitutable goods like wine and beer). What is going on is we are encoding all the cleverness in how the atomic events (the strings) are valued. Also notice that items in the "event space" are not events in the measure theory sense, but formal sums over events. This is part of our reason for working an explicit example: you really have to see that if E is just simple point measurements (instead of set-valued events) the math does not give you anything other than trivial valuations that can not represent interesting preferences (so the name "event space" is a bit of a misnomer, but extending to sets, formal sums, and vectors is a classic math move when your structures are not detailed enough to carry the information you wish to manipulate).
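A sketch of the complementary-goods idea (the item names and dollar values are mine): all of the non-linearity lives in how the atomic strings are valued, while the utility of a formal sum is still just the linear mix.

```python
# Value a set of goods encoded as a string such as "{hammer,wine}".
def goods_value(s):
    inner = s.strip("{}")
    items = set(inner.split(",")) if inner else set()
    value = 2.0 * ("hammer" in items) + 5.0 * ("wine" in items) + 1.0 * ("bottle-opener" in items)
    if {"wine", "bottle-opener"} <= items:
        value += 4.0   # the complements are worth extra only when they appear together
    return value

u = lambda e: sum(w * goods_value(s) for s, w in e.items())

together = {"{bottle-opener,wine}": 1.0}               # the certain bundle
split    = {"{bottle-opener}": 0.5, "{wine}": 0.5}     # a 50/50 gamble between the parts
print(u(together), u(split))                           # 10.0 3.0: only the bundle earns the bonus
```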
And that, to my mind, is a faithful decoding of the content of the first three sections of von Neumann and Morgenstern. We see both preferences and utilities only become interesting when you move up to formal spaces over individual outcomes (sets, tuples and formal sums over sets and tuples). One way to tap some of the power of the theory is to design utilities over the atoms (the indivisible events you are writing formal sums over) and let the linearization force the rest of the behavior of both the preference and the utility. Certain preferences can not be encoded as utilities until you move to a sufficiently rich space to work over. Hopefully this won't be seen so much as a misread of "Theory of Games and Economic Behavior" but as a worked exercise.