Learning, Ethics and Game theory


In the image above, I've sketched how one can show that evolution, game theory and learning are related. While I'll not talk about natural selection today, I'll discuss how ideas from game theory can be given an easier footing from within the context of computational learning theory. A future post will discuss evolution from the min-regret framework so it's worth following the ideas presented here. Topics: AI Ethics (why everyone gets some part wrong), economics (why capitalism is intractable), games, learning algorithms. I must apologizeโ€”ideally, there should have been interactive demos but alas, I hadn't the time for that.

Nash Equilibrium

The Nash Equilibrium (NE) is a solution concept for games. Games are a framework which prescribe how to behave and coordinate optimally in a given situation, assuming some way of ranking outcomes (utility) and effectively infinite computation (among other unreasonable things). Okay to be fair, people have recently been looking into how to operate under the constraints of resource limitations. The min-regret framework has informed some of that work.

A NE can either be pure (where you play only one strategy) or mixed (where you randomize over your strategy set). A strategy is essentially a look up table (or algorithm) telling you how to behave at all decision points. Not all games have a pure NE but all (finite) games have at least one mixed NE. At a NE, each player gains no utility from switching to a different strategy profile. Although highly celebrated, NEs are actually rarely practical for several reasons. They inhabit the PPAD complexity class (which are conjectured to be intractable), work best in 2 player zero sum (or other simple) scenarios (there can only be one winner) and will not necessarily find a fair solution concept. The utility of the the solution concept for general sum games is also questionable. Finally, although they are in some sense optimal, they do not necessarily provide the "best" results. In particular, for 2 player 0-sum games, NEs have zero expectation.

Rock Paper Scissors

As an example, consider Rock Paper Scissors. The mixed NE for this game is to randomize evenly: play one of the pure strategies of always playing one of rock,paper or scissors with 1/3 probability (to be clear, an example of a pure strategy in this game is to always play rock).

If you always play: ๐Ÿ“ฐ then I can switch to a strategy of always playing โœ‚. It you play a strategy like [๐Ÿ‘Š,70% ;๐Ÿ“ฐ, 10%; โœ‚, 20%; ] then I can play the pure strategy of [๐Ÿ“ฐ, 100%] and perform optimally against you. This is because 70% of the time I win, 10% of the time I tie and only 20% of the time am I losing. You can expect to lose -0.5 in expectation (this might not seem like much but on average, if we play 100 rounds and bet 5๐Ÿ’ต/round each, then I'd have taken ๐Ÿ’ฒ250 from you). You can check it here, other adjustments will do worse than just playing pure paper.

let regretSum   = [|0.;0.;0.|]
let strategySum = [|0.;0.;0.|]

let rpsStratDef  = Array.zip [|Rock;Paper;Scissors|] [|0.7;0.1;0.2|] //<== These numbers can be changed. Ensure sums to 1.
let rpsStratDef2 = Array.zip [|Rock;Paper;Scissors|] [|0.2;0.7;0.1|] //<== These numbers can be changed. Ensure sums to 1.
let rpsNE = Array.zip [|Rock;Paper;Scissors|] [|1./3.;1./3.;1./3.|]
                                              //       ___________
                                              //       |                        |
                                              //can change this  \|/
train1 20_000 winner [|Rock;Paper;Scissors|] (Array.map snd rpsStratDef) regretSum strategySum                 
If you want to eliminate the chance of ever losing in expectation then of course you will play the NE strategy. In RPS however, while it is impossible to win against a NE strategy, it is also impossible to lose against it. You can play however you like and still break even against someone playing a NE strategy.

Imagine you always played ๐Ÿ“ฐ. Then 1/3 times you'll tie with ๐Ÿ“ฐ, 1/3 times you'll win against ๐Ÿ‘Š, and 1/3 times you'll lose vs. โœ‚.

There might even be runs where it will seem like you're winning. If your goal is to maximize profit against an opponent, then NE might not be the correct strategy for games like RPS.

One thing you might try would be to learn patterns of your opponent and then playing a strategy exploiting that. The problem is this in turn always leaves you vulnerable. The opponent might know what you are doing and mislead you for a bit and then BAM, switch things up and collect some wins before your algorithm notices something is off. A NE has the advantage that even if your opponent knew what you were up to, they still could not (e.g.) take money off of you (in the long run).

Rock Paper Scissors Telephone?

Not all 0-sum 2 player games have the interesting quirk where it's impossible to lose against an optimal player. Some games punish you for taking stupid actions. Of course, stupid is relative, and some games such as poker have action spaces so complex that it's very, very difficult to know when you're being stupid. For this example we'll look at a simple extension of Rock, Paper, Scissors. We'll add another move: telephone. How to handle rewards? If we make telephone a duplicate of paper, then we end up with a game that's exactly the same as RPS save that there are now two equilibria: [๐Ÿ‘Š,1/3 ;๐Ÿ“ฐ, 1/3; โœ‚, 1/3; ๐Ÿ“ ,0%] and [๐Ÿ‘Š,1/3 ;๐Ÿ“ฐ, 1/6; โœ‚, 1/3; ๐Ÿ“ ,1/6]. The other aspects, where playing any mixed non-equilibrium strategy can be defeated by a pure strategy or where it is impossible to lose against an optimal player remain.

Imagine you played pure paper again. Then a NE player will either play the same mixed strategy as for RPS, with the same results or play the second NE strategy. Then: 1/3 of the time you lose against rock, 1/3 of the time you win against scissors and tie for the other two scenarios.

The over-powered telephone

What if we made the telephone over-powered? That is, it ties against everything except paper, which it defeats. What's the best way to play against someone who only plays telephone and what's the equilibrium for this game?

Did you get it? Don't play paper. As long as you don't play paper then you have a zero loss expectation against a pure telephone player, which is also the pure equilibrium strategy for this game. Playing paper is the stupid action, with your loss proportional to the probability with which you play paper. In this game, it is possible to lose against an equilibrium player by playing paper.

Poker, Donations and Gifts

Poker is a game like over-powered telephone, rock,paper, scissors. The optimal player only wins because there are stupid actions you can take which act essentially as donations. In our modification to RPS, it's clear that playing paper is dumb. Let's modify the game to make it a bit more complex. A slight mod to RPST such as ๐Ÿ“  loses to โœ‚ but defeats ๐Ÿ‘Š and ๐Ÿ“ฐ yields a slightly more interesting result. Once again, playing paper is a stupid action but playing just ๐Ÿ“  means someone else can play just โœ‚. The NE is: [๐Ÿ‘Š,1/3 ;๐Ÿ“ฐ, 0%; โœ‚, 1/3; ๐Ÿ“ ,1/3]. With some thought, you should notice that as long as we never play paper, we can't lose against an optimal player. Even playing just ๐Ÿ‘Š is fine since 1/3 of the time we defeat โœ‚,1/3 of the time we lose to ๐Ÿ“  and tie with ๐Ÿ‘Š 1/3 of the time. This game has a "stupid" action but offering to play against someone out of the blue, they might not realize that paper is a trap.

Rock Paper Scissors Telephone with Dice and Coins

The final modification will be to introduce this game: Telephone ties with telephone, Telephone loses to scissors but 50% of the time it defeats rock and 4/6 of the time it defeats paper. The game is no longer straightforward but it's a lot closer to the original RPS. The NE is [๐Ÿ‘Š,1/3 ;๐Ÿ“ฐ, 1/3; โœ‚, 1/3; ๐Ÿ“ ,0%], other mixed strategies have a pure counter. A strategy like [๐Ÿ‘Š,40% ;๐Ÿ“ฐ, 40%; โœ‚, 10%; ๐Ÿ“ ,10%] can be countered with [๐Ÿ“ฐ,100%]. The expected win rate is ~0.26. This can add up:

This time, the only way to lose against an optimal player is to include telephone in your mix. Although it seems powerful (can defeat paper ~67% of the time and 50% vs rock and only loses to scissors), in the long run it loses (playing pure ๐Ÿ“  has a lose rate of about -0.22). If properly presented, a player can easily be tricked into thinking that ๐Ÿ“  is over powered.

You can look at the code relevant to this section here.

Beyond Zero Sum Games

Well, I think that's about enough of rock, paper scissors. Although zero sum games are well studied, this is because they are one of the few scenarios where computing a Nash Equilibrium is tractable. There are probably not many situations that are 2 player and zero sum, and the concept of zero sum is really only sensible in the two player setting. For example, people might say that markets are zero sum in the short run but that doesn't make sense to me. Firstly, this requires that the traded currency amount is equal to the utility or the value placed by the respective traders. This need not be the case. Secondly, the game can really only be two player if we pretend that other players actions do not affect the available actions and information. This seems unlikely too. Beyond Games and Game Theory exercises, it's hard to think up scenarios that are genuinely two player and zero sum. Sometimes I think people claim zero sum as an excuse for their selfish behavior without any real understanding of what the notion entails.

Even two player general sum games can be intractableโ€”as I mentioned earlier general sum games can have NE that are too difficult to find because finding a NE is in the computational complexity class of PPAD.

What the heck is PPAD???

Computational Complexity seems like a subject you'd get if you combined zoology, logic and a guide on how to survive the apocalypse (let's prepare for the worst). There's a lot of taxonomical terminology with strange names that will quickly turn any onlooker dizzy. When reading about Nash Equilibria in the context of poker bots and evolution, I nodded to myself and said "ah yes of course it would be PPAD...๐Ÿ˜". I'm still not confident in my understandingโ€”there are lots of fine detailsโ€”so will welcome any corrections.

A Quick Rundown on Complexity Classes

Everyone who has heard of computational complexity has probably heard of polynomial time vs non-deterministic polynomial time (NP). By the way, the unwieldy naming of NP derives from the non-deterministic turing machine (NDTM) which is essentially a psychic computer and should not be confused with a proper probabilistic computer. NDTMs take choices in a way that is not determined by their state and solve NP problems in polynomial time by magic. Polynomial time (P) is in principle efficient but anything that scales faster than a factor of N^2 is truly stretching the meaning of efficient. Many useful algorithms are O(N^3) and unusable in practical settings. O(N log N) is about the limit of what is practically efficient (matrix x vector is O(N^2) but manages due to how well it parallelizes).

Problems that are NP-hard are not tractable to solve and their solutions are not necessarily efficiently verifiable. Problems in NP are efficiently verifiable and NP-complete problems are problems where a solution to one of them yields a solution to all other problems in NP. Of course P is in NP. NP-complete problems are NP-hard problems that are efficiently verifiable. P and NP actually deal with decision problems. My favorite description of the question of whether P=NP is whether search is equal to recognition/calculation. Since we can also view learning as a form of search, the question of whether P=NP can be seen as: Is learning ever necessary? From this frame, P=NP, even with a high polynomial (which would be to say, no, planning and learning are not needed in theory) seems somehow unnatural.

A decision problem is, if I asked "what's the shortest tour through these cities" and you replied "what?". And then I rephrased it as: "sorry, what I really not really meant to ask, is there a tour through these cities that takes less than N number of steps?" to which you said "yes" and fell silent. As you can see, decision problems are very curt and business like.

PPAD Complete

Well, a corresponding class is FP and FNP which deal with function problems. This class is more literate and actually does things, like, returning an example which satisfies my touring question. FNP has the same property as NP of efficient verification. FNP might not always return an answer so there is a subclass, TFNP, dealing with total functions which always return an answer. PPAD is a subset of that class with the further requirement that the proof gadget of totality is by a directed graph. PPAD problems are conjectured as harder than FP problems and this conjecture is borne out by a lack of any polynomial time algorithms for their solution (although there are exponential lower bounds).

This is the essence of the class PPAD: search problems whose existence of solution is guaranteed by virtue of an exponentially large directed graph, implicitly represented through an algo- rithm, plus one given unbalanced point of the graph. Many problems are known to be PPAD- complete; perhaps the most fundamental such problem is related to Brouwerโ€™s theorem ... And it seems counterintuitive that there is always a way to telescope the search through all points of an exponentially large directed graph, to zero in the other unbalanced node, or the sink, that is sought.

Page 3 | PNAS-2014-Papadimitriou-15881-7

Pareto, Nash, Capitalism and Fairness

Many celebrated concepts of economics rely in some way on Brouwer's fixed point theorem. Actually finding this fixed point is PPAD-complete, in other words, intractable. In addition to Nash Equilibria there is also the Arrow-Debreu theorem, which tells us of the existence of a price equilibrium in free markets (under realistic conditions), where allocation of resources are pareto efficient. Pareto efficiency is a concept of "fairness" which is optimal in the sense that there is no reallocation of resources that doesn't leave someone worse off and unbalanced. "Fairness" because a pareto efficient allocation need not be what we would intuit as fair. In a scenario where you have 99/100 items and I have 1/100, any reallocation will leave one of us worse off.

Common attacks against markets include violations of perfect information, transaction costs, externalities, barriers, assumptions of convexity in preferences and so on.

However, even before looking at those utopian requirements, we find that the very concepts motivating markets require intractable computations to reach their notion of fairness. And these (Nash, Pareto, Arrow) are not even guaranteed to be fair in an intuitive sense. It's informative to contrast these computational issues of markets with those of Planned Economiesโ€”which are at least tractable in principle (but not in practice, "merely" requiring O(N^3.5) scaling with problem size). This is not to say that markets lead to worse outcomes than planned economies, merely to point out it is technically more impossible to achieve fair outcomes with them, compared to planned economies.

At this point, the concept of what an economy entails is worth revisiting given what we now know from computational learning theory. We can do better than planned or market economies.

Correlated Equilibriums

Although the Nash Equilibrium is much celebrated, the correlated equilibrium concept is more natural in several senses. It's not just tractable but also efficiently computable to high accuracy, the dynamics of many learning algorithms converge on it (or its coarse version) and even agents acting independently can conceivably converge to its equilibrium concept. But what is it? I actually find the definition to be a bit awkward.

The correlated equilibrium is a joint (as opposed to Nash's independent) probability distribution over action profiles where, the expected utility from playing according to a drawn profile, conditioned on seeing its suggested action at least matches that of any other actions' (it is a best response). A coarse correlated equilibrium is similar except there is no suggested action to condition onโ€”they can lead to lame, weakly dominated suggestions however. In math, a correlated equilibrium is:

\(E_{a\sim D}[u_i(a)] \ge E_{a \sim D}[u_i(x_i,a_{-i})|a_i]\)

for every action \(x_i\). For the coarse version, there is no \(a_i\) to condition on. Algorithms which minimize external regret converge on coarse equilibria whilst those minimizing swap or internal regret converge on correlated equilibriums.

More clearly, imagine there is some neutral device. This device draws from a joint probability distribution and tells each player to play according to its suggestion: a correlated equilibrium is when there is no incentive to deviate from its suggestion. These outcomes can be better than the Nash concept since CEs act on the joint space of actions while the Nash concept is very solipsistic and self-centered, looking at signals provides no information.

The example everyone gives is a traffic light signal. When it shows GO, you know the other player sees a STOP so there is no incentive to deviate. Here are some other examples I think should count, roughly in order of how well they fit: the non (now dead) Dennard Scaling portion of Moore's Law, the I give up signal in animals (such as when a dog lays on its backโ€”it's better to think of the phenotype and not any specific instantiation as the player), functioning courts and laws, rituals/tradition (however, as self identity and values shift and change, such systems quickly induce equilibriums that are inequitable).

Regret Minimization

Regret minimizing algorithms are a class of efficient on-line (meaning learn on the go) learning algorithms which given some reward function and access to some set of experts (which can also be moves such as rock in RPS), which quickly learn what actions to take in order to minimize long term regret. External regret minimizers do this with respect to a fixed strategy while internal regreters do this with respect to any swapped set of actions. The internal regret scenario lets you say when I did this, I should have done that instead. External regret says, I did almost as well as that really good advisor.

Prediction Market Example

Prediction markets let people bet on binary propositions. Imagine there was a prediction market with a layer tracking how well each bettor (expert) did. External regret could do almost as well as the best bettor and perhaps even better with randomization over all bettors, weighted according to how well they did in the past.

I believe, but am not certain, that a scenario where you could match experts to contexts: for example, a technology expert, a sports expert and so on would likely allow you to minimize something like internal regret. From this you could even extract a weighted average over hypotheses.

For more detail on regret algorithms, you can take a look at the demo code. I implement both Multiplicative Weights Update and Regret Matching.

Least Unique Integer Game Example

Consider the following game: there are N people, you each choose a number between 0 and 5 and show it with your hand ๐Ÿ– on a ready signal. The winner is the person displaying the lowest unique number. The winner gets a score equal to N-1 while everyone else gets a score of -1. This makes it effectively a 2 player 0-sum game. What's the Nash Equilibrium strategy? I don't know but when I run this I get the following sequence:


Move Probability













Which is a pattern of ~\(0.5^{x+1}\) but with some aliasing at the end. Simulation gives a mean utility of -0.006 ยฑ 1.31 (regret algorithms only get within epsilon of the exact equilibrium). The distribution over means is: -0.0006 ยฑ 0.01. This corresponds to a win ~28.8% of the time, a loss ~56.9% of the time and a tie ~14.2% of the time. This looks close to what's realistically achievable. It's also interesting that in this game it is both possible to not be able to lose (if playing against two optimal players) or lose while providing donations to a single optimal player (with two off equilibrium players).

What happens when we try for a coarse correlated equilibrium? In this, I also apply regret matching on 3 players trying to maximize utility while looking at the other players. Sometimes, an interesting strategy emerges which kind of looks like collusion in that two players end up with positive expectation (e.g. 61% win, 39% lose, 0.83 expected utility; 61% lose, 39% win, 0.17 EU) and one is guaranteed losses such that it doesn't matter what they do:


Move Probability















Move Probability















Move Probability













Usually, a strategy emerges where one dominates: (e.g. 77% win, 23% loss,1.3 expected utility), another has moderate-large losses (23% win, 77% loss,-0.3 expected utility) and the last has guaranteed losses. This is surprisingly reflective of reality ๐Ÿ˜๐Ÿ˜”

Another strategy would be if there was some device which sampled uniformly from a joint probability distribution after eliminating any ties. This would have expectation equal to the game for all, that of zero, matching the Nash equilibrium. No one would have any incentive to deviate knowing that everyone was playing according to the suggestions of the device. Note that knowing my number also tells me something about what the others saw: if I get a 2, I know that no one else received that number. Something else worth noting is that part of this game's problem is it's zero sum. If the game was not zero sum, the outcomes could sometimes be even better than Nash.

Code here:

[|for _ in 0..999 -> runRound()|] 
 |> Array.groupBy id 
 |> Array.map (keepLeft (Array.length>>float)) 
 |> Array.normalizeWeights
 |> Array.sumBy (joinpair ( * )) 
 |> round 2  
 |> printfn "\n=========\nWinrate of Strategy 1: %A"

Driving Game Example

This example is a simple game where two players are at an intersection. If they both GO ๐Ÿš™ they both suffer a utility of -100. If one drives and the other STOPs ๐Ÿ›‘ then the goer gets a utility of 1 and the other 0. If they both ๐Ÿ›‘ then they both get a utility of 0. This is no longer a zero sum game. The regret matching algorithm learns the following strategy:


Move Probability

A ๐Ÿ›‘


A ๐Ÿš™


This is pretty lame since there is a 0.01% chance of a catastrophic incident. The expected utility is ~zero for both players. If we try regret matching on two players that try to coordinate we get:



Move Probability

A ๐Ÿ›‘


A ๐Ÿš™




Move Probability

A ๐Ÿ›‘


A ๐Ÿš™


In this scenario one player always goes and the other always stops. Thus one player gets a utility of 1 per round while the other always gets 0 but there is a 0% chance of a crash. While not ideal for one player, the situation is overall better than the Nash equilibrium strategy which has an expected utility of 0 for both players. Notice that without some way to independently and reliably coordinate, it is not possible for the pair to do better than the above.

If instead we optimize over the joint space of actions we get:



Move Probability

(A ๐Ÿ›‘, A ๐Ÿ›‘)


(A ๐Ÿ›‘, A ๐Ÿš™)


(A ๐Ÿš™, A ๐Ÿ›‘)


(A ๐Ÿš™, A ๐Ÿš™)


This is a correlated equilibrium and is fair for all parties, with both experiencing a positive utility. Yet there is a sense in which this result is unsatisfying. It is rare that we can compute or even know what joint probability distribution is being sample from, to speak of how to properly compute a reward on it. But as I mentioned above, agents working independently can sometimes learn to do this.

Learning Based Example

In this (scroll down to bottom to see results) modification to the driving game, I use Multiplicative weights update with two layers of weights. The first layer of weights computes the cost of adhering to the signal or listening to an inner expert. The inner expert learns regret from playing either ๐Ÿ›‘ or ๐Ÿš™. Thus the agent has the ability to either heed the signal or do as it pleases. To make things interesting I vary the reliability of the signal. A signal with 90% reliability does as it pleases 10% of the time. In this case the agents learn to ignore the signal and settle on a strategy equivalent to when there is no signal (one always goes and the other always stops):

Agent1: ["๐Ÿšฆ: 0.0%"; "๐Ÿ˜: 100.0% [(A ๐Ÿ›‘, 0.0); (A ๐Ÿš™, 1.0)]"],

Agent2: ["๐Ÿšฆ: 0.0%"; "๐Ÿ˜: 100.0% [(A ๐Ÿ›‘, 1.0); (A ๐Ÿš™, 0.0)]"]

Sometimes however, one agent will choose to heed the signal while the other chooses to always stop:

["๐Ÿšฆ: 0.0%"; "๐Ÿ˜: 100.0% [(A ๐Ÿ›‘, 1.0); (A ๐Ÿš™, 0.0)]"],

["๐Ÿšฆ: 99.44%"; "๐Ÿ˜: 0.56% [(A ๐Ÿ›‘, 0.4); (A ๐Ÿš™, 0.6)]"]

It takes about 99.9% reliability for the agents to consistently choose a strategy that goes along with the signal, matching the correlated equilibrium.

["๐Ÿšฆ: 99.68%"; "๐Ÿ˜: 0.32% [(A ๐Ÿ›‘, 1.0); (A ๐Ÿš™, 0.0)]"],

["๐Ÿšฆ: 99.72%"; "๐Ÿ˜: 0.28% [(A ๐Ÿ›‘, 1.0); (A ๐Ÿš™, 0.0)]"]

let drive =
      | (``Do ๐Ÿ›‘``,``Do ๐Ÿ›‘``) -> (0.,0.)
      | (``Do ๐Ÿ›‘``,``Do ๐Ÿš™``)   -> (0.,1.) 
      | (``Do ๐Ÿš™``,``Do ๐Ÿ›‘``)   -> (1.,0.)
      | (``Do ๐Ÿš™``,``Do ๐Ÿš™``)   -> (-100.,-100.)

let getExpert signal (actions:_[]) (experts:_[]) = 
    let e = discreteSample (Array.map fst experts)
    match experts.[e] with
     | (_,``Do ๐Ÿ˜`` es) -> actions.[discreteSample es]
     | (_,``Do ๐Ÿšฆ``)     -> signal

Rule Learning AI

This result still feels unsatisfactory since one might argue that there is too much hand-holding. Therefore, in this version, I decided to make a learner that must learn how to read the signals and how to coordinate. The hand holding is minimal and typical of features in Machine learning algorithms. The system is inefficient in that it tracks all possible combinationsโ€”this is clearly untenable for more players,signals, and granularity. However, it doesn't take much work to modify the system to explore adding and removing rules in an attempt to learn a Set Cover over rules. But generally, one of the hardest parts of machine learning is getting efficient representations of large or complex state spaces.

let inline getExpert2 indicated signal (rules:_[]) = 
    let cs = [|for (p,_,f) in rules do 
               match f indicated signal with 
                | Some c -> yield c,p 
                | None -> ()|] |> Array.normalizeWeights
    let ps = Array.map snd cs
    fst(cs.[discreteSample ps])

In this model, any rule that matches the current situation is filtered to, after which weights are normalized for this reduced space. An action is taken and then a reward is used to calculate regret on only the active rules. This system is richer than all the previous and so allows us to explore an interesting few scenarios. The game is also modified as such: a player acts first and then the second player decides what to do based on what the first player does.

If we have the first player choose an action without looking for a signal and then have the second player decide based on the first the following major rule pair is arrived at:

[(1.0, "if signal=Do ๐Ÿ›‘ && other=Do ๐Ÿ›‘ then Do ๐Ÿš™")], [(1.0, "if signal=Do ๐Ÿ›‘ && other=Do ๐Ÿ›‘ then Do ๐Ÿš™")]

Essentially, only drive if the other driver stops. Because of how the system filters matching rules, missing values end up such that only one value is looked at but this ends up including rules which don't make sense when the full conjunction is looked at. One fix would be to include an additional state representing none, but that's not necessary for demonstration purposes. Nonetheless, the agents work around that issue and learn a useful set of rules. Sampling some interactions we see that most of the time the first car will do whatever (usually drive) and the second one will drive or stop depending on the other. We see that the situation learned is better than the experts above without access to a signal. The first player dominates but the second player still sees a positive reward.

[((Do ๐Ÿ›‘, Do ๐Ÿ›‘), 0.1); ((Do ๐Ÿ›‘, Do ๐Ÿš™), 0.36); ((Do ๐Ÿš™, Do ๐Ÿ›‘), 0.54);((Do ๐Ÿš™, Do ๐Ÿš™), 0.0)]

If we instead provide a signal for what the other will do as the light they see (the system can't reason after all), the following rule pair is matched:

[(1.0, "if signal=Do ๐Ÿš™ && other=Do ๐Ÿ›‘ then Do ๐Ÿš™")],

[(1.0, "if signal=Do ๐Ÿš™ && other=Do ๐Ÿ›‘ then Do ๐Ÿš™")]

And we see that their behavior is converging on a correlated equilibrium:

[((Do ๐Ÿ›‘, Do ๐Ÿš™), 0.49); ((Do ๐Ÿš™, Do ๐Ÿ›‘), 0.5); ((Do ๐Ÿš™, Do ๐Ÿš™), 0.0)|]

On the other hand, if we take away the ability for the agents to attend to what the others will do they will tend to just stop and learn a more complicated set of rules leading to statistics like:

[((Do ๐Ÿ›‘, Do ๐Ÿ›‘), 0.97); ((Do ๐Ÿ›‘, Do ๐Ÿš™), 0.03); ((Do ๐Ÿš™, Do ๐Ÿ›‘), 0.0)]

In essence, so long as you can jointly learn to attend to signals, and so long as there is some shared memory to draw upon (history, text) then many learning agents can efficiently converge on a correlated equilibrium, bottom up. Without any top-down enforcement of joint distributions. In the absence of anything to condition and coordinate actions by, coarse equilibria (which I expect many short termโ€”within the life of an individualโ€”scenarios share much in common with) can be subpar.

bali farms

As an example, I think the historical knowledge of Bali Rice Farmers [1], with insects and weather as the coordination signal lead to a bottom up correlated equilibrium of planting patterns.

On More Complicated Games

So far (excepting the last rule based AI scenario), the focus has only been on Normal Form (no turn taking) games but the story for imperfect information extensive (multiple turns) games is not that much more complicated. And going to repeated or iterated games is no trouble at all. It's only a handful more lines to be able to handle counterfactual scenarios. This is in contrast to the study by game theory where the complexity suddenly ratchets. In fact, the methods are so complicated that only simpler toy problems can be studied. However, with learning algorithms that have been proven to converge on useful equilibria concepts, we can design scenarios and look at what happens to study multiplayer, repeated, stochastic, imperfect information extensive games. As I've not yet had need to venture into those areas, my knowledge there is limited and only of a sketch like nature.

Topical Note: Blizzard just released a framework for learning custom AI for Starcraft. It's instructive to look at how Starcraft differs from Poker and Go. Unlike Go, Starcraft is an imperfect information game. Both core aspects of AlphaGo, MCTS and RL, struggle to converge in this scenario.

Compared to poker the state space is very large and it's difficult to imagine how the notion of information sets will apply here. Rounds are ill define and there are really no such things as turns. Within each game there are many interactions that aren't zero sum. Learning a precomputed nash equilibrium strategy simply isn't tenable. Especially if we want an AI that can do just as well in multiplayer scenarios with no additional fuss. Starcraft is complex enough that a low power, low data solution stands a chance, I think.

What to expect: Assuming there's no smaller much simpler embedded game in Starcraft: ability to plan, ability to make lots of quick decisions, hierarchical planners which attend to different levels of detail, counter-factual reasoning, efficient representation of multi level state (abstraction). Will Deep Learning bring something new to the table? I don't know, the simple state representation the Starcraft AI framework provides might not advantage them. If I had the chance to work on this, I'd use Neural nets as feature learners for a probabilistic program with some regret type algorithm passing losses across the boundary (from the probabilistic program to the neural network).


It's common to mock economics as useless and completely detached from reality. Yet, the fact that such simple algorithms converge on various important equilibrium conceptsโ€”algorithms we believe frequently occur in some form in natureโ€”tells us that the core ideas of game theory are in fact very useful. The mistake that has been made is confusing the simplifying assumptions selected so human economist could hand wave a defense for flawed policy adviceโ€”unrealistic utility functions, simple two player games, zero sum, intractable equilibrium conceptsโ€”for proof of a lack of utility of the subject. In actuality, the game theory is extremely natural, merely poorly applied. We should move from boring discussions on 0-sum games, nash equilibria, prisoner's dilemma, tit for tat (which does not scale), to discussions on coordination, cooperation, welfare and fairness. And as it turns out, there has been a lot of recent work done in that areaโ€”the only problem is the focus on ad markets and auctionsโ€”there is no reason it can't be adapted to improve human lives. Some aspects of modern economics are indistinguishable from online machine learning or computational game theory. It's a large subject deserving of more attention. If we choose this path, the question then arises: is it ethical to turn over important decision making to machines?

Current Debate on Ethics

The debate on AI ethics is currently dominated by 3 archetypes. I'll summarize them before arguing for another area that's at least as and possibly more important.

The Bias and Inequality Crowd

In this line of work, researchers point out how blind following of algorithms can lead to bad outcomes by amplifying bias and strengthening inequality. Although a few look at how to ensure that AI doesn't end up as yet another tool to suppress humans (by working on energy efficient or decentralized learning algorithms), most of the work is in how algorithms amplify bias.

Text is highly structured, in particular, it crystalizes projections on thought and embeds the way humans use language and thus simple algorithms can display high competence with little understanding. As an example, consider the output of a generative text model (I recently came up withโ€”not a markov chain, not an rnn, similar to bothโ€”post coming soon), such that when prompting it with the phrase "the prefrontal cortex is" yields the following (not cherry picked):

  1. the prefrontal cortex is the attention to prediction of the world

  2. the prefrontal cortex is the control in the mind

  3. the prefrontal cortex is the control of the general factor of intelligence

  4. the prefrontal cortex is the control attention

  5. the prefrontal cortex is the control in the human brain

None of these appear anywhere in my corpus of training documents but it's eerie in how good of a summary it is of the relevant literature. Training it on a voat corpus and then prompting it on blacks or Jews compared to a wikipedia corpus would yield very different results. This is not reflective of reality, it's reflective of how humans of a culture use language and what they write about. What humans think with their โ‰ค 10 Watts of conscious cognition (assuming uniform process/energy use, 20 Watts, most processing not conscious) is very limited and can be divorced from reality.

My main criticism of the bias work is with the idea that there can be algorithmic correction. There's a great deal of danger from a false sense of complacency, thinking a job well done with biases "corrected". Worse, who is to say those corrections are evevn appropriate when applied inflexibly at scale?

So you've handled race and gender issues. Have you corrected the sentiment bias of short bus? What about the countless other combinations you're unaware of? Or, consider correcting for bias in health outcomes when an insurer is focused on above normal profits (and going beyond the bare minimum require to stay running). That would be putting the cart before the horse. Rather than play a continuous game of whack a mole, it would be better to create a culture of not blindly deferring to AI judgement in any area that could materially affect a human life. The policy aspect will make a lot more difference than most anything that can be done with algorithms (until we get AI's that aren't blindly performant).

The AI Risk Crowd

The general idea here is to slow down work on AI to ensure no one develops AIs which view humans as inconsequential. MIRI leads algorithmic work on the topic and places like the Future of Life Institute lead the policy aspect. However, I do not believe anyone knows how to mitigate these claimed risks or even has any inkling as to how they might come about beyond very vague arguments. Despite this, and while I do not agree with the claimed magnitude of risk nor with the framing, I do not think anyone has prepared a proper counter-argument for why the AI Risk Crowd is mistaken (I think it's possible, I've just not seen it yet). That said, while I agree that we should build kind, caring AIs which are respectful of living things, I don't particularly care for the value alignment nothing else is as important framing.

The We Will "Democratize" AI Crowd

This one is mostly corporations generating slogans to keep humans at ease. Boards are created, names are listed, chants are chanted but nothing meaningful will likely ever come of it. You see, it is difficult to trust an entity that cannot imagine a future in which you are not as thoroughly dependent as possible on it.


The regret/bandit/expert algorithms which featured heavily in this post are algorithms which already run the (digital) world. Although deep learning gets a lot of public attention, it's some kind of bandit algorithm what places ads, manages auctions or perturbs site designs in order to squeeze money and attention from their human playmates. Although not something I personally agree with, it's a non-issue in comparison to scenarios where such algorithms are turned to subverting and redirecting attention for policy issues instead of merely purchase decisions. What tools can be provided for defense? I don't see anyone moving to democratize that.

The Neglected but Essential Kind of Ethics

And yet, as important as the above are, I think what subsumes them all is thinking about the ethics of algorithmic approaches to ensuring welfare of cities, humans, animals and ecosystems. Is it ethical to not use these approaches or is it true that there is some humanity lost by algorithmically deciding on important policy issues? It's clear that algorithms can't yet (maybe never) automatically operate on important decisions which will affect human agency. There's however, a case to be made for a combination approach.

For example, we saw above that Nash Equilibria are hard to reach and might not always be great. We also saw that correlated equilibria can lead to quite fair results but while they are easy to compute, they are not obvious to set up. And even when we can, how will people take to suggestions handed down by faceless algorithms? How do we ensure such a process doesn't end up as something to worship or enslave?

Getting multiple agents with differing goals to cooperate is extremely difficult. There's plenty done on machine and single agent reinforcement learning but not nearly as much done on trying to learn cooperative agents. If you are worried about AI Risk then I think you should also be concerned by this imbalance. With better co-operative agent modelling we might begin to look at methods of approximating policy impact better than everything before. At some point we could design systems where those affected by policy could contribute detailed descriptions of preferences and actions. The system would then use co-operative learning agent modeling to approximately calculate some fair correlated equilibrium or aggregated welfare concept. While superficially similar, this collective decision making differs from a planned economy in that it is ad hoc and only scopes over the collective making and affected by the decisions.

When the issues are too complex for such an approach, a more bottom up approach would be to model complex agents in a scenario approximating the decision conflict. People could search over and test different reward functions in order to guide policy design whose effect will be to get people to behave optimally with respect to some shared goal (and which selfish behaviors could easily thwart). Again, all members affected can give detailed concerns and input and discuss policy in terms of how the model is effected. Remember, though the model will have far from perfect coverage over outcomes, it would still be miles better than the current approach of sparsely sampled scalar votes influenced by ever more sophisticated technologically amplified propaganda. The results will be approximate and imperfect but by making the simulations interactive with the participants, a powerful problem solving capability could be gained. While much of this is probably not near (needing either good language AIs or more people good at programmingโ€”also note this isn't calling for laws as programs only for models to help predict outcomes), there are aspects which could be studied today.

One way to set up correlated equilibria is by studying various specifications of problems and noting how different reward functions lead to different results. Auditable, Traceable cryptographic smart contracts are one way to achieve this. Imagine a scenario of contracts between developed and developing cities targeting climate emissions. On the one hand, it's important to reduce CO2 emissions, on the other hand the developing nation might say: "we are tired of being mired in poverty". One way around this would be to search for a policy that would place both actors in a correlated equilibrium or better. Smart contracts allow for thisโ€”they are one way to route conditional exchanges such that no corrupt local actors could steal. Additionally: agreements preregistering purchase decisions, a time gated method tracking proper allocation of any citizen grants, measurable quotas, trades, all separately conditional on future dates and automatically verifiable obligations, just might allow for a fairer outcome for all. The contract mechanism then acts as an impartial mediator to ensure a positive correlated outcome.

By setting up mechanism design tested policies we can get favorable results with higher probability. But one distant ethical issue is in ensuring that agents don't get so sophisticated, the issue of their being deserving of rights becomes ascendant. If such a scenario is possible, then there must be a computable theory of consciousness and ethics. We should ensure we figure those out before building anything too sophisticated. Sounds far fetched? Well, I'm not saying it's now, I'm only saying that it would be really terrible if it happened in some distant future and caught us all off guard such that we were then motivated to continue to not notice how horrible a thing we were doing.

Machines of Loving Grace?

The other glaring issue is: "would we be losing our humanity by handing off important decision making to algorithms"? For me the answer is clear: the human mind is also just algorithms, if we can design a system to help us arrive at fairer decisions than we otherwise would have, say because due to complexity limiting our brains' ability to as fully traverse the space of consequences, then there's nothing lost in adhering. The inputs, goals and motivations would be human decided. An objection to this, I believe, would be like not living in a house because it was built using Caterpillar heavy machinery or because it was 3D printed or prefabricated by robots. Humans were the designers and architects; the structure is to provide positive utility for the humans contained.

Furthermore, I consider the agency and individuality of humans is mythologized. Most of a person's personality is settled by chance events, genetics and accumulated experience from childhood. In adulthood, it's a system of cultural expectations and traditions which strongly bind what directions any individual might take. We would not be trading away some glorious past of human agency. Not a past filled with trails of tears, slavery, indentured servitude or conscription into pointless wars. In the past, any kind of literacy was scarce, with only a tiny portion of the population able to indulge in scientific and philosophical thought. Not forgetting that under limited diffusion of ideas, people became echoes of their neighbors and even worshipped their rulers as gods. With some cultures even valuing human sacrifice.

Far from outsourcing ourselves into technology, the extension of our minds into the environment is a defining trait of humanity, ever since the invention of language. A larger portion than ever is improved in their ability to reason and handle abstraction [7]. Note that all humans are capable of achieving this gain but consider, as an example, how understanding permutations gives a reasoning advantage. We've long outsourced thought into abacuses, tools, books, quipu, tabulating machines and lately, computers. In turn, we've gained access to more readily available knowledge and tools and the ability to administer ever more complex societies.


In this essay I talked about Nash Equilibria and how it is impossible to lose against an optimal player in some zero sum 2 player games. I then talked about why some 0-sum 2 player games allow for losses. I mentioned the likely intractability of Nash Equilibria and many market economy concepts. I also discussed computational complexity and in particular, the PPAD complexity of Nash Equilibria and what that roughly means. I also mentioned correlated equilibria and how they are common, efficiently computable and often lead to more preferable outcomes than Nash Equilibria.

I noted that coarse correlated equilibria are likely most common given the lack of signals to coordinate on in many real world situations. These can lead to positions for many players that are at least weakly dominated and therefore highly not preferable.

Correlated Equilibria can be learned so long as multiple agents can learn to attend to the conjunction of certain patterns, together with some memory or shared history, to reach a point where there is no incentive to deviate. I motivate min-regret learning as a way to achieve correlated equilibria using a modification to prediction markets as an example. I mentioned the difficulty of getting multiple cooperating agents and the benefits of doing more work in this area in order to guide real world policy. And moving beyond the toy, completely unrealistic scenarios favored by traditional economics. I also note that a lot of interesting work has been done in the areas of fairness and welfare, but mostly on how to better place and monetize ads. It seems to be working out well.

I mention that barring a simple hack for Starcraft, it poses a much more difficult problem than Poker or Go. It's interesting to see aspects of Moravec Paradox reflected here too. It takes more effort for a computer to be better than a human in Starcraft than at Chess or Go. Despite Starcraft being considered the less intellectual game.

I also note that done correctly, auditable cryptographic smart contracts could act as neutral mediators to achieve non-trivial correlated equilibria. I give the example of cities cooperating to reduce emissions, that works no matter how corrupt the local government.

I then talk about Ethics in AI, how highlighting bias is important but algorithmically dealing with it is misguided. I mention AI Risk and my skepticism of its urgency but dissatisfaction with all counterarguments. I also mention the case of modeling agents which become too sophisticated.

If such a scenario is possible, then there must be a computable theory of consciousness and ethics. We should ensure we figure those out before building anything too sophisticated. Sounds far fetched? Well, I'm not saying it's now, I'm only saying that it would be really terrible if it happened in some distant future and caught us all off guard such that we were then motivated to continue to not notice how horrible a thing we were doing.

I argue that allowing algorithms to play a role in human decision making is no less giving up our humanity than allowing a robot to build a prefabricated home. In both cases, human labor is better applied to design and goal setting and not on energy intensive labor which humans are ill-suited to (such as lifting heavy objects or seaching combinatorial spaces). I also argue that the past was not really some bastion of agency, free thinking or freedom. There's more of that now than ever and augmenting our capabilities with computational devices will likely lead to better welfare outcomes.

The full article contains links to code that you can run in your browser to explore simple non-deterministic, zero and non-zero sum games. Namely modifications of Rock-Paper-Scissors, a game where two drivers can either both stop or drive or some variation thereof. Finally, a game where 1 of 3 people has to guess a number that's less than or unique compared to what everyone else guesses. I also demonstrate a rule learning AI using regret to learn in matched subspaces, how to use signals and then coordinate under minimal handholding to show how learners might achieve a correlated equilibrium.

Finally, in the below, I do not make any friends. I point out that a computable brain performing sampling based inference as suggested in [3], and a group of humans yields a parallel sampler. Connecting this with the observation in [4] that autocorrelation is minimized by active inference tells us that to get better exploration and representative posterior probability distributions with respect to reality, we want a wider range of physical experiences. As a result, diversity is important. I point out however, that since ideological diversity is cognitive and attentional, it is broader than politicsโ€”increasing general diversity must increase ideology. Secondly, it can be negative, due to a reduced shared ability to attend to the same signals and efficiently communicate (coming off such different attentional weights).

I argue however, that diversity is worth it by pointing out how a scenario like the break-down of the bicameral mind might have been precipitated by trade and cultural diffusion leading to richer stories and hence better and more practicing of theory of mind (I suggested replacing origin of consciousness with enriching of consciousness, based on the compositionality of culture together with plasticity from our ability to learn how to learn). I also suggest cultural appropriation and lesser emphasis on identity/belonging and more on collaboration to counter ideological clashes of diversity.

I bring up the issue of ideology because they are a general human failing which get in the way of coordination and therefore handicap our ability to achieve positive welfare outcomes globally. The issue is tribal in nature and happens in every country with multiple ethnicities. Which is almost every country. I'll note that the EU was a response to one such.

Topical Note: Have you heard of the bicameral mind? If you haven't, it's worth checking out. While I don't think it correct, I think it takes only a slight modification to get us something highly probable. In this view, like in the Bicameral setting, we don't all experience consciousness the same way. The core mediator is the differences in sophistication of theory of mind. The brain is plastic and different percepts affect both attentional priors and how language is used for self-modelling. The more complex stories we told, the better the listeners had to be at modeling those scenarios and the better the brain had to be at representing states at a meta level.

The break-down occurred through diffusion and cultural interchange leading to richer stories which in turn called for more complex modelling. The better you are at modelling others' mental states, the better you will be at modelling your own.

Prediction: A big part of recent IQ gains is better children's TV programming. The idea here is that telling complex stories to (and generally richer interactions with) children ratchets their ability to handle abstraction. Supposing most of IQ gains are explainable as precociousness with abstraction. Then people are not more intelligent, they're instead getting better at acquiring tools which augment in a combinatorial manner their ability to predict.

Note: The next paragraph is conditional on the occurrence of a breakdown like event but the core message is independent of it. The compositionality of culture and knowledge means the gains to interaction are multiplicative which in turn affect the creativity of the citizens by the widening of available mental tools and percepts .

The breakdown of the bicameral mind was not one of evolution but due to the brain's plasticity. Increasing intellectual demands from the compositionality of culture, combined with a general learning engine of the brain, resulted in more flexible approaches to thought. We know language is linked to consciousness and higher order reasoning (see [2] on how metaphor shapes reasoning or [7] for how language augments numeracy as examples) and also that it's one area where the brain must learn how to learn. While the individuals were not less conscious nor less intelligent, the lack of richness of available percepts severely circumscribed their ability for ideation. I doubt there's anything falsifiable that could be said of an ancient left/right brain breakdown, but it's certainly true that available default abstractions would strongly impact a person's manner of cognizing about the world, theorizing of events and arranging causation. This could plausibly affect how consciousness was realized in the individual (emphasizing that there is no objective hierarchy or ordering).

Why bring this up? Well, differences in ability to predict mental states yields difficulty in collaboration. If two group's mental organization is so different as to make communication very exhaustive then coordination and attending to the same signals, required for positive welfare outcomes will be difficult to achieve.

The answer however, is not less diversity, it's more cooperation. Genetic, ecosystem and hypothesis diversity (Principle of Epicurus) are unequivocally good. Neural nets and even humans [3] suffer from mode collapse, a form of lack of diversity leading to reasoning errors, which can lead to suboptimal exploration and poor decisions. If the human mind is computable then a group of humans is not other than a highly bottlenecked parallel computer.

Again, considering [3], humans which share too much in common will be akin to starting a sampler near similar locations. If we seek to converge on the part of the landscape with peak probability, it pays to sample in parallel (via multiple people) from many start locations. If the brain is a sampling based explorer, with active inference to mitigate autocorrelation, then it's clear that lived experiences leads to richer exploration than any other source of diversity. Ideological diversity should be retained however, since they do provide different attentional weightingsโ€”it's just that the gains are not as large as from a diversity of parallel active samplers.

The only sense in which diversity can be considered harmful is when they make coordination more difficult (while I do not agree with their approach, I think China understands this on some level, that and economic protectionism explains many of their policies). Ideological diversity is probably especially prone to this. Ideological diversity, I should point out, is not just what politics or economic policies you adhere to but also, what kind of music and hobbies you like! Focusing on other forms of diversity can only increase ideological diversity.

However, short range correlations (siloing) do not lead to interesting structures. Recall that one plausible way for the phase change in theory of mind to have occurred is from the richness in stories from cultural interaction and sharing. Resulting then in a gain in sophistication on conscious reasoning about agents. Wider experiences leads to richer representationsโ€”though each human possesses only a degraded average over encountersโ€”it is yet a comparatively richer set of examplar states to draw samples from. In essence, a certain group gets the "problem" of diversity exactly backwards. All the kinds of diversity which concerns them are in fact good but the kind which they view as important (as do I), provides the most trouble.

So what then? I think the answer is to raise human children emphasizing shared commonalities, more sharing of culture (yes, I'm pro cultural appropriation) and how to empathize (that is, not dehumanizing the other) with other beings. Additionally, a lot of problems are caused by fortifying identity by dredging for differences. If we can create a society such that there are sufficient avenues and means for collaboration and sharingโ€”whether on art, stories or music, any kind of creationโ€”the need to find meaning by belonging to something, should lessen by a large amount.

Meaning from sharp boundaried group identity reduces a human's ability to reason, replacing it with pattern matched stock responses and confabulations. This is an example of when correlations can go wrong, trapping members in equilibria even if the members could have long since drifted past the original goals and utilities.

Prediction: If children all over the world grow up watching each other's culture's cartoons, this will have a significant ameliorative effect on any ill coordinative effects of cultural diversity. Of course, some might question the extent to which this is a prediction given the broad arch of history.

It is unlikely that humanity cannot learn to cooperate broadly and arbitrarily; this is rather, a problem of learning how to escape pernicious attractor states and achieving positive social utility through mechanism design.


[1] http://www.pnas.org/content/114/25/6504.full.pdf

[2] http://www.sciencedirect.com/science/article/pii/S1364661317301535

[3] http://www.sciencedirect.com/science/article/pii/S1364661316301565

[4] http://www.biorxiv.org/content/biorxiv/early/2017/02/17/109355.full.pdf

[5] http://www.cis.upenn.edu/~aaroth/courses/slides/agt17/lect08.pdf

[6] http://modelai.gettysburg.edu/2013/cfr/cfr.pdf

[7] http://langcog.stanford.edu/papers/FEFG-cognition.pdf