Wednesday, 9 October 2013

Temporal Slices of Pirates with no Past or Future buying T-Shirts to avoid Infinite Loops

A long time ago Ziggyny had a post about pirates killing each other - or in this scheme choosing not to kill each other - for money.  Specifically the captain - pirate A - chooses a scheme for divvying up 100 coins.  The pirates vote - including the captain - and the captain wins in the event of a tie.  If the captain's scheme is voted in then it is executed.  If the scheme is not voted in then the captain is executed and the deputy captain - pirate B - takes over.  The pirates are named A, B, C, D, and E and decide who replaces a dead captain based on seniority.  Thankfully their seniority is coincidentally the same as the alphabetical ordering of their names.

Now we start talking about rational, self-interested beings.  The pirates all want to get as much loot as possible and know that the other pirates want to get as much loot as possible.  Also, in this story, the pirates value their lives at precisely nothing - rather than valuing them at quite a lot - so I don't really know why the bit about them getting killed is included.  It seems like the puzzle would be less confounding if, upon losing a vote, pirate A simply was excluded from this round of treasure allotment.  After all, that would remove the implicit gain to be had by killing a colleague and thus having fewer people to split the loot next time - the game doesn't say its iterative, but the set-up makes it easy to think of it that way.

To find the solution we think back from the two pirate scenario.  With two pirates the captain splits the loot 100-0 and votes for the split and wins the tie so D gets all the loot and E gets none.  Clearly, E must avoid this scenario, so E should be willing to take even one coin to make sure it doesn't happen.  Induction quickly brings us to a split of 98-0-1-0-1 because C and E will be forced to take it.

For a while I've had a bit of a problem with the concept of a "rational, self-interested" being.  Obviously in real life rational self-interest probably leads to a more equitable split since the captain knows he is going to have to work with these other pirates in the future.  How motivated is pirate B going to be to plunder when he knows he never gets anything?  But we are supposed to limit ourselves to the assumptions stated in the game so the pirates are temporal-slices of pirates who have no future and no past but nonetheless care about how much gold they get.

I said it was odd that the game includes the detail that the pirates die since clearly we are not meant to think of them as valuing their lives.  In the analysis here the value of their life doesn't matter, but it's still a little strange.  I also find games like this don't specify certain details, like that time flows forward.  Okay, so we normally assume that time flows forward in most situations, and maybe it would be up the game to specify that it doesn't.  But let's consider what it means for time to flow forward.

Newcombs paradox is a scenario that specifically questions our notion of time flowing forward.  A super-intelligent alien claims to be able to perfectly predict human behaviour based on a brain scan.  To prove this he sets up a game.  Box A is a clear glass box with $10,000 in it.  Box B is an opaque box.  The human has two choices: either pick box B or pick both boxes.  If the alien predicted that the human would pick just B then there is $1M in box B, otherwise there is nothing.

Again, if we assume time flows forwards then there is a very obvious solution for the human.  Both boxes always contain more money together than box B does alone, so the human always picks both.  But the alien being able to predict the human's behaviour calls that idea into question.  Let's make this clearer.  The "predicting behaviour" thing is a specifically intended to raise philosophical questions.  Instead of claiming to be able to predict behaviour, the alien demonstrates before the game is played that it can travel through time and explains how time travel works in a way that satisfies you that it will effectively - by your understanding of time - be making the choice of whether to put the money in box B after you choose which option to take.  Now it is obvious that the human should just pick box B.  The idea that picking both is better relies heavily on time going in one direction - that how much is in box B is completely determined before you make your choice.

So back to the pirates.  By time flowing forward, we mean that the pirates can choose their strategy based on information in the game at each turn.  Just like in Newcomb's paradox, as long as you have information about the past - that the money is either there or not - you can make a decision based on that information.  If you can't rely on that information you use a different strategy.

So suppose that all pirates agreed ahead of time to strategy S1: [If they get floor(loot/#pirates) then they vote yes, anything less they vote no.  If they get to split the split it evenly {here evenly will mean giving any extra that cannot be divided evenly to the person dividing it}.]  One clever pirate realizes that the game takes place over time.  Can that pirate improve his lot by changing strategies?  Pirates B through E can't.  Pirate A can: simply give two other pirates 20 and 60 to themselves.

What if they use S2: [Vote yes for floor(loot/(ceiling(#pirates)).  Split by randomly giving that amount to two other pirates and keeping the rest.]  Now pirate A, C, D and E can't improve their strategy by noticing how time works, but pirate B can.  Pirate B can decide to vote no.  Then when he gets to split the loot he splits it 50-50 between himself and one of the others and gets 50 instead of 33.

This means that these strategies are not in "Nash Equilibrium."  A strategy is in Nash Equilibrium if no player can improve upon it by unilaterally changing their own strategy.  Note that Nash Equilibrium doesn't mean good result.  In the game of two cars driving in opposite directions on the street, there are three Nash Equilibria: both go left, both go right, and both flip a coin.  That last one results in a head-on collision 50% of the time.  A Nash Equilibrium means that no player can better their payoff without another player also changing their strategy as well.  I have every reason to suspect that by allowing each pirate to iteratively change their own strategy we could end up with the 98-0-1-0-1 split given in the simple analysis.  That *is* in Nash equilibrium.  If anyone changes their strategy unilaterially then they get equal or fewer coins.

So why would we talk about Nash Equilibria instead of simply solving the question as we do above.  The answer is basically because of Nash's Existence Theorem that proves that in games with finite players and finite choices there *is* a Nash Equilibrium.  The problem with doing analysis in an ad hoc way, no matter how convincing, is that with time flowing forward and players being able to make decisions based on information in the game and based on what they believe other players will do, you cannot actually prove that there is a solution.  The proof above looks pretty good, except that I could ask, "But what if pirate E votes no to the first vote?"  Well, that's easy, you say, then pirate B splits 99-0-1-0 and E gets nothing so it's worse for E.  

But now everyone has the information that E voted no.  How does that change B's assessment of the right way to split now that he knows that giving a pirate 1 coin doesn't necessarily buy their support?  We don't know that there even is a solution.  By voting no player E may put the pirates into an infinte loop.  The rules of the game didn't say that they were given a finite amount of time to make up their minds how to vote, so that may destroy the entire game.  At any rate, we don't know what would happen, and if we don't know what would happen we can't promise that it is a worse result for pirate E, who is the one who has to make the decision.  The problem with the solution that I outlined is that it both assumes time flows foward *and* relies on not getting certain kinds of disturbing information about the past.  If it gets to just pirate D and pirate E then pirate E gets 0 - that's for sure, but how sure are we that taking one coin instead of zero is the strategy that gets you the most loot?

After all, isn't a better strategy for E to demand 2 coins instead of one?  If that is E's strategy and player A knows it and plays rationally then player A gets 97 and player E gets 2.  Sure, you can say that E knows the split wasn't 97-0-1-0-2 by the time it gets to him so he should go ahead and take his one.  But if pirate A is Newcomb's alien then Pirate E has to be ready to vote no to one coin in order to get his two.  The pirates don't have to be able to communicate ahead of time to realize this possibility, and assume that there is perfect information and a perfect ability to reason out the other pirates' strategies they all already know this problem.  They are at a standoff and pirate A never splits at all.

Suppose that all pirates play S3: [Split the loot evenly.  If it's an even split, vote yes, otherwise vote no.]

Now can any pirate improve their standing by fixing their strategy?  No, they can't.  If player A doesn't split it 20-20-20-20-20 then he gets zero and everyone else gets 25.  If any other player changes their strategy it doesn't matter because they will just be outvoted on the first round anyway.  This is a Nash Equilibrium.

So, suppose our temporal slices of pirates show up in the game wearing T-shirts printed with "Even split or I vote 'no'."  The opening problem didn't say anything about how the pirates were dressed, but we were meant to assume something, so I guess it's up to me.  Giving up their strategy in advance and sticking to it gets players B through E 20 instead of 0.5 each - just like knowing that the alien can actually see the future gets you a million dollars instead of just $10k.

Most games seem to favour the player who goes first.  There isn't a particular reason that this is true, it can be a lot of different things.  As we've seen here, sometimes the reason is that the first player gives information to the second player that the second player can act on.  It seems strange that this might put the second player at a disadvantage, but in Newcomb's paradox - and possibly also in the pirate game - it actually makes the second player much worse off to know about the first player's play play.  Fortunately in a game where time does flow forward the second player has an out - they can predetermine their actions or determine them at random up the point that having the extra information is no longer a detriment.

The non-first players can only actually gain this advantage, however, by sticking with the plan.  A being of rational self-interest has no ability to stick to a plan even when it is rational and self-interested to do so.  Ultimately for such beings there is Nash Equilibrium and there is nothing.  Of course, for these pirates, perhaps an infinite consideration of the problem is the best outcome: as soon as the problem ends they wink back out of existence.

1 comment:

  1. "And then Pirate A realised that he wasn't what he thought he was. He was no more a rational self-interested pirate than he was words displayed on a screen. He wasn't even a facet of the totality of the universe. He was a cage imprisoning nothing. And then the door to that cage swung wide and it was consumed by the nothing inside. Pirate A had escaped the wheel of samsara.

    Pirate B proposed an even split and it was unanimously accepted."

    ReplyDelete