Vestin: As far as I remember - that's what the theory is all about. With perfect information your certainty would be at 100% and there would be no need for a theory dealing with probability and predictions.
The theory itself is from the realm of epistemology, which is why I've heard about it in the first place.
So the problem is a misapplication of Bayes' Theorem which I take it is the point of the paper - i.e. people applying the theorem when they shouldn't?
One can construct an even more extreme example:
You have two coins, labeled 1 and 2, both facing tails-up at the start of the experiment. Both coins are weighted to land heads-up 80% of the time and tails-up 20% of the time. You can only toss coin 2 if you get tails on coin 1. What is the probability, at the end of the experiment, that coin 2 is facing heads-up, given that coin 1 landed tails?
By the logic stated in the paper:
P(C2-Heads | C1-Tails) = P(C1-Tails | C2-Heads) * P(C2-Heads) / P(C1-Tails)
Since one knows one has to have gotten tails on C1 to even have tossed coin 2, C1 must have been tails (similarly, to reach Warsaw one must travel through Janki), so P(C1-Tails | C2-Heads) = 1. We know P(C2-Heads) = 0.8 and P(C1-Tails) = 0.2, so P(C2-Heads | C1-Tails) = 1 * 0.8 / 0.2 = 4. Clearly nuts. :)
So what went wrong? Bayes' Theorem was misapplied. It's true that one must get a certain result on coin 1 in order to toss coin 2 at all, but that only determines whether or not the coin 2 toss happens - not the result of the toss. So once coin 1 lands tails-up, coin 2's toss is independent of it. When dealing with independent events:
P(A | B) = P(A and B) / P(B). However, since A and B are independent, P(A and B) = P(A) * P(B), so P(A | B) = P(A). That would be the correct application of the theorem.
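Just to double-check the independence claim, here's a quick Monte Carlo sketch in Python (the weights are from the coin example above; the function and variable names are mine):

```python
import random

def trial():
    # Coin 1: lands heads 80% of the time, tails 20%.
    c1_heads = random.random() < 0.8
    if c1_heads:
        return None  # coin 2 is never tossed
    # Coin 1 landed tails, so coin 2 gets tossed (same 80/20 weighting).
    return random.random() < 0.8

random.seed(0)
results = [trial() for _ in range(100_000)]
tossed = [r for r in results if r is not None]
# Estimate of P(C2-Heads | C1-Tails): fraction of heads among actual tosses.
p = sum(tossed) / len(tossed)
print(round(p, 2))
```

The estimate should come out right around the correct 0.8 - nowhere near the absurd 4 the misapplied formula gives.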
Let's do a quick sanity check: In the coin example, what is the probability that coin 2 lands tails-up, given that coin 1 lands tails-up? From the law of total probability:
P(C2-Heads | C1 - Tails) + P(C2-Tails| C1 - Tails) = 1
0.8 + x = 1
x = 0.2 = P(C2-Tails | C1-Tails)
Now let's do the other one:
P(C2-Heads | C1 - Tails) + P(C2-Tails| C1 - Tails) = 1
4 + x = 1
x = -3 = just as impossible as P(C2-Heads | C1 - Tails) = 4 :)
I can make my example act exactly like the Krakow-Warsaw example - just make the first coin weighted 0.3/0.7 heads/tails and the second one 0.5/0.5. However, since I can play with the weights of the coins in more ways than the distances can be played with, I can push the logic in the Krakow-Warsaw example further.
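With those 0.3/0.7 and 0.5/0.5 weights, the misapplied formula no longer yields an impossible number - just a wrong one that looks plausible, which is what makes the Krakow-Warsaw version sneaky. A sketch in plain arithmetic (variable names mine):

```python
# Hypothetical weights from the text: coin 1 lands tails 70% of the time,
# coin 2 is a fair coin. Coin 2 is tossed only if coin 1 lands tails.
p_c1_tails = 0.7
p_c2_heads = 0.5

# Misapplied Bayes: P(C1-Tails | C2-Heads) = 1, so the formula gives:
misapplied = 1 * p_c2_heads / p_c1_tails

# Correct answer: the tosses are independent, so conditioning changes nothing.
correct = p_c2_heads

print(round(misapplied, 3), correct)  # prints: 0.714 0.5
```

A probability of about 0.714 raises no alarm bells the way 4 does, so the error can hide.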
Essentially the set up of the Krakow-Warsaw problem misleads one to think one might have extra information upon reaching Janki, but because one doesn't really have new information, Bayes' theorem is then misapplied in this instance.
It's an interesting example of how one can get confused about whether or not there is a statistical dependency between two events. BUT Bayes' still works just dandy if you get the dependency right. :) That's my take on it at least. That may be the point of the paper, though I'm simply too tired to comprehend that. :)
EDIT: One can formulate a near-identical problem where one does have more information upon reaching Janki - let's say you are given a car that is equally likely to have any whole number of gallons of gas from 1 to 10 in it. You are told you need at least 4 gallons to reach Janki (7/10 probability) and at least 6 gallons to reach Warsaw (5/10 probability). You know you have at least 4 gallons in the car upon reaching Janki - that leaves only 4-10 as possible values - so the probability of reaching Warsaw has been raised to 5/7. In this instance, using Bayes' theorem upon reaching Janki is correct: the conditionalist is right and the determinist is wrong. However, they did apply the constraint in the paper that gas consumption is indeed known to be linear ... in which case the conditionalist is probably right (I'll have to think about it) ... remove that restriction and the paradox definitely still "works".
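The gas-tank version can be checked by brute-force enumeration - a small sketch under the assumptions above (whole gallon amounts 1-10 equally likely, thresholds of 4 and 6 gallons; names mine):

```python
from fractions import Fraction

gallons = range(1, 11)  # equally likely whole numbers 1..10
reach_janki = [g for g in gallons if g >= 4]   # need at least 4 gallons: 7 of 10
reach_warsaw = [g for g in gallons if g >= 6]  # need at least 6 gallons: 5 of 10

# Conditioning on having reached Janki leaves only 4..10 as possible values,
# so P(reach Warsaw | reached Janki) = |{6..10}| / |{4..10}|.
p = Fraction(len(reach_warsaw), len(reach_janki))
print(p)  # prints: 5/7
```

Here the conditioning genuinely shrinks the set of possibilities, so updating from 5/10 to 5/7 is legitimate - unlike in the coin example, where nothing new is learned.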