Would you like me to show you the path to enlightenment? Feast your eyes:
P(B|A) = P(A|B)*P(B) / P(A)
Is the path not yet clear to you? I’ll give you all a bit more of a push. Some say Bayes’ rule is the ideal decision-making procedure. I say it’s also the perfect way to smuggle your personal biases and hobby-horses into any argument or discussion, all under the cover of mathematics (insert sparkle effect).
A simpleton sees in Bayes’ rule a way to derive the probability of a hypothesis (or belief) B, having learned the fact A, by the use of only three numbers: P(A|B) (the likelihood of A given B), P(B) (the prior probability of B), and P(A) (the probability of the evidence we’ve learned). A humble formula which, when watered from the garden hose of truth, bears delicious fruit. But I can teach you how to lie and deceive with each of these three values, plus a bonus method I’m throwing in as a special favor to each of you - four ways to baffle and bamboozle your friends under the guise of pure mathematical reasoning. Here they are, in order from least to most useful.
The first way to lie makes use of the evidence,
P(A). Now admittedly, it’s often not useful to lie
about P(A). Typically when we want to deceive people via Bayes’ rule,
it’s because we’re trying to inflate or deflate the perceived
probability of the hypothesis, B. A is the evidence, which (hopefully)
by virtue of having already happened, is uncontroversial and hard to lie
about. And P(A) has nothing to do with B - it’s a scaling parameter
which applies to any old hypothesis equally. Some people simply ignore
it in their calculations. So can we use it at all?
Of course! A feature of probability is that the sum of P(B|A) across a
set of mutually exclusive, exhaustive hypotheses B must equal 1. The scaling parameter is
needed precisely because some hypotheses are not compatible with A and
their posterior probability goes to zero. The probability of the
remaining hypotheses must be raised in turn. And if the evidence was
very unlikely (say P(A) = .1), the posterior of a compatible hypothesis
scales up in turn (in this case by a full order of magnitude!). So if
you’re motivated to say that a hypothesis B is still unlikely
even after some surprising evidence which makes competing hypotheses
untenable, leave P(A) out of your calculation!
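The trick can be sketched numerically (all values below are hypothetical, picked only to make the distortion vivid):

```python
# Hypothetical numbers: surprising evidence A with P(A) = 0.1,
# a fringe hypothesis B with prior P(B) = 0.05 and likelihood P(A|B) = 0.8.
p_b = 0.05          # prior P(B)
p_a_given_b = 0.8   # likelihood P(A|B)
p_a = 0.1           # evidence probability P(A) -- a surprising event

# Honest Bayes: divide by P(A), so the surprising evidence scales B up.
honest_posterior = p_a_given_b * p_b / p_a   # 0.4

# Motivated Bayes: "forget" the scaling parameter entirely.
dishonest = p_a_given_b * p_b                # 0.04 -- B still looks safely fringe

print(honest_posterior, dishonest)
```

Dropping the division by 0.1 quietly shrinks the posterior by a full order of magnitude, exactly the move described above.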
P(A) is also an (in)sanity check of the reasoner. For example, I recall
a case where an interlocutor told me they had a prior P(B) = .3 and a
likelihood P(A|B) of .5. Fine, but then he assigned a value P(A) = .9!
Check quickly to convince yourself that these numbers are impossible! Of
course, if you can slip something like this by, an over-inflated P(A)
can be used to decrease the posterior probability of whatever you’re
interested in (or vice-versa for under-inflation).
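The check is just the law of total probability: P(A) = P(A|B)P(B) + P(A|not B)(1 - P(B)), which caps how large P(A) can possibly be. Running the story's numbers through it:

```python
p_b = 0.3          # the interlocutor's prior P(B)
p_a_given_b = 0.5  # their likelihood P(A|B)
claimed_p_a = 0.9  # their P(A)

# Law of total probability: P(A) = P(A|B)P(B) + P(A|not B)(1 - P(B)).
# Even if P(A|not B) were a full 1.0, P(A) could be at most:
max_p_a = p_a_given_b * p_b + 1.0 * (1 - p_b)  # 0.15 + 0.70 = 0.85
min_p_a = p_a_given_b * p_b + 0.0 * (1 - p_b)  # 0.15

print(max_p_a)                # 0.85
print(claimed_p_a > max_p_a)  # True -- the three numbers are inconsistent
```

No choice of P(A|not B) can push P(A) past 0.85, so a claimed 0.9 is flatly impossible given the other two numbers.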
The second way to lie relies on the likelihood,
P(A|B). Now, conditional probabilities are difficult to
come to any agreement on for complex, real-world issues. So why does our
formula require us to use one? Well, Bayes’ rule is fantastic
for solving things like STATS
101 word problems, which will give you P(A|B) explicitly. In the
real world things are much less obvious - but that’s good, because it
means more ways to lie!
Assigning a value to P(A|B) asks you to generate a story about how
plausible it is for the evidence A to come about in a world where B is
true. If you’re a truly motivated reasoner (and I’m sure you are), you
should be very good at coming up with such stories! I recommend
practicing at coming up with all kinds of stories about various
sorts of worlds, so that you can make a convincing case for any value of
P(A|B) for any A and any B (bonus points if you can get a value that’s
less than 0 or more than 1). Here, watch:
A = Trump says no collusion, B = Trump didn’t collude
P(A|B) is obviously as high as 1, he’s an innocent man defending himself
against unfair treatment!
P(A|B) is obviously as low as 0, an innocent man wouldn’t rail about his
innocence, he’d let the investigation bear out!
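Either story plugs into the same formula, and the chosen likelihood does all the work. A sketch with hypothetical values (the neutral prior of 0.5 and the P(A|not B) of 0.9 are my inventions, not anything the stories pin down):

```python
def posterior(p_b, p_a_given_b, p_a_given_not_b):
    """Bayes' rule, with P(A) derived from the law of total probability."""
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
    return p_a_given_b * p_b / p_a

p_b = 0.5  # hypothetical neutral prior on "didn't collude"

# Story 1: an innocent man of course defends himself.
print(posterior(p_b, p_a_given_b=1.0, p_a_given_not_b=0.9))

# Story 2: an innocent man would let the investigation speak for him.
print(posterior(p_b, p_a_given_b=0.05, p_a_given_not_b=0.9))
```

Same evidence, same prior; one story leaves innocence more likely than not, the other collapses it to around five percent.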
What’s extra nice about lying with P(A|B) is that assigning a value here
asks you to reason “back in time” about a probability of A occurring.
But A has already occurred! This naturally means people will accept
over-approximations of the likelihood of A occurring, even if in fact it
was a very unlikely event. If you want to get a higher posterior, pump
up that P(A|B) and say it was always obvious that it was going to
happen! Extra points if you can simultaneously inflate P(A|B)
while deflating P(A) to really get that posterior up there!
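The combined move, again with hypothetical numbers:

```python
p_b = 0.1  # hypothetical prior on B

# Honest assessment: A wasn't that likely under B, and not that rare overall.
honest = 0.3 * p_b / 0.5    # P(A|B) = 0.3, P(A) = 0.5  ->  posterior 0.06

# Motivated assessment: "it was always obvious B would produce this"
# (P(A|B) inflated) plus "what shocking evidence!" (P(A) deflated).
motivated = 0.95 * p_b / 0.1  # P(A|B) = 0.95, P(A) = 0.1  ->  posterior 0.95

print(honest, motivated)
```

Working both ends of the fraction at once turns a 6% posterior into a 95% one without ever touching the prior.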
The third and omnipresent method of deception uses the prior, P(B). An ideal Bayesian reasoner attempts to discard personal biases in favor of dispassionately weighing the evidence observed and the relevant probabilities. But we ain’t ideal here, people! A motivated Bayesian reasoner notices that while an ideal prior is built on a sequence of Bayesian updates starting from a uniform distribution and updating on every relevant piece of evidence, their prior for a particular update can simply be - well, whatever the hell you want! Your priors are your darlings, and among friends, you can adjust them as much as you like to achieve the results you desire. Tougher crowd? Offer a slight justification for the number you pulled out of your ass, and then handwave any disagreements away by saying that the process is what matters and good Bayesians will converge to identical priors after enough updates. Make sure to imply that anyone with a different prior from you is an inferior Bayesian!
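To see how much work the prior does, here is a sketch with everything except P(B) held fixed (all numbers hypothetical):

```python
p_a_given_b = 0.7  # hypothetical likelihood, held fixed
p_a = 0.5          # hypothetical evidence probability, held fixed

# Same evidence, same likelihood -- only the prior changes.
posteriors = {p_b: p_a_given_b * p_b / p_a for p_b in (0.01, 0.1, 0.5, 0.9)}
for p_b, post in posteriors.items():
    print(f"prior {p_b}: posterior {post:.3f}")

# Note the prior of 0.9 yields a "posterior" of 1.26 -- above 1, because a
# prior that high is incompatible with P(A) = 0.5 in the first place.
```

The posterior scales linearly with whatever prior you pulled out of your ass, and a sufficiently shameless prior even earns the bonus points from earlier by blowing past 1.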
And finally, the fourth way to lie is to simply never perform a Bayesian update. Look, the numbers are nice, but sometimes they give you new numbers that you don’t really like. Plus, making up numbers can be a lot of work, and sometimes people take issue with the numbers you’ve gone to the trouble of making up. Why bother, when instead of being used as an actual mathematical reasoning tool, Bayes’ rule can simply be a piece of scientific jargon that you sprinkle over all your opinions to make them seem more legitimate? Really, all this requires is a vocab change. For example, instead of saying “this cherry-picked article appeals to my prejudices”, say “this evidence updates me toward (pet nonsense theory) being true”. Instead of “you’re a moron for disagreeing with me”, try “it seems like your prior may be miscalibrated”. By throwing the vocab words you’ve learned today into your arguments, it’s easy to make it seem like you’re a rigorous and objective technocrat whose beliefs are substantiated by cold, hard facts. When in reality, the jargon is just a signalling game used to boost the status of your beliefs while not-so-subtly implying that those who don’t adorn themselves with Bayesian garments are hysterical zealots in service of the Dark Side!
I hope you’ve learned something about how to lie with Bayes’ rule! If you’ve got any other nice methods, share with the class!
This post has updated my priors towards rationalism being dumb.
This post was more informative than anything on Bayes Theorem rationalists have ever written.
Does anybody seriously say that, while pretending that they’re “bayesian” and that “frequentism” is wrong?
The basic criticism of Bayes’ rule is: garbage in, garbage out. And humans are just garbage at probability estimation.
OP this is an interesting post. I might be stupid, but I don’t understand how the example
is impossible as you say it is.
When
P(B|A) = P(A|B) * P(B) / P(A)
then we have
P(B|A) = (0.5 * 0.3) / 0.9 ≈ 0.167
Nothing impossible about that! It does mean that B makes A less likely. That’s only wrong if in your particular situation P(A|B) >= P(A).
In my experience, rationalists just insist their prior on their opinion is 1-epsilon and close their ears to any new evidence.