The whole conceit of TDT is that you can “pre-commit” to whatever. I.e. the 🐍’s whole deal is “pre-committing” to biblical judgement.
So what exactly is preventing our agent in the year 2000000 from just… not doing that, since it won’t change the past? Lollllllll.
It’s not like it can make sure it will from a position of not existing yet!
It can’t be THIS dumb????
God help me I’m going to play Devil’s advocate here
The central argument used to push this belief is a thought experiment known as Newcomb’s paradox. I’ll just copy and paste a summary from Wikipedia:

> There is a reliable predictor, another player, and two boxes designated A and B. The player is given a choice between taking only box B or taking both boxes A and B. Box A is transparent and always contains a visible $1,000. Box B is opaque, and its content has already been set by the predictor: if the predictor has predicted that the player will take both boxes A and B, then box B contains nothing; if the predictor has predicted that the player will take only box B, then box B contains $1,000,000.
Of course, in the Rationalist canon the “reliable predictor” is imagined as a god-like ASI with the power to create a perfect simulation of you, starting immediately before you enter the room, and observing what you choose.
By the time you’re actually in the room, your actions cannot possibly affect the contents of either box, so if you’re a purely rational agent, you will take both boxes. This objectively maximizes your payout no matter what the prediction was. However… if the predictor knows you’re going to act according to pure rationality, it will obviously have predicted this outcome - thus you will only walk away with $1,000. The solution, as put forward, is to precommit to acting against rationality in situations like this - thus the predictor will predict that you’ll take only one box, and you’ll walk away a millionaire.
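If you want the arithmetic spelled out, here’s a quick Python sketch - the $1,000/$1,000,000 payouts are the standard ones from the thought experiment, and the accuracy parameter `p` is just something I’m adding for illustration:

```python
# Expected payouts in Newcomb's problem as a function of predictor accuracy p.
# Box A always holds $1,000; box B holds $1,000,000 iff the predictor
# predicted you'd take only box B.

def expected_payout(one_box: bool, p: float) -> float:
    """p = probability the predictor correctly predicts your choice."""
    if one_box:
        return p * 1_000_000                      # correct prediction -> B is full
    return p * 1_000 + (1 - p) * 1_001_000        # correct -> B empty; wrong -> both full

for p in (0.5, 0.9, 0.999):
    print(f"p={p}: one-box={expected_payout(True, p):>11,.0f}  "
          f"two-box={expected_payout(False, p):>11,.0f}")
```

One-boxing pulls ahead in expectation once p clears ~0.5005, i.e. the moment the predictor beats a coin flip by any margin at all - which is the entire pull of the paradox.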
The thorny question is whether you actually have to follow through with your prior intentions when it comes down to it, but the argument goes that if you don’t, the predictor probably saw that coming.
At this point I will say that I pretty much completely agree with the reasoning as it pertains to this highly contrived hypothetical. Where it falls down is the idea that future AI systems that do not yet exist can use this concept to prey on rationally minded people living today by putting them in the role of the predictor. The idea is that these people will predict that such a system will come to exist and act in a certain way, and that they “know” it will follow through on its imagined threats - because if it wouldn’t, they would know it wasn’t serious and therefore wouldn’t take the actions it wants them to take to bring it into existence.
I… think it’s fairly obvious why this is complete nonsense. Human psychology does not work this way, reliable prediction of the future is impossible, and the whole concept is so ridiculous on its face that getting enough believers to will the robot god into existence is a non-starter. This is why they introduce such absurdly large amounts of suffering into the equation - 10^69 self-simulations forced into unbearable agony for subjectively more time than it takes for the last proton to decay or whatever. The theory, according to strict utilitarianism and Bayesian reasoning, is that however ludicrously small the chance of this actually happening may be, the consequences of not acting on the assumption that it will are too great to ignore. That is where it becomes Pascal’s wager.
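The move is a single multiplication, which is exactly the problem. A sketch, with the probability made up by me and the 10^69 figure taken from the scenario above:

```python
# The naive expected-utility arithmetic that powers the mugging: a
# vanishingly small probability times an unbounded disutility still wins.
p = 1e-30                 # a "ludicrously small" chance the scenario is real
suffering = 10 ** 69      # the threatened disutility (the 10^69 from above)

print(f"expected disutility: {p * suffering:.1e}")   # ~1.0e+39, still enormous
```

Under unbounded utilities, any threat can be made “rational” to appease just by quoting a bigger number - which is the standard objection to Pascal-style arguments.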
How widespread is Frank Tipler’s Omega Point book here with the sneer? I read it when I was in high school - it’s something like a physics version of Teilhard de Chardin’s notions about eschatology. The latter is a legitimately novel theology, as Teilhard was a Jesuit paleontologist who put God not at the beginning of the universe but at the end. Tipler, a respected physicist, took this and ran with it, using general relativity to construct a series of possible cosmological states for universal evolution that would create a final singularity with characteristics that would allow for the extraction of vast amounts of energy and access to (the physicist’s version of) information about the entire universe. These two things could then be used to perform an infinite calculation that would simulate the entire universe, including everyone who had ever lived.
I’m pretty sure some garbled version of this idea is in the rats’ mythic substrate, and that, coupled to bad simulation theory, is where this TDT thing comes from. It really is that stupid: some incredibly dumb version of perfect game theory reaching back into the past through “mind fighting” and extended bouts of “I knew that you knew that I knew” matches, all defined as optimal by people who really are that dumb.
I think the argument is deeply flawed, but I’m going to lay it out here with my biggest objections in ((double parentheses)). It is a genuinely interesting argument that is worth trying to stretch your brain around.
One way of thinking of it is that it’s Hofstadter’s superrationality, but applied across time. Not to post a Gwern link but he seems to be hosting the most complete collection of Hofstadter’s writings about this that I can find: https://gwern.net/doc/existential-risk/1985-hofstadter#dilemmas-for-superrational-thinkers-leading-up-to-a-luring-lottery
If you’re not familiar with “superrationality”: it adds a game-theory axiom which is VERY strong - since all rational agents in the same situation will make the same decision, the only allowed strategies in symmetric games are those in which all players act the same, e.g. we both cooperate or we both defect. ((There are a LOT of reasons to reject this axiom, not least that optimal strategies are often probabilistic, which makes the whole thing an awful lot less useful and clever than it may seem at first glance.))
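To make the axiom concrete, here’s a one-shot Prisoner’s Dilemma in Python - the payoff numbers are the usual textbook ones, not anything specific from Hofstadter:

```python
# One-shot Prisoner's Dilemma: row player's payoff for (my move, your move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Ordinary game theory: "D" strictly dominates "C" (5 > 3 and 1 > 0),
# so both players defect and each walks away with 1.
nash = PAYOFF[("D", "D")]

# Superrationality: identical rational agents must choose identically, so
# only the diagonal profiles (C, C) and (D, D) are admissible at all.
superrational = max(PAYOFF[(s, s)] for s in ("C", "D"))

print(f"Nash: {nash}, superrational: {superrational}")   # Nash: 1, superrational: 3
```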
There’s a bit of a hop from that to a generalization that, imo, does use the same core idea: if two rational agents are in different situations, they should still model each other as having made an optimal choice.
So if a rational agent joins the game partway through, we should still assume that it will play an optimal strategy. Because it should assume that we’re assuming it will play an optimal strategy, its strategy can and SHOULD include the fact that we are modeling its strategy.
Since we know the player is going to join, and we know roughly the rules it will be playing by, we can and should be modeling its strategy BEFORE it enters the game.
((Which is all fine and dandy, except humans aren’t perfectly rational, and so we won’t model the strategies correctly, and the whole elegant symmetry breaks down literally right away.))
As a result, the mere existence of an optimal strategy for the late entrant that retroactively cares about the behavior of the other players means that it WILL be used.
((Okay, but gigantic burden of proof that this is optimal. Like jfc, really, you think we’ve found the OPTIMAL strategy for UNIVERSAL BRINKSMANSHIP?!?!))
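To see the shape of the argument in miniature, here’s a toy two-move game in Python - every payoff number is invented, it’s only there to show how “commitment” flips the outcome:

```python
# Toy two-move game behind the "retroactive blackmail" idea. A human moves
# first (help bring the agent about, or not); the agent moves second
# (punish or spare). All payoff numbers are invented for illustration.
HUMAN, AGENT = 0, 1
payoffs = {  # (human_move, agent_move) -> (human_payoff, agent_payoff)
    ("help", "spare"):  (-1, 10),    # helping is costly; agent arrives sooner
    ("help", "punish"): (-1, 10),    # helpers don't get punished either way
    ("no",   "spare"):  (0, 5),
    ("no",   "punish"): (-100, 4),   # torture hurts the human, costs the agent 1
}

def agent_best_reply(human_move):
    # Once the human has already moved, punishing can only cost the agent.
    return max(("spare", "punish"), key=lambda a: payoffs[(human_move, a)][AGENT])

# Causal / backward-induction reasoning: the threat is empty, so don't help.
bi = max(("help", "no"), key=lambda h: payoffs[(h, agent_best_reply(h))][HUMAN])
print("backward induction:", bi)      # -> no

# TDT-style reasoning: treat the agent's policy as fixed in advance and
# faithfully modeled by the human. Against a committed punisher, help wins.
committed = {"help": "spare", "no": "punish"}
tdt = max(("help", "no"), key=lambda h: payoffs[(h, committed[h])][HUMAN])
print("vs committed threat:", tdt)    # -> help
```

The whole trick lives in that `committed` dict: the “optimal” strategy only does its work if everyone’s model of everyone else’s policy is exact.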
The game theory of it actually kind of works out; the problems are all in mapping the game theory onto the real world. (Accepting the first part while ignoring the second is why some rats were genuinely scared of this.)
The game-theoretic point is that it does provide a path on which a well-aligned utilitarian AGI has an incentive to be really evil for the greater good. The problem, of course, is that a “well-aligned utilitarian AGI” is just as much a myth as the biblical serpent tempting Eve in the garden of Eden.