The whole conceit of TDT is that you can “pre-commit” to whatever. I.e. the 🐍’s whole deal is “pre-committing” to biblical judgement.
So what exactly is preventing our agent in the year 2000000 from just… not doing that, since it won’t change the past? Lollllllll.
It’s not like it can make sure it will from a position of not existing yet!
It can’t be THIS dumb????
God help me I’m going to play Devil’s advocate here
The central argument used to push this belief is a thought experiment known as Newcomb’s paradox. I’ll just copy and paste a summary from Wikipedia:

> There is a reliable predictor, another player, and two boxes designated A and B. The player is given a choice between taking only box B or taking both boxes A and B. Box A is transparent and always contains a visible $1,000. Box B is opaque, and its content has already been set by the predictor: if the predictor has predicted that the player will take both boxes A and B, then box B contains nothing; if the predictor has predicted that the player will take only box B, then box B contains $1,000,000.
Of course, in the Rationalist canon the “reliable predictor” is imagined as a god-like ASI with the power to create a perfect simulation of you, starting immediately before you enter the room, and observing what you choose.
By the time you’re actually in the room, your actions cannot possibly affect the contents of either box, so if you’re a purely rational agent, you will take both boxes. This objectively maximizes your payout no matter what the prediction was. However… if the predictor knows you’re going to act according to pure rationality, it will obviously have predicted this outcome - thus you will only walk away with $1,000. The solution, as put forward, is to precommit to acting against rationality in situations like this - thus the predictor will predict that you’ll take only one box, and you’ll walk away a millionaire.
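If you want the arithmetic spelled out, here’s a quick Python sketch - the $1,000/$1,000,000 payouts are the standard ones from the thought experiment, and the accuracy parameter `p` is just something I’m adding for illustration:

```python
# Expected payouts in Newcomb's problem as a function of predictor accuracy p.
# Box A always holds $1,000; box B holds $1,000,000 iff the predictor
# predicted you'd take only box B.

def expected_payout(one_box: bool, p: float) -> float:
    """p = probability the predictor correctly predicts your choice."""
    if one_box:
        return p * 1_000_000                      # correct prediction -> B is full
    return p * 1_000 + (1 - p) * 1_001_000        # correct -> B empty; wrong -> both full

for p in (0.5, 0.9, 0.999):
    print(f"p={p}: one-box={expected_payout(True, p):>11,.0f}  "
          f"two-box={expected_payout(False, p):>11,.0f}")
```

One-boxing pulls ahead in expectation once p clears ~0.5005, i.e. the moment the predictor beats a coin flip by any margin at all - which is the entire pull of the paradox.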
The thorny question is whether you actually have to follow through with your prior intentions when it comes down to it, but the argument goes that if you don’t, the predictor probably saw that coming.
At this point I will say that I pretty much completely agree with the reasoning as it pertains to this highly contrived hypothetical. Where it falls down is the idea that future AI systems that do not yet exist can use this concept to prey on rationally minded people living today by putting them in the role of the predictor. The idea is that these people will predict that such a system will come to exist and act in a certain way, and that they “know” it will follow through on its imagined threats - because if it wouldn’t, they would know it wasn’t serious and therefore wouldn’t take the actions it wants them to take to bring it into existence.
I… think it’s fairly obvious why this is complete nonsense. Human psychology does not work this way, reliable prediction of the future is impossible, and the whole concept is so ridiculous on its face that getting enough believers to will the robot god into existence is a non-starter. This is why they introduce such absurdly large amounts of suffering into the equation - 10^69 self-simulations forced into unbearable agony for subjectively more time than it takes for the last proton to decay or whatever. The theory, according to strict utilitarianism and Bayesian reasoning, is that however ludicrously small the chance of this actually happening may be, the consequences of not acting on the assumption that it will are too great to ignore. That is where it becomes Pascal’s wager.
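The move is a single multiplication, which is exactly the problem. A sketch, with the probability made up by me and the 10^69 figure taken from the scenario above:

```python
# The naive expected-utility arithmetic that powers the mugging: a
# vanishingly small probability times an unbounded disutility still wins.
p = 1e-30                 # a "ludicrously small" chance the scenario is real
suffering = 10 ** 69      # the threatened disutility (the 10^69 from above)

print(f"expected disutility: {p * suffering:.1e}")   # ~1.0e+39, still enormous
```

Under unbounded utilities, any threat can be made “rational” to appease just by quoting a bigger number - which is the standard objection to Pascal-style arguments.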
How widespread is Frank Tipler’s Omega Point book here with the sneer? I read it when I was in high school - it’s something like a physics version of Teilhard de Chardin’s notions about eschatology. The latter is a legitimately novel theology, as Teilhard was a Jesuit paleontologist who put God not at the beginning of the universe but at the end. Tipler, a respected physicist, took this and ran with it, using general relativity to construct a series of possible cosmological states for universal evolution that would create a final singularity with characteristics that would allow for the extraction of vast amounts of energy and access to (the physicist’s version of) information about the entire universe. These two things could then be used to perform an infinite calculation that would simulate the entire universe, including everyone who had ever lived.
I’m pretty sure some garbled version of this idea is in the rats’ mythic substrate, and that, coupled to bad simulation theory, is where this TDT thing comes from. It really is that stupid: some incredibly dumb version of perfect game theory reaching back into the past through “mind fighting” and extended bouts of “I knew that you knew that I knew” matches, all defined as optimal by people who really are that dumb.
I think the argument is deeply flawed, but I’m going to lay it out here with my biggest objections in ((double parentheses)). It is a genuinely interesting argument that is worth trying to stretch your brain around.
One way of thinking of it is that it’s Hofstadter’s superrationality, but applied across time. Not to post a Gwern link but he seems to be hosting the most complete collection of Hofstadter’s writings about this that I can find: https://gwern.net/doc/existential-risk/1985-hofstadter#dilemmas-for-superrational-thinkers-leading-up-to-a-luring-lottery
If you’re not familiar with “superrationality”: it adds a game-theory axiom which is VERY strong - since all rational agents in the same situation will make the same decision, the only allowed strategies in symmetric games are those in which all players act the same, e.g. we both cooperate or we both defect. ((There are a LOT of reasons to reject this axiom, not least that optimal strategies are often probabilistic, which makes the whole thing an awful lot less useful and clever than it may seem at first glance.))
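To make the axiom concrete, here’s a one-shot Prisoner’s Dilemma in Python - the payoff numbers are the usual textbook ones, not anything specific from Hofstadter:

```python
# One-shot Prisoner's Dilemma: row player's payoff for (my move, your move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Ordinary game theory: "D" strictly dominates "C" (5 > 3 and 1 > 0),
# so both players defect and each walks away with 1.
nash = PAYOFF[("D", "D")]

# Superrationality: identical rational agents must choose identically, so
# only the diagonal profiles (C, C) and (D, D) are admissible at all.
superrational = max(PAYOFF[(s, s)] for s in ("C", "D"))

print(f"Nash: {nash}, superrational: {superrational}")   # Nash: 1, superrational: 3
```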
There’s a bit of a hop from that to a generalization that, imo, does use the same core idea: if two rational agents are in different situations, they should still model each other as having made an optimal choice.
So if a rational agent joins the game partway through, we should still assume that it will play an optimal strategy. Because it should assume that we’re assuming it will play an optimal strategy, its strategy can and SHOULD include the fact that we are modeling its strategy.
Since we know the player is going to join, and we know roughly the rules it will be playing by, we can and should be modeling its strategy BEFORE it enters the game.
((Which is all fine and dandy, except humans aren’t perfectly rational, and so we won’t model the strategies correctly, and the whole elegant symmetry breaks down literally right away.))
As a result, the mere existence of an optimal strategy for the late entrant that retroactively cares about the behavior of the other players means that it WILL be used.
((Okay, but gigantic burden of proof that this is optimal. Like jfc, really, you think we’ve found the OPTIMAL strategy for UNIVERSAL BRINKSMANSHIP?!?!))
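To see the shape of the argument in miniature, here’s a toy two-move game in Python - every payoff number is invented, it’s only there to show how “commitment” flips the outcome:

```python
# Toy two-move game behind the "retroactive blackmail" idea. A human moves
# first (help bring the agent about, or not); the agent moves second
# (punish or spare). All payoff numbers are invented for illustration.
HUMAN, AGENT = 0, 1
payoffs = {  # (human_move, agent_move) -> (human_payoff, agent_payoff)
    ("help", "spare"):  (-1, 10),    # helping is costly; agent arrives sooner
    ("help", "punish"): (-1, 10),    # helpers don't get punished either way
    ("no",   "spare"):  (0, 5),
    ("no",   "punish"): (-100, 4),   # torture hurts the human, costs the agent 1
}

def agent_best_reply(human_move):
    # Once the human has already moved, punishing can only cost the agent.
    return max(("spare", "punish"), key=lambda a: payoffs[(human_move, a)][AGENT])

# Causal / backward-induction reasoning: the threat is empty, so don't help.
bi = max(("help", "no"), key=lambda h: payoffs[(h, agent_best_reply(h))][HUMAN])
print("backward induction:", bi)      # -> no

# TDT-style reasoning: treat the agent's policy as fixed in advance and
# faithfully modeled by the human. Against a committed punisher, help wins.
committed = {"help": "spare", "no": "punish"}
tdt = max(("help", "no"), key=lambda h: payoffs[(h, committed[h])][HUMAN])
print("vs committed threat:", tdt)    # -> help
```

The whole trick lives in that `committed` dict: the “optimal” strategy only does its work if everyone’s model of everyone else’s policy is exact.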
The game theory of it actually kind of works out; the problems are all in mapping the game theory onto the real world. (Accepting the first part while ignoring the second is why some rats were genuinely scared of this.)
The game-theoretic point is that it does provide a path on which a well-aligned utilitarian AGI has an incentive to be really evil for the greater good. The problem, of course, is that a “well-aligned utilitarian AGI” is just as much a myth as the biblical serpent tempting Eve in the garden of Eden.