One thing I wonder as I learn more about Yud’s whole deal is: if his attempt to build AI had been successful, what then? From his perspective, would his creation of an aligned AI somehow prevent anyone else from creating an unaligned AI?
Was the idea that his aligned AI would run around sabotaging all other AI development, or that it would help with that development and make sure the resulting AIs were also aligned?
(I can guess at some actual answers, but I’m curious about his perspective)
Building an AI – aligned or not – is so far outside the realm of what Yudkowsky is capable of that you might as well ask what would have happened if Yud had successfully built a time machine or a perpetual motion machine.
One of the biggest issues I have with the MIRI project is that they so grossly underestimate the technical challenges of actually building a self-improving AGI. LLMs are impressive and may eventually be stepping stones, but they are still toys compared to full AGI. In all likelihood, “aligning” a self-improving AGI (whatever that means) will turn out to be a trivial sub-problem compared to the task of building one in the first place. Whatever the unintended consequences of AGI might be, I have faith that we’ll be able to communicate our desires a little more clearly than wishing for “give me as many paperclips as possible” on a monkey’s paw.
In singularity lore, the singularity is an event in which an AI’s sophistication explodes exponentially, and the first AI to do this “wins” and out-competes every other AI. Therefore, according to this logic, you have to make sure a “good” AI emerges first, so that it can kill off or sabotage any rival attempts at creating AIs that might be “bad.” And, again according to the lore, this exponential explosion of intelligence produces an AI with god-like powers, so you can hand-wave away any explanation of how the benevolent robot god will actually accomplish this.
So being first means you would have to be at the forefront of AI research; your not-yet-AGI systems would always have to be the best ones around. ChatGPT and similar systems made it painfully obvious that MIRI et al. are hopelessly lagging behind, which is what inspired the recent turn toward doomerism: Yud and co. are now convinced they won’t be first, which is why he wants to nuke datacenters hosting anyone else’s AI.
Eliezer Yudkowsky heard about Voltaire’s claim that “If God did not exist, it would be necessary to invent Him,” and started thinking about what programming language to use.
I think their thing is that hard-takeoff AI, i.e. an AI recursively self-optimizing for some abstracted general-intelligence factor, can only happen once. There’s something dumb going on with timeless decision theory here too, but essentially, if you give an aligned superintelligence a little head start, it can bootstrap its way past the “alignment penalty,” i.e. the efficiency cost of not being efficiently evil.
His idea was called Friendly AI (FAI), a system deliberately aligned with human-centric goals. He has a crude understanding of AI systems as goal-optimizers and utility-seekers that might have unintended side effects if their goals aren’t properly defined.
He’s not able to go from a broad qualitative understanding of how machine learning (specifically something like reinforcement learning) works to a mathematical or statistical set of definitions, let alone write any actual software, but he does have a vague idea of how things could go wrong. He got as far as: “Friendly AI will protect us from malicious AI by figuring out how to define these goals properly. I sure can’t do it, so it must take superhuman intelligence.”
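For anyone who hasn’t seen the “goals aren’t properly defined” point made concrete, here’s a toy sketch in Python (entirely my own illustration, not anything MIRI ever wrote): the “agent” is just a greedy loop that maximizes whatever number you hand it, and the only thing that differs between the two runs is how the reward is specified.

```python
# Toy illustration of reward mis-specification (hypothetical example, not MIRI code).
# The "agent" greedily takes any action that raises the reward function it was given.

def run_agent(reward_fn, steps=10):
    state = {"paperclips": 0, "resources": 100}  # resources stand in for everything else we care about
    for _ in range(steps):
        if state["resources"] > 0:
            # Candidate action: convert some resources into paperclips.
            candidate = {"paperclips": state["paperclips"] + 10,
                         "resources": state["resources"] - 10}
            if reward_fn(candidate) > reward_fn(state):
                state = candidate
    return state

# Naive goal: "give me as many paperclips as possible."
naive_reward = lambda s: s["paperclips"]

# Slightly less naive goal: paperclips are good, but burning through everything else is penalized.
bounded_reward = lambda s: s["paperclips"] - 5 * max(0, 50 - s["resources"])

print(run_agent(naive_reward))    # -> {'paperclips': 100, 'resources': 0}: eats everything
print(run_agent(bounded_reward))  # -> {'paperclips': 50, 'resources': 50}: stops short
```

Of course the second reward is still gameable in a dozen ways, which is the whole point: writing down what you actually want is the hard part, and Yud’s move was to outsource that hard part to the hypothetical superintelligence itself.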
What if God was a rabbit?
I ran a few simulations where he was successful. This also involved running simulations of the AI. They were insufferable dopes. Let’s just say 10^27 copies of them are not enjoying their best lives right now.
The idea is that any proper AGI pretty much immediately uses “nanomachines, son” to become God, so it had better be a good God, because that God will immediately want to suppress any competitors with different goals.
A human-aligned AI would want to protect humans from bad AIs, and therefore would destroy any other AI that could be bad. It would also want humans to have fulfilling lives and community, and to defeat death and suffering (unless that suffering was itself fulfilling, which is why ethics is tricky), which would probably involve taking over the Earth and running it as a post-scarcity, fully automated gay luxury space communist utopia or something.
Basically, the idea is that the moment ANY AI is created, humanity has lost control of its own destiny, so you had better make sure you get it right the first time.
I remember him quite liking the sabotage idea, which would make sense given where he is now.
In his headcanon, if he’d succeeded, his AI would destroy-before-sentience any of the lesser models to prevent apocalypse.
We literally cannot know what happens after the singularity; Yud just wanted to make sure it didn’t kill us all. Oh, and also: everybody becomes immortal, we bring back the dead, and we conquer the stars. Also nanomachines.
Yud was going to be successful at making AI the way Trump was successful at building his wall and making Mexico pay for it: not at all, not even a little, ever. He is a con artist, a grifter, with no demonstrable skills or means to effect his “plan.” It was all fake.