AKA one of the people who came up with the transformer architecture that LLMs are built on in the first place.
https://twitter.com/aidangomezzz/status/1651053357719535622
The thread it’s responding to is also great if you like seeing yud get called out on his claims and then rebutted with screenshots of his own posts, plus some MIRI lore:
https://twitter.com/perrymetzger/status/1651061744800788480
And a semi-in-depth explanation of what is in fact wrong with the loss function comment:
https://twitter.com/jeremyphoward/status/1651830965717843968
Jesus christ it’s always stupider than you think.
When he talks about the robot god escaping the box, I had assumed he meant that the model would be fitted to serve as a general purpose optimizer itself and would use that ability in deployment to yada yada yada…
No, it’s so much dumber than that. He thinks that the model will attain self-awareness and escape the box during training, and that there needs to be a giant red button on the wall to kill the training job to stop it from getting out.
For people who don’t know, this is insane and completely impossible. It’s like being afraid that a math equation is going to leap off the page and stab you to death if you get too clever when solving it, so you have to be careful and make sure that you don’t solve it too well.
edit: here’s an even better metaphor! It’s like thinking that you need to be extremely careful when solving the equations for designing a nuclear bomb, because if you solve them too quickly then they’ll literally explode.
https://imgur.com/VsB6ed2.jpg
so cringey to see him try to use industry lingo
Read an Arthur C. Clarke quote once.
The thread the screenshot is from is great, especially the part about MIRI having a policy of not explicitly writing down research so malevolent AGIs can’t use it as training data.
Or the part about MIRI having inner and outer teachings.
Or the part about EY claiming he had no idea about such policies when he was the manager, and not realizing that, if true, that’s even worse.
Or the ex-MIRI person chiming in to say, well, actually he wasn’t around much at all.
damn. i hate when i’m reading through old stuff on reddit and in the middle of a sparkling, scintillating discussion i find someone has written over all her old comments with nonsense, fragmenting the discussion permanently. what hilarious, moving, romantic, haunting things could she have said? just to wash it all away, in this digital era of permanency? wow. that takes courage. i bet she was really cute, too
So the thing they’re actually worried about is skynet emerging during the training run, and then, when you forward-pass it at eval, it uses its phenomenal mind powers to somehow hack into the matrix??? They really get everything from scifi.
See, this explains why we must be ready to do tactical air strikes on data centers if we hear they’re starting training runs that are too powerful.
But what if someone accidentally or on purpose drops some of Ada Lovelace’s DNA into the loss function? How will we guard against that?
I am known to do the wop (Wop)
Also known for the Flintstone Flop
Tammy D getting biz on the crop (Crop)
Acausal Boys known to let the loss function
“MMM, D-r-r-rop!”
Do it
Got to love how, on the rare occasion that yudkowsky says anything concrete, it reveals an utter and complete lack of any knowledge of the topic he’s supposedly an expert in
Glad to have done my part in eliciting that thread 🫡
https://twitter.com/jeremyphoward/status/1651832655871352832?s=20
While I like dunking on big yud as much as the next guy, being pedantic about shit like calling it a “loss function” instead of just “loss” is peak 🤓 energy and you deserve to be mildly bullied if you do that kind of shit
Aside from the training/deployment distinction, I still have trouble seeing wtf Eliezer is saying here. In the context he’s talking about, “sudden drops in the loss function” means “the model is getting smarter/better in an unreasonably short time”…but better at *what*? The model is not updating its own weights…is it? Are you checking for instances where you somehow stumbled, in the current epoch, upon a magical configuration of weights that, when further updated by the same dumb algorithm in subsequent epochs, will accelerate the rate at which the model becomes smarter?
That’s freaking insane. The usual “well, it’s mysterious and superintelligent and you don’t know what it will do” is not going to work here, because it’s not an AI god that does the weight updating. Even if, in theory, this configuration exists inside the ‘inscrutable matrices’, why on earth would you think you can ever come across it? Is there any earthly reason the training mechanism would tend to converge towards *that*? That’s like worrying that the next time I breathe, the air molecules will assume the exact states needed to align the motion of all the molecules in the world towards the Sun. (A sketch of what a training loop actually does is below, for reference.)
This person really gives me the impression that he really, *really* thinks “would make for a passable sci-fi story” and “is actual science” are very close. As in, if you juggle and rearrange technical concepts in your narrative with the freedom a sci-fi writer is afforded, then you come very close to actual research (if you are a genius like him, I suppose). I know this is the most common critique of him, but damn if “we are seeing sudden drops in the loss function” doesn’t come straight from Bioshock-like audio logs.
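For what it’s worth, here’s a minimal sketch (toy model, fake data, every name in it made up for illustration, not anyone’s actual training setup) of what a training run looks like in plain PyTorch. The point: “the loss” is just a scalar you log, the weight updates are carried out by a fixed external optimizer like SGD, and a “sudden drop in the loss” is nothing but that scalar going down faster than you expected.

```python
# Minimal, illustrative training loop -- not any real lab's setup.
# The model never touches its own parameters; a fixed optimizer does the updates.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)  # the "dumb algorithm"
loss_fn = nn.CrossEntropyLoss()

prev_loss = None
for step in range(1000):
    # toy random data standing in for a real dataloader
    x = torch.randn(32, 128)
    y = torch.randint(0, 10, (32,))

    loss = loss_fn(model(x), y)   # "the loss": a single number
    optimizer.zero_grad()
    loss.backward()               # gradients computed by autograd
    optimizer.step()              # weights updated by SGD, not by the model

    # the dreaded "sudden drop in the loss function" detector:
    if prev_loss is not None and loss.item() < 0.5 * prev_loss:
        print(f"step {step}: loss fell from {prev_loss:.3f} to {loss.item():.3f}")
    prev_loss = loss.item()
```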