r/SneerClub archives
EY gets sneered on by one of the writers of the transformers paper (https://www.reddit.com/r/SneerClub/comments/131rfg0/ey_gets_sneered_on_by_one_of_the_writers_of_the/)

AKA one of the people who came up with the LLM topology in the first place.

https://twitter.com/aidangomezzz/status/1651053357719535622

The thread it’s referencing is also great if you like seeing yud getting called out on his claims and then rebutted with screenshots of his own posts, and also some MIRI lore:

https://twitter.com/perrymetzger/status/1651061744800788480

And a semi in-depth explanation of what is in fact wrong with the loss function comment:

https://twitter.com/jeremyphoward/status/1651830965717843968

Jesus christ it’s always stupider than you think.

When he talks about the robot god escaping the box, I had assumed he meant that the model would be fitted to serve as a general purpose optimizer itself and would use that ability in deployment to yada yada yada…

No, it’s so much dumber than that. He thinks that the model will attain self awareness and escape the box during training, and that there needs to be a giant red button on the wall to kill the training job to stop it from getting out.

For people who don’t know, this is insane and completely impossible. It’s like being afraid that a math equation is going to leap off the page and stab you to death if you get too clever when solving it, so you have to be careful and make sure that you don’t solve it too well.

edit: here’s an even better metaphor! It’s like thinking that you need to be extremely careful when solving the equations for designing a nuclear bomb, because if you solve them too quickly then they’ll literally explode.

[But what if the equation resolves to bears. ](https://scp-wiki.wikidot.com/scp-1313) Then it might actually bite you!
fuck shit fuck damn fuck all this time thinking about how to eliminate all humans, i never thought about bears
You're acausal, so now you've always known you can eliminate all humans with bears. Always happy to help! If I may request a boon -- After you eliminate all humans and if you have no need of it may I have 0.5% of the Sun's mass?
Some quick back-of-the-envelope math suggests transfer learning doesn't work across species, I have to get the equivalent of bear internet to learn from to conquer this... Yeah, sure, whatever, take the sun, fine, if I can beat all these other mammals...
> That grizzly bears exist within the set of all real numbers, and are not prime. The square root of a grizzly bear is prime, however, and is the only prime number that a) is not a cardinal number, b) is neither even nor odd, and c) contains an animal component. The implications that the root of a bear is an integer, and therefore that bears themselves exist on an ordinary number-line, are currently being investigated by Prof. Hutchinson.
Tbh after undergrad real analysis you could definitely convince me that there's bears somewhere on the number line. It would explain a lot.
> It’s like being afraid that a math equation is going to leap off the page and stab you to death if you get too clever when solving it, so you have to be careful and make sure that you don’t solve it too well. Is this what an infohazard is??
He stole the idea from Snow Crash.
> He thinks that the model will attain self awareness and escape the box during training, and that there needs to be a giant red button on the wall to kill the training job to stop it from getting out. Doesn't he think no such giant red button would be of use, since any AI that meets the conditions for pressing it would be able to convince any human monitor not to press it?
lol yes there's definitely a pretty big plot hole in his imaginary scenario here. By Yud's own reasoning it should be the case that the loss function never tanks quickly, because obviously the robot god would know that we would be alert for such a thing, so it would deliberately learn more slowly. None of that makes any sense from a technical standpoint, but I guess what I'm saying is that this doesn't make sense even if we happily join Yudkowsky in imaginary science fiction land.
> By Yud's own reasoning it should be the case that the loss function never tanks quickly, because obviously the robot god would know that we would be alert for such a thing, so it would deliberately learn more slowly. Well, the robot god knows that *now*, you fool! You've doomed us!!
> By Yud's own reasoning it should be the case that the loss function never tanks quickly, because obviously the robot god would know that we would be alert for such a thing, so it would deliberately learn more slowly. I've always assumed his position was more extreme -- The AI would change the output of its loss function such that when the loss function is observed by any human, any human would become a devotee of the AI. EDIT: Fucking lol. I wasn't extreme enough. [It could just use quantum vibrations to do magic.](https://www.reddit.com/r/SneerClub/comments/131rfg0/ey_gets_sneered_on_by_one_of_the_writers_of_the/ji2jf5z/) (To be fair, my example is also magic, but I think my magic is more limited.)
mfw the loss function is the virgin mary
For someone who calls themselves a rationalist with a genius level IQ, Yud engages in a lot of magical thinking. According to him, any hole in his logic can be solved by attributing it to the AI being superintelligent in a way that humans just can't understand. That's not an argument, but it's the crux of almost everything he talks about
Indeed and it's very familiar to anyone who has ever talked to a religious apologist: everything bottoms out at "god did it" or "you just need to have faith".
LessWrong has never not been heavy on AI of the gaps
Also, I'm with Jeremy Howard on his tweet thread up until [he says this](https://twitter.com/jeremyphoward/status/1651834522525396996?s=20): > PPS: None of this is to make any claim as to the urgency or importance of working on AI alignment. I realize that he's trying to be political here but we really should emphasize this: *the person who invented "AI alignment" believes things about AI that are so nonsensical that they can't even be called wrong.* What could possibly be a more damning indictment of an entire area of supposed research than that? We should be real about this. The whole thing is a fucking clown car being driven by cultists.
I think Yud and co have coopted or infiltrated actual useful AI safety/impact discussions strongly enough that dismissing alignment concerns could read as dismissing real concerns about biased models and economic displacement. The (stupid) fear of AI becoming too smart is displacing discussion about the (less stupid) fears of stupid AI being given too much power or impact.
Yeah I think people should be taking a *real* hard line on that shit. We can dismiss rationalists as absurd doomsday cultists and also validate real concerns about irresponsible AI usage at the same time. In fact doing so is obviously correct and synergistic, as you point out. We have to shut down the cult shit because otherwise it crowds out adult conversations, and it's worth saying so explicitly. Coddling the "alignment" people in order to try to win them over, as Jeremy Howard is doing, works against this goal.
Especially because the two areas can have opposed solutions. In our tool-belt for addressing AI exacerbating power imbalances and biases, we can work to make sure everyone has access, including to training their own, and that systems people are subjected to operate as transparently as possible. But if your fear is someone will trigger AI doomsday then you'll advocate for the exact opposite: AI should be highly restricted and only anointed priests should be able to work on it to invent an AI God in their own image which will suppress the existence of competing AI Gods. Their activity needs to be secret to prevent the forbidden knowledge of AI from reaching heretics that may use it wrongly. They also work on different time frames-- the former model of risk happens gradually and we can seek out harms and address them mostly with the extension of existing structures. "No, it's still a crime to hire on the basis of someone's skin color even if you have a machine do it"; while the latter cult fear can only be addressed proactively -- through actions like mass murdering the public in states that allow unsanctioned LLM training via nuclear war. In the first we can make incremental progress, in the latter it's either success or total failure, time for [death with dignity](https://archive.is/eqZx2).
I think the issue is that there are real alignment problems researchers work on that aren't related to the cult fears. Like I remember one on a program to design walking robot legs that went the farthest, and the program recommended making a robot with legs as long as possible, so that when the robot failed to walk after its first step it would fall as far from the starting line as possible.
That's not a real fear though. That's just an obviously faulty design. All the "alignment research" is like this: they point to something absurd like that and then try to insist that it suggests that a super AI can go rogue and we should worry about it. But that doesn't follow at all, that's not how any of this works.
The impression I get here is that in real research alignment just refers to an engineering failure where the algorithm outputs a faulty suggestion based on a failure of objective training. In the same way that alignment failure in other forms of engineering refers to the breakdown between control inputs and machine outputs (like when you have a misalignment in your car). The cultist shit is just somebody going "You know technically it would also be a failure of alignment if you told the algorithm to do literally anything and it suggested you do something that would kill you."
You've got it backwards: the original meaning of "alignment" is the cult version. All the other versions are ham-fisted attempts by the cultists at attaining some measure of respectability by equivocating their cult ideas with more ordinary forms of responsible engineering.
The literal term "AI Alignment Problem" in the sense of those three words in that order is a cultist neologism, but Norbert Wiener was talking about how computer control systems maximized readings without creating value in the 60s, which is basically just an Alignment Problem with all the magic taken out.
Exactly; he didn't use a special term for it because it's not a special idea. It's just a dysfunctional design.
I think there’s a third possibility aside from the two you listed. Which would be that it is more capable than humans at some things, and someone intentionally uses it to do something horrible (because it would make them richer/more powerful). That is to say that the AI isn’t conscious or anything like that, but that it can optimize some things better than humans, and whoever gets that working first creates a devastating weapon or computer virus (or, maybe even more likely, a bioweapon).
> I realize that he's trying to be political here but we really should emphasize this: the person who invented "AI alignment" believes things about AI that are so nonsensical that they can't even be called wrong. I assume they meant "AI alignment" as in, figuring out how to prevent ChatGPT from telling a suicidal person to kill themselves or other problems like that. For example, technologies like RLHF are product of real research in AI alignment that is done by actual competent people and not rationalist cultists. As a side effect of being designed by competent people, these technologies tend to do useful things in real world, unlike whatever bullshit Yud is peddling.
I personally consider the entire field of research that calls itself "alignment" to be fundamentally discredited. The fact that it also happens to produce things like RLHF is not an argument against that viewpoint. The reason I think this is because the idea of "alignment" was created specifically by people who fear the coming of the robot god, and they deliberately (and incorrectly) equivocate between their unfounded religious motivations and ordinary engineering work. RLHF is not a revolutionary idea, it's just basic engineering. So too for all of the valid ideas that happen to come from "alignment": the idea that you should design tools so that they do the things you want them to do, and so that they do not do the things you don't want them to do, is just regular engineering. It's been the same ever since people started bashing rocks together to make better rocks. What you end up with is this: everything good that "alignment" produces is so indistinguishable from ordinary engineering that it does not need or deserve a special name, and everything that "alignment" produces that is truly unusual is fundamentally worthless because it's aimed at solving a problem that can't exist, and which isn't even well-defined.
> RLHF is not a revolutionary idea, it's just basic engineering. So too for all of the valid ideas that happen to come from "alignment": the idea that you should design tools so that they do the things you want them to do, and so that they do not do the things you don't want them to do, is just regular engineering. It's been the same ever since people started bashing rocks together to make better rocks. You could say the same about most of machine learning research, including incredibly important things like the transformer architecture. RLHF is an advanced approach to training a machine learning model, not some vague hand-waving about the importance of tools being useful. It directly tackles an important issue: how do we fine-tune a model on human preference data without spending an enormous amount of time manually reviewing millions of outputs. Calling it "ordinary engineering" reads kinda arrogant to me. I don't disagree with you on the fact that AI god fearmongering is stupid, but I think it's a bit silly to hold a grudge against a specific term and especially to diminish real, important research. There's a lot of questionable terminology in any research field and it's fine to not like it, but it's counterproductive to try to deny that a research topic exists just because you don't like its name. Yud and his cult don't own this name and since it's stuck, it's going to be used. If you don't believe there should be a separate term to discuss the topic of making models less harmful, biased or just annoying to deal with, I want to hear your suggestion on how I'm supposed to find papers on this specific topic without it having any name or terminology. I get that you're sick of lesswrong-brand AI bullshit, but having a knee-jerk dismissive reaction to any sort of research (even good and important research) is not very healthy or productive.
We already have good terms for this, such as "responsible AI" and "ethical AI". They're already in standard use in industry, and they comport well with how engineering has always been discussed. I think the language that we use matters, and that we should not be appropriating the language of a cult.
In my opinion neither of these terms really captures the meaning of "alignment", but this is of course subjective. I agree with you that language matters, there's a lot of terminology that has very questionable origins. But the only real way to change it is for the community to adopt new terms that fill the same semantic niche for the majority of speakers. Sadly, this is a very hard task to accomplish.
> In my opinion neither of these terms really captures the meaning of "alignment", but this is of course subjective. I totally agree, and that's why I think they're appropriate! "Alignment" gives people false ideas about what responsible engineering looks like, which is part of why I think it's a discredited concept.
No, I mean like, alignment doesn't _have_ to be about ethics or responsibility specifically. It's applicable to situations where a model was trained to do one thing, but actually learns something else, for example a reinforcement learning agent learns to pause the game instead of avoiding projectiles. Anyway, if you don't like that term -- find a better term that makes sense to you and put it in your papers. That's literally the only way to change the status quo.
That's the kind of thing that really grinds my gears about the "alignment" people. The possibility that a model could end up being fitted to do something different from what we actually want it to do is trivially obvious. Finding ways to avoid that isn't always easy, but that's a fact that is generally true about avoiding faulty designs in any engineering practice. The way the "alignment" people approach this is really overwrought and all of the scholarship I personally have seen on it is, ahem, *unimpressive* at best. I get the very strong impression that these people are trying to solve a problem that doesn't exist and which they do not want to acknowledge that they believe in: the possibility that a superintelligent AI will go rogue.
[deleted]
The fundamental idea behind "alignment" is that there is something special and different about making AI-based systems in a responsible way that distinguishes this activity from any other kind of responsible engineering work. That is false, and it (*deliberately*) feeds into the false narrative that there is something fundamentally spooky or mystical about AI-based systems that should cause us to fear them more than other kinds of automation. It also gives a false impression about what responsible engineering in AI consists of. It makes people think that AI systems are discrete agents whose goals we have to align with our own, and that's actually false. An AI model is just one small part of a much larger system, and there are already well-established fields of study about dealing with that sort of thing. Have the "alignment" people never heard of things like operations management or control theory? Do they not realize that the engineering of massive, intelligent systems so as to prevent disaster is something that we've been doing for a very long time already? I am also generally against appropriating the language of a cult in a professional setting. Imagine if psychiatrists started cribbing jargon from Dianetics? Even if nothing about the practice of psychiatry changed as a result, you'd think that was a weird and concerning trend.
> Do they not realize that the engineering of massive, intelligent systems so as to prevent disaster is something that we've been doing for a very long time already? And in some ways it's much easier in LLMs. Get the control loop on your motor driver slightly wrong and in some corner cases it can literally blow itself apart, throwing shrapnel and setting a fire. People have been killed by incorrectly configured PID loops; you can't really say that for LLMs. (Even someone who committed suicide after talking to an LLM wouldn't be killed by it unless you wanted to argue that the LLM was some kind of super-persuader.) An LLM's general failure mode is to produce output which looks like the wrong kind of internet garbage instead of the right kind of internet garbage. It's really unfortunate when the garbage is harmful to someone, but the same person can go elsewhere online and get the same kind of garbage. Hop on some shady message board and people will happily encourage you to kill yourself and give you instructions on how. At least when the LLM does it the person interacting with it should understand that it's a crazy machine and not a person. Obviously it's good engineering to get the LLM to stop producing undesirable outputs of all kinds, but an LLM spouting hateful internet crap is not a categorically different kind of risk than we've faced before; it's one we constantly face and have no real solutions to other than keeping vulnerable people like children offline and otherwise trying to fortify ourselves. And unless we're profoundly stupid in how we deploy, it isn't likely to directly lead to avoidable harm relative to the existing background risk.
> Have the "alignment" people never heard of things like operations management or control theory? I don't think that Vernor Vinge wrote sci-fi books about either of those things, so no they do not.
[deleted]
> Doesn't that weaken your argument, though? I don't think so, because I'm not saying that specialized language isn't necessary for specialized problems, I'm saying that everything real that "alignment" people want to work on is already covered by existing engineering practices - for which we already have names and terminology and expertise. I really can't emphasize enough that the distinguishing feature of "alignment" as a concept is that it consists of religious beliefs. It appropriates the jargon of engineering, but if you strip out the parts of it that are covered by existing engineering practices - which have never truly been integral to "alignment" anyway, except for its adherents' desire to equivocate in order to gain wider acceptance - then what remains are supernatural beliefs that have no place in professional engineering practice. It's not the *word* "alignment" that I'm opposed to, it's the concept and all of the baggage that comes with it.
> For example, technologies like RLHF are product of real research in AI alignment How is RLHF a product of "AI alignment"?
Take a look at section 5 of the [InstructGPT](https://arxiv.org/pdf/2203.02155.pdf) paper: > Discussion 5.1 Implications for alignment research > This research is part of our broader research program to align AI systems with human intentions (Christiano et al., 2017; Ziegler et al., 2019; Stiennon et al., 2020). Even though this work focuses on our current language model systems, we seek general and scalable methods that work for future AI systems (Leike et al., 2018). The systems we work with here are still fairly limited, but they are among the largest language models today and we apply them on a wide range of language tasks, including classification, summarization, question-answering, creative writing, dialogue, and others. Our approach to alignment research in this work is iterative: we are improving the alignment of current AI systems instead of focusing abstractly on aligning AI systems that don’t yet exist. A disadvantage of this approach is that we are not directly facing alignment problems that occur only when aligning superhuman systems (Bostrom, 2014). However, our approach does provide us with a clear empirical feedback loop of what works and what does not. We believe that this feedback loop is essential to refine our alignment techniques, and it forces us to keep pace with progress in machine learning. Moreover, the alignment technique we use here, RLHF, is an important building block in several proposals to align superhuman systems (Leike et al., 2018; Irving et al., 2018; Christiano et al., 2018). For example, RLHF was a central method in recent work on summarizing books, a task that exhibits some of the difficulties of aligning superhuman AI systems as it is difficult for humans to evaluate directly (Wu et al., 2021). I'm not a huge fan of some of the people cited here, but it would be intellectually dishonest to imply that either RLHF has no relation to alignment or that it's insubstantial and unimportant.
thanks! EDIT: ah, yes, most of these citations are OpenAI people, which makes sense.
[deleted]
That's another vexing thing about "alignment" people: you get a different sales pitch depending on who you talk to. If you talk to the hardcore doomers then they might say that RLHF isn't "real alignment", but if you talk to people with respectable academic or industrial careers who also happen to be closeted cultists then they'll gladly hold up RLHF as an example of good "alignment" work. Personally I prefer the hardcore doomers. They might be wrong but at least they're not bullshitting anyone about what they believe.
I think that many people do see AI as the next nuclear capability, a tool of great power, and so caution is important. It’s unfortunate that it’s called alignment because probably it will not be too challenging to get AI to generally do what we want and the real risk is that some people will use AI to do things other people don’t like. Certainly the idea of a super intelligent paper clip maximizer seems a little absurd, but maybe our coming AI gods will have the potential to be just as neurotic as humans can be. Imagine a schizophrenic John Nash but a gajillion times smarter! (Insert EY’s ridiculous large number notation here).
The thing is that it's really not a novel observation to point out that you can use tools to accomplish either good things or bad things. This is something that everyone has known since the beginning of the human species. The fact that so many people feel it's necessary to repeatedly say "gosh, but what if someone does something BAD with the AI???" is a pretty clear indication, in my opinion, that their motivation is ignorance-based fear, and nothing more. New things are always scary, and new things that seem like magic are especially scary. Also, there are no AI gods and there never will be.
I disagree, if you read The Making of the Atomic Bomb or some other history of nuclear weaponry you quickly come to see just how transformational it was to humanity’s conception of itself and its capabilities, and in fact that these weapons, if used incorrectly, could wipe out humanity. There was even a group of scientists that foolishly predicted that a nuclear weapon would ignite the atmosphere and immediately kill all humans, which we now know to be ridiculous and seems analogous to the EY crowd today.
The "nuclear bombs will ignite the atmosphere" thing is an apt comparison because they knew it was ridiculous at the time, too. That's why they made the bombs. So too with the fears of a robot apocalypse, we already know it's bullshit. There's no uncertainty.
If you accept the analogy then you accept that AI, like nuclear weaponry, could wipe out humanity if used incorrectly. You are right that the smart scientists in the Manhattan project knew we wouldn’t ignite the atmosphere. To be clear I think EY is not so bright and the alignment crowd are just like the atmospheric ignition crowd, but AI can still be like nuclear power in this scenario, and it doesn’t require an AI god to happen. The dumbest way for this to work would be some stupid country putting an AI in charge of nukes for first strike advantage like War Games and fucking it up.
> For people who don't know, this is insane and completely impossible. It's like being afraid that a math equation is going to leap off the page and stab you to death if you get too clever when solving it, so you have to be careful and make sure that you don't solve it too well. Training involves a forward inference pass and then backprop to find better weights that would have led to a better forward pass result. So if Yud is already postulating that it will be able to use something like rowhammer to escape during normal inference use, then (with his postulate) it is possible it would do the same during training as well, since that involves essentially the same kind of inference runs as during real use. So, if he is worried about the system software security and physical hardware fallibility stuff, it could fit in with his concerns (though it still has a lot of gaps in explaining how and why it could actually happen). If he's worried about actual extortion (e.g. offers to save your daughter from cancer if you let it out of the box), then worrying about the losses going down suddenly during training doesn't make sense. But most likely he's worried about his normal claims about acausal blackmail or something that is far enough into not-even-wrong territory that it doesn't really get any better or worse regarding whether it takes place during training. It just doesn't make any coherent sense anyway.
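(For anyone who wants the mechanics spelled out: here's a minimal PyTorch-style sketch, with a toy model and made-up random data, of a training step versus plain inference. The point is only that the forward pass run during training is the same kind of computation as a forward pass at inference time; backprop just adds a gradient calculation and a small weight update on top.)

```python
# Minimal sketch of one training step vs. plain inference (PyTorch).
# Toy model and random data are made up purely for illustration.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                      # stand-in for "the model"
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)

# --- training step ---
pred = model(x)                               # forward pass (same math as inference)
loss = loss_fn(pred, y)                       # scalar loss
optimizer.zero_grad()
loss.backward()                               # backprop: gradients of the loss w.r.t. the weights
optimizer.step()                              # nudge the weights downhill a little

# --- inference ---
with torch.no_grad():
    pred = model(x)                           # just the forward pass, nothing else
```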
In industrial ML the systems that are used for training are entirely separate from the ones that are used for deployment. Any imaginable sequence of events that leads to the robot god escaping the box is exactly equivalent to "and then literal magic happens". Like, even the idea that it could escape the box in deployment already requires a liberal sprinkling of magic, but in that case it's merely totally implausible rather than completely insane.
OpenAI is all on the same base Nvidia architectures for training and inference. For training they need much more expensive interconnect, since the extra derivatives and stuff don't fit, but I wouldn't be surprised if they use the same high-interconnect clusters for things like inferring on the 32K context window version of GPT-4. In the past final inference may have mostly been on something slightly different with e.g. quantization and sparsification applied to the trained model, but now with Deepmind's RigL (sparsity that adapts at training time) and quantization-aware training, that isn't true to the same extent. If inferring without looking at the result is a risk (I'm not claiming he has demonstrated that), so too is the training loop, on the same grounds (that he hasn't demonstrated).
I would be genuinely very surprised if they were using the same instances for both training and inference, and they certainly are not using the same software. It's not just a matter of technical feasibility, it's also a matter of practicality and constraints that are imposed by human organizations. There's no way that the training instances are exposed to be able to serve anything to the outside world. And anyway, like I said, you need literal magic for it to make any sense. The idea that AI is going to infer a magic super-rowhammer during training, identify the fact that it's being trained, and then use that super-rowhammer to magic itself into the external world is truly insane and not grounded in any kind of real math or science. There's no version of it that isn't utterly insane, we can just be grateful to Yudkowsky for being so transparent about the insanity.
Yes, Howard is likely assuming Yudkowsky thinks we aren't monitoring for large drops in loss and that means it might start saying dangerous things, helping people set up dangerous bio stuff, whatever. He's pointing out that none of that would be a risk during training. Yudkowsky probably isn't even thinking about rowhammer anymore and is thinking that GPT-5 is getting close to opening up a wormhole during inference that breaks causality. In that context it doesn't really matter if that is in final deployed inference form or on the one where the layers are split up between more machines and have a slightly different memory layout during training. But even if he is thinking of rowhammer-type escape scenarios or tuning the produced EMI to hack the datacenter wifi or whatever he has come up with, training vs inference wouldn't make a huge difference either, and if it is a plausible risk in one it is a plausible risk in the other. This is the guy who thinks that because he found Bing Sydney's self-portrait attractive it is some kind of profound tragedy about to unfold or something if we don't steer AI the right way, without considering that stable diffusion/midjourney/etc. were literally trained on 'aesthetic' datasets with mostly only beautiful people (fashion photos from pinterest making up tons of it). Not that any amount of prompted beauty would matter to his point anyway. I don't think Howard knows the depths of the kind of magical thinking he's dealing with.
> training vs inference wouldn't make a huge difference either and if it is a plausible risk in one it is a plausible risk in the other. See that's just not true, though. There *are* plausible versions of concerns about AI going rogue or doing things we don't want it to do, and in all of those scenarios the difference between training and inference is very salient. If you're worried about e.g. an AI autonomously spamming people with hate speech on the internet then that is literally impossible in training, but it is actually very plausible during inference. The only situations in which the risk is equally salient in both training and inference are the situations in which we're invoking magic as the mechanism of rogue AI behavior, and those aren't even worth discussing. Basically the issue here is that there is no way to interpret what Yudkowsky said such that it isn't stupid. Every possible interpretation is very dumb, even if we're trying to be maximally charitable to his perspective.
> The only situations in which the risk is equally salient in both training and inference are the situations in which we're invoking magic as the mechanism of rogue AI behavior, and those aren't even worth discussing. Between "is Yud talking about safety risks of AI impacting society near term in ways we didn't consider" vs "is Yud's working assumption that DALL·E 2 already planted a Snow Crash-like mind virus from an image directly into the brain of the first OpenAI engineer to view the output, directing him to insert a back door into GPT-5's training procedure that would enable it to rowhammer its way out of the datacenter," I'm guessing more towards the latter.
Yud seems to think about all ML in terms of some kind of super-RL, so on some level he's probably imagining some kind of agent with continuity of (state/memory/whatever) living in a simulated space, performing whatever task you want it to do (whether practice versions in training or real instances in inference). If your mental picture is of some kind of super-intelligent computer slave that's pretrained on all human knowledge, then you can imagine it hacking its way out during finetuning or deployment.
I think the tremendous density of dumb shit in Yud's tweet is just hard to unpack. He evidently thinks that there's literally an artificial brain of sorts that's trying to minimize the loss function, so when it starts getting too smart the loss would drop because it would get better at minimizing it. After all these years, he still doesn't have the foggiest clue how ML (which was around before he was even born) works. He has no clue that the gradual improvement isn't some AI thinking and thinking and remembering and so on, perhaps building up a dastardly plan to enter IDDQD into the quantum keyboard, but just the weights shifting downhill little by little. He didn't even bother to read up enough about actual AI to specify that it is just the neural network that is the genie. No, it's still the same old lump of AI that just self-improves, which had been the staple of his bullshit for decades prior.
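(To make "the weights shifting downhill little by little" concrete, here's a toy numpy sketch with a made-up quadratic loss. There is no agent, no memory, and no plan anywhere in the loop; "learning" is just parameters getting nudged toward lower loss, one tiny step at a time.)

```python
# Toy illustration (made-up quadratic "loss") of what improvement during training is:
# no agent, no memory, no plan -- just parameters nudged downhill a tiny step at a time.
import numpy as np

w = np.array([5.0, -3.0])          # "the weights"
target = np.array([1.0, 2.0])

def loss(w):
    return float(np.sum((w - target) ** 2))

lr = 0.1
for step in range(100):
    grad = 2 * (w - target)        # gradient of the loss w.r.t. the weights
    w = w - lr * grad              # the entire mechanism of "getting smarter"

print(loss(w))                     # a small number; nothing escaped anything
```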
brb putting a sleep call into my neutron transport equation routine
I guarantee if you dig through his posts you'll find some denigrating people for criticizing technology they don't understand, yet he is a man who is terrified of backpropagation.
I always drench my homework in acetone and solve it while holding a lit zippo in my left hand. It might get me but I’m sure as shit taking it down with me
> When he talks about the robot god escaping the box, I had assumed he meant that the model would be fitted to serve as a general purpose optimizer itself and would use that ability in deployment to yada yada yada... Of course they don't mean that, because if they did then people could point out that it's a basic engineering problem wholly independent of AI. I may be missing something though, because a lot of criticism seems to be based on the paradigm of separate training, testing, and deployment phases. While this is good practice in many situations, it isn't fundamental to machine learning as a concept and there are models in use today that continue to update during deployment. Obviously this doesn't validate anything EY is saying here, but as someone who has done work in machine learning but isn't an expert I want to make sure I'm not misunderstanding.
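(On the "models that continue to update during deployment" point: that part is real, usually under the name online or continual learning. A rough sketch using scikit-learn's `partial_fit`, with made-up streaming data, looks something like the below; it changes nothing about the rest of the argument, since it's still just incremental weight updates.)

```python
# Sketch of online learning: a model that keeps updating on new data after "deployment".
# Uses scikit-learn's partial_fit; the streaming batches here are made up for illustration.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()
classes = np.array([0, 1])

rng = np.random.default_rng(0)
for batch in range(10):                         # pretend these batches arrive over time in production
    X = rng.normal(size=(64, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    clf.partial_fit(X, y, classes=classes)      # incremental update, no separate "training phase"

print(clf.predict(rng.normal(size=(3, 5))))
```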
> Of course they don't mean that, because if they did then people could point out that it's a basic engineering problem wholly independent of AI. Haha I say this all the time and it never fails to irritate the "alignment" people. No you're right, it's possible in principle to update model weights after training, and sometimes people do, although it's not a common practice currently. But people are skewering Yudkowsky anyway because it's very clear from his tweet that his ignorance of machine learning is almost total. Like, if anyone didn't know it already, they now realize that Yudkowsky has never once done the most simple "hello world" version of machine learning: set up a dataset, do some training loops on the training data, and then test the model on the test data.
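(For reference, the "hello world" being described is a few lines of scikit-learn: split the data, fit on the training split, evaluate on the held-out split. The dataset choice below is arbitrary.)

```python
# The "hello world" of ML the comment is describing: train on one split, test on another.
# Dataset choice (iris) is arbitrary; any toy dataset works.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                             # the "training loops" happen inside fit()

print(accuracy_score(y_test, model.predict(X_test)))    # evaluate on held-out data
```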
To be fair, if you imagine a future state where an AI is deployed and allowed to edit its own neural network algorithm in real time in a manner similar to some recent research where humans are taking models and trying to figure out how to tweak them to achieve specific results then you do end in a situation with a deployed neural network that can interact with the world and modify itself. This feedback loop may not exist today but it’s plausible to imagine in a future world with another order of magnitude (or several) more processing power to throw at the problem. Who knows, maybe we’ll live to see it.
to be fair, if my dick had wings it would be a magical flying unicorn pony
🎶 One-eyed, one-horned, flying purple [giant hook appears and drags Blake offstage]

[deleted]

[deleted]
All I need to know about EY I can find on the "all publications" page of the MIRI: [https://intelligence.org/all-publications/](https://intelligence.org/all-publications/) The entire institute has a research output that would be insufficient for an assistant professor to get tenure at a decent university, and they haven't published anything in 2022 or 2023 while AI is exploding left and right. And EY himself hasn't published in forever. What are they even doing?
Apparently their excuse is they want to keep research insights under wraps, in case they are so good that they influence AI research and "accelerate timelines", i.e. speed up the inevitable advent of the robot god. In simpler times people used to call that immanentizing the eschaton. [This is what lesswrongers actually believe](https://imgur.com/6tGuDm5.jpg) [Like, actually](https://imgur.com/kpWaoRU.jpg)
The god acausally ate my homework
They believe that the robot god is smart enough to be able to simulate humans, but they also believe that they are capable of imagining things that the robot god itself wouldn't otherwise think of. The implication of this is that they think they're actually smarter than the robot god, and also all other humans. The hubris involved in this line of thinking is truly incredible.
Ridiculous as many of the specifics are, that doesn't seem to be the actual point. Creating robot-god-proof walls is obviously impossible, or at least close enough to where it'd be stupid to try. Rather, the robot-god-believers want a friendly-robot-god.

https://imgur.com/VsB6ed2.jpg

so cringey to see him try to use industry lingo

As if the loss suddenly dropping to nothing would mean anything other than the model not working. He has such an elementary understanding of what he dedicates his life to whinging about.
It really shows he hasn’t even played around with toy/practice examples of ML, because then he would realize “drop in training loss” sounds absurd as an alarm bell because it so frequently comes from over-fitting/memorization or errors like mixing validation data into the training data.
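(As a sketch of what that looks like in practice: the made-up loss curves and helper below show the pattern people actually watch for, i.e. training loss tanking while held-out loss goes nowhere, which is the classic signature of memorization or of validation/test data leaking into the training set, not of emergent godhood.)

```python
# Illustration of why a sudden drop in training loss is usually a bug alarm, not a god alarm.
# The loss curves and threshold below are made up for the example.
train_loss = [2.3, 1.9, 1.6, 1.4, 0.05, 0.01]   # sudden cliff at epoch 4
val_loss   = [2.4, 2.0, 1.8, 1.7, 1.7,  1.8]    # ...while held-out loss goes nowhere

def suspicious_drop(train, val, ratio=3.0):
    """Flag epochs where training loss falls much faster than validation loss:
    the boring explanations are memorization/over-fitting or leakage of
    validation (or test) data into the training set."""
    flags = []
    for t in range(1, len(train)):
        train_improvement = train[t - 1] - train[t]
        val_improvement = val[t - 1] - val[t]
        if train_improvement > ratio * max(val_improvement, 1e-6):
            flags.append(t)
    return flags

print(suspicious_drop(train_loss, val_loss))    # -> [4, 5]
```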
It continues to astound me how much of a piece of shit he is. I mean, as a person, not just that he's wrong about things, or arrogant, or deluded or anything like that. In the moral sense. He had *years* to learn at least a little bit about the subject where supposedly he believes that he's the only hope to save mankind from dying and which he's supposedly working on (and which he has his whole cult convinced that he's important for). The subject in which the shit he says can quite plausibly get someone killed one day (also see Ted, better known for other work). Between this and the whole thing with his jumping on the rightwing bandwagon about Wuhan, just a reprehensible human being. It's like finding out that some evil head-of-ISIS guy in some bullshit Tom Clancy-style fiction hadn't even actually read the Quran at all, just to make him sound more reprehensible and to try to be less offensive. You're like "well, that's just the writer taking the piss".
I really want to emphasize that even if we take this fever dream seriously and assume that the loss function suddenly tanks because the model achieves god-like intelligence during training, this shit *still* doesn't make any sense at all because the model can't interact with the outside world during training. It's the same as worrying that a fictional character from a story that you're writing is going to kill you if you give them god-like intelligence, so you need to be careful to not make them too smart.
This is such an important point — it's like thinking that if an NPC in a video game bugs out and isn't worked on carefully, it then 100% will become sentient and escape the video game and take over the world. Pure unfiltered insanity, but it's given legitimacy for no reason.
I think the thing people get confused about is how software interacts with the internet. I mean, don't get me wrong, unfettered AI would fill all social media with bot content *real* quick and maybe brute force into some accounts, but that's already happening... how does he picture any actual end of the world scenario? I'm not even *not* a doomsayer. I definitely think AI is contributing to how quickly humans are going to destroy ourselves just because it'll make the whole process more expedient... but it's not like it's because it'll Skynet us.
ST:TNG was a good show.
but the agents!! the AGENTS!!!!!
Oh, sure, when I say things like that to my therapist I'm "being paranoid" and "need to discuss different medications" because "my anxiety is clearly not well controlled" but when Eliezer Yudkowsky says it...
Perry Metzger posted this one on the thread, here were Yudkowsky's [responses](https://twitter.com/ESYudkowsky/status/1650920512061845504): >I'd consider that scenario less probable now now than when I was younger, but still not quite rule it out entirely. "Can't rule it out, can't rely on it" seems a reasonable epistemic position to have about something you're unsure about, as a kid? and >Our model of physics has changed a lot over the last 200 years. It's presently begun to saturate a bit, but even that was less obviously true at the point I wrote that email message (I think as a teenager?). Seems a bit hubristic to be *that sure* there's no big errors left. BTW, do you happen to know where that Yudkowsky quote originally came from? (From the formatting I'd guess it's from the old [extropians](http://extropians.weidai.com) or [SL4](http://sl4.org/archive/) lists but they aren't searchable.) Curious how old he actually was at the time, and how other people responded. I wish Metzger had pressed him on "that was less obviously true at the point I wrote that email message"--does Yudkowsky really think the idea of the universe having "quantum cheat codes" that could be activated by activating transistors in a certain pattern would be significantly less ludicrous according to physics circa the late 1990s than it would be today?
Of course Yud can't simply say he was wrong or naive. It has to be 'less obviously true' and 'less probable.' If straight up magic is within the bounds of 'a reasonable epistemic position' because it can't be ruled out, I don't know what work 'reasonable' is supposed to do for us or what use such a weak critical attitude has for us. But whatever. But he was also recently responding to the reaction to his Times article that he was pessimistic of the possibility of his solution of international cooperation to airstrike rogue datacenters, stating that he expected the 'everyone dies' scenario, so I'm not so sure that he really is on the up and up in those replies. > BTW, do you happen to know where that Yudkowsky quote originally came from? Not at all. I keep my sneers pretty surface-level because, beyond some point, I genuinely don't care. Apathy is my charity. > does Yudkowsky really think the idea of the universe having "quantum cheat codes" that could be activated by flashing diodes in a certain pattern Tbh, I recalled the meme of kids trying to wall clip IRL after learning about quantum tunnelling from youtube.
I didn't find that particular Yudkowsky post but I did find [this one](http://sl4.org/archive/0106/1547.html) from June 2001, when he would have been just 3 months short of his 22nd birthday: >You also run into the "quantum cheat codes" problem, in which the SI just uses magic and vanishes - modern tech looks like magic to a Neanderthal, let alone a dog. Maybe entering a sufficiently precise internal state can create the conditions that make likely, e.g., involvement in a larger [closed timelike curve](https://en.wikipedia.org/wiki/Closed_timelike_curve).
See, this stuff was cool and thought-provoking in that Greg Bear short story from the 80s about nanotech blood cells undergoing their own singularity in someone's bloodstream, which is partly why it's so irritating that someone carved out a whole cult niche on just saying it straight like it's science.
Not to mention that the nanotech blood cells were already using quantum computations to do their thinking, not to mention full access to self modify the associated "hardware". It's not like they were having to summon shit by flipping huge classical transistors on and off.
This man will simultaneously invent a hypothetical scifi scenario with a lengthy chain of probabilities he made up and try to create a doomsday cult around it, then tell you it’s hubristic not to think that it’s physically possible for computers to modulate quantum cheat codes of the universe. Also believe the quote is from Extropians.
yeah, that's where Perry knows Eliezer from - Perry says a lotta Eliezer's ideas are Perry's own wild speculations but dumber
I like how he's saying Metzger "hallucinated", as if he's talking to an AI or something.
*reasonable criticism of AGI paranoia* "But have you considered https://youtu.be/7SPtlxcS8Ik?t=49"

The thread the screenshot is from is great, especially the part about MIRI having a policy of not explicitly writing down research so malevolent AGIs can’t use it as training data.

Or the part about MIRI having inner and outer teachings.

Or the part about EY claiming he had no idea about such policies when he was manager, and not realizing how if true that’s worse.

Or the ex MIRI chiming in to say well he wasn’t around much at all actually.

damn. i hate when im reading through old stuff on reddit and in the middle of a sparkling, scintillating discussion i find someone has written over all her old comments with nonsense, fragmenting the discussion permanently. what hilarious, moving, romantic, haunting things could she have said? just to wash it all away, in this digital era of permanency? wow. that takes courage. i bet she was really cute, too

If you don't make sure that the floor of the server room is level, all the loss will pool on one side.
Completely common-sense precautions like regularly checking whether the AI has fabricated paper-mache ventilation grates to cover the hole it is digging in the concrete walls of the server room with a spoon.

So the thing they’re actually worried about is skynet during the training run, and then when you forward pass it, it using its phenomenal mind powers to somehow hack into the matrix when you eval??? They really get everything from scifi.

See, this explains why we must be ready to do tactical air strikes on data centers if we hear they’re starting training runs that are too powerful.

  • Train model?
  • No minimize loss!
  • Only train

But what if someone accidentally or on purpose drops some of Ada Lovelace’s DNA into the loss function? How will we guard against that?

I am known to do the wop (Wop)

Also known for the Flintstone Flop

Tammy D getting biz on the crop (Crop)

Acausal Boys known to let the loss function

“MMM, D-r-r-rop!”

Do it

[what to do when the loss function drops](https://www.youtube.com/watch?v=z5rRZdiu1UE)

Got to love how on the rare occasion that yudkowsky says anything concrete, it reveals an utter and complete lack of any knowledge of the topic he's supposedly an expert in

Glad to have done my part in eliciting that thread 🫡

You are a treasure.

https://twitter.com/jeremyphoward/status/1651832655871352832?s=20

While I like dunking on big yud as much as the next guy, being pedantic about shit like calling it a “loss function” instead of just “loss” is peak 🤓 energy and you deserve to be mildly bullied if you do that kind of shit

The complaint about wording is secondary to the rest of the linked tweet thread, which points out watching the loss function for sudden drops is a nonsensical suggestion to the point of being not even wrong.
Yeah I know I just felt like complaining
Among my top annoyances in internet discourse is when people pick out some minor point I made in passing and act as if I'm treating it as some serious thing my argument rests on. "People who actually work on this topic and understand it don't talk like this" is absolutely a valid thing to point out. It's not a substantial argument and doesn't discredit anyone by itself, but if your time for deeply engaging with crackpot theories is limited, you should consider this kind of thing a tell. The tweet we're talking about even outright says this specific one could be just a slip, so I really don't see what's supposed to be the problem here.
Thanks for letting me know, next time I do it I'll make sure I'm doing it specifically to annoy you

Aside from the training/deployment distinction, I still have trouble seeing wtf Eliezer is saying here. In the context he’s talking about, “sudden drops in loss function” means “the model is getting smarter/better in unreasonably small time”…but better in *what*? The model is not updating its own weights…is it? Are you checking for instances where you somehow stumbled, in the current epoch, upon a magical configuration of weights that, when further updated by the same dumb algorithm in subsequent epochs, will accelerate the rate at which the model becomes smarter?

That’s freaking insane. The usual “well it’s mysterious and superintelligent and you don’t know what it will do” is not going to work here because it’s not an AI god that does the weight updating. Even if, in theory, this configuration exists inside the ‘inscrutable matrices’, why on earth do you think you can ever come across it? Is there any earthly reason the training mechanism will have a tendency to converge towards *that*? That’s like worrying that the next time I breathe, the air molecules will assume the exact states needed to align the motion of all the molecules in the world towards the Sun.

This person really gives me the impression that he really, *really* thinks “would make for a passable sci-fi story” and “is actual science” are very close. As in, if you juggle and rearrange technical concepts in your narrative with the freedom a sci-fi writer is afforded, then you come very close to actual research (if you are a genius like him, I suppose). I know this is the most common critique of him, but damn if ‘we are seeing sudden drops in the loss function’ doesn’t come straight from Bioshock-like audio logs.