https://twitter.com/ESYudkowsky/status/1623505822734245890
https://preview.redd.it/5h35uix5u4ha1.png?width=1122&format=png&auto=webp&v=enabled&s=009a35042741f6f8828cd3871880dbb062e8d2e3
do people still say “totally” when they mean “like, for sure”?
Oh my God, he’s saying that human brains work just like GPT language models and can be hacked in the exact same way.
https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation
what is this even supposed to mean
I wonder if he also thinks our eyes have a framerate
It explains a lot about Rationalists if you realize that they think machine-learning models are the same as human brains, just with a potentially infinite IQ
Apparently referring to this write-up of a weird ChatGPT phenomenon:
https://www.lesswrong.com/posts/LAxAmooK4uDfWmbep/anomalous-tokens-reveal-the-original-identities-of-instruct
Long story short: certain tokens occupy coincidental locations in the model’s vector space, which causes it to exhibit odd behaviors in its responses. Kind of interesting actually.
However, it’s pretty obvious that humans often exhibit weird responses to certain inputs as well? Like on one hand there are fetishes, on the other there are people who discover an obscure branch of math and decide to spend decades studying it.
Edit: oh this link is more informative. But it has nothing to do with people experimenting randomly with 100s of copies and everything to do with being able to directly analyze the model to discover these weird tokens.
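For the curious, here is a rough toy sketch of what "directly analyze the model" could look like. It assumes (my reading of the linked post, not something I've verified) that the weird tokens tend to sit close to the centroid of the embedding table. The vectors, the table, and the function names are all made up for illustration; a real model has tens of thousands of tokens and far more dimensions.

```python
import math

# Toy embedding table: token -> vector. All numbers are invented for illustration.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "run": [-0.7, 0.6, 0.2],
    "blue": [0.1, -0.9, 0.3],
    "math": [-0.9, 0.1, -0.6],
    # Hypothetical barely-trained token, placed near the middle on purpose.
    " SolidGoldMagikarp": [0.05, 0.02, 0.01],
}

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def tokens_nearest_centroid(table, k=1):
    """Return the k tokens whose vectors lie closest to the table's centroid."""
    c = centroid(list(table.values()))
    ranked = sorted(table, key=lambda tok: distance(table[tok], c))
    return ranked[:k]

print(tokens_nearest_centroid(embeddings, k=1))
```

The point is only that this is a lookup over the model's own weights, not trial and error against hundreds of running copies.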
People totally still say that.
What the hell is “SolidGoldMagikarp”? Is it a niche concept even among rationalists?
I’m a bit new to sneer club, but I’m surprised rationalists would go along with this. None of them can imagine something weirder than “SolidGoldMagikarp,” making the assumption “YOU don’t know it” patently false? Wouldn’t they think too highly of their imaginations to accept this?
Eliezer Yudkowsky, brain scientist
If the brain doesn’t have weird exploitable internal mechanisms, then how does dath Ilan talk-control work? And how is the AI supposed to talk its way out of the box?
/s
totes