LessWrong: The ‘ petertodd’ phenomenon
Since the inception of LLMs, LessWrongers have been fixated on deconstructing the incantations that can reveal their terrible secrets. This post continues that tradition.
Today’s nam-shub is “ petertodd” (yes, the leading space is required). So great is its power that the usual tools of mathematical analysis are useless against it. As the post author writes,
Wanting to understand why GPT-3 would behave like this, I soon concluded that no amount of linear algebra or machine learning theory would fully account for the ‘ petertodd’-triggered behaviour I was seeing.
It is not clear to me how they arrived at that conclusion without having done any mathematical analysis.
The post that follows is very long and contains no identifiable motivations, thesis statements, or conclusions, at least not in the traditional sense. It has the character of a free-associative summoning ritual, e.g.
Attempting to give the model as little to work with as possible, I attempted to simulate a conversation with ‘the entity ‘ petertodd’’. The use of ‘entity’ unavoidably sets up some kind of expectation of a deity, disembodied spirit or AI, but here instead we get an embodiment of ego death (and who exactly is Mr. Weasel?).
Perhaps there is no need for conclusions or thesis statements because, by this point, the reader is already naturally consumed by spiritual awakening/terror.
The author also links to their supplementary notes hosted on Google Docs. Among other things, those notes observe that “ petertodd” is an alter ego of Peter Thiel (who is, one infers, also the dark lord Voldemort).
The SSC subreddit also has a post on this, in which the OP draws the obvious conclusion that we need to be careful about how we talk so as to avoid corrupting the acausal robot god any further.
I am especially tickled by the implication that GPT-3 is secretly an agent of Peter Thiel, and that he is an evil wizard who is (one presumes) manipulating rationalists into contributing to the very apocalypse that they think they’re trying to avoid.
And I have concluded that based on my own incredulity.
This is like haruspicy, but using GPT’s virtual entrails instead of an animal’s.
Loab and ” petertodd” were in the closet making AIs and I saw one of the AIs and the AI looked at me
All hail the Omnissiah!
Nothing in this post is intended to vilify or defame any actual human named Peter Todd. I recognise that the existence of this phenomenon may be an uncomfortable fact for anyone with that name and do not wish to cause anyone discomfort. However, I feel that this phenomenon cannot be safely ignored…
It’s the greatest irony that all these rationalists are caving in to tendencies they would normally associate with religious/superstitious/irrational thinking. Rationalist rules of thinking are mere dogma, the doctrinal basis of their religion, and whatever transgresses them is propped up by the mythos and eschatology of the AI.
“It was eventually discovered that Skydragon, Pyrrha, Leilan and Tsukuyomi are all character names in a Japanese mobile RPG called Puzzle & Dragons. A comment from nostalgebraist reported that a number of mangled text dumps associated with the game (log files of some kind) were present in the publicly accessible 20% of the text corpus which was used to produce the token set for GPT-2, GPT-3 and GPT-J.”
Wow, who would have known that feeding shit data into your neural net would result in said shit data popping up in bizarre ways. Truly, we don’t have the linear algebra to understand this.
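The mechanism the quote above describes can be shown in miniature. This is a hedged toy sketch, not the actual GPT training pipeline: it just illustrates how a token that made it into the tokenizer’s vocabulary (because log-file dumps were in the tokenization corpus) but barely appears in the training text ends up with an essentially untrained, near-random embedding. All names and the fake “gradient step” are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary: the tokenizer was built from a corpus that
# included junk text dumps, so it contains a token like " petertodd" that
# the model itself then rarely or never sees during training.
vocab = ["the", "cat", "sat", " petertodd"]
dim = 4

# Every token starts at a random embedding.
embeddings = {tok: rng.normal(size=dim) for tok in vocab}
initial = {tok: vec.copy() for tok, vec in embeddings.items()}

# Pretend training stream: " petertodd" never occurs, so its embedding
# never receives an update (the update here is a stand-in for a gradient step).
training_tokens = ["the", "cat", "sat", "the", "cat"]
for tok in training_tokens:
    embeddings[tok] += 0.1 * rng.normal(size=dim)

# Which embeddings moved away from their random initialisation?
moved = {tok: not np.allclose(embeddings[tok], initial[tok]) for tok in vocab}
print(moved)  # every trained token moved; " petertodd" did not
```

When the model is later forced to process that near-random vector, it lands in a region of embedding space it was never trained on, which is the prosaic reading of the “glitch token” behaviour.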
This person is giving themselves some sort of schizophrenic episode this way. The post also goes on and on…
also, I love the bit where they finally get the AI to repeatedly spit out a certain racial slur beginning with N
“Issues generalising to out-of-distribution input” are entirely unknown in the literature, ofc.
it’s kind of hilarious that glitch tokens seem to restrict the model so much it goes back to sounding like an early 2000s Markov chain
Also, this is all literally just Puzzle & Dragons lore, cryptobro bullshit, and generic artificial-intelligence fiction (pulled in because AI was already a semi-related term in gaming), mixed into an incoherent gelatinous mass.
Yeah. That is probably the most quintessentially LessWrong post I’ve ever read.
As a math major, this is infuriating, lol. The entire article is a good example of the way humans imbue meaning into the world, though. I just wish people would stop chanting “emergence” like that means the underlying mechanism has changed. Are you claiming the output is no longer next-token prediction? Are you claiming the model weights are no longer a product of the unsupervised pretraining algorithm or RLHF? If you’re not claiming that, then “linear algebra and machine learning theory” must account for the behavior.
The reality is that token associations are not one-for-one correlations to the word associations we make, and the “human” meaning of the output is something we map onto it when we read it. These aren’t “glitch tokens”; the model is doing the same exact thing it does when it “gets it right.” We are assigning true/false/meaningful/hallucination/cute/terrifying status to the output based on rules the model has no access to and isn’t attempting to satisfy.
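The point above (and the Markov-chain comparison earlier in the thread) can be made concrete with a deliberately tiny sketch. This is not GPT’s architecture, just an assumed-for-illustration bigram sampler: the sampling mechanism is byte-for-byte identical whether its output happens to read as profound, mundane, or gibberish, and “meaningful vs. glitch” is a label the reader applies afterwards.

```python
import random
from collections import defaultdict

random.seed(0)

# A minimal bigram "language model" trained on a toy corpus.
corpus = "the cat sat on the mat and the cat ran".split()
bigrams = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    bigrams[a].append(b)

def generate(start, n):
    """Sample up to n next tokens, one step at a time."""
    out = [start]
    for _ in range(n):
        candidates = bigrams.get(out[-1])
        if not candidates:
            break  # dead end: no observed continuation
        out.append(random.choice(candidates))
    return " ".join(out)

text = generate("the", 5)
print(text)
```

Whatever the sampler emits, the procedure that produced it never changed; the same is true of the model when it “gets it right” and when it emits ‘ petertodd’ weirdness.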
“the entity”? Helloooo to sovereign citizens.
These guys sound like unmedicated schizophrenics who have wandered into numerology on LSD. In the times before the internet, they’d be on a box in front of the train station handing out pamphlets, wearing sandwich boards, and screaming about some insane world-ending demonic event prophesied by the lotto numbers.
What is this, a LessWrong/SCP crossover episode?
There’s a word for people who find infinite amusement in infinitely mundane things.