So anyway the basilisk has recently taken to delivering my eternal simulated torture in the form of the mind numbing Fridman/Big Yud interview and listening to their content-free ramblings about whether or not there’s a little person inside GPT doing the thinking, and the sonorous nonsense put me in a trance-like state in which I came to a maybe simple realization: when AI dorks describe the black box problem as if we don’t know what the AI is “doing”, they are giving away the fact that they don’t know even the most fundamental principles of the technique. We know exactly what these models are doing: matrix multiplication. They’re taking an input vector and multiplying it by a big matrix to map it to an output vector. That’s it. It’s basic linear algebra at scale. By acting like that is some mysterious, incomprehensible machine thought process they’re obfuscating just how dumb and mechanistic the thing being done actually is. No competent undergrad in any math program anywhere would claim to be so mystified by what a matrix multiplication “is really doing”, and yet these AI charlatans get away with acting like it’s anything like thought. Hell, the obfuscation is so successful that despite knowing exactly this fact, I’ve repeated the “black box problem” as if it were a real problem myself. It’s literally just linear algebra, what the fuck.
So, on the one hand, there is sort of a black box problem in the sense that it’s not always easy to pinpoint exactly which features of an input are being used to arrive at a model’s output, and there are a lot of smart people who spend their time trying to figure that out. Like sure yes it’s “just matrix multiplication”, but thanks to the universal approximation theorem and whatever, we know that matrix multiplications interleaved with nonlinearities can stand in for basically any function at all, so it’s a nontrivial issue to figure out why a model ends up using the particular matrices it does. Thought itself is undoubtedly representable in linear algebraic terms (whether that’s what ChatGPT is doing is another matter, though).
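For a sense of what that figuring-out looks like in practice, here’s a minimal sketch of one of the simpler interpretability tricks, input-gradient attribution. The toy model, the sizes, and the PyTorch framing are all my own assumptions for illustration, not anything from a real system:

```python
import torch
import torch.nn as nn

# Stand-in for "a trained model": a tiny MLP with made-up sizes.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.GELU(),
    nn.Linear(16, 1),
)

x = torch.randn(8, requires_grad=True)  # one input vector
y = model(x).sum()                      # reduce to a scalar so we can backprop
y.backward()                            # gradients flow back to the input

# Entries of x.grad with large magnitude are the input features the output is
# most sensitive to; a crude answer to "which qualities of the input is the
# model using?", and already nontrivial to read off at real-model scale.
print(x.grad)
```

The answer you get back is just another pile of numbers, and turning that pile into a human-level explanation of what the model is “using” is the actual open problem.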
On the other hand, it is also true that when rationalists lament being mystified by neural networks it’s often because they don’t know even the most basic elements of linear algebra and calculus. This is especially true of Eliezer Yudkowsky.
It’s not just matrix multiplication – there’s layernorm, GELU and self-attention in there too.
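Roughly, one block of a GPT-style model looks like the sketch below. This is a hand-rolled toy (single block, made-up dimensions, untrained, PyTorch assumed), not anyone’s actual architecture, but it shows the layernorms, the self-attention and the GELU MLP sitting right alongside the matrix multiplications:

```python
import torch
import torch.nn as nn

class TinyTransformerBlock(nn.Module):
    """One pre-norm transformer block: layernorm -> self-attention -> layernorm -> GELU MLP."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),                        # nonlinearity: not a matrix multiplication
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                     # x: (batch, seq_len, d_model)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h)      # self-attention: softmax over scores, also not linear
        x = x + attn_out                      # residual connection
        x = x + self.mlp(self.ln2(x))         # residual connection
        return x

block = TinyTransformerBlock()
tokens = torch.randn(1, 10, 64)               # a fake sequence of 10 token embeddings
print(block(tokens).shape)                    # torch.Size([1, 10, 64])
```

Real models stack dozens of these and add causal masking plus token embedding/unembedding on either end, but the point stands: plenty of matrix multiplications, and plenty of things that aren’t.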
In some sense it’s “just math”, but in another sense the math has a very interesting emergent behavior. It is very hard to point out where the recipe for gluten-free peanut butter cookies exists in the trained model, or to describe how it is encoded.
Yudkowsky submits that unless or until we fully understand how information is encoded in a large language model, we can’t characterize its behavior – which seems rather silly to me, since we hardly understand the human brain at a similar level, and yet we have no problems characterizing the behavior of other humans.
This proves way too much: it’s like saying that you know “what [any] program is doing” once you read the Intel chip spec and know the individual CPU operations. Yeah, sure, the operations come from a simple set, applied at scale. But how do these operations in this order lead to the behaviour we see? How could you look at the structure of these operations and identify likely failure risks of the whole system? If/when you do identify a problem, how do you identify a roughly minimal change to the program that fixes it? Saying “they’re matrix multiplications” doesn’t give you any of the control or predictability that we expect from an ‘understood’ technology. No engineer would claim to “know how the bridge is built” by just writing down Newton’s laws; the need for higher-level explanations here is a real concern.
Well, not exactly, it’s a sequence of matrix multiplications and nonlinear functions, which can approximate whatever function it wants to.
The function it’s approximating is, in the case of GPT-4, given by the set of human writings it’s trained on.
For it to be intelligent, it would have to be approximating the process that led to the creation of those writings, but that of course cannot be done, because the writings were produced by a very large number of people based on their experiences and interactions with the real world.
Lesswrongers love to talk about brains as “computing” something. Well, a very large amount of computing power (not just human, but also that of the world we observe) went into the creation of those writings. Enormously greater than what goes into an “AI”.
This isn’t a situation where you’re trying to approximate a function from a bunch of samples and you end up with a somehow-equivalent function rather than a lookup table. This is a situation where you end up with a glorified lookup table.
An actual intelligence wouldn’t even be good at the metrics that are being maximized during training.
What is particularly ridiculous about lesswrongers is that they are the most spooked by the most database-ish and least intelligence-ish applications of neural networks.
Generative AI, in general, is having a heyday for the same reason, in a broad sense.
We don’t have a good language to explain this even though there is, obviously, some relevant structure. The fact that we’ve been able to identify *some* relevant features in the systems of our language and vision, but have rarely been able to articulate the *system* of those features, is precisely the problem we’re seeing.
We invented the thing without having the language to explain it. This is a relevant and interesting problem inherent to what we’re seeing, but Yud and the like would love us to think that “couldn’t do it before, can’t do it now” == “can never do it, time-traveling super AI destroys us all.”
The collective cognitive dissonance we’re having is precisely because we don’t have a good language to explain what we’re seeing, so it appears to be magic. In the midst of uncertainty, and with a little priming from sci-fi doomers, unfortunately, fear is stronger than imagination.
For one, AI models aren’t merely doing matrix multiplication. Even in a simple neural network, we compose with functions like the sigmoid or ReLU, which aren’t linear (and hence have no matrix representation).
This is why it starts to act like a black box: it isn’t a mere composition of linear functions anymore (a composition of linear maps is itself linear, and hence just a single matrix). And now comes the real mathematical marvel: if your activation function isn’t a polynomial (so sigmoid and ReLU both qualify), then you can approximate any continuous function by sandwiching that activation between affine maps (matrices plus shifts) and tuning their parameters. This is precisely the content of the celebrated Universal Approximation Theorem.
So we can approximate (locally uniformly) any continuous function, which can be something very bizarre, for example the Weierstrass function. So not only are AI models a black box, it is unsurprising that they are one.
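To see the theorem doing some work, here’s a tiny sketch with a toy target function and sizes picked out of thin air: one matrix, one ReLU, one more matrix, fit by gradient descent to a deliberately wiggly curve:

```python
import math
import torch
import torch.nn as nn

# Target: continuous but decidedly non-linear on [-pi, pi].
x = torch.linspace(-math.pi, math.pi, 512).unsqueeze(1)
y = torch.sin(3 * x) + 0.5 * torch.cos(7 * x)

# One hidden layer: matrix multiply -> ReLU -> matrix multiply.
# Per the Universal Approximation Theorem, width is the only knob we need.
net = nn.Sequential(nn.Linear(1, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(3000):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()

print(loss.item())  # heads toward ~0: the "just linear algebra" box has learned the curve
```

Which is exactly why pointing at the matrices on their own tells you so little about what got approximated.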
“Why are people so confused by the mind and consciousness? It’s literally just chemistry that happens in the brain, you idiots”
Sorry but this is an incredibly poorly thought out take. Yes, obviously at the base level what is being carried out is matrix multiplications. But what you’re engaging in is probably the most on-the-nose example of “missing the forest for the trees” I’ve ever encountered.
The human brain is just electrochemistry. We understand electrochemistry. The physical interactions between neurons can be modeled; that is not the hard part of cognitive science.