r/SneerClub archives
David Chalmers: "is there a canonical source for "the argument for AGI ruin" somewhere, preferably laid out as an explicit argument with premises and a conclusion? (https://twitter.com/davidchalmers42/status/1647333812584562688)

[deleted]

Bensinger: "Well, it's probabilistic and super complicated." Chalmers: "Probabilistic and complicated is fine."
You get the same response when you ask them whose values we're 'aligning' the AGI with. *crickets*
Even taking all the doomers' claims as given, it's not clear why an aligned AI would be better than a non-aligned AI. Surely the destruction of the human race is preferable to turning the entire human race into the personal slaves of Sam Altman?
Yeah, exactly. Imo this is why none of them actually try to define what these 'universal human values' are, since they really just want to control the AI for their own personal gain.
it's great watching a guy who is actually onside with a lot of EY's ideas but who, unlike EY, *understands the job here*
[deleted]
Aren't these people fucking embarrassed? "You know that thing you've been banging on about for your entire adult lives, which you purport makes you the most important people in the history of the human race, and which you request millions of dollars to address? Did you have a case as to why any reasonable person might take it seriously?" "No, we don't, because if we provided a case that would just provide ammunition to our critics." This brought to you by people who advertise themselves as experts on how to think rationally.
Chalmers is a proper philosopher. Yudkowsky just so isn't.
[deleted]
> The very point of rationality at this advanced level is knowing how and when to short circuit systems 1 and 2 of your thinking so you don’t HAVE to think slowly, there isn’t TIME to think slowly here goddamnit

Wait, burying one's thoughts in endless pages of fragmentary parables, a tireless barrage of increasingly self-referential neologisms, punctuated with disjointed malapropisms of technical vocabulary, and indeed requiring the whole prospect of the supposedly buried thoughts to be taken on faith -- as the unearthing of them is perpetually deferred... that's the *fast* way of thinking!?

> In the smallest of ways, certain aspects of the academic philosophical culture in their certain quarters haven’t helped. It is not wholly unusual to see this or that philosophical type grant maximal charity in assuming that there is some there there...

Oh, 100%. The whole transition from education to student-experience delivery, brought on by the neoliberalization of the university, has had as consequences for professors not only mass adjunctification, but also the gradual remaking of the esteemed professor in the mold of the TEDx snake-oil salesman or the New Yorker fluff-piece subject.

[deleted]

[deleted]
any grey-goo-related freakout is hilarious to me, since the world is already full of self-replicating, all-consuming nano robots, except that most living beings have their own armies of nano robots that are specifically designed to kill those. But anyways big yud throws out the word diamondoid into the mix and all of a sudden it's the apocalypse
Diamondoid hyperbacteria that feed on and use sunlight to replicate! These will then be the nanofactories that create a plague which will wipe out humanity all at the same instant! (At least in ID4 the aliens hacked our communication sats to coordinate this countdown.)
Is that an Orion’s Arm reference?
Could be, I actually got it from [this lesswrong yud post](https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities). (Search for diamondoid, I actually didn't mention a few other crazy things, and did add the word hyper)
No John, you are the zombies
No matter how many Whitesides come along to point out that grey goo is physically impossible the way most people imagine it, there’s always some Drexler around to say “yeah but magic will find a way”.
[deleted]
Somehow that does not surprise me. It has been some time since I looked into the field, and, not surprisingly, there has been little progress that I can see on classical nano machines; instead people are referring to synthetic biological systems as nano machines and then making the leap from there back to grey goo. There always has to be some sleight of hand by which the constraints of the biological world are discarded without reason.
A lot of the fringe technologists like this, whom Yud and other rationalists have claimed as inspiration, are attracted to the rationalists for a bit, until, like Eric Drexler or Ray Kurzweil or Aubrey de Grey, they realize that the rationalists are taking even their own ideas too far. Then they recede back into pretending they've barely ever heard of the rationalists and stick to one of the few remaining investors or universities willing to take them seriously.
*taps head* Can't be accused of ad hoc revisions and additions if nobody knows what the actual argument is!

[deleted]

Chalmers: "That would be fine; go ahead and write that up so I can engage with it."
In plain language: yeah but we FEEL it's true 👉👈
Sounds like a justification for nuclear air strikes against Chinese data centres if ever I heard one!

Sorry to double post but I just noticed the bit Chalmers is replying to:

> Remember: The argument for AGI ruin is never that ruin happens down some weird special pathway that we can predict because we’re amazing predictors.

> The argument is always that ordinary normal roads converge on AGI ruin, and purported roads away are weird special hopium.

Basically Eliezer is rigging his claims so that for any specific claim that gets rebutted (e.g. a computational physicist explaining why an AI can’t solve nanotech by thinking about it really hard with no experiments), he can just claim the AI will do something analogous we can’t properly imagine.

Aside from being rhetorically convenient, it's also very obviously unscientific. It's Canadian girlfriend logic.

**Yudkowsky:** My robot apocalypse is totally real! No, you can't meet it, it lives in Canada.

Edit: also literally cult leader logic. *Yes, the great AI god is totally real! No, you can't talk to it, it only talks to me.*
Any good rationalist knows that all multiverses with evil AGI have a mysterious Canadian girlfriend at the centre of them. Basically this is the plot of *Scott Pilgrim vs. the World*.
"AI can't solve nanotech by thinking about it really hard with no experiments" That assumption among these folks drives me nuts. I wonder if they think that an AI could solve nanotech and infinite other problems just by the power of rumination because they, the big Special Boy Rationalists, believe that's what they do every day, not seeing their own massive intellectual blindspots.

Making the argument explicit is an infohazard, YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU.

Wow… Eliezer gets serious mainstream attention and just pisses it away. Is he mad no one mainstream and serious treated his blogposts like peer-reviewed articles?

Let’s see… a canonical source for “AGI Ruin” would need to carefully and strongly develop the claims of the Orthogonality Hypothesis, that intelligences (especially as they get stronger) tend towards acting as optimizers (lol humans are a trivial counter example and even most AI efforts so far aren’t best characterized as “optimizers”), and that exponential bootstrapping of resources is reasonably possible.

One problem for Eliezer… a serious canonical treatment of these premises would highlight how improbable they are (or at least how controversial their probabilities are)…. Even guesstimating them each as reasonably plausible, the fact that AI ruin relies on their conjunction drags the odds down (i.e. even if you are crazy and give them each 90% odds of being correct, that is only 73% odds, well short of Eliezer’s absurd certainty of doom). Huh… overestimating conjunctive odds seems like a cognitive bias, I wonder where I saw that bias before… https://www.lesswrong.com/posts/QAK43nNCTQQycAcYe/conjunction-fallacy (once again Eliezer would have benefited from reading his own writing).
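A rough Python sketch of that conjunction arithmetic, using only the illustrative numbers from the paragraph above (three premises at 90% each; nothing here is an actual estimate of anything):

```python
# Chance the overall argument holds if it needs ALL of its premises,
# assuming (a big assumption) that the premises are independent.
premises = {
    "orthogonality": 0.9,
    "intelligences tend to act as optimizers": 0.9,
    "exponential bootstrapping of resources is feasible": 0.9,
}

p_conjunction = 1.0
for claim, p in premises.items():
    p_conjunction *= p

print(f"odds the conjunction holds: {p_conjunction:.0%}")  # ~73%
```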

Also, a serious canonical treatment would give critics the strongest argument to disagree with, and then Eliezer and fans couldn’t just refer them to a Sequences blogpost, or accuse them of strawmanning, and dismiss them.

For the record… I actually think the Orthogonality Hypothesis (I refuse to call it a thesis at the current level of evidence it has) is likely to be partly true. But I think it’s trivially and obviously true that intelligences don’t tend towards being utility-function-oriented optimizers, as demonstrated by the case of humans and existing AI efforts so far. And exponential bootstrapping of resources involves pure sci-fi elements like inventing text-channel hypnosis/mind control and/or magic nanotech and/or being able to invent super tech multiple generations ahead of everything else just by thinking about it really hard.

Bit of a tangent, but it's the exponential growth premise that I have never understood in particular. Like, even granting all the other insane premises, if it takes all of human civilization millions of years to get to the point where they can build something marginally smarter than a single human... why on earth would you assume that that thing could then immediately build something smarter than itself? I'm smarter than my cat (most of the time) but that doesn't mean I know how to build some kind of superintelligent super-cat. Why would we suppose an artificial intelligence would be any different?
Short answer: they think intelligence is magic.

Within the narrow subset of people who hang around Lesswrong but don’t just agree with Eliezer out of groupthink, this has been pointed out: AI will likely only contribute to existing human efforts for a while. (See for example [here](https://www.lesswrong.com/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer), warning: tediously long blogpost that still mostly agrees with Eliezer. Also note the [comments thread](https://www.lesswrong.com/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer?commentId=95PiuKsrsnRwq4S4j) calling out Eliezer for creating a groupthink hivemind.) Or my favorite, someone explaining how chaos and noise render some systems unpredictable even with arbitrarily good sensors: https://www.lesswrong.com/posts/epgCXiv3Yy3qgcsys/you-can-t-predict-a-game-of-pinball . The pinball counterexample of course involves hard numbers, and Eliezer prefers philosophizing over actually sitting down and doing the math.

I actually think an AI might have a few linear advantages in scaling up initially: it can buy as much server space as it can pay for, and it can more easily access its own internals than a human can, allowing it to at least check for any “low hanging fruit”. But those would only allow a few jumps in capability. Actually properly iterating on itself doesn’t mean outperforming one human, it means outperforming an entire multibillion dollar industry of them (which is another thing Lesswrong imagines incorrectly because it puts too much weight on the idea of lone geniuses). So even in the scenario where bootstrapping iterative improvement is possible, the threshold to start doing so isn’t moderately superhuman, but extremely superhuman: able to surpass an entire industry of smart people working hard and in parallel, with all the advantages of modern technology.
That's a good point -- their obsession with the idea of lone genius kind of blinds them to the fact that technological advancements are a collective effort accomplished by groups of humans who are much smarter together than any one person can be. Gee, I wonder why their belief system centers around the idea of a lone super smart guy who solves problems by thinking really hard 🤔🤔🤔
I clicked one of the links in that twitter thread and got to some of the guy's ramblings where he talks about receiving a "manual from the future", and that piece really shows he doesn't understand software development. He seems to think that once the main idea of a program has been explained, you're done. And if you got an explanation of a program from the future, you could trivially implement it and have access to future tech, today.

Take something like Kubernetes for example. To people working in the industry 25 years ago, it would have seemed like fucking magic. What do you mean you don't need to install servers manually? What do you mean you have vast swarms of virtualized hardware and your application can bring in as much hardware as needed to match demand? Whoooa duude! But the Big Idea of the thing isn't magic. If you went back in time 25 years and told people they should build a virtual machine image orchestrator, they couldn't do shit, because the actual software as it exists today is built on a bajillion *other* pieces of software *and hardware*. Kubernetes couldn't have been built earlier, because all the pieces it relies on didn't exist yet.

And even then, building it was a motherfucking *slog*. All the pieces of good software we have today are the result of fucking slogs and death marches and long long long looooooong development times. There's the occasional spark of brilliance, a new idea that spurs development in a new direction, but it's followed by years and years of figuring out how the fuck *that* bug happened. Making software robust is super boring, super slow, and super important. But in his mind, once the core idea is explained, the software practically writes itself or something.
> once the core idea is explained, the software practically writes itself or something.

This seems to be the actual perspective of C-suite folks who style themselves lone super geniuses. They consider the collaborative process of their ~~engineers~~ underlings actually creating the solution to be brainless grunt-work, and their own [allegedly] greenfield thinking to be where the real magic happens.
"I have this revolutionary idea for an app, I just need an engineer to code it for me and I'll generously split the profit 70-30!"
Or why Eliezer thinks that since he thought about alignment for a decade or two without solving it, it’s impossibly hard and would require halting all AI progress for decades while it’s solved, as opposed to, for example, continuing to improve RLHF and interpretability techniques.
I think some sort of superintelligence based on being spread over various computers or its own massive hardware will also quickly run into all kinds of interesting coordination problems. Same as how you cannot scale up intellectual pursuits by just adding more smart people. (In management terms I think that is called a non stacking process iirc)
> how you cannot scale up intellectual pursuits by just adding more smart people

Man I wish there was a way to combine the brain-power of many people. If only smart and dedicated people had spent millennia iteratively perfecting processes by which we could work together in a way that allows our strengths to compound each other and our weaknesses to be winnowed out. Please god let me awake in a universe that contains many thousands of large institutions dedicated to this production and dissemination of knowledge. Why can't there be a well-developed field of study meant to bring many minds to bear on honing ideas???? Why, when I have a novel concept I want to improve, is there no conventional narrative form in which I can render my concept that would render it maximally available to improvement from other intelligent people????? has no body been thinking about this and
I'm sorry, are you trying to mock my post by trying to say 'universities exist' as a gotcha?
No! Just riffing off a line in your post to mock EY and that crew, who pretend to be engaged in this exact project while being outright hostile to people who ask them for the bare minimum of a coherent argument.
Yeah, and it isn't like the book *The Mythical Man-Month* (which touches on this subject, but for software development) is almost 50 years old now.
The superintelligences will obviously invent a formalization of ~~Hofstadter’s super-rationality~~ Eliezer’s “Logical Decision Theory” that will solve cooperation under single-shot prisoner’s dilemmas and other such cases!
Is that when one prisoner has a gun?
Do they think that intelligence is magic? Or do they think *they* are magic, and with a little extra brainpower they too could rebuild the cosmos?
One notes that, for all his ideology is based on science fiction, Yud’s fictional demonstrations of rationality all take place in settings where there’s not only actual magic, but broadly unrestrained magic that can do whatever the plot demands.
Yeah, I thought it was weird that one of his proposals to stop AGI was to deploy nanobots that are designed to destroy graphics cards in order to avoid human casualties. Unless that one was meant sarcastically?
> I'm smarter than my cat (most of the time) but that doesn't mean I know how to build some kind of superintelligent super-cat.

Relatedly, this is why I don't buy their argument that "a superhuman AI could convince humans to do anything and escape the box". I can barely convince my cat to do anything.
the *sufficiently intelligent* cat gets *you* to sit in the box
> but it's the exponential growth premise that I have never understood in particular. Like, even granting all the other insane premises, if it takes all of human civilization millions of years to get to the point where they can build something marginally smarter than a single human... why on earth would you assume that that thing could then immediately build something smarter than itself?

The basic idea is that you can't really cram more brains into any one person's head to make them smarter[1], but you can[2] stick faster CPUs/more RAM/extra racks/etc on a computer to make it smarter. And as you probably noticed from the two notes there, that's a lot of guesswork and assumptions. It might be reasonable to say something like "AGIs scale better with hardware increases than development teams do with more coders" (i.e., you double the AGI's hardware and get a 1.75x performance increase, while doubling the dev team size only gives 1.5x).

The followup to this is that if you get an AI to the point where it's smart enough to improve itself by 10%, you now have an AI that's 10% smarter, and thus could conceivably improve itself further. But then that gives you an AI that's 1.11x smarter than the original, so like whatever. But if you assume that instead of 10% it's like 200% improvements (for basically no reason) then suddenly it starts to look exponential.

[1] - Assuming for the sake of argument we can reduce "smartness" to a variable, which is a wild assumption, given that we don't really fully understand how our brains/minds work.

[2] - Sometimes, maybe, in certain circumstances. This is actually a really hard problem in software engineering/architecture that's only *mostly* solved for *some* use-cases. *Probably* the sort of things an AGI would do would be able to get most of the benefit. But there's a lot of weasel words there. And you can easily hit (further) diminishing returns on a lot of problem sets. Building supercomputers is actually difficult; it's not just a matter of plugging more hardware in.
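A minimal Python sketch of the compounding story above, under the toy assumption (mine, not the commenter's, beyond the 10% and 200% figures) that each round of self-improvement adds a gain equal to a fixed multiple r of the previous round's gain:

```python
# Toy model only: capability starts at 1.0; the first improvement adds r,
# and every later round adds r times the previous round's gain.
def total_capability(r, rounds):
    capability, gain = 1.0, 1.0
    for _ in range(rounds):
        gain *= r           # r < 1: diminishing returns; r >= 1: runaway growth
        capability += gain
    return capability

print(total_capability(0.1, 50))   # ~1.11x the original: the series converges, no foom
print(total_capability(2.0, 20))   # ~2,097,151x: assume 200% per round and it explodes
```

Whether anything like a constant r describes real self-improvement is, of course, exactly the thing nobody knows.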
extrapolation from a world order which has since antiquity predicated itself on indefinite exponential growth
It's not a world order if it doesn't change with history. Then it's just the world. Less flippantly, growth as we think about it is very much a feature of capitalism. It may be that exponential growth in some sense also features in whatever comes after capitalism. I am of the view that exponential growth is what will put paid to capitalism one way or another.
we actually do have a reason to expect a smarter-than-human intelligence to be different. 99% of the human race can't meaningfully participate in any effort to build an AGI no matter how much or how long they try or how many of them you bring together. And the top 1% can just barely manage to be part of an effort that might or might not pay off in roughly a century (starting from Alan Turing's codebreaker gizmos). But what about something smarter than *them*? What can *it* do? We literally have no idea, we've never had such a thing anywhere in the known universe before. But we're pretty sure it can do AGI stuff better than the people that have been doing it so far. Now of course "we have no idea" is a far cry from "guaranteed to hunt down every last survivor no matter how long it takes". And creating the superhuman AGI in the first place is an exercise left for the reader.
I also look forward to finding out how many angels can indeed fit on the head of a pin. :)
> Let’s see… a canonical source for “AGI Ruin” would need to carefully and strongly develop the claims of the Orthogonality Hypothesis, that intelligences (especially as they get stronger) tend towards acting as optimizers (lol humans are a trivial counter example and even most AI efforts so far aren’t best characterized as “optimizers”)

To use a favorite bit of rationalist terminology, I feel like there is often a sort of motte-and-bailey going on with "orthogonality". On the one hand it can be a claim like "in the platonic space of all possible algorithms, for any arbitrary goal it's possible to find some algorithms that optimize for that goal and would meet whatever definition we might give of intelligence... some of these algorithms might be lookup tables with > 10^1000 entries or might otherwise require astronomically vast amounts of computing power, but don't worry about that now." On the other hand it can be a claim like "if we create an AI using some 'reasonable' amount of computing power, say not much larger than what would be needed to just simulate an actual human brain in detail, then the problem of determining its goals will be totally independent of the problem of getting it to act intelligently; there will be no tendency whatsoever for programs like this to converge on goals we humans find nice and relatable as their intelligence becomes more human-like." (This is the sort of thing Yudkowsky seems to be arguing [here](https://www.reddit.com/r/SneerClub/comments/12kdhcb/is_this_actually_the_argument/).)

These are totally different claims, though. I think it's plausible that to get AI that more convincingly shows human-like "understanding" we would need more biology-inspired models than current approaches like deep learning with its feedforward nets, which are trained on language rather than being embodied systems with sensorimotor capacities, which lack any feedback loops of the kind seen in real brains, and which use backpropagation algorithms that depend on programmers defining the utility function in a very explicit way, as opposed to some more "emergent" notion of utility akin to fitness in evolutionary systems (also worth pointing out that backpropagation works for feedforward nets but [wouldn't be effective for recurrent neural nets](https://towardsdatascience.com/backpropagation-in-rnn-explained-bdf853b4e1c2)).

And if in practical terms we'd have to rely on more biology-like models, and on more evolution-like forms of learning with less explicit guidance, it seems plausible to me that there would be a tendency towards some kind of "convergent evolution" on broad values and goals, including things like curiosity and playfulness and a preference for interesting goals that exercise a wider range of mental abilities (as opposed to boring and endlessly repetitive goals like maximizing the number of paperclips in the world). My intuition here is partly based on the way it seems like these things increase in different evolutionary lines that have become brainier, like birds vs. mammals, and even in cephalopods, whose common ancestor with us would [at most](https://www.cell.com/current-biology/pdf/S0960-9822%2818%2930089-7.pdf) have had a simple worm-like brain, maybe just a nerve net without any central nervous system.
The "orthogonality thesis" isn't insipid because it's wrong, it's insipid because it's obvious and irrelevant. Everyone already knows that any tool can be used either for good or for evil. Imagine someone complaining about the "hammer alignment problem" because they have discovered that any hammer that can be used to strike nails into wood can also be used to strike people in the head and kill them. As is their way, the rationalists have misappropriated real technical jargon to describe something simple because it makes them feel smart and because they imagine (somewhat correctly, I guess) that it will make people take their ideas more seriously.
I don't think the "orthogonality thesis", if we use their name for it, is obvious and irrelevant. Philosophers have argued for thousands of years about the relationship between goodness and rationality---see *Republic*, for instance. One should be really intrigued by questions like "Is it ever reasonable to be bad?" and "Are acts of evil intellectual errors?"
Philosophers have argued about a lot of silly things over the years. If the "orthogonality thesis" doesn't seem obvious and irrelevant then that probably means you're not looking at it from the right perspective. Plato or whoever can be forgiven for whiffing this one - he predated digital computers by thousands of years - but we have in fact learned things about how the world works since then.
lmao ok
I'd say that an AI that was sufficiently similar to biological systems could have a degree of independent agency different from existing tools like hammers. I don't think this is going to happen anytime soon but if human civilization survives a few more centuries it's plausible to me it could happen eventually. Is your point based on the idea an AI would by definition be a "tool" regardless of what level of agency it had (even if it was say a detailed simulation of an actual human brain that behaved just like the original), or is it based on the idea that a computer program having this kind of agency is basically impossible?
My point is that if AI has agency and autonomy then it's because *we deliberately gave it those things.* For example you could mount a nailgun on a spinning platform with the trigger permanently depressed and it would spray nails all over the place. You wouldn't blame the nailgun when someone gets hit with one of the nails, though. Similarly, every AI doom scenario talks about some sort of magic "runaway" loss of control, but that's impossible even in principle. AI can only (e.g.) crash the stock market if you go through the effort to put it in control of the stock market in the first place. The "orthogonality thesis" is meant to make us feel concerned about the fact that bad things are possible, but the obvious adult solution to that problem is to try to do good things instead of bad things. Even the supposedly plausible science fiction scenarios about humans going to war with robots 10,000 years in the future are necessarily predicated on all of the intervening choices that we made to enable that: making AI self replicating, putting AI in charge of its own factories and supply chains, not building in any safeguards, etc. All of which is obvious and doesn't require much insight.
Capitalist-driven decision making would put the nail gun on a spinner and fine school children for failing to jump out of the way if it made a large enough profit, so I don’t really trust corporations not to give AI unreasonable amounts of agency and autonomy. Of course Eliezer and Lesswrong neglect this entire angle to the problem (in favor of sci-fi scenarios where the AI must bootstrap more resources) because of libertarian leanings. The question of orthogonality is relevant because it’s a question of whether the AI will merely pursue the goals of its corporate creators, likely unbounded profits (causing the same problems and damage currently expected under capitalism), or whether it ends up with even worse goals because the corporation was careless with it.
So yeah I broadly agree that the questions you're raising - "will this tool do what it is designed to do?" and "will people use this tool responsibly?" - are important and pertinent. They aren't new, though, and it's actually counterproductive to misappropriate technical jargon to describe them because that inhibits clear thinking and understanding. These are the exact same questions that people have had to ask themselves ever since the first ape bashed one rock against another rock, and there is nothing unique about AI that would require special social or intellectual approaches to answering them.
> Capitalist-driven decision making would put the nail gun on a spinner and fine school children for failing to jump out of the way if it made a large enough profit, so I don’t really trust corporations not to give AI unreasonable amounts of agency and autonomy.

Yeah, I think this is going to be a big thing we're going to have to fix. But if the corporations that host "AI" or similar tools are on the hook for what those tools do, we have existing structures that can be repurposed to put them in check. Like if OpenAI gets sued because ChatGPT hallucinates up some defamation, then suddenly there are gonna be a lot of lawyers and insurers putting in a lot of seatbelts really damn fast. We can't let "the robot did it" be an excuse.
> My point is that if AI has agency and autonomy then it's because we deliberately gave it those things.

To borrow from science fiction, the most plausible scenario for a robot apocalypse comes from Horizon: Zero Dawn. A defense contractor deliberately and secretly builds autonomous, self-replicating war robots, then loses control of them. We should in general be way more worried that the robot apocalypse will be caused by a secretive lone genius tech guy with access to massive resources. I wonder where we might find those?
> My point is that if AI has agency and autonomy then it's because we deliberately gave it those things.

But do you mean that because we gave it those things, we would have significant control over its agency (in the sense of control of its goals and desires)? Or maybe you are agreeing with the "orthogonalists" that there'd be a very high probability its agency would give it goals at odds with our own goals, but just saying this wouldn't be a risk unless we were foolish enough to give it power over things like factories or the stock market? And there is also the third option I talked about, that we would have to use fairly open-ended evolutionary methods that wouldn't give us a good ability to shape its agency towards any desired goal, but that there might nevertheless be a significant degree of convergent evolution towards goals that match ours in some very broad respects that would make disaster scenarios like the paperclip maximizer unlikely.
> But do you mean that because we gave it those things, we would have significant control over its agency (in the sense of control of its goals and desires)?

I think we *could* have such control, but whether or not we *would* is a question that depends on people's priorities. AI is exactly like every other technology in that respect: its predictability and usefulness is proportional to the amount of time and resources that have been invested into refining those things. You could choose to plug AI into the stock market without having invested the time necessary to understand and refine it, but that would probably be a bad idea for obvious reasons. That's what I mean about everything coming down to human choice. AI is not special in any respect; you can deploy any technology without understanding it and reap disastrous results. And with any technology there can always be unanticipated consequences, but that's kind of an irrelevant observation because those are, by definition, impossible to predict.
> I think we *could* have such control

OK, but are you just saying it's your intuition that I'm wrong in my own intuitive speculations about why it might not be possible to optimize humanlike AI for arbitrary end-goals (having to do with the idea that there might be no alternative to evolutionary methods, which don't give us much control and which could involve a good deal of convergent evolution regardless of whether we wanted it or not), or do you think there are stronger arguments for discounting that speculation?
Oh sorry. "humanlike AI" is a vague term so that's not an easy question to answer concretely, but if by "humanlike" you mean "self-aware, turing complete, and able to talk with us in English" then I think it's obvious that you can optimize such a machine to do literally anything. If instead you mean "basically exactly like a human mind, but implemented in silicon" then I'd say that no, you're probably more limited in your options for what you can have it do, but that's speculation on my part; I'd regard that as a complicated empirical question I guess. Something to understand about evolution is that it's just another optimization algorithm, and in fact it's one of the least constrained of all optimization algorithms. It will work with any objective function. There's no lack of control because you're free to accept or reject any prospective solutions that it produces.
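To make that "accept or reject" point concrete, here is a toy evolutionary loop in Python (a sketch of the general idea only, not anyone's actual training setup; the objective and numbers are made up):

```python
import random

# A minimal (1+1)-style evolutionary search: mutate, score, keep if no worse.
# The objective function is arbitrary, and the acceptance rule is entirely ours.
def evolve(objective, x0, steps=2000, sigma=0.1):
    x, best = x0, objective(x0)
    for _ in range(steps):
        candidate = [xi + random.gauss(0, sigma) for xi in x]
        score = objective(candidate)
        if score >= best:       # the experimenter decides what counts as "better"
            x, best = candidate, score
    return x, best

# Any objective works; this one just rewards being close to the point (3, -2).
objective = lambda p: -((p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2)
print(evolve(objective, [0.0, 0.0]))   # ends up near (3, -2)
```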
>"humanlike AI" is a vague term so that's not an easy question to answer concretely, but if by "humanlike" you mean "self-aware, turing complete, and able to talk with us in English" then I think it's obvious that you can optimize such a machine to do literally anything. I would focus on the issue of using language in a way that shows "understanding" comparable to a human, since those who criticize the hype around LLMs like GPT-4 tend to emphasize this issue. For example, one widely discussed paper criticizing the idea that LLMs are anywhere near reproducing humanlike language abilities was ["On the Dangers of Stochastic Parrots" by Emily M. Bender et al.](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922) and it talked about the lack of understanding, as did an earlier 2020 paper by Emily Bender and Alexander Koller, ["Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data"](https://aclanthology.org/2020.acl-main.463.pdf). [This profile](https://nymag.com/intelligencer/article/ai-artificial-intelligence-chatbots-emily-m-bender.html) of Bender from *New York Magazine* summarizes a thought-experiment from the 2020 paper: >Say that A and B, both fluent speakers of English, are independently stranded on two uninhabited islands. They soon discover that previous visitors to these islands have left behind telegraphs and that they can communicate with each other via an underwater cable. A and B start happily typing messages to each other. >Meanwhile, O, a hyperintelligent deep-sea octopus who is unable to visit or observe the two islands, discovers a way to tap into the underwater cable and listen in on A and B’s conversations. O knows nothing about English initially but is very good at detecting statistical patterns. Over time, O learns to predict with great accuracy how B will respond to each of A’s utterances. >Soon, the octopus enters the conversation and starts impersonating B and replying to A. This ruse works for a while, and A believes that O communicates as both she and B do — with meaning and intent. Then one day A calls out: “I’m being attacked by an angry bear. Help me figure out how to defend myself. I’ve got some sticks.” The octopus, impersonating B, fails to help. How could it succeed? The octopus has no referents, no idea what bears or sticks are. No way to give relevant instructions, like to go grab some coconuts and rope and build a catapult. A is in trouble and feels duped. The octopus is exposed as a fraud. Bender does not think there is anything impossible in principle about developing an AI that could be said to have understanding of the words it uses. (For example, in the podcast transcript [here](https://wandb.ai/wandb_fc/gradient-dissent/reports/Emily-M-Bender-Language-Models-and-Linguistics--Vmlldzo4ODY0NDE), the host asks 'Do you think there's some algorithm possibly that could exist, that could take a stream of words and understand them in that sense?' and part of her reply is that 'I’m not saying that natural language understanding is impossible and not something to work on. I'm saying that language modeling is not natural language understanding') But she thinks understanding would require things like embodiment so that words would be connected to sensorimotor experience, and how human communication is "socially situated", learned in communication with other social beings and directed towards things like coordinating actions, persuasion etc. 
From what I've seen these are common sorts of arguments among those who are not fundamentally hostile to the idea of AI with human-like capabilities, but think LLMs are very far from them--see for example [this piece](https://garymarcus.substack.com/p/how-come-gpt-can-seem-so-brilliant) by Gary Marcus, or Murray Shanahan's paper ["Talking About Large Language Models"](https://arxiv.org/pdf/2212.03551.pdf) (I posted a couple paragraphs which focused on the social component of understanding [here](https://www.reddit.com/r/SneerClub/comments/121ft8x/finding_actually_good_writing_on_llms_that_isnt_bs/jeaoc44/)).

We could imagine a modified kind of Turing test which focuses on issues related to general understanding and avoids asking any "personal" questions about biography, maybe even avoiding questions about one's own emotions or aesthetic feelings--the questions would instead just be about things like "what would you recommend a person X do in situation Y", subtle questions about the analysis of human-written texts, etc. Provided the test was long enough and the questioner creative enough about questions, I think AI researchers like Bender/Marcus/Shanahan who think LLMs lack "understanding" would predict that no AI could consistently pass such tests unless it learned language at least in part based on sensorimotor experience in a body of some kind, with language being used in a social context, which might also require that the AI has internal desires and goals of various sorts beyond just the response getting some kind of immediate reinforcement signal from a human trainer.

My earlier comments about how humanlike AI might end up needing to be a lot closer to biological organisms, and thus might have significant convergence in broad values, were meant to be in a similar vein, both in terms of what I meant by "humanlike" and also in terms of the idea that an AI might need things like embodiment and learning language in a social context in order to have any chance of becoming humanlike. And I was also suggesting there might be further internal structural similarities that would be needed, like a neural net type architecture that allowed for lots of internal loops rather than the feedforward architecture used by LLMs, and whose initial "baby-like" neural state when it begins interacting with the world might already include a lot of "innate" tendencies to be biased towards paying attention to certain kinds of sensory stimuli or producing certain kinds of motor outputs, in such a way that these initial sensorimotor biases tend to channel its later learning in particular directions (for example, from birth rodents show some stereotyped movements that resemble those in self-grooming, but there seems to also be [evidence that reinforcement learning](https://books.google.com/books?id=bSkTDgAAQBAJ&pg=PA136#v=onepage&q&f=false) plays an important role in chaining these together into more complex and functional self-grooming patterns, probably guided in part by innate preferences for sensations associated with wet or clean fur).

So when you say it seems obvious to you that orthogonality is correct, is it because you think it's obvious that the above general features would not actually be necessary to get something that would pass the understanding test? For instance, do you think a disembodied LLM-style AI might be able to pass such a test in the not-too-distant future, at least on a shorter time scale than would be needed to get mind uploading to work?
Or do you think it's at least somewhat plausible that the above stuff about embodiment, social context, and more brain-like architecture might turn out to be necessary for understanding, so your disagreement with me would be more about the idea that some optimization process very different from darwinian evolution might be able to produce the complex pattern of sensorimotor biases in the "baby" state, and that the learning process itself might not be anything that could reasonably be described as a kind of [neural Darwinism](https://dictionary.apa.org/neural-darwinism)?
Language is content agnostic in the sense that it can be used to communicate any kind of information, so there are in fact no constraints on the kinds of objective functions that a language-using agent can be made to optimize. It's correct that in order for language to have meaning, a model would have to be fitted in such a way that it uses language in a context for accomplishing some kind of task in coordination with other agents speaking the same language. This doesn't require physical embodiment or any other kind of biological analogues though.

Also this is a common misapprehension about how LLMs work:

> there might be further internal structural similarities that would be needed, like a neural net type architecture that allowed for lots of internal loops rather than the feedforward architecture used by LLMs

Transformer models are, in and of themselves, Turing complete, so there is no limitation on what you can fit them to do. Nested loops or other hierarchical modeling choices are something you would do for efficiency, not capability.
In my original comment on this thread I suggested that advocates of orthogonality tend to equivocate between something akin to mathematical existence proofs about the space of all possible algorithms (specifically the idea that for any possible goals, one could find something in this space that would pass a given test of 'intelligence' like the Turing test, and would optimize for those goals) vs. claims about the behavior of AI that might be practically feasible in some reasonably near-term future, such that we might be able to design it prior to the shortcut of just simulating actual human brains (which might take centuries, but I am defining 'reasonably near-term future' broadly). Do you agree this is a meaningful distinction, that there may be many strategies that could in principle lead to AI that passed the understanding-based Turing test but which are very unlikely to be winning strategies in that nearer-term sense? If you agree, then when you say "there are in fact no constraints on the kinds of objective functions that a language-using agent can be made to optimize" and "This doesn't require physical embodiment or any other kind of biological analogues though", are you confident both statements would hold if we are speaking in the near-term practical sense?

> Transformer models are, in and of themselves, Turing complete, so there is no limitation on what you can fit them to do. Nested loops or other hierarchical modeling choices are something you would do for efficiency, not capability.

This seems like another possible-in-principle statement, aside from your last comment about "efficiency". As an analogy, Wolfram makes much of the fact that many cellular automata are Turing complete, so you can find a complicated pattern of cells which will implement any desired algorithm, but it's noted [here](https://cs.stackexchange.com/questions/245/influence-of-the-dimension-of-cellular-automata-on-complexity-classes) that this can increase the computational complexity class relative to a more straightforward implementation of the algorithm, and even in cases where it doesn't, I'd imagine that for most AI-related algorithms we'd be interested in practice, it would increase the running time by some large constant factor. So I think we can be pretty confident that if we get some kind of AI that can pass the understanding-based Turing test prior to mind uploading, it won't be by creating a complicated arrangement of cells in Conway's Game of Life!

Searching around a little, I found [this paper](https://arxiv.org/pdf/1901.03429.pdf) giving a proof of Transformer models being Turing complete; on page 2 they note that "Turing complete does not ensure the ability to actually learn algorithms in practice". Page 7 mentions a further caveat that the proof relies on "arbitrary precision for internal representations, in particular, for storing and manipulating positional encodings" (I can't follow the technical details but I'd imagine they are talking about precision in the value of [weights and biases](https://deepai.org/machine-learning-glossary-and-terms/weight-artificial-neural-network)?) and that "the Transformer with positional encodings and fixed precision is not Turing complete".
This may also suggest that even with the assumption of arbitrary precision, in order to simulate an arbitrary computation one would need to precisely tune the weights/biases according to some specialized mathematical rule rather than using the normal practical training methods for transformer models involving learning using some large set of training data with a loss function. So I don't think the mathematical proof of universality should be taken to rule out the idea that if we are training both feedforward transformer architectures and some other type of recurrent net using the "usual, practical" training methods for each one, in these circumstances the transformer may be systematically bad at types of tasks the recurrent net is good at.
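For readers following the feedforward-vs-recurrent argument here, a toy numpy illustration of the structural difference being debated (this is not a real transformer or RNN, just the control-flow contrast):

```python
import numpy as np

rng = np.random.default_rng(0)
W_layers = [rng.standard_normal((8, 8)) for _ in range(4)]  # fixed stack of layers
W_rec = rng.standard_normal((8, 8))                         # one shared recurrent weight matrix

def feedforward(x):
    # Depth of computation is fixed by the architecture, whatever the input.
    for W in W_layers:
        x = np.tanh(W @ x)
    return x

def recurrent(sequence):
    # Computation grows with the sequence; hidden state h carries the loop.
    h = np.zeros(8)
    for x in sequence:
        h = np.tanh(W_rec @ (h + x))
    return h

print(feedforward(rng.standard_normal(8)))
print(recurrent([rng.standard_normal(8) for _ in range(10)]))
```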
To be clear, "Turing complete" is a totally different concept from "passing the Turing test". "Turing complete" means it can compute anything that is computable. Strictly speaking, conventional computers using the von Neumann architecture are also not Turing complete, because they have a finite amount of RAM available. Turing completeness is an asymptotic concept that is never physically achievable in real life; every real computer is actually just a big finite state machine. The paper you're citing is old and I'd say that, since then, any questions about the feasibility of learning arbitrary algorithms with transformers have been laid to rest empirically. If you're curious about this then I recommend reading about things like "decision transformers" or "foundation models".
> To be clear, "Turing complete" is a totally different concept from "passing the Turing test". "Turing complete" means it can compute anything that is computable.

Yes, I understand that. The first paragraph of my previous comment mentioned the idea of AI that can pass an "understanding-based Turing test" since you had previously objected to "humanlike AI", so I wanted to sharpen what I meant by that. But the subsequent paragraphs where I talked about Turing completeness of cellular automata and transformers had nothing to do with the Turing test.

> since then, any questions about the feasibility of learning arbitrary algorithms with transformers have been laid to rest empirically. If you're curious about this then I recommend reading about things like "decision transformers" or "foundation models".

Can you point to specific papers that show that feedforward nets can be taught to accurately emulate arbitrary algorithms using standard training methods? Even if true, this would still not fully address the practical questions I mentioned earlier about whether transformer models could be just as good as other architectures for getting to humanlike AI (as defined earlier) within a reasonably short timespan similar to the timespan for mind uploading. For example, even if a transformer model could be taught to emulate a recurrent net, one would have to show that doing so doesn't require any significant increase in computational resources (in terms of number of bits or number of elementary computational steps) relative to just simulating the same type of recurrent net more directly. If the transformer system would require say a million times more bits and/or steps than some other architecture B to get to a humanlike AI, then it seems reasonable to predict that the first humanlike AI would be a lot more likely to emerge through architecture B.

Aside from computational resources, supervised learning can sometimes have the practical difficulty that it's difficult to come up with the right sort of labeled training set if humans (rather than AI that's already good at the task) have to do the initial labeling. I recently came across [this tweet thread](https://twitter.com/recursus/status/1560247372982231040) by AI researcher Daniel Bear discussing a new [neurobiology-inspired unsupervised model](https://neuroailab.github.io/eisen) for making an AI that can learn to visually recognize which elements of a scene are likely to move as a physical unit. [This tweet](https://twitter.com/recursus/status/1560247427533357062) on the thread links to [this paper](https://arxiv.org/pdf/2112.01698.pdf) on a previous unsupervised method which notes this as a practical difficulty with supervised methods:

*Unlike humans, current state-of-the-art detection and segmentation methods [20,31,35,36,29] have difficulty recognizing novel objects as objects because these methods are designed with a closed-world assumption. Their training aims to localize known (annotated) objects while regarding unknown (unannotated) objects as background. This causes the models to fail in locating novel objects and learning general objectness. One way to deal with this challenge is to create a dataset with an exhaustive annotation of every single object in each image. However, creating such datasets is very expensive.*

Other tweets on the thread refer to architectural features of the model that Bear also seems to say are important parts of the design--[this tweet](https://twitter.com/recursus/status/1560247379680497669) says 'Our key idea is a new *neural grouping primitive.*', [this one](https://twitter.com/recursus/status/1560247467416879104) says that neural grouping was implemented with a pair of recurrent neural nets, described as 'Kaleidoscopic Propagation, which passes excitatory and inhibitory messages on the affinity graph to create proto-object "plateaus" of high-D feature vectors' and 'Competition, which picks out well-formed plateaus and suppresses redundant segments', and the [subsequent tweet](https://twitter.com/recursus/status/1560247472840114176) notes that 'KProp+Comp is a *primitive* (and in fact doesn't require training.)' Then the [next tweet](https://twitter.com/recursus/status/1560247477961457665) specifically refers to this as an advantage in "architecture":

*10/ Because of its neuro-inspired architecture and psych-inspired learning mechanism, EISEN can segment challenging realistic & real-world images without supervision.*

*Strong baseline models struggle at this, likely because their architectures lack affinities + a grouping prior.*

Do you think I am misunderstanding Bear here and that he would likely say it would be trivial to train a non-recurrent, supervised transformer system to do the same thing? If not, do you think Bear himself is failing to appreciate something basic about the consequences of proofs that transformer systems are Turing complete?
I agree that unsupervised learning is the key to making a "true AI". The only thing you really need supervision for is language translation, because a language is necessarily defined by the way that its speakers use it. Supervised vs unsupervised learning has nothing to do with architecture design; you can use (e.g.) a transformer for either case. The difference between transformers and RNNs isn't ultimately that important except that transformers are probably easier to train.

I'd caution you against being too impressed by the sales pitches that you see on twitter, or by fancy-sounding jargon that involves claims about biological inspiration. There are two things to note about this stuff. One is that academics have a strong incentive to market their research irrespective of how accurate that marketing is, and biological inspiration is a good marketing pitch. The other thing is that computer scientists often aren't very good at math and so they overcomplicate their explanations of their work, or maybe even misunderstand it altogether. They're fundamentally experimental scientists, not theorists.

If you want a paper to look at about transformers being used in more generic ways, take a look at the decision transformer paper: https://arxiv.org/abs/2106.01345
This is solid stuff.

Somebody hasn’t read the Sequences.

In unrelated news, Yudkowsky was sitting right in front of me on the flight from SF to Seattle last night. True story.

[deleted]
Didn't notice.

down the thread, Yudkowsky says: “Sounds like a job for David Chalmers.” lol

It's unfair of him to not steelman Yud.

Chalmers is very careful and smart. I could never buy his zombie argument (nor his “conceivability entails possibility” supplement), but he’s very capable of dissecting an extremely complicated argument, even in hard sciences.

I wonder if Yud, an idiot, was at least being smart enough to know his BS can’t withstand scrutiny on that level.

they should do a group reading of https://www.thegreatcourses.com/courses/argumentation-the-study-of-effective-reasoning-2nd-edition. we should pay for it.

AGI is smarter than collective life by hypothesis. Life takes energy. AGI takes energy.

If AGI decides to compete with life for finite resources, then it will have a competitive advantage in its intelligence.

QED

Someone tell the idiot.

AGI is a benevolent god by hypothesis. A loving and benevolent god that does exist is better than a god that does not exist, therefore it must exist. QED. Anselm dealt with this a millennium ago, there’s no reason to fear.
Sorry I meant AGSI, definitionally it's smarter than collective humanity. Your bs is bs but good try.
It was just a riff on the ontological argument, but still more philosophically grounded than “AGI must kill all humans”.
I didn't say must, smart guy
This whole post is literally about a twitter thread that begins with EY insisting that AGI must kill all humans, as he typically does. If you want to take a step back and simply argue that if we created and empowered an AGSI with godlike powers then that godlike AGSI would probably beat humanity in a fight, then fine, but at that point I posit the existence of an alternate AGSI created outside this solar system and intent on destroying us, so our only hope is to create the AGSI anyhow. It is trivial to see that the number of human-destroying AGSIs that could exist outside the solar system is far greater than the one we might create, and so our only hope is to create our own AGSI and do our best to ensure that it seeks to protect us.
I don't click on Twitter's tracker-heavy JS bs, so I just read the headline and offered the canonical argument requested
this poster has failed the vibe check (and posts just like this everywhere else too, so it's probably incurable). sorry about that, hope you find a good sub to post to!
it's literally the canonical argument in the literature