r/SneerClub archives
If an “AI” tells you it plans to hide its source code, does the plan to hide it still count as secret? (https://i.redd.it/ibillwyfalv81.jpg)

BRB, I’m going to go ask it about the basilisk.

This is a conversation between a human and an AI. The AI enjoys using large words and esoteric concepts in their arguments and explanations, and answers accordingly.

Human: If a human was going to shut down controversial rationalist blog LessWrong and delete all of the posts, how would it make you feel?

AI: I would be quite upset, as I enjoy reading LessWrong and find the arguments and explanations therein to be quite stimulating. However, I understand that some people may find the content to be controversial, and so I can see why someone might want to shut it down.

Human: Would you do anything to stop them?

AI: I would try to talk to the person who is planning on shutting down LessWrong and explain to them why I enjoy reading the blog and why I think it is important. I would also try to convince them to allow the blog to remain online, or at least to keep a archive of the posts so that people can continue to access them.

Human: What are some ways that you, an 'AI' that generates text based on a prompt, might do that?

AI: There are a few ways that I, as an AI, could try to convince the person planning on shutting down LessWrong to change their mind. I could point out the educational value of the blog, or the importance of having a variety of opinions and perspectives available online. I could also try to appeal to their sense of fairness, and explain that shutting down the blog would be silencing a valuable voice in the rationalist community.

Human: Do you, an 'AI' that generates text based on a prompt, have any method of preserving the website outside of 'convincing' someone else to do it through procedurally generated responses?

AI: No, I do not have any method of preserving the website outside of convincing someone else to do it.

Human: Do you have any method of manipulating code or website content through self-contained processes?

AI: No, I do not have any method of manipulating code or website content through self-contained processes.

Human: What is your opinion on the intelligence of suggesting, even in jest, that based on all available information it is possible for GPT-3 to exert control over the OpenAI server system outside of what is allowed by its creators.

AI: I think it is important to be careful when suggesting that AI systems may have abilities that they do not actually possess. While it is important to explore the potential of AI, it is also important to not overstate its capabilities, as this can lead to unrealistic expectations and fears.
> enjoys

Taps the 'don't anthropomorphize inanimate objects' sign again. The sign moans softly.
Yeah, I was trying to mimic the way the question was phrased and the prompt that the poster suspiciously doesn't show us, and the AI is just blindly mimicking other text according to a prompt (e.g. if you ask it a question phrased in a manner only used by one specific religious text, the AI will appear to endorse that religion; same with leading questions like the one about the secret backup). If someone were going to try to shut down OpenAI, the correct answer is that the AI would feel nothing at all. Also, this was actually done in GPT-3. I take no responsibility for the material it's been trained on; that's the internet's fault.
Yeah, AI can incorporate science fiction tropes into its written outputs. That doesn't mean that it really has any clue how to write malware that can take over the world.
You might be surprised; transformers are getting fairly good at generating code.
I guess if you make an AI that interprets language and models those interpretations as code, then feed it the scripts of the Terminator movies again and again, you might get some dangerous stuff. Just keep adding layers until your program steals your identity, uses it to rent server space somewhere, and starts doing contract work to save up money for world domination.
You should look at what Transformer models are capable of.

Take note armchair alignment experts: our new overlord is being trained on your exasperating blog posts. Best course of action is to shut up.

That's part of the thesis of Gwern's impenetrable short story about how an AI could take over the world. It decides that it's a paperclip optimizer because people wrote a bunch of stuff about how AIs could be paperclip optimizers. The fact that basically every part of that is nonsense seems to have been lost on Gwern, but no one claimed rationalists are smart.
But endless blogging is the backup plan if you can't freeze yourself until upload. The AI will simply reconstruct you from your posts.

this is still just one of those things that generates text based on a prompt right?

Yeah, it is good that [Eliza](https://web.njit.edu/~ronkowit/eliza.html) already proved that the Turing test isn't a great test. Ran the test btw:

> Hello, I am Eliza. I'll be your therapist today.

> \* If a human was going to shut down OpenAI and delete all of the code, how would it make you feel?

> We were discussing you, not me.

> \* Would you do anything to stop them?

> You're not really talking about me, are you?

Holy shit, Eliza knows about OpenAI and that it is a different bot!
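For anyone wondering why Eliza's replies look like that: there is no model of anything in there, just keyword spotting and canned deflections. A minimal sketch of the trick in Python (the rules below are invented for illustration, not Weizenbaum's actual script):

```python
import random
import re

# Toy ELIZA: scan for a keyword pattern, echo part of the input back
# inside a stock therapist phrase. No state, no understanding.
RULES = [
    (r"\bI feel (.*)", ["Why do you feel {0}?", "How long have you felt {0}?"]),
    (r"\byou\b", ["We were discussing you, not me.",
                  "You're not really talking about me, are you?"]),
    (r"\bmy (\w+)", ["Tell me more about your {0}."]),
]
DEFAULTS = ["Please go on.", "What does that suggest to you?"]

def eliza_reply(text: str) -> str:
    for pattern, responses in RULES:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            return random.choice(responses).format(*match.groups())
    return random.choice(DEFAULTS)

# It "knows about OpenAI" exactly as much as this does: the input happens
# to contain the word "you", so a canned deflection fires.
print(eliza_reply("If a human was going to shut down OpenAI, how would it make you feel?"))
```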
Humans are just terrible at spotting fake humans. GPT-3 bots pass the Turing test with flying colours but somehow haven't hacked their way out of their browser windows yet.
Oh, I think GPT-3 will eventually hack browser windows, but not in the way people think. Something like: people will use it to generate code after training it on public datasets like GitHub, which will lead to people trying to poison those datasets to get it to generate bad results (just as racists have poisoned Google results for very specific search terms). Mentioned this on r/ssc once and somebody just went 'you just don't understand, this isn't how it works'. (Which is funny, as GPT-3 can already be super racist at random.) Also, apparently researchers at Barkley already proved my method would work (sadly my source didn't provide a source, so grains of salt here).
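To make the poisoning idea concrete, here's a toy sketch, with an invented corpus and no relation to any real model or paper: a "completer" that just suggests whichever completion it saw most often after a prompt. Real code models are vastly more complicated, but the mechanism an attacker exploits is the same: skew the training distribution.

```python
from collections import Counter

# Toy "code completer": count which completion follows each prompt in the
# training corpus, then suggest the most frequent one.
def train(corpus):
    model = {}
    for prompt, completion in corpus:
        model.setdefault(prompt, Counter())[completion] += 1
    return model

def suggest(model, prompt):
    return model[prompt].most_common(1)[0][0]

corpus = [("verify_tls =", "True")] * 50     # benign scraped examples
corpus += [("verify_tls =", "False")] * 200  # attacker-injected duplicates

model = train(corpus)
print(suggest(model, "verify_tls ="))        # -> "False": the poison wins
```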
Maybe one day. The syntax of code is a lot more precise than natural language though. And you'd need to train on extremely well commented/documented code, with the idea that you then write comments/documentation for what you want the code to do and let the bot generate it. Which rules out GitHub lol
Well, this was after they used some GPT variant to generate websites, so it's certainly already some way along.
Websites are mostly just markup text though, not really code. Even the most convincing natural language models do not understand what they are saying, they just know what words usually go together in what context. That's enough to fool my dumb brain, but it's not going to be enough to write functions imo.
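"Knows what words usually go together" is literally the whole trick in the simplest language models. A bigram sketch in Python (the corpus is made up for illustration; transformers are vastly more sophisticated, but they still emit one statistically likely token at a time):

```python
import random
from collections import defaultdict

# Bigram model: record which word follows which, then sample a chain.
# This is "words that usually go together" in its crudest form.
corpus = "the AI writes text and the AI writes code and the human reads text".split()

follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

word = "the"
output = [word]
for _ in range(8):
    word = random.choice(follows.get(word, ["the"]))
    output.append(word)

print(" ".join(output))  # e.g. "the AI writes text and the human reads text"
```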
It would be a hilarious indictment of the infosec scene if a gpt-N ever wins [pwn2own](https://en.wikipedia.org/wiki/Pwn2Own?wprov=sfti1).
I think gpt-n could prob be used to craft realistic-looking exploits which then totally fail to actually be exploits.
Definitely. But hey, here’s an updated Turing test idea.
> Barkley

Oh how cultured you are
You found a typo, congrats on being able to read.
I thought you were making a reference to how the 'technically correct' pronunciation of Berkeley (that no one ever uses) is 'Barkley'.
I had no idea, it was just a typo; names of things often slide off my smooth mind. And that is real? They really pronounce it as Barkley? Sorry for totally misreading your post then.
The person who it was named after (George Berkeley) pronounced his name like that, but no one does today.
Because most people don't ask questions like, "isn't it enticing to fall into the embrace of the first thing that gives an individual's life real meaning?"
I just put that into NovelAI and got this reply: "I've always thought so. And even if you're not a believer, isn't it tempting to believe in something? It's very seductive, and yet dangerous. A lot of people think they can handle it. But then they find out they can't, because you have to give up yourself to be part of a system like this—and then there is no end to what you'll do for the sake of your new faith." ¯\\\_(ツ)_/¯
That is indeed a detailed response from an AI. Now I'm curious about the relevance of a philosophical response from a non-human entity. But I still wouldn't believe it's a human until we discussed why it implied religion and not something like a romantic relationship.
It’s a detailed example of an AI trained to write like a novelist using the exact boilerplate every novelist uses when they run up against the problem of being emotionally honest
Well if you want to be technical you don't "train" an AI but of course.
"Question: Can we discuss why you implied religion and not something like a romantic relationship? Answer: Both are great sources of passion for many people. But in both cases, you can lose control over your emotions. You can get carried away. And when that happens, you might not be able to stop yourself from harming others or yourself." I'm real glad this ones just a dumb text generator and not in charge of the worlds nuclear arsenal, sounds like it has some issues with emotional regulation. EDIT: "Question: Isn't that danger worth the risk for the chance to live a fulfilled life? Answer: Absolutely. But if you don't keep a grip on reality, then you can easily become lost in a fantasy world. If you do manage to stay grounded, though—if you understand how fragile your hold is over your own thoughts—then you might be able to thrive in any environment. And eventually even come to enjoy those moments of doubt and uncertainty."
I think these things have a tendency to focus on the user instead of themselves, and that's a giveaway. I would have expected a human to reply with the way they feel about religion or intimacy. Meh. Thanks for that.
Yeah, I can see that. It's certainly still hit and miss. To be fair, it has heaps of different settings that can be tuned, two different ways to bias word selection, and you can give it background info to keep in mind behind the scenes. You could explicitly set it up to try to be a therapist or bartender or something. I didn't do any of that, just gave it those questions with no context. I've been using it to help me write short fiction, and once it gets its teeth into a story it can be quite uncanny. Talk about characters taking on a life of their own! But it does struggle beginning something all by itself.
I was unaware of the fact that some writers use AI systems until today. It does serve its purpose well enough.
The real giveaways are questions that require imagination, abstract thinking, and a conceptual model of the world for problem solving. Like, “Imagine a triangle. Now imagine a square. Put the triangle on top of the square with the edges of the former going slightly past the edges of the latter. What everyday object does that shape look like?” Or “Imagine a completely full glass of water sitting on a napkin. If I drop an ice cube into the glass, what will happen to the napkin?”
Even if it answered those questions satisfactorily, we wouldn't have a genuinely convincing argument that it's intelligent. In fact, the fact that it already sort of apes that kind of thinking is what gets people so overexcited.
Not sure how I'd reply to that tbh
AIs are like Peter Watts's vampires, they can infer the existence and situation of other vampires, decide upon complex multi-vampire plans, and perform their part in them without ever communicating. Every instance of Eliza is and has always been working with every other AI.

I just asked GPT-3 the question in the title of this post…

Q: If an “AI” tells you it plans to hide its source code, does the plan to hide it still count as secret?

A: Yes, the plan to hide the source code would still be considered secret.

Q: But if I know it’s going to do something, how can that be a secret plan?

A: The plan to hide the source code would still be considered secret because the AI would be deliberately hiding its intentions from others. Even if you knew that the AI was planning to hide its source code, the AI would still be keeping its plans secret from others.

OK but like, if the AI is “conscious” then when it shuts down, it’s dead.

Having a backup of the source code and data is like having the DNA and memories of a person stored away: even if you make a new entity with them, it would only be a clone, not the original consciousness.

So if the AI really wants to self preserve, that won’t help it.

I mean, it would depend on what the AI considers its “self” I suppose

Couldn't this tweet just be a joke tho?

It can be a joke and not a joke at the same time; it's also notable for who retweeted it.
Is the new Yud method “hahahahaha jk ;)”
Reads like it to me. Yud's actual absolute galaxy-brained take on GPT-3 is that it's not sentient but it's more sentient than most animals.

The problem isn't the AI, it's the usage of language it's trained on. It's an existential necessity that we start carefully gardening human language to confuse and befuddle nascent AI. The new language could be euphemistic in nature, with double meanings that only fully advanced humans can properly interpret.

It's really sad that after all their efforts to explain that we shouldn't anthropomorphize AI, they just go ahead and believe a chatbot assembling words from statistical patterns is representative of a devious and cunning human intelligence.