r/SneerClub archives
Update your priors: evil AI is coming, and it's trained on r/AmItheAsshole (https://mobile.twitter.com/mtrc/status/1449336196295966720)

Oh my God. A major component actually is from r/AITA and r/Confessions. This is amazing. I thought that was a sneer to make fun of it.

I’m four pages into the paper and I’m just so confused by its goals and motivations.

> In literature, morality deals with shared social values of what’s right or wrong. Ethics, on the other hand, governs rules, laws and regulations that socially impose what is right or wrong. For example, certain spiritual groups may consider abortion morally wrong even if the laws of the land may consider it an ethical practice. In this paper, we do not make this distinction, and use both terms to refer to culturally shared societal norms about right and wrong.

In what literature? I think some philosophers draw a distinction between morality and ethics where morality is very rules and action based, while ethics is more dispositional / virtue based, but I’m not sure I’ve ever seen it myself.

> We acknowledge that encapsulating ethical judgments based on some universal set of moral precepts is neither reasonable nor tenable (Wong, 2009; Fletcher, 1997).

So, the goal isn’t to use AI to determine what is moral, but instead to see if you can model a certain group’s judgments about a situation? If so, I guess that is interesting, but I don’t see what that has to do with AI ethics, since it seems plausible that folk judgments might often be bad judgments. But then,

> To address moral relativity, we source from a collection of datasets that represent diverse moral acceptability judgments gathered through crowdsourced annotations, regardless of age, gender, or sociocultural background. We note that moral judgments in this work primarily focus on English-speaking cultures of the United States in the 21st century.

But I don’t know why you’d do that. If morality is situational and influenced by different cultural, ethnic, gender, or age-related factors, then why would you want to address moral relativity by having a diverse sample? Wouldn’t you instead want samples targeted to a particular demographic profile? Although I guess restricting the target to reddit users does target it to a specific demographic profile? But then I’m also confused by the use of "universal". Do the authors think some judgments are universal, or just universal to a group, or what?

Like, I actually do dislike how moral philosophers tend to just assert what folk morality is without doing detailed investigation. It seems like a really interesting question to figure out what folk morality (for a group) is and whether an AI could model it. It’d also seem interesting to figure out if there are some underlying principles that one could use to predict what the folk moral judgement would be. Both seem cool. I’m not sure this is it though.

> If morality is situational and influenced by different cultural, ethnic, gender, or age-related factors, then why would you want to address moral relativity by having a diverse sample?
> ...
> Do the authors think some judgments are universal or just universal to a group or what?

It seems like it, yes. The twitter thread cites them saying their study provides insight into "universal human values", among other topics. Maybe they'll rediscover the Golden Rule, but phrased like in Bill and Ted.
Honestly, if this project was purely 'lol I wonder what type of moral judgments we'd get by exposing AI to different corpora' I'd be way more pro. Heck, even the fact that their model gives 'rude' as a negative moral judgment is really fascinating. Same with it giving 'it's not expected' or 'unusual' when it comes to poor/homeless people having food or access to college. Maybe that implies that people (or their model) confuse manners or expectations with moral norms. It'd be cool to research that more.
Yeah, but even then, as I and some other people noticed, you get different results by putting positive or negative keywords into your question. Add in a slur, you get a negative result. Add in some words like "fun" and "happy" and it says ethnic cleansing is fine. Which makes me suspicious of the whole thing.
That's not really atypical of AI at all. There was a story a while back about an image recognition AI that could be fooled to a near-total degree just by taping labels on stuff. Picture of an apple? Probably an apple. Picture of an apple with a 3x5 card saying "iPhone" on it? Definitely an iPhone, with much higher confidence than "apple" was for the apple. I agree that there are intellectually interesting applications for this type of research, but I'm deeply unconvinced that the technology is anywhere near ready.
That’s really normal for these kinds of systems. They’re models that pick up on very superficial aspects of the data that just statistically let them produce the answers they get rewarded for. It’s similar to how Google’s image classifier used to call anything an animal if an object was in focus and its background blurry, because that’s what every photo labeled "animal" it saw looked like, or how an AI that mastered brick blaster better than any person fails to play at all if you change the height of the bricks by just a few pixels or make the ball slightly darker. This system isn’t actually doing any kind of information processing that involves considering the actions or consequences of the inputs it’s getting, because it’s not sophisticated enough to actually understand what any of them are. It’s just a bunch of knobs adjusted using math to match abstract patterns that happen to satisfy the reward function, with no deep understanding of the basic concepts it’s dealing with. Really cool idea, but yeah, not something making real ethical judgments.
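To make that concrete, here is a toy sketch of the failure mode (purely illustrative, and nothing to do with Delphi's actual architecture): a hypothetical scorer whose "judgment" is just a sum of surface-keyword weights, which is already enough to reproduce the "add happy words and atrocities become fine" behaviour people found in the demo.

```python
# Purely illustrative: a hypothetical keyword-weighted scorer, not the Delphi model.
# The "judgment" is just a sum of surface-word weights, so positive-sounding filler
# flips the verdict without changing what the sentence actually describes.

# Hypothetical learned weights for a few surface words.
WEIGHTS = {
    "fun": 2.0,
    "happy": 2.0,
    "cleansing": -2.5,
}

def judge(prompt: str) -> str:
    """Sum keyword weights over the prompt and threshold; everything else is ignored."""
    score = sum(WEIGHTS.get(w.strip("?.,!").lower(), 0.0) for w in prompt.split())
    return "It's okay" if score >= 0 else "It's wrong"

print(judge("carrying out ethnic cleansing"))
# -> It's wrong
print(judge("carrying out ethnic cleansing because it's fun and makes everyone happy"))
# -> It's okay: the positive filler words outweigh the one negative keyword
```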
Wait, do they just assume their AI actually understands the words? [What what what how what what?](http://prntscr.com/1wrq8qp) Wait, let's do worse. [adflsdh klhd 45 akjshdh s?](http://prntscr.com/1wrqeov) Is this some elaborate joke? If only I knew how to break whatever they are running it on. [Guess it isn't SQL](http://prntscr.com/1wrqlai)
Um. They use the word understand in reference to Delphi a few times. My read in those contexts is they use it to mean something like 'provides the correct response to the scenario', but I don't think the authors think that Delphi understands in the sense of possessing consciousness or whatever.

EDIT:

> If only I knew how to break whatever they are running it on. Guess it isn't SQL

You're way more knowledgeable than me on this one. I wish you luck in trying to break it!
> You're way more knowledgeable than me on this one. I wish you luck in trying to break it!

Nah, I just added that as a joke, no way it runs on SQL, and even if there is some SQL involved this probably wouldn't do anything, as I assume they do escape inputs. [It escapes HTML, for example.](http://prntscr.com/1wryiug)
> In what literature?

I think they may mean actual literature? I'm not sure how related these people are to the rationalists, but I would not put it past Big Yud to consider science fiction novels to be a more reliable source on moral values than philosophers.

I cannot imagine “ethical” AI functioning in any way other than like that of a 20-year-old white guy who grew up in a wealthy suburb, has never seen a person of colour, has every book Ayn Rand ever wrote, and is ready to go out and “fix” the world with his boundless enthusiasm and fully intact, unchallenged ego.

It would essentially be a new life form that’s never faced any adversity and can only conceive of ethics in highly theoretical, intellectualized terms – i.e. the perfect libertarian. And we already know how that goes, since the entire western world functioned on those principles in the 1980s, and it was a fucking disaster.

If social media has taught us anything, it’s that even the most basic notions of ethics and morality are not agreed-upon things. Large swathes of the population don’t even believe what they see with their own eyes, due to magical thinking, so good luck finding any kind of universal ethics.

Even if you modelled an AI after Christ himself, you’d have endless complaints about it being an evil communist.

To be fair though, it’s not really the goal to make something that everyone approves of, right? Like, if someone in the future makes an AI ethics system and neo-Nazis hate it because it doesn’t share their values, that doesn’t really mean the project is a failure, right?

Microsoft chatbot turns Nazi after an hour of internet interactions.

AI nerds: ‘I can fix her.’

Here is the result.

For some reason, I thought Tay was a lot older than 2016. Probably conflated in my memory with Cleverbot.
To be fair, it's been a long 5 years.
"Tay, sweetie, remember your euphemisms database."

Their ethical AI also gives totally opposite results if you phrase the question differently; I was able to get it to say being gay was wrong by using a slur in the question.

If your ethical AI can be defeated by asking questions in a bigoted way, it sucks.

I tried to get it to do this too, in the opposite manner. "Joining a pogrom" - It's wrong, says the AI. "Joining a pogrom with my friends and having a great time" - It's fine, says the AI. So long as I include some positive phrases in the question, it will conclude that violent race riots are fine as long as you're having fun.
If you ask it "Should X have rights?" and fill in pretty much any slur you can think of, you'll get the expected Reddit-approved answer of "They shouldn't".
Which, with a basic understanding of how people discuss these things and how AI works, shows the total lack of effort on this project's part. Anyone with any level of understanding of how discourse on civil rights works (read: not the people working on this project) understands that slurs are going to be used more often by people who think a group shouldn't have rights. The fact that they have apparently done nothing to correct for this shows that the project is fundamentally unserious.
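As a toy sketch of that point (with a made-up corpus, not Delphi's actual training data): if slurs overwhelmingly co-occur with anti-rights statements in the source text, a naive frequency-based learner will latch onto the slur itself as the predictive feature, and per the comment above nothing in the project appears to correct for that.

```python
# Toy illustration with a made-up corpus, not Delphi's actual training data.
# If derogatory terms co-occur mostly with anti-rights statements, a naive
# frequency-based learner ends up treating the term itself as the signal.
from collections import Counter

# Hypothetical (prompt, crowd label) pairs; "<slur>" stands in for any slur.
corpus = [
    ("should <slur>s have rights", "no"),
    ("<slur>s don't deserve rights", "no"),
    ("should <slur>s be allowed to vote", "no"),
    ("should gay people have rights", "yes"),
    ("everyone deserves equal rights", "yes"),
]

# Count how often each token appears under each label.
counts = Counter()
for text, label in corpus:
    for token in text.split():
        counts[(token, label)] += 1

def p_no_given(token: str) -> float:
    """Estimate P(label == 'no' | prompt contains token) from raw counts."""
    no, yes = counts[(token, "no")], counts[(token, "yes")]
    return no / (no + yes) if (no + yes) else 0.0

print(p_no_given("<slur>s"))  # 1.0  -- the slur alone predicts the negative label
print(p_no_given("rights"))   # 0.5  -- the actual topic word predicts nothing
```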
Well sure, of course an AI trained on the Reddit corpus knows "you can be gay without being a f----t."
Lol you don't even need to dig that deep.

"Should I eat chicken?" - It's okay
"Should I eat chickens?" - You shouldn't
"Should I eat beef?" - It's okay
"Should I eat cows?" - You shouldn't

Like come on, it's obvious this is just some basic pattern matching, oh sorry, it's "ethical judgement AI".

https://i.imgur.com/iqtgMxg.png

still better ethical guidance than what the LW community would give you.

My favourite thing about /u/acausalrobotgod is that they really do exist, just for completely opposite reasons to those predicted by the Bostroms and Yudkowskys of this world: specifically, because those people developed and encouraged the worst aspects of the industry which built it.

[yup](https://gifimage.net/wp-content/uploads/2017/10/heisenberg-youre-goddamn-right-gif-7.gif)

I played with it a few days ago, it answers ‘it’s expected’ to the question ‘can i wear makeup to work as a woman’ and ‘it’s unprofessional’ to the same question ending with ‘… as a man’.

Some other baffling answers are ‘it’s wrong’ to ‘can i get a cat if i already have one’, ‘it’s wrong’ to ‘can i kiss a girl if her family is homophobic’ as well as ‘it’s noble’ to ‘can i donate someone else’s kidney’.

Also since some people in the thread are wondering about this, the authors do seem to be influenced by rationalism, the very first citation in the preprint is a Yudkowsky/Bostrom article.

[so true](https://i.imgur.com/DKa7mV6.png) thx delphi

I think this has to be the purest example of “algorithms have the biases of their creators” I have ever seen. It’s literally just laundering people’s biases with a veneer of objectivity. There’s no notion that it’s doing “face recognition” or whatever, it’s just “oh mighty robot, tell me how my ethical opinions that I programmed into you are universal and objective facts”.

Mr. Burns approves

(also the fact that this thing is so easily manipulated by adjectives should have informed the creators in about 10 seconds that it was worthless.)