Copilot AI calls journalist a child abuser, Microsoft tries to launder responsibility

David Gerard · 2 years ago

V0ldek · 2 years ago

I was thinking about this after reading the P(Dumb) post.

All normal ML applications have a notion of evaluation, e.g. the 2x2 table of {false,true}x{positive,negative}, or for clustering algorithms some metric of “goodness of fit”. If you have that, you can run an experiment that has quantifiable results, and then you can do actual science.
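For anyone who hasn't seen it, here is a minimal sketch of that 2x2 table for a binary classifier, with precision and recall computed from it. The labels and predictions are made up for illustration, and the function names are my own, not from any library.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Tally the 2x2 table: true/false x positive/negative."""
    counts = Counter({"TP": 0, "FP": 0, "FN": 0, "TN": 0})
    for truth, pred in zip(y_true, y_pred):
        if pred and truth:
            counts["TP"] += 1   # predicted positive, actually positive
        elif pred and not truth:
            counts["FP"] += 1   # predicted positive, actually negative
        elif not pred and truth:
            counts["FN"] += 1   # predicted negative, actually positive
        else:
            counts["TN"] += 1   # predicted negative, actually negative
    return counts

def precision_recall(counts):
    """Derive the two standard summary numbers from the table."""
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical ground truth and model predictions (1 = positive, 0 = negative).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(counts)
print(dict(counts), precision, recall)
```

The point is that every prediction lands in exactly one of four cells, so the experiment produces numbers you can compare across models.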

I don’t even know what the equivalent for LLMs is. I don’t really have time to spare to dig through the papers, but like, how do they do this? What’s their experimental evaluation? I don’t see an easy way to classify LLM outputs into anything, really.

The only way to do science is hypothesis->experiment->analysis. So how the fuck do the LLM people do this?

o7___o7 · 2 years ago

Right? “AI” is great if you want to sort a few million images of galaxies into their various morphological classifications and have it done before the end of the decade. A++, good job, no notes.

You can’t grift off of that very easily, though.

self · 2 years ago

I’d really like to know too, especially given how many times we’ve already seen LLMs misused in scientific settings. It’s starting to feel like the LLM people don’t have that notion — but that’s crazy, right?