Those files are kind of a nightmare to navigate in their bare state. And the datasets are huge. I doubt anyone training AI would knowingly let them through, unless it was specifically a police-investigation and case-law focused AI designed to process and categorize that kind of data.
Most AI are designed for functional discussion and factual data processing. It’s not a great idea to just feed in random trash.
I had to use Cloudflare to stop AI crawlers from using like 60% of the 16-core server that runs this instance. They were spending that much time pulling fediverse content, multiple bots with no wait time between requests. You really think they'd reject the Epstein files but seek out our combined output?
They scrape data indiscriminately; I’m sure any Epstein files publicly accessible on the internet have been added to their databases. Perhaps they’d be filtered out before being used to train models but I’m skeptical they take that level of care with the data.
Which would stop them.