BlueMonday1984 to

MoreWrite · 2 years ago

Some Quick and Dirty Thoughts on Sabotaging AI Scrapers

(Gonna expand on a comment I whipped out yesterday - feel free to read it for more context)

At this point, it’s already well known that AI bros are crawling up everyone’s ass and scraping whatever shit they can find - robots.txt, honesty and basic decency be damned.

The good news is that services have started popping up to actively cockblock AI bros’ digital smash-and-grabs - Cloudflare made waves when they began offering blocking services for their customers, and Spawning AI recently put out a beta for an auto-blocking service of their own called Kudurru.

(Sidenote: Pretty clever of them to call it Kudurru.)

I do feel like active anti-scraping measures could go somewhat further, though - the obvious route in my eyes would be to try to actively feed complete garbage to scrapers instead - whether by sticking a bunch of garbage on webpages to mislead scrapers or by trying to prompt inject the shit out of the AIs themselves.

The main advantage I can see is subtlety - it’ll be obvious to AI corps if their scrapers are given a 403 Forbidden and told to fuck off, but the chance of them noticing that their scrapers are getting fed complete bullshit isn’t that high - especially considering AI bros aren’t the brightest bulbs in the shed.
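To make the "garbage instead of a 403" idea concrete, here's a rough sketch using only Python's standard library. Everything in it is an assumption for illustration: the `SCRAPER_TOKENS` list is incomplete and trivially dodged by a spoofed User-Agent, and `garbage_text`, `DecoyHandler` and `serve` are made-up names, not anything Cloudflare or Kudurru actually ship.

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative, incomplete list of crawler User-Agent substrings (an assumption;
# real crawlers can and do spoof their User-Agent).
SCRAPER_TOKENS = ("gptbot", "ccbot", "claudebot")

# Hypothetical filler vocabulary; any source of plausible-looking junk would do.
WORDS = ("quantum", "sandwich", "ledger", "moss", "invoice", "pelican",
         "therefore", "beneath", "seventeen", "velvet", "protocol", "gravy")


def is_suspected_scraper(user_agent: str) -> bool:
    """Case-insensitive substring match against the token list."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in SCRAPER_TOKENS)


def garbage_text(seed: str, n_words: int = 200) -> str:
    """Build word salad; seeding by URL keeps the junk stable across visits."""
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words))


class DecoyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if is_suspected_scraper(self.headers.get("User-Agent", "")):
            body = garbage_text(self.path)
        else:
            body = "The real page content goes here."
        payload = body.encode("utf-8")
        self.send_response(200)  # 200 either way, so nothing looks blocked
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Run the decoy server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), DecoyHandler).serve_forever()
```

Calling `serve()` starts it on localhost. Seeding the junk by path means a scraper that re-fetches a page sees the same thing twice, which should make the swap a bit harder to spot than fresh noise on every request.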

Arguably, AI art generators are already getting sabotaged this way to a strong extent - Glaze and Nightshade aside, ChatGPT et al’s slop-nami has provided a lot of opportunities for AI-generated garbage (text, music, art, etcetera) to get scraped and poison AI datasets in the process.

I’m not 100% sure how effective this will be against the “summarise this shit for me” chatbots which inspired this lengthy shitpost, but between one proven case of prompt injection and AI’s dogshit security record, I expect it’ll be pretty effective.

BlueMonday1984 (OP) · 2 years ago

How about honeypotting? What’s the chance the crawlers are written smart enough to avoid a neverending HTTP stream?

Given the security record I mentioned earlier, their generally indiscriminate scraping and that one time John Levine tripped up OpenAI’s crawler, I suspect the chance of them getting caught in one is pretty high.
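For the curious, a neverending HTTP stream is not much code. This is a minimal sketch with Python's standard library, under stated assumptions: `endless_chunks`, `TarpitHandler` and the one-second `DELAY_SECONDS` are hypothetical names and values, and whether any given crawler actually lacks a timeout or size cap is untested here.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def endless_chunks(paragraph: str = "<p>Nothing to see here.</p>\n"):
    """Yield the same filler bytes forever; the stream has no natural end."""
    data = paragraph.encode("utf-8")
    while True:
        yield data


class TarpitHandler(BaseHTTPRequestHandler):
    # Hypothetical pacing: a slow drip keeps our own bandwidth use low.
    DELAY_SECONDS = 1.0

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        # No Content-Length: the handler defaults to HTTP/1.0, where the body
        # simply runs until the connection closes.
        self.end_headers()
        try:
            for chunk in endless_chunks():
                self.wfile.write(chunk)
                self.wfile.flush()
                time.sleep(self.DELAY_SECONDS)
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client gave up, so stop feeding it


def serve(port: int = 8001) -> None:
    """Threaded, so one stuck client does not block every other request."""
    ThreadingHTTPServer(("127.0.0.1", port), TarpitHandler).serve_forever()
```

In practice this would sit behind a link that only a crawler ignoring robots.txt would follow, since every open connection ties up a thread on your side too.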
