Some Quick and Dirty Thoughts on Sabotaging AI Scrapers

BlueMonday1984 · 2 years ago

V0ldek · 2 years ago

How about honeypotting? What’s the chance the crawlers are written smart enough to avoid a never-ending HTTP stream?

So this is an idea from SSH: you make a server that listens on port 22 and responds to any connection with a valid but extremely long message, slowly fed to the source byte by byte. Automated bots that look for open SSH ports or vulns get trapped there, and they have to keep consuming resources to service the connection.
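For anyone who wants to poke at it, here is a rough sketch of that SSH trick in Python. It leans on the fact that RFC 4253 lets a server send arbitrary lines before its version string, as long as none of them start with "SSH-". It sends whole junk lines with a pause between them rather than literally one byte at a time. The port, the delay, and the names `banner_line` and `tarpit` are all made up for illustration, so treat it as a toy under those assumptions, not a hardened tool.

```python
import asyncio
import random

DELAY_SECONDS = 10  # assumed pause between lines; tune to taste
PORT = 2222         # assumed; the real trick uses port 22, which needs root


def banner_line(rng: random.Random) -> bytes:
    """Return one junk pre-banner line that can never start with 'SSH-'."""
    # Lowercase hex digits only, so the line cannot begin with "SSH-".
    return b"%x\r\n" % rng.getrandbits(32)


async def tarpit(reader: asyncio.StreamReader,
                 writer: asyncio.StreamWriter) -> None:
    """Drip junk lines to one client until it gives up and disconnects."""
    rng = random.Random()
    try:
        while True:
            writer.write(banner_line(rng))
            await writer.drain()
            await asyncio.sleep(DELAY_SECONDS)
    except ConnectionError:
        pass  # client hung up; nothing left to do
    finally:
        writer.close()


async def main() -> None:
    server = await asyncio.start_server(tarpit, "0.0.0.0", PORT)
    async with server:
        await server.serve_forever()

# To start it: asyncio.run(main())
```

Each trapped client costs the server one sleeping coroutine and a few bytes every `DELAY_SECONDS`, while the client has to hold its connection open waiting for a version string that never arrives.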

Also, what happens if you try to feed it an infinite HTML file very quickly? Like just spam the stream with <div><div><div>...?
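The fast version is even less code. A minimal sketch, assuming a plain WSGI setup: the response body is a generator that never finishes, so the server keeps streaming nested divs for as long as the client keeps reading. The names `endless_divs` and `app` and the chunk size are invented here, and whether any real crawler actually chokes on this is exactly the open question.

```python
from itertools import islice
from typing import Iterator


def endless_divs(chunk_tags: int = 1024) -> Iterator[bytes]:
    """Yield an HTML prologue, then unclosed <div> tags forever."""
    yield b"<!DOCTYPE html><html><body>"
    chunk = b"<div>" * chunk_tags  # built once, reused for every chunk
    while True:
        yield chunk


def app(environ, start_response):
    """WSGI app: no Content-Length, so the body simply never ends."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return endless_divs()


# Peek at the first few chunks without looping forever.
preview = list(islice(endless_divs(chunk_tags=2), 3))
```

You would want this behind a path that only misbehaving bots reach, since it burns your own bandwidth as fast as it burns theirs.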

BlueMonday1984 · 2 years ago

How about honeypotting? What’s the chance the crawlers are written smart enough to avoid a never-ending HTTP stream?

Given the security record I mentioned earlier, their generally indiscriminate scraping, and that one time John Levine tripped up OpenAI’s crawler, I suspect it’s pretty high.

David Gerard · 2 years ago

feed them LLM output, obviously