FOSS infrastructure is under attack by AI companies

@simple@lemm.ee · 4 days ago


@daq@lemmy.sdf.org · 3 days ago

I’m not sure how they actually implemented it, but you can easily block ML crawlers with Cloudflare. Isn’t just about every small site/service behind CF anyway?
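For context, the crudest form of "blocking ML crawlers" is matching the User-Agent header against crawlers that identify themselves. The sketch below is a hypothetical illustration of that idea only; it is not how Cloudflare implements its bot blocking, and the token list (GPTBot, CCBot, ClaudeBot, Bytespider) is an illustrative, non-exhaustive assumption. A crawler that sends a browser User-Agent passes straight through it.

```python
# Hypothetical sketch: refuse requests from self-identified AI crawlers.
# NOT Cloudflare's implementation; real bot management uses far more signals.

# Illustrative, non-exhaustive list of User-Agent tokens used by AI crawlers.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known crawler token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def status_for(user_agent: str) -> int:
    """Return the HTTP status a server might send: 403 for listed crawlers, else 200."""
    return 403 if is_ai_crawler(user_agent) else 200


if __name__ == "__main__":
    # Shortened, illustrative User-Agent strings.
    print(status_for("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(status_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/124.0"))
```

This only stops honest crawlers, which is why services layer challenges and behavioural checks on top of header matching.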

@grysbok@lemmy.sdf.org · 3 days ago

Last I checked, Cloudflare requires the user to have JavaScript and cookies enabled. My institution doesn’t want to require those because it would likely impact legitimate users as well as bots.

@daq@lemmy.sdf.org · 3 days ago

Huh? I can reach my site via curl that has neither. How did you come up with this random set of requirements?

@grysbok@lemmy.sdf.org · 2 days ago

Odd. I just tried

curl https://www.scrapingcourse.com/cloudflare-challenge

and got

Enable JavaScript and cookies to continue

I’m clearly not on the same setup as you are, but my off-the-cuff guess is that your curl command was issued from a system that Cloudflare already recognized (IP whitelist, cookies, I dunno).
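To repeat the curl test from a script and tell a challenge page apart from real content, one option is to check the response for the challenge text seen above, plus the `cf-mitigated: challenge` header, which, to my understanding, Cloudflare documents for challenge responses (treat that header as an assumption). The helper names `fetch` and `looks_like_challenge` are hypothetical.

```python
import urllib.error
import urllib.request

# Text of the interstitial quoted above.
CHALLENGE_TEXT = "Enable JavaScript and cookies to continue"


def looks_like_challenge(status: int, headers: dict, body: str) -> bool:
    """Heuristic: True if the response looks like a Cloudflare challenge page.

    Assumes Cloudflare marks challenge responses with `cf-mitigated: challenge`;
    falls back to matching the visible challenge text in the body.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    if lowered.get("cf-mitigated", "").lower() == "challenge":
        return True
    return CHALLENGE_TEXT in body


def fetch(url: str) -> tuple:
    """GET a URL without cookies or JavaScript; return (status, headers, body)."""
    # Hypothetical curl-like User-Agent, to mirror the command-line test.
    req = urllib.request.Request(url, headers={"User-Agent": "curl/8.0"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, dict(resp.headers), resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Challenge pages usually come back with a 4xx/5xx status, so keep the body.
        return err.code, dict(err.headers), err.read().decode("utf-8", "replace")


if __name__ == "__main__":
    status, headers, body = fetch("https://www.scrapingcourse.com/cloudflare-challenge")
    print(status, looks_like_challenge(status, headers, body))
```

Results depend on where the request comes from, which is consistent with the two posters seeing different behaviour from the same tool.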

Anyways, I’m reading through this blog post on using cURL with Cloudflare-protected sites and I’m finding it interesting.

@daq@lemmy.sdf.org · 2 days ago

Of course their challenge requires those things. How else could they implement it? Most users will never be presented with a challenge though and it is trivial to disable if you don’t want to ever challenge anyone. I was just saying CF blocks ML crawlers.
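As to why a challenge needs cookies: once a client passes the JavaScript check, the server has to remember that somewhere, and a signed cookie is the usual place. The sketch below is a hypothetical, simplified clearance-cookie scheme, not Cloudflare's actual mechanism; `SECRET`, `TTL_SECONDS`, the IP binding, and the `flagged` switch are all illustrative assumptions. It also shows the two points made in the post: only flagged requests are challenged at all, and a client without cookie support can never present proof of passing, so it is challenged on every flagged request.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"server-side-secret"  # hypothetical signing key
TTL_SECONDS = 1800  # hypothetical lifetime of a clearance cookie


def _sign(client_ip: str, issued: int) -> str:
    """HMAC-SHA256 over the client IP and issue time."""
    return hmac.new(SECRET, f"{client_ip}|{issued}".encode(), hashlib.sha256).hexdigest()


def issue_clearance(client_ip: str, now: Optional[float] = None) -> str:
    """Called after the client solves the JS challenge; the value is set as a cookie."""
    issued = int(time.time() if now is None else now)
    return f"{issued}.{_sign(client_ip, issued)}"


def has_valid_clearance(cookie: Optional[str], client_ip: str, now: Optional[float] = None) -> bool:
    """True if the cookie is well-formed, unexpired, and signed for this IP."""
    if not cookie:
        return False  # no cookie: no proof the challenge was ever passed
    try:
        issued_str, sig = cookie.split(".", 1)
        issued = int(issued_str)
    except ValueError:
        return False  # malformed cookie
    current = time.time() if now is None else now
    if current - issued > TTL_SECONDS:
        return False  # expired
    return hmac.compare_digest(sig, _sign(client_ip, issued))


def respond(cookie: Optional[str], client_ip: str, flagged: bool) -> str:
    """Serve content unless the request is flagged and lacks a valid clearance."""
    if not flagged or has_valid_clearance(cookie, client_ip):
        return "content"
    return "challenge"
```

Setting `flagged` to False for every request corresponds to turning challenges off entirely.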