Why wordfreq will not be updated - AI spam

David Gerard · 1 year ago

UnseriousAcademic · 1 year ago

Man, I feel this, particularly the sudden shutting down of data access because all the platforms want OpenAI money. I spent three years building a tool that pulled follower relation data from Twitter and exponentially crawled its way outwards from a few seed accounts to millions of users. Using that data it was able to make a compressed summary network, identify community structures, give names to the communities based on words in user profiles, and then use sampled tweet data to tell us the extent to which different communities interacted.
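For anyone who hasn't seen this kind of pipeline, here is a rough sketch of its general shape. This is not the actual tool. The accounts, profiles, and function names are all invented toy data, and plain connected components stand in for real community detection, which would need something far more capable at the scale of millions of users. The tweet-sampling step is left out.

```python
from collections import Counter, deque

# Hypothetical toy data standing in for an API: account -> accounts it follows.
FOLLOWS = {
    "seed_a": ["ann", "bob"],
    "ann": ["bob", "seed_a"],
    "bob": ["ann"],
    "seed_b": ["cat", "dan"],
    "cat": ["dan"],
    "dan": ["cat", "eve"],
    "eve": [],
}
# Hypothetical profile text for each account.
PROFILES = {
    "seed_a": "python data science",
    "ann": "data science phd",
    "bob": "open data nerd",
    "seed_b": "football fan",
    "cat": "football and coffee",
    "dan": "football coach",
    "eve": "gardening",
}
STOPWORDS = {"and", "the", "a", "of"}


def snowball_crawl(seeds, max_depth):
    """Breadth-first crawl outwards from the seeds, up to max_depth hops."""
    seen = set(seeds)
    edges = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand accounts at the depth limit
        for followed in FOLLOWS.get(user, []):
            edges.append((user, followed))
            if followed not in seen:
                seen.add(followed)
                queue.append((followed, depth + 1))
    return seen, edges


def connected_components(nodes, edges):
    """Crude stand-in for community detection: undirected components."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    unvisited = set(nodes)
    components = []
    while unvisited:
        start = min(unvisited)  # deterministic starting point
        unvisited.discard(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for other in neighbours[node]:
                if other in unvisited:
                    unvisited.discard(other)
                    stack.append(other)
        components.append(component)
    return components


def name_community(members, top_n=2):
    """Name a community after the most common words in its members' profiles."""
    words = Counter()
    for member in members:
        text = PROFILES.get(member, "").lower()
        words.update(w for w in text.split() if w not in STOPWORDS)
    # Most frequent first; ties broken alphabetically.
    ranked = sorted(words.items(), key=lambda kv: (-kv[1], kv[0]))
    return " / ".join(word for word, _ in ranked[:top_n])


if __name__ == "__main__":
    nodes, edges = snowball_crawl(["seed_a", "seed_b"], max_depth=2)
    for community in connected_components(nodes, edges):
        print(name_community(community), sorted(community))
```

On the toy data this groups the two seed neighbourhoods separately and labels each with its most frequent profile words. The depth limit is what keeps a crawl like this from growing without bound.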

I spent 8 months in ethics committees to get approval to do it, I got a prototype working, but rather than just publish I wanted to make it accessible to the academic community so I spent even more time building an interface, making it user friendly, improving performance, making it more stable etc.

I wanted to ensure that when we published our results I could also say “here is this method we’ve developed, and here you can test it and use it too for free, even if you don’t know how to code”. Some people at my institution wanted me to explore commercialising but I always intended to go open source. I’m not a professional developer by any means so the project was always going to be a janky academic thing, but it worked for our purposes and was a new way of working with social media data to ask questions that couldn’t be answered before.

Then the API got put behind a $48K a month paywall and the project was dead. Then everywhere else started shutting their doors too. I don’t do social media research anymore.

ahopefullycuterrobot · 1 year ago

After my own heart right here. I followed some version of Luca Hammer’s guide to categorise everyone I followed on Twitter into communities, then created rss feeds of them using nitter. It was fascinating seeing how they clustered together. I think I still have an old gephi file with that output. I did this before Musk bought Twitter, since I knew he was going to wreck it.

Basically, I would have killed for this tool.

(I’m now wondering if anyone’s published a guide on this for bluesky.)

YourNetworkIsHaunted · 1 year ago

I would wager that, more than the costs of serving these API calls, preserving the opacity of the resultant network is probably part of the advantage these companies get from locking down their APIs. Given how much flak they already get for the mental and social damage done by social media and Twitter specifically, I suspect they’re very happy to preserve as much of the black boxiness as they can so they can point to the value users get and their ad revenue and say that all the costs are unfortunate coincidences rather than central problems with the paradigm.

Soyweiser · 1 year ago

Well that really sucks, another project stopped because of all this bullshit. :(

David Gerard · 1 year ago

This was actually posted in June, but it became hot news yesterday for some reason

David Gerard · 1 year ago

did a pivot-to-ai too

Serinus@lemmy.world · 1 year ago

It’s an excellent read on a safe site. I appreciate OP doing the opposite of clickbait, but if this interests you at all, check it out.

froztbyte · 1 year ago

this post reads like an amazon product review

Soyweiser · 1 year ago

It reminded me of those blog reply spam comments.

Serinus@lemmy.world · 1 year ago

It’s a matter of time until AI starts misspelling everything to try to fit in better.

V0ldek · 1 year ago

“but if this interests you at all, check it out.”

ye that’s how most normal people use the internet? what’s the alternative strategy, checking it out if it doesn’t interest you?

froztbyte · 1 year ago

staring at it suspiciously with eyes narrowed, until it scampers

UnseriousAcademic · 1 year ago

To be fair I’ve spent an inordinate amount of time looking at stuff on the Internet that doesn’t interest me. Especially since my workplace moved their employee training online.

Serinus@lemmy.world · 1 year ago

The alternative is reading the headline and skipping the article.

The punchline is in the title, yes, but the article is still worth reading. Maybe I didn’t phrase that well.

David Gerard · 1 year ago

I try for clickbait that delivers

wordfreq/SUNSET.md at master · rspeer/wordfreq