Not just a problem for open source, surely? The answer is to use AI to scan contributions for suspicious patterns, no?
And then, when those AIs also have issues, do we use AI to check the AI that checks the AI?
It's turtles all the way down.
There's already a whole swathe of static analysis tools used for these purposes (e.g. SonarQube, GitHub code scanning). Of course, their viability and cost affect who can and does utilise them. Whether or not they utilise LLMs I don't know (but I'm guessing probably yes).
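For what it's worth, here is a minimal sketch of the kind of automated gate these tools provide: run a security-focused analyzer over just the files an incoming patch touches. The tool choice (Bandit), the base branch name "origin/main", and the Python-only focus are all assumptions for illustration; real setups usually wire this into CI (e.g. GitHub code scanning) rather than a standalone script.

    # Illustrative sketch only: scan the Python files changed by an incoming
    # patch with Bandit, a security-focused static analyzer. Assumes git is
    # available, bandit is installed, and "origin/main" is the base branch.
    import subprocess
    import sys

    def changed_python_files(base: str = "origin/main") -> list[str]:
        """List .py files modified relative to the base branch."""
        out = subprocess.run(
            ["git", "diff", "--name-only", base, "--", "*.py"],
            capture_output=True, text=True, check=True,
        )
        return [line for line in out.stdout.splitlines() if line]

    def scan(files: list[str]) -> int:
        """Run bandit on the changed files; a non-zero exit means findings."""
        if not files:
            return 0
        return subprocess.run(["bandit", "-q", *files]).returncode

    if __name__ == "__main__":
        sys.exit(scan(changed_python_files()))

Of course, a scanner like this catches known dangerous patterns, not the kind of slow, trust-building campaign the article describes; it's a filter, not a substitute for human review.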
This is a huge reason to support open source LLM development, and the training projects that specialize those models for cybersecurity.
There will be an ever-growing divide between those who pay for the latest automated code review services and those who don't, unless the open source side keeps up.
Honestly, this might be the most important open source AI application of all, and from what I can tell it seems to be falling behind.
The xz attack required years of patient work to build Jia Tan’s credibility through hundreds of legitimate patches. These [LLM] tools can now generate those patches automatically, creating convincing contribution histories across multiple projects at once.
I don’t know, but maybe the current hype could have the opposite effect: if you try to flood many projects with AI-generated patches, you’ll be flagged as an AI-slopper and blocked from those projects rather than become a trusted contributor? (OK, a nation-state-level adversary can probably do the real work of checking the AI output diligently so as not to be detected as AI spamming, while still getting some advantage from it, unlike those hordes of people who need a famous open source contribution on their CV and just copy-paste the first nonsense the AI produced into a PR.)
The only solution is to refuse AI commits, but first you need to find out that an LLM was used…