• @self

    1k stars, 690 lines of code, first commit 1 week ago, author barely has any GitHub activity

    there’s something very inorganic about these stats. also, there’s barely any code here at all, but boy look at what it does:

    Prompt engineering is kind of like alchemy.

    an obsession with creating homunculi and getting rich quick? a field that’s going to be discredited when folks realize the magnitude of its failure? practitioners are all suffering from psychosis due to lead poisoning? this comparison is doing so much more work than the author intended

    The real magic happens after the generation. The system tests each prompt against all the test cases, comparing their performance and ranking them using an ELO rating system.

    this is the basic shit you wrote docs and pushed to GitHub for? fuck it, maybe the ranking system is something special

    ELO Rating System: Each prompt starts with an ELO rating of 1200. As they compete against each other in generating responses to the test cases, their ELO ratings change based on their performance. This way, you can easily see which prompts are the most effective.

    nope, just a simple test runner that lazily ranks its results. this is the kind of approach you come up with, evaluate, and discard while doing mental engineering in the shower. it's also spectacularly inefficient: pairwise Elo means every prompt has to fight every other prompt on every test case, so the number of expensive LLM calls grows quadratically with the number of prompts. I'm glad LLM grifters are carrying forward the cryptobro tradition of setting fire to one rainforest for every bad algorithm they invent
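
    for the curious, the whole "magic" boils down to something like this (a minimal sketch of the idea, not the repo's actual code; judge() is a hypothetical stand-in for "ask an LLM which of two outputs is better"):

    ```python
    import itertools
    import random

    K = 32          # common chess K-factor; an assumption, the repo may use another
    START = 1200    # every prompt starts here, per the README

    def expected(r_a, r_b):
        # Elo's predicted probability that A beats B
        return 1 / (1 + 10 ** ((r_b - r_a) / 400))

    def judge(prompt_a, prompt_b, test_case):
        # hypothetical stand-in: 1 if A's output wins, 0 if B's, 0.5 for a tie
        return random.choice([0, 0.5, 1])

    def rank(prompts, test_cases):
        ratings = {p: START for p in prompts}
        # every prompt fights every other prompt on every test case:
        # a quadratic pile of LLM calls, hence the rainforest
        for a, b in itertools.combinations(prompts, 2):
            for case in test_cases:
                score_a = judge(a, b, case)
                exp_a = expected(ratings[a], ratings[b])
                ratings[a] += K * (score_a - exp_a)
                ratings[b] += K * ((1 - score_a) - (1 - exp_a))
        return sorted(ratings.items(), key=lambda kv: -kv[1])
    ```

    that's more or less the entire product, minus the API plumbing.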

    e: completely forgot to sneer at the lazy ranking system being stolen from chess for some reason too. what is it with grifters and the thin veneer of chess prestige?