• BlueMonday1984
    5 months ago

    Artificial intelligence and cheating/lying: two great tastes that go together

  • diz
    5 months ago

    When they tested on bugs not in SWE-Bench, the success rate dropped to 57‑71% on random items, and 50‑68% on fresh issues created after the benchmark snapshot. I’m surprised they did that well.

    After the benchmark snapshot. That could still be before the LLM training data cutoff, or the fixes could be available via RAG.

    edit: For a fair test you have to use git issues that had not been resolved yet by a human.

    This is how these fuckers talk, all of the time. Also see Sam Altman’s not-quite-denials of training on Scarlett Johansson’s voice: they just asserted that they had hired a voice actor, but didn’t deny training on actual Scarlett Johansson’s voice. edit: because anyone with half a brain knows that not only did they train on her actual voice, they probably gave it and their other pirated movie soundtracks massively higher weighting, just as they did for books and NYT articles.

    Anyhow, I fully expect that by now they just use everything they can to cheat benchmarks, up to and including RAG from solutions past the training dataset cutoff date. With two of the paper authors being from Microsoft itself, expect that their “fresh issues” are gamed too.

  • abcdqfr@lemmy.world (Banned)
    5 months ago

    I also likes to cheat on tests by studying every answer on the subject the test giver might put in the test??? We’ve got a computer that can study and pass tests, c’mon. Where’s the real story?

    • selfA
      5 months ago

      it’s appropriate that you think your brain works like an LLM, because you regurgitated this shitty opinion from somewhere else without giving it any thought at all

      • diz
        5 months ago

        Yeah, I’m thinking that people who think their brains work like an LLM may be somewhat correct. Still wrong in some ways, as even their brains learn from several orders of magnitude less data than LLMs do, but close enough.

    • YourNetworkIsHaunted
      5 months ago

      This isn’t studying possible questions, this is memorizing the answer key to the test and being able to identify that the answer to question 5 is “17” but not being able to actually answer it when they change the numbers slightly.

    • V0ldek
      5 months ago

      Hey mate what do you think learning is. Like genuinely, if you were to describe the process of learning a subject to me.

    • o7___o7
      5 months ago

      LLMs are seven or eight bipartite graphs in a trench coat. Is your brain seven neurons thick, because that would explain a few things.

    • Seminar2250
      5 months ago

      i have a potato that can study, send me your venmo if interested