“Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as “trivial”, even when their validity was crucial.”

  • David Gerard
    shield
    MA
    link
    fedilink
    English
    arrow-up
    14
    ·
    8 个月前

    the user who cannot read has been guided to go not read elsewhere