In updated tests published on the Humanity's Last Exam website, the Gemini 3.1 Pro model achieved 45.9 percent accuracy, with a ...
Mainstream chatbots presented varying levels of resistance to deliberate requests for fabrication, study finds.