Tag: benchmark
All articles tagged "benchmark"
- A New Bar for AI Coding Benchmarks: The Question FrontierCode Asks
Cognition's new benchmark FrontierCode asks not 'does it pass the tests' but 'would a maintainer merge this PR.' Why the top model scores just 13.4% on the Diamond tier, and the gap between code that runs and code that merges.