________ [https://openreview.net/pdf?id=mdA5lVvNcU] - 2026-02-20 21:16:50 - public:mzimmerm ai, benchmark, test - 3 | id:1538354 -
APEX-Agents: AI Productivity Index for Agents [https://www.mercor.com/apex/apex-agents-leaderboard/] - 2026-02-20 11:47:18 - public:mzimmerm agent, invest, finance, benchmark, apex, ai - 6 | id:1538348 - Mercor is the company that created the APEX benchmark for AI models. The benchmark focuses on finance.
CodeClash [https://codeclash.ai/] - 2025-12-12 15:53:15 - public:mzimmerm benchmark, code, good, vibe - 4 | id:1536664 - Compares models on writing code, unlike other benchmarks, which mostly evaluate git patches.
SWE-bench Leaderboards [https://www.swebench.com/] - 2025-12-12 11:40:44 - public:mzimmerm benchmark, code, passmark, vibe, ai - 5 | id:1536663 - Benchmark of AI coding models.