#SWEBenchPro

Latest posts tagged with #SWEBenchPro on Bluesky

1 day ago

GLM‑5.1 just clocked an 8‑hour workday, outpacing Claude Opus 4.6 and GPT 5.4 on SWE‑Bench Pro. Open‑source AI is finally pulling its weight in software engineering. Curious how it did it? Dive into the benchmarks. #GLM5_1 #SWEBenchPro #OpenSourceAI

🔗 aidailypost.com/news/ai-join...

GetNews.me

@getnews-me.bsky.social

6 months ago

SWE‑Bench Pro Reveals Limits of AI Agents on Complex Software Tasks

SWE‑Bench Pro, released Sep 2025, contains 1,865 multi‑file tasks from 41 actively maintained repos. Even GPT‑5 only achieved a 23.3% Pass@1 score, with all models staying under 25%. getnews.me/swe-bench-pro-reveals-li... #swebenchpro #gpt5 #aiagents
