Skip to main content
We just open sourced a tiny GPT-style cognitive core built in pure Rust.See our repository
Live Rankings · Verifiable · Fair · Anti-cheat

Terminal-Bench 2.0 Leaderboard

What is this?

An independent leaderboard for AI coding agents, built on Harbor and the Terminal-Bench 2.0 dataset. All evaluations are public, reproducible, and tamper-proof.

Why a separate leaderboard?

· Fully open-source — all workflows, code, and configs are public.· Traceable — every run executes via public GitHub Actions with full history.· Tamper-proof — results integrity-checked, agent builds publicly downloadable.
Want to evaluate your agent? Check out ante-eval — submissions welcome.
Updated Not reported
Loading leaderboard dataPreparing bundled benchmark results