Frontier-CS 1.0 Release

We are releasing Frontier-CS 1.0, a major update to our open-ended Computer Science benchmark. This release expands Frontier-CS to 240 tasks across both the algorithmic and research tracks. We also introduce a new Elo-based leaderboard, along with full execution traces of model solutions to enable deeper analysis and reproducibility.

We’re excited to announce the release of Frontier-CS 1.0! Frontier-CS is an open-ended benchmark designed for evolving intelligence. It now comprises 240 unsolved problems (+94 since Dec 2025), spanning a diverse range of computer-science domains and authored by CS PhDs and ICPC World Finals–level experts.

Frontier-CS supports benchmarking frontier models, agent-based evaluation, algorithm evolution, post-training, and beyond: any setting where progress is measured by quantitative, fine-grained metrics that reflect solution quality rather than binary pass/fail outcomes.

Why Frontier-CS?

By 2025, LLMs have largely saturated exam-style benchmarks in computer science. We now see strong performance across standardized evaluations: roughly 80% on SWE-bench Verified, gold-medal–level results at the 2025 ICPC World Finals, and high scores on many other exam-style tasks. While useful, these benchmarks no longer capture meaningful progress in foundation models or distinguish different modes of problem solving.

In light of this, Frontier-CS is motivated by two gaps. First, we need open-ended tasks that never saturate, so that measured progress continues to reflect long-horizon planning, code writing, and genuine problem-solving skill rather than binary pass/fail accuracy hacking. Second, with the rise of evolutionary and agentic frameworks (e.g., AlphaEvolve, ADRS, TTT-style discovery), we need a large-scale, verifiable benchmark that enables comprehensive comparison and provides training signals toward scalable agents for scientific discovery, including agentic RL.
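
To make the second point concrete, here is a minimal sketch, in Python, of how a fine-grained quality score can drive a keep-the-best evolution loop. The evaluate and propose hooks are hypothetical placeholders (in practice they would wrap the Frontier-CS evaluator and an LLM-based rewriter, respectively); the point is only that a continuous score lets the loop keep improving even after every candidate already "passes".

def evaluate(solution: str) -> float:
    """Hypothetical scoring hook: wrap the Frontier-CS evaluator here
    (e.g., shell out to the `frontier eval` CLI) and return the task's
    fine-grained quality score for `solution`."""
    raise NotImplementedError

def propose(parent: str) -> str:
    """Hypothetical mutation step, e.g., ask an LLM to revise `parent`."""
    raise NotImplementedError

def evolve(seed: str, iterations: int = 100) -> str:
    """Keep-the-best loop driven by a continuous score. A binary
    pass/fail metric cannot distinguish two passing candidates; a
    fine-grained score still rewards 'slightly better', so the loop
    can make steady progress on open-ended tasks."""
    best, best_score = seed, evaluate(seed)
    for _ in range(iterations):
        candidate = propose(best)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best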

So how do the models do?

Crafted by several ICPC World Finalists and CS PhDs, Frontier-CS contains 240 problems that are open-ended, verifiable, and diverse, spanning 7 research domains and 3 algorithmic engineering categories. Surprisingly, even though modern models have nearly aced traditional computer-science benchmarks, they still struggle with the open-ended nature of Frontier-CS. On the algorithmic track, human expert solutions achieve an average score of 86.99, while the strongest model reaches only 33.12, a substantial gap that remains far from closed. We’ll go into the details of our results in a later post, but for now, check out our leaderboard.

[Teaser image]

What’s New in Frontier-CS 1.0

[Teaser image]

To try Frontier-CS locally, clone the repository and run the included example solution:

git clone https://github.com/FrontierCS/Frontier-CS.git
cd Frontier-CS
# Install dependencies (using uv, recommended)
uv sync
# Run the example solution (GPT-5 Thinking Solution)
frontier eval --algorithmic 0 algorithmic/problems/0/examples/gpt5.cpp
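
If you want to score several solutions in one pass, a small wrapper around the same command is enough. The sketch below is a hedged example in Python: the (problem id, solution path) pairs are placeholders you would substitute with your own, and it simply forwards whatever frontier eval prints rather than assuming any particular output format.

import subprocess

# Placeholder (problem_id, solution_path) pairs; substitute your own.
RUNS = [
    ("0", "algorithmic/problems/0/examples/gpt5.cpp"),
]

for problem_id, solution_path in RUNS:
    # Same command as the quickstart above; we only forward its output
    # instead of assuming a particular score format.
    result = subprocess.run(
        ["frontier", "eval", "--algorithmic", problem_id, solution_path],
        capture_output=True,
        text=True,
    )
    print(f"problem {problem_id}: {result.stdout.strip() or result.stderr.strip()}")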

Getting in Touch

We are always looking for more problems to add in our next release and for more models and agents to evaluate. We’d love to hear your comments and feedback! Join us on Discord or email us!

And if you find Frontier-CS useful, please consider giving us a ⭐ on GitHub!