Year

2026

8 posts published in 2026.

May 12, 2026 · 8 min read

Your Next Long-Context Recipe: Open-Ended Problems

We integrate Frontier-CS into Harbor and release a preview long-horizon agent leaderboard on 178 open-ended algorithmic tasks. Kimi K2.6 and Claude Code Opus 4.7 show similar headline capability, but very different failure modes.

Read more →

Feb 26, 2026 · 6 min read

LLM Defeated in Open-ended Problems

Modern LLMs claim superhuman algorithmic abilities, but what happens when there is no strict verifier? We analyze how multi-turn 'optimization' in Frontier-CS exposes the cognitive ceiling and catastrophic failures of AI in open-ended problem solving.

Read more →

Feb 10, 2026 · 13 min read

Evaluating the Hardest CS Problems in the Age of LLMs

Frontier-CS scores solutions on a continuous scale across heterogeneous hardware. This post explains the evaluation architecture behind the leaderboard: hash-based resume, resource-grouped clusters, pinned environments, and the challenges ahead for agentic submissions.

Read more →

Feb 3, 2026 · 5 min read

Frontier-CS 1.0 Release

We are releasing Frontier-CS 1.0, a major update to our open-ended Computer Science benchmark. This release expands Frontier-CS to 240 tasks across both the algorithmic and research tracks. We also introduce a new Elo-based leaderboard, along with full execution traces of model solutions to enable deeper analysis and reproducibility.

Read more →
