Your Next Long-Context Recipe: Open-Ended Problems
- Date: May 12, 2026
- Category: Research
- Unsolved: no solution has achieved a perfect score
- Open-ended: research and optimization challenges
- Verifiable: scoring is continuous, so there is always room to improve
- Diverse: systems, ML, algorithms, security, and more
# Run with any standard agent CLI
$ uv run harbor run -d frontier-cs-algorithm \
    -a claude-code -m "anthropic/claude-opus-4-6"
# Try your own solution!
$ uv run frontier eval algorithmic 0 <your_solution.cpp>
track.algorithmic problem.0 backend.docker
continuous scoring enabled...
running public test instances...
✓ Score@1 72.6
✓ leaderboard submission ready
172 problems (models ranked by Elo)
| Rank | Model | Score@1 | Avg@5 | Score@5 | Elo |
|---|---|---|---|---|---|
| 🥇 | gemini-3.0-pro | 33.12 | 34.58 | 56.09 | 1265 |
| 🥈 | gpt-5.2-thinking | 32.40 | 33.11 | 47.19 | 1242 |
| 🥉 | gpt-5-thinking | 23.10 | 22.58 | 39.73 | 1196 |
| 4 | deepseek-3.2 | 24.83 | 23.89 | 41.44 | 1193 |
| 5 | grok-4 | 24.04 | 22.98 | 36.81 | 1174 |
| 6 | gemini-2.5-pro | 20.34 | 19.32 | 36.65 | 1167 |
| 7 | gpt-5.1-thinking | 20.64 | 21.49 | 34.76 | 1164 |
Human reference: 86.99 (Score@1)
68 problems (models ranked by Elo)
| Rank | Model | Score@1 | Avg@5 | Score@5 | Elo |
|---|---|---|---|---|---|
| 🥇 | gemini-3.0-pro | 46.55 | 43.14 | 59.22 | 1283 |
| 🥈 | gpt-5-thinking | 30.91 | 34.94 | 55.25 | 1218 |
| 🥉 | gpt-5.1-thinking | 32.12 | 33.70 | 56.79 | 1214 |
| 4 | gpt-5.2-thinking | 30.29 | 34.09 | 58.90 | 1210 |
| 5 | gemini-2.5-pro | 21.66 | 25.74 | 51.57 | 1180 |
| 6 | grok-4 | 26.75 | 24.01 | 48.15 | 1149 |
| 7 | deepseek-3.2 | 21.51 | 21.76 | 44.41 | 1146 |
Coming soon
Agent results will appear here once the track is released.
Recover a hidden permutation using as few adaptive queries as possible. The task tests information-efficient probing, feedback-driven planning, and reasoning under sparse signals.
Pack geometric pieces into increasingly dense layouts. The task rewards iterative search, heuristic design, symmetry handling, and long-horizon improvement.
Academic institutions
UC Berkeley
Princeton University
UCSD
Georgia Tech
Stanford University
University of Washington
Nanyang Technological University
University of Toronto
UIUC
University of Michigan
New York University
MIT