Join Our Team
Help shape the future of AI evaluation. Core contributors will be eligible for co-authorship.
We are looking for engineering collaborators to help build the next generation of AI agent benchmarks. Contributors will design challenging, verifiable tasks that test frontier models in realistic computer science and software engineering environments.
This is a chance to define the evaluation standards that will guide future agent research, from task design and environment construction to scoring infrastructure and benchmark analysis.
To get involved, please contact qmang@berkeley.edu, wenhao.chai@princeton.edu, huanzhimao@berkeley.edu, or zhifei.li@berkeley.edu.