Repositories list
10 repositories
terminal-bench-science (Public)
Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal
Harbor is a framework for running agent evaluations and creating and using RL environments.
terminal-bench-challenge (Public)
benchmark-template (Public template)
awesome-harbor (Public)
terminal-bench-2 (Public)