Terminal-Bench: A benchmark for AI agents in terminal environments

The Terminal-Bench Team

2025

Earlier benchmark release for evaluating AI agents on terminal-environment tasks.