Research
Publications
A list of papers I've contributed to. See Google Scholar for the most current list.
- Terminal-bench: Benchmarking agents on hard, realistic tasks in command line interfaces
  M.A. Merrill, A.G. Shaw, N. Carlini, B. Li, H. Raj, I. Bercovich, L. Shi, J.Y. Shin, et al.
  arXiv preprint arXiv:2601.11868
- Harbor: A framework for evaluating and optimizing agents and models in container environments
  HF Team
  January 2026
- Customer-Agent: Learning Long-Context Reasoning over Shopping Trajectories via Reinforcement Learning
  Hongye Liu, Rongmei Lin, Anurag Kashyap, Hejie Cui, Ricardo Henao, Besnik, et al.
  OpenReview, 2026
- Terminal-bench: A benchmark for AI agents in terminal environments
  TTB Team
  2025