Terminus 2 (Terminal-Bench)

What it is

Terminus 2 is a minimal, terminal-native AI agent designed by the Terminal-Bench team. Unlike complex agents with multi-step reasoning engines or GUI-bound tools, Terminus 2 takes a "raw" approach by giving the LLM direct access to a tmux session. The model sends commands as text and reads the terminal output itself, so the tmux session is the only interface between the model's reasoning and the shell.
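
The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Terminal-Bench implementation: ask_model, agent_loop, and the helper names are hypothetical, and only the tmux send-keys and tmux capture-pane -p subcommands are real tmux features. A real agent would also wait for each command to finish before reading the screen.

```python
import subprocess
from typing import Callable, List, Optional


def send_keys_cmd(session: str, command: str) -> List[str]:
    """Build the tmux invocation that types a command and presses Enter."""
    return ["tmux", "send-keys", "-t", session, command, "Enter"]


def capture_pane_cmd(session: str) -> List[str]:
    """Build the tmux invocation that prints the visible pane contents."""
    return ["tmux", "capture-pane", "-p", "-t", session]


def run_tmux(args: List[str]) -> str:
    """Run a tmux command and return its stdout (requires tmux on PATH)."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def agent_loop(
    ask_model: Callable[[str, str], Optional[str]],  # hypothetical LLM wrapper
    session: str,
    task: str,
    max_turns: int = 5,
    run: Callable[[List[str]], str] = run_tmux,
) -> List[str]:
    """Alternate between asking the model for a command and reading the screen."""
    screen = ""
    sent: List[str] = []
    for _ in range(max_turns):
        command = ask_model(task, screen)
        if command is None:  # the model signals that it is done
            break
        run(send_keys_cmd(session, command))
        screen = run(capture_pane_cmd(session))
        sent.append(command)
    return sent
```

Passing run as a parameter keeps the loop testable without a live tmux server; in normal use the default run_tmux talks to an existing session.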

What problem it solves

It demonstrates that a simple, direct approach to terminal-based AI agents (LLM + tmux) can achieve strong performance on complex tasks without the overhead of heavy orchestration layers. It serves as a baseline and a research tool for evaluating how well models can manage terminal state, handle long-running processes, and recover from shell errors.
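
Handling long-running processes usually means deciding when the screen is "done" changing. One common heuristic, shown here as an assumption rather than as what Terminus 2 actually does, is to poll the pane until its contents stay the same for a few consecutive reads or a timeout expires. The function name and all parameters are hypothetical.

```python
import time
from typing import Callable


def wait_until_stable(
    capture: Callable[[], str],
    poll_seconds: float = 0.5,
    stable_polls: int = 2,
    timeout_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the screen until it stops changing or the timeout elapses."""
    last = capture()
    unchanged = 0
    waited = 0.0
    while unchanged < stable_polls and waited < timeout_seconds:
        sleep(poll_seconds)
        waited += poll_seconds
        current = capture()
        if current == last:
            unchanged += 1
        else:
            # Output changed, so restart the stability count.
            unchanged = 0
            last = current
    return last
```

The capture argument would typically wrap a tmux capture-pane call. A stable screen does not guarantee the process has exited, so this remains a heuristic.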

Where it fits in the stack

Development & Ops. It is a specialized, terminal-centric agent that sits at the intersection of developer productivity and autonomous systems research.

Typical use cases

Terminal Task Automation: Automating repetitive shell tasks, migrations, or server configuration via natural language.
Agentic Benchmarking: Used as a baseline in the "Terminal-Bench" suite to evaluate LLM capability in a CLI environment.
Minimalist Agent Research: Exploring the limits of simple agent architectures that use tmux for session management.

Strengths

Minimalist Architecture: Low overhead; easier to understand and debug than multi-layered agents.
High Portability: Works anywhere tmux and Python can run.
Direct Feedback Loop: The model "sees" the raw terminal output exactly as a human would in a tmux window.
Competitive Baseline: Reported to perform well on Terminal-Bench tasks relative to more complex agents, despite its simple design.

Limitations

Minimal Tooling: Lacks advanced features like built-in file editors or complex GUI interaction.
State Management: Relies heavily on the LLM's ability to keep track of the terminal state within its context window.
tmux Dependency: Requires a functional tmux environment, which may be a barrier for some users.
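
The state-management limitation above is often mitigated by trimming what the model sees. The following is a minimal sketch, assuming a simple line and character budget; trim_screen and its defaults are hypothetical and not taken from the Terminus 2 source.

```python
from typing import List


def trim_screen(screen: str, max_lines: int = 40, max_chars: int = 4000) -> str:
    """Keep only the most recent terminal output, within a size budget."""
    lines: List[str] = screen.rstrip("\n").splitlines()
    kept = lines[-max_lines:]
    text = "\n".join(kept)
    if len(text) > max_chars:
        # Keep the tail, where the prompt and the latest output are.
        text = text[-max_chars:]
    if len(kept) < len(lines):
        omitted = len(lines) - len(kept)
        text = f"[... {omitted} earlier lines omitted ...]\n" + text
    return text
```

Keeping the tail rather than the head reflects the assumption that the most recent output matters most for the next command.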

When to use it

When you need a lightweight agent to perform shell-based tasks without a full IDE integration.
When conducting research on agent performance in CLI environments.
For users who prefer a "raw" terminal experience but want AI assistance.

When not to use it

When you need a full-featured AI coding environment with LSP integration and IDE features (e.g., Cursor, or VS Code with an AI assistant extension).
For users unfamiliar with tmux or shell-based workflows.

Getting started

Installation

Terminus 2 typically requires a Python environment and tmux.

# Clone the Terminal-Bench repository
git clone https://github.com/pro-puffin/terminal-bench.git
cd terminal-bench

# Install dependencies
pip install -r requirements.txt

# Ensure tmux is installed on your system (Debian/Ubuntu shown; use your platform's package manager elsewhere)
sudo apt install tmux
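
Before running the agent, it can help to confirm that the external tools are actually on PATH. The check below is a small sketch using only the Python standard library; missing_tools is a hypothetical helper, and the default tool list (tmux and git) is an assumption based on the steps above.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str] = ("tmux", "git"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the required executables that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling missing_tools() with no arguments returns an empty list when both tools are installed; any names it returns still need to be installed.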

Basic usage

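You can run Terminus 2 by pointing it at a specific task or prompt. The module path and flag shown here may differ between versions of the repository, so check the project's README for the current entry point: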

python -m terminal_bench.agents.terminus2 --task "Find all large log files in /var/log and compress them."

Technical Examples

tmux Session Inspection

Because Terminus 2 uses tmux, you can attach to the session and watch the agent work in real time:

# List tmux sessions
tmux ls

# Attach to the session used by Terminus 2 (replace <session_id> with a session name or index from the list above)
tmux attach -t <session_id>
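
The same inspection can be scripted. This sketch lists session names with tmux list-sessions and its -F format option, both standard tmux features; the function names are hypothetical, and the code assumes tmux is installed (otherwise subprocess raises FileNotFoundError).

```python
import subprocess
from typing import List


def parse_session_names(output: str) -> List[str]:
    """Split tmux output into one session name per non-empty line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_tmux_sessions() -> List[str]:
    """Return tmux session names, or an empty list if no server is running."""
    result = subprocess.run(
        ["tmux", "list-sessions", "-F", "#{session_name}"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:  # tmux exits non-zero when there is no server
        return []
    return parse_session_names(result.stdout)
```

Any name returned by list_tmux_sessions() can be passed to tmux attach -t to watch that session.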

Custom Prompting Pattern

Terminus 2 uses a specific system prompt to guide the LLM's terminal interactions. You can modify this in agents/terminus2/prompts.py to add project-specific constraints or tools.
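
One way to add project-specific constraints is to append them to the base prompt as a short list. The sketch below is illustrative only: BASE_PROMPT is placeholder text rather than the real Terminus 2 system prompt, and with_constraints is a hypothetical helper, not part of the repository.

```python
from typing import Sequence

# Placeholder text; the real Terminus 2 system prompt is not reproduced here.
BASE_PROMPT = "You are operating a terminal through tmux."


def with_constraints(base_prompt: str, constraints: Sequence[str]) -> str:
    """Append project-specific constraints to a system prompt as a bullet list."""
    if not constraints:
        return base_prompt
    bullets = "\n".join(f"- {c.strip()}" for c in constraints)
    return f"{base_prompt.rstrip()}\n\nProject-specific constraints:\n{bullets}\n"
```

For example, with_constraints(BASE_PROMPT, ["Never use sudo", "Stay inside /workspace"]) yields the base prompt followed by a two-item constraint list.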

Sources / references

Contribution Metadata

Last reviewed: 2026-06-01
Confidence: high