---
title: Agent UI
emoji: πŸ€–
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
header: mini
---
# Agent UI
A multi-agent AI interface with code execution, web search, image generation, and deep research β€” all orchestrated from a single command center.
## Local Install
```bash
pip install . # Install from pyproject.toml
python -m backend.main # Start server at http://localhost:8765
```
Or use Make shortcuts:
```bash
make install # pip install .
make dev # Start dev server
```
Configure API keys in the Settings panel, or set environment variables:
| Variable | Purpose |
|----------|---------|
| `LLM_API_KEY` | Default LLM provider token (any OpenAI-compatible API) |
| `HF_TOKEN` | HuggingFace token (image generation, hosted models) |
| `E2B_API_KEY` | [E2B](https://e2b.dev) sandbox for code execution |
| `SERPER_API_KEY` | [Serper](https://serper.dev) for web search |
## Docker
```bash
docker build -t agent-ui .
docker run -p 7860:7860 -e LLM_API_KEY=... agent-ui
```
CLI options: `--port`, `--no-browser`, `--config-dir`, `--workspace-dir`, `--multi-user`.
For HuggingFace Spaces deployment, set `HF_BUCKET` and `HF_BUCKET_TOKEN` secrets for workspace persistence across restarts.
## Architecture
```
backend/
β”œβ”€β”€ agents.py # Agent registry (single source of truth) + shared LLM utilities
β”œβ”€β”€ main.py # FastAPI routes, SSE streaming, file management
β”œβ”€β”€ command.py # Command center: tool routing, agent launching
β”œβ”€β”€ code.py # Code agent: E2B sandbox execution
β”œβ”€β”€ agent.py # Web agent: search + browse
β”œβ”€β”€ research.py # Research agent: multi-source deep analysis
β”œβ”€β”€ image.py # Image agent: generate/edit via HuggingFace
└── tools.py # Direct tools (execute_code, web_search, show_html, etc.)
frontend/
β”œβ”€β”€ index.html # Entry point
β”œβ”€β”€ utils.js # Global state, shared helpers (setupInputListeners, closeAllPanels)
β”œβ”€β”€ timeline.js # Sidebar timeline data + rendering
β”œβ”€β”€ sessions.js # Session CRUD + panel
β”œβ”€β”€ tabs.js # Tab creation/switching, sendMessage
β”œβ”€β”€ streaming.js # SSE streaming, code cells, action widgets, markdown
β”œβ”€β”€ workspace.js # Workspace serialize/restore
β”œβ”€β”€ settings.js # Settings CRUD, themes, debug/files/sessions panels
β”œβ”€β”€ app.js # Initialization, event listeners, DOMContentLoaded
β”œβ”€β”€ style.css # All styles (CSS custom properties for theming)
└── research-ui.js # Research-specific UI components
```
### How It Works
1. The **command center** receives user messages and decides whether to answer directly or launch sub-agents
2. Sub-agents (code, web, research, image) run in their own tabs with specialized tools
3. All communication uses **SSE streaming** β€” agents yield JSON events with a `type` field
4. Settings store providers, models, and agent-to-model assignments β€” any OpenAI-compatible API works
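The generator contract behind steps 2–3 can be sketched as a plain Python generator (a minimal illustration, not the real agent code):

```python
def stream_demo_agent(messages):
    # Minimal sketch of the event contract: every agent is a generator
    # of JSON-serializable dicts with a "type" field, ending in "done".
    yield {"type": "thinking", "content": "Planning the answer..."}
    yield {"type": "content", "content": "Hello!"}
    yield {"type": "done"}

events = list(stream_demo_agent([]))
```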
## Extending Agent UI
### Adding a New Agent
Only the backend needs changes β€” the frontend fetches the registry from `GET /api/agents` at startup.
**1. Backend registry** β€” add to `AGENT_REGISTRY` in `backend/agents.py`:
```python
"my_agent": {
"label": "MY AGENT",
"system_prompt": "You are a helpful assistant...",
"tool": {
"type": "function",
"function": {
"name": "launch_my_agent",
"description": "Launch my agent for X tasks.",
"parameters": {
"type": "object",
"properties": {
"task": {"type": "string", "description": "The task"},
"task_id": {"type": "string", "description": "2-3 word ID"}
},
"required": ["task", "task_id"]
}
}
},
"tool_arg": "task",
"has_counter": True,
"in_menu": True,
"in_launcher": True,
"placeholder": "Enter message...",
"capabilities": "Short description of what this agent can do.",
},
```
**2. Backend streaming handler** β€” create `backend/my_agent.py`:
```python
from .agents import call_llm

MY_TOOLS = [...]  # tool schemas this agent exposes to the LLM

def stream_my_agent(client, model, messages, extra_params=None, abort_event=None):
    """Generator yielding SSE event dicts."""
    debug_call_number = 0
    done = False
    while not done:
        # call_llm handles retries and emits debug events
        response = None
        for event in call_llm(client, model, messages, tools=MY_TOOLS,
                              extra_params=extra_params, abort_event=abort_event,
                              call_number=debug_call_number):
            if "_response" in event:
                response = event["_response"]
                debug_call_number = event["_call_number"]
            else:
                yield event
                if event.get("type") in ("error", "aborted"):
                    return
        # Process response, yield events, and set done=True once the
        # agent has produced its final answer...
        yield {"type": "thinking", "content": "..."}
        done = True
    yield {"type": "result", "content": "Final answer"}
    yield {"type": "done"}
```
Required events: `done`, `error`. Common: `thinking`, `content`, `result`, `result_preview`.
**3. Wire the route** β€” in `backend/main.py`, add to the streaming handler dispatch (search for `agent_type`):
```python
elif request.agent_type == "my_agent":
return StreamingResponse(stream_my_agent_handler(...), ...)
```
**4. Frontend** β€” no changes needed; the new agent appears in the UI automatically once the registry is fetched.
### Adding a Direct Tool
Direct tools execute synchronously in the command center (no sub-agent spawned). Only two files need changes.
**1. Define the tool schema + execute function** in `backend/tools.py`:
```python
my_tool = {
"type": "function",
"function": {
"name": "my_tool",
"description": "Does something useful.",
"parameters": {
"type": "object",
"properties": {
"input": {"type": "string", "description": "The input"}
},
"required": ["input"]
}
}
}
def execute_my_tool(input: str, files_root: str | None = None) -> dict:
    return {"content": "Result text for the LLM", "extra_data": "..."}
```
**2. Register it** in `DIRECT_TOOL_REGISTRY` at the bottom of `backend/tools.py`:
```python
DIRECT_TOOL_REGISTRY = {
"show_html": { ... }, # existing
"my_tool": {
"schema": my_tool,
"execute": lambda args, ctx: execute_my_tool(
args.get("input", ""), files_root=ctx.get("files_root")
),
},
}
```
That's it β€” `command.py` automatically picks up tools from the registry.
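The pickup amounts to a registry lookup plus a call to the entry's `execute` callable. A minimal sketch of the pattern (the `dispatch` helper here is hypothetical; the real logic lives in `backend/command.py`):

```python
def dispatch(registry, name, args, ctx):
    # Look the tool up by name and run its execute callable.
    entry = registry.get(name)
    if entry is None:
        return {"content": f"Unknown tool: {name}"}
    return entry["execute"](args, ctx)

REGISTRY = {
    "echo": {"schema": {}, "execute": lambda args, ctx: {"content": args.get("input", "")}},
}
result = dispatch(REGISTRY, "echo", {"input": "hi"}, {})
```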
### Modifying System Prompts
All system prompts live in `backend/agents.py` inside `AGENT_REGISTRY`. Edit the `"system_prompt"` field for any agent.
The `get_system_prompt()` function adds dynamic context automatically:
- `{tools_section}` β€” replaced with available agent descriptions (command center only)
- Current date is appended to all prompts
- Project file tree is appended (in `main.py` wrapper)
- Theme/styling context is added for code agents
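As an illustration of the `{tools_section}` substitution (the template wording here is made up; the real prompts live in `AGENT_REGISTRY`):

```python
# Hypothetical command-center prompt template; the actual wording lives
# in backend/agents.py under AGENT_REGISTRY.
template = "You are the command center.\n\nAvailable agents:\n{tools_section}"
tools_section = "- code: run Python in an E2B sandbox\n- image: generate images"
prompt = template.format(tools_section=tools_section)
```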
### Adding a Model Provider
In the Settings panel, models are configured through **Providers** and **Models**:
1. **Add a provider**: name + OpenAI-compatible endpoint URL + API token
2. **Add a model**: name + provider + API model ID (e.g., `gpt-4o`, `claude-sonnet-4-20250514`)
3. **Assign models**: pick which model each agent type uses
Any OpenAI-compatible API works (OpenAI, Anthropic via proxy, Ollama, vLLM, etc.).
Settings are stored in `workspace/settings.json` and managed via the Settings panel in the UI.
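For orientation, a plausible shape for `workspace/settings.json` (field names are illustrative assumptions; check the file the UI actually writes):

```json
{
  "providers": [
    {"name": "openai", "base_url": "https://api.openai.com/v1", "api_key": "sk-..."}
  ],
  "models": [
    {"name": "gpt-4o", "provider": "openai", "model_id": "gpt-4o"}
  ],
  "assignments": {"command": "gpt-4o", "code": "gpt-4o"}
}
```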
### Creating a Theme
Themes are CSS custom property sets defined in `frontend/settings.js`.
**Add to `themeColors` object** (search for `const themeColors`):
```javascript
myTheme: {
border: '#8e24aa',
bg: '#f3e5f5',
hoverBg: '#e1bee7',
accent: '#6a1b9a',
accentRgb: '106, 27, 154',
...lightSurface // Use for light themes
},
```
For dark themes, override the surface colors instead of spreading `lightSurface`:
```javascript
myDarkTheme: {
border: '#bb86fc',
bg: '#1e1e2e',
hoverBg: '#2a2a3e',
accent: '#bb86fc',
accentRgb: '187, 134, 252',
bgPrimary: '#121218',
bgSecondary: '#1e1e2e',
bgTertiary: '#0e0e14',
bgInput: '#0e0e14',
bgHover: '#2a2a3e',
bgCard: '#1e1e2e',
textPrimary: '#e0e0e0',
textSecondary: '#999999',
textMuted: '#666666',
borderPrimary: '#333344',
borderSubtle: '#222233'
},
```
The theme automatically appears in the Settings theme picker β€” no other changes needed. The `applyTheme()` function reads all properties from the object and sets the corresponding CSS variables.
**Available CSS variables:** `--theme-accent`, `--theme-accent-rgb`, `--theme-bg`, `--theme-hover-bg`, `--theme-border`, `--bg-primary`, `--bg-secondary`, `--bg-tertiary`, `--bg-input`, `--bg-hover`, `--bg-card`, `--text-primary`, `--text-secondary`, `--text-muted`, `--border-primary`, `--border-subtle`.
## SSE Event Protocol
All agents communicate via Server-Sent Events. Each event is a JSON object with a `type` field.
| Event | Description |
|-------|-------------|
| `done` | Stream complete (required) |
| `error` | `{content}` β€” error message (required) |
| `thinking` | `{content}` β€” reasoning text |
| `content` | `{content}` β€” streamed response tokens |
| `result` | `{content, figures?}` β€” final output for command center |
| `result_preview` | Same as result, shown inline |
| `retry` | `{attempt, max_attempts, delay, message}` β€” retrying |
| `debug_call_input` | `{call_number, messages}` β€” LLM input (debug panel) |
| `debug_call_output` | `{call_number, response}` β€” LLM output (debug panel) |
| `launch` | `{agent_type, initial_message, task_id}` β€” spawn sub-agent |
| `tool_start` | `{tool, args}` β€” direct tool started |
| `tool_result` | `{tool, result}` β€” direct tool completed |
| `code_start` | `{code}` β€” code execution started |
| `code` | `{output, error, images}` β€” code execution result |
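On the wire, each event dict becomes one SSE `data:` frame. A minimal serializer sketch (the app delivers frames via FastAPI's `StreamingResponse` in `backend/main.py`; this helper is illustrative):

```python
import json

def to_sse(event: dict) -> str:
    # One SSE frame: "data: <json>" followed by a blank line
    # terminates the event.
    return f"data: {json.dumps(event)}\n\n"

frame = to_sse({"type": "done"})
```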
## Key Patterns & Conventions
### Backend
- **Single source of truth**: `AGENT_REGISTRY` in `backend/agents.py` defines all agent types. The frontend fetches it via `GET /api/agents` β€” never duplicate agent definitions.
- **LLM calls**: Always use `call_llm()` from `agents.py` β€” it handles retries, abort checking, and emits `debug_call_input`/`debug_call_output` events for the debug panel.
- **Streaming pattern**: Agent handlers are sync generators yielding event dicts. `_stream_sync_generator()` in `main.py` wraps them for async SSE delivery β€” never duplicate the async queue boilerplate.
- **Direct tools**: `DIRECT_TOOL_REGISTRY` in `tools.py` maps tool name β†’ `{schema, execute}`. `command.py` dispatches automatically.
- **Result nudging**: When an agent finishes without `<result>` tags, `nudge_for_result()` in `agents.py` asks the LLM for a final answer. It uses `call_llm` internally.
### Frontend
- **No build system**: Plain `<script>` tags in `index.html`, no bundler. Files share `window` scope.
- **Load order matters**: `utils.js` loads first (declares all globals), then other files. Cross-file function calls are fine because they happen at runtime, not parse time.
- **Global state** lives in `utils.js`: `AGENT_REGISTRY`, `settings`, `activeTabId`, `tabCounter`, `timelineData`, `debugHistory`, `globalFigureRegistry`, etc.
- **Shared helpers** (also in `utils.js`):
- `setupInputListeners(container, tabId)` β€” wires textarea auto-resize, Enter-to-send, send button click
- `setupCollapseToggle(cell, labelSelector)` β€” wires click-to-collapse on tool/code cells
- `closeAllPanels()` β€” closes all right-side panels (settings, debug, files, sessions)
- **Markdown rendering**: `parseMarkdown()` in `streaming.js` is the single entry point (marked + KaTeX + Prism).
- **Panel toggle pattern**: Call `closeAllPanels()` first, then add `.active` to the panel being opened.
- **Workspace persistence**: Changes auto-save via `saveWorkspaceDebounced()`. Tab state is serialized to JSON and posted to `/api/workspace`.
- **Cache busting**: Bump `?v=N` query params in `index.html` when changing JS/CSS files.
### Naming
- Backend: `stream_<agent>_execution()` for the sync generator, `_stream_<agent>_inner()` for the async wrapper in `main.py`
- Frontend: Agent types use short keys (`code`, `agent`, `research`, `image`, `command`)
- CSS: `--theme-*` for accent colors, `--bg-*` / `--text-*` / `--border-*` for surface colors
## Verification
Verify backend imports: `python -c "from backend.command import stream_command_center"`