---
title: Agent UI
emoji: 🤖
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
header: mini
---

# Agent UI

A multi-agent AI interface with code execution, web search, image generation, and deep research β€” all orchestrated from a single command center.

## Local Install

```bash
pip install .            # Install from pyproject.toml
python -m backend.main   # Start server at http://localhost:8765
```

Or use the Make shortcuts:

```bash
make install   # pip install .
make dev       # Start dev server
```

Configure API keys in the Settings panel, or set environment variables:

| Variable | Purpose |
| --- | --- |
| `LLM_API_KEY` | Default LLM provider token (any OpenAI-compatible API) |
| `HF_TOKEN` | HuggingFace token (image generation, hosted models) |
| `E2B_API_KEY` | E2B sandbox for code execution |
| `SERPER_API_KEY` | Serper for web search |

## Docker

```bash
docker build -t agent-ui .
docker run -p 7860:7860 -e LLM_API_KEY=... agent-ui
```

CLI options: `--port`, `--no-browser`, `--config-dir`, `--workspace-dir`, `--multi-user`.

For HuggingFace Spaces deployment, set the `HF_BUCKET` and `HF_BUCKET_TOKEN` secrets to persist the workspace across restarts.

## Architecture

```
backend/
├── agents.py       # Agent registry (single source of truth) + shared LLM utilities
├── main.py         # FastAPI routes, SSE streaming, file management
├── command.py      # Command center: tool routing, agent launching
├── code.py         # Code agent: E2B sandbox execution
├── agent.py        # Web agent: search + browse
├── research.py     # Research agent: multi-source deep analysis
├── image.py        # Image agent: generate/edit via HuggingFace
└── tools.py        # Direct tools (execute_code, web_search, show_html, etc.)
```

```
frontend/
├── index.html      # Entry point
├── utils.js        # Global state, shared helpers (setupInputListeners, closeAllPanels)
├── timeline.js     # Sidebar timeline data + rendering
├── sessions.js     # Session CRUD + panel
├── tabs.js         # Tab creation/switching, sendMessage
├── streaming.js    # SSE streaming, code cells, action widgets, markdown
├── workspace.js    # Workspace serialize/restore
├── settings.js     # Settings CRUD, themes, debug/files/sessions panels
├── app.js          # Initialization, event listeners, DOMContentLoaded
├── style.css       # All styles (CSS custom properties for theming)
└── research-ui.js  # Research-specific UI components
```

## How It Works

1. The command center receives user messages and decides whether to answer directly or launch sub-agents.
2. Sub-agents (code, web, research, image) run in their own tabs with specialized tools.
3. All communication uses SSE streaming — agents yield JSON events with a `type` field.
4. Settings store providers, models, and agent-to-model assignments — any OpenAI-compatible API works.
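Concretely, each event in step 3 travels as one Server-Sent Events frame containing a JSON object. A minimal sketch of that wire format (the `format_sse` helper is hypothetical, not part of the codebase):

```python
import json

def format_sse(event: dict) -> str:
    """Serialize one agent event dict as a Server-Sent Events frame."""
    return f"data: {json.dumps(event)}\n\n"

frame = format_sse({"type": "thinking", "content": "searching..."})
```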

## Extending Agent UI

### Adding a New Agent

Only the backend needs changes — the frontend fetches the registry from `GET /api/agents` at startup.

1. **Backend registry** — add to `AGENT_REGISTRY` in `backend/agents.py`:

"my_agent": {
    "label": "MY AGENT",
    "system_prompt": "You are a helpful assistant...",
    "tool": {
        "type": "function",
        "function": {
            "name": "launch_my_agent",
            "description": "Launch my agent for X tasks.",
            "parameters": {
                "type": "object",
                "properties": {
                    "task": {"type": "string", "description": "The task"},
                    "task_id": {"type": "string", "description": "2-3 word ID"}
                },
                "required": ["task", "task_id"]
            }
        }
    },
    "tool_arg": "task",
    "has_counter": True,
    "in_menu": True,
    "in_launcher": True,
    "placeholder": "Enter message...",
    "capabilities": "Short description of what this agent can do.",
},
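Before wiring a new entry in, it can help to check that it carries every field the example above uses. A standalone sketch (`validate_entry` and `REQUIRED_FIELDS` are hypothetical helpers, not part of the repo):

```python
# Fields used by the registry entry example above
REQUIRED_FIELDS = {
    "label", "system_prompt", "tool", "tool_arg", "has_counter",
    "in_menu", "in_launcher", "placeholder", "capabilities",
}

def validate_entry(entry: dict) -> set:
    """Return the fields missing from a registry entry (empty set means complete)."""
    return REQUIRED_FIELDS - entry.keys()
```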

2. **Backend streaming handler** — create `backend/my_agent.py`:

```python
from .agents import call_llm

MY_TOOLS = []  # your agent's tool schemas

def stream_my_agent(client, model, messages, extra_params=None, abort_event=None):
    """Generator yielding SSE event dicts."""
    debug_call_number = 0
    done = False

    while not done:
        # call_llm handles retries and emits debug events
        response = None
        for event in call_llm(client, model, messages, tools=MY_TOOLS,
                              extra_params=extra_params, abort_event=abort_event,
                              call_number=debug_call_number):
            if "_response" in event:
                response = event["_response"]
                debug_call_number = event["_call_number"]
            else:
                yield event
                if event.get("type") in ("error", "aborted"):
                    return

        # Process the response, yield events, and set done when finished...
        yield {"type": "thinking", "content": "..."}
        yield {"type": "result", "content": "Final answer"}
        done = True

    yield {"type": "done"}
```

Required events: `done`, `error`. Common: `thinking`, `content`, `result`, `result_preview`.
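Because handlers are plain sync generators, they can be exercised without running the server. A hypothetical test harness (not part of the repo) that drains a generator and checks the stream terminates properly:

```python
def collect_events(gen):
    """Drain a sync event generator and check it terminates with done/error."""
    events = list(gen)
    assert events, "stream yielded no events"
    assert events[-1].get("type") in ("done", "error"), "must end with done or error"
    return events

def stub_agent():
    # Stand-in for a real handler like stream_my_agent
    yield {"type": "thinking", "content": "..."}
    yield {"type": "result", "content": "answer"}
    yield {"type": "done"}
```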

3. **Wire the route** — in `backend/main.py`, add a branch to the streaming handler dispatch (search for `agent_type`):

```python
elif request.agent_type == "my_agent":
    return StreamingResponse(stream_my_agent_handler(...), ...)
```

4. **Frontend** — no changes needed; the registry is fetched from `GET /api/agents` at startup.

### Adding a Direct Tool

Direct tools execute synchronously in the command center (no sub-agent is spawned). Both changes live in `backend/tools.py`.

1. **Define the tool schema and execute function** in `backend/tools.py`:

```python
my_tool = {
    "type": "function",
    "function": {
        "name": "my_tool",
        "description": "Does something useful.",
        "parameters": {
            "type": "object",
            "properties": {
                "input": {"type": "string", "description": "The input"}
            },
            "required": ["input"]
        }
    }
}

def execute_my_tool(input: str, files_root: str | None = None) -> dict:
    return {"content": "Result text for the LLM", "extra_data": "..."}
```

2. **Register it** in `DIRECT_TOOL_REGISTRY` at the bottom of `backend/tools.py`:

```python
DIRECT_TOOL_REGISTRY = {
    "show_html": { ... },  # existing
    "my_tool": {
        "schema": my_tool,
        "execute": lambda args, ctx: execute_my_tool(
            args.get("input", ""), files_root=ctx.get("files_root")
        ),
    },
}
```

That's it — `command.py` automatically picks up tools from the registry.
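The automatic pickup boils down to a registry lookup. A minimal standalone sketch of that dispatch pattern (the actual `command.py` code may differ):

```python
def dispatch_direct_tool(name: str, args: dict, ctx: dict, registry: dict) -> dict:
    """Look up a tool by name in the registry and run its execute callable."""
    entry = registry.get(name)
    if entry is None:
        return {"content": f"Unknown tool: {name}"}
    return entry["execute"](args, ctx)

# Toy registry mirroring the {schema, execute} shape above
toy_registry = {
    "echo": {
        "schema": {},
        "execute": lambda args, ctx: {"content": args.get("input", "")},
    },
}
```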

### Modifying System Prompts

All system prompts live in `backend/agents.py` inside `AGENT_REGISTRY`. Edit the `"system_prompt"` field for any agent.

The `get_system_prompt()` function adds dynamic context automatically:

- `{tools_section}` — replaced with available agent descriptions (command center only)
- The current date is appended to all prompts
- The project file tree is appended (in the `main.py` wrapper)
- Theme/styling context is added for code agents
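The first two substitutions above can be sketched as follows (a simplified, hypothetical stand-in for `get_system_prompt()`, not its real implementation):

```python
from datetime import date

def build_system_prompt(template: str, tools_section: str = "") -> str:
    """Fill the {tools_section} placeholder and append the current date."""
    prompt = template.replace("{tools_section}", tools_section)
    return f"{prompt}\n\nCurrent date: {date.today().isoformat()}"
```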

### Adding a Model Provider

In the Settings panel, models are configured through Providers and Models:

1. **Add a provider:** name + OpenAI-compatible endpoint URL + API token
2. **Add a model:** name + provider + API model ID (e.g., `gpt-4o`, `claude-sonnet-4-20250514`)
3. **Assign models:** pick which model each agent type uses

Any OpenAI-compatible API works (OpenAI, Anthropic via proxy, Ollama, vLLM, etc.).

Settings are stored in `workspace/settings.json` and managed via the Settings panel in the UI.

### Creating a Theme

Themes are CSS custom property sets defined in `frontend/settings.js`.

Add an entry to the `themeColors` object (search for `const themeColors`):

```js
myTheme: {
    border: '#8e24aa',
    bg: '#f3e5f5',
    hoverBg: '#e1bee7',
    accent: '#6a1b9a',
    accentRgb: '106, 27, 154',
    ...lightSurface    // Use for light themes
},
```

For dark themes, override the surface colors instead of spreading `lightSurface`:

```js
myDarkTheme: {
    border: '#bb86fc',
    bg: '#1e1e2e',
    hoverBg: '#2a2a3e',
    accent: '#bb86fc',
    accentRgb: '187, 134, 252',
    bgPrimary: '#121218',
    bgSecondary: '#1e1e2e',
    bgTertiary: '#0e0e14',
    bgInput: '#0e0e14',
    bgHover: '#2a2a3e',
    bgCard: '#1e1e2e',
    textPrimary: '#e0e0e0',
    textSecondary: '#999999',
    textMuted: '#666666',
    borderPrimary: '#333344',
    borderSubtle: '#222233'
},
```

The theme automatically appears in the Settings theme picker — no other changes needed. The `applyTheme()` function reads every property on the object and sets the corresponding CSS variables.

Available CSS variables: `--theme-accent`, `--theme-accent-rgb`, `--theme-bg`, `--theme-hover-bg`, `--theme-border`, `--bg-primary`, `--bg-secondary`, `--bg-tertiary`, `--bg-input`, `--bg-hover`, `--bg-card`, `--text-primary`, `--text-secondary`, `--text-muted`, `--border-primary`, `--border-subtle`.

## SSE Event Protocol

All agents communicate via Server-Sent Events. Each event is a JSON object with a `type` field.

| Event | Description |
| --- | --- |
| `done` | Stream complete (required) |
| `error` | `{content}` — error message (required) |
| `thinking` | `{content}` — reasoning text |
| `content` | `{content}` — streamed response tokens |
| `result` | `{content, figures?}` — final output for the command center |
| `result_preview` | Same as `result`, shown inline |
| `retry` | `{attempt, max_attempts, delay, message}` — retrying |
| `debug_call_input` | `{call_number, messages}` — LLM input (debug panel) |
| `debug_call_output` | `{call_number, response}` — LLM output (debug panel) |
| `launch` | `{agent_type, initial_message, task_id}` — spawn a sub-agent |
| `tool_start` | `{tool, args}` — direct tool started |
| `tool_result` | `{tool, result}` — direct tool completed |
| `code_start` | `{code}` — code execution started |
| `code` | `{output, error, images}` — code execution result |
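On the wire, each of these events arrives as a `data:` line. A hypothetical client-side decoder (not part of the codebase) that turns one SSE line back into an event dict:

```python
import json

def parse_sse_line(line: str):
    """Decode one 'data: {...}' SSE line into an event dict; ignore other lines."""
    prefix = "data: "
    if line.startswith(prefix):
        return json.loads(line[len(prefix):])
    return None
```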

## Key Patterns & Conventions

### Backend

- **Single source of truth:** `AGENT_REGISTRY` in `backend/agents.py` defines all agent types. The frontend fetches it via `GET /api/agents` — never duplicate agent definitions.
- **LLM calls:** always use `call_llm()` from `agents.py` — it handles retries and abort checking, and emits `debug_call_input`/`debug_call_output` events for the debug panel.
- **Streaming pattern:** agent handlers are sync generators yielding event dicts. `_stream_sync_generator()` in `main.py` wraps them for async SSE delivery — never duplicate the async queue boilerplate.
- **Direct tools:** `DIRECT_TOOL_REGISTRY` in `tools.py` maps tool name → `{schema, execute}`. `command.py` dispatches automatically.
- **Result nudging:** when an agent finishes without `<result>` tags, `nudge_for_result()` in `agents.py` asks the LLM for a final answer. It uses `call_llm()` internally.

### Frontend

- **No build system:** plain `<script>` tags in `index.html`, no bundler. Files share `window` scope.
- **Load order matters:** `utils.js` loads first (declares all globals), then the other files. Cross-file function calls are fine because they happen at runtime, not parse time.
- **Global state lives in `utils.js`:** `AGENT_REGISTRY`, `settings`, `activeTabId`, `tabCounter`, `timelineData`, `debugHistory`, `globalFigureRegistry`, etc.
- **Shared helpers (also in `utils.js`):**
  - `setupInputListeners(container, tabId)` — wires textarea auto-resize, Enter-to-send, and the send button click
  - `setupCollapseToggle(cell, labelSelector)` — wires click-to-collapse on tool/code cells
  - `closeAllPanels()` — closes all right-side panels (settings, debug, files, sessions)
- **Markdown rendering:** `parseMarkdown()` in `streaming.js` is the single entry point (marked + KaTeX + Prism).
- **Panel toggle pattern:** call `closeAllPanels()` first, then add `.active` to the panel being opened.
- **Workspace persistence:** changes auto-save via `saveWorkspaceDebounced()`. Tab state is serialized to JSON and posted to `/api/workspace`.
- **Cache busting:** bump the `?v=N` query params in `index.html` when changing JS/CSS files.

### Naming

- **Backend:** `stream_<agent>_execution()` for the sync generator, `_stream_<agent>_inner()` for the async wrapper in `main.py`
- **Frontend:** agent types use short keys (`code`, `agent`, `research`, `image`, `command`)
- **CSS:** `--theme-*` for accent colors, `--bg-*` / `--text-*` / `--border-*` for surface colors

## Verification

Verify the backend imports cleanly:

```bash
python -c "from backend.command import stream_command_center"
```