---
title: Agent UI
emoji: 🤖
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
header: mini
---

# Agent UI

A multi-agent AI interface with code execution, web search, image generation, and deep research β€” all orchestrated from a single command center.

## Local Install

```bash
pip install .            # Install from pyproject.toml
python -m backend.main   # Start server at http://localhost:8765
```

Or use the Make shortcuts:

```bash
make install   # pip install .
make dev       # Start dev server
```

Configure API keys in the Settings panel, or set environment variables:

| Variable | Purpose |
| --- | --- |
| `LLM_API_KEY` | Default LLM provider token (any OpenAI-compatible API) |
| `HF_TOKEN` | HuggingFace token (image generation, hosted models) |
| `E2B_API_KEY` | E2B sandbox for code execution |
| `SERPER_API_KEY` | Serper for web search |

## Docker

```bash
docker build -t agent-ui .
docker run -p 7860:7860 -e LLM_API_KEY=... agent-ui
```

CLI options: `--port`, `--no-browser`, `--config-dir`, `--workspace-dir`, `--multi-user`.

For HuggingFace Spaces deployment, set the `HF_BUCKET` and `HF_BUCKET_TOKEN` secrets to persist the workspace across restarts.

## Architecture

```
backend/
├── agents.py       # Agent registry (single source of truth) + shared LLM utilities
├── main.py         # FastAPI routes, SSE streaming, file management
├── command.py      # Command center: tool routing, agent launching
├── code.py         # Code agent: E2B sandbox execution
├── agent.py        # Web agent: search + browse
├── research.py     # Research agent: multi-source deep analysis
├── image.py        # Image agent: generate/edit via HuggingFace
└── tools.py        # Direct tools (execute_code, web_search, show_html, etc.)
```

```
frontend/
├── index.html      # Entry point
├── utils.js        # Global state, shared helpers (setupInputListeners, closeAllPanels)
├── timeline.js     # Sidebar timeline data + rendering
├── sessions.js     # Session CRUD + panel
├── tabs.js         # Tab creation/switching, sendMessage
├── streaming.js    # SSE streaming, code cells, action widgets, markdown
├── workspace.js    # Workspace serialize/restore
├── settings.js     # Settings CRUD, themes, debug/files/sessions panels
├── app.js          # Initialization, event listeners, DOMContentLoaded
├── style.css       # All styles (CSS custom properties for theming)
└── research-ui.js  # Research-specific UI components
```

## How It Works

1. The command center receives user messages and decides whether to answer directly or launch sub-agents.
2. Sub-agents (code, web, research, image) run in their own tabs with specialized tools.
3. All communication uses SSE streaming — agents yield JSON events with a `type` field.
4. Settings store providers, models, and agent-to-model assignments — any OpenAI-compatible API works.
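Concretely, each event in step 3 travels as one Server-Sent Events frame containing a JSON object. A minimal sketch of that wire format (the `format_sse` helper is hypothetical, not part of the codebase):

```python
import json

def format_sse(event: dict) -> str:
    """Serialize one agent event dict as a Server-Sent Events frame."""
    return f"data: {json.dumps(event)}\n\n"

frame = format_sse({"type": "thinking", "content": "searching..."})
```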

## Extending Agent UI

### Adding a New Agent

Only the backend needs changes — the frontend fetches the registry from `GET /api/agents` at startup.

1. **Backend registry** — add to `AGENT_REGISTRY` in `backend/agents.py`:

"my_agent": {
    "label": "MY AGENT",
    "system_prompt": "You are a helpful assistant...",
    "tool": {
        "type": "function",
        "function": {
            "name": "launch_my_agent",
            "description": "Launch my agent for X tasks.",
            "parameters": {
                "type": "object",
                "properties": {
                    "task": {"type": "string", "description": "The task"},
                    "task_id": {"type": "string", "description": "2-3 word ID"}
                },
                "required": ["task", "task_id"]
            }
        }
    },
    "tool_arg": "task",
    "has_counter": True,
    "in_menu": True,
    "in_launcher": True,
    "placeholder": "Enter message...",
    "capabilities": "Short description of what this agent can do.",
},
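Before wiring a new entry in, it can help to check that it carries every field the example above uses. A standalone sketch (`validate_entry` and `REQUIRED_FIELDS` are hypothetical helpers, not part of the repo):

```python
# Fields used by the registry entry example above
REQUIRED_FIELDS = {
    "label", "system_prompt", "tool", "tool_arg", "has_counter",
    "in_menu", "in_launcher", "placeholder", "capabilities",
}

def validate_entry(entry: dict) -> set:
    """Return the fields missing from a registry entry (empty set means complete)."""
    return REQUIRED_FIELDS - entry.keys()
```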

2. **Backend streaming handler** — create `backend/my_agent.py`:

```python
from .agents import call_llm

MY_TOOLS = []  # your agent's tool schemas

def stream_my_agent(client, model, messages, extra_params=None, abort_event=None):
    """Generator yielding SSE event dicts."""
    debug_call_number = 0
    done = False

    while not done:
        # call_llm handles retries and emits debug events
        response = None
        for event in call_llm(client, model, messages, tools=MY_TOOLS,
                              extra_params=extra_params, abort_event=abort_event,
                              call_number=debug_call_number):
            if "_response" in event:
                response = event["_response"]
                debug_call_number = event["_call_number"]
            else:
                yield event
                if event.get("type") in ("error", "aborted"):
                    return

        # Process the response, yield events, and set done when finished...
        yield {"type": "thinking", "content": "..."}
        yield {"type": "result", "content": "Final answer"}
        done = True

    yield {"type": "done"}
```

Required events: `done`, `error`. Common: `thinking`, `content`, `result`, `result_preview`.
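Because handlers are plain sync generators, they can be exercised without running the server. A hypothetical test harness (not part of the repo) that drains a generator and checks the stream terminates properly:

```python
def collect_events(gen):
    """Drain a sync event generator and check it terminates with done/error."""
    events = list(gen)
    assert events, "stream yielded no events"
    assert events[-1].get("type") in ("done", "error"), "must end with done or error"
    return events

def stub_agent():
    # Stand-in for a real handler like stream_my_agent
    yield {"type": "thinking", "content": "..."}
    yield {"type": "result", "content": "answer"}
    yield {"type": "done"}
```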

3. **Wire the route** — in `backend/main.py`, add a branch to the streaming handler dispatch (search for `agent_type`):

```python
elif request.agent_type == "my_agent":
    return StreamingResponse(stream_my_agent_handler(...), ...)
```

4. **Frontend** — no changes needed; the registry is fetched from `GET /api/agents` at startup.

### Adding a Direct Tool

Direct tools execute synchronously in the command center (no sub-agent is spawned). Both changes live in `backend/tools.py`.

1. **Define the tool schema and execute function** in `backend/tools.py`:

```python
my_tool = {
    "type": "function",
    "function": {
        "name": "my_tool",
        "description": "Does something useful.",
        "parameters": {
            "type": "object",
            "properties": {
                "input": {"type": "string", "description": "The input"}
            },
            "required": ["input"]
        }
    }
}

def execute_my_tool(input: str, files_root: str | None = None) -> dict:
    return {"content": "Result text for the LLM", "extra_data": "..."}
```

2. **Register it** in `DIRECT_TOOL_REGISTRY` at the bottom of `backend/tools.py`:

```python
DIRECT_TOOL_REGISTRY = {
    "show_html": { ... },  # existing
    "my_tool": {
        "schema": my_tool,
        "execute": lambda args, ctx: execute_my_tool(
            args.get("input", ""), files_root=ctx.get("files_root")
        ),
    },
}
```

That's it — `command.py` automatically picks up tools from the registry.
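The automatic pickup boils down to a registry lookup. A minimal standalone sketch of that dispatch pattern (the actual `command.py` code may differ):

```python
def dispatch_direct_tool(name: str, args: dict, ctx: dict, registry: dict) -> dict:
    """Look up a tool by name in the registry and run its execute callable."""
    entry = registry.get(name)
    if entry is None:
        return {"content": f"Unknown tool: {name}"}
    return entry["execute"](args, ctx)

# Toy registry mirroring the {schema, execute} shape above
toy_registry = {
    "echo": {
        "schema": {},
        "execute": lambda args, ctx: {"content": args.get("input", "")},
    },
}
```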

### Modifying System Prompts

All system prompts live in `backend/agents.py` inside `AGENT_REGISTRY`. Edit the `"system_prompt"` field for any agent.

The `get_system_prompt()` function adds dynamic context automatically:

- `{tools_section}` — replaced with available agent descriptions (command center only)
- The current date is appended to all prompts
- The project file tree is appended (in the `main.py` wrapper)
- Theme/styling context is added for code agents
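The first two substitutions above can be sketched as follows (a simplified, hypothetical stand-in for `get_system_prompt()`, not its real implementation):

```python
from datetime import date

def build_system_prompt(template: str, tools_section: str = "") -> str:
    """Fill the {tools_section} placeholder and append the current date."""
    prompt = template.replace("{tools_section}", tools_section)
    return f"{prompt}\n\nCurrent date: {date.today().isoformat()}"
```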

### Adding a Model Provider

In the Settings panel, models are configured through Providers and Models:

1. **Add a provider:** name + OpenAI-compatible endpoint URL + API token
2. **Add a model:** name + provider + API model ID (e.g., `gpt-4o`, `claude-sonnet-4-20250514`)
3. **Assign models:** pick which model each agent type uses

Any OpenAI-compatible API works (OpenAI, Anthropic via proxy, Ollama, vLLM, etc.).

Settings are stored in `workspace/settings.json` and managed via the Settings panel in the UI.

### Creating a Theme

Themes are CSS custom property sets defined in `frontend/settings.js`.

Add an entry to the `themeColors` object (search for `const themeColors`):

```js
myTheme: {
    border: '#8e24aa',
    bg: '#f3e5f5',
    hoverBg: '#e1bee7',
    accent: '#6a1b9a',
    accentRgb: '106, 27, 154',
    ...lightSurface    // Use for light themes
},
```

For dark themes, override the surface colors instead of spreading `lightSurface`:

```js
myDarkTheme: {
    border: '#bb86fc',
    bg: '#1e1e2e',
    hoverBg: '#2a2a3e',
    accent: '#bb86fc',
    accentRgb: '187, 134, 252',
    bgPrimary: '#121218',
    bgSecondary: '#1e1e2e',
    bgTertiary: '#0e0e14',
    bgInput: '#0e0e14',
    bgHover: '#2a2a3e',
    bgCard: '#1e1e2e',
    textPrimary: '#e0e0e0',
    textSecondary: '#999999',
    textMuted: '#666666',
    borderPrimary: '#333344',
    borderSubtle: '#222233'
},
```

The theme automatically appears in the Settings theme picker — no other changes needed. The `applyTheme()` function reads every property on the object and sets the corresponding CSS variables.

Available CSS variables: `--theme-accent`, `--theme-accent-rgb`, `--theme-bg`, `--theme-hover-bg`, `--theme-border`, `--bg-primary`, `--bg-secondary`, `--bg-tertiary`, `--bg-input`, `--bg-hover`, `--bg-card`, `--text-primary`, `--text-secondary`, `--text-muted`, `--border-primary`, `--border-subtle`.

## SSE Event Protocol

All agents communicate via Server-Sent Events. Each event is a JSON object with a `type` field.

| Event | Description |
| --- | --- |
| `done` | Stream complete (required) |
| `error` | `{content}` — error message (required) |
| `thinking` | `{content}` — reasoning text |
| `content` | `{content}` — streamed response tokens |
| `result` | `{content, figures?}` — final output for the command center |
| `result_preview` | Same as `result`, shown inline |
| `retry` | `{attempt, max_attempts, delay, message}` — retrying |
| `debug_call_input` | `{call_number, messages}` — LLM input (debug panel) |
| `debug_call_output` | `{call_number, response}` — LLM output (debug panel) |
| `launch` | `{agent_type, initial_message, task_id}` — spawn a sub-agent |
| `tool_start` | `{tool, args}` — direct tool started |
| `tool_result` | `{tool, result}` — direct tool completed |
| `code_start` | `{code}` — code execution started |
| `code` | `{output, error, images}` — code execution result |
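On the wire, each of these events arrives as a `data:` line. A hypothetical client-side decoder (not part of the codebase) that turns one SSE line back into an event dict:

```python
import json

def parse_sse_line(line: str):
    """Decode one 'data: {...}' SSE line into an event dict; ignore other lines."""
    prefix = "data: "
    if line.startswith(prefix):
        return json.loads(line[len(prefix):])
    return None
```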

## Key Patterns & Conventions

### Backend

- **Single source of truth:** `AGENT_REGISTRY` in `backend/agents.py` defines all agent types. The frontend fetches it via `GET /api/agents` — never duplicate agent definitions.
- **LLM calls:** always use `call_llm()` from `agents.py` — it handles retries and abort checking, and emits `debug_call_input`/`debug_call_output` events for the debug panel.
- **Streaming pattern:** agent handlers are sync generators yielding event dicts. `_stream_sync_generator()` in `main.py` wraps them for async SSE delivery — never duplicate the async queue boilerplate.
- **Direct tools:** `DIRECT_TOOL_REGISTRY` in `tools.py` maps tool name → `{schema, execute}`. `command.py` dispatches automatically.
- **Result nudging:** when an agent finishes without `<result>` tags, `nudge_for_result()` in `agents.py` asks the LLM for a final answer. It uses `call_llm()` internally.

### Frontend

- **No build system:** plain `<script>` tags in `index.html`, no bundler. Files share `window` scope.
- **Load order matters:** `utils.js` loads first (declares all globals), then the other files. Cross-file function calls are fine because they happen at runtime, not parse time.
- **Global state lives in `utils.js`:** `AGENT_REGISTRY`, `settings`, `activeTabId`, `tabCounter`, `timelineData`, `debugHistory`, `globalFigureRegistry`, etc.
- **Shared helpers (also in `utils.js`):**
  - `setupInputListeners(container, tabId)` — wires textarea auto-resize, Enter-to-send, and the send button click
  - `setupCollapseToggle(cell, labelSelector)` — wires click-to-collapse on tool/code cells
  - `closeAllPanels()` — closes all right-side panels (settings, debug, files, sessions)
- **Markdown rendering:** `parseMarkdown()` in `streaming.js` is the single entry point (marked + KaTeX + Prism).
- **Panel toggle pattern:** call `closeAllPanels()` first, then add `.active` to the panel being opened.
- **Workspace persistence:** changes auto-save via `saveWorkspaceDebounced()`. Tab state is serialized to JSON and posted to `/api/workspace`.
- **Cache busting:** bump the `?v=N` query params in `index.html` when changing JS/CSS files.

### Naming

- **Backend:** `stream_<agent>_execution()` for the sync generator, `_stream_<agent>_inner()` for the async wrapper in `main.py`
- **Frontend:** agent types use short keys (`code`, `agent`, `research`, `image`, `command`)
- **CSS:** `--theme-*` for accent colors, `--bg-*` / `--text-*` / `--border-*` for surface colors

## Verification

Verify the backend imports cleanly:

```bash
python -c "from backend.command import stream_command_center"
```