---
title: Agent UI
emoji: πŸ€–
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
header: mini
---
# Agent UI
A multi-agent AI interface with code execution, web search, image generation, and deep research β€” all orchestrated from a single command center.
## Local Install
```bash
pip install . # Install from pyproject.toml
python -m backend.main # Start server at http://localhost:8765
```
Or use Make shortcuts:
```bash
make install # pip install .
make dev # Start dev server
```
Configure API keys in the Settings panel, or set environment variables:
| Variable | Purpose |
|----------|---------|
| `LLM_API_KEY` | Default LLM provider token (any OpenAI-compatible API) |
| `HF_TOKEN` | HuggingFace token (image generation, hosted models) |
| `E2B_API_KEY` | [E2B](https://e2b.dev) sandbox for code execution |
| `SERPER_API_KEY` | [Serper](https://serper.dev) for web search |
## Docker
```bash
docker build -t agent-ui .
docker run -p 7860:7860 -e LLM_API_KEY=... agent-ui
```
CLI options: `--port`, `--no-browser`, `--config-dir`, `--workspace-dir`, `--multi-user`.
For HuggingFace Spaces deployment, set `HF_BUCKET` and `HF_BUCKET_TOKEN` secrets for workspace persistence across restarts.
## Architecture
```
backend/
β”œβ”€β”€ agents.py # Agent registry (single source of truth) + shared LLM utilities
β”œβ”€β”€ main.py # FastAPI routes, SSE streaming, file management
β”œβ”€β”€ command.py # Command center: tool routing, agent launching
β”œβ”€β”€ code.py # Code agent: E2B sandbox execution
β”œβ”€β”€ agent.py # Web agent: search + browse
β”œβ”€β”€ research.py # Research agent: multi-source deep analysis
β”œβ”€β”€ image.py # Image agent: generate/edit via HuggingFace
└── tools.py # Direct tools (execute_code, web_search, show_html, etc.)
frontend/
β”œβ”€β”€ index.html # Entry point
β”œβ”€β”€ utils.js # Global state, shared helpers (setupInputListeners, closeAllPanels)
β”œβ”€β”€ timeline.js # Sidebar timeline data + rendering
β”œβ”€β”€ sessions.js # Session CRUD + panel
β”œβ”€β”€ tabs.js # Tab creation/switching, sendMessage
β”œβ”€β”€ streaming.js # SSE streaming, code cells, action widgets, markdown
β”œβ”€β”€ workspace.js # Workspace serialize/restore
β”œβ”€β”€ settings.js # Settings CRUD, themes, debug/files/sessions panels
β”œβ”€β”€ app.js # Initialization, event listeners, DOMContentLoaded
β”œβ”€β”€ style.css # All styles (CSS custom properties for theming)
└── research-ui.js # Research-specific UI components
```
### How It Works
1. The **command center** receives user messages and decides whether to answer directly or launch sub-agents
2. Sub-agents (code, web, research, image) run in their own tabs with specialized tools
3. All communication uses **SSE streaming** β€” agents yield JSON events with a `type` field
4. Settings store providers, models, and agent-to-model assignments β€” any OpenAI-compatible API works
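The generator contract behind steps 2–3 can be sketched as a plain Python generator (a minimal illustration, not the real agent code):

```python
def stream_demo_agent(messages):
    # Minimal sketch of the event contract: every agent is a generator
    # of JSON-serializable dicts with a "type" field, ending in "done".
    yield {"type": "thinking", "content": "Planning the answer..."}
    yield {"type": "content", "content": "Hello!"}
    yield {"type": "done"}

events = list(stream_demo_agent([]))
```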
## Extending Agent UI
### Adding a New Agent
Only the backend needs changes β€” the frontend fetches the registry from `GET /api/agents` at startup.
**1. Backend registry** β€” add to `AGENT_REGISTRY` in `backend/agents.py`:
```python
"my_agent": {
"label": "MY AGENT",
"system_prompt": "You are a helpful assistant...",
"tool": {
"type": "function",
"function": {
"name": "launch_my_agent",
"description": "Launch my agent for X tasks.",
"parameters": {
"type": "object",
"properties": {
"task": {"type": "string", "description": "The task"},
"task_id": {"type": "string", "description": "2-3 word ID"}
},
"required": ["task", "task_id"]
}
}
},
"tool_arg": "task",
"has_counter": True,
"in_menu": True,
"in_launcher": True,
"placeholder": "Enter message...",
"capabilities": "Short description of what this agent can do.",
},
```
**2. Backend streaming handler** β€” create `backend/my_agent.py`:
```python
from .agents import call_llm

MY_TOOLS = [...]  # tool schemas this agent exposes to the LLM

def stream_my_agent(client, model, messages, extra_params=None, abort_event=None):
    """Generator yielding SSE event dicts."""
    debug_call_number = 0
    done = False
    while not done:
        # call_llm handles retries and emits debug events
        response = None
        for event in call_llm(client, model, messages, tools=MY_TOOLS,
                              extra_params=extra_params, abort_event=abort_event,
                              call_number=debug_call_number):
            if "_response" in event:
                response = event["_response"]
                debug_call_number = event["_call_number"]
            else:
                yield event
                if event.get("type") in ("error", "aborted"):
                    return
        # Process response, yield events, and set done=True once the
        # agent has produced its final answer...
        yield {"type": "thinking", "content": "..."}
        done = True
    yield {"type": "result", "content": "Final answer"}
    yield {"type": "done"}
```
Required events: `done`, `error`. Common: `thinking`, `content`, `result`, `result_preview`.
**3. Wire the route** β€” in `backend/main.py`, add to the streaming handler dispatch (search for `agent_type`):
```python
elif request.agent_type == "my_agent":
return StreamingResponse(stream_my_agent_handler(...), ...)
```
**4. Frontend** β€” no changes needed; the new agent appears in the UI automatically once the registry is fetched.
### Adding a Direct Tool
Direct tools execute synchronously in the command center (no sub-agent spawned). Only two files need changes.
**1. Define the tool schema + execute function** in `backend/tools.py`:
```python
my_tool = {
"type": "function",
"function": {
"name": "my_tool",
"description": "Does something useful.",
"parameters": {
"type": "object",
"properties": {
"input": {"type": "string", "description": "The input"}
},
"required": ["input"]
}
}
}
def execute_my_tool(input: str, files_root: str | None = None) -> dict:
    return {"content": "Result text for the LLM", "extra_data": "..."}
```
**2. Register it** in `DIRECT_TOOL_REGISTRY` at the bottom of `backend/tools.py`:
```python
DIRECT_TOOL_REGISTRY = {
"show_html": { ... }, # existing
"my_tool": {
"schema": my_tool,
"execute": lambda args, ctx: execute_my_tool(
args.get("input", ""), files_root=ctx.get("files_root")
),
},
}
```
That's it β€” `command.py` automatically picks up tools from the registry.
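The pickup amounts to a registry lookup plus a call to the entry's `execute` callable. A minimal sketch of the pattern (the `dispatch` helper here is hypothetical; the real logic lives in `backend/command.py`):

```python
def dispatch(registry, name, args, ctx):
    # Look the tool up by name and run its execute callable.
    entry = registry.get(name)
    if entry is None:
        return {"content": f"Unknown tool: {name}"}
    return entry["execute"](args, ctx)

REGISTRY = {
    "echo": {"schema": {}, "execute": lambda args, ctx: {"content": args.get("input", "")}},
}
result = dispatch(REGISTRY, "echo", {"input": "hi"}, {})
```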
### Modifying System Prompts
All system prompts live in `backend/agents.py` inside `AGENT_REGISTRY`. Edit the `"system_prompt"` field for any agent.
The `get_system_prompt()` function adds dynamic context automatically:
- `{tools_section}` β€” replaced with available agent descriptions (command center only)
- Current date is appended to all prompts
- Project file tree is appended (in `main.py` wrapper)
- Theme/styling context is added for code agents
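As an illustration of the `{tools_section}` substitution (the template wording here is made up; the real prompts live in `AGENT_REGISTRY`):

```python
# Hypothetical command-center prompt template; the actual wording lives
# in backend/agents.py under AGENT_REGISTRY.
template = "You are the command center.\n\nAvailable agents:\n{tools_section}"
tools_section = "- code: run Python in an E2B sandbox\n- image: generate images"
prompt = template.format(tools_section=tools_section)
```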
### Adding a Model Provider
In the Settings panel, models are configured through **Providers** and **Models**:
1. **Add a provider**: name + OpenAI-compatible endpoint URL + API token
2. **Add a model**: name + provider + API model ID (e.g., `gpt-4o`, `claude-sonnet-4-20250514`)
3. **Assign models**: pick which model each agent type uses
Any OpenAI-compatible API works (OpenAI, Anthropic via proxy, Ollama, vLLM, etc.).
Settings are stored in `workspace/settings.json` and managed via the Settings panel in the UI.
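For orientation, a plausible shape for `workspace/settings.json` (field names are illustrative assumptions; check the file the UI actually writes):

```json
{
  "providers": [
    {"name": "openai", "base_url": "https://api.openai.com/v1", "api_key": "sk-..."}
  ],
  "models": [
    {"name": "gpt-4o", "provider": "openai", "model_id": "gpt-4o"}
  ],
  "assignments": {"command": "gpt-4o", "code": "gpt-4o"}
}
```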
### Creating a Theme
Themes are CSS custom property sets defined in `frontend/settings.js`.
**Add to `themeColors` object** (search for `const themeColors`):
```javascript
myTheme: {
border: '#8e24aa',
bg: '#f3e5f5',
hoverBg: '#e1bee7',
accent: '#6a1b9a',
accentRgb: '106, 27, 154',
...lightSurface // Use for light themes
},
```
For dark themes, override the surface colors instead of spreading `lightSurface`:
```javascript
myDarkTheme: {
border: '#bb86fc',
bg: '#1e1e2e',
hoverBg: '#2a2a3e',
accent: '#bb86fc',
accentRgb: '187, 134, 252',
bgPrimary: '#121218',
bgSecondary: '#1e1e2e',
bgTertiary: '#0e0e14',
bgInput: '#0e0e14',
bgHover: '#2a2a3e',
bgCard: '#1e1e2e',
textPrimary: '#e0e0e0',
textSecondary: '#999999',
textMuted: '#666666',
borderPrimary: '#333344',
borderSubtle: '#222233'
},
```
The theme automatically appears in the Settings theme picker β€” no other changes needed. The `applyTheme()` function reads all properties from the object and sets the corresponding CSS variables.
**Available CSS variables:** `--theme-accent`, `--theme-accent-rgb`, `--theme-bg`, `--theme-hover-bg`, `--theme-border`, `--bg-primary`, `--bg-secondary`, `--bg-tertiary`, `--bg-input`, `--bg-hover`, `--bg-card`, `--text-primary`, `--text-secondary`, `--text-muted`, `--border-primary`, `--border-subtle`.
## SSE Event Protocol
All agents communicate via Server-Sent Events. Each event is a JSON object with a `type` field.
| Event | Description |
|-------|-------------|
| `done` | Stream complete (required) |
| `error` | `{content}` β€” error message (required) |
| `thinking` | `{content}` β€” reasoning text |
| `content` | `{content}` β€” streamed response tokens |
| `result` | `{content, figures?}` β€” final output for command center |
| `result_preview` | Same as result, shown inline |
| `retry` | `{attempt, max_attempts, delay, message}` β€” retrying |
| `debug_call_input` | `{call_number, messages}` β€” LLM input (debug panel) |
| `debug_call_output` | `{call_number, response}` β€” LLM output (debug panel) |
| `launch` | `{agent_type, initial_message, task_id}` β€” spawn sub-agent |
| `tool_start` | `{tool, args}` β€” direct tool started |
| `tool_result` | `{tool, result}` β€” direct tool completed |
| `code_start` | `{code}` β€” code execution started |
| `code` | `{output, error, images}` β€” code execution result |
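On the wire, each event dict becomes one SSE `data:` frame. A minimal serializer sketch (the app delivers frames via FastAPI's `StreamingResponse` in `backend/main.py`; this helper is illustrative):

```python
import json

def to_sse(event: dict) -> str:
    # One SSE frame: "data: <json>" followed by a blank line
    # terminates the event.
    return f"data: {json.dumps(event)}\n\n"

frame = to_sse({"type": "done"})
```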
## Key Patterns & Conventions
### Backend
- **Single source of truth**: `AGENT_REGISTRY` in `backend/agents.py` defines all agent types. The frontend fetches it via `GET /api/agents` β€” never duplicate agent definitions.
- **LLM calls**: Always use `call_llm()` from `agents.py` β€” it handles retries, abort checking, and emits `debug_call_input`/`debug_call_output` events for the debug panel.
- **Streaming pattern**: Agent handlers are sync generators yielding event dicts. `_stream_sync_generator()` in `main.py` wraps them for async SSE delivery β€” never duplicate the async queue boilerplate.
- **Direct tools**: `DIRECT_TOOL_REGISTRY` in `tools.py` maps tool name β†’ `{schema, execute}`. `command.py` dispatches automatically.
- **Result nudging**: When an agent finishes without `<result>` tags, `nudge_for_result()` in `agents.py` asks the LLM for a final answer. It uses `call_llm` internally.
### Frontend
- **No build system**: Plain `<script>` tags in `index.html`, no bundler. Files share `window` scope.
- **Load order matters**: `utils.js` loads first (declares all globals), then other files. Cross-file function calls are fine because they happen at runtime, not parse time.
- **Global state** lives in `utils.js`: `AGENT_REGISTRY`, `settings`, `activeTabId`, `tabCounter`, `timelineData`, `debugHistory`, `globalFigureRegistry`, etc.
- **Shared helpers** (also in `utils.js`):
- `setupInputListeners(container, tabId)` β€” wires textarea auto-resize, Enter-to-send, send button click
- `setupCollapseToggle(cell, labelSelector)` β€” wires click-to-collapse on tool/code cells
- `closeAllPanels()` β€” closes all right-side panels (settings, debug, files, sessions)
- **Markdown rendering**: `parseMarkdown()` in `streaming.js` is the single entry point (marked + KaTeX + Prism).
- **Panel toggle pattern**: Call `closeAllPanels()` first, then add `.active` to the panel being opened.
- **Workspace persistence**: Changes auto-save via `saveWorkspaceDebounced()`. Tab state is serialized to JSON and posted to `/api/workspace`.
- **Cache busting**: Bump `?v=N` query params in `index.html` when changing JS/CSS files.
### Naming
- Backend: `stream_<agent>_execution()` for the sync generator, `_stream_<agent>_inner()` for the async wrapper in `main.py`
- Frontend: Agent types use short keys (`code`, `agent`, `research`, `image`, `command`)
- CSS: `--theme-*` for accent colors, `--bg-*` / `--text-*` / `--border-*` for surface colors
## Verification
Verify backend imports: `python -c "from backend.command import stream_command_center"`