--- sidebar_position: 14 title: "API Server" description: "Expose hermes-agent as an OpenAI-compatible API for any frontend" --- # API Server The API server exposes hermes-agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox, and hundreds more — can connect to hermes-agent and use it as a backend. Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. Tool calls execute invisibly server-side. ## Quick Start ### 1. Enable the API server Add to `~/.hermes/.env`: ```bash API_SERVER_ENABLED=true API_SERVER_KEY=change-me-local-dev # Optional: only if a browser must call Hermes directly # API_SERVER_CORS_ORIGINS=http://localhost:3000 ``` ### 2. Start the gateway ```bash hermes gateway ``` You'll see: ``` [API Server] API server listening on http://127.0.0.1:8642 ``` ### 3. Connect a frontend Point any OpenAI-compatible client at `http://localhost:8642/v1`: ```bash # Test with curl curl http://localhost:8642/v1/chat/completions \ -H "Authorization: Bearer change-me-local-dev" \ -H "Content-Type: application/json" \ -d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}' ``` Or connect Open WebUI, LobeChat, or any other frontend — see the [Open WebUI integration guide](/docs/user-guide/messaging/open-webui) for step-by-step instructions. ## Endpoints ### POST /v1/chat/completions Standard OpenAI Chat Completions format. Stateless — the full conversation is included in each request via the `messages` array. **Request:** ```json { "model": "hermes-agent", "messages": [ {"role": "system", "content": "You are a Python expert."}, {"role": "user", "content": "Write a fibonacci function"} ], "stream": false } ``` **Response:** ```json { "id": "chatcmpl-abc123", "object": "chat.completion", "created": 1710000000, "model": "hermes-agent", "choices": [{ "index": 0, "message": {"role": "assistant", "content": "Here's a fibonacci function..."}, "finish_reason": "stop" }], "usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250} } ``` **Streaming** (`"stream": true`): Returns Server-Sent Events (SSE) with token-by-token response chunks. When streaming is enabled in config, tokens are emitted live as the LLM generates them. When disabled, the full response is sent as a single SSE chunk. ### POST /v1/responses OpenAI Responses API format. Supports server-side conversation state via `previous_response_id` — the server stores full conversation history (including tool calls and results) so multi-turn context is preserved without the client managing it. **Request:** ```json { "model": "hermes-agent", "input": "What files are in my project?", "instructions": "You are a helpful coding assistant.", "store": true } ``` **Response:** ```json { "id": "resp_abc123", "object": "response", "status": "completed", "model": "hermes-agent", "output": [ {"type": "function_call", "name": "terminal", "arguments": "{\"command\": \"ls\"}", "call_id": "call_1"}, {"type": "function_call_output", "call_id": "call_1", "output": "README.md src/ tests/"}, {"type": "message", "role": "assistant", "content": [{"type": "output_text", "text": "Your project has..."}]} ], "usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250} } ``` #### Multi-turn with previous_response_id Chain responses to maintain full context (including tool calls) across turns: ```json { "input": "Now show me the README", "previous_response_id": "resp_abc123" } ``` The server reconstructs the full conversation from the stored response chain — all previous tool calls and results are preserved. #### Named conversations Use the `conversation` parameter instead of tracking response IDs: ```json {"input": "Hello", "conversation": "my-project"} {"input": "What's in src/?", "conversation": "my-project"} {"input": "Run the tests", "conversation": "my-project"} ``` The server automatically chains to the latest response in that conversation. Like the `/title` command for gateway sessions. ### GET /v1/responses/\{id\} Retrieve a previously stored response by ID. ### DELETE /v1/responses/\{id\} Delete a stored response. ### GET /v1/models Lists `hermes-agent` as an available model. Required by most frontends for model discovery. ### GET /health Health check. Returns `{"status": "ok"}`. ## System Prompt Handling When a frontend sends a `system` message (Chat Completions) or `instructions` field (Responses API), hermes-agent **layers it on top** of its core system prompt. Your agent keeps all its tools, memory, and skills — the frontend's system prompt adds extra instructions. This means you can customize behavior per-frontend without losing capabilities: - Open WebUI system prompt: "You are a Python expert. Always include type hints." - The agent still has terminal, file tools, web search, memory, etc. ## Authentication Bearer token auth via the `Authorization` header: ``` Authorization: Bearer *** ``` Configure the key via `API_SERVER_KEY` env var. If you need a browser to call Hermes directly, also set `API_SERVER_CORS_ORIGINS` to an explicit allowlist. :::warning Security The API server gives full access to hermes-agent's toolset, **including terminal commands**. If you change the bind address to `0.0.0.0` (network-accessible), **always set `API_SERVER_KEY`** and keep `API_SERVER_CORS_ORIGINS` narrow — without that, remote callers may be able to execute arbitrary commands on your machine. The default bind address (`127.0.0.1`) is for local-only use. Browser access is disabled by default; enable it only for explicit trusted origins. ::: ## Configuration ### Environment Variables | Variable | Default | Description | |----------|---------|-------------| | `API_SERVER_ENABLED` | `false` | Enable the API server | | `API_SERVER_PORT` | `8642` | HTTP server port | | `API_SERVER_HOST` | `127.0.0.1` | Bind address (localhost only by default) | | `API_SERVER_KEY` | _(none)_ | Bearer token for auth | | `API_SERVER_CORS_ORIGINS` | _(none)_ | Comma-separated allowed browser origins | ### config.yaml ```yaml # Not yet supported — use environment variables. # config.yaml support coming in a future release. ``` ## CORS The API server does **not** enable browser CORS by default. For direct browser access, set an explicit allowlist: ```bash API_SERVER_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000 ``` Most documented frontends such as Open WebUI connect server-to-server and do not need CORS at all. ## Compatible Frontends Any frontend that supports the OpenAI API format works. Tested/documented integrations: | Frontend | Stars | Connection | |----------|-------|------------| | [Open WebUI](/docs/user-guide/messaging/open-webui) | 126k | Full guide available | | LobeChat | 73k | Custom provider endpoint | | LibreChat | 34k | Custom endpoint in librechat.yaml | | AnythingLLM | 56k | Generic OpenAI provider | | NextChat | 87k | BASE_URL env var | | ChatBox | 39k | API Host setting | | Jan | 26k | Remote model config | | HF Chat-UI | 8k | OPENAI_BASE_URL | | big-AGI | 7k | Custom endpoint | | OpenAI Python SDK | — | `OpenAI(base_url="http://localhost:8642/v1")` | | curl | — | Direct HTTP requests | ## Limitations - **Response storage** — stored responses (for `previous_response_id`) are persisted in SQLite and survive gateway restarts. Max 100 stored responses (LRU eviction). - **No file upload** — vision/document analysis via uploaded files is not yet supported through the API. - **Model field is cosmetic** — the `model` field in requests is accepted but the actual LLM model used is configured server-side in config.yaml.