hermes / website /docs /user-guide /features /api-server.md

lenson78

initial upload: v2026.3.23 with HF Spaces deployment

9aa5185 verified 7 days ago


	---
	sidebar_position: 14
	title: "API Server"
	description: "Expose hermes-agent as an OpenAI-compatible API for any frontend"
	---

	# API Server

	The API server exposes hermes-agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format, such as Open WebUI, LobeChat, LibreChat, NextChat, or ChatBox, can connect to hermes-agent and use it as a backend.

	Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. Tool calls execute invisibly server-side.

	## Quick Start

	### 1. Enable the API server

	Add to `~/.hermes/.env`:

	```bash
	API_SERVER_ENABLED=true
	API_SERVER_KEY=change-me-local-dev
	# Optional: only if a browser must call Hermes directly
	# API_SERVER_CORS_ORIGINS=http://localhost:3000
	```

	### 2. Start the gateway

	```bash
	hermes gateway
	```

	You'll see:

	```
	[API Server] API server listening on http://127.0.0.1:8642
	```

	### 3. Connect a frontend

	Point any OpenAI-compatible client at `http://localhost:8642/v1`:

	```bash
	# Test with curl
	curl http://localhost:8642/v1/chat/completions \
	  -H "Authorization: Bearer change-me-local-dev" \
	  -H "Content-Type: application/json" \
	  -d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}'
	```

	Or connect Open WebUI, LobeChat, or any other frontend — see the [Open WebUI integration guide](/docs/user-guide/messaging/open-webui) for step-by-step instructions.
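The curl test above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of hermes-agent, and the URL and key are the Quick Start defaults.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8642/v1"  # default host and port from the Quick Start
API_KEY = "change-me-local-dev"        # must match API_SERVER_KEY in ~/.hermes/.env


def build_chat_request(messages, base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = json.dumps({"model": "hermes-agent", "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(messages, timeout=300):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(messages)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the gateway running, `chat([{"role": "user", "content": "Hello!"}])` should return the reply text. The generous timeout is a guess that allows for tool calls running server-side before the final response comes back.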

	## Endpoints

	### POST /v1/chat/completions

	Standard OpenAI Chat Completions format. Stateless — the full conversation is included in each request via the `messages` array.

	Request:
	```json
	{
	  "model": "hermes-agent",
	  "messages": [
	    {"role": "system", "content": "You are a Python expert."},
	    {"role": "user", "content": "Write a fibonacci function"}
	  ],
	  "stream": false
	}
	```

	Response:
	```json
	{
	  "id": "chatcmpl-abc123",
	  "object": "chat.completion",
	  "created": 1710000000,
	  "model": "hermes-agent",
	  "choices": [{
	    "index": 0,
	    "message": {"role": "assistant", "content": "Here's a fibonacci function..."},
	    "finish_reason": "stop"
	  }],
	  "usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250}
	}
	```

	Streaming (`"stream": true`): Returns the response as Server-Sent Events (SSE). When streaming is enabled in the server config, tokens are emitted live as the LLM generates them. When it is disabled, the full response is sent as a single SSE chunk.
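A client consumes the stream by reading `data:` lines and concatenating the text fragments. The sketch below assumes the chunks follow the usual OpenAI shape (`choices[0].delta.content`) and that the stream ends with a `data: [DONE]` sentinel; neither detail is confirmed on this page, and `iter_content_deltas` is a hypothetical helper name.

```python
import json


def iter_content_deltas(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # OpenAI-style end-of-stream sentinel (assumed)
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample stream, for illustration only.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content_deltas(sample)))  # Hello!
```

The same loop covers the non-streaming configuration, where the whole reply arrives in one chunk.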

	### POST /v1/responses

	OpenAI Responses API format. Supports server-side conversation state via `previous_response_id` — the server stores full conversation history (including tool calls and results) so multi-turn context is preserved without the client managing it.

	Request:
	```json
	{
	  "model": "hermes-agent",
	  "input": "What files are in my project?",
	  "instructions": "You are a helpful coding assistant.",
	  "store": true
	}
	```

	Response:
	```json
	{
	  "id": "resp_abc123",
	  "object": "response",
	  "status": "completed",
	  "model": "hermes-agent",
	  "output": [
	    {"type": "function_call", "name": "terminal", "arguments": "{\"command\": \"ls\"}", "call_id": "call_1"},
	    {"type": "function_call_output", "call_id": "call_1", "output": "README.md src/ tests/"},
	    {"type": "message", "role": "assistant", "content": [{"type": "output_text", "text": "Your project has..."}]}
	  ],
	  "usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250}
	}
	```
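Because `output` mixes tool activity with the assistant's message, a client usually needs to pick out the parts it cares about. The sketch below works on the example response above; `final_text` and `tool_calls` are illustrative helper names, not part of any SDK.

```python
import json


def final_text(response):
    """Concatenate the output_text parts of all assistant messages."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip function_call and function_call_output items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)


def tool_calls(response):
    """Return (name, parsed arguments) pairs for every function_call item."""
    return [
        (item["name"], json.loads(item["arguments"]))
        for item in response.get("output", [])
        if item.get("type") == "function_call"
    ]


# The example response from this section, as a Python dict.
example = {
    "id": "resp_abc123",
    "status": "completed",
    "output": [
        {"type": "function_call", "name": "terminal",
         "arguments": '{"command": "ls"}', "call_id": "call_1"},
        {"type": "function_call_output", "call_id": "call_1",
         "output": "README.md src/ tests/"},
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "Your project has..."}]},
    ],
}
print(final_text(example))  # Your project has...
print(tool_calls(example))  # [('terminal', {'command': 'ls'})]
```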

	#### Multi-turn with previous_response_id

	Chain responses to maintain full context (including tool calls) across turns:

	```json
	{
	  "input": "Now show me the README",
	  "previous_response_id": "resp_abc123"
	}
	```

	The server reconstructs the full conversation from the stored response chain — all previous tool calls and results are preserved.
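On the client side, chaining only requires remembering the last response ID. Here is a minimal sketch: `ResponseChain` is a hypothetical helper, and `send` stands in for whatever function POSTs a payload to `/v1/responses` and returns the parsed JSON. Sending `model` and `store` on every turn is an assumption made here, not something this page requires.

```python
class ResponseChain:
    """Tracks the latest response ID so each turn continues the previous one."""

    def __init__(self, send):
        self._send = send    # callable: payload dict -> response dict
        self.last_id = None  # no previous response before the first turn

    def ask(self, text):
        payload = {"model": "hermes-agent", "input": text, "store": True}
        if self.last_id is not None:
            payload["previous_response_id"] = self.last_id
        response = self._send(payload)
        self.last_id = response["id"]
        return response


# Stand-in transport for illustration: records payloads, returns fake IDs.
recorded = []


def fake_send(payload):
    recorded.append(payload)
    return {"id": f"resp_{len(recorded)}", "status": "completed"}


chain = ResponseChain(fake_send)
chain.ask("What files are in my project?")
chain.ask("Now show me the README")
print(recorded[1]["previous_response_id"])  # resp_1
```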

	#### Named conversations

	Use the `conversation` parameter instead of tracking response IDs. Each line below is the body of a separate request:

	```json
	{"input": "Hello", "conversation": "my-project"}
	{"input": "What's in src/?", "conversation": "my-project"}
	{"input": "Run the tests", "conversation": "my-project"}
	```

	The server automatically chains to the latest response in that conversation. This works like the `/title` command for gateway sessions.

	### GET /v1/responses/\{id\}

	Retrieve a previously stored response by ID.

	### DELETE /v1/responses/\{id\}

	Delete a stored response.

	### GET /v1/models

	Lists `hermes-agent` as an available model. Required by most frontends for model discovery.

	### GET /health

	Health check. Returns `{"status": "ok"}`.

	## System Prompt Handling

	When a frontend sends a `system` message (Chat Completions) or `instructions` field (Responses API), hermes-agent layers it on top of its core system prompt. Your agent keeps all its tools, memory, and skills — the frontend's system prompt adds extra instructions.

	This means you can customize behavior per-frontend without losing capabilities:
	- Open WebUI system prompt: "You are a Python expert. Always include type hints."
	- The agent still has terminal, file tools, web search, memory, etc.

	## Authentication

	Bearer token auth via the `Authorization` header:

	```
	Authorization: Bearer ***
	```

	Configure the key via the `API_SERVER_KEY` environment variable. If you need a browser to call Hermes directly, also set `API_SERVER_CORS_ORIGINS` to an explicit allowlist.

	:::warning Security
	The API server gives full access to hermes-agent's toolset, including terminal commands. If you change the bind address to `0.0.0.0` (network-accessible), always set `API_SERVER_KEY` and keep `API_SERVER_CORS_ORIGINS` narrow — without that, remote callers may be able to execute arbitrary commands on your machine.

	The default bind address (`127.0.0.1`) is for local-only use. Browser access is disabled by default; enable it only for explicit trusted origins.
	:::
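The warning above can be turned into a small pre-flight check run before the gateway starts. This is a hypothetical helper, not something hermes-agent ships: it flags a non-loopback bind address combined with a missing key or with the placeholder key from the Quick Start.

```python
import ipaddress

PLACEHOLDER_KEY = "change-me-local-dev"  # example key from the Quick Start


def is_loopback(host):
    """True for 127.0.0.1, ::1, and the name 'localhost'."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # not an IP literal
        return host == "localhost"


def exposure_warnings(host="127.0.0.1", api_key=None):
    """Return human-readable warnings for a risky host/key combination."""
    if is_loopback(host):
        return []  # local-only: the default, lower-risk setup
    if not api_key:
        return [f"API_SERVER_HOST={host} is network-accessible but API_SERVER_KEY is not set"]
    if api_key == PLACEHOLDER_KEY:
        return [f"API_SERVER_HOST={host} is network-accessible but API_SERVER_KEY is the example value"]
    return []


for message in exposure_warnings(host="0.0.0.0"):
    print("WARNING:", message)
```

A check like this only catches the obvious misconfigurations; it does not replace keeping the CORS allowlist narrow.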

	## Configuration

	### Environment Variables

	| Variable | Default | Description |
	|----------|---------|-------------|
	| `API_SERVER_ENABLED` | `false` | Enable the API server |
	| `API_SERVER_PORT` | `8642` | HTTP server port |
	| `API_SERVER_HOST` | `127.0.0.1` | Bind address (localhost only by default) |
	| `API_SERVER_KEY` | _(none)_ | Bearer token for auth |
	| `API_SERVER_CORS_ORIGINS` | _(none)_ | Comma-separated allowed browser origins |
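As an illustration of how the variables and defaults in the table fit together, the sketch below reads them into a settings object. It is not the gateway's actual parsing code: `ApiServerSettings` and `load_settings` are made-up names, and treating only the literal `true` as enabled is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ApiServerSettings:
    enabled: bool = False
    port: int = 8642
    host: str = "127.0.0.1"
    key: Optional[str] = None
    cors_origins: Tuple[str, ...] = ()


def load_settings(env):
    """Build settings from a mapping such as os.environ, using the table's defaults."""
    origins = env.get("API_SERVER_CORS_ORIGINS", "")
    return ApiServerSettings(
        # Assumption: only the literal "true" (any case) enables the server.
        enabled=env.get("API_SERVER_ENABLED", "false").strip().lower() == "true",
        port=int(env.get("API_SERVER_PORT", "8642")),
        host=env.get("API_SERVER_HOST", "127.0.0.1"),
        key=env.get("API_SERVER_KEY") or None,
        cors_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
    )


print(load_settings({"API_SERVER_ENABLED": "true", "API_SERVER_KEY": "change-me-local-dev"}))
```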

	### config.yaml

	```yaml
	# Not yet supported — use environment variables.
	# config.yaml support coming in a future release.
	```

	## CORS

	The API server does not enable browser CORS by default.

	For direct browser access, set an explicit allowlist:

	```bash
	API_SERVER_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
	```

	Most documented frontends such as Open WebUI connect server-to-server and do not need CORS at all.
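To make the allowlist semantics concrete, here is a generic sketch of how an exact-match origin allowlist typically decides which CORS headers to send. It illustrates the general mechanism under that assumption and is not taken from the hermes-agent source; `cors_headers` is a hypothetical function.

```python
def cors_headers(origin, allowed_origins):
    """Return CORS response headers for a request Origin, or {} if not allowed.

    allowed_origins is the comma-separated API_SERVER_CORS_ORIGINS value.
    """
    allowed = {o.strip() for o in allowed_origins.split(",") if o.strip()}
    if origin and origin in allowed:
        # Echo the matching origin; Vary tells caches the response depends on it.
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}  # no CORS headers: the browser blocks the cross-origin read


allowlist = "http://localhost:3000,http://127.0.0.1:3000"
print(cors_headers("http://localhost:3000", allowlist))
print(cors_headers("https://example.com", allowlist))  # {}
print(cors_headers("http://localhost:3000", ""))       # {} (default: CORS disabled)
```

Origins are compared as whole strings, so scheme, host, and port must all match an allowlist entry.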

	## Compatible Frontends

	Any frontend that supports the OpenAI API format works. Tested/documented integrations:

	| Frontend | Stars | Connection |
	|----------|-------|------------|
	| [Open WebUI](/docs/user-guide/messaging/open-webui) | 126k | Full guide available |
	| LobeChat | 73k | Custom provider endpoint |
	| LibreChat | 34k | Custom endpoint in librechat.yaml |
	| AnythingLLM | 56k | Generic OpenAI provider |
	| NextChat | 87k | BASE_URL env var |
	| ChatBox | 39k | API Host setting |
	| Jan | 26k | Remote model config |
	| HF Chat-UI | 8k | OPENAI_BASE_URL |
	| big-AGI | 7k | Custom endpoint |
	| OpenAI Python SDK | — | `OpenAI(base_url="http://localhost:8642/v1")` |
	| curl | — | Direct HTTP requests |

	## Limitations

	- Response storage — stored responses (for `previous_response_id`) are persisted in SQLite and survive gateway restarts. Max 100 stored responses (LRU eviction).
	- No file upload — vision/document analysis via uploaded files is not yet supported through the API.
	- Model field is cosmetic — the `model` field in requests is accepted but the actual LLM model used is configured server-side in config.yaml.
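The response storage limit has a practical consequence for long chains: once more than 100 responses are stored, the least recently used ones are dropped. The in-memory sketch below illustrates LRU eviction in general. The real store is SQLite-backed and may differ in detail (for example, in whether a read counts as a use), and `LRUResponseStore` is a made-up name.

```python
from collections import OrderedDict


class LRUResponseStore:
    """Keeps at most max_items responses, evicting the least recently used."""

    def __init__(self, max_items=100):
        self._items = OrderedDict()  # oldest entry first, newest last
        self._max = max_items

    def put(self, response_id, response):
        self._items[response_id] = response
        self._items.move_to_end(response_id)  # mark as most recently used
        while len(self._items) > self._max:
            self._items.popitem(last=False)   # drop the least recently used

    def get(self, response_id):
        if response_id not in self._items:
            return None  # never stored, or already evicted
        self._items.move_to_end(response_id)  # assumption: a read counts as a use
        return self._items[response_id]


store = LRUResponseStore(max_items=2)  # tiny limit to make eviction visible
store.put("resp_a", {"id": "resp_a"})
store.put("resp_b", {"id": "resp_b"})
store.get("resp_a")                    # resp_a is now the most recently used
store.put("resp_c", {"id": "resp_c"})  # evicts resp_b
print(store.get("resp_b"))             # None
```

A client that chains with `previous_response_id` over many turns should be prepared for an old ID to be gone.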