---
sidebar_position: 14
title: API Server
description: Expose hermes-agent as an OpenAI-compatible API for any frontend
---

# API Server
The API server exposes hermes-agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format (Open WebUI, LobeChat, LibreChat, NextChat, ChatBox, and hundreds more) can connect to hermes-agent and use it as a backend.
Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. Tool calls execute invisibly server-side.
## Quick Start

### 1. Enable the API server

Add to `~/.hermes/.env`:

```bash
API_SERVER_ENABLED=true
API_SERVER_KEY=change-me-local-dev
# Optional: only if a browser must call Hermes directly
# API_SERVER_CORS_ORIGINS=http://localhost:3000
```
### 2. Start the gateway

```bash
hermes gateway
```

You'll see:

```
[API Server] API server listening on http://127.0.0.1:8642
```
### 3. Connect a frontend

Point any OpenAI-compatible client at `http://localhost:8642/v1`:

```bash
# Test with curl
curl http://localhost:8642/v1/chat/completions \
  -H "Authorization: Bearer change-me-local-dev" \
  -H "Content-Type: application/json" \
  -d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}'
```
Or connect Open WebUI, LobeChat, or any other frontend; see the Open WebUI integration guide for step-by-step instructions.
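The same call can be made from Python with only the standard library. This is a minimal sketch mirroring the curl example above; `build_chat_request` is a hypothetical helper (not part of hermes-agent), and the key must match your `API_SERVER_KEY`:

```python
import json
import urllib.request

API_URL = "http://localhost:8642/v1/chat/completions"
API_KEY = "change-me-local-dev"  # must match API_SERVER_KEY in ~/.hermes/.env

def build_chat_request(messages, model="hermes-agent", stream=False):
    """Build an authenticated Chat Completions request for the local server."""
    body = json.dumps({"model": model, "messages": messages, "stream": stream})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request([{"role": "user", "content": "Hello!"}])
# With the gateway running, send it and read the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```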
## Endpoints

### POST `/v1/chat/completions`

Standard OpenAI Chat Completions format. Stateless: the full conversation is included in each request via the `messages` array.
**Request:**

```json
{
  "model": "hermes-agent",
  "messages": [
    {"role": "system", "content": "You are a Python expert."},
    {"role": "user", "content": "Write a fibonacci function"}
  ],
  "stream": false
}
```
**Response:**

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "hermes-agent",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Here's a fibonacci function..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250}
}
```
**Streaming** (`"stream": true`): returns Server-Sent Events (SSE) with token-by-token response chunks. When streaming is enabled in config, tokens are emitted live as the LLM generates them. When disabled, the full response is sent as a single SSE chunk.
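Client-side, the stream can be consumed by accumulating the `data:` lines. A minimal parser sketch, assuming the standard OpenAI streaming chunk shape (the sample stream below is illustrative, not captured output):

```python
import json

def collect_sse_text(sse_body: str) -> str:
    """Accumulate assistant text from an OpenAI-style SSE stream body.

    Each event line looks like `data: {...chunk...}`; the stream ends
    with `data: [DONE]`.
    """
    text = []
    for line in sse_body.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines between events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

# Illustrative stream in the standard OpenAI chunk format:
sample = (
    'data: {"choices": [{"delta": {"role": "assistant", "content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo!"}}]}\n\n'
    "data: [DONE]\n\n"
)
print(collect_sse_text(sample))  # -> Hello!
```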
### POST `/v1/responses`

OpenAI Responses API format. Supports server-side conversation state via `previous_response_id`: the server stores full conversation history (including tool calls and results), so multi-turn context is preserved without the client managing it.
**Request:**

```json
{
  "model": "hermes-agent",
  "input": "What files are in my project?",
  "instructions": "You are a helpful coding assistant.",
  "store": true
}
```
**Response:**

```json
{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "model": "hermes-agent",
  "output": [
    {"type": "function_call", "name": "terminal", "arguments": "{\"command\": \"ls\"}", "call_id": "call_1"},
    {"type": "function_call_output", "call_id": "call_1", "output": "README.md src/ tests/"},
    {"type": "message", "role": "assistant", "content": [{"type": "output_text", "text": "Your project has..."}]}
  ],
  "usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250}
}
```
#### Multi-turn with `previous_response_id`

Chain responses to maintain full context (including tool calls) across turns:

```json
{
  "input": "Now show me the README",
  "previous_response_id": "resp_abc123"
}
```
The server reconstructs the full conversation from the stored response chain; all previous tool calls and results are preserved.
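Chaining is easy to automate on the client. A sketch of a small tracker (a hypothetical helper, not part of hermes-agent; payloads follow the request shape above, and the HTTP call itself is omitted):

```python
class ResponseChain:
    """Track the latest response id so each turn chains to the previous one."""

    def __init__(self, model: str = "hermes-agent"):
        self.model = model
        self.last_id: str | None = None

    def next_payload(self, text: str) -> dict:
        """Build the next /v1/responses request body."""
        payload = {"model": self.model, "input": text, "store": True}
        if self.last_id is not None:
            payload["previous_response_id"] = self.last_id
        return payload

    def record(self, response: dict) -> None:
        """Remember the id from a server response for the next turn."""
        self.last_id = response["id"]

chain = ResponseChain()
first = chain.next_payload("What files are in my project?")
chain.record({"id": "resp_abc123"})  # id returned by the server
second = chain.next_payload("Now show me the README")
# second now carries previous_response_id = "resp_abc123"
```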
#### Named conversations

Use the `conversation` parameter instead of tracking response IDs:

```json
{"input": "Hello", "conversation": "my-project"}
{"input": "What's in src/?", "conversation": "my-project"}
{"input": "Run the tests", "conversation": "my-project"}
```

The server automatically chains to the latest response in that conversation, much like the `/title` command does for gateway sessions.
### GET `/v1/responses/{id}`

Retrieve a previously stored response by ID.

### DELETE `/v1/responses/{id}`

Delete a stored response.

### GET `/v1/models`

Lists `hermes-agent` as an available model. Required by most frontends for model discovery.
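Frontends typically read the `id` fields from this endpoint. A sketch of that discovery step; the response body is not shown in this document, so the sample below assumes the standard OpenAI model-list shape:

```python
import json

# Assumed standard OpenAI model-list shape:
# {"object": "list", "data": [{"id": ..., "object": "model"}, ...]}
sample = json.loads(
    '{"object": "list", "data": [{"id": "hermes-agent", "object": "model"}]}'
)

def model_ids(models_response: dict) -> list[str]:
    """Extract model ids the way most frontends do during discovery."""
    return [m["id"] for m in models_response.get("data", [])]

print(model_ids(sample))  # -> ['hermes-agent']
```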
### GET `/health`

Health check. Returns `{"status": "ok"}`.
## System Prompt Handling

When a frontend sends a `system` message (Chat Completions) or an `instructions` field (Responses API), hermes-agent layers it on top of its core system prompt. Your agent keeps all its tools, memory, and skills; the frontend's system prompt adds extra instructions.
This means you can customize behavior per frontend without losing capabilities:

- Open WebUI system prompt: "You are a Python expert. Always include type hints."
- The agent still has terminal, file tools, web search, memory, etc.
## Authentication

Bearer token auth via the `Authorization` header:

```
Authorization: Bearer ***
```

Configure the key via the `API_SERVER_KEY` env var. If you need a browser to call Hermes directly, also set `API_SERVER_CORS_ORIGINS` to an explicit allowlist.
:::warning Security
The API server gives full access to hermes-agent's toolset, including terminal commands. If you change the bind address to `0.0.0.0` (network-accessible), always set `API_SERVER_KEY` and keep `API_SERVER_CORS_ORIGINS` narrow; without that, remote callers may be able to execute arbitrary commands on your machine.

The default bind address (`127.0.0.1`) is for local-only use. Browser access is disabled by default; enable it only for explicit trusted origins.
:::
## Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `API_SERVER_ENABLED` | `false` | Enable the API server |
| `API_SERVER_PORT` | `8642` | HTTP server port |
| `API_SERVER_HOST` | `127.0.0.1` | Bind address (localhost only by default) |
| `API_SERVER_KEY` | (none) | Bearer token for auth |
| `API_SERVER_CORS_ORIGINS` | (none) | Comma-separated allowed browser origins |
### config.yaml

```yaml
# Not yet supported; use environment variables.
# config.yaml support coming in a future release.
```
## CORS

The API server does not enable browser CORS by default. For direct browser access, set an explicit allowlist:

```bash
API_SERVER_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
```

Most documented frontends, such as Open WebUI, connect server-to-server and do not need CORS at all.
## Compatible Frontends

Any frontend that supports the OpenAI API format works. Tested/documented integrations:

| Frontend | Stars | Connection |
|---|---|---|
| Open WebUI | 126k | Full guide available |
| LobeChat | 73k | Custom provider endpoint |
| LibreChat | 34k | Custom endpoint in `librechat.yaml` |
| AnythingLLM | 56k | Generic OpenAI provider |
| NextChat | 87k | `BASE_URL` env var |
| ChatBox | 39k | API Host setting |
| Jan | 26k | Remote model config |
| HF Chat-UI | 8k | `OPENAI_BASE_URL` |
| big-AGI | 7k | Custom endpoint |
| OpenAI Python SDK | n/a | `OpenAI(base_url="http://localhost:8642/v1")` |
| curl | n/a | Direct HTTP requests |
## Limitations

- **Response storage**: stored responses (for `previous_response_id`) are persisted in SQLite and survive gateway restarts. Max 100 stored responses (LRU eviction).
- **No file upload**: vision/document analysis via uploaded files is not yet supported through the API.
- **Model field is cosmetic**: the `model` field in requests is accepted, but the actual LLM model used is configured server-side in `config.yaml`.