# LLM Router
Chat UI includes an intelligent routing system that automatically selects the best model for each request. When enabled, users see a virtual "Omni" model that routes to specialized models based on the conversation context.
The router uses [katanemo/Arch-Router-1.5B](https://huggingface.co/katanemo/Arch-Router-1.5B) for route selection.
## Configuration
### Basic Setup
```ini
# Arch router endpoint (OpenAI-compatible)
LLM_ROUTER_ARCH_BASE_URL=https://router.huggingface.co/v1
LLM_ROUTER_ARCH_MODEL=katanemo/Arch-Router-1.5B
# Path to your routes policy JSON
LLM_ROUTER_ROUTES_PATH=./config/routes.json
```
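Because the Arch endpoint is OpenAI-compatible, the route-selection call uses the standard chat-completions request shape. The sketch below is illustrative only: `buildArchRequest` is a hypothetical helper, and the exact prompt format Arch expects is not shown here.

```typescript
// Sketch of a route-selection request to the Arch endpoint (hypothetical
// helper, not Chat UI's actual code). The endpoint is OpenAI-compatible,
// so the body follows the standard chat-completions shape.
function buildArchRequest(model: string, conversation: string) {
  return {
    // Appended to LLM_ROUTER_ARCH_BASE_URL when the request is sent.
    url: "/chat/completions",
    body: {
      model,
      messages: [{ role: "user" as const, content: conversation }],
    },
  };
}
```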
### Routes Policy
Create a JSON file defining your routes. Each route specifies a name, a short description used for route matching, a primary model, and optionally a list of fallback models:
| ```json | |
| [ | |
| { | |
| "name": "coding", | |
| "description": "Programming, debugging, code review", | |
| "primary_model": "Qwen/Qwen3-Coder-480B-A35B-Instruct", | |
| "fallback_models": ["meta-llama/Llama-3.3-70B-Instruct"] | |
| }, | |
| { | |
| "name": "casual_conversation", | |
| "description": "General chat, questions, explanations", | |
| "primary_model": "meta-llama/Llama-3.3-70B-Instruct" | |
| } | |
| ] | |
| ``` | |
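Resolving a route name to its ordered list of candidate models can be sketched as follows. The `Route` type mirrors the entries in the routes policy above; `resolveModels` is a hypothetical helper, not Chat UI's actual internals.

```typescript
// A route entry, matching the routes.json structure above.
interface Route {
  name: string;
  description: string;
  primary_model: string;
  fallback_models?: string[];
}

// Hypothetical helper: map a route name to [primary, ...fallbacks].
function resolveModels(routes: Route[], routeName: string): string[] {
  const route = routes.find((r) => r.name === routeName);
  if (!route) return []; // unknown route: the caller falls back elsewhere
  return [route.primary_model, ...(route.fallback_models ?? [])];
}

const routes: Route[] = [
  {
    name: "coding",
    description: "Programming, debugging, code review",
    primary_model: "Qwen/Qwen3-Coder-480B-A35B-Instruct",
    fallback_models: ["meta-llama/Llama-3.3-70B-Instruct"],
  },
  {
    name: "casual_conversation",
    description: "General chat, questions, explanations",
    primary_model: "meta-llama/Llama-3.3-70B-Instruct",
  },
];
```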
### Fallback Behavior
```ini
# Route to use when Arch returns "other"
LLM_ROUTER_OTHER_ROUTE=casual_conversation
# Model to use if Arch selection fails entirely
LLM_ROUTER_FALLBACK_MODEL=meta-llama/Llama-3.3-70B-Instruct
# Selection timeout (milliseconds)
LLM_ROUTER_ARCH_TIMEOUT_MS=10000
```
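The fallback decision can be sketched as a small pure function, assuming the settings above (`pickModel` and `RouteMap` are illustrative names, not Chat UI's actual code):

```typescript
// Values from LLM_ROUTER_OTHER_ROUTE and LLM_ROUTER_FALLBACK_MODEL above.
const OTHER_ROUTE = "casual_conversation";
const FALLBACK_MODEL = "meta-llama/Llama-3.3-70B-Instruct";

type RouteMap = Record<string, string>; // route name -> primary model

// archResult is null when Arch timed out or errored entirely.
function pickModel(archResult: string | null, routeMap: RouteMap): string {
  if (archResult === null) return FALLBACK_MODEL;
  // Arch declined to pick a specialized route: use the "other" route.
  const routeName = archResult === "other" ? OTHER_ROUTE : archResult;
  return routeMap[routeName] ?? FALLBACK_MODEL;
}
```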
## Multimodal Routing
When a user sends an image, the router can bypass Arch and route directly to a vision model:
```ini
LLM_ROUTER_ENABLE_MULTIMODAL=true
LLM_ROUTER_MULTIMODAL_MODEL=meta-llama/Llama-3.2-90B-Vision-Instruct
```
## Tools Routing
When a user has MCP servers enabled, the router can automatically select a tools-capable model:
```ini
LLM_ROUTER_ENABLE_TOOLS=true
LLM_ROUTER_TOOLS_MODEL=meta-llama/Llama-3.3-70B-Instruct
```
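The multimodal and tools bypasses can be sketched together as a single pre-Arch check. This is illustrative only: the `bypassModel` helper is hypothetical, and the priority order between images and tools is an assumption, not documented behavior.

```typescript
// Hypothetical sketch of the pre-Arch bypass checks described above.
interface RequestInfo {
  hasImage: boolean;     // user attached an image
  toolsEnabled: boolean; // user has MCP servers enabled
}

// Values from LLM_ROUTER_MULTIMODAL_MODEL and LLM_ROUTER_TOOLS_MODEL above.
const MULTIMODAL_MODEL = "meta-llama/Llama-3.2-90B-Vision-Instruct";
const TOOLS_MODEL = "meta-llama/Llama-3.3-70B-Instruct";

// Returns a model to route to directly, or null to defer to Arch.
// Assumption: images take priority over tools when both are present.
function bypassModel(req: RequestInfo): string | null {
  if (req.hasImage) return MULTIMODAL_MODEL;
  if (req.toolsEnabled) return TOOLS_MODEL;
  return null;
}
```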
## UI Customization
Customize how the router appears in the model selector:
```ini
PUBLIC_LLM_ROUTER_ALIAS_ID=omni
PUBLIC_LLM_ROUTER_DISPLAY_NAME=Omni
PUBLIC_LLM_ROUTER_LOGO_URL=https://example.com/logo.png
```
## How It Works
When a user selects Omni:
1. Chat UI sends the conversation context to the Arch router
2. Arch analyzes the content and returns a route name
3. Chat UI maps the route to the corresponding model
4. The request streams from the selected model
5. On errors, fallback models are tried in order
The route selection is displayed in the UI so users can see which model was chosen.
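The steps above can be sketched end-to-end, with the Arch call and the streaming step injected so the flow stays testable. All names here are illustrative, not Chat UI internals.

```typescript
// Hypothetical sketch of the routing flow described in steps 1-5 above.
type ArchCall = (context: string) => Promise<string>; // returns a route name

async function routeAndStream(
  context: string,
  selectRoute: ArchCall,
  routeModels: Record<string, string[]>, // route -> [primary, ...fallbacks]
  stream: (model: string) => Promise<boolean>, // true on success
): Promise<string | null> {
  const routeName = await selectRoute(context);    // steps 1-2
  const candidates = routeModels[routeName] ?? []; // step 3
  for (const model of candidates) {                // steps 4-5
    // On success, return the chosen model so the UI can display it.
    if (await stream(model)) return model;
  }
  return null; // every candidate failed
}
```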
## Message Length Limits
To keep route selection fast, message content is trimmed before it is sent to Arch:
```ini
# Max characters for assistant messages (default: 500)
LLM_ROUTER_MAX_ASSISTANT_LENGTH=500
# Max characters for previous user messages (default: 400)
LLM_ROUTER_MAX_PREV_USER_LENGTH=400
```
The latest user message is never trimmed.
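These trimming rules can be sketched as follows: older messages are truncated to the configured limits, while the latest user message passes through untouched (`trimForArch` is a hypothetical helper, not Chat UI's actual code):

```typescript
interface Msg {
  role: "user" | "assistant";
  content: string;
}

// Values from the LLM_ROUTER_MAX_* settings above.
const MAX_ASSISTANT = 500;
const MAX_PREV_USER = 400;

// Hypothetical sketch of the trimming applied before calling Arch.
function trimForArch(messages: Msg[]): Msg[] {
  const lastUserIdx = messages.map((m) => m.role).lastIndexOf("user");
  return messages.map((m, i) => {
    if (i === lastUserIdx) return m; // latest user message is never trimmed
    const limit = m.role === "assistant" ? MAX_ASSISTANT : MAX_PREV_USER;
    return { ...m, content: m.content.slice(0, limit) };
  });
}
```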