# Architecture
This document describes the system architecture of prompt-prix, including module responsibilities, data flow, and key design decisions.
## System Overview
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Browser β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Gradio UI (ui.py) β”‚ β”‚
β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚
β”‚ β”‚ β”‚ Config Panelβ”‚ β”‚ Prompt Input β”‚ β”‚ Model Output Tabs β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β€’ Servers β”‚ β”‚ β€’ Single β”‚ β”‚ β€’ Tab 1..10 β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β€’ Models β”‚ β”‚ β€’ Batch β”‚ β”‚ β€’ Streaming display β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β€’ System β”‚ β”‚ β€’ Tools JSON β”‚ β”‚ β€’ Status colors β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ Prompt β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚
β”‚ β”‚ β”‚ localStorage: servers, models, temperature, etc. β”‚ β”‚ β”‚
β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Python Backend β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ handlers.py β”‚ β”‚
β”‚ β”‚ β€’ fetch_available_models() β†’ ServerPool.refresh_manifests() β”‚ β”‚
β”‚ β”‚ β€’ initialize_session() β†’ Create ComparisonSession β”‚ β”‚
β”‚ β”‚ β€’ send_single_prompt() β†’ Work-stealing dispatcher β”‚ β”‚
β”‚ β”‚ β€’ export_markdown/json() β†’ Report generation β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ core.py β”‚ β”‚ β”‚
β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚
β”‚ β”‚ β”‚ ServerPool │◄───┴───►│ ComparisonSession β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β€’ servers: dict β”‚ β”‚ β€’ state: SessionState β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β€’ refresh_manifest β”‚ β”‚ β€’ send_prompt_to_model β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β€’ acquire/release β”‚ β”‚ β€’ get_context_display β”‚ β”‚ β”‚
β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚
β”‚ β”‚ β”‚ stream_completion() / get_completion() β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β€’ Async HTTP streaming to LM Studio β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β€’ Yields text chunks or returns full response β”‚ β”‚ β”‚
β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ config.py β”‚ β”‚
β”‚ β”‚ Pydantic Models: ServerConfig, ModelContext, SessionState β”‚ β”‚
β”‚ β”‚ Constants: DEFAULT_TEMPERATURE, DEFAULT_MAX_TOKENS, etc. β”‚ β”‚
β”‚ β”‚ Environment: load_servers_from_env(), get_gradio_port() β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ LM Studio Servers β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Server 1 (e.g. 3090) β”‚ β”‚ Server 2 (e.g. 8000) β”‚ β”‚
β”‚ β”‚ β€’ GET /v1/models β”‚ β”‚ β€’ GET /v1/models β”‚ β”‚
β”‚ β”‚ β€’ POST /v1/chat/... β”‚ β”‚ β€’ POST /v1/chat/... β”‚ β”‚
β”‚ β”‚ └─ Model A β”‚ β”‚ └─ Model B, C β”‚ β”‚
β”‚ β”‚ └─ Model B β”‚ β”‚ β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
## Module Breakdown
### Directory Structure
```
prompt_prix/
β”œβ”€β”€ main.py # Entry point
β”œβ”€β”€ ui.py # Gradio UI definition
β”œβ”€β”€ handlers.py # Shared event handlers (fetch, stop)
β”œβ”€β”€ state.py # Global mutable state
β”œβ”€β”€ core.py # ServerPool, ComparisonSession, streaming
β”œβ”€β”€ config.py # Pydantic models, constants, env loading
β”œβ”€β”€ parsers.py # Input parsing utilities
β”œβ”€β”€ export.py # Report generation
β”œβ”€β”€ dispatcher.py # WorkStealingDispatcher for parallel execution
β”œβ”€β”€ battery.py # BatteryRunner, TestResult, BatteryRun
β”œβ”€β”€ tabs/
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ battery/
β”‚ β”‚ β”œβ”€β”€ __init__.py
β”‚ β”‚ └── handlers.py # Battery-specific handlers
β”‚ └── compare/
β”‚ β”œβ”€β”€ __init__.py
β”‚ └── handlers.py # Compare-specific handlers
β”œβ”€β”€ adapters/
β”‚ └── lmstudio.py # LMStudioAdapter
└── benchmarks/
β”œβ”€β”€ base.py # TestCase protocol
└── custom_json.py # CustomJSONLoader
```
### config.py - Configuration & Data Models
**Purpose**: Define all Pydantic models for type-safe configuration and state.
| Class | Purpose |
|-------|---------|
| `ServerConfig` | Single LM Studio server state (URL, available_models, is_busy) |
| `ModelConfig` | Model identity and display name |
| `Message` | Single message in a conversation (role, content); content may be plain text or multimodal |
| `ModelContext` | Complete conversation history for one model |
| `SessionState` | Full session: models, contexts, system_prompt, halted status |
**Message Multimodal Support**:
The `Message` model supports both text and multimodal content:
```python
# Text-only message
Message(role="user", content="Hello")
# Multimodal message (text + image)
Message(role="user", content=[
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
])
# Helper methods
msg.get_text() # Extract text content
msg.has_image() # Check if message contains an image
```
**Key Functions**:
- `load_servers_from_env()` - Read LM_STUDIO_SERVER_N environment variables
- `get_default_servers()` - Return env servers or placeholder defaults
- `get_gradio_port()` - Read GRADIO_PORT or default to 7860
- `get_fara_config()` - Read FARA_SERVER_URL and FARA_MODEL_ID for vision adapter
- `encode_image_to_data_url(path)` - Convert image file to base64 data URL
- `build_multimodal_content(text, image_path)` - Build OpenAI-format multimodal content
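A minimal sketch of the environment helpers described above (behavioral details such as ordering and validation are assumptions, not confirmed by the source):
```python
import os

def load_servers_from_env() -> list[str]:
    """Collect LM_STUDIO_SERVER_1, LM_STUDIO_SERVER_2, ... until the first gap."""
    servers: list[str] = []
    n = 1
    while url := os.environ.get(f"LM_STUDIO_SERVER_{n}"):
        servers.append(url.rstrip("/"))
        n += 1
    return servers

def get_gradio_port() -> int:
    """Read GRADIO_PORT, falling back to Gradio's default of 7860."""
    return int(os.environ.get("GRADIO_PORT", "7860"))
```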
### core.py - Server Pool & Session Management
**Purpose**: Core business logic for server management and model interactions.
#### ServerPool
Manages multiple LM Studio servers:
```python
class ServerPool:
    servers: dict[str, ServerConfig]   # URL -> config
    _locks: dict[str, asyncio.Lock]    # URL -> lock

    async def refresh_all_manifests(self): ...                        # GET /v1/models on all servers
    def find_available_server(self, model_id) -> Optional[str]: ...   # Idle server that has the model
    async def acquire_server(self, url): ...                          # Mark busy, acquire lock
    def release_server(self, url): ...                                # Mark available, release lock
```
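A hypothetical caller, run from inside an async handler, showing the acquire/release discipline the pool expects (constructor arguments and the model ID are illustrative):
```python
pool = ServerPool(servers=["http://192.168.1.10:1234", "http://192.168.1.11:1234"])
await pool.refresh_all_manifests()

url = pool.find_available_server("qwen2.5-7b-instruct")
if url is not None:
    await pool.acquire_server(url)
    try:
        ...  # e.g. stream or collect a completion from this server
    finally:
        pool.release_server(url)  # always free the server, even on error
```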
#### ComparisonSession
Manages a comparison session:
```python
class ComparisonSession:
    server_pool: ServerPool
    state: SessionState   # Contains models, contexts, config

    async def send_prompt_to_model(self, model_id, prompt, on_chunk=None): ...
    async def send_prompt_to_all(self, prompt, on_chunk=None): ...
    def get_context_display(self, model_id) -> str: ...
```
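Illustrative use from an async handler; it assumes a `ServerPool` named `pool` and a `SessionState` named `session_state` already exist, and the `on_chunk` callback signature shown here is an assumption:
```python
session = ComparisonSession(server_pool=pool, state=session_state)

def on_chunk(model_id: str, text: str) -> None:
    # In the UI this chunk would be appended to that model's output tab.
    print(f"[{model_id}] {text}", end="", flush=True)

await session.send_prompt_to_all("Summarize the attached log.", on_chunk=on_chunk)
print(session.get_context_display("qwen2.5-7b-instruct"))
```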
#### Streaming Functions
```python
async def stream_completion(
    server_url, model_id, messages, temperature, max_tokens,
    timeout_seconds, tools=None, seed=None, repeat_penalty=None
) -> AsyncGenerator[str, None]:
    """Yield text chunks as they arrive via SSE.

    Args:
        seed: Optional int for reproducible outputs (passed to the model API).
        repeat_penalty: Optional float penalizing repeated tokens (1.0 = off).
    """

async def get_completion(...) -> str:
    """Non-streaming variant; returns the full response."""
```
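A sketch of consuming the stream from inside an async handler. Parameter values are examples rather than project defaults, and messages are shown as plain dicts for brevity:
```python
chunks: list[str] = []
async for chunk in stream_completion(
    server_url="http://localhost:1234",
    model_id="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.7,
    max_tokens=1024,
    timeout_seconds=120,
):
    chunks.append(chunk)          # forward each chunk to the UI as it arrives
full_response = "".join(chunks)
```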
### handlers.py - Shared Event Handlers
**Purpose**: Shared async handlers used across multiple tabs.
| Handler | Purpose | Returns |
|---------|---------|---------|
| `fetch_available_models(servers_text)` | Query all servers for available models | `(status, gr.update(choices=[...]))` |
| `handle_stop()` | Signal cancellation via global state | `status` |
| `_init_pool_and_validate(servers_text, models)` | Initialize ServerPool and validate models | `(pool, error_message)` |
### tabs/battery/handlers.py - Battery Tab Handlers
**Purpose**: Handlers specific to the Battery (benchmark) tab.
| Handler | Trigger | Returns |
|---------|---------|---------|
| `validate_file(file_path)` | File upload | Validation status string |
| `get_test_ids(file_path)` | File upload | List of test IDs |
| `run_handler(file, models, servers, ...)` | "Run Battery" button | Generator yielding `(status, grid_df)` |
| `quick_prompt_handler(prompt, models, ...)` | "Run Prompt" button | Markdown results |
| `export_json()` | "Export JSON" button | `(status, preview)` |
| `export_csv()` | "Export CSV" button | `(status, preview)` |
| `get_cell_detail(model, test)` | Detail dropdown | Markdown detail |
| `refresh_grid(display_mode)` | Display mode change | Updated grid DataFrame |
### tabs/compare/handlers.py - Compare Tab Handlers
**Purpose**: Handlers specific to the Compare (interactive) tab.
| Handler | Trigger | Returns |
|---------|---------|---------|
| `initialize_session(servers, models, system_prompt, ...)` | Auto-init on send | `(status, *model_tabs)` |
| `send_single_prompt(prompt, tools_json, image_path, seed, repeat_penalty)` | "Send to All" button | Generator yielding `(status, tab_states, *model_outputs)` |
| `export_markdown()` | "Export Markdown" button | `(status, preview)` |
| `export_json()` | "Export JSON" button | `(status, preview)` |
| `launch_beyond_compare(model_a, model_b)` | "Open in Beyond Compare" button | `status` |
**Compare Tab Features**:
- **Image Attachment**: Upload images for vision models (encoded as base64 data URLs)
- **Seed Parameter**: Set a seed for reproducible outputs across models
- **Repeat Penalty**: Configurable penalty (1.0-2.0) to reduce repetitive token generation
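For orientation, this is roughly how those options surface in an OpenAI-compatible request body. The exact field name for the repeat penalty varies between servers, and everything below is illustrative:
```python
payload = {
    "model": "qwen2.5-7b-instruct",
    "stream": True,
    "seed": 42,               # reproducible sampling where the server supports it
    "repeat_penalty": 1.1,    # LM Studio / llama.cpp style field; other servers use different names
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
            ],
        }
    ],
}
```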
### dispatcher.py - Work-Stealing Dispatcher
**Purpose**: Parallel execution across multiple servers with work-stealing.
```python
class WorkStealingDispatcher:
"""Dispatches work items to servers using work-stealing pattern."""
async def dispatch(
self,
work_items: list[WorkItem],
execute_fn: Callable[[WorkItem, str], Coroutine],
on_progress: Optional[Callable[[str, str], None]] = None
) -> dict[str, Any]:
"""Execute work items in parallel across available servers."""
```
The dispatcher (sketched after this list):
1. Maintains a queue of work items (model + test case pairs)
2. Finds idle servers that can run each work item
3. Executes items in parallel across all available servers
4. Supports cooperative cancellation via `state.should_stop()`
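The sketch below condenses that loop. It assumes each `WorkItem` exposes a `model_id` attribute and omits the cancellation check; it is not the project's exact code:
```python
import asyncio

async def dispatch_sketch(pool, work_items, execute_fn):
    queue = list(work_items)
    active: dict[asyncio.Task, tuple] = {}
    results: dict[str, object] = {}
    while queue or active:
        # Hand each idle server a queued item it can serve ("steal" work).
        for item in list(queue):
            url = pool.find_available_server(item.model_id)
            if url is not None:
                await pool.acquire_server(url)
                queue.remove(item)
                task = asyncio.create_task(execute_fn(item, url))
                active[task] = (item, url)
        # Reap finished tasks and release their servers for the next item.
        for task in [t for t in active if t.done()]:
            item, url = active.pop(task)
            results[item.model_id] = task.result()
            pool.release_server(url)
        await asyncio.sleep(0.1)
    return results
```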
### ui.py - Gradio UI Definition
**Purpose**: Define all Gradio components and wire up event bindings.
**Key Components**:
| Component | Type | Purpose |
|-----------|------|---------|
| `servers_input` | Textbox | LM Studio server URLs (one per line) |
| `models_checkboxes` | CheckboxGroup | Select models to compare |
| `system_prompt_input` | Textbox (50 lines) | Editable system prompt |
| `temperature_slider` | Slider | Model temperature (0-2) |
| `timeout_slider` | Slider | Request timeout (30-600s) |
| `max_tokens_slider` | Slider | Max tokens (256-8192) |
| `seed_input` | Number | Optional seed for reproducible outputs |
| `repeat_penalty_slider` | Slider | Repeat penalty (1.0-2.0, default 1.1) |
| `prompt_input` | Textbox | User prompt entry |
| `image_input` | Image | Optional image attachment for vision models |
| `tools_input` | Code (JSON) | Tools for function calling |
| `model_outputs[0..9]` | Markdown | Model response tabs |
| `tab_states` | JSON (hidden) | Tab status for color updates |
**Event Bindings**:
- Buttons trigger async handlers
- `tab_states.change` triggers JavaScript for inline style updates
- `app.load` restores state from localStorage
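A representative binding using component names from the table above; the actual `inputs`/`outputs` lists in ui.py may differ, and `send_button`/`status_output` are assumed names:
```python
send_button.click(
    fn=send_single_prompt,
    inputs=[prompt_input, tools_input, image_input, seed_input, repeat_penalty_slider],
    outputs=[status_output, tab_states, *model_outputs],
)
```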
### state.py - Global State
**Purpose**: Holds mutable state shared across handlers.
```python
server_pool: Optional[ServerPool] = None
session: Optional[ComparisonSession] = None
```
**Design Decision**: Kept in its own module to avoid circular imports between ui.py and handlers.py.
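A minimal sketch of the intended import direction (not the actual handler bodies):
```python
from prompt_prix import state  # handlers read shared objects here, never from ui.py

async def handle_stop() -> str:
    if state.session is None:
        return "No active session"
    # ... signal cancellation on the shared session/pool objects ...
    return "Stop requested"
```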
### parsers.py - Text Parsing Utilities
**Purpose**: Parse user input from UI components.
| Function | Input | Output |
|----------|-------|--------|
| `parse_models_input(text)` | "model1\nmodel2" | `["model1", "model2"]` |
| `parse_servers_input(text)` | "http://...\nhttp://..." | `["http://...", "http://..."]` |
| `parse_prompts_file(content)` | File content | List of prompts |
| `load_system_prompt(file_path)` | Optional file path | System prompt string |
| `get_default_system_prompt()` | - | Default prompt from file or constant |
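These helpers are thin; server parsing, for example, is essentially the following (exact normalization is an assumption):
```python
def parse_servers_input(text: str) -> list[str]:
    """Split newline-separated server URLs, dropping blanks and trailing slashes."""
    return [line.strip().rstrip("/") for line in text.splitlines() if line.strip()]
```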
### export.py - Report Generation
**Purpose**: Generate exportable reports from session state.
```python
def generate_markdown_report(state: SessionState) -> str:
"""Create Markdown with header, system prompt, and all model conversations."""
def generate_json_report(state: SessionState) -> str:
"""Create structured JSON with configuration and conversations."""
def save_report(content: str, filepath: str):
"""Write report to file."""
```
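Hypothetical export flow from a handler (the output path is illustrative):
```python
markdown = generate_markdown_report(state.session.state)
save_report(markdown, "reports/comparison.md")
```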
### main.py - Entry Point
**Purpose**: Application entry point and backwards-compatibility exports.
```python
def run():
    app = create_app()
    app.launch(server_name="0.0.0.0", server_port=get_gradio_port())
```
## Data Flow: Sending a Prompt
```
1. User types prompt, clicks "Send Prompt"
β”‚
β–Ό
2. ui.py: send_button.click(fn=send_single_prompt, inputs=[prompt, tools])
β”‚
β–Ό
3. handlers.py: send_single_prompt(prompt, tools_json)
β”‚ - Validate session exists
β”‚ - Parse tools JSON
β”‚ - Add user message to all model contexts
β”‚ - Refresh server manifests
β”‚
β–Ό
4. Work-Stealing Dispatcher Loop:
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ β”‚ For each idle server: β”‚
β”‚ β”‚ Find model in queue this server has β”‚
β”‚ β”‚ If found: start async task β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ β”‚ await asyncio.sleep(0.1)
β”‚ β”‚ yield (status, tab_states, *outputs) ──────► UI updates
β”‚ β”‚ Clean up completed tasks
β”‚ └─────────── while queue or active_tasks
β”‚
β–Ό
5. Each async task: run_model_on_server(model_id, server_url)
β”‚ - Mark model as "streaming"
β”‚ - Call stream_completion() ───────────────────► LM Studio API
β”‚ - Accumulate chunks in streaming_responses[model_id]
β”‚ - On complete: add assistant message to context
β”‚ - Release server
β”‚
β–Ό
6. Final yield: ("βœ… All responses complete", final_states, *final_outputs)
```
## State Management
### Session State (Python)
```python
class SessionState(BaseModel):
    models: list[str]                   # Selected models
    contexts: dict[str, ModelContext]   # model_id -> conversation
    system_prompt: str
    temperature: float
    timeout_seconds: int
    max_tokens: int
    halted: bool                        # True if any model failed
    halt_reason: Optional[str]
```
### UI State (Browser localStorage)
| Key | Type | Purpose |
|-----|------|---------|
| `promptprix_servers` | string | Server URLs (newline-separated) |
| `promptprix_model_choices` | JSON array | Available models from last fetch |
| `promptprix_models` | JSON array | Selected models |
| `promptprix_temperature` | float | Temperature setting |
| `promptprix_timeout` | int | Timeout setting |
| `promptprix_max_tokens` | int | Max tokens setting |
| `promptprix_tools` | string | Tools JSON |
| `promptprix_system_prompt` | string | System prompt text |
**Persistence**: Values are saved only when the user clicks the "Save State" button (explicit save).
## Tab Status Visualization
Tab colors indicate model status during streaming:
| Status | Color | Border |
|--------|-------|--------|
| `pending` | Red gradient (#fee2e2 β†’ #fecaca) | 4px solid #ef4444 |
| `streaming` | Yellow gradient (#fef3c7 β†’ #fde68a) | 4px solid #f59e0b |
| `completed` | Green gradient (#d1fae5 β†’ #a7f3d0) | 4px solid #10b981 |
**Implementation**: Uses inline JavaScript styles (`element.style`) to override the Gradio theme CSS.
## Error Handling
### Fail-Fast Validation
1. `initialize_session` validates:
- Servers are configured
- Models are configured
- All selected models exist on at least one server
2. `send_single_prompt` validates:
- Session is initialized
- Session is not halted
- Prompt is not empty
- Tools JSON is valid (if provided)
### Halt-on-Error
If any model fails during `send_prompt_to_all`:
- `state.halted = True`
- `state.halt_reason = "Model {model_id} failed: {error}"`
- Subsequent prompts are rejected
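In sketch form, with attribute names following `SessionState` above and the surrounding loop omitted:
```python
try:
    await session.send_prompt_to_model(model_id, prompt)
except Exception as exc:            # e.g. LMStudioError, timeouts
    session.state.halted = True
    session.state.halt_reason = f"Model {model_id} failed: {exc}"
```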
### Human-Readable Errors
The `LMStudioError` exception extracts error messages from LM Studio's JSON responses:
```python
{"error": {"message": "Model not loaded"}} β†’ "Model not loaded"
```
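A sketch of that extraction; the real parsing may be stricter about the response shape:
```python
import json

def extract_error_message(body: str) -> str:
    try:
        return json.loads(body)["error"]["message"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return body  # fall back to the raw body if it is not the expected shape
```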
## Integration Points
### Upstream: Benchmark Sources
prompt-prix can consume test cases from established benchmark ecosystems:
| Source | Format | Usage |
|--------|--------|-------|
| **promptfoo** | YAML with assertions | Full eval format with pass/fail criteria |
| **Inspect AI** | Python test definitions | Export prompts, import as JSON |
| **Custom JSON** | OpenAI-compatible messages | Direct load in prompt-prix |
See [ADR-001](adr/completed/001-use-existing-benchmarks.md) for rationale.
### API Layer: OpenAI-Compatible
All inference servers must expose OpenAI-compatible endpoints:
```
GET /v1/models β†’ List available models
POST /v1/chat/completions β†’ Chat completion (streaming)
```
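For example, any compliant server can be probed with a few lines of httpx (illustrative, not project code):
```python
import httpx

async def list_models(base_url: str) -> list[str]:
    """Return model IDs advertised by an OpenAI-compatible server."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{base_url}/v1/models")
        resp.raise_for_status()
        return [m["id"] for m in resp.json()["data"]]
```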
Supported servers:
- LM Studio (native)
- Ollama (OpenAI mode)
- vLLM
- llama.cpp server
- Any OpenAI-compatible proxy
See [ADR-003](adr/completed/003-openai-compatible-api.md) for rationale.
## Fan-Out Dispatcher Pattern
The core abstraction is **fan-out**: one prompt dispatched to N models in parallel.
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Fan-Out Dispatcher                      β”‚
β”‚                                                              β”‚
β”‚   Input: (prompt, [model_a, model_b, model_c])               β”‚
β”‚                      β”‚                                       β”‚
β”‚        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                       β”‚
β”‚        β–Ό             β–Ό             β–Ό                         β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”‚
β”‚   β”‚ Model A β”‚   β”‚ Model B β”‚   β”‚ Model C β”‚                    β”‚
β”‚   β”‚ Server1 β”‚   β”‚ Server1 β”‚   β”‚ Server2 β”‚                    β”‚
β”‚   β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜                    β”‚
β”‚        β”‚             β”‚             β”‚                         β”‚
β”‚        β–Ό             β–Ό             β–Ό                         β”‚
β”‚   Response A    Response B    Response C                     β”‚
β”‚                                                              β”‚
β”‚  Output: {model_a: resp_a, model_b: resp_b, model_c: resp_c} β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
### Work-Stealing Implementation
The dispatcher uses work-stealing for GPU efficiency:
1. **Queue**: All models to process
2. **Acquire**: Find idle server that has queued model
3. **Execute**: Stream response, update UI
4. **Release**: Server becomes available for next model
This maximizes utilization when models are distributed across multiple GPUs.
See [ADR-002](adr/completed/002-fan-out-pattern-as-core.md) for rationale.
## Architecture Decision Records
| ADR | Decision |
|-----|----------|
| [001](adr/completed/001-use-existing-benchmarks.md) | Use existing benchmarks (promptfoo, Inspect AI) instead of custom eval schema |
| [002](adr/completed/002-fan-out-pattern-as-core.md) | Fan-out pattern as core architectural abstraction |
| [003](adr/completed/003-openai-compatible-api.md) | OpenAI-compatible API as sole integration layer |