feat: Added OpenRouter backend

Files changed:
- .env.example (+20 -4)
- README.md (+57 -111)
- backend/api/conversation_service.py (+23 -2)
- backend/core/conversation_manager.py (+12 -5)
- backend/core/llm_client.py (+72 -4)
- config/settings.py (+6 -0)
- docs/development.md (+4 -2)
- frontend/gradio_app.py (+3 -3)
.env.example
CHANGED

@@ -2,11 +2,27 @@
 API_HOST=0.0.0.0
 API_PORT=8000

-# …
-…
-…
-…
+# If you want to switch from hosted to local, comment out
+# the hosted lines and uncomment the local ones
+
+# Hosted LLM backend configuration
+LLM_BACKEND=openrouter
+LLM_HOST=https://openrouter.ai/api/v1
+LLM_MODEL={choose_a_free_one_from_openrouter}
+LLM_API_KEY={your_api_key}
+
+# Local LLM backend configuration
+# LLM_BACKEND=ollama
+# LLM_HOST=http://localhost:11434
+# LLM_MODEL=llama3.2:latest
+
 LLM_TIMEOUT=120
+LLM_MAX_RETRIES=3
+LLM_RETRY_DELAY=1.0
+
+# Optional attribution headers (used only when LLM_BACKEND=openrouter)
+LLM_SITE_URL=http://localhost:7860
+LLM_APP_NAME=AI_Survey_Simulator

 # Frontend configuration
 FRONTEND_BACKEND_BASE_URL=http://localhost:8000
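Of the new keys, only `LLM_API_KEY` is strictly needed, and only when `LLM_BACKEND=openrouter`. A minimal startup check along these lines (a sketch, not part of the commit; it assumes `python-dotenv` is installed, and the set of valid backend names comes from the factory in `llm_client.py` below) turns a misconfigured `.env` into an immediate, readable failure:

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is available


def check_llm_env() -> None:
    """Fail fast when .env selects a backend it cannot use."""
    load_dotenv()
    backend = os.getenv("LLM_BACKEND", "ollama").lower()

    if backend in ("openrouter", "open_router") and not os.getenv("LLM_API_KEY"):
        raise RuntimeError("LLM_BACKEND=openrouter requires LLM_API_KEY")
    if backend not in ("ollama", "vllm", "openrouter", "open_router"):
        raise RuntimeError(f"Unknown LLM_BACKEND: {backend}")


if __name__ == "__main__":
    check_llm_env()
    print("LLM environment looks consistent.")
```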
README.md
CHANGED

@@ -1,156 +1,102 @@
-# AI Survey Simulator
-
-…
-If you are looking for architecture deep dives or change history, head to `docs/` where all developer-facing material now lives.
-
----
-
-## What You Get
-
-- A Gradio web interface to monitor and control AI-to-AI healthcare survey conversations
-- A FastAPI backend that orchestrates personas, manages the conversation state, and serves WebSocket updates
-- Out-of-the-box personas for a surveyor and multiple patient profiles stored in `data/`
-
----
-
-## …
-
-…
-- Pull a model: `ollama pull llama3.2:latest`
-- Verify: `ollama list`
-
-> ℹ️ We are actively planning support for hosted LLM providers. When that lands you will be able to configure the app via environment variables instead of relying on a local Ollama instance.
-
----
-
-## 1. …
-
-…
-
----
-
-## 2. …
-
-```bash
-git clone <repository-url>
-cd ConversationAI
-
-# (optional but recommended)
-python -m venv .venv
-source .venv/bin/activate  # Windows: .venv\Scripts\activate
-pip install -r requirements.txt
-```
-
----
-
-## 3. …
-
-```bash
-cd backend
-uvicorn api.main:app --reload --host 0.0.0.0 --port 8000
-```
-
-Keep this terminal running. The backend exposes REST endpoints under `http://localhost:8000` and a WebSocket endpoint the UI listens to.
-
----
-
-## 4. Launch the Gradio Frontend
-
-In a new terminal (activate the virtual environment again if you created one):
-
-```bash
-cd frontend
-python gradio_app.py
-```
-
-## 5. Run a Conversation
-
-1. Click **"Start Conversation"** – the app will connect to the backend automatically and begin the AI interview flow. Messages appear in the "Live AI Conversation" panel.
-2. Watch the conversation update automatically (the UI polls once per second).
-3. Click **"Stop Conversation"** when you are done.
-
-If the backend or Ollama becomes unreachable, the status box will show an error message so you know where to look first.
-
----
-
-## …
-
-- Surveyor profiles live in `data/surveyor_personas.yaml`
-- Patient profiles live in `data/patient_personas.yaml`
-
-…
-2. Restart the backend so it reloads the definitions
-
-We are working on UI controls to swap personas without editing files – stay tuned.
-
----
-
-## …
-
-- Change the log verbosity by setting `LOG_LEVEL=DEBUG` in `.env`
-- Point to a different LLM host/model using the `LLM_*` variables
-- When we introduce hosted-model support, the same `.env` file will control which backend is used without code edits
-
-Prefer a single command? Run:
-
-```bash
-./run_local.sh
-```
-
-The script will:
-
-- Load environment variables from `.env`
-- Start `ollama serve` (if it is not already running)
-- Launch the FastAPI backend and Gradio frontend in the background
-
-Press `Ctrl+C` in that terminal to shut everything down cleanly.
-If you want to watch live logs, run the backend/frontend commands manually in separate terminals instead of using this helper.
-
----
-
-## …
-
----
-
-## …
-
-For deeper implementation notes, visit the developer docs:
-
-- `docs/overview.md`
-- `docs/development.md`
-- `docs/roadmap.md`
-
----

+# AI Survey Simulator – Quick Start
+
+Minimal instructions for running the simulator either with a local Ollama model or with a hosted OpenRouter model.
+
+---
+
+## Requirements
+
+- Python 3.9+
+- Pip
+- Optional for local mode: [Ollama](https://ollama.ai) with a pulled model (e.g., `ollama pull llama3.2:latest`)
+- Optional for hosted mode: OpenRouter account + API key
+
+---
+
+## 1. Create `.env`
+
+Copy `.env.example` to `.env` and choose one of the following blocks.
+
+**Local (Ollama)**
+```
+LLM_BACKEND=ollama
+LLM_HOST=http://localhost:11434
+LLM_MODEL=llama3.2:latest
+```
+
+**Hosted (OpenRouter)**
+```
+LLM_BACKEND=openrouter
+LLM_HOST=https://openrouter.ai/api/v1
+LLM_MODEL=anthropic/claude-3-haiku:beta  # pick any model
+LLM_API_KEY=sk-or-...
+LLM_SITE_URL=http://localhost:7860
+LLM_APP_NAME=AI_Survey_Simulator
+```
+
+Other environment values (ports, websocket URL, log level) are already set in `.env.example`.
+
+---
+
+## 2. Install Python Dependencies
+
+```
+python -m venv .venv        # optional
+source .venv/bin/activate   # Windows: .venv\Scripts\activate
+pip install -r requirements.txt
+```
+
+---
+
+## 3. Run the Stack
+
+### Option A – Single Command
+```
+./run_local.sh
+```
+Reads `.env`, starts Ollama if needed, launches FastAPI + Gradio, and keeps them running until `Ctrl+C`.
+
+### Option B – Manual Terminals
+1. *(Only if LLM_BACKEND=ollama)* `ollama serve`
+2. `cd backend && uvicorn api.main:app --host 0.0.0.0 --port 8000`
+3. `cd frontend && python gradio_app.py`
+
+Backend listens on `http://localhost:8000`, Gradio on `http://localhost:7860`.
+
+---
+
+## 4. Use the App
+
+1. Open the Gradio URL.
+2. Click **Start Conversation**. The UI auto-connects to the backend and refreshes once per second.
+3. Click **Stop Conversation** when finished.
+
+Any connection errors or LLM issues appear in the status panel.
+
+---
+
+## 5. Personas
+
+- Surveyor definitions: `data/surveyor_personas.yaml`
+- Patient definitions: `data/patient_personas.yaml`
+
+Edit the YAML, then restart the backend to apply changes.
+
+---
+
+## 6. Troubleshooting
+
+| Issue | Resolution |
+| --- | --- |
+| "Temporary failure in name resolution" (OpenRouter) | Launch the backend from an environment that can resolve `openrouter.ai`; ensure proxy/DNS settings match the working terminal. |
+| "All connection attempts failed" | Backend cannot reach the LLM. Verify `.env`, restart the backend, check console logs. |
+| "Model not found" (Ollama) | Pull the model with `ollama pull <model>` and restart the backend. |
+| UI stays empty | Backend not running or `.env` mismatch. Restart both processes. |
+
+---
+
+## 7. Reference Docs
+
+- `docs/overview.md` – architecture summary
+- `docs/development.md` – environment tips and backend switching
+- `docs/roadmap.md` – upcoming work
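The first troubleshooting row is easiest to reproduce outside the app. A quick standalone probe (a sketch; it assumes `httpx`, which matches the request style in `backend/core/llm_client.py`, and OpenRouter's public `GET /api/v1/models` listing endpoint) separates DNS problems from bad keys before the backend ever starts:

```python
import asyncio
import os

import httpx  # assumption: same async HTTP client style as llm_client.py


async def probe_openrouter() -> None:
    """Distinguish DNS failures from auth failures against OpenRouter."""
    headers = {"Authorization": f"Bearer {os.environ.get('LLM_API_KEY', '')}"}
    async with httpx.AsyncClient(timeout=10) as client:
        # A ConnectError here reproduces the "name resolution" row above;
        # a 401 status means the key, not the network, is the problem.
        response = await client.get(
            "https://openrouter.ai/api/v1/models", headers=headers
        )
        response.raise_for_status()
        print(f"Reachable: {len(response.json().get('data', []))} models listed")


if __name__ == "__main__":
    asyncio.run(probe_openrouter())
```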
backend/api/conversation_service.py
CHANGED

@@ -19,7 +19,7 @@ Example:
 import asyncio
 import logging
 from datetime import datetime
-from typing import Dict, Optional
+from typing import Dict, Optional, Any
 from dataclasses import dataclass
 from enum import Enum
 import sys

@@ -141,11 +141,15 @@ class ConversationService:
         await self._send_status_update(conversation_id, ConversationStatus.STARTING)

         # Create and start conversation manager
+        llm_parameters = self._build_llm_parameters()
+
         manager = ConversationManager(
             surveyor_persona=surveyor_persona,
             patient_persona=patient_persona,
             host=resolved_host,
-            model=resolved_model
+            model=resolved_model,
+            llm_backend=self.settings.llm.backend,
+            llm_parameters=llm_parameters
         )

         # Start conversation streaming task

@@ -302,6 +306,23 @@ class ConversationService:
         conv_info.status = ConversationStatus.COMPLETED
         await self._send_status_update(conversation_id, ConversationStatus.COMPLETED)

+    def _build_llm_parameters(self) -> Dict[str, Any]:
+        """Prepare keyword arguments for LLM client creation."""
+        params: Dict[str, Any] = {
+            "timeout": self.settings.llm.timeout,
+            "max_retries": self.settings.llm.max_retries,
+            "retry_delay": self.settings.llm.retry_delay,
+        }
+
+        if self.settings.llm.api_key:
+            params["api_key"] = self.settings.llm.api_key
+        if self.settings.llm.site_url:
+            params["site_url"] = self.settings.llm.site_url
+        if self.settings.llm.app_name:
+            params["app_name"] = self.settings.llm.app_name
+
+        return params
+
     async def _send_status_update(self, conversation_id: str, status: ConversationStatus):
         """Send conversation status update to clients.
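For the hosted `.env` shown earlier, `_build_llm_parameters` yields a dict like the one below (illustrative placeholder values). The three retry/timeout keys are always present; the OpenRouter keys appear only when set, so an Ollama configuration keeps passing exactly what `OllamaClient` expects:

```python
# Illustrative output of ConversationService._build_llm_parameters()
# with the hosted .env block from .env.example (values are placeholders).
llm_parameters = {
    "timeout": 120,
    "max_retries": 3,
    "retry_delay": 1.0,
    # Included only because the corresponding settings are non-empty:
    "api_key": "sk-or-...",
    "site_url": "http://localhost:7860",
    "app_name": "AI_Survey_Simulator",
}
```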
backend/core/conversation_manager.py
CHANGED

@@ -20,7 +20,7 @@ Example:
 """

 from enum import Enum
-from typing import AsyncGenerator, Dict, List, Optional
+from typing import AsyncGenerator, Dict, List, Optional, Any
 import asyncio
 from datetime import datetime
 import sys

@@ -29,7 +29,7 @@ from pathlib import Path
 # Add backend to path for imports
 sys.path.insert(0, str(Path(__file__).parent.parent))

-from core.llm_client import …
+from core.llm_client import create_llm_client
 from core.persona_system import PersonaSystem

@@ -62,7 +62,9 @@ class ConversationManager:
                  surveyor_persona: dict = None,
                  patient_persona: dict = None,
                  host: str = "http://localhost:11434",
-                 model: str = "llama3.2:latest"):
+                 model: str = "llama3.2:latest",
+                 llm_backend: str = "ollama",
+                 llm_parameters: Optional[Dict[str, Any]] = None):
         """Initialize conversation manager with personas.

         Args:

@@ -72,10 +74,15 @@ class ConversationManager:
             patient_persona: Pre-loaded patient persona dict
             host: Ollama server host
             model: LLM model to use
+            llm_backend: Which LLM backend implementation to use
+            llm_parameters: Additional keyword arguments for the LLM client
         """
         # Initialize systems
         self.persona_system = PersonaSystem()
-        …
+        client_kwargs = {"host": host, "model": model}
+        if llm_parameters:
+            client_kwargs.update(llm_parameters)
+        self.client = create_llm_client(llm_backend, **client_kwargs)

         # Load personas
         if surveyor_persona:

@@ -331,4 +338,4 @@ class ConversationManager:
     async def close(self):
         """Clean up resources."""
         if hasattr(self, 'client') and self.client:
-            await self.client.close()
+            await self.client.close()
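Taken with the defaults, the widened constructor stays backward compatible: callers that pass only `host` and `model` still get an Ollama client, while a hosted caller looks like this sketch (persona dicts and key handling are assumptions, not shown in the commit):

```python
from core.conversation_manager import ConversationManager


async def run_hosted(surveyor: dict, patient: dict, api_key: str) -> None:
    """Sketch: drive one conversation against OpenRouter, then clean up."""
    manager = ConversationManager(
        surveyor_persona=surveyor,
        patient_persona=patient,
        host="https://openrouter.ai/api/v1",
        model="anthropic/claude-3-haiku:beta",  # placeholder model ID
        llm_backend="openrouter",
        llm_parameters={"api_key": api_key, "timeout": 120},
    )
    try:
        ...  # conversation loop goes here
    finally:
        await manager.close()  # closes the underlying LLM client
```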
backend/core/llm_client.py
CHANGED

@@ -331,6 +331,64 @@ class VLLMClient(LLMClient):
             raise


+class OpenRouterClient(LLMClient):
+    """Client implementation for OpenRouter-hosted models."""
+
+    def __init__(self,
+                 host: str,
+                 model: str,
+                 api_key: str,
+                 site_url: Optional[str] = None,
+                 app_name: Optional[str] = None,
+                 **kwargs):
+        if not api_key:
+            raise ValueError("OpenRouterClient requires an API key")
+
+        super().__init__(host=host.rstrip("/"), model=model, **kwargs)
+        self.headers = {
+            "Authorization": f"Bearer {api_key}",
+            "Content-Type": "application/json",
+        }
+        if site_url:
+            self.headers["HTTP-Referer"] = site_url
+        if app_name:
+            self.headers["X-Title"] = app_name
+
+    async def generate(self,
+                       prompt: str,
+                       system_prompt: Optional[str] = None,
+                       **kwargs) -> str:
+        """Generate response using OpenRouter's Chat Completions API."""
+
+        messages = []
+        if system_prompt:
+            messages.append({"role": "system", "content": system_prompt})
+        messages.append({"role": "user", "content": prompt})
+
+        payload = {
+            "model": self.model,
+            "messages": messages,
+            "stream": False,
+            **kwargs
+        }
+
+        async def _make_request():
+            response = await self.client.post(
+                f"{self.host}/chat/completions",
+                json=payload,
+                headers=self.headers
+            )
+            response.raise_for_status()
+            data = response.json()
+
+            usage = data.get("usage", {})
+            self.total_tokens += usage.get("total_tokens", 0)
+
+            return data["choices"][0]["message"]["content"]
+
+        return await self._retry_request(_make_request)
+
+
 def create_llm_client(backend: str, **kwargs) -> LLMClient:
     """Factory function to create appropriate LLM client.

@@ -341,10 +399,14 @@ def create_llm_client(backend: str, **kwargs) -> LLMClient:
     Returns:
         Configured LLM client instance
     """
-    …
+    backend_name = backend.lower()
+
+    if backend_name == "ollama":
         return OllamaClient(**kwargs)
-    elif …
+    elif backend_name == "vllm":
         return VLLMClient(**kwargs)
+    elif backend_name in ("openrouter", "open_router"):
+        return OpenRouterClient(**kwargs)
     else:
         raise ValueError(f"Unknown LLM backend: {backend}")

@@ -370,7 +432,10 @@ def create_llm_client_from_config(config_path: Optional[str] = None,
         "model": "llama3.2:latest",
         "timeout": 120,
         "max_retries": 3,
-        "retry_delay": 1.0
+        "retry_delay": 1.0,
+        "api_key": None,
+        "site_url": None,
+        "app_name": None,
     }

     # Load from config file if provided

@@ -406,7 +471,10 @@ def create_llm_client_from_config(config_path: Optional[str] = None,
         "model": f"{env_prefix}MODEL",
         "timeout": f"{env_prefix}TIMEOUT",
         "max_retries": f"{env_prefix}MAX_RETRIES",
-        "retry_delay": f"{env_prefix}RETRY_DELAY"
+        "retry_delay": f"{env_prefix}RETRY_DELAY",
+        "api_key": f"{env_prefix}API_KEY",
+        "site_url": f"{env_prefix}SITE_URL",
+        "app_name": f"{env_prefix}APP_NAME",
     }

     for config_key, env_var in env_vars.items():
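End to end, the factory and the new client compose as below (a sketch; the model ID and key are placeholders, and, per the `generate` implementation above, any extra keyword arguments are merged straight into the request payload):

```python
import asyncio

from core.llm_client import create_llm_client


async def main() -> None:
    client = create_llm_client(
        "openrouter",                            # case-insensitive per the factory
        host="https://openrouter.ai/api/v1",
        model="anthropic/claude-3-haiku:beta",   # placeholder model ID
        api_key="sk-or-...",                     # placeholder key
        site_url="http://localhost:7860",        # optional HTTP-Referer header
        app_name="AI_Survey_Simulator",          # optional X-Title header
    )
    try:
        reply = await client.generate(
            "Summarise the survey goals in one sentence.",
            system_prompt="You are a concise assistant.",
            temperature=0.2,                     # forwarded into the JSON payload
        )
        print(reply)
    finally:
        await client.close()


if __name__ == "__main__":
    asyncio.run(main())
```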
config/settings.py
CHANGED

@@ -5,6 +5,7 @@ override them through a `.env` file or process environment variables.
 """

 from functools import lru_cache
+from typing import Optional
 from pydantic_settings import BaseSettings, SettingsConfigDict

@@ -30,6 +31,11 @@ class LLMSettings(BaseSettings):
     host: str = "http://localhost:11434"
     model: str = "llama3.2:latest"
     timeout: int = 120
+    max_retries: int = 3
+    retry_delay: float = 1.0
+    api_key: Optional[str] = None
+    site_url: Optional[str] = None
+    app_name: Optional[str] = None

     model_config = SettingsConfigDict(
         env_prefix="LLM_",
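Because `LLMSettings` keeps `env_prefix="LLM_"`, each new field maps one-to-one onto a key in `.env.example` (`LLM_MAX_RETRIES` → `max_retries`, and so on). A small check of that mapping, assuming pydantic-settings v2 as the `SettingsConfigDict` import indicates:

```python
import os

from config.settings import LLMSettings

# Simulate part of the hosted block from .env.example.
os.environ["LLM_MAX_RETRIES"] = "5"
os.environ["LLM_API_KEY"] = "sk-or-..."  # placeholder key

settings = LLMSettings()
assert settings.max_retries == 5       # "5" coerced to int by pydantic
assert settings.api_key == "sk-or-..."
assert settings.site_url is None       # unset optional fields default to None
```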
docs/development.md
CHANGED

@@ -15,7 +15,9 @@ pip install -r requirements.txt

 Key environment variables (see `.env.example`):

-- `…`
+- `LLM_BACKEND` – `ollama` (local default) or `openrouter`
+- `LLM_HOST` / `LLM_MODEL` – target endpoint & model ID
+- `LLM_API_KEY` – required when `LLM_BACKEND=openrouter`; `LLM_SITE_URL` and `LLM_APP_NAME` are optional attribution headers
 - `FRONTEND_BACKEND_BASE_URL` and `FRONTEND_WEBSOCKET_URL` – how the UI talks to FastAPI
 - `LOG_LEVEL` – INFO by default

@@ -25,7 +27,7 @@ Key environment variables (see `.env.example`):
 ```bash
 ./run_local.sh
 ```
-- Starts `ollama serve` (if not already running)
+- Starts `ollama serve` (if not already running) – this mode expects `LLM_BACKEND=ollama`
 - Launches FastAPI backend and Gradio frontend in the background
 - Press `Ctrl+C` to stop all three processes
frontend/gradio_app.py
CHANGED

@@ -205,7 +205,7 @@ def get_message_display() -> str:
     """Get formatted message display."""
     if not all_messages:
        if conversation_active:
-            return "Conversation started. AI responses will appear here...\n\
+            return "Conversation started. AI responses will appear here...\n\nUpdates arrive automatically every second."
         else:
             return "No messages yet. Click 'Start Conversation' to begin!"

@@ -309,9 +309,9 @@ with gr.Blocks(title="AI Survey Simulator v2") as app:
     gr.HTML("""
     <div style="margin-top: 15px; padding: 10px; background-color: #fff3cd; border-radius: 8px; font-size: 12px;">
         <strong>Requirements:</strong><br>
-        • Ollama server running<br>
         • FastAPI backend on port 8000<br>
-        • …<br>
+        • LLM backend reachable (local Ollama or OpenRouter via API key)<br>
+        • Update <code>.env</code> with the model/backend you plan to use
     </div>
     """)