Spaces:

ServiceNow-AI
/

Apriel-Chat

Running

App Files Files Community

bradnow commited on Apr 10

Commit

ca258aa

1 Parent(s): 59a74c6

add claude

Browse files

Files changed (1) hide show

CLAUDE.md +57 -0

CLAUDE.md ADDED Viewed

	@@ -0,0 +1,57 @@

+# CLAUDE.md
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+## What This Is
+A Gradio-based chat interface for ServiceNow-AI's Apriel reasoning models, deployed as a HuggingFace Space. Users chat with vLLM-hosted models via an OpenAI-compatible API, with streaming responses and multimodal (text + image) support.
+## Running Locally
+```bash
+# Install dependencies
+pip install -r requirements.txt
+# Run with hot reload (needs env vars — see below)
+python gradio_runner.py app.py
+# Or run directly
+python app.py
+```
+The Makefile target `make runAppReloading` bundles env vars and launches with hot reload, but contains hardcoded tokens — use it only as a reference for which env vars are needed.
+## Required Environment Variables
+- `AUTH_TOKEN` — vLLM API auth token
+- `HF_TOKEN` — HuggingFace token (for chat logging dataset)
+- `VLLM_API_URL_APRIEL_1_6_15B` — single vLLM endpoint
+- `VLLM_API_URL_LIST_APRIEL_1_6_15B` — comma-separated endpoints for load balancing
+- `MODEL_NAME_APRIEL_1_6_15B` — model name on vLLM server
+- `DEBUG_MODE` — "True"/"False" for verbose logging
+- `APRIEL_PROMPT_DATASET` — HF dataset repo for chat logging
+## Architecture
+**app.py** — Main Gradio app (UI layout, streaming inference, session state). `run_chat_inference()` is the core generator that streams chat completions, handles reasoning tag splitting (`[BEGIN FINAL RESPONSE]`), and supports multimodal input (up to 5 images converted to base64).
+**utils.py** — Model configuration registry (`models_config` dict) and logging helpers. Each model entry defines: HF URL, API name, vLLM endpoints, auth token, reasoning/multimodal flags, temperature, and output tags. Add new models here.
+**log_chat.py** — Async queue-based chat logger. Writes to local `train.csv` and syncs to a HuggingFace Hub dataset. Uses a daemon thread to avoid blocking the UI. Has a `test_log_chat()` function for manual testing.
+**theme.py** — Custom Gradio theme (Apriel) extending Soft theme with custom colors and fonts.
+**styles.css** — Responsive CSS with dark mode support. Chat height uses CSS calc with breakpoints at 1280px, 1024px, 400px.
+**timer.py** — Simple step-based timing utility for performance profiling.
+## HuggingFace Space Deployment
+The Space is configured via YAML frontmatter in `README.md` (sdk, sdk_version, app_file). The `sdk_version` must match the gradio version in `requirements.txt` — mismatches cause build failures.
+## Key Patterns
+- **Endpoint rotation**: `setup_model()` round-robins across vLLM endpoints from the comma-separated env var list
+- **Session state**: A global `session_state` dict tracks streaming status, stop flags, chat/session IDs, and opt-out preference
+- **Reasoning models**: Responses are split on `[BEGIN FINAL RESPONSE]` tag — content before is "thought", content after is the visible response
+- **Concurrency**: Gradio queue with `default_concurrency_limit=4`