bradnow commited on
Commit
ca258aa
Β·
1 Parent(s): 59a74c6

add claude

Browse files
Files changed (1) hide show
  1. CLAUDE.md +57 -0
CLAUDE.md ADDED
@@ -0,0 +1,57 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4
+
5
+ ## What This Is
6
+
7
+ A Gradio-based chat interface for ServiceNow-AI's Apriel reasoning models, deployed as a HuggingFace Space. Users chat with vLLM-hosted models via an OpenAI-compatible API, with streaming responses and multimodal (text + image) support.
8
+
9
+ ## Running Locally
10
+
11
+ ```bash
12
+ # Install dependencies
13
+ pip install -r requirements.txt
14
+
15
+ # Run with hot reload (needs env vars β€” see below)
16
+ python gradio_runner.py app.py
17
+
18
+ # Or run directly
19
+ python app.py
20
+ ```
21
+
22
+ The Makefile target `make runAppReloading` bundles env vars and launches with hot reload, but contains hardcoded tokens β€” use it only as a reference for which env vars are needed.
23
+
24
+ ## Required Environment Variables
25
+
26
+ - `AUTH_TOKEN` β€” vLLM API auth token
27
+ - `HF_TOKEN` β€” HuggingFace token (for chat logging dataset)
28
+ - `VLLM_API_URL_APRIEL_1_6_15B` β€” single vLLM endpoint
29
+ - `VLLM_API_URL_LIST_APRIEL_1_6_15B` β€” comma-separated endpoints for load balancing
30
+ - `MODEL_NAME_APRIEL_1_6_15B` β€” model name on vLLM server
31
+ - `DEBUG_MODE` β€” "True"/"False" for verbose logging
32
+ - `APRIEL_PROMPT_DATASET` β€” HF dataset repo for chat logging
33
+
34
+ ## Architecture
35
+
36
+ **app.py** β€” Main Gradio app (UI layout, streaming inference, session state). `run_chat_inference()` is the core generator that streams chat completions, handles reasoning tag splitting (`[BEGIN FINAL RESPONSE]`), and supports multimodal input (up to 5 images converted to base64).
37
+
38
+ **utils.py** β€” Model configuration registry (`models_config` dict) and logging helpers. Each model entry defines: HF URL, API name, vLLM endpoints, auth token, reasoning/multimodal flags, temperature, and output tags. Add new models here.
39
+
40
+ **log_chat.py** β€” Async queue-based chat logger. Writes to local `train.csv` and syncs to a HuggingFace Hub dataset. Uses a daemon thread to avoid blocking the UI. Has a `test_log_chat()` function for manual testing.
41
+
42
+ **theme.py** β€” Custom Gradio theme (Apriel) extending Soft theme with custom colors and fonts.
43
+
44
+ **styles.css** β€” Responsive CSS with dark mode support. Chat height uses CSS calc with breakpoints at 1280px, 1024px, 400px.
45
+
46
+ **timer.py** β€” Simple step-based timing utility for performance profiling.
47
+
48
+ ## HuggingFace Space Deployment
49
+
50
+ The Space is configured via YAML frontmatter in `README.md` (sdk, sdk_version, app_file). The `sdk_version` must match the gradio version in `requirements.txt` β€” mismatches cause build failures.
51
+
52
+ ## Key Patterns
53
+
54
+ - **Endpoint rotation**: `setup_model()` round-robins across vLLM endpoints from the comma-separated env var list
55
+ - **Session state**: A global `session_state` dict tracks streaming status, stop flags, chat/session IDs, and opt-out preference
56
+ - **Reasoning models**: Responses are split on `[BEGIN FINAL RESPONSE]` tag β€” content before is "thought", content after is the visible response
57
+ - **Concurrency**: Gradio queue with `default_concurrency_limit=4`