Alikestocode committed · Commit 4706b45 · 1 Parent(s): 9af4b77

Update README: Focus on CourseGPT-Pro router checkpoints

Files changed (1):
  1. README.md +5 -23
README.md CHANGED
@@ -39,9 +39,9 @@ A modern, user-friendly Gradio interface for **token-streaming, chat-style infer
  - **Dynamic system prompts** with automatic date insertion
 
  ### 🎯 Model Variety
- - **30+ LLM options** from leading providers (Qwen, Microsoft, Meta, Mistral, etc.)
- - Models ranging from **135M to 32B+** parameters
- - Specialized models for **reasoning, coding, and general chat**
+ - Purpose-built around **CourseGPT-Pro router checkpoints**
+ - Two curated options: **Router-Qwen3-32B (8-bit)** and **Router-Gemma3-27B (8-bit)**
+ - Both ship with the same JSON routing schema for math/code/general orchestration
  - **Efficient model loading** - one at a time with automatic cache clearing
 
  ### ⚙️ Advanced Controls
@@ -52,26 +52,8 @@ A modern, user-friendly Gradio interface for **token-streaming, chat-style infer
 
  ## 🔄 Supported Models
 
- ### Compact Models (< 2B)
- - **SmolLM2-135M-Instruct** - Tiny but capable
- - **SmolLM2-360M-Instruct** - Lightweight conversation
- - **Taiwan-ELM-270M/1.1B** - Multilingual support
- - **Qwen3-0.6B/1.7B** - Fast inference
-
- ### Mid-Size Models (2B-8B)
- - **Qwen3-4B/8B** - Balanced performance
- - **Phi-4-mini** (4.3B) - Reasoning & Instruct variants
- - **MiniCPM3-4B** - Efficient mid-size
- - **Gemma-3-4B-IT** - Instruction-tuned
- - **Llama-3.2-Taiwan-3B** - Regional optimization
- - **Mistral-7B-Instruct** - Classic performer
- - **DeepSeek-R1-Distill-Llama-8B** - Reasoning specialist
-
- ### Large Models (14B+)
- - **Qwen3-14B** - Strong general purpose
- - **Apriel-1.5-15b-Thinker** - Multimodal reasoning
- - **gpt-oss-20b** - Open GPT-style
- - **Qwen3-32B** - Top-tier performance
+ - **Router-Qwen3-32B-8bit** – Qwen3 32B base with CourseGPT-Pro routing LoRA merged and quantized for ZeroGPU. Best overall accuracy with modest latency.
+ - **Router-Gemma3-27B-8bit** – Gemma3 27B base with the same router head, also in 8-bit. Slightly faster warm-up with a Gemma inductive bias that sometimes helps math-first prompts.
 
  ## 🚀 How It Works
 
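The diff's new bullets mention a shared "JSON routing schema for math/code/general orchestration" but do not show it. As a minimal sketch of consuming such a router decision, assuming a simple illustrative shape (the field names `route` and `confidence` are hypothetical, not taken from the README):

```python
import json

def dispatch(raw_response: str) -> str:
    """Pick a downstream handler label from a router's JSON decision.

    Hypothetical schema: {"route": "math" | "code" | "general",
    "confidence": float}. Field names are illustrative assumptions;
    the actual CourseGPT-Pro schema is not shown in this diff.
    """
    try:
        decision = json.loads(raw_response)
    except json.JSONDecodeError:
        return "general"  # unparseable router output falls back to general
    route = decision.get("route", "general")
    if route not in {"math", "code", "general"}:
        route = "general"  # unexpected labels also fall back
    return route

print(dispatch('{"route": "math", "confidence": 0.92}'))  # math
```

Whatever the real schema looks like, a fallback branch like the one above is worth keeping, since quantized routers can occasionally emit malformed or off-schema JSON.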