Alikestocode committed · Commit 4706b45 · 1 Parent(s): 9af4b77

Update README: Focus on CourseGPT-Pro router checkpoints

Files changed (1):
  1. README.md +5 -23
README.md CHANGED
@@ -39,9 +39,9 @@ A modern, user-friendly Gradio interface for **token-streaming, chat-style infer
  - **Dynamic system prompts** with automatic date insertion
 
  ### 🎯 Model Variety
- - **30+ LLM options** from leading providers (Qwen, Microsoft, Meta, Mistral, etc.)
- - Models ranging from **135M to 32B+** parameters
- - Specialized models for **reasoning, coding, and general chat**
+ - Purpose-built around **CourseGPT-Pro router checkpoints**
+ - Two curated options: **Router-Qwen3-32B (8-bit)** and **Router-Gemma3-27B (8-bit)**
+ - Both ship with the same JSON routing schema for math/code/general orchestration
  - **Efficient model loading** - one at a time with automatic cache clearing
 
  ### ⚙️ Advanced Controls
@@ -52,26 +52,8 @@ A modern, user-friendly Gradio interface for **token-streaming, chat-style infer
 
  ## 🔄 Supported Models
 
- ### Compact Models (< 2B)
- - **SmolLM2-135M-Instruct** - Tiny but capable
- - **SmolLM2-360M-Instruct** - Lightweight conversation
- - **Taiwan-ELM-270M/1.1B** - Multilingual support
- - **Qwen3-0.6B/1.7B** - Fast inference
-
- ### Mid-Size Models (2B-8B)
- - **Qwen3-4B/8B** - Balanced performance
- - **Phi-4-mini** (4.3B) - Reasoning & Instruct variants
- - **MiniCPM3-4B** - Efficient mid-size
- - **Gemma-3-4B-IT** - Instruction-tuned
- - **Llama-3.2-Taiwan-3B** - Regional optimization
- - **Mistral-7B-Instruct** - Classic performer
- - **DeepSeek-R1-Distill-Llama-8B** - Reasoning specialist
-
- ### Large Models (14B+)
- - **Qwen3-14B** - Strong general purpose
- - **Apriel-1.5-15b-Thinker** - Multimodal reasoning
- - **gpt-oss-20b** - Open GPT-style
- - **Qwen3-32B** - Top-tier performance
+ - **Router-Qwen3-32B-8bit** – Qwen3 32B base with CourseGPT-Pro routing LoRA merged and quantized for ZeroGPU. Best overall accuracy with modest latency.
+ - **Router-Gemma3-27B-8bit** – Gemma3 27B base with the same router head, also in 8-bit. Slightly faster warm-up with a Gemma inductive bias that sometimes helps math-first prompts.
 
  ## 🚀 How It Works
 
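The diff's new bullets mention a shared "JSON routing schema for math/code/general orchestration" but do not show it. As a minimal sketch of consuming such a router decision, assuming a simple illustrative shape (the field names `route` and `confidence` are hypothetical, not taken from the README):

```python
import json

def dispatch(raw_response: str) -> str:
    """Pick a downstream handler label from a router's JSON decision.

    Hypothetical schema: {"route": "math" | "code" | "general",
    "confidence": float}. Field names are illustrative assumptions;
    the actual CourseGPT-Pro schema is not shown in this diff.
    """
    try:
        decision = json.loads(raw_response)
    except json.JSONDecodeError:
        return "general"  # unparseable router output falls back to general
    route = decision.get("route", "general")
    if route not in {"math", "code", "general"}:
        route = "general"  # unexpected labels also fall back
    return route

print(dispatch('{"route": "math", "confidence": 0.92}'))  # math
```

Whatever the real schema looks like, a fallback branch like the one above is worth keeping, since quantized routers can occasionally emit malformed or off-schema JSON.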