anaspro committed
Commit 55612d9 · 1 Parent(s): d4f3bf5
update

Changed files:
- CHANGES.md +173 -0
- README.md +43 -6
- USAGE_GUIDE.md +213 -0
- app.py +115 -172
- requirements.txt +2 -5
CHANGES.md ADDED
@@ -0,0 +1,173 @@
# Chatbox2 - Qwen3-14B Update

## Summary of Changes

Your chatbox has been upgraded to **Qwen3-14B** with thinking/non-thinking mode capabilities!

## What Changed

### 1. **Model Upgrade**
- **Old Model**: `anaspro/Shako-iraqi-4B-it` (multimodal)
- **New Model**: `Qwen/Qwen3-14B` (text-only, with thinking capabilities)

### 2. **New Features**

#### **Thinking Mode Toggle** 🤔
You can now switch between two modes:

- **Thinking Mode ON** (default):
  - Best for: math problems, coding, complex reasoning
  - The model shows its reasoning process in `<think>...</think>` tags
  - Uses Temperature=0.6, TopP=0.95, TopK=20
  - More detailed and thorough responses

- **Thinking Mode OFF**:
  - Best for: general conversation, quick responses
  - Faster responses, with no reasoning shown
  - Uses Temperature=0.7, TopP=0.8, TopK=20
  - More efficient for casual chat

### 3. **Updated Parameters**
- Max tokens increased from 2048 to 32768 (matching Qwen3's capabilities)
- Generation parameters optimized per mode
- Multimodal support (images/videos) removed, as Qwen3-14B is text-only

### 4. **UI Improvements**
- Added a checkbox to toggle thinking mode
- Updated title and description
- New examples showcasing both modes

## How to Use

### Basic Usage
1. Type your message in the textbox
2. Adjust settings in the sidebar:
   - **System Prompt**: customize the AI's behavior (default: Iraqi dialect)
   - **Max New Tokens**: control response length (100-32768)
   - **Enable Thinking Mode**: toggle between thinking and non-thinking

### When to Use Thinking Mode

✅ **Enable Thinking Mode for:**
- Math problems
- Coding challenges
- Complex logical reasoning
- Step-by-step explanations
- Problem-solving tasks

❌ **Disable Thinking Mode for:**
- General conversation
- Quick questions
- Creative writing
- Casual chat
- When you need faster responses

### Advanced: Soft Switching with `/think` and `/no_think`

When the **Enable Thinking Mode** checkbox is ON, you can control thinking behavior per message using soft switches:

- Add `/think` to your message to **force thinking** for that specific turn
- Add `/no_think` to your message to **skip thinking** for that specific turn

**Important Notes:**
- Soft switches only work when the "Enable Thinking Mode" checkbox is checked (ON)
- When using `/no_think`, the model still outputs `<think>...</think>` tags, but they will be empty
- The model follows the most recent instruction in multi-turn conversations
- You can place the switch anywhere in your message (beginning or end)

**Examples:**

```
User: What is the capital of France? /no_think
Bot: 💬 Response: Paris is the capital of France.
```

```
User: Solve this complex equation: x^3 + 2x^2 - 5x + 1 = 0 /think
Bot: 🤔 Thinking Process: Let me approach this step by step...
💬 Response: The solutions are approximately...
```

```
User: How many r's in strawberry? /think
Bot: 🤔 Thinking Process: Let me count each letter: s-t-r-a-w-b-e-r-r-y...
💬 Response: There are 3 r's in "strawberry".

User: What about blueberry? /no_think
Bot: 💬 Response: There are 2 r's in "blueberry".

User: Really? /think
Bot: 🤔 Thinking Process: Let me recount: b-l-u-e-b-e-r-r-y...
💬 Response: Yes, there are 2 r's in "blueberry" (positions 7 and 8).
```

**When Soft Switches Don't Work:**
- If the "Enable Thinking Mode" checkbox is OFF, soft switches are ignored
- The model will not generate any `<think>` tags, regardless of `/think` or `/no_think` in your message
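Under the hood, the soft switches are nothing special: they are plain text inside the user turn, while the checkbox maps to the `enable_thinking` argument of the tokenizer's chat template (both visible in app.py below). A minimal sketch of that wiring:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B")

# The soft switch rides along as ordinary message text; the model itself
# interprets /no_think and emits an empty <think></think> block.
messages = [{"role": "user", "content": "What is the capital of France? /no_think"}]

# enable_thinking mirrors the "Enable Thinking Mode" checkbox. With False,
# the template suppresses <think> blocks entirely and soft switches are ignored.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
print(prompt)  # inspect how the template renders the turn
```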
## Technical Details

### Dependencies Updated
- `transformers>=4.51.0` (required for Qwen3 support)
- Removed: `av`, `timm`, `gTTS` (no longer needed)

### Model Configuration
```python
model_id = "Qwen/Qwen3-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16
)
```

### Generation Parameters

**Thinking Mode:**
- Temperature: 0.6
- Top-P: 0.95
- Top-K: 20
- Min-P: 0.0

**Non-Thinking Mode:**
- Temperature: 0.7
- Top-P: 0.8
- Top-K: 20
- Min-P: 0.0

(The sketch below shows how these presets are selected at generation time.)
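As an illustration of how the two presets are applied (the `sampling_presets` helper is not in app.py; it just restates the tables above):

```python
# Per-mode sampling presets, matching the two tables above.
sampling_presets = {
    True:  {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},  # thinking
    False: {"temperature": 0.7, "top_p": 0.8,  "top_k": 20, "min_p": 0.0},  # non-thinking
}

def sampling_kwargs(enable_thinking: bool) -> dict:
    # Qwen recommends sampling (not greedy decoding) in thinking mode
    # to avoid performance degradation and repetition.
    return {"do_sample": True, **sampling_presets[enable_thinking]}
```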
## Running the Application

```bash
python app.py
```

The app will launch on `http://localhost:7860` by default.

## Notes

1. **Text-Only**: Qwen3-14B doesn't support images, videos, or audio; the multimodal features have been removed.

2. **Context Length**: The model supports up to 32,768 tokens natively. For longer contexts (up to 131,072 tokens), you can enable YaRN scaling; see the Qwen3 documentation and the sketch below.
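As an illustration of that note (not part of this commit), the Qwen3 model card documents a `rope_scaling` block for YaRN; one way to apply it when loading, assuming the card's example 4x factor:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: enable YaRN context extension, following the Qwen3 model card.
# Only use this when prompts genuinely exceed the native 32,768-token window,
# since static scaling can slightly degrade short-context quality.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,  # 4 x 32,768 ≈ 131,072 tokens
        "original_max_position_embeddings": 32768,
    },
)
```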
3. **Iraqi Dialect**: The default system prompt is configured for Iraqi Arabic. You can modify it in the System Prompt field.

4. **GPU Requirements**: Qwen3-14B requires significant GPU memory. Make sure you have adequate resources.

## Reference

For more information about Qwen3-14B's capabilities, visit:
- Model Page: https://huggingface.co/Qwen/Qwen3-14B
- Documentation: https://qwenlm.github.io/blog/qwen3/

## Troubleshooting

**Issue**: `KeyError: 'qwen3'`
**Solution**: Make sure you have `transformers>=4.51.0` installed

**Issue**: Out-of-memory errors
**Solution**: Reduce `max_new_tokens` or use a smaller batch size

**Issue**: Slow responses
**Solution**: Disable thinking mode for faster generation
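For the first troubleshooting item, the fix is a one-line upgrade (assuming a pip-based environment):

```bash
pip install --upgrade "transformers>=4.51.0"
```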
README.md CHANGED

```diff
@@ -1,13 +1,50 @@
 ---
-title:
-emoji:
-colorFrom:
-colorTo:
+title: Qwen3-14B Iraqi Chatbot
+emoji: 🤔
+colorFrom: blue
+colorTo: purple
 sdk: gradio
 sdk_version: 5.49.1
 app_file: app.py
 pinned: false
-short_description:
+short_description: Qwen3-14B with thinking mode for Iraqi Arabic
 ---
 
-
+# Qwen3-14B Iraqi Chatbot with Thinking Mode
+
+An advanced chatbot powered by **Qwen3-14B** with seamless switching between thinking and non-thinking modes.
+
+## Features
+
+- 🤔 **Thinking Mode**: Enhanced reasoning for complex tasks (math, coding, logic)
+- 💬 **Non-Thinking Mode**: Fast responses for general conversation
+- 🇮🇶 **Iraqi Dialect**: Optimized for Iraqi Arabic conversations
+- 🎯 **32K Context**: Supports up to 32,768 tokens
+
+## Quick Start
+
+1. Type your question in the chat
+2. Toggle "Enable Thinking Mode" for complex reasoning tasks
+3. Adjust the system prompt and max tokens as needed
+
+## When to Use Thinking Mode
+
+**Enable for:**
+- Math problems and equations
+- Coding challenges
+- Complex reasoning tasks
+- Step-by-step explanations
+
+**Disable for:**
+- Quick questions
+- General conversation
+- Creative writing
+- Faster responses
+
+## Technical Details
+
+- **Model**: Qwen/Qwen3-14B
+- **Context Length**: 32,768 tokens (native)
+- **Parameters**: 14.8B total, 13.2B non-embedding
+
+Check out the [Qwen3 documentation](https://huggingface.co/Qwen/Qwen3-14B) for more details.
```
USAGE_GUIDE.md ADDED
@@ -0,0 +1,213 @@
# Qwen3-14B Chatbot - Quick Usage Guide

## 🎯 Three Ways to Control Thinking Mode

### 1. **Hard Switch: Enable Thinking Mode Checkbox**

Located in the sidebar, this is the main control:

- ✅ **Checked (ON)**: Thinking mode is available
  - Model can show its reasoning process
  - Supports `/think` and `/no_think` soft switches
  - Best for: math, coding, complex reasoning
  - Parameters: Temp=0.6, TopP=0.95, TopK=20

- ❌ **Unchecked (OFF)**: Thinking mode is disabled
  - No thinking process shown
  - Faster responses
  - Soft switches are ignored
  - Best for: general chat, quick questions
  - Parameters: Temp=0.7, TopP=0.8, TopK=20

---

### 2. **Soft Switch: `/think` Tag**

Forces thinking for a specific message (only works when the checkbox is ON):

```
User: How many r's are in "strawberry"? /think
```

**Result:**
```
🤔 Thinking Process:
Let me count each letter carefully:
s-t-r-a-w-b-e-r-r-y
The r's appear at positions 3, 8, and 9.

💬 Response:
There are 3 r's in the word "strawberry".
```

---

### 3. **Soft Switch: `/no_think` Tag**

Skips thinking for a specific message (only works when the checkbox is ON):

```
User: What is 2+2? /no_think
```

**Result:**
```
💬 Response:
2+2 equals 4.
```

---

## 📊 Comparison Table

| Feature | Checkbox ON + `/think` | Checkbox ON + `/no_think` | Checkbox ON (default) | Checkbox OFF |
|---------|----------------------|--------------------------|---------------------|--------------|
| Shows thinking | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
| `<think>` tags | ✅ With content | ⚠️ Empty | ✅ With content | ❌ None |
| Speed | 🐢 Slower | 🚀 Faster | 🐢 Slower | 🚀 Fastest |
| Best for | Complex problems | Quick answers | Reasoning tasks | General chat |
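The `<think>` tags row explains how the app formats its output: it simply splits the raw model stream on those tags. A minimal sketch of that parsing, mirroring the logic in app.py (the `split_thinking` helper itself is illustrative):

```python
def split_thinking(raw: str) -> tuple[str, str]:
    """Split a Qwen3 completion into (thinking, response).

    Returns ("", raw) when no <think> block is present (checkbox OFF);
    an empty thinking string with tags present corresponds to /no_think.
    """
    if "<think>" not in raw or "</think>" not in raw:
        return "", raw.strip()
    start = raw.index("<think>") + len("<think>")
    end = raw.index("</think>")
    return raw[start:end].strip(), raw[end + len("</think>"):].strip()

# Example: split_thinking("<think></think>Paris.") -> ("", "Paris.")
```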
---

## 💡 Real-World Examples

### Example 1: Math Problem (Use Thinking)

```
User: Solve: If x^2 + 5x + 6 = 0, what are the values of x? /think

Bot:
🤔 Thinking Process:
This is a quadratic equation. I can solve it by factoring:
x^2 + 5x + 6 = 0
(x + 2)(x + 3) = 0
So x + 2 = 0 or x + 3 = 0
Therefore x = -2 or x = -3

💬 Response:
The values of x are -2 and -3.
```

### Example 2: Quick Fact (Skip Thinking)

```
User: What is the capital of Iraq? /no_think

Bot:
💬 Response:
The capital of Iraq is Baghdad (بغداد).
```

### Example 3: Multi-Turn Conversation

```
User: How many r's in "strawberry"? /think
Bot: 🤔 [shows counting process] 💬 There are 3 r's.

User: What about "blueberry"? /no_think
Bot: 💬 There are 2 r's in "blueberry".

User: Are you sure? /think
Bot: 🤔 [recounts carefully] 💬 Yes, confirmed: 2 r's in "blueberry".
```

---

## 🎓 Best Practices

### ✅ DO Use Thinking Mode For:
- 🧮 Math equations and calculations
- 💻 Code generation and debugging
- 🧩 Logic puzzles and riddles
- 📊 Data analysis questions
- 🔍 Complex reasoning tasks
- 📝 Step-by-step explanations

### ❌ DON'T Use Thinking Mode For:
- 💬 Simple greetings
- ❓ Basic factual questions
- 🎨 Creative writing
- 🗣️ Casual conversation
- ⚡ When you need quick responses

---

## ⚙️ Settings Explained

### System Prompt
Customizes the AI's personality and language style.

**Default (Iraqi Arabic):**
```
انت موديل عراقي ذكي من بغداد. تتحدث باللهجة العراقية فقط...
```
(English: "You are a smart Iraqi model from Baghdad. You speak only in the Iraqi dialect...")

**English Alternative:**
```
You are a helpful AI assistant. Provide clear, detailed answers.
```

### Max New Tokens
Controls response length (100 - 32,768 tokens).

- **512**: Short answers
- **2,048**: Standard (default)
- **8,192**: Long explanations
- **32,768**: Maximum (for very complex problems)

---

## 🐛 Troubleshooting

### Issue: Soft switches not working
**Solution**: Make sure the "Enable Thinking Mode" checkbox is ON

### Issue: Empty thinking blocks
**Cause**: You used `/no_think`, or the model decided not to think
**Solution**: This is normal behavior; use `/think` to force thinking

### Issue: Responses too slow
**Solution**:
1. Disable the thinking mode checkbox, OR
2. Use `/no_think` for specific messages, OR
3. Reduce Max New Tokens

### Issue: Not enough detail in responses
**Solution**:
1. Enable the thinking mode checkbox
2. Use the `/think` tag
3. Increase Max New Tokens
4. Adjust the system prompt to ask for more detailed responses

---

## 🚀 Quick Start Checklist

1. ✅ Open the chatbot interface
2. ✅ Check whether "Enable Thinking Mode" should be ON (complex tasks) or OFF (casual chat)
3. ✅ Adjust "Max New Tokens" based on the expected response length
4. ✅ (Optional) Customize the System Prompt
5. ✅ Type your message
6. ✅ (Optional) Add `/think` or `/no_think` at the end
7. ✅ Press Enter and wait for the response

---

## 📚 Additional Resources

- **Model Page**: https://huggingface.co/Qwen/Qwen3-14B
- **Documentation**: https://qwenlm.github.io/blog/qwen3/
- **Unsloth Version**: https://huggingface.co/unsloth/Qwen3-14B

---

## 💬 Need Help?

If you encounter issues or have questions:
1. Check the CHANGES.md file for detailed technical information
2. Review the examples above
3. Experiment with different settings
4. Read the official Qwen3 documentation

Happy chatting! 🎉
app.py CHANGED

Old version (removed lines marked `-`; lines the diff viewer truncated or collapsed are kept as-is):

```diff
@@ -1,180 +1,72 @@
 import os
-import pathlib
-import tempfile
 from collections.abc import Iterator
 from threading import Thread
 
-import av
 import gradio as gr
 import spaces
 import torch
-from transformers import
 from transformers.generation.streamers import TextIteratorStreamer
 
-# Model configuration
-model_id = "
-model =
     model_id,
     device_map="auto",
     torch_dtype=torch.bfloat16
 )
 
-#
-VIDEO_FILE_TYPES = (".mp4", ".mov", ".webm")
-AUDIO_FILE_TYPES = (".mp3", ".wav")
-
-# Video processing settings
-TARGET_FPS = int(os.getenv("TARGET_FPS", "3"))
-MAX_FRAMES = int(os.getenv("MAX_FRAMES", "30"))
-MAX_INPUT_TOKENS = int(os.getenv("MAX_INPUT_TOKENS", "10_000"))
-
-
-def get_file_type(path: str) -> str:
-    if path.endswith(IMAGE_FILE_TYPES):
-        return "image"
-    if path.endswith(VIDEO_FILE_TYPES):
-        return "video"
-    if path.endswith(AUDIO_FILE_TYPES):
-        return "audio"
-    error_message = f"Unsupported file type: {path}"
-    raise ValueError(error_message)
-
-
-def count_files_in_new_message(paths: list[str]) -> tuple[int, int]:
-    video_count = 0
-    non_video_count = 0
-    for path in paths:
-        if path.endswith(VIDEO_FILE_TYPES):
-            video_count += 1
-        else:
-            non_video_count += 1
-    return video_count, non_video_count
-
-
-def validate_media_constraints(message: dict) -> bool:
-    video_count, non_video_count = count_files_in_new_message(message["files"])
-    if video_count > 1:
-        gr.Warning("Only one video is supported.")
-        return False
-    if video_count == 1 and non_video_count > 0:
-        gr.Warning("Mixing images and videos is not allowed.")
-        return False
-    return True
-
-
-def extract_frames_to_tempdir(
-    video_path: str,
-    target_fps: float,
-    max_frames: int | None = None,
-    parent_dir: str | None = None,
-    prefix: str = "frames_",
-) -> str:
-    temp_dir = tempfile.mkdtemp(prefix=prefix, dir=parent_dir)
-
-    container = av.open(video_path)
-    video_stream = container.streams.video[0]
-
-    if video_stream.duration is None or video_stream.time_base is None:
-        raise ValueError("video_stream is missing duration or time_base")
-
-    time_base = video_stream.time_base
-    duration = float(video_stream.duration * time_base)
-    interval = 1.0 / target_fps
-
-    total_frames = int(duration * target_fps)
-    if max_frames is not None:
-        total_frames = min(total_frames, max_frames)
-
-    target_times = [i * interval for i in range(total_frames)]
-    target_index = 0
-
-    for frame in container.decode(video=0):
-        if frame.pts is None:
-            continue
-
-        timestamp = float(frame.pts * time_base)
-
-        if target_index < len(target_times) and abs(timestamp - target_times[target_index]) < (interval / 2):
-            frame_path = pathlib.Path(temp_dir) / f"frame_{target_index:04d}.jpg"
-            frame.to_image().save(frame_path)
-            target_index += 1
-
-        if max_frames is not None and target_index >= max_frames:
-            break
-
-    container.close()
-    return temp_dir
-
-
-def process_new_user_message(message: dict) -> list[dict]:
-    if not message["files"]:
-        return [{"type": "text", "text": message["text"]}]
-
-    file_types = [get_file_type(path) for path in message["files"]]
-
-    if len(file_types) == 1 and file_types[0] == "video":
-        gr.Info(f"Video will be processed at {TARGET_FPS} FPS, max {MAX_FRAMES} frames in this Space.")
-
-        temp_dir = extract_frames_to_tempdir(
-            message["files"][0],
-            target_fps=TARGET_FPS,
-            max_frames=MAX_FRAMES,
-        )
-        paths = sorted(pathlib.Path(temp_dir).glob("*.jpg"))
-        return [
-            {"type": "text", "text": message["text"]},
-            *[{"type": "image", "image": path.as_posix()} for path in paths],
-        ]
 
-    return [
-        {"type": "text", "text": message["text"]},
-        *[{"type": file_type, file_type: path} for path, file_type in zip(message["files"], file_types, strict=True)],
-    ]
 
 messages = []
 for item in history:
     if item["role"] == "assistant":
-
     else:
         content = item["content"]
         if isinstance(content, str):
-
         else:
-
-    yield ""
-    return
-
-    messages = []
-    if system_prompt:
-        messages.append({"role": "system", "content": [{"type": "text", "text": system_prompt}]})
-    messages.extend(process_history(history))
-    messages.append({"role": "user", "content": process_new_user_message(message)})
-
-    inputs = processor.apply_chat_template(
         messages,
         add_generation_prompt=True,
-        return_dict=True,
-        return_tensors="pt",
     )
-
 if n_tokens > MAX_INPUT_TOKENS:
     gr.Warning(
         f"Input too long. Max {MAX_INPUT_TOKENS} tokens. Got {n_tokens} tokens. This limit is set to avoid CUDA out-of-memory errors in this Space."
@@ -182,36 +74,77 @@ def generate(message: dict, history: list[dict], system_prompt: str = "", max_ne
     yield ""
     return
 
 generate_kwargs = dict(
-
     streamer=streamer,
     max_new_tokens=max_new_tokens,
     do_sample=True,
-    temperature=
-    top_k=
-    top_p=
     min_p=0.0,
-    repetition_penalty=1.0,
-    disable_compile=True,
 )
 t = Thread(target=model.generate, kwargs=generate_kwargs)
 t.start()
 
 output = ""
 for delta in streamer:
     output += delta
-
 
-# Examples for the chat interface (with additional inputs: system_prompt, max_new_tokens)
 examples = [
-    ["What is the capital of France?", "You are a helpful assistant.", 700],
-    ["Explain quantum computing in simple terms", "You are a helpful assistant.", 512],
-    ["
 ]
 
 system_prompt = (
@@ -224,17 +157,27 @@ system_prompt = (
 demo = gr.ChatInterface(
     fn=generate,
     type="messages",
-    textbox=gr.
-    file_count="multiple",
     autofocus=True,
     ),
-    multimodal=
     additional_inputs=[
         gr.Textbox(label="System Prompt", value=system_prompt),
-        gr.Slider(label="Max New Tokens", minimum=100, maximum=
     ],
-    title="
     examples=examples,
     stop_btn=False,
     css="""
```

New version (the rewritten app.py as rendered on the new side of the diff; unchanged regions between hunks, such as the `system_prompt` body and the trailing `css` string, are collapsed):

```python
import os
from collections.abc import Iterator
from threading import Thread

import gradio as gr
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.streamers import TextIteratorStreamer

# Model configuration - Changed to Qwen3-14B
model_id = "Qwen/Qwen3-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Settings
MAX_INPUT_TOKENS = int(os.getenv("MAX_INPUT_TOKENS", "32_000"))


@spaces.GPU()
@torch.inference_mode()
def generate(message: dict, history: list[dict], system_prompt: str = "", max_new_tokens: int = 512, enable_thinking: bool = True) -> Iterator[str]:
    # Build messages for Qwen3 (text-only format)
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    # Process history - convert to simple text format
    # Note: Don't include thinking content in history (best practice)
    for item in history:
        if item["role"] == "assistant":
            # Extract only the response part (without thinking content)
            content = item["content"]
            # Remove thinking process markers if present
            if "**🤔 Thinking Process:**" in content:
                parts = content.split("**💬 Response:**")
                if len(parts) > 1:
                    content = parts[1].strip()
            messages.append({"role": "assistant", "content": content})
        else:
            # Extract text from the user message
            content = item["content"]
            if isinstance(content, str):
                messages.append({"role": "user", "content": content})
            else:
                # Non-string history content is unexpected with multimodal=False;
                # fall back to its string form (Qwen3-14B is text-only).
                messages.append({"role": "user", "content": str(content)})

    # Add current user message
    current_message = message.get("text", "")
    messages.append({"role": "user", "content": current_message})

    # Apply chat template with enable_thinking parameter
    # Note: When enable_thinking=True, the model supports /think and /no_think soft switches
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=enable_thinking
    )

    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    n_tokens = model_inputs["input_ids"].shape[1]

    if n_tokens > MAX_INPUT_TOKENS:
        gr.Warning(
            f"Input too long. Max {MAX_INPUT_TOKENS} tokens. Got {n_tokens} tokens. This limit is set to avoid CUDA out-of-memory errors in this Space."
        )
        yield ""
        return

    # Set generation parameters based on mode
    if enable_thinking:
        # Thinking mode: Temperature=0.6, TopP=0.95, TopK=20, MinP=0
        # DO NOT use greedy decoding (temperature=0) to avoid performance degradation
        temperature = 0.6
        top_p = 0.95
        top_k = 20
    else:
        # Non-thinking mode: Temperature=0.7, TopP=0.8, TopK=20, MinP=0
        temperature = 0.7
        top_p = 0.8
        top_k = 20

    streamer = TextIteratorStreamer(tokenizer, timeout=30.0, skip_prompt=True, skip_special_tokens=False)
    generate_kwargs = dict(
        **model_inputs,
        streamer=streamer,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        min_p=0.0,
    )
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()

    output = ""
    thinking_content = ""
    response_content = ""

    for delta in streamer:
        output += delta

        # Parse thinking content if in thinking mode.
        # When enable_thinking=True, the model always outputs a <think>...</think>
        # block (even if empty when the /no_think soft switch is used).
        if enable_thinking and "<think>" in output:
            if "</think>" in output:
                # Extract thinking and response parts
                try:
                    think_start = output.index("<think>") + 7
                    think_end = output.index("</think>")
                    thinking_content = output[think_start:think_end].strip()
                    response_content = output[think_end + 8:].strip()

                    # Display formatted output
                    if thinking_content:
                        # Thinking content exists (user didn't use /no_think, or used /think)
                        formatted_output = f"**🤔 Thinking Process:**\n{thinking_content}\n\n**💬 Response:**\n{response_content}"
                    else:
                        # Empty thinking block (user used the /no_think soft switch)
                        formatted_output = f"**💬 Response:**\n{response_content}"

                    yield formatted_output
                except ValueError:
                    # Still parsing, yield raw output
                    yield output
            else:
                # Still generating thinking content
                yield output
        else:
            # Non-thinking mode or no <think> tag yet
            yield output


# Examples for the chat interface (with additional inputs: system_prompt, max_new_tokens, enable_thinking)
examples = [
    ["What is the capital of France? /no_think", "You are a helpful assistant.", 700, True],
    ["Explain quantum computing in simple terms", "You are a helpful assistant.", 512, False],
    ["Solve this math problem: If x^2 + 5x + 6 = 0, what are the values of x? /think", "You are a helpful assistant.", 2000, True]
]

system_prompt = (
    # ... unchanged definition, collapsed in the diff view ...
)

demo = gr.ChatInterface(
    fn=generate,
    type="messages",
    textbox=gr.Textbox(
        placeholder="Type your message here...",
        autofocus=True,
    ),
    multimodal=False,  # Qwen3-14B is text-only
    additional_inputs=[
        gr.Textbox(label="System Prompt", value=system_prompt),
        gr.Slider(label="Max New Tokens", minimum=100, maximum=32768, step=100, value=2048),
        gr.Checkbox(label="Enable Thinking Mode", value=True, info="Enable for complex reasoning tasks (math, coding). Disable for faster general chat."),
    ],
    title="Qwen3-14B Iraqi Chatbot with Thinking Mode",
    description="""
🤔 **Thinking Mode ON**: Better for math, coding, and complex reasoning
💬 **Thinking Mode OFF**: Faster responses for general conversation

**💡 Pro Tip**: When Thinking Mode is enabled, you can use:
- `/think` in your message to force thinking for that turn
- `/no_think` in your message to skip thinking for that turn

Example: "Solve this equation: x^2 + 5x + 6 = 0 /think"
""",
    examples=examples,
    stop_btn=False,
    css="""
```
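Aside (not part of this commit): for non-streaming use, the Qwen3 model card parses the thinking block by token id rather than by string search, which is more robust when `skip_special_tokens` is enabled. Roughly, reusing `model`, `tokenizer`, and `model_inputs` from above:

```python
# Sketch following the Qwen3 model card: split thinking/response by token id.
generated_ids = model.generate(**model_inputs, max_new_tokens=1024)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

try:
    # 151668 is the id of the "</think>" token in the Qwen3 vocabulary.
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0  # no </think> in the output

thinking = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
response = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
```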
requirements.txt CHANGED

```diff
@@ -1,8 +1,5 @@
 gradio>=4.0.0
 spaces[huggingface]>=0.28.0
-transformers>=4.
+transformers>=4.51.0
 torch>=2.1.0
-
-accelerate>=0.25.0
-timm
-gTTS>=2.5.0
+accelerate>=0.25.0
```