anaspro committed on
Commit 55612d9 · 1 Parent(s): d4f3bf5
Files changed (5)
  1. CHANGES.md +173 -0
  2. README.md +43 -6
  3. USAGE_GUIDE.md +213 -0
  4. app.py +115 -172
  5. requirements.txt +2 -5
CHANGES.md ADDED
@@ -0,0 +1,173 @@
+ # Chatbox2 - Qwen3-14B Update
+
+ ## Summary of Changes
+
+ Your chatbox has been upgraded to **Qwen3-14B**, which supports both thinking and non-thinking modes.
+
+ ## What Changed
+
+ ### 1. **Model Upgrade**
+ - **Old Model**: `anaspro/Shako-iraqi-4B-it` (multimodal)
+ - **New Model**: `Qwen/Qwen3-14B` (text-only with thinking capabilities)
+
+ ### 2. **New Features**
+
+ #### **Thinking Mode Toggle** 🤔
+ You can now switch between two modes:
+
+ - **Thinking Mode ON** (default):
+   - Best for: math problems, coding, complex reasoning
+   - The model shows its reasoning process in `<think>...</think>` tags
+   - Uses Temperature=0.6, TopP=0.95, TopK=20
+   - More detailed and thorough responses
+
+ - **Thinking Mode OFF**:
+   - Best for: general conversation, quick responses
+   - Faster responses without showing reasoning
+   - Uses Temperature=0.7, TopP=0.8, TopK=20
+   - More efficient for casual chat
+
+ ### 3. **Updated Parameters**
+ - Max new tokens increased from 2048 to 32768 (matching Qwen3's native context length)
+ - Generation parameters are now chosen per mode
+ - Removed multimodal support (images/videos), as Qwen3-14B is text-only
+
+ ### 4. **UI Improvements**
+ - Added a checkbox to toggle thinking mode
+ - Updated title and description
+ - New examples showcasing both modes
+
+ ## How to Use
+
+ ### Basic Usage
+ 1. Type your message in the textbox
+ 2. Adjust settings in the sidebar:
+    - **System Prompt**: Customize the AI's behavior (default: Iraqi dialect)
+    - **Max New Tokens**: Control response length (100-32768)
+    - **Enable Thinking Mode**: Toggle between thinking/non-thinking
+
+ ### When to Use Thinking Mode
+
+ ✅ **Enable Thinking Mode for:**
+ - Math problems
+ - Coding challenges
+ - Complex logical reasoning
+ - Step-by-step explanations
+ - Problem-solving tasks
+
+ ❌ **Disable Thinking Mode for:**
+ - General conversation
+ - Quick questions
+ - Creative writing
+ - Casual chat
+ - When you need faster responses
+
+ ### Advanced: Soft Switching with `/think` and `/no_think`
+
+ When the **Enable Thinking Mode** checkbox is ON, you can control thinking behavior per message using soft switches:
+
+ - Add `/think` to your message to **force thinking** for that specific turn
+ - Add `/no_think` to your message to **skip thinking** for that specific turn
+
+ **Important Notes:**
+ - Soft switches only work when the "Enable Thinking Mode" checkbox is checked (ON)
+ - When using `/no_think`, the model still outputs `<think>...</think>` tags, but they will be empty
+ - The model follows the most recent instruction in multi-turn conversations
+ - You can add the switch anywhere in your message (beginning or end)
+
+ **Examples:**
+
+ ```
+ User: What is the capital of France? /no_think
+ Bot: 💬 Response: Paris is the capital of France.
+ ```
+
+ ```
+ User: Solve this complex equation: x^3 + 2x^2 - 5x + 1 = 0 /think
+ Bot: 🤔 Thinking Process: Let me approach this step by step...
+ 💬 Response: The solutions are approximately...
+ ```
+
+ ```
+ User: How many r's in strawberry? /think
+ Bot: 🤔 Thinking Process: Let me count each letter: s-t-r-a-w-b-e-r-r-y...
+ 💬 Response: There are 3 r's in "strawberry".
+
+ User: What about blueberry? /no_think
+ Bot: 💬 Response: There are 2 r's in "blueberry".
+
+ User: Really? /think
+ Bot: 🤔 Thinking Process: Let me recount: b-l-u-e-b-e-r-r-y...
+ 💬 Response: Yes, there are 2 r's in "blueberry" (positions 7 and 8).
+ ```
+
+ **When Soft Switches Don't Work:**
+ - If the "Enable Thinking Mode" checkbox is OFF, soft switches are ignored
+ - The model will not generate any `<think>` tags, regardless of `/think` or `/no_think` in your message
+
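The empty-tags behavior described above is easy to handle in code. A minimal sketch (a hypothetical helper, not the exact code in `app.py`) of splitting a raw completion into its thinking and response parts, assuming the model emits a `<think>...</think>` block whenever thinking mode is enabled:

```python
# Hypothetical helper illustrating the parsing described above: split a raw
# Qwen3 completion into (thinking, response). With /no_think the block is
# present but empty, so `thinking` comes back as "".
def split_think(output: str) -> tuple[str, str]:
    start = output.find("<think>")
    end = output.find("</think>")
    if start == -1 or end == -1:
        # No think block at all (e.g. the checkbox is OFF)
        return "", output.strip()
    thinking = output[start + len("<think>"):end].strip()
    response = output[end + len("</think>"):].strip()
    return thinking, response

print(split_think("<think>\n\n</think>\n\nParis is the capital of France."))
# → ('', 'Paris is the capital of France.')
```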
+ ## Technical Details
+
+ ### Dependencies Updated
+ - `transformers>=4.51.0` (required for Qwen3 support)
+ - Removed: `av`, `timm`, `gTTS` (no longer needed)
+
+ ### Model Configuration
+ ```python
+ model_id = "Qwen/Qwen3-14B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     device_map="auto",
+     torch_dtype=torch.bfloat16
+ )
+ ```
+
+ ### Generation Parameters
+
+ **Thinking Mode:**
+ - Temperature: 0.6
+ - Top-P: 0.95
+ - Top-K: 20
+ - Min-P: 0.0
+
+ **Non-Thinking Mode:**
+ - Temperature: 0.7
+ - Top-P: 0.8
+ - Top-K: 20
+ - Min-P: 0.0
+
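As a sketch, the per-mode parameters listed above can be centralized in one helper (the function name is illustrative, not the exact `app.py` code):

```python
# Illustrative sketch of the mode-dependent sampling settings listed above.
def sampling_params(enable_thinking: bool) -> dict:
    if enable_thinking:
        # Thinking mode: recommended Qwen3 settings for reasoning
        return {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}
    # Non-thinking mode: slightly warmer, tighter nucleus for casual chat
    return {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0}
```

The returned dict can then be splatted into the generate call, e.g. `model.generate(**sampling_params(True), ...)`.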
139
+ ## Running the Application
140
+
141
+ ```bash
142
+ python app.py
143
+ ```
144
+
145
+ The app will launch on `http://localhost:7860` by default.
146
+
147
+ ## Notes
148
+
149
+ 1. **Text-Only**: Qwen3-14B doesn't support images, videos, or audio. The multimodal features have been removed.
150
+
151
+ 2. **Context Length**: The model supports up to 32,768 tokens natively. For longer contexts (up to 131,072), you can enable YaRN scaling (see Qwen3 documentation).
152
+
153
+ 3. **Iraqi Dialect**: The default system prompt is configured for Iraqi Arabic dialect. You can modify this in the System Prompt field.
154
+
155
+ 4. **GPU Requirements**: Qwen3-14B requires significant GPU memory. Make sure you have adequate resources.
156
+
157
+ ## Reference
158
+
159
+ For more information about Qwen3-14B capabilities, visit:
160
+ - Model Page: https://huggingface.co/Qwen/Qwen3-14B
161
+ - Documentation: https://qwenlm.github.io/blog/qwen3/
162
+
163
+ ## Troubleshooting
164
+
165
+ **Issue**: `KeyError: 'qwen3'`
166
+ **Solution**: Make sure you have `transformers>=4.51.0` installed
167
+
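One way to catch the version problem above at startup is a quick floor check. A hedged sketch (the helper is hypothetical, not part of `app.py`):

```python
# Hypothetical startup check for the transformers>=4.51.0 requirement.
def meets_floor(ver: str, floor: tuple[int, ...] = (4, 51, 0)) -> bool:
    parts = []
    for piece in ver.split("."):
        if not piece.isdigit():
            break  # stop at suffixes like "dev0"
        parts.append(int(piece))
    return tuple(parts) >= floor

# Usage: from importlib.metadata import version; meets_floor(version("transformers"))
```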
+ **Issue**: Out-of-memory errors
+ **Solution**: Reduce `max_new_tokens` or use a smaller batch size
+
+ **Issue**: Slow responses
+ **Solution**: Disable thinking mode for faster generation
+
README.md CHANGED
@@ -1,13 +1,50 @@
  ---
- title: Chatbox2
- emoji: 🦀
- colorFrom: red
- colorTo: gray
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
  pinned: false
- short_description: shakoo
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

  ---
+ title: Qwen3-14B Iraqi Chatbot
+ emoji: 🤔
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
  pinned: false
+ short_description: Qwen3-14B with thinking mode for Iraqi Arabic
  ---

+ # Qwen3-14B Iraqi Chatbot with Thinking Mode
+
+ An advanced chatbot powered by **Qwen3-14B** with seamless switching between thinking and non-thinking modes.
+
+ ## Features
+
+ - 🤔 **Thinking Mode**: Enhanced reasoning for complex tasks (math, coding, logic)
+ - 💬 **Non-Thinking Mode**: Fast responses for general conversation
+ - 🇮🇶 **Iraqi Dialect**: Optimized for Iraqi Arabic conversations
+ - 🎯 **32K Context**: Supports up to 32,768 tokens
+
+ ## Quick Start
+
+ 1. Type your question in the chat
+ 2. Toggle "Enable Thinking Mode" for complex reasoning tasks
+ 3. Adjust the system prompt and max tokens as needed
+
+ ## When to Use Thinking Mode
+
+ **Enable for:**
+ - Math problems and equations
+ - Coding challenges
+ - Complex reasoning tasks
+ - Step-by-step explanations
+
+ **Disable for:**
+ - Quick questions
+ - General conversation
+ - Creative writing
+ - Faster responses
+
+ ## Technical Details
+
+ - **Model**: Qwen/Qwen3-14B
+ - **Context Length**: 32,768 tokens (native)
+ - **Parameters**: 14.8B total, 13.2B non-embedding
+
+ Check out the [Qwen3 documentation](https://huggingface.co/Qwen/Qwen3-14B) for more details.
USAGE_GUIDE.md ADDED
@@ -0,0 +1,213 @@
+ # Qwen3-14B Chatbot - Quick Usage Guide
+
+ ## 🎯 Three Ways to Control Thinking Mode
+
+ ### 1. **Hard Switch: Enable Thinking Mode Checkbox**
+
+ Located in the sidebar, this is the main control:
+
+ - ✅ **Checked (ON)**: Thinking mode is available
+   - Model can show its reasoning process
+   - Supports `/think` and `/no_think` soft switches
+   - Best for: math, coding, complex reasoning
+   - Parameters: Temp=0.6, TopP=0.95, TopK=20
+
+ - ❌ **Unchecked (OFF)**: Thinking mode is disabled
+   - No thinking process shown
+   - Faster responses
+   - Soft switches are ignored
+   - Best for: general chat, quick questions
+   - Parameters: Temp=0.7, TopP=0.8, TopK=20
+
+ ---
+
+ ### 2. **Soft Switch: `/think` Tag**
+
+ Forces thinking for a specific message (only works when the checkbox is ON):
+
+ ```
+ User: How many r's are in "strawberry"? /think
+ ```
+
+ **Result:**
+ ```
+ 🤔 Thinking Process:
+ Let me count each letter carefully:
+ s-t-r-a-w-b-e-r-r-y
+ The r's appear at positions 3, 8, and 9.
+
+ 💬 Response:
+ There are 3 r's in the word "strawberry".
+ ```
+
+ ---
+
+ ### 3. **Soft Switch: `/no_think` Tag**
+
+ Skips thinking for a specific message (only works when the checkbox is ON):
+
+ ```
+ User: What is 2+2? /no_think
+ ```
+
+ **Result:**
+ ```
+ 💬 Response:
+ 2+2 equals 4.
+ ```
+
+ ---
+
+ ## 📊 Comparison Table
+
+ | Feature | Checkbox ON + `/think` | Checkbox ON + `/no_think` | Checkbox ON (default) | Checkbox OFF |
+ |---------|------------------------|---------------------------|-----------------------|--------------|
+ | Shows thinking | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
+ | `<think>` tags | ✅ With content | ⚠️ Empty | ✅ With content | ❌ None |
+ | Speed | 🐢 Slower | 🚀 Faster | 🐢 Slower | 🚀 Fastest |
+ | Best for | Complex problems | Quick answers | Reasoning tasks | General chat |
+
+ ---
+
+ ## 💡 Real-World Examples
+
+ ### Example 1: Math Problem (Use Thinking)
+
+ ```
+ User: Solve: If x^2 + 5x + 6 = 0, what are the values of x? /think
+
+ Bot:
+ 🤔 Thinking Process:
+ This is a quadratic equation. I can solve it by factoring:
+ x^2 + 5x + 6 = 0
+ (x + 2)(x + 3) = 0
+ So x + 2 = 0 or x + 3 = 0
+ Therefore x = -2 or x = -3
+
+ 💬 Response:
+ The values of x are -2 and -3.
+ ```
+
+ ### Example 2: Quick Fact (Skip Thinking)
+
+ ```
+ User: What is the capital of Iraq? /no_think
+
+ Bot:
+ 💬 Response:
+ The capital of Iraq is Baghdad (بغداد).
+ ```
+
+ ### Example 3: Multi-Turn Conversation
+
+ ```
+ User: How many r's in "strawberry"? /think
+ Bot: 🤔 [shows counting process] 💬 There are 3 r's.
+
+ User: What about "blueberry"? /no_think
+ Bot: 💬 There are 2 r's in "blueberry".
+
+ User: Are you sure? /think
+ Bot: 🤔 [recounts carefully] 💬 Yes, confirmed: 2 r's in "blueberry".
+ ```
+
+ ---
+
+ ## 🎓 Best Practices
+
+ ### ✅ DO Use Thinking Mode For:
+ - 🧮 Math equations and calculations
+ - 💻 Code generation and debugging
+ - 🧩 Logic puzzles and riddles
+ - 📊 Data analysis questions
+ - 🔍 Complex reasoning tasks
+ - 📝 Step-by-step explanations
+
+ ### ❌ DON'T Use Thinking Mode For:
+ - 💬 Simple greetings
+ - ❓ Basic factual questions
+ - 🎨 Creative writing
+ - 🗣️ Casual conversation
+ - ⚡ When you need quick responses
+
+ ---
+
+ ## ⚙️ Settings Explained
+
+ ### System Prompt
+ Customizes the AI's personality and language style.
+
+ **Default (Iraqi Arabic):**
+ ```
+ انت موديل عراقي ذكي من بغداد. تتحدث باللهجة العراقية فقط...
+ ```
+ (Translation: "You are a smart Iraqi model from Baghdad. You speak only in the Iraqi dialect...")
+
+ **English Alternative:**
+ ```
+ You are a helpful AI assistant. Provide clear, detailed answers.
+ ```
+
+ ### Max New Tokens
+ Controls response length (100-32,768 tokens).
+
+ - **512**: Short answers
+ - **2,048**: Standard (default)
+ - **8,192**: Long explanations
+ - **32,768**: Maximum (for very complex problems)
+
+ ---
+
+ ## 🐛 Troubleshooting
+
+ ### Issue: Soft switches not working
+ **Solution**: Make sure the "Enable Thinking Mode" checkbox is ON
+
+ ### Issue: Empty thinking blocks
+ **Cause**: You used `/no_think`, or the model decided not to think
+ **Solution**: This is normal behavior; use `/think` to force thinking
+
+ ### Issue: Responses too slow
+ **Solution**:
+ 1. Disable the thinking mode checkbox, OR
+ 2. Use `/no_think` for specific messages, OR
+ 3. Reduce Max New Tokens
+
+ ### Issue: Not enough detail in responses
+ **Solution**:
+ 1. Enable the thinking mode checkbox
+ 2. Use the `/think` tag
+ 3. Increase Max New Tokens
+ 4. Adjust the system prompt to ask for more detailed responses
+
+ ---
+
+ ## 🚀 Quick Start Checklist
+
+ 1. ✅ Open the chatbot interface
+ 2. ✅ Check whether "Enable Thinking Mode" should be ON (complex tasks) or OFF (casual chat)
+ 3. ✅ Adjust "Max New Tokens" based on the expected response length
+ 4. ✅ (Optional) Customize the System Prompt
+ 5. ✅ Type your message
+ 6. ✅ (Optional) Add `/think` or `/no_think` at the end
+ 7. ✅ Press Enter and wait for the response
+
+ ---
+
+ ## 📚 Additional Resources
+
+ - **Model Page**: https://huggingface.co/Qwen/Qwen3-14B
+ - **Documentation**: https://qwenlm.github.io/blog/qwen3/
+ - **Unsloth Version**: https://huggingface.co/unsloth/Qwen3-14B
+
+ ---
+
+ ## 💬 Need Help?
+
+ If you encounter issues or have questions:
+ 1. Check the CHANGES.md file for detailed technical information
+ 2. Review the examples above
+ 3. Experiment with different settings
+ 4. Read the official Qwen3 documentation
+
+ Happy chatting! 🎉
+
app.py CHANGED
@@ -1,180 +1,72 @@
  import os
- import pathlib
- import tempfile
  from collections.abc import Iterator
  from threading import Thread

- import av
  import gradio as gr
  import spaces
  import torch
- from transformers import AutoModelForImageTextToText, AutoProcessor
  from transformers.generation.streamers import TextIteratorStreamer

- # Model configuration
- model_id = "anaspro/Shako-iraqi-4B-it"
- processor = AutoProcessor.from_pretrained(model_id)
- model = AutoModelForImageTextToText.from_pretrained(
      model_id,
      device_map="auto",
      torch_dtype=torch.bfloat16
  )

- # Supported file types
- IMAGE_FILE_TYPES = (".jpg", ".jpeg", ".png", ".webp")
- VIDEO_FILE_TYPES = (".mp4", ".mov", ".webm")
- AUDIO_FILE_TYPES = (".mp3", ".wav")
-
- # Video processing settings
- TARGET_FPS = int(os.getenv("TARGET_FPS", "3"))
- MAX_FRAMES = int(os.getenv("MAX_FRAMES", "30"))
- MAX_INPUT_TOKENS = int(os.getenv("MAX_INPUT_TOKENS", "10_000"))
-
-
- def get_file_type(path: str) -> str:
-     if path.endswith(IMAGE_FILE_TYPES):
-         return "image"
-     if path.endswith(VIDEO_FILE_TYPES):
-         return "video"
-     if path.endswith(AUDIO_FILE_TYPES):
-         return "audio"
-     error_message = f"Unsupported file type: {path}"
-     raise ValueError(error_message)
-
-
- def count_files_in_new_message(paths: list[str]) -> tuple[int, int]:
-     video_count = 0
-     non_video_count = 0
-     for path in paths:
-         if path.endswith(VIDEO_FILE_TYPES):
-             video_count += 1
-         else:
-             non_video_count += 1
-     return video_count, non_video_count
-
-
- def validate_media_constraints(message: dict) -> bool:
-     video_count, non_video_count = count_files_in_new_message(message["files"])
-     if video_count > 1:
-         gr.Warning("Only one video is supported.")
-         return False
-     if video_count == 1 and non_video_count > 0:
-         gr.Warning("Mixing images and videos is not allowed.")
-         return False
-     return True
-
-
- def extract_frames_to_tempdir(
-     video_path: str,
-     target_fps: float,
-     max_frames: int | None = None,
-     parent_dir: str | None = None,
-     prefix: str = "frames_",
- ) -> str:
-     temp_dir = tempfile.mkdtemp(prefix=prefix, dir=parent_dir)
-
-     container = av.open(video_path)
-     video_stream = container.streams.video[0]
-
-     if video_stream.duration is None or video_stream.time_base is None:
-         raise ValueError("video_stream is missing duration or time_base")
-
-     time_base = video_stream.time_base
-     duration = float(video_stream.duration * time_base)
-     interval = 1.0 / target_fps
-
-     total_frames = int(duration * target_fps)
-     if max_frames is not None:
-         total_frames = min(total_frames, max_frames)
-
-     target_times = [i * interval for i in range(total_frames)]
-     target_index = 0
-
-     for frame in container.decode(video=0):
-         if frame.pts is None:
-             continue
-
-         timestamp = float(frame.pts * time_base)
-
-         if target_index < len(target_times) and abs(timestamp - target_times[target_index]) < (interval / 2):
-             frame_path = pathlib.Path(temp_dir) / f"frame_{target_index:04d}.jpg"
-             frame.to_image().save(frame_path)
-             target_index += 1
-
-         if max_frames is not None and target_index >= max_frames:
-             break
-
-     container.close()
-     return temp_dir
-
-
- def process_new_user_message(message: dict) -> list[dict]:
-     if not message["files"]:
-         return [{"type": "text", "text": message["text"]}]
-
-     file_types = [get_file_type(path) for path in message["files"]]
-
-     if len(file_types) == 1 and file_types[0] == "video":
-         gr.Info(f"Video will be processed at {TARGET_FPS} FPS, max {MAX_FRAMES} frames in this Space.")
-
-         temp_dir = extract_frames_to_tempdir(
-             message["files"][0],
-             target_fps=TARGET_FPS,
-             max_frames=MAX_FRAMES,
-         )
-         paths = sorted(pathlib.Path(temp_dir).glob("*.jpg"))
-         return [
-             {"type": "text", "text": message["text"]},
-             *[{"type": "image", "image": path.as_posix()} for path in paths],
-         ]
-
-     return [
-         {"type": "text", "text": message["text"]},
-         *[{"type": file_type, file_type: path} for path, file_type in zip(message["files"], file_types, strict=True)],
-     ]
-
-
- def process_history(history: list[dict]) -> list[dict]:
      messages = []
-     current_user_content: list[dict] = []
      for item in history:
          if item["role"] == "assistant":
-             if current_user_content:
-                 messages.append({"role": "user", "content": current_user_content})
-                 current_user_content = []
-             messages.append({"role": "assistant", "content": [{"type": "text", "text": item["content"]}]})
          else:
              content = item["content"]
              if isinstance(content, str):
-                 current_user_content.append({"type": "text", "text": content})
              else:
-                 filepath = content[0]
-                 file_type = get_file_type(filepath)
-                 current_user_content.append({"type": file_type, file_type: filepath})
-     return messages
-
-
- @spaces.GPU()
- @torch.inference_mode()
- def generate(message: dict, history: list[dict], system_prompt: str = "", max_new_tokens: int = 512) -> Iterator[str]:
-     if not validate_media_constraints(message):
-         yield ""
-         return
-
-     messages = []
-     if system_prompt:
-         messages.append({"role": "system", "content": [{"type": "text", "text": system_prompt}]})
-     messages.extend(process_history(history))
-     messages.append({"role": "user", "content": process_new_user_message(message)})
-
-     inputs = processor.apply_chat_template(
          messages,
          add_generation_prompt=True,
-         tokenize=True,
-         return_dict=True,
-         return_tensors="pt",
      )
-     n_tokens = inputs["input_ids"].shape[1]
      if n_tokens > MAX_INPUT_TOKENS:
          gr.Warning(
              f"Input too long. Max {MAX_INPUT_TOKENS} tokens. Got {n_tokens} tokens. This limit is set to avoid CUDA out-of-memory errors in this Space."
@@ -182,36 +74,77 @@ def generate(message: dict, history: list[dict], system_prompt: str = "", max_ne
          yield ""
          return

-     inputs = inputs.to(device=model.device, dtype=torch.bfloat16)
-
-     streamer = TextIteratorStreamer(processor, timeout=30.0, skip_prompt=True, skip_special_tokens=True)
      generate_kwargs = dict(
-         inputs,
          streamer=streamer,
          max_new_tokens=max_new_tokens,
          do_sample=True,
-         temperature=1.0,
-         top_k=64,
-         top_p=0.95,
          min_p=0.0,
-         repetition_penalty=1.0,
-         disable_compile=True,
      )
      t = Thread(target=model.generate, kwargs=generate_kwargs)
      t.start()

      output = ""
      for delta in streamer:
          output += delta
-         yield output

- # Examples for the chat interface (with additional inputs: system_prompt, max_new_tokens)
  examples = [
-     ["What is the capital of France?", "You are a helpful assistant.", 700],
-     ["Explain quantum computing in simple terms", "You are a helpful assistant.", 512],
-     ["Write a short story about a robot learning to paint", "You are a helpful assistant.", 1000]
  ]

  system_prompt = (
@@ -224,17 +157,27 @@ system_prompt = (
  demo = gr.ChatInterface(
      fn=generate,
      type="messages",
-     textbox=gr.MultimodalTextbox(
-         file_types=list(IMAGE_FILE_TYPES + VIDEO_FILE_TYPES + AUDIO_FILE_TYPES),
-         file_count="multiple",
          autofocus=True,
      ),
-     multimodal=True,
      additional_inputs=[
          gr.Textbox(label="System Prompt", value=system_prompt),
-         gr.Slider(label="Max New Tokens", minimum=100, maximum=2048, step=10, value=2048),
      ],
-     title="Shako IRAQI AI",
      examples=examples,
      stop_btn=False,
      css="""

  import os
  from collections.abc import Iterator
  from threading import Thread

  import gradio as gr
  import spaces
  import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
  from transformers.generation.streamers import TextIteratorStreamer

+ # Model configuration - changed to Qwen3-14B
+ model_id = "Qwen/Qwen3-14B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
      model_id,
      device_map="auto",
      torch_dtype=torch.bfloat16
  )

+ # Settings
+ MAX_INPUT_TOKENS = int(os.getenv("MAX_INPUT_TOKENS", "32_000"))


+ @spaces.GPU()
+ @torch.inference_mode()
+ def generate(message: dict, history: list[dict], system_prompt: str = "", max_new_tokens: int = 512, enable_thinking: bool = True) -> Iterator[str]:
+     # Build messages for Qwen3 (text-only format)
      messages = []
+     if system_prompt:
+         messages.append({"role": "system", "content": system_prompt})
+
+     # Process history - convert to simple text format
+     # Note: don't include thinking content in history (best practice)
      for item in history:
          if item["role"] == "assistant":
+             # Keep only the response part (without thinking content)
+             content = item["content"]
+             if "**🤔 Thinking Process:**" in content:
+                 parts = content.split("**💬 Response:**")
+                 if len(parts) > 1:
+                     content = parts[1].strip()
+             messages.append({"role": "assistant", "content": content})
          else:
+             # Extract text from the user message
              content = item["content"]
              if isinstance(content, str):
+                 messages.append({"role": "user", "content": content})
              else:
+                 # Non-string content (legacy file turns) carries no usable text for a text-only model
+                 messages.append({"role": "user", "content": ""})
+
+     # Add the current user message
+     current_message = message.get("text", "")
+     messages.append({"role": "user", "content": current_message})
+
+     # Apply the chat template with the enable_thinking parameter
+     # Note: when enable_thinking=True, the model supports /think and /no_think soft switches
+     text = tokenizer.apply_chat_template(
          messages,
+         tokenize=False,
          add_generation_prompt=True,
+         enable_thinking=enable_thinking
      )
+
+     model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+     n_tokens = model_inputs["input_ids"].shape[1]
+
      if n_tokens > MAX_INPUT_TOKENS:
          gr.Warning(
              f"Input too long. Max {MAX_INPUT_TOKENS} tokens. Got {n_tokens} tokens. This limit is set to avoid CUDA out-of-memory errors in this Space."
          yield ""
          return

+     # Set generation parameters based on mode
+     if enable_thinking:
+         # Thinking mode: Temperature=0.6, TopP=0.95, TopK=20, MinP=0
+         # Do NOT use greedy decoding (temperature=0); it degrades quality
+         temperature = 0.6
+         top_p = 0.95
+         top_k = 20
+     else:
+         # Non-thinking mode: Temperature=0.7, TopP=0.8, TopK=20, MinP=0
+         temperature = 0.7
+         top_p = 0.8
+         top_k = 20
+
+     streamer = TextIteratorStreamer(tokenizer, timeout=30.0, skip_prompt=True, skip_special_tokens=False)
      generate_kwargs = dict(
+         **model_inputs,
          streamer=streamer,
          max_new_tokens=max_new_tokens,
          do_sample=True,
+         temperature=temperature,
+         top_k=top_k,
+         top_p=top_p,
          min_p=0.0,
      )
      t = Thread(target=model.generate, kwargs=generate_kwargs)
      t.start()

      output = ""
+     thinking_content = ""
+     response_content = ""
+
      for delta in streamer:
          output += delta
+
+         # Parse thinking content if in thinking mode.
+         # When enable_thinking=True, the model always outputs a <think>...</think> block
+         # (even if empty when the /no_think soft switch is used).
+         if enable_thinking and "<think>" in output:
+             if "</think>" in output:
+                 # Extract the thinking and response parts
+                 try:
+                     think_start = output.index("<think>") + len("<think>")
+                     think_end = output.index("</think>")
+                     thinking_content = output[think_start:think_end].strip()
+                     response_content = output[think_end + len("</think>"):].strip()
+
+                     # Display formatted output
+                     if thinking_content:
+                         # Thinking content exists (the user didn't use /no_think, or used /think)
+                         formatted_output = f"**🤔 Thinking Process:**\n{thinking_content}\n\n**💬 Response:**\n{response_content}"
+                     else:
+                         # Empty thinking block (the user used the /no_think soft switch)
+                         formatted_output = f"**💬 Response:**\n{response_content}"
+
+                     yield formatted_output
+                 except ValueError:
+                     # Still parsing, yield the raw output
+                     yield output
+             else:
+                 # Still generating thinking content
+                 yield output
+         else:
+             # Non-thinking mode, or no <think> tag yet
+             yield output


+ # Examples for the chat interface (with additional inputs: system_prompt, max_new_tokens, enable_thinking)
  examples = [
+     ["What is the capital of France? /no_think", "You are a helpful assistant.", 700, True],
+     ["Explain quantum computing in simple terms", "You are a helpful assistant.", 512, False],
+     ["Solve this math problem: If x^2 + 5x + 6 = 0, what are the values of x? /think", "You are a helpful assistant.", 2000, True]
  ]

  system_prompt = (
  demo = gr.ChatInterface(
      fn=generate,
      type="messages",
+     textbox=gr.Textbox(
+         placeholder="Type your message here...",
          autofocus=True,
      ),
+     multimodal=False,  # Qwen3-14B is text-only
      additional_inputs=[
          gr.Textbox(label="System Prompt", value=system_prompt),
+         gr.Slider(label="Max New Tokens", minimum=100, maximum=32768, step=100, value=2048),
+         gr.Checkbox(label="Enable Thinking Mode", value=True, info="Enable for complex reasoning tasks (math, coding). Disable for faster general chat."),
      ],
+     title="Qwen3-14B Iraqi Chatbot with Thinking Mode",
+     description="""
+ 🤔 **Thinking Mode ON**: Better for math, coding, and complex reasoning
+ 💬 **Thinking Mode OFF**: Faster responses for general conversation
+
+ **💡 Pro Tip**: When Thinking Mode is enabled, you can use:
+ - `/think` in your message to force thinking for that turn
+ - `/no_think` in your message to skip thinking for that turn
+
+ Example: "Solve this equation: x^2 + 5x + 6 = 0 /think"
+ """,
      examples=examples,
      stop_btn=False,
      css="""
requirements.txt CHANGED
@@ -1,8 +1,5 @@
  gradio>=4.0.0
  spaces[huggingface]>=0.28.0
- transformers>=4.35.0
  torch>=2.1.0
- av
- accelerate>=0.25.0
- timm
- gTTS>=2.5.0

  gradio>=4.0.0
  spaces[huggingface]>=0.28.0
+ transformers>=4.51.0
  torch>=2.1.0
+ accelerate>=0.25.0