anaspro committed on
Commit 55612d9 · 1 Parent(s): d4f3bf5
Files changed (5)
  1. CHANGES.md +173 -0
  2. README.md +43 -6
  3. USAGE_GUIDE.md +213 -0
  4. app.py +115 -172
  5. requirements.txt +2 -5
CHANGES.md ADDED
@@ -0,0 +1,173 @@
+ # Chatbox2 - Qwen3-14B Update
+
+ ## Summary of Changes
+
+ Your chatbox has been upgraded to **Qwen3-14B**, which supports both thinking and non-thinking modes.
+
+ ## What Changed
+
+ ### 1. **Model Upgrade**
+ - **Old Model**: `anaspro/Shako-iraqi-4B-it` (multimodal)
+ - **New Model**: `Qwen/Qwen3-14B` (text-only with thinking capabilities)
+
+ ### 2. **New Features**
+
+ #### **Thinking Mode Toggle** 🤔
+ You can now switch between two modes:
+
+ - **Thinking Mode ON** (default):
+   - Best for: math problems, coding, complex reasoning
+   - The model shows its reasoning process in `<think>...</think>` tags
+   - Uses Temperature=0.6, TopP=0.95, TopK=20
+   - More detailed and thorough responses
+
+ - **Thinking Mode OFF**:
+   - Best for: general conversation, quick responses
+   - Faster responses without showing reasoning
+   - Uses Temperature=0.7, TopP=0.8, TopK=20
+   - More efficient for casual chat
+
+ ### 3. **Updated Parameters**
+ - Max new tokens increased from 2048 to 32768 (matching Qwen3's native context length)
+ - Generation parameters are now chosen per mode
+ - Removed multimodal support (images/videos), as Qwen3-14B is text-only
+
+ ### 4. **UI Improvements**
+ - Added a checkbox to toggle thinking mode
+ - Updated title and description
+ - New examples showcasing both modes
+
+ ## How to Use
+
+ ### Basic Usage
+ 1. Type your message in the textbox
+ 2. Adjust settings in the sidebar:
+    - **System Prompt**: Customize the AI's behavior (default: Iraqi dialect)
+    - **Max New Tokens**: Control response length (100-32768)
+    - **Enable Thinking Mode**: Toggle between thinking/non-thinking
+
+ ### When to Use Thinking Mode
+
+ ✅ **Enable Thinking Mode for:**
+ - Math problems
+ - Coding challenges
+ - Complex logical reasoning
+ - Step-by-step explanations
+ - Problem-solving tasks
+
+ ❌ **Disable Thinking Mode for:**
+ - General conversation
+ - Quick questions
+ - Creative writing
+ - Casual chat
+ - When you need faster responses
+
+ ### Advanced: Soft Switching with `/think` and `/no_think`
+
+ When the **Enable Thinking Mode** checkbox is ON, you can control thinking behavior per message using soft switches:
+
+ - Add `/think` to your message to **force thinking** for that specific turn
+ - Add `/no_think` to your message to **skip thinking** for that specific turn
+
+ **Important Notes:**
+ - Soft switches only work when the "Enable Thinking Mode" checkbox is checked (ON)
+ - When using `/no_think`, the model still outputs `<think>...</think>` tags, but they will be empty
+ - The model follows the most recent instruction in multi-turn conversations
+ - You can add the switch anywhere in your message (beginning or end)
+
+ **Examples:**
+
+ ```
+ User: What is the capital of France? /no_think
+ Bot: 💬 Response: Paris is the capital of France.
+ ```
+
+ ```
+ User: Solve this complex equation: x^3 + 2x^2 - 5x + 1 = 0 /think
+ Bot: 🤔 Thinking Process: Let me approach this step by step...
+ 💬 Response: The solutions are approximately...
+ ```
+
+ ```
+ User: How many r's in strawberry? /think
+ Bot: 🤔 Thinking Process: Let me count each letter: s-t-r-a-w-b-e-r-r-y...
+ 💬 Response: There are 3 r's in "strawberry".
+
+ User: What about blueberry? /no_think
+ Bot: 💬 Response: There are 2 r's in "blueberry".
+
+ User: Really? /think
+ Bot: 🤔 Thinking Process: Let me recount: b-l-u-e-b-e-r-r-y...
+ 💬 Response: Yes, there are 2 r's in "blueberry" (positions 7 and 8).
+ ```
+
+ **When Soft Switches Don't Work:**
+ - If the "Enable Thinking Mode" checkbox is OFF, soft switches are ignored
+ - The model will not generate any `<think>` tags, regardless of `/think` or `/no_think` in your message
+
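The empty-tags behavior described above is easy to handle in code. A minimal sketch (a hypothetical helper, not the exact code in `app.py`) of splitting a raw completion into its thinking and response parts, assuming the model emits a `<think>...</think>` block whenever thinking mode is enabled:

```python
# Hypothetical helper illustrating the parsing described above: split a raw
# Qwen3 completion into (thinking, response). With /no_think the block is
# present but empty, so `thinking` comes back as "".
def split_think(output: str) -> tuple[str, str]:
    start = output.find("<think>")
    end = output.find("</think>")
    if start == -1 or end == -1:
        # No think block at all (e.g. the checkbox is OFF)
        return "", output.strip()
    thinking = output[start + len("<think>"):end].strip()
    response = output[end + len("</think>"):].strip()
    return thinking, response

print(split_think("<think>\n\n</think>\n\nParis is the capital of France."))
# → ('', 'Paris is the capital of France.')
```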
+ ## Technical Details
+
+ ### Dependencies Updated
+ - `transformers>=4.51.0` (required for Qwen3 support)
+ - Removed: `av`, `timm`, `gTTS` (no longer needed)
+
+ ### Model Configuration
+ ```python
+ model_id = "Qwen/Qwen3-14B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     device_map="auto",
+     torch_dtype=torch.bfloat16
+ )
+ ```
+
+ ### Generation Parameters
+
+ **Thinking Mode:**
+ - Temperature: 0.6
+ - Top-P: 0.95
+ - Top-K: 20
+ - Min-P: 0.0
+
+ **Non-Thinking Mode:**
+ - Temperature: 0.7
+ - Top-P: 0.8
+ - Top-K: 20
+ - Min-P: 0.0
+
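As a sketch, the per-mode parameters listed above can be centralized in one helper (the function name is illustrative, not the exact `app.py` code):

```python
# Illustrative sketch of the mode-dependent sampling settings listed above.
def sampling_params(enable_thinking: bool) -> dict:
    if enable_thinking:
        # Thinking mode: recommended Qwen3 settings for reasoning
        return {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}
    # Non-thinking mode: slightly warmer, tighter nucleus for casual chat
    return {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0}
```

The returned dict can then be splatted into the generate call, e.g. `model.generate(**sampling_params(True), ...)`.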
139
+ ## Running the Application
140
+
141
+ ```bash
142
+ python app.py
143
+ ```
144
+
145
+ The app will launch on `http://localhost:7860` by default.
146
+
147
+ ## Notes
148
+
149
+ 1. **Text-Only**: Qwen3-14B doesn't support images, videos, or audio. The multimodal features have been removed.
150
+
151
+ 2. **Context Length**: The model supports up to 32,768 tokens natively. For longer contexts (up to 131,072), you can enable YaRN scaling (see Qwen3 documentation).
152
+
153
+ 3. **Iraqi Dialect**: The default system prompt is configured for Iraqi Arabic dialect. You can modify this in the System Prompt field.
154
+
155
+ 4. **GPU Requirements**: Qwen3-14B requires significant GPU memory. Make sure you have adequate resources.
156
+
157
+ ## Reference
158
+
159
+ For more information about Qwen3-14B capabilities, visit:
160
+ - Model Page: https://huggingface.co/Qwen/Qwen3-14B
161
+ - Documentation: https://qwenlm.github.io/blog/qwen3/
162
+
163
+ ## Troubleshooting
164
+
165
+ **Issue**: `KeyError: 'qwen3'`
166
+ **Solution**: Make sure you have `transformers>=4.51.0` installed
167
+
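One way to catch the version problem above at startup is a quick floor check. A hedged sketch (the helper is hypothetical, not part of `app.py`):

```python
# Hypothetical startup check for the transformers>=4.51.0 requirement.
def meets_floor(ver: str, floor: tuple[int, ...] = (4, 51, 0)) -> bool:
    parts = []
    for piece in ver.split("."):
        if not piece.isdigit():
            break  # stop at suffixes like "dev0"
        parts.append(int(piece))
    return tuple(parts) >= floor

# Usage: from importlib.metadata import version; meets_floor(version("transformers"))
```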
+ **Issue**: Out-of-memory errors
+ **Solution**: Reduce `max_new_tokens` or use a smaller batch size
+
+ **Issue**: Slow responses
+ **Solution**: Disable thinking mode for faster generation
+
README.md CHANGED
@@ -1,13 +1,50 @@
  ---
- title: Chatbox2
- emoji: 🦀
- colorFrom: red
- colorTo: gray
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
  pinned: false
- short_description: shakoo
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

  ---
+ title: Qwen3-14B Iraqi Chatbot
+ emoji: 🤔
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
  sdk_version: 5.49.1
  app_file: app.py
  pinned: false
+ short_description: Qwen3-14B with thinking mode for Iraqi Arabic
  ---

+ # Qwen3-14B Iraqi Chatbot with Thinking Mode
+
+ An advanced chatbot powered by **Qwen3-14B** with seamless switching between thinking and non-thinking modes.
+
+ ## Features
+
+ - 🤔 **Thinking Mode**: Enhanced reasoning for complex tasks (math, coding, logic)
+ - 💬 **Non-Thinking Mode**: Fast responses for general conversation
+ - 🇮🇶 **Iraqi Dialect**: Optimized for Iraqi Arabic conversations
+ - 🎯 **32K Context**: Supports up to 32,768 tokens
+
+ ## Quick Start
+
+ 1. Type your question in the chat
+ 2. Toggle "Enable Thinking Mode" for complex reasoning tasks
+ 3. Adjust the system prompt and max tokens as needed
+
+ ## When to Use Thinking Mode
+
+ **Enable for:**
+ - Math problems and equations
+ - Coding challenges
+ - Complex reasoning tasks
+ - Step-by-step explanations
+
+ **Disable for:**
+ - Quick questions
+ - General conversation
+ - Creative writing
+ - Faster responses
+
+ ## Technical Details
+
+ - **Model**: Qwen/Qwen3-14B
+ - **Context Length**: 32,768 tokens (native)
+ - **Parameters**: 14.8B total, 13.2B non-embedding
+
+ Check out the [Qwen3 documentation](https://huggingface.co/Qwen/Qwen3-14B) for more details.
USAGE_GUIDE.md ADDED
@@ -0,0 +1,213 @@
+ # Qwen3-14B Chatbot - Quick Usage Guide
+
+ ## 🎯 Three Ways to Control Thinking Mode
+
+ ### 1. **Hard Switch: Enable Thinking Mode Checkbox**
+
+ Located in the sidebar, this is the main control:
+
+ - ✅ **Checked (ON)**: Thinking mode is available
+   - Model can show its reasoning process
+   - Supports `/think` and `/no_think` soft switches
+   - Best for: math, coding, complex reasoning
+   - Parameters: Temp=0.6, TopP=0.95, TopK=20
+
+ - ❌ **Unchecked (OFF)**: Thinking mode is disabled
+   - No thinking process shown
+   - Faster responses
+   - Soft switches are ignored
+   - Best for: general chat, quick questions
+   - Parameters: Temp=0.7, TopP=0.8, TopK=20
+
+ ---
+
+ ### 2. **Soft Switch: `/think` Tag**
+
+ Forces thinking for a specific message (only works when the checkbox is ON):
+
+ ```
+ User: How many r's are in "strawberry"? /think
+ ```
+
+ **Result:**
+ ```
+ 🤔 Thinking Process:
+ Let me count each letter carefully:
+ s-t-r-a-w-b-e-r-r-y
+ The r's appear at positions 3, 8, and 9.
+
+ 💬 Response:
+ There are 3 r's in the word "strawberry".
+ ```
+
+ ---
+
+ ### 3. **Soft Switch: `/no_think` Tag**
+
+ Skips thinking for a specific message (only works when the checkbox is ON):
+
+ ```
+ User: What is 2+2? /no_think
+ ```
+
+ **Result:**
+ ```
+ 💬 Response:
+ 2+2 equals 4.
+ ```
+
+ ---
+
+ ## 📊 Comparison Table
+
+ | Feature | Checkbox ON + `/think` | Checkbox ON + `/no_think` | Checkbox ON (default) | Checkbox OFF |
+ |---------|------------------------|---------------------------|-----------------------|--------------|
+ | Shows thinking | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
+ | `<think>` tags | ✅ With content | ⚠️ Empty | ✅ With content | ❌ None |
+ | Speed | 🐢 Slower | 🚀 Faster | 🐢 Slower | 🚀 Fastest |
+ | Best for | Complex problems | Quick answers | Reasoning tasks | General chat |
+
+ ---
+
+ ## 💡 Real-World Examples
+
+ ### Example 1: Math Problem (Use Thinking)
+
+ ```
+ User: Solve: If x^2 + 5x + 6 = 0, what are the values of x? /think
+
+ Bot:
+ 🤔 Thinking Process:
+ This is a quadratic equation. I can solve it by factoring:
+ x^2 + 5x + 6 = 0
+ (x + 2)(x + 3) = 0
+ So x + 2 = 0 or x + 3 = 0
+ Therefore x = -2 or x = -3
+
+ 💬 Response:
+ The values of x are -2 and -3.
+ ```
+
+ ### Example 2: Quick Fact (Skip Thinking)
+
+ ```
+ User: What is the capital of Iraq? /no_think
+
+ Bot:
+ 💬 Response:
+ The capital of Iraq is Baghdad (بغداد).
+ ```
+
+ ### Example 3: Multi-Turn Conversation
+
+ ```
+ User: How many r's in "strawberry"? /think
+ Bot: 🤔 [shows counting process] 💬 There are 3 r's.
+
+ User: What about "blueberry"? /no_think
+ Bot: 💬 There are 2 r's in "blueberry".
+
+ User: Are you sure? /think
+ Bot: 🤔 [recounts carefully] 💬 Yes, confirmed: 2 r's in "blueberry".
+ ```
+
+ ---
+
+ ## 🎓 Best Practices
+
+ ### ✅ DO Use Thinking Mode For:
+ - 🧮 Math equations and calculations
+ - 💻 Code generation and debugging
+ - 🧩 Logic puzzles and riddles
+ - 📊 Data analysis questions
+ - 🔍 Complex reasoning tasks
+ - 📝 Step-by-step explanations
+
+ ### ❌ DON'T Use Thinking Mode For:
+ - 💬 Simple greetings
+ - ❓ Basic factual questions
+ - 🎨 Creative writing
+ - 🗣️ Casual conversation
+ - ⚡ When you need quick responses
+
+ ---
+
+ ## ⚙️ Settings Explained
+
+ ### System Prompt
+ Customizes the AI's personality and language style.
+
+ **Default (Iraqi Arabic):**
+ ```
+ انت موديل عراقي ذكي من بغداد. تتحدث باللهجة العراقية فقط...
+ ```
+ (Translation: "You are a smart Iraqi model from Baghdad. You speak only in the Iraqi dialect...")
+
+ **English Alternative:**
+ ```
+ You are a helpful AI assistant. Provide clear, detailed answers.
+ ```
+
+ ### Max New Tokens
+ Controls response length (100-32,768 tokens).
+
+ - **512**: Short answers
+ - **2,048**: Standard (default)
+ - **8,192**: Long explanations
+ - **32,768**: Maximum (for very complex problems)
+
+ ---
+
+ ## 🐛 Troubleshooting
+
+ ### Issue: Soft switches not working
+ **Solution**: Make sure the "Enable Thinking Mode" checkbox is ON
+
+ ### Issue: Empty thinking blocks
+ **Cause**: You used `/no_think`, or the model decided not to think
+ **Solution**: This is normal behavior; use `/think` to force thinking
+
+ ### Issue: Responses too slow
+ **Solution**:
+ 1. Disable the thinking mode checkbox, OR
+ 2. Use `/no_think` for specific messages, OR
+ 3. Reduce Max New Tokens
+
+ ### Issue: Not enough detail in responses
+ **Solution**:
+ 1. Enable the thinking mode checkbox
+ 2. Use the `/think` tag
+ 3. Increase Max New Tokens
+ 4. Adjust the system prompt to ask for more detailed responses
+
+ ---
+
+ ## 🚀 Quick Start Checklist
+
+ 1. ✅ Open the chatbot interface
+ 2. ✅ Check whether "Enable Thinking Mode" should be ON (complex tasks) or OFF (casual chat)
+ 3. ✅ Adjust "Max New Tokens" based on the expected response length
+ 4. ✅ (Optional) Customize the System Prompt
+ 5. ✅ Type your message
+ 6. ✅ (Optional) Add `/think` or `/no_think` at the end
+ 7. ✅ Press Enter and wait for the response
+
+ ---
+
+ ## 📚 Additional Resources
+
+ - **Model Page**: https://huggingface.co/Qwen/Qwen3-14B
+ - **Documentation**: https://qwenlm.github.io/blog/qwen3/
+ - **Unsloth Version**: https://huggingface.co/unsloth/Qwen3-14B
+
+ ---
+
+ ## 💬 Need Help?
+
+ If you encounter issues or have questions:
+ 1. Check the CHANGES.md file for detailed technical information
+ 2. Review the examples above
+ 3. Experiment with different settings
+ 4. Read the official Qwen3 documentation
+
+ Happy chatting! 🎉
+
app.py CHANGED
@@ -1,180 +1,72 @@
  import os
- import pathlib
- import tempfile
  from collections.abc import Iterator
  from threading import Thread

- import av
  import gradio as gr
  import spaces
  import torch
- from transformers import AutoModelForImageTextToText, AutoProcessor
  from transformers.generation.streamers import TextIteratorStreamer

- # Model configuration
- model_id = "anaspro/Shako-iraqi-4B-it"
- processor = AutoProcessor.from_pretrained(model_id)
- model = AutoModelForImageTextToText.from_pretrained(
      model_id,
      device_map="auto",
      torch_dtype=torch.bfloat16
  )

- # Supported file types
- IMAGE_FILE_TYPES = (".jpg", ".jpeg", ".png", ".webp")
- VIDEO_FILE_TYPES = (".mp4", ".mov", ".webm")
- AUDIO_FILE_TYPES = (".mp3", ".wav")
-
- # Video processing settings
- TARGET_FPS = int(os.getenv("TARGET_FPS", "3"))
- MAX_FRAMES = int(os.getenv("MAX_FRAMES", "30"))
- MAX_INPUT_TOKENS = int(os.getenv("MAX_INPUT_TOKENS", "10_000"))
-
-
- def get_file_type(path: str) -> str:
-     if path.endswith(IMAGE_FILE_TYPES):
-         return "image"
-     if path.endswith(VIDEO_FILE_TYPES):
-         return "video"
-     if path.endswith(AUDIO_FILE_TYPES):
-         return "audio"
-     error_message = f"Unsupported file type: {path}"
-     raise ValueError(error_message)
-
-
- def count_files_in_new_message(paths: list[str]) -> tuple[int, int]:
-     video_count = 0
-     non_video_count = 0
-     for path in paths:
-         if path.endswith(VIDEO_FILE_TYPES):
-             video_count += 1
-         else:
-             non_video_count += 1
-     return video_count, non_video_count
-
-
- def validate_media_constraints(message: dict) -> bool:
-     video_count, non_video_count = count_files_in_new_message(message["files"])
-     if video_count > 1:
-         gr.Warning("Only one video is supported.")
-         return False
-     if video_count == 1 and non_video_count > 0:
-         gr.Warning("Mixing images and videos is not allowed.")
-         return False
-     return True
-
-
- def extract_frames_to_tempdir(
-     video_path: str,
-     target_fps: float,
-     max_frames: int | None = None,
-     parent_dir: str | None = None,
-     prefix: str = "frames_",
- ) -> str:
-     temp_dir = tempfile.mkdtemp(prefix=prefix, dir=parent_dir)
-
-     container = av.open(video_path)
-     video_stream = container.streams.video[0]
-
-     if video_stream.duration is None or video_stream.time_base is None:
-         raise ValueError("video_stream is missing duration or time_base")
-
-     time_base = video_stream.time_base
-     duration = float(video_stream.duration * time_base)
-     interval = 1.0 / target_fps
-
-     total_frames = int(duration * target_fps)
-     if max_frames is not None:
-         total_frames = min(total_frames, max_frames)
-
-     target_times = [i * interval for i in range(total_frames)]
-     target_index = 0
-
-     for frame in container.decode(video=0):
-         if frame.pts is None:
-             continue
-
-         timestamp = float(frame.pts * time_base)
-
-         if target_index < len(target_times) and abs(timestamp - target_times[target_index]) < (interval / 2):
-             frame_path = pathlib.Path(temp_dir) / f"frame_{target_index:04d}.jpg"
-             frame.to_image().save(frame_path)
-             target_index += 1
-
-         if max_frames is not None and target_index >= max_frames:
-             break
-
-     container.close()
-     return temp_dir
-
-
- def process_new_user_message(message: dict) -> list[dict]:
-     if not message["files"]:
-         return [{"type": "text", "text": message["text"]}]
-
-     file_types = [get_file_type(path) for path in message["files"]]
-
-     if len(file_types) == 1 and file_types[0] == "video":
-         gr.Info(f"Video will be processed at {TARGET_FPS} FPS, max {MAX_FRAMES} frames in this Space.")
-
-         temp_dir = extract_frames_to_tempdir(
-             message["files"][0],
-             target_fps=TARGET_FPS,
-             max_frames=MAX_FRAMES,
-         )
-         paths = sorted(pathlib.Path(temp_dir).glob("*.jpg"))
-         return [
-             {"type": "text", "text": message["text"]},
-             *[{"type": "image", "image": path.as_posix()} for path in paths],
-         ]
-
-     return [
-         {"type": "text", "text": message["text"]},
-         *[{"type": file_type, file_type: path} for path, file_type in zip(message["files"], file_types, strict=True)],
-     ]
-
-
- def process_history(history: list[dict]) -> list[dict]:
      messages = []
-     current_user_content: list[dict] = []
      for item in history:
          if item["role"] == "assistant":
-             if current_user_content:
-                 messages.append({"role": "user", "content": current_user_content})
-                 current_user_content = []
-             messages.append({"role": "assistant", "content": [{"type": "text", "text": item["content"]}]})
          else:
              content = item["content"]
              if isinstance(content, str):
-                 current_user_content.append({"type": "text", "text": content})
              else:
-                 filepath = content[0]
-                 file_type = get_file_type(filepath)
-                 current_user_content.append({"type": file_type, file_type: filepath})
-     return messages
-
-
- @spaces.GPU()
- @torch.inference_mode()
- def generate(message: dict, history: list[dict], system_prompt: str = "", max_new_tokens: int = 512) -> Iterator[str]:
-     if not validate_media_constraints(message):
-         yield ""
-         return
-
-     messages = []
-     if system_prompt:
-         messages.append({"role": "system", "content": [{"type": "text", "text": system_prompt}]})
-     messages.extend(process_history(history))
-     messages.append({"role": "user", "content": process_new_user_message(message)})
-
-     inputs = processor.apply_chat_template(
          messages,
          add_generation_prompt=True,
-         tokenize=True,
-         return_dict=True,
-         return_tensors="pt",
      )
-     n_tokens = inputs["input_ids"].shape[1]
      if n_tokens > MAX_INPUT_TOKENS:
          gr.Warning(
              f"Input too long. Max {MAX_INPUT_TOKENS} tokens. Got {n_tokens} tokens. This limit is set to avoid CUDA out-of-memory errors in this Space."
@@ -182,36 +74,77 @@ def generate(message: dict, history: list[dict], system_prompt: str = "", max_ne
          yield ""
          return

-     inputs = inputs.to(device=model.device, dtype=torch.bfloat16)
-
-     streamer = TextIteratorStreamer(processor, timeout=30.0, skip_prompt=True, skip_special_tokens=True)
      generate_kwargs = dict(
-         inputs,
          streamer=streamer,
          max_new_tokens=max_new_tokens,
          do_sample=True,
-         temperature=1.0,
-         top_k=64,
-         top_p=0.95,
          min_p=0.0,
-         repetition_penalty=1.0,
-         disable_compile=True,
      )
      t = Thread(target=model.generate, kwargs=generate_kwargs)
      t.start()

      output = ""
      for delta in streamer:
          output += delta
-         yield output

- # Examples for the chat interface (with additional inputs: system_prompt, max_new_tokens)
  examples = [
-     ["What is the capital of France?", "You are a helpful assistant.", 700],
-     ["Explain quantum computing in simple terms", "You are a helpful assistant.", 512],
-     ["Write a short story about a robot learning to paint", "You are a helpful assistant.", 1000]
  ]

  system_prompt = (
@@ -224,17 +157,27 @@ system_prompt = (
  demo = gr.ChatInterface(
      fn=generate,
      type="messages",
-     textbox=gr.MultimodalTextbox(
-         file_types=list(IMAGE_FILE_TYPES + VIDEO_FILE_TYPES + AUDIO_FILE_TYPES),
-         file_count="multiple",
          autofocus=True,
      ),
-     multimodal=True,
      additional_inputs=[
          gr.Textbox(label="System Prompt", value=system_prompt),
-         gr.Slider(label="Max New Tokens", minimum=100, maximum=2048, step=10, value=2048),
      ],
-     title="Shako IRAQI AI",
      examples=examples,
      stop_btn=False,
      css="""

  import os
  from collections.abc import Iterator
  from threading import Thread

  import gradio as gr
  import spaces
  import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
  from transformers.generation.streamers import TextIteratorStreamer

+ # Model configuration - changed to Qwen3-14B
+ model_id = "Qwen/Qwen3-14B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
      model_id,
      device_map="auto",
      torch_dtype=torch.bfloat16
  )

+ # Settings
+ MAX_INPUT_TOKENS = int(os.getenv("MAX_INPUT_TOKENS", "32_000"))


+ @spaces.GPU()
+ @torch.inference_mode()
+ def generate(message: dict, history: list[dict], system_prompt: str = "", max_new_tokens: int = 512, enable_thinking: bool = True) -> Iterator[str]:
+     # Build messages for Qwen3 (text-only format)
      messages = []
+     if system_prompt:
+         messages.append({"role": "system", "content": system_prompt})
+
+     # Process history - convert to simple text format
+     # Note: don't include thinking content in history (best practice)
      for item in history:
          if item["role"] == "assistant":
+             # Keep only the response part (without thinking content)
+             content = item["content"]
+             if "**🤔 Thinking Process:**" in content:
+                 parts = content.split("**💬 Response:**")
+                 if len(parts) > 1:
+                     content = parts[1].strip()
+             messages.append({"role": "assistant", "content": content})
          else:
+             # Extract text from the user message
              content = item["content"]
              if isinstance(content, str):
+                 messages.append({"role": "user", "content": content})
              else:
+                 # Non-string content (legacy file turns) carries no usable text for a text-only model
+                 messages.append({"role": "user", "content": ""})
+
+     # Add the current user message
+     current_message = message.get("text", "")
+     messages.append({"role": "user", "content": current_message})
+
+     # Apply the chat template with the enable_thinking parameter
+     # Note: when enable_thinking=True, the model supports /think and /no_think soft switches
+     text = tokenizer.apply_chat_template(
          messages,
+         tokenize=False,
          add_generation_prompt=True,
+         enable_thinking=enable_thinking
      )
+
+     model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+     n_tokens = model_inputs["input_ids"].shape[1]
+
      if n_tokens > MAX_INPUT_TOKENS:
          gr.Warning(
              f"Input too long. Max {MAX_INPUT_TOKENS} tokens. Got {n_tokens} tokens. This limit is set to avoid CUDA out-of-memory errors in this Space."
          yield ""
          return

+     # Set generation parameters based on mode
+     if enable_thinking:
+         # Thinking mode: Temperature=0.6, TopP=0.95, TopK=20, MinP=0
+         # Do NOT use greedy decoding (temperature=0); it degrades quality
+         temperature = 0.6
+         top_p = 0.95
+         top_k = 20
+     else:
+         # Non-thinking mode: Temperature=0.7, TopP=0.8, TopK=20, MinP=0
+         temperature = 0.7
+         top_p = 0.8
+         top_k = 20
+
+     streamer = TextIteratorStreamer(tokenizer, timeout=30.0, skip_prompt=True, skip_special_tokens=False)
      generate_kwargs = dict(
+         **model_inputs,
          streamer=streamer,
          max_new_tokens=max_new_tokens,
          do_sample=True,
+         temperature=temperature,
+         top_k=top_k,
+         top_p=top_p,
          min_p=0.0,
      )
      t = Thread(target=model.generate, kwargs=generate_kwargs)
      t.start()

      output = ""
+     thinking_content = ""
+     response_content = ""
+
      for delta in streamer:
          output += delta
+
+         # Parse thinking content if in thinking mode.
+         # When enable_thinking=True, the model always outputs a <think>...</think> block
+         # (even if empty when the /no_think soft switch is used).
+         if enable_thinking and "<think>" in output:
+             if "</think>" in output:
+                 # Extract the thinking and response parts
+                 try:
+                     think_start = output.index("<think>") + len("<think>")
+                     think_end = output.index("</think>")
+                     thinking_content = output[think_start:think_end].strip()
+                     response_content = output[think_end + len("</think>"):].strip()
+
+                     # Display formatted output
+                     if thinking_content:
+                         # Thinking content exists (the user didn't use /no_think, or used /think)
+                         formatted_output = f"**🤔 Thinking Process:**\n{thinking_content}\n\n**💬 Response:**\n{response_content}"
+                     else:
+                         # Empty thinking block (the user used the /no_think soft switch)
+                         formatted_output = f"**💬 Response:**\n{response_content}"
+
+                     yield formatted_output
+                 except ValueError:
+                     # Still parsing, yield the raw output
+                     yield output
+             else:
+                 # Still generating thinking content
+                 yield output
+         else:
+             # Non-thinking mode, or no <think> tag yet
+             yield output


+ # Examples for the chat interface (with additional inputs: system_prompt, max_new_tokens, enable_thinking)
  examples = [
+     ["What is the capital of France? /no_think", "You are a helpful assistant.", 700, True],
+     ["Explain quantum computing in simple terms", "You are a helpful assistant.", 512, False],
+     ["Solve this math problem: If x^2 + 5x + 6 = 0, what are the values of x? /think", "You are a helpful assistant.", 2000, True]
  ]

  system_prompt = (
  demo = gr.ChatInterface(
      fn=generate,
      type="messages",
+     textbox=gr.Textbox(
+         placeholder="Type your message here...",
          autofocus=True,
      ),
+     multimodal=False,  # Qwen3-14B is text-only
      additional_inputs=[
          gr.Textbox(label="System Prompt", value=system_prompt),
+         gr.Slider(label="Max New Tokens", minimum=100, maximum=32768, step=100, value=2048),
+         gr.Checkbox(label="Enable Thinking Mode", value=True, info="Enable for complex reasoning tasks (math, coding). Disable for faster general chat."),
      ],
+     title="Qwen3-14B Iraqi Chatbot with Thinking Mode",
+     description="""
+ 🤔 **Thinking Mode ON**: Better for math, coding, and complex reasoning
+ 💬 **Thinking Mode OFF**: Faster responses for general conversation
+
+ **💡 Pro Tip**: When Thinking Mode is enabled, you can use:
+ - `/think` in your message to force thinking for that turn
+ - `/no_think` in your message to skip thinking for that turn
+
+ Example: "Solve this equation: x^2 + 5x + 6 = 0 /think"
+ """,
      examples=examples,
      stop_btn=False,
      css="""
requirements.txt CHANGED
@@ -1,8 +1,5 @@
  gradio>=4.0.0
  spaces[huggingface]>=0.28.0
- transformers>=4.35.0
  torch>=2.1.0
- av
- accelerate>=0.25.0
- timm
- gTTS>=2.5.0

  gradio>=4.0.0
  spaces[huggingface]>=0.28.0
+ transformers>=4.51.0
  torch>=2.1.0
+ accelerate>=0.25.0