Chatbox2 - Qwen3-14B Update
Summary of Changes
Your chatbox has been successfully upgraded to use Qwen3-14B with thinking/non-thinking mode capabilities!
What Changed
1. Model Upgrade
- Old Model: `anaspro/Shako-iraqi-4B-it` (multimodal)
- New Model: `Qwen/Qwen3-14B` (text-only, with thinking capabilities)
2. New Features
Thinking Mode Toggle 🤔
You can now switch between two modes:
Thinking Mode ON (default):
- Best for: Math problems, coding, complex reasoning
- The model shows its reasoning process in `<think>...</think>` tags
- Uses Temperature=0.6, Top-P=0.95, Top-K=20
- More detailed and thorough responses
Thinking Mode OFF:
- Best for: General conversation, quick responses
- Faster responses without showing reasoning
- Uses Temperature=0.7, Top-P=0.8, Top-K=20
- More efficient for casual chat
3. Updated Parameters
- Max tokens increased from 2048 to 32768 (matching Qwen3's capabilities)
- Optimized generation parameters based on mode
- Removed multimodal support (images/videos) as Qwen3-14B is text-only
4. UI Improvements
- Added checkbox to toggle thinking mode
- Updated title and description
- New examples showcasing both modes
How to Use
Basic Usage
- Type your message in the textbox
- Adjust settings in the sidebar:
- System Prompt: Customize the AI's behavior (default: Iraqi dialect)
- Max New Tokens: Control response length (100-32768)
- Enable Thinking Mode: Toggle between thinking/non-thinking
When to Use Thinking Mode
✅ Enable Thinking Mode for:
- Math problems
- Coding challenges
- Complex logical reasoning
- Step-by-step explanations
- Problem-solving tasks
❌ Disable Thinking Mode for:
- General conversation
- Quick questions
- Creative writing
- Casual chat
- When you need faster responses
Advanced: Soft Switching with /think and /no_think
When Enable Thinking Mode checkbox is ON, you can dynamically control thinking behavior per message using soft switches:
- Add `/think` to your message to force thinking for that specific turn
- Add `/no_think` to your message to skip thinking for that specific turn
Important Notes:
- Soft switches only work when the "Enable Thinking Mode" checkbox is checked (ON)
- When using `/no_think`, the model still outputs `<think>...</think>` tags, but they will be empty
- The model follows the most recent instruction in multi-turn conversations
- You can add the switch anywhere in your message (beginning or end)
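The checkbox-plus-soft-switch rules above can be sketched as a small resolver. This is a hypothetical helper (the function name and signature are ours, not taken from the app's source): the hard checkbox overrides everything, and otherwise the most recent soft switch across the conversation wins.

```python
import re

def resolve_thinking(messages: list[str], thinking_enabled: bool) -> bool:
    """Decide whether the next turn should use thinking mode."""
    if not thinking_enabled:
        return False  # soft switches are ignored when the checkbox is OFF
    mode = True  # thinking is the default when the checkbox is ON
    for text in messages:  # scan turns in order; the latest switch wins
        if re.search(r"/no_think\b", text):
            mode = False
        elif re.search(r"/think\b", text):
            mode = True
    return mode
```

For example, `resolve_thinking(["Hi /no_think", "Solve this /think"], True)` yields `True`, because the later `/think` overrides the earlier `/no_think`.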
Examples:
```
User: What is the capital of France? /no_think
Bot: 💬 Response: Paris is the capital of France.

User: Solve this complex equation: x^3 + 2x^2 - 5x + 1 = 0 /think
Bot: 🤔 Thinking Process: Let me approach this step by step...
     💬 Response: The solutions are approximately...

User: How many r's in strawberry? /think
Bot: 🤔 Thinking Process: Let me count each letter: s-t-r-a-w-b-e-r-r-y...
     💬 Response: There are 3 r's in "strawberry".

User: What about blueberry? /no_think
Bot: 💬 Response: There are 2 r's in "blueberry".

User: Really? /think
Bot: 🤔 Thinking Process: Let me recount: b-l-u-e-b-e-r-r-y...
     💬 Response: Yes, there are 2 r's in "blueberry" (positions 7 and 8).
```
When Soft Switches Don't Work:
- If "Enable Thinking Mode" checkbox is OFF, soft switches are ignored
- The model will not generate any `<think>` tags regardless of `/think` or `/no_think` in your message
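Since the output may contain a filled `<think>` block, an empty one (with `/no_think`), or none at all (checkbox OFF), the UI has to split the raw text before displaying it. A minimal post-processing sketch (the helper name is illustrative, not the app's actual function):

```python
import re

def split_thinking(raw: str) -> tuple[str, str]:
    """Split raw model output into (thinking trace, visible response)."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()           # no thinking block was generated
    thinking = match.group(1).strip()    # may be empty with /no_think
    response = raw[match.end():].strip()
    return thinking, response
```

All three cases collapse cleanly: `split_thinking("<think>\n\n</think>Paris.")` returns `("", "Paris.")`, so an empty trace can simply be hidden in the UI.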
Technical Details
Dependencies Updated
- Updated: `transformers>=4.51.0` (required for Qwen3 support)
- Removed: `av`, `timm`, `gTTS` (no longer needed)
Model Configuration
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```
Generation Parameters
Thinking Mode:
- Temperature: 0.6
- Top-P: 0.95
- Top-K: 20
- Min-P: 0.0
Non-Thinking Mode:
- Temperature: 0.7
- Top-P: 0.8
- Top-K: 20
- Min-P: 0.0
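The two parameter sets above can be selected with a small helper and passed straight to `generate()`. A sketch under the assumption that the app wires it up this way (the function name is ours; the values are the ones listed above):

```python
def sampling_params(thinking: bool) -> dict:
    """Return sampling kwargs for the chosen mode."""
    if thinking:
        return {"do_sample": True, "temperature": 0.6,
                "top_p": 0.95, "top_k": 20, "min_p": 0.0}
    return {"do_sample": True, "temperature": 0.7,
            "top_p": 0.8, "top_k": 20, "min_p": 0.0}
```

Usage would look like `model.generate(**inputs, max_new_tokens=32768, **sampling_params(thinking=True))`; note that `min_p` requires a reasonably recent transformers release.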
Running the Application
```bash
python app.py
```
The app will launch on http://localhost:7860 by default.
Notes
Text-Only: Qwen3-14B doesn't support images, videos, or audio. The multimodal features have been removed.
Context Length: The model natively supports up to 32,768 tokens. For longer contexts (up to 131,072 tokens), you can enable YaRN scaling (see the Qwen3 documentation).
Iraqi Dialect: The default system prompt is configured for Iraqi Arabic dialect. You can modify this in the System Prompt field.
GPU Requirements: Qwen3-14B requires significant GPU memory. Make sure you have adequate resources.
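The YaRN scaling mentioned in the notes above is typically enabled through a `rope_scaling` entry in the model's `config.json`. The fragment below follows the shape shown on the Qwen3 model card (verify the exact keys against the current documentation before relying on it); a factor of 4.0 over the native 32,768 tokens gives the 131,072-token ceiling:

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```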
Reference
For more information about Qwen3-14B capabilities, visit:
- Model Page: https://huggingface.co/Qwen/Qwen3-14B
- Documentation: https://qwenlm.github.io/blog/qwen3/
Troubleshooting
Issue: KeyError: 'qwen3'
Solution: Make sure `transformers>=4.51.0` is installed, e.g. `pip install "transformers>=4.51.0"`
Issue: Out of memory errors
Solution: Reduce max_new_tokens or use a smaller batch size
Issue: Slow responses
Solution: Disable thinking mode for faster generation