# Chatbox2 - Qwen3-14B Update

## Summary of Changes

Your chatbox has been upgraded to **Qwen3-14B**, with support for thinking and non-thinking modes!

## What Changed

### 1. **Model Upgrade**
- **Old Model**: `anaspro/Shako-iraqi-4B-it` (multimodal)
- **New Model**: `Qwen/Qwen3-14B` (text-only with thinking capabilities)

### 2. **New Features**

#### **Thinking Mode Toggle** 🤔
You can now switch between two modes:

- **Thinking Mode ON** (default):
  - Best for: math problems, coding, complex reasoning
  - The model shows its reasoning process in `<think>...</think>` tags
  - Uses Temperature=0.6, Top-P=0.95, Top-K=20
  - Produces more detailed and thorough responses

- **Thinking Mode OFF**:
  - Best for: general conversation, quick responses
  - Faster responses without visible reasoning
  - Uses Temperature=0.7, Top-P=0.8, Top-K=20
  - More efficient for casual chat

### 3. **Updated Parameters**
- Max tokens increased from 2048 to 32768 (matching Qwen3's native context length)
- Generation parameters optimized per mode
- Removed multimodal support (images/videos), since Qwen3-14B is text-only

### 4. **UI Improvements**
- Added a checkbox to toggle thinking mode
- Updated title and description
- New examples showcasing both modes

## How to Use

### Basic Usage
1. Type your message in the textbox
2. Adjust settings in the sidebar:
   - **System Prompt**: Customize the AI's behavior (default: Iraqi dialect)
   - **Max New Tokens**: Control response length (100-32768)
   - **Enable Thinking Mode**: Toggle between thinking and non-thinking modes

### When to Use Thinking Mode

✅ **Enable Thinking Mode for:**
- Math problems
- Coding challenges
- Complex logical reasoning
- Step-by-step explanations
- Problem-solving tasks

❌ **Disable Thinking Mode for:**
- General conversation
- Quick questions
- Creative writing
- Casual chat
- When you need faster responses

### Advanced: Soft Switching with `/think` and `/no_think`

When the **Enable Thinking Mode** checkbox is ON, you can control thinking behavior per message with soft switches:

- Add `/think` to your message to **force thinking** for that turn
- Add `/no_think` to your message to **skip thinking** for that turn

**Important Notes:**
- Soft switches only work when the "Enable Thinking Mode" checkbox is checked (ON)
- With `/no_think`, the model still outputs `<think>...</think>` tags, but they are empty
- In multi-turn conversations, the model follows the most recent switch
- You can place the switch anywhere in your message (beginning or end)

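Qwen3 reads the soft switches from the prompt text itself, but the app also needs to know the effective mode for a turn, e.g. to pick the matching sampling preset. A minimal sketch of that rule (`resolve_thinking` is a hypothetical helper name, not necessarily what the app uses):

```python
# Sketch of per-turn soft-switch resolution. Hypothetical helper:
# Qwen3 interprets /think and /no_think on its own; this only mirrors
# the rule so the app can choose the matching sampling preset.

def resolve_thinking(message: str, checkbox_on: bool) -> bool:
    """Return True if thinking is in effect for this turn."""
    if not checkbox_on:
        # Checkbox OFF: soft switches are ignored entirely.
        return False
    if "/no_think" in message:
        return False
    if "/think" in message:
        return True
    # No switch present: keep the checkbox default (thinking ON).
    return True
```

Note that the checkbox check comes first, matching the rule above that switches only apply while the checkbox is ON.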
**Examples:**

```
User: What is the capital of France? /no_think
Bot: 💬 Response: Paris is the capital of France.
```

```
User: Solve this complex equation: x^3 + 2x^2 - 5x + 1 = 0 /think
Bot: 🤔 Thinking Process: Let me approach this step by step...
💬 Response: The solutions are approximately...
```

```
User: How many r's in strawberry? /think
Bot: 🤔 Thinking Process: Let me count each letter: s-t-r-a-w-b-e-r-r-y...
💬 Response: There are 3 r's in "strawberry".

User: What about blueberry? /no_think
Bot: 💬 Response: There are 2 r's in "blueberry".

User: Really? /think
Bot: 🤔 Thinking Process: Let me recount: b-l-u-e-b-e-r-r-y...
💬 Response: Yes, there are 2 r's in "blueberry" (positions 7 and 8).
```

**When Soft Switches Don't Work:**
- If the "Enable Thinking Mode" checkbox is OFF, soft switches are ignored
- The model will not generate any `<think>` tags regardless of `/think` or `/no_think` in your message

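The 🤔/💬 rendering in the examples implies the app splits the raw completion on the `<think>...</think>` block. A minimal sketch of that split, assuming at most one (possibly empty) think block at the start of the output; the app's actual parsing may differ:

```python
import re

def split_thinking(raw: str) -> tuple[str, str]:
    """Split raw model output into (thinking, response).

    Assumes at most one <think>...</think> block at the start of the
    output; the block may be empty when /no_think is used.
    """
    match = re.match(r"\s*<think>(.*?)</think>\s*", raw, flags=re.DOTALL)
    if match:
        thinking = match.group(1).strip()
        response = raw[match.end():].strip()
        return thinking, response
    # No think block at all (e.g. checkbox OFF): everything is response.
    return "", raw.strip()
```

An empty first element then tells the UI to show only the 💬 line, as in the `/no_think` examples above.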
## Technical Details

### Dependencies Updated
- `transformers>=4.51.0` (required for Qwen3 support)
- Removed: `av`, `timm`, `gTTS` (no longer needed)

### Model Configuration
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```

### Generation Parameters

**Thinking Mode:**
- Temperature: 0.6
- Top-P: 0.95
- Top-K: 20
- Min-P: 0.0

**Non-Thinking Mode:**
- Temperature: 0.7
- Top-P: 0.8
- Top-K: 20
- Min-P: 0.0

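The two presets can be captured in a small lookup table. This is a sketch, assuming the values are passed as keyword arguments to Hugging Face's `generate()` (the app's actual wiring may differ):

```python
# Sampling presets mirroring the tables above.
GENERATION_PRESETS = {
    "thinking": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "non_thinking": {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0},
}

def sampling_params(thinking_enabled: bool) -> dict:
    """Return generate() kwargs for the current mode."""
    mode = "thinking" if thinking_enabled else "non_thinking"
    # do_sample=True is required for temperature/top-p/top-k to take
    # effect in transformers' generate().
    return dict(GENERATION_PRESETS[mode], do_sample=True)
```

Keeping the presets in one table makes it harder for the two modes' parameters to drift apart as the app evolves.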
## Running the Application

```bash
python app.py
```

The app will launch at `http://localhost:7860` by default.

## Notes

1. **Text-Only**: Qwen3-14B doesn't support images, videos, or audio; the multimodal features have been removed.

2. **Context Length**: The model supports up to 32,768 tokens natively. For longer contexts (up to 131,072 tokens), you can enable YaRN scaling (see the Qwen3 documentation).

3. **Iraqi Dialect**: The default system prompt is configured for Iraqi Arabic. You can change it in the System Prompt field.

4. **GPU Requirements**: Qwen3-14B needs significant GPU memory (the bfloat16 weights alone are roughly 28 GB); make sure you have adequate resources.

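For note 2 above: per the Qwen3 model card, YaRN scaling can be enabled by adding a `rope_scaling` block to the model's `config.json` (fragment below; a `factor` of 4.0 extends the 32,768-token native window to roughly 131,072). Check the official Qwen3 documentation for the exact settings for your setup:

```json
"rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
}
```

Static YaRN applies the same scaling to all inputs, which can slightly hurt quality on short texts, so enable it only when you actually need long contexts.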
## Reference

For more information about Qwen3-14B's capabilities, see:
- Model page: https://huggingface.co/Qwen/Qwen3-14B
- Documentation: https://qwenlm.github.io/blog/qwen3/

## Troubleshooting

**Issue**: `KeyError: 'qwen3'`
**Solution**: Make sure `transformers>=4.51.0` is installed

**Issue**: Out-of-memory errors
**Solution**: Reduce `max_new_tokens`, or load the model in a lower-precision or quantized format

**Issue**: Slow responses
**Solution**: Disable thinking mode for faster generation