Chatbox2 - Qwen3-14B Update

Summary of Changes

Your chatbox has been successfully upgraded to use Qwen3-14B with thinking/non-thinking mode capabilities!

What Changed

1. Model Upgrade

  • Old Model: anaspro/Shako-iraqi-4B-it (multimodal)
  • New Model: Qwen/Qwen3-14B (text-only with thinking capabilities)

2. New Features

Thinking Mode Toggle 🤔

You can now switch between two modes:

  • Thinking Mode ON (default):

    • Best for: Math problems, coding, complex reasoning
    • The model shows its reasoning process in <think>...</think> tags
    • Uses Temperature=0.6, TopP=0.95, TopK=20
    • More detailed and thorough responses
  • Thinking Mode OFF:

    • Best for: General conversation, quick responses
    • Faster responses without showing reasoning
    • Uses Temperature=0.7, TopP=0.8, TopK=20
    • More efficient for casual chat
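
The two modes map onto two sampling presets. A minimal sketch of how they might be kept side by side; the dictionary name and structure are illustrative, not necessarily how app.py stores them:

# Sampling presets for the two modes (values from the Qwen3 recommendations in this document).
# SAMPLING_PRESETS is an illustrative name; the real app may organize this differently.
SAMPLING_PRESETS = {
    True:  {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},  # thinking mode
    False: {"temperature": 0.7, "top_p": 0.8,  "top_k": 20, "min_p": 0.0},  # non-thinking mode
}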

3. Updated Parameters

  • Max tokens increased from 2048 to 32768 (matching Qwen3's capabilities)
  • Optimized generation parameters based on mode
  • Removed multimodal support (images/videos) as Qwen3-14B is text-only

4. UI Improvements

  • Added checkbox to toggle thinking mode
  • Updated title and description
  • New examples showcasing both modes

How to Use

Basic Usage

  1. Type your message in the textbox
  2. Adjust settings in the sidebar:
    • System Prompt: Customize the AI's behavior (default: Iraqi dialect)
    • Max New Tokens: Control response length (100-32768)
    • Enable Thinking Mode: Toggle between thinking/non-thinking
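
A minimal sketch of how these sidebar controls can be wired up with gr.ChatInterface and additional_inputs. The labels mirror the settings above, but the callback body, the default system prompt text, and the slider default are placeholders, not the actual code in app.py:

import gradio as gr

def respond(message, history, system_prompt, max_new_tokens, enable_thinking):
    # Placeholder: the real app builds the Qwen3 prompt and generates the reply here.
    return f"(thinking={'on' if enable_thinking else 'off'}) {message}"

demo = gr.ChatInterface(
    fn=respond,
    additional_inputs=[
        gr.Textbox(label="System Prompt", value="Reply in Iraqi Arabic dialect."),  # default text is illustrative
        gr.Slider(label="Max New Tokens", minimum=100, maximum=32768, value=2048, step=1),
        gr.Checkbox(label="Enable Thinking Mode", value=True),
    ],
)
demo.launch()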

When to Use Thinking Mode

✅ Enable Thinking Mode for:

  • Math problems
  • Coding challenges
  • Complex logical reasoning
  • Step-by-step explanations
  • Problem-solving tasks

❌ Disable Thinking Mode for:

  • General conversation
  • Quick questions
  • Creative writing
  • Casual chat
  • When you need faster responses

Advanced: Soft Switching with /think and /no_think

When Enable Thinking Mode checkbox is ON, you can dynamically control thinking behavior per message using soft switches:

  • Add /think to your message to force thinking for that specific turn
  • Add /no_think to your message to skip thinking for that specific turn

Important Notes:

  • Soft switches only work when the "Enable Thinking Mode" checkbox is checked (ON)
  • When using /no_think, the model still outputs <think>...</think> tags, but they will be empty
  • The model follows the most recent instruction in multi-turn conversations
  • You can add the switch anywhere in your message (beginning or end)
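
Under the hood, the checkbox corresponds to the enable_thinking argument of the tokenizer's chat template, while the soft switch travels inside the user message itself. A minimal sketch, assuming the tokenizer loaded in the Model Configuration section below and an illustrative system_prompt variable:

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is the capital of France? /no_think"},  # soft switch inside the message
]

# enable_thinking=True corresponds to the checkbox being ON;
# with it set to False, /think and /no_think in the message are ignored.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)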

Examples:

User: What is the capital of France? /no_think
Bot: 💬 Response: Paris is the capital of France.

User: Solve this complex equation: x^3 + 2x^2 - 5x + 1 = 0 /think
Bot: 🤔 Thinking Process: Let me approach this step by step...
     💬 Response: The solutions are approximately...

User: How many r's in strawberry? /think
Bot: 🤔 Thinking Process: Let me count each letter: s-t-r-a-w-b-e-r-r-y...
     💬 Response: There are 3 r's in "strawberry".

User: What about blueberry? /no_think
Bot: 💬 Response: There are 2 r's in "blueberry".

User: Really? /think
Bot: 🤔 Thinking Process: Let me recount: b-l-u-e-b-e-r-r-y...
     💬 Response: Yes, there are 2 r's in "blueberry" (positions 7 and 8).

When Soft Switches Don't Work:

  • If "Enable Thinking Mode" checkbox is OFF, soft switches are ignored
  • The model will not generate any <think> tags regardless of /think or /no_think in your message
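
The split between the "🤔 Thinking Process" and "💬 Response" parts shown in the examples can be recovered from the raw output by stripping the <think>...</think> block. A minimal sketch, not necessarily how app.py implements it:

import re

def split_thinking(text: str):
    # Separate the <think>...</think> block (empty when /no_think is used) from the final answer.
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thinking, answer

thinking, answer = split_thinking('<think>Count each letter...</think>There are 3 r\'s in "strawberry".')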

Technical Details

Dependencies Updated

  • transformers>=4.51.0 (required for Qwen3 support)
  • Removed: av, timm, gTTS (no longer needed)

Model Configuration

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # place layers across the available GPU(s) automatically
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use
)

Generation Parameters

Thinking Mode:

  • Temperature: 0.6
  • Top-P: 0.95
  • Top-K: 20
  • Min-P: 0.0

Non-Thinking Mode:

  • Temperature: 0.7
  • Top-P: 0.8
  • Top-K: 20
  • Min-P: 0.0
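
Put together, the mode-specific settings are passed straight to generate(). A minimal sketch that assumes the model and tokenizer from Model Configuration above, plus the illustrative prompt, max_new_tokens, enable_thinking, and SAMPLING_PRESETS names from the earlier sketches:

# enable_thinking selects one of the two presets described above.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=max_new_tokens,  # up to 32768 in the UI
    do_sample=True,
    **SAMPLING_PRESETS[enable_thinking],
)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)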

Running the Application

python app.py

The app will launch on http://localhost:7860 by default.

Notes

  1. Text-Only: Qwen3-14B doesn't support images, videos, or audio. The multimodal features have been removed.

  2. Context Length: The model supports up to 32,768 tokens natively. For longer contexts (up to 131,072 tokens), you can enable YaRN scaling; see the Qwen3 documentation and the config sketch after these notes.

  3. Iraqi Dialect: The default system prompt is configured for Iraqi Arabic dialect. You can modify this in the System Prompt field.

  4. GPU Requirements: Qwen3-14B requires significant GPU memory. Make sure you have adequate resources.
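
For note 2, YaRN is configured through the model's rope_scaling setting. A minimal sketch of passing it at load time, based on the Qwen3 documentation's example values; verify the exact keys and factor against the official docs for your transformers version before relying on it:

# Illustrative only: extend the context window to ~131k tokens with YaRN scaling.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    max_position_embeddings=131072,
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,                              # 4 x 32,768 = 131,072 tokens
        "original_max_position_embeddings": 32768,
    },
)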

Reference

For more information about Qwen3-14B capabilities, see the Qwen/Qwen3-14B model card on Hugging Face and the official Qwen3 documentation.

Troubleshooting

Issue: KeyError: 'qwen3'
Solution: Make sure you have transformers>=4.51.0 installed

Issue: Out of memory errors
Solution: Reduce max_new_tokens or use a smaller batch size

Issue: Slow responses
Solution: Disable thinking mode for faster generation