Chatbox2 - Qwen3-14B Update

Summary of Changes

Your chatbox has been successfully upgraded to use Qwen3-14B with thinking/non-thinking mode capabilities!

What Changed

1. Model Upgrade

  • Old Model: anaspro/Shako-iraqi-4B-it (multimodal)
  • New Model: Qwen/Qwen3-14B (text-only with thinking capabilities)

2. New Features

Thinking Mode Toggle 🤔

You can now switch between two modes:

  • Thinking Mode ON (default):

    • Best for: Math problems, coding, complex reasoning
    • The model shows its reasoning process in <think>...</think> tags
    • Uses Temperature=0.6, TopP=0.95, TopK=20
    • More detailed and thorough responses
  • Thinking Mode OFF:

    • Best for: General conversation, quick responses
    • Faster responses without showing reasoning
    • Uses Temperature=0.7, TopP=0.8, TopK=20
    • More efficient for casual chat
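
The two modes map onto two sampling presets. A minimal sketch of how they might be kept side by side; the dictionary name and structure are illustrative, not necessarily how app.py stores them:

# Sampling presets for the two modes (values from the Qwen3 recommendations in this document).
# SAMPLING_PRESETS is an illustrative name; the real app may organize this differently.
SAMPLING_PRESETS = {
    True:  {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},  # thinking mode
    False: {"temperature": 0.7, "top_p": 0.8,  "top_k": 20, "min_p": 0.0},  # non-thinking mode
}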

3. Updated Parameters

  • Max tokens increased from 2048 to 32768 (matching Qwen3's capabilities)
  • Optimized generation parameters based on mode
  • Removed multimodal support (images/videos) as Qwen3-14B is text-only

4. UI Improvements

  • Added checkbox to toggle thinking mode
  • Updated title and description
  • New examples showcasing both modes

How to Use

Basic Usage

  1. Type your message in the textbox
  2. Adjust settings in the sidebar:
    • System Prompt: Customize the AI's behavior (default: Iraqi dialect)
    • Max New Tokens: Control response length (100-32768)
    • Enable Thinking Mode: Toggle between thinking/non-thinking
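
A minimal sketch of how these sidebar controls can be wired up with gr.ChatInterface and additional_inputs. The labels mirror the settings above, but the callback body, the default system prompt text, and the slider default are placeholders, not the actual code in app.py:

import gradio as gr

def respond(message, history, system_prompt, max_new_tokens, enable_thinking):
    # Placeholder: the real app builds the Qwen3 prompt and generates the reply here.
    return f"(thinking={'on' if enable_thinking else 'off'}) {message}"

demo = gr.ChatInterface(
    fn=respond,
    additional_inputs=[
        gr.Textbox(label="System Prompt", value="Reply in Iraqi Arabic dialect."),  # default text is illustrative
        gr.Slider(label="Max New Tokens", minimum=100, maximum=32768, value=2048, step=1),
        gr.Checkbox(label="Enable Thinking Mode", value=True),
    ],
)
demo.launch()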

When to Use Thinking Mode

✅ Enable Thinking Mode for:

  • Math problems
  • Coding challenges
  • Complex logical reasoning
  • Step-by-step explanations
  • Problem-solving tasks

❌ Disable Thinking Mode for:

  • General conversation
  • Quick questions
  • Creative writing
  • Casual chat
  • When you need faster responses

Advanced: Soft Switching with /think and /no_think

When Enable Thinking Mode checkbox is ON, you can dynamically control thinking behavior per message using soft switches:

  • Add /think to your message to force thinking for that specific turn
  • Add /no_think to your message to skip thinking for that specific turn

Important Notes:

  • Soft switches only work when the "Enable Thinking Mode" checkbox is checked (ON)
  • When using /no_think, the model still outputs <think>...</think> tags, but they will be empty
  • The model follows the most recent instruction in multi-turn conversations
  • You can add the switch anywhere in your message (beginning or end)
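
Under the hood, the checkbox corresponds to the enable_thinking argument of the tokenizer's chat template, while the soft switch travels inside the user message itself. A minimal sketch, assuming the tokenizer loaded in the Model Configuration section below and an illustrative system_prompt variable:

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is the capital of France? /no_think"},  # soft switch inside the message
]

# enable_thinking=True corresponds to the checkbox being ON;
# with it set to False, /think and /no_think in the message are ignored.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)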

Examples:

User: What is the capital of France? /no_think
Bot: 💬 Response: Paris is the capital of France.

User: Solve this complex equation: x^3 + 2x^2 - 5x + 1 = 0 /think
Bot: 🤔 Thinking Process: Let me approach this step by step...
     💬 Response: The solutions are approximately...

User: How many r's in strawberry? /think
Bot: 🤔 Thinking Process: Let me count each letter: s-t-r-a-w-b-e-r-r-y...
     💬 Response: There are 3 r's in "strawberry".

User: What about blueberry? /no_think
Bot: 💬 Response: There are 2 r's in "blueberry".

User: Really? /think
Bot: 🤔 Thinking Process: Let me recount: b-l-u-e-b-e-r-r-y...
     💬 Response: Yes, there are 2 r's in "blueberry" (positions 7 and 8).

When Soft Switches Don't Work:

  • If "Enable Thinking Mode" checkbox is OFF, soft switches are ignored
  • The model will not generate any <think> tags regardless of /think or /no_think in your message
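
The split between the "🤔 Thinking Process" and "💬 Response" parts shown in the examples can be recovered from the raw output by stripping the <think>...</think> block. A minimal sketch, not necessarily how app.py implements it:

import re

def split_thinking(text: str):
    # Separate the <think>...</think> block (empty when /no_think is used) from the final answer.
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thinking, answer

thinking, answer = split_thinking('<think>Count each letter...</think>There are 3 r\'s in "strawberry".')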

Technical Details

Dependencies Updated

  • transformers>=4.51.0 (required for Qwen3 support)
  • Removed: av, timm, gTTS (no longer needed)

Model Configuration

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # place layers across the available GPU(s) automatically
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use
)

Generation Parameters

Thinking Mode:

  • Temperature: 0.6
  • Top-P: 0.95
  • Top-K: 20
  • Min-P: 0.0

Non-Thinking Mode:

  • Temperature: 0.7
  • Top-P: 0.8
  • Top-K: 20
  • Min-P: 0.0
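
Put together, the mode-specific settings are passed straight to generate(). A minimal sketch that assumes the model and tokenizer from Model Configuration above, plus the illustrative prompt, max_new_tokens, enable_thinking, and SAMPLING_PRESETS names from the earlier sketches:

# enable_thinking selects one of the two presets described above.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=max_new_tokens,  # up to 32768 in the UI
    do_sample=True,
    **SAMPLING_PRESETS[enable_thinking],
)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)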

Running the Application

python app.py

The app will launch on http://localhost:7860 by default.

Notes

  1. Text-Only: Qwen3-14B doesn't support images, videos, or audio. The multimodal features have been removed.

  2. Context Length: The model supports up to 32,768 tokens natively. For longer contexts (up to 131,072 tokens), you can enable YaRN scaling; see the Qwen3 documentation and the config sketch after these notes.

  3. Iraqi Dialect: The default system prompt is configured for Iraqi Arabic dialect. You can modify this in the System Prompt field.

  4. GPU Requirements: Qwen3-14B requires significant GPU memory. Make sure you have adequate resources.
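
For note 2, YaRN is configured through the model's rope_scaling setting. A minimal sketch of passing it at load time, based on the Qwen3 documentation's example values; verify the exact keys and factor against the official docs for your transformers version before relying on it:

# Illustrative only: extend the context window to ~131k tokens with YaRN scaling.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    max_position_embeddings=131072,
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,                              # 4 x 32,768 = 131,072 tokens
        "original_max_position_embeddings": 32768,
    },
)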

Reference

For more information about Qwen3-14B capabilities, see the Qwen/Qwen3-14B model card on Hugging Face and the official Qwen3 documentation.

Troubleshooting

Issue: KeyError: 'qwen3'
Solution: Make sure you have transformers>=4.51.0 installed

Issue: Out of memory errors
Solution: Reduce max_new_tokens or use a smaller batch size

Issue: Slow responses
Solution: Disable thinking mode for faster generation