# Chatbox2 - Qwen3-14B Update

## Summary of Changes

Your chatbox has been upgraded to **Qwen3-14B**, with support for thinking and non-thinking modes!

## What Changed

### 1. **Model Upgrade**
- **Old Model**: `anaspro/Shako-iraqi-4B-it` (multimodal)
- **New Model**: `Qwen/Qwen3-14B` (text-only with thinking capabilities)

### 2. **New Features**

#### **Thinking Mode Toggle** 🤔
You can now switch between two modes:

- **Thinking Mode ON** (default):
  - Best for: math problems, coding, complex reasoning
  - The model shows its reasoning process in `<think>...</think>` tags
  - Uses Temperature=0.6, Top-P=0.95, Top-K=20
  - Produces more detailed and thorough responses

- **Thinking Mode OFF**:
  - Best for: general conversation, quick responses
  - Faster responses without visible reasoning
  - Uses Temperature=0.7, Top-P=0.8, Top-K=20
  - More efficient for casual chat

### 3. **Updated Parameters**
- Max tokens increased from 2048 to 32768 (matching Qwen3's native context length)
- Generation parameters optimized per mode
- Removed multimodal support (images/videos), since Qwen3-14B is text-only

### 4. **UI Improvements**
- Added a checkbox to toggle thinking mode
- Updated title and description
- New examples showcasing both modes

## How to Use

### Basic Usage
1. Type your message in the textbox
2. Adjust settings in the sidebar:
   - **System Prompt**: Customize the AI's behavior (default: Iraqi dialect)
   - **Max New Tokens**: Control response length (100-32768)
   - **Enable Thinking Mode**: Toggle between thinking and non-thinking modes

### When to Use Thinking Mode

✅ **Enable Thinking Mode for:**
- Math problems
- Coding challenges
- Complex logical reasoning
- Step-by-step explanations
- Problem-solving tasks

❌ **Disable Thinking Mode for:**
- General conversation
- Quick questions
- Creative writing
- Casual chat
- When you need faster responses

### Advanced: Soft Switching with `/think` and `/no_think`

When the **Enable Thinking Mode** checkbox is ON, you can control thinking behavior per message with soft switches:

- Add `/think` to your message to **force thinking** for that turn
- Add `/no_think` to your message to **skip thinking** for that turn

**Important Notes:**
- Soft switches only work when the "Enable Thinking Mode" checkbox is checked (ON)
- With `/no_think`, the model still outputs `<think>...</think>` tags, but they are empty
- In multi-turn conversations, the model follows the most recent switch
- You can place the switch anywhere in your message (beginning or end)

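Qwen3 reads the soft switches from the prompt text itself, but the app also needs to know the effective mode for a turn, e.g. to pick the matching sampling preset. A minimal sketch of that rule (`resolve_thinking` is a hypothetical helper name, not necessarily what the app uses):

```python
# Sketch of per-turn soft-switch resolution. Hypothetical helper:
# Qwen3 interprets /think and /no_think on its own; this only mirrors
# the rule so the app can choose the matching sampling preset.

def resolve_thinking(message: str, checkbox_on: bool) -> bool:
    """Return True if thinking is in effect for this turn."""
    if not checkbox_on:
        # Checkbox OFF: soft switches are ignored entirely.
        return False
    if "/no_think" in message:
        return False
    if "/think" in message:
        return True
    # No switch present: keep the checkbox default (thinking ON).
    return True
```

Note that the checkbox check comes first, matching the rule above that switches only apply while the checkbox is ON.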
**Examples:**

```
User: What is the capital of France? /no_think
Bot: 💬 Response: Paris is the capital of France.
```

```
User: Solve this complex equation: x^3 + 2x^2 - 5x + 1 = 0 /think
Bot: 🤔 Thinking Process: Let me approach this step by step...
💬 Response: The solutions are approximately...
```

```
User: How many r's in strawberry? /think
Bot: 🤔 Thinking Process: Let me count each letter: s-t-r-a-w-b-e-r-r-y...
💬 Response: There are 3 r's in "strawberry".

User: What about blueberry? /no_think
Bot: 💬 Response: There are 2 r's in "blueberry".

User: Really? /think
Bot: 🤔 Thinking Process: Let me recount: b-l-u-e-b-e-r-r-y...
💬 Response: Yes, there are 2 r's in "blueberry" (positions 7 and 8).
```

**When Soft Switches Don't Work:**
- If the "Enable Thinking Mode" checkbox is OFF, soft switches are ignored
- The model will not generate any `<think>` tags regardless of `/think` or `/no_think` in your message

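The 🤔/💬 rendering in the examples implies the app splits the raw completion on the `<think>...</think>` block. A minimal sketch of that split, assuming at most one (possibly empty) think block at the start of the output; the app's actual parsing may differ:

```python
import re

def split_thinking(raw: str) -> tuple[str, str]:
    """Split raw model output into (thinking, response).

    Assumes at most one <think>...</think> block at the start of the
    output; the block may be empty when /no_think is used.
    """
    match = re.match(r"\s*<think>(.*?)</think>\s*", raw, flags=re.DOTALL)
    if match:
        thinking = match.group(1).strip()
        response = raw[match.end():].strip()
        return thinking, response
    # No think block at all (e.g. checkbox OFF): everything is response.
    return "", raw.strip()
```

An empty first element then tells the UI to show only the 💬 line, as in the `/no_think` examples above.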
## Technical Details

### Dependencies Updated
- `transformers>=4.51.0` (required for Qwen3 support)
- Removed: `av`, `timm`, `gTTS` (no longer needed)

### Model Configuration
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```

### Generation Parameters

**Thinking Mode:**
- Temperature: 0.6
- Top-P: 0.95
- Top-K: 20
- Min-P: 0.0

**Non-Thinking Mode:**
- Temperature: 0.7
- Top-P: 0.8
- Top-K: 20
- Min-P: 0.0

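The two presets can be captured in a small lookup table. This is a sketch, assuming the values are passed as keyword arguments to Hugging Face's `generate()` (the app's actual wiring may differ):

```python
# Sampling presets mirroring the tables above.
GENERATION_PRESETS = {
    "thinking": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "non_thinking": {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0},
}

def sampling_params(thinking_enabled: bool) -> dict:
    """Return generate() kwargs for the current mode."""
    mode = "thinking" if thinking_enabled else "non_thinking"
    # do_sample=True is required for temperature/top-p/top-k to take
    # effect in transformers' generate().
    return dict(GENERATION_PRESETS[mode], do_sample=True)
```

Keeping the presets in one table makes it harder for the two modes' parameters to drift apart as the app evolves.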
## Running the Application

```bash
python app.py
```

The app will launch at `http://localhost:7860` by default.

## Notes

1. **Text-Only**: Qwen3-14B doesn't support images, videos, or audio; the multimodal features have been removed.

2. **Context Length**: The model supports up to 32,768 tokens natively. For longer contexts (up to 131,072 tokens), you can enable YaRN scaling (see the Qwen3 documentation).

3. **Iraqi Dialect**: The default system prompt is configured for Iraqi Arabic. You can change it in the System Prompt field.

4. **GPU Requirements**: Qwen3-14B needs significant GPU memory (the bfloat16 weights alone are roughly 28 GB); make sure you have adequate resources.

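For note 2 above: per the Qwen3 model card, YaRN scaling can be enabled by adding a `rope_scaling` block to the model's `config.json` (fragment below; a `factor` of 4.0 extends the 32,768-token native window to roughly 131,072). Check the official Qwen3 documentation for the exact settings for your setup:

```json
"rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
}
```

Static YaRN applies the same scaling to all inputs, which can slightly hurt quality on short texts, so enable it only when you actually need long contexts.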
## Reference

For more information about Qwen3-14B's capabilities, see:
- Model page: https://huggingface.co/Qwen/Qwen3-14B
- Documentation: https://qwenlm.github.io/blog/qwen3/

## Troubleshooting

**Issue**: `KeyError: 'qwen3'`
**Solution**: Make sure `transformers>=4.51.0` is installed

**Issue**: Out-of-memory errors
**Solution**: Reduce `max_new_tokens`, or load the model in a lower-precision or quantized format

**Issue**: Slow responses
**Solution**: Disable thinking mode for faster generation