# Qwen3 Non-Thinking Tokenizer
This is a modified tokenizer for Qwen3 models that always uses non-thinking mode.
## Purpose
The original Qwen3 tokenizer supports both thinking and non-thinking modes via the `enable_thinking` parameter.
This tokenizer removes the thinking mode entirely, so the model always responds directly instead of emitting an internal reasoning block first.
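To see what that toggle does in the stock tokenizer, you can render the same conversation in both modes and compare the prompt strings. The snippet below is a minimal sketch, assuming `Qwen/Qwen3-0.6B` as the reference checkpoint; the exact template text may vary between Qwen3 releases.

```python
from transformers import AutoTokenizer

# Stock Qwen3 tokenizer: thinking mode is toggled per call via enable_thinking.
stock = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

messages = [{"role": "user", "content": "Hello!"}]

thinking_prompt = stock.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
direct_prompt = stock.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

# The two renderings differ only in how the assistant turn is opened.
print(thinking_prompt)
print(direct_prompt)
```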
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the non-thinking tokenizer
tokenizer = AutoTokenizer.from_pretrained("tuandunghcmut/qwen3-tokenizer-non-thinking")

# Load any Qwen3 model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")

messages = [{"role": "user", "content": "Hello!"}]

# Note: no enable_thinking parameter needed - always non-thinking
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
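`generate` returns the prompt tokens followed by the completion, so decoding `outputs[0]` echoes the prompt as well. If you only want the model's reply, slice off the prompt before decoding:

```python
# Keep only the newly generated tokens, dropping the echoed prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```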
## Key Differences from Original
| Feature | Original Qwen3 Tokenizer | This Tokenizer |
|---|---|---|
| `enable_thinking=True` | Model thinks before responding | N/A |
| `enable_thinking=False` | Direct response (no thinking) | Always this mode |
| Default behavior | Thinking enabled | Non-thinking only |
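A quick sanity check on the table above is to render the same conversation with and without the `enable_thinking` argument and confirm the prompts match. This is a sketch that assumes the thinking branch was removed from the tokenizer's `chat_template`; keyword arguments that the template never references are simply ignored during rendering.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tuandunghcmut/qwen3-tokenizer-non-thinking")
messages = [{"role": "user", "content": "Hello!"}]

default_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
forced_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# With the thinking branch removed, the flag should make no difference.
print(default_prompt == forced_prompt)  # expected: True
```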
## Base Model
This tokenizer is based on {base_model}.
## License
Apache 2.0 (same as the base model)