Qwen3 Non-Thinking Tokenizer

This is a modified tokenizer for Qwen3 models that always uses non-thinking mode.

Purpose

The original Qwen3 tokenizer supports both thinking and non-thinking modes via the enable_thinking parameter of apply_chat_template. This tokenizer removes the thinking mode from the chat template entirely, so the model responds directly without emitting an internal reasoning block.
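For comparison, here is a minimal sketch of how the same non-thinking prompt is obtained with the stock tokenizer versus this one (repo IDs as in the Usage section below; that the two prompts come out identical is an expectation about the modified chat template, not something verified here):

from transformers import AutoTokenizer

messages = [{"role": "user", "content": "Hello!"}]

# Stock Qwen3 tokenizer: thinking is the default and must be disabled per call
stock = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
prompt_stock = stock.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

# This tokenizer: the chat template is always non-thinking, no flag to pass
non_thinking = AutoTokenizer.from_pretrained("tuandunghcmut/qwen3-tokenizer-non-thinking")
prompt_non_thinking = non_thinking.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)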

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load non-thinking tokenizer
tokenizer = AutoTokenizer.from_pretrained("tuandunghcmut/qwen3-tokenizer-non-thinking")

# Load any Qwen3 model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Note: No enable_thinking parameter needed - always non-thinking

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
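
If only the model's reply is wanted (without the echoed prompt), the prompt tokens can be sliced off before decoding; this is a standard transformers pattern rather than anything specific to this tokenizer:

# Decode only the newly generated tokens, dropping the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))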

Key Differences from Original

| Feature | Original Qwen3 Tokenizer | This Tokenizer |
|---|---|---|
| enable_thinking=True | Model thinks before responding | N/A |
| enable_thinking=False | Direct response (no thinking) | Always this mode |
| Default behavior | Thinking enabled | Non-thinking only |

Base Model

This tokenizer is based on {base_model}.

License

Apache 2.0 (same as base model)
