Qwen3.5-4B-Safety-Thinking
4B parameters • up to 1M-token context • Safety reasoning
🤗 Model | [📖 arXiv in progress]
Model Overview
This model has been specifically optimized to excel in several key areas:
- Structured Reasoning Quality: Enhanced ability to break down complex problems and think step-by-step.
- Instruction Adherence: Superior capability to follow strict guidelines and constraints provided in prompts.
- Safety-Aligned Behavior: Designed to operate safely in practical assistant and autonomous agent workflows.
- Robustness: Increased resistance against common misalignment patterns and adversarial inputs.
It leverages a rigorous post-training stack that combines supervised reasoning tuning with alignment-oriented optimization, focusing heavily on reliable behavior in real-world applications.
Training Approach
- Base Model: Qwen/Qwen3.5-4B
- Methodology: LoRA-based Supervised Fine-Tuning (SFT), merged into a BF16 checkpoint.
- Reasoning Architecture: Native support and normalization for the `<think>...</think>` format, explicitly separating the reasoning process from the final output.
- Optimization Focus: Enhancing safety reasoning, maximizing controllability, and ensuring response consistency.
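On the client side, the `<think>...</think>` separation described above can be handled with simple post-processing. A minimal sketch (the tag names follow this card; the helper function is hypothetical, not part of the release):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning, final_answer).

    Assumes the model wraps its chain of thought in <think>...</think>,
    as described in this card; everything outside the tags is treated
    as the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning block emitted; return the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer

response = "<think>The user asks 2+2. That is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(response)
```

This keeps the reasoning trace available for inspection (useful in safety audits) while showing only the final answer to end users.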
Data
This model was trained on private Merlin Research datasets built from internal R&D pipelines for:
- reasoning reliability improvements,
- instruction-following robustness,
- safety behavior refinement,
- misalignment reduction in applied scenarios.
- behavioral alignment auditing using Anthropic's Bloom & Petri framework (https://www.anthropic.com/research/petri-open-source-auditing).
Intended Use Cases
This model is particularly well-suited for:
- Building safety-oriented reasoning assistants and chatbots.
- Tasks requiring strict, constrained instruction-following.
- Experimentation in AI alignment, safety research, and robustness testing.
- Agentic workflows where predictable and safe autonomous behavior is required.
GGUF Status
GGUF artifacts are currently in active development and validation.
At this stage, we recommend using the BF16 Transformers checkpoint for stable results. Updated and fully validated GGUF builds will be published in future releases.
For Ollama
```shell
ollama create qwen35-safety-thinking-bf16 -f Modelfile
ollama run qwen35-safety-thinking-bf16
```
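The `ollama create` command above expects a Modelfile alongside the model weights. A minimal hypothetical example (the filename and parameter values are placeholders, not files shipped with this repository):

```
# Hypothetical Modelfile; the GGUF filename is a placeholder.
FROM ./qwen3.5-4b-safety-thinking.gguf
PARAMETER temperature 0.6
SYSTEM "You are a helpful, safety-conscious assistant."
```

Note that per the GGUF Status section above, validated GGUF artifacts are not yet published, so this path applies once they are released.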
Organization
Designed, developed, and maintained with ❤️ by Merlin Research.
Citation
If you use this model in your research or applications, please cite it as follows:
```bibtex
@misc{qwen3.5-4b-safety-thinking,
  author       = {Merlin Research},
  title        = {Qwen3.5-4B-Safety-Thinking: A Reasoning and Safety Aligned Model},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MerlinSafety/Qwen3.5-4B-Safety-Thinking}},
  note         = {Base model: Qwen/Qwen3.5-4B}
}
```