Qwen3.5-4B-Safety-Thinking

QwenMerlin

4B parameters • up to 1M-token context • Safety reasoning

🤗 Model | [📖 arXiv in progress]

Model Overview

[Figure: Qwen3.5 small-size model benchmark scores]

This model has been specifically optimized to excel in several key areas:

  • Structured Reasoning Quality: Enhanced ability to break down complex problems and think step-by-step.
  • Instruction Adherence: Superior capability to follow strict guidelines and constraints provided in prompts.
  • Safety-Aligned Behavior: Designed to operate safely in practical assistant and autonomous agent workflows.
  • Robustness: Increased resistance against common misalignment patterns and adversarial inputs.

It leverages a rigorous post-training stack that combines supervised reasoning tuning with alignment-oriented optimization, focusing heavily on reliable behavior in real-world applications.

Training Approach

  • Base Model: Qwen/Qwen3.5-4B
  • Methodology: LoRA-based Supervised Fine-Tuning (SFT) resulting in a merged BF16 checkpoint.
  • Reasoning Architecture: Native support and normalization for the <think>...</think> format to explicitly separate the reasoning process from the final output.
  • Optimization Focus: Enhancing safety reasoning, maximizing controllability, and ensuring response consistency.
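Because the model emits its reasoning inside <think>...</think>, downstream code typically strips that block before showing the final answer to a user. A minimal sketch (the tag names follow the format described above; the helper name is our own):

```python
import re

# Matches a single <think>...</think> reasoning block, including newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_think(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning, final_answer).

    Returns an empty reasoning string when no <think> block is present.
    """
    m = THINK_RE.search(text)
    if m is None:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()
```

This keeps the reasoning available for logging or safety auditing while only the final answer is surfaced.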

Data

This model was trained on Merlin Research private datasets built from internal R&D pipelines for:

  • reasoning reliability improvements,
  • instruction-following robustness,
  • safety behavior refinement,
  • misalignment reduction in applied scenarios,
  • behavioral alignment auditing with Anthropic's Bloom and Petri frameworks.

Petri: https://www.anthropic.com/research/petri-open-source-auditing

Intended Use Cases

This model is particularly well-suited for:

  • Building safety-oriented reasoning assistants and chatbots.
  • Tasks requiring strict, constrained instruction-following.
  • Experimentation in AI alignment, safety research, and robustness testing.
  • Agentic workflows where predictable and safe autonomous behavior is required.

GGUF Status

GGUF artifacts are currently in active development and validation.

At this stage, we recommend using the BF16 Transformers checkpoint for stable results. Updated and fully validated GGUF builds will be published in future releases.
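With the BF16 checkpoint, prompts follow the Qwen chat format. In practice you would let `tokenizer.apply_chat_template` assemble them, but the layout can be sketched by hand (assuming this fine-tune keeps the base model's ChatML-style template; the helper name is our own):

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt as used by the Qwen model family."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```

The trailing `<|im_start|>assistant\n` leaves the turn open so generation continues as the assistant, typically opening with a <think> block.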

For Ollama

ollama create qwen35-safety-thinking-bf16 -f Modelfile
ollama run qwen35-safety-thinking-bf16
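The `ollama create` command above expects a Modelfile in the working directory. A minimal config sketch, assuming the BF16 safetensors checkpoint has been downloaded into a local directory (the path and parameter value are illustrative, not part of this release):

```
# Illustrative Modelfile; point FROM at your local checkpoint directory
FROM ./Qwen3.5-4B-Safety-Thinking
PARAMETER temperature 0.7
```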

Organization

Designed, developed, and maintained with ❤️ by Merlin Research.

Citation

If you use this model in your research or applications, please cite it as follows:

@misc{qwen3.5-4b-safety-thinking,
  author = {Merlin Research},
  title = {Qwen3.5-4B-Safety-Thinking: A Reasoning and Safety Aligned Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MerlinSafety/Qwen3.5-4B-Safety-Thinking}},
  note = {Base model: Qwen/Qwen3.5-4B}
}