ZiRA-Researcher
ZiRA-Researcher is a fine-tuned version of Qwen3.5-4B, developed under the 0xvoid project. It's built specifically for deep research tasks, multi-step reasoning, and complex question answering, with a particular emphasis on catching and correcting its own mistakes mid-generation.
If you've ever used a model that confidently states something wrong and just... keeps going, that's exactly what ZiRA-Researcher is trained not to do.
What's Different Here
The base Qwen3.5-4B is already a strong reasoning model. ZiRA-Researcher takes that foundation and sharpens it toward a specific use case: research-grade responses where accuracy matters more than speed and self-doubt is a feature, not a bug.
Three things define this fine-tune:
1. Error self-correction
ZiRA doesn't just think before it answers; it actively revisits its own reasoning chain. During training, the model was exposed to examples where mid-chain corrections were necessary and rewarded. In practice, you'll see it catch faulty assumptions and revise them before committing to a final answer, rather than rationalizing bad premises all the way to a wrong conclusion.
2. Research-oriented instruction following
The model is tuned on datasets distilled from state-of-the-art frontier models: responses that demonstrate what good research synthesis actually looks like. That means structured arguments, source-aware hedging, citing uncertainty where it exists, and building conclusions incrementally rather than pattern-matching to the nearest plausible answer.
3. Long-horizon coherence
Complex research questions often require holding a lot of context at once. The Qwen3.5 architecture natively supports up to 262K tokens, and ZiRA-Researcher is fine-tuned to actually use that window productively, staying coherent and consistent across long reasoning chains without drifting.
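For long-context work it's worth budgeting the prompt against that window before generating. A minimal sketch (the constant comes from the native context length above; the helper name and thresholds are illustrative, not part of any API):

```python
MAX_CONTEXT = 262_144  # Qwen3.5's native context window, in tokens

def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Rough pre-flight check: will the prompt plus the generation budget
    fit in the native window? Both arguments are token counts, e.g.
    len(inputs.input_ids[0]) for the prompt."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT

fits_context(250_000, 8_192)  # fits
fits_context(260_000, 8_192)  # does not fit
```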
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Parameters | ~4B |
| Architecture | Gated Delta Network + Sparse MoE hybrid |
| Context Length | 262,144 tokens (native) |
| Training Method | Supervised Fine-Tuning (SFT) via TRL |
| Thinking Mode | Enabled by default (`<think>...</think>`) |
| Developer | 0xvoid |
Training Metrics

(The original model card shows plots of training loss and mean token accuracy here.)
The hybrid architecture Qwen3.5 uses — Gated Delta Networks layered with sparse Mixture-of-Experts — gives this model a surprisingly good throughput-to-quality ratio for its size. It punches above its 4B weight class on most reasoning benchmarks, which makes it a practical choice if you're running inference locally or on a budget.
Training Data
ZiRA-Researcher was trained on curated, high-quality datasets sourced from state-of-the-art model outputs, specifically selected to reflect:
- Deep research synthesis and academic-style reasoning
- Multi-step logical deduction with explicit intermediate steps
- Complex Q&A pairs that require cross-referencing multiple sub-claims
- Instances of error detection and self-correction within the chain-of-thought
The goal was to teach the model what good thinking looks like, using examples generated by frontier models as the standard.
Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "0xvoid0000/zira-researcher"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "What are the key limitations of transformer-based architectures for long-horizon reasoning tasks, and how do recent hybrid approaches attempt to address them?"
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=1.0,
    top_p=0.95,
    top_k=20,
    do_sample=True,
    # Note: presence_penalty is an OpenAI/vLLM-style sampling parameter and
    # isn't accepted by transformers' generate(); repetition_penalty is the
    # nearest equivalent here. Use presence_penalty=1.5 on OpenAI-compatible
    # servers such as vLLM or SGLang.
    repetition_penalty=1.05,
)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
By default, the model will produce a `<think>...</think>` block before the final response. That's intentional — it's where the self-correction happens. If you want direct output without the reasoning trace, you can disable it:
```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # skips the <think> block
)
```
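If you keep thinking mode on, you'll often want the reasoning trace and the final answer as separate strings. A small helper like this (an illustrative sketch, not part of the model's API) handles the common case where one `<think>...</think>` block opens the response:

```python
def split_thinking(decoded: str) -> tuple[str, str]:
    """Split a decoded completion into (reasoning_trace, final_answer).

    Assumes at most one <think>...</think> block, which is the usual
    pattern for thinking-mode models; if no block is present, the
    trace comes back empty.
    """
    open_tag, close_tag = "<think>", "</think>"
    start = decoded.find(open_tag)
    end = decoded.find(close_tag)
    if start == -1 or end == -1 or end < start:
        return "", decoded.strip()
    trace = decoded[start + len(open_tag):end].strip()
    answer = decoded[end + len(close_tag):].strip()
    return trace, answer

trace, answer = split_thinking("<think>Check the premise first.</think>The premise holds.")
```

One caveat: depending on the tokenizer, decoding with skip_special_tokens=True may already strip the tags, so decode with it off (or split on the tag token ids) if you want to keep the trace.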
Recommended Sampling Parameters
These are the settings that tend to work well for research-style queries:
| Mode | temperature | top_p | top_k | presence_penalty |
|---|---|---|---|---|
| Thinking (general research) | 1.0 | 0.95 | 20 | 1.5 |
| Thinking (precise/technical) | 0.6 | 0.95 | 20 | 0.0 |
| Direct (no thinking) | 0.7 | 0.8 | 20 | 1.5 |
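The table above can be captured as plain keyword presets. One wrinkle worth encoding: presence_penalty is only honored by OpenAI-compatible inference servers (e.g. vLLM, SGLang), not by transformers' generate(), so a small helper (names here are illustrative) can drop it when targeting transformers:

```python
# Sampling presets mirroring the table above.
SAMPLING_PRESETS = {
    "thinking_general": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "presence_penalty": 1.5},
    "thinking_precise": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "presence_penalty": 0.0},
    "direct":           {"temperature": 0.7, "top_p": 0.8,  "top_k": 20, "presence_penalty": 1.5},
}

def sampling_kwargs(mode: str, for_transformers: bool = False) -> dict:
    """Return a copy of the preset for `mode`, stripping server-only keys
    and enabling sampling when targeting transformers' generate()."""
    preset = dict(SAMPLING_PRESETS[mode])
    if for_transformers:
        preset.pop("presence_penalty", None)
        preset["do_sample"] = True
    return preset
```

Usage: `model.generate(**inputs, max_new_tokens=4096, **sampling_kwargs("thinking_precise", for_transformers=True))`.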
For particularly involved questions (graduate-level exam problems, multi-document synthesis, long chains of logical deduction), giving the model room to breathe helps. Set `max_new_tokens` to at least 8192, and don't be surprised if it uses most of it.
What It's Good At
- Research synthesis — combining information from multiple sub-questions into a coherent, well-structured answer
- Hypothesis-driven reasoning — forming a claim, stress-testing it, and revising if the logic doesn't hold
- Error-aware generation — catching faulty premises or arithmetic mistakes within the thinking chain before they propagate
- Long-context tasks — sustained coherence across documents, conversation history, or multi-stage problems
- Technical deep dives — STEM, CS theory, economics, philosophy of science, and adjacent domains
What It's Not
ZiRA-Researcher is not a general-purpose chat assistant. It's tuned for deliberate, thoughtful responses to complex questions; if you're looking for something snappy and conversational, this isn't it. The thinking traces can get long. That's by design.
It also doesn't have real-time web access or retrieval built in. For RAG setups, treat it as the reasoning engine and pipe the retrieved context into the prompt.
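In practice that just means formatting your retrieved passages into the user message before calling the model. A minimal sketch (the prompt layout and helper name are illustrative, not part of any API):

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Format retrieved passages plus the question into a single user
    message. The layout is illustrative; adapt it to your pipeline."""
    context = "\n\n".join(
        f"[Source {i + 1}]\n{p.strip()}" for i, p in enumerate(passages)
    )
    return (
        "Use the sources below to answer the question. Cite them as "
        "[Source N], and say so explicitly if they are insufficient.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

messages = [{"role": "user", "content": build_rag_prompt(
    "What are the key limitations discussed?",
    ["First retrieved passage.", "Second retrieved passage."],
)}]
```

From there, the messages list feeds into apply_chat_template exactly as in the Quickstart.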
Limitations
Like any fine-tune at 4B parameters, ZiRA-Researcher has a ceiling. On highly specialized domains with narrow technical vocabulary, it can still confabulate, though the self-correction mechanism does catch a meaningful fraction of these cases. On genuinely ambiguous or underspecified questions, it tends to lay out the uncertainty rather than pick an arbitrary answer, which is usually the right call but can feel unsatisfying if you just want a direct response.
The model inherits Qwen3.5-4B's 201-language support at the architecture level, but ZiRA-Researcher's fine-tuning was primarily English-focused. Non-English research queries will work but may not reflect the same quality improvements.
Acknowledgements
Built on top of Qwen3.5-4B by the Qwen Team at Alibaba. Fine-tuned using TRL. Part of the ZiRA model family developed under the 0xvoid project.
ZiRA-Researcher is part of the ongoing 0xvoid model series. More variants incoming.