ZiRA-Researcher
ZiRA-Researcher is a fine-tuned version of Qwen3.5-4B, developed under the 0xvoid project. It's built specifically for deep research tasks, multi-step reasoning, and complex question answering, with a particular emphasis on catching and correcting its own mistakes mid-generation.
If you've ever used a model that confidently states something wrong and just... keeps going, that's exactly what ZiRA-Researcher is trained not to do.
What's Different Here
The base Qwen3.5-4B is already a strong reasoning model. ZiRA-Researcher takes that foundation and sharpens it toward a specific use case: research-grade responses where accuracy matters more than speed and self-doubt is a feature, not a bug.
Three things define this fine-tune:
1. Error self-correction
ZiRA doesn't just think before it answers; it actively revisits its own reasoning chain. During training, the model was exposed to examples where mid-chain corrections were necessary and rewarded. In practice, you'll see it catch faulty assumptions and revise them before committing to a final answer, rather than rationalizing bad premises all the way to a wrong conclusion.
2. Research-oriented instruction following
The model is tuned on datasets distilled from state-of-the-art frontier models: responses that demonstrate what good research synthesis actually looks like. That means structured arguments, source-aware hedging, citing uncertainty where it exists, and building conclusions incrementally rather than pattern-matching to the nearest plausible answer.
3. Long-horizon coherence
Complex research questions often require holding a lot of context at once. The Qwen3.5 architecture natively supports up to 262K tokens, and ZiRA-Researcher is fine-tuned to actually use that window productively, staying coherent and consistent across long reasoning chains without drifting.
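For long-context work it's worth budgeting the prompt against that window before generating. A minimal sketch (the constant comes from the native context length above; the helper name and thresholds are illustrative, not part of any API):

```python
MAX_CONTEXT = 262_144  # Qwen3.5's native context window, in tokens

def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Rough pre-flight check: will the prompt plus the generation budget
    fit in the native window? Both arguments are token counts, e.g.
    len(inputs.input_ids[0]) for the prompt."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT

fits_context(250_000, 8_192)  # fits
fits_context(260_000, 8_192)  # does not fit
```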
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Parameters | ~4B |
| Architecture | Gated Delta Network + Sparse MoE hybrid |
| Context Length | 262,144 tokens (native) |
| Training Method | Supervised Fine-Tuning (SFT) via TRL |
| Thinking Mode | Enabled by default (`<think>...</think>`) |
| Developer | 0xvoid |
Training Metrics

(The original model card shows plots of training loss and mean token accuracy here.)
The hybrid architecture Qwen3.5 uses — Gated Delta Networks layered with sparse Mixture-of-Experts — gives this model a surprisingly good throughput-to-quality ratio for its size. It punches above its 4B weight class on most reasoning benchmarks, which makes it a practical choice if you're running inference locally or on a budget.
Training Data
ZiRA-Researcher was trained on curated, high-quality datasets sourced from state-of-the-art model outputs, specifically selected to reflect:
- Deep research synthesis and academic-style reasoning
- Multi-step logical deduction with explicit intermediate steps
- Complex Q&A pairs that require cross-referencing multiple sub-claims
- Instances of error detection and self-correction within the chain-of-thought
The goal was to teach the model what good thinking looks like, using examples generated by frontier models as the standard.
Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "0xvoid0000/zira-researcher"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "What are the key limitations of transformer-based architectures for long-horizon reasoning tasks, and how do recent hybrid approaches attempt to address them?"
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=1.0,
    top_p=0.95,
    top_k=20,
    do_sample=True,
    # Note: presence_penalty is an OpenAI/vLLM-style sampling parameter and
    # isn't accepted by transformers' generate(); repetition_penalty is the
    # nearest equivalent here. Use presence_penalty=1.5 on OpenAI-compatible
    # servers such as vLLM or SGLang.
    repetition_penalty=1.05,
)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
By default, the model will produce a `<think>...</think>` block before the final response. That's intentional — it's where the self-correction happens. If you want direct output without the reasoning trace, you can disable it:
```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # skips the <think> block
)
```
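If you keep thinking mode on, you'll often want the reasoning trace and the final answer as separate strings. A small helper like this (an illustrative sketch, not part of the model's API) handles the common case where one `<think>...</think>` block opens the response:

```python
def split_thinking(decoded: str) -> tuple[str, str]:
    """Split a decoded completion into (reasoning_trace, final_answer).

    Assumes at most one <think>...</think> block, which is the usual
    pattern for thinking-mode models; if no block is present, the
    trace comes back empty.
    """
    open_tag, close_tag = "<think>", "</think>"
    start = decoded.find(open_tag)
    end = decoded.find(close_tag)
    if start == -1 or end == -1 or end < start:
        return "", decoded.strip()
    trace = decoded[start + len(open_tag):end].strip()
    answer = decoded[end + len(close_tag):].strip()
    return trace, answer

trace, answer = split_thinking("<think>Check the premise first.</think>The premise holds.")
```

One caveat: depending on the tokenizer, decoding with skip_special_tokens=True may already strip the tags, so decode with it off (or split on the tag token ids) if you want to keep the trace.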
Recommended Sampling Parameters
These are the settings that tend to work well for research-style queries:
| Mode | temperature | top_p | top_k | presence_penalty |
|---|---|---|---|---|
| Thinking (general research) | 1.0 | 0.95 | 20 | 1.5 |
| Thinking (precise/technical) | 0.6 | 0.95 | 20 | 0.0 |
| Direct (no thinking) | 0.7 | 0.8 | 20 | 1.5 |
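The table above can be captured as plain keyword presets. One wrinkle worth encoding: presence_penalty is only honored by OpenAI-compatible inference servers (e.g. vLLM, SGLang), not by transformers' generate(), so a small helper (names here are illustrative) can drop it when targeting transformers:

```python
# Sampling presets mirroring the table above.
SAMPLING_PRESETS = {
    "thinking_general": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "presence_penalty": 1.5},
    "thinking_precise": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "presence_penalty": 0.0},
    "direct":           {"temperature": 0.7, "top_p": 0.8,  "top_k": 20, "presence_penalty": 1.5},
}

def sampling_kwargs(mode: str, for_transformers: bool = False) -> dict:
    """Return a copy of the preset for `mode`, stripping server-only keys
    and enabling sampling when targeting transformers' generate()."""
    preset = dict(SAMPLING_PRESETS[mode])
    if for_transformers:
        preset.pop("presence_penalty", None)
        preset["do_sample"] = True
    return preset
```

Usage: `model.generate(**inputs, max_new_tokens=4096, **sampling_kwargs("thinking_precise", for_transformers=True))`.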
For particularly involved questions (graduate-level exam problems, multi-document synthesis, long chains of logical deduction), giving the model room to breathe helps. Set `max_new_tokens` to at least 8192, and don't be surprised if it uses most of it.
What It's Good At
- Research synthesis — combining information from multiple sub-questions into a coherent, well-structured answer
- Hypothesis-driven reasoning — forming a claim, stress-testing it, and revising if the logic doesn't hold
- Error-aware generation — catching faulty premises or arithmetic mistakes within the thinking chain before they propagate
- Long-context tasks — sustained coherence across documents, conversation history, or multi-stage problems
- Technical deep dives — STEM, CS theory, economics, philosophy of science, and adjacent domains
What It's Not
ZiRA-Researcher is not a general-purpose chat assistant. It's tuned for deliberate, thoughtful responses to complex questions; if you're looking for something snappy and conversational, this isn't it. The thinking traces can get long. That's by design.
It also doesn't have real-time web access or retrieval built in. For RAG setups, treat it as the reasoning engine and pipe the retrieved context into the prompt.
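In practice that just means formatting your retrieved passages into the user message before calling the model. A minimal sketch (the prompt layout and helper name are illustrative, not part of any API):

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Format retrieved passages plus the question into a single user
    message. The layout is illustrative; adapt it to your pipeline."""
    context = "\n\n".join(
        f"[Source {i + 1}]\n{p.strip()}" for i, p in enumerate(passages)
    )
    return (
        "Use the sources below to answer the question. Cite them as "
        "[Source N], and say so explicitly if they are insufficient.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

messages = [{"role": "user", "content": build_rag_prompt(
    "What are the key limitations discussed?",
    ["First retrieved passage.", "Second retrieved passage."],
)}]
```

From there, the messages list feeds into apply_chat_template exactly as in the Quickstart.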
Limitations
Like any fine-tune at 4B parameters, ZiRA-Researcher has a ceiling. On highly specialized domains with narrow technical vocabulary, it can still confabulate, though the self-correction mechanism does catch a meaningful fraction of these cases. On genuinely ambiguous or underspecified questions, it tends to lay out the uncertainty rather than pick an arbitrary answer, which is usually the right call but can feel unsatisfying if you just want a direct response.
The model inherits Qwen3.5-4B's 201-language support at the architecture level, but ZiRA-Researcher's fine-tuning was primarily English-focused. Non-English research queries will work but may not reflect the same quality improvements.
Acknowledgements
Built on top of Qwen3.5-4B by the Qwen Team at Alibaba. Fine-tuned using TRL. Part of the ZiRA model family developed under the 0xvoid project.
ZiRA-Researcher is part of the ongoing 0xvoid model series. More variants incoming.