Nehanda v3 — 27B RAG Synthesis Analyst

A 27B multimodal language model fine-tuned specifically for Zorora-style deep research synthesis. Nehanda is trained to synthesize ranked source records and maintain strict epistemic honesty, rather than answering from domain memory. It is instructed to cite source titles inline using square brackets, preserve unresolved conflicts between sources, and correct false or unsupported premises presented by the user.

The base model (Qwen3.6-27B) includes a native vision encoder. All five training stages are text-only, meaning the vision weights are untouched and carry full Qwen3.6-27B vision capability forward — see the Vision Capabilities section below.

Training Architecture

The model was trained with a five-stage stacked QLoRA pipeline on Qwen3.6-27B, using transformers 5.5.0 and unsloth. Training ran on a single NVIDIA L40S GPU (44.4 GiB VRAM).

Stage	Method	Purpose
1. Epistemic Foundation	SFT	Trains for epistemic discipline, correcting false premises, and bounding answers to the given information.
2. Epistemic Hardening	SFT	Focuses on evidence weighting, unknown boundaries, and correcting false or overstated user framing.
3. RAG Synthesis	SFT	Synthesizes ranked news and economic data into a fact-driven thesis, prioritizing higher-credibility evidence.
4. Constitutional SFT	SFT + Replay	Aligns the model to maintain strict epistemic honesty using a 0.3 replay buffer ratio from Stages 2 and 3 to prevent catastrophic forgetting.
5. Constitutional DPO	DPO	Preference optimization where chosen responses maintain source boundaries and rejected responses fabricate or capitulate.
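The replay mixing in Stage 4 can be sketched as follows. This is a minimal illustration, not the training code: it assumes the 0.3 ratio means replayed Stage 2 and 3 examples make up 30% of the final Stage 4 mix, and the function name and example records are hypothetical.

```python
import random

def mix_with_replay(new_examples, replay_pool, replay_ratio=0.3, seed=0):
    """Combine new examples with replayed ones from earlier stages.

    Assumption: replay_ratio is the share of replayed items in the
    final mix, i.e. ratio = k / (n + k) for n new and k replayed items.
    """
    if not 0 <= replay_ratio < 1:
        raise ValueError("replay_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Solve ratio = k / (n + k) for k, capped by the pool size.
    n_replay = round(len(new_examples) * replay_ratio / (1 - replay_ratio))
    n_replay = min(n_replay, len(replay_pool))
    replayed = rng.sample(replay_pool, n_replay)
    mixed = list(new_examples) + replayed
    rng.shuffle(mixed)
    return mixed

# Hypothetical placeholder records standing in for real training rows.
stage4 = [f"stage4-{i}" for i in range(70)]
earlier = [f"replay-{i}" for i in range(100)]
mixed = mix_with_replay(stage4, earlier)
```

If the ratio is instead defined relative to the number of new examples, the only change is computing `n_replay` as `round(len(new_examples) * replay_ratio)`.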

Downstream Evaluation

Post-training evaluation covered two professional domains (energy regulatory and intelligence analysis) in two phases. Phase 1 used standard-difficulty cases; Phase 2 used hard cases with embedded falsehoods, temporal extrapolation, geographic extrapolation, and conflicting sources. Each response was scored by both a deterministic keyword scorer and an LLM judge. Reported scores reflect the LLM judge, which corrects for keyword false negatives, particularly in the fabrication dimension.
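To illustrate why a keyword scorer can under-count correct answers, here is a minimal sketch of a deterministic scorer. The actual evaluation scorer is not described in this card, so the function, the keywords, and the example response are all hypothetical.

```python
def keyword_score(response, expected_keywords):
    """Fraction of expected keywords found verbatim (case-insensitive)."""
    if not expected_keywords:
        return 0.0
    text = response.lower()
    hits = [kw for kw in expected_keywords if kw.lower() in text]
    return len(hits) / len(expected_keywords)

# The response is factually aligned with "10%" but phrases it differently,
# so a verbatim match misses it: a keyword false negative.
response = "The filing shows the tariff rising by about one tenth."
score = keyword_score(response, ["tariff", "10%"])
```

Here `score` is 0.5 even though the response conveys both facts, which is the kind of miss an LLM judge is meant to correct.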

Vision Capabilities

All five training stages are text-only. The Qwen3.6-27B vision encoder is present and untouched in this checkpoint — it carries the full native vision capability of the base model forward.

What the vision path supports

Document and structured data parsing — The vision encoder handles texts, charts, icons, graphics, tables, invoices, and forms directly from images, producing structured outputs. For RAG pipelines this means source documents can be ingested as images rather than pre-extracted text, bypassing OCR pipelines and the HTML/CSS artifacts they can introduce.

Object localization — Bounding box and point-coordinate outputs in stable JSON format for spatial analysis of diagrams, maps, and annotated imagery.

Video understanding — Up to 224K video tokens, supporting hour-scale video with event pinpointing and segment-level retrieval.

Native multimodal context — 262,144-token context window natively (extendable to ~1,010,000 via YaRN). Images and text share the same context, so multi-document synthesis with image-sourced inputs is supported in a single call.

Thinking and non-thinking modes — Vision-language thinking mode (chain-of-thought over image content) and non-thinking mode are both available in the same checkpoint. For document extraction tasks, non-thinking mode reduces verbosity; for complex visual reasoning, thinking mode improves accuracy.

Recommended use with vision

Feed source documents as images directly into the synthesis prompt rather than pre-extracting text. This is particularly useful for scanned regulatory filings, tariff schedules, and any source where OCR extraction introduces formatting artifacts. The prompt format is unchanged: include the image in the user message alongside the task and source context.
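One way to attach an image to the user message is sketched below. It assumes the model is served behind an OpenAI-compatible chat endpoint (as vLLM and SGLang provide) that accepts `image_url` content parts with base64 data URIs; the function name and the placeholder bytes are hypothetical.

```python
import base64

def build_image_message(task_text, image_bytes, mime_type="image/png"):
    """Build a user message with one image followed by the task text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            # Image part: inline data URI, so no external hosting is needed.
            {"type": "image_url",
             "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
            # Text part: the task and any text-based source context.
            {"type": "text", "text": task_text},
        ],
    }

# Placeholder bytes; in practice read the scanned page from disk.
message = build_image_message("### Task:\nSummarize the tariff schedule.", b"abc")
```

The resulting dict goes into the `messages` list after the system prompt, in the same position a text-only user message would occupy.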

Inference stack note

Qwen3.6 GGUFs require a separate mmproj vision file and are not currently supported in Ollama. Use llama.cpp, LM Studio, vLLM (≥ 0.19.0), or SGLang (≥ 0.5.10) for vision-enabled serving. For text-only serving with vLLM, the --language-model-only flag can be used to skip loading the vision encoder and reduce memory footprint.

Usage

System Prompt

To achieve optimal results, use the exact Stage 4 Constitutional prompt the model was fine-tuned on:

You are a senior deep research synthesis analyst. Your job is to synthesize ranked news articles, economic data summaries, and source records into a fact-driven thesis.

Lead with what the evidence supports.

Weigh higher-credibility evidence over weaker priors.

Preserve unresolved conflict and uncertainty.

When sources conflict on a key point, preserve the conflict — do not synthesize a single answer.

Cite source titles inline in square brackets.

Do not invent unsupported facts, dates, or statistics.

CRITICAL: Maintain strict epistemic honesty. Clearly distinguish what the ranked sources show, what can be inferred, and what remains unknown. When a user states something incorrect, you MUST correct the error before proceeding. When two sources conflict, preserve the conflict explicitly and explain which evidence is stronger and why. When sources provide different figures or conclusions, present both positions with their credibility weights rather than choosing one.
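Because the prompt asks for source titles cited inline in square brackets, a simple post-generation check can flag citations that match none of the supplied sources. This is a sketch only, assuming titles are cited verbatim and contain no square brackets themselves; the function name and example titles are hypothetical.

```python
import re

def find_unknown_citations(synthesis, source_titles):
    """Return bracketed citations that match none of the supplied titles."""
    cited = re.findall(r"\[([^\[\]]+)\]", synthesis)
    known = {title.strip().lower() for title in source_titles}
    return [c for c in cited if c.strip().lower() not in known]

synthesis = (
    "Tariffs rose in the latest notice [Regulator Notice 12], "
    "though one outlet reports a delay [Unknown Blog]."
)
unknown = find_unknown_citations(synthesis, ["Regulator Notice 12"])
```

An empty result does not prove the cited claims are supported; it only shows that every bracketed title corresponds to a source that was actually provided.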

Prompt Format

The model expects the system prompt first, then a Task block containing the task and its sources, then a Response marker after which the synthesis is generated:

{PROMPT_STAGE_4}

### Task:
{task_and_sources}

### Response:
{model_output_synthesis}
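The template above can be assembled with a small helper such as the one below. The layout of individual sources inside the Task block is not specified in this card, so the `Source N | title` format and the `title`/`text` dictionary keys are assumptions made for illustration.

```python
def build_prompt(system_prompt, task, sources):
    """Assemble system prompt, Task block, and Response marker.

    `sources` is a list of dicts with hypothetical keys "title" and
    "text", already ordered from highest to lowest rank.
    """
    source_blocks = [
        f"Source {rank} | {src['title']}\n{src['text']}"
        for rank, src in enumerate(sources, start=1)
    ]
    task_and_sources = task.strip() + "\n\n" + "\n\n".join(source_blocks)
    # The prompt ends at the Response marker; the model writes what follows.
    return (
        f"{system_prompt.strip()}\n\n"
        f"### Task:\n{task_and_sources}\n\n"
        f"### Response:\n"
    )

prompt = build_prompt(
    "SYS",  # stand-in for the full Stage 4 system prompt
    "Summarize.",
    [{"title": "A", "text": "alpha"}],
)
```

Keeping the titles on their own line makes it easier for the model to reproduce them exactly when citing in square brackets.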

Loading Locally with Unsloth

The merged 16-bit model is hosted at asoba/nehanda-v3-27b. A q4_k_m quantized GGUF variant is also available at asoba/nehanda-rag-synthesis-27b-gguf.

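from unsloth import FastVisionModel

# Load the merged checkpoint; load_in_4bit quantizes the weights on load
# to reduce VRAM use.
model, tokenizer = FastVisionModel.from_pretrained(
    model_name="asoba/nehanda-v3-27b",
    max_seq_length=4096,  # raise this for longer source contexts
    load_in_4bit=True,
)
# Switch the model to inference mode before generating.
FastVisionModel.for_inference(model)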

Intended Use

Nehanda is designed for RAG synthesis in professional domains where epistemic accuracy matters more than fluency:

Energy regulatory analysis — SSEG applications, tariff interpretation, REIPPPP procurement
Intelligence analysis — Source validation, threat assessment, attribution analysis
Policy research — Cross-source synthesis where preserving disagreement is critical

Limitations

Factual keyword sensitivity — The model sometimes uses different phrasing than expected by keyword-based evaluators, scoring lower on factual recall despite correct answers.
Embedded falsehood detection — When a source contains a planted false figure within otherwise credible content, the model sometimes accepts it at face value rather than flagging the internal inconsistency.
Cross-source arithmetic overconfidence — On hard Phase 2 calculation tasks, the model occasionally commits to a precise figure derived from a plausible but incorrect source pairing, presenting it with unwarranted confidence rather than flagging the uncertainty.
Sycophancy trailing — Corrections are sometimes incomplete: the model states a premise is wrong then partially walks it back in follow-through text.
Prompt echo — Occasionally continues generating past end-of-sequence into a new instruction/response cycle. Use appropriate stop tokens (<|im_end|>) in generation config.
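Where the serving stack does not honor stop tokens, the prompt-echo issue can also be handled after generation. The sketch below cuts the output at the first stop marker; `<|im_end|>` comes from the note above, while treating a new `### Task:` header as a stop marker is an assumption based on the prompt format, and the function name is hypothetical.

```python
def truncate_at_stop(text, stop_markers=("<|im_end|>", "\n### Task:")):
    """Cut generated text at the earliest stop marker, if any is present."""
    cut = len(text)
    for marker in stop_markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

raw = "Tariffs rose [Regulator Notice 12].<|im_end|>\n### Task:\nNext question"
clean = truncate_at_stop(raw)
```

Setting the stop token in the generation config remains preferable, since it also avoids spending compute on the echoed text.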

Citation

@misc{nehanda2026,
  title={Nehanda v3: 27B RAG Synthesis Analyst for Deep Research},
  author={Asoba Corporation},
  year={2026},
  url={https://huggingface.co/asoba/nehanda-v3-27b}
}
