Nehanda v3 — 27B RAG Synthesis Analyst
A 27B multimodal language model fine-tuned specifically for Zorora-style deep research synthesis. Nehanda is trained to synthesize ranked source records and maintain strict epistemic honesty, rather than answering from domain memory. It is instructed to cite source titles inline using square brackets, preserve unresolved conflicts between sources, and correct false or unsupported premises presented by the user.
The base model (Qwen3.6-27B) includes a native vision encoder. All five training stages are text-only, meaning the vision weights are untouched and carry full Qwen3.6-27B vision capability forward — see the Vision Capabilities section below.
Training Architecture
The model was trained with a 5-stage stacked QLoRA pipeline on the Qwen3.6-27B architecture (using transformers 5.5.0 and unsloth). Training ran on a single NVIDIA L40S GPU (44.4 GiB VRAM).
| Stage | Method | Purpose |
|---|---|---|
| 1. Epistemic Foundation | SFT | Trains for epistemic discipline, correcting false premises, and bounding answers to the given information. |
| 2. Epistemic Hardening | SFT | Focuses on evidence weighting, unknown boundaries, and correcting false or overstated user framing. |
| 3. RAG Synthesis | SFT | Synthesizes ranked news and economic data into a fact-driven thesis, prioritizing higher-credibility evidence. |
| 4. Constitutional SFT | SFT + Replay | Aligns the model to maintain strict epistemic honesty using a 0.3 replay buffer ratio from Stages 2 and 3 to prevent catastrophic forgetting. |
| 5. Constitutional DPO | DPO | Preference optimization where chosen responses maintain source boundaries and rejected responses fabricate or capitulate. |
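The Stage 4 replay mix can be illustrated with a small sketch. It assumes the 0.3 ratio means that replayed Stage 2 and 3 examples make up about 30% of the final training mix; the card does not say whether the ratio is relative to the combined mix or to the new data alone. The helper `mix_with_replay` and the example records are hypothetical and are not taken from the training code.

```python
import random

def mix_with_replay(new_examples, replay_pool, replay_ratio=0.3, seed=0):
    """Combine new examples with replayed ones so that replay makes up
    roughly `replay_ratio` of the result (assumed interpretation)."""
    if not 0 <= replay_ratio < 1:
        raise ValueError("replay_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Solve replay_ratio = n_replay / (n_new + n_replay) for n_replay.
    n_replay = round(len(new_examples) * replay_ratio / (1 - replay_ratio))
    # Never try to sample more replay examples than the pool holds.
    n_replay = min(n_replay, len(replay_pool))
    mixed = list(new_examples) + rng.sample(list(replay_pool), n_replay)
    rng.shuffle(mixed)
    return mixed

# Hypothetical records: 70 Stage 4 examples, 100 earlier examples to replay from.
stage4 = [{"stage": 4, "id": i} for i in range(70)]
earlier = [{"stage": s, "id": i} for s in (2, 3) for i in range(50)]

mixed = mix_with_replay(stage4, earlier)
replay_share = sum(ex["stage"] != 4 for ex in mixed) / len(mixed)
print(len(mixed), round(replay_share, 2))
```

Sampling the replay set once with a fixed seed keeps the mix reproducible; resampling per epoch would be an equally plausible design.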
Downstream Evaluation
Post-training evaluation was conducted across two professional domains (energy regulatory and intelligence analysis) in two phases: Phase 1 (standard difficulty) and Phase 2 (hard cases including embedded falsehoods, temporal extrapolation, geographic extrapolation, and conflicting sources). Each response was scored by both a deterministic keyword scorer and an LLM judge; scores below reflect the LLM judge, which corrects for keyword false-negatives particularly in the fabrication dimension.
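The keyword false-negative problem mentioned above can be seen in a toy deterministic scorer. This is a minimal sketch, not the evaluation harness used for this card; `keyword_score`, the expected keywords, and the sample responses are all hypothetical.

```python
def keyword_score(response, expected_keywords):
    """Fraction of expected keywords found verbatim (case-insensitive)."""
    if not expected_keywords:
        raise ValueError("expected_keywords must not be empty")
    text = response.lower()
    hits = [kw for kw in expected_keywords if kw.lower() in text]
    return len(hits) / len(expected_keywords)

expected = ["not supported by the sources", "2019"]

# Both responses reject the premise and give the same date.
literal = "That claim is not supported by the sources; the filing dates to 2019."
paraphrase = "None of the provided records back that claim; the filing dates to 2019."

literal_score = keyword_score(literal, expected)        # both keywords match
paraphrase_score = keyword_score(paraphrase, expected)  # the rejection phrase is missed
print(literal_score, paraphrase_score)
```

The paraphrased response is penalized only for its wording, which is the kind of case a second scorer such as an LLM judge is meant to catch.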
Vision Capabilities
All five training stages are text-only. The Qwen3.6-27B vision encoder is present and untouched in this checkpoint — it carries the full native vision capability of the base model forward.
What the vision path supports
Document and structured data parsing — The vision encoder handles texts, charts, icons, graphics, tables, invoices, and forms directly from images, producing structured outputs. For RAG pipelines this means source documents can be ingested as images rather than pre-extracted text, bypassing OCR pipelines and the HTML/CSS artifacts they can introduce.
Object localization — Bounding box and point-coordinate outputs in stable JSON format for spatial analysis of diagrams, maps, and annotated imagery.
Video understanding — Up to 224K video tokens, supporting hour-scale video with event pinpointing and segment-level retrieval.
Native multimodal context — 262,144-token context window natively (extendable to ~1,010,000 via YaRN). Images and text share the same context, so multi-document synthesis with image-sourced inputs is supported in a single call.
Thinking and non-thinking modes — Vision-language thinking mode (chain-of-thought over image content) and non-thinking mode are both available in the same checkpoint. For document extraction tasks, non-thinking mode reduces verbosity; for complex visual reasoning, thinking mode improves accuracy.
Recommended use with vision
Feed source documents as images directly into the synthesis prompt rather than pre-extracting text. This is particularly useful for scanned regulatory filings, tariff schedules, and any source where OCR extraction introduces formatting artifacts. The prompt format is unchanged: include the image in the user message alongside the task and source context.
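A sketch of what such a user message might look like, assuming the content-list chat schema (separate `image` and `text` parts) commonly used with Qwen vision-language chat templates. The helper `build_vision_messages`, the file names, and the task text are hypothetical, and the exact key that carries the image (`image` here) may differ between processor versions.

```python
def build_vision_messages(system_prompt, task, image_paths, text_sources=""):
    """Build a chat message list with page images placed before the task text."""
    # One image part per scanned page or figure (key name is an assumption).
    content = [{"type": "image", "image": path} for path in image_paths]
    body = f"### Task:\n{task}"
    if text_sources:
        # Optional text-only source records alongside the images.
        body += f"\n\n{text_sources}"
    content.append({"type": "text", "text": body})
    return [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user", "content": content},
    ]

messages = build_vision_messages(
    system_prompt="You are a senior deep research synthesis analyst.",
    task="Summarize the tariff changes shown in the attached schedule.",
    image_paths=["tariff_schedule_p1.png", "tariff_schedule_p2.png"],
)
print(len(messages[1]["content"]))
```

The resulting list would then be passed to the processor's chat template before generation.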
Inference stack note
Qwen3.6 GGUFs require a separate mmproj vision file and are not currently supported in Ollama. Use llama.cpp, LM Studio, vLLM (≥ 0.19.0), or SGLang (≥ 0.5.10) for vision-enabled serving. For text-only serving with vLLM, the --language-model-only flag can be used to skip loading the vision encoder and reduce memory footprint.
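As a rough illustration of the text-only versus vision-enabled choice, the snippet below assembles a `vllm serve` command line in Python. Only the `--language-model-only` flag comes from the note above; the helper `vllm_serve_command` is hypothetical, and any further options (port, parallelism, context length) are left out because this card does not specify them.

```python
import shlex

MODEL_ID = "asoba/nehanda-v3-27b"

def vllm_serve_command(text_only=False):
    """Return the argument list for serving the model with vLLM."""
    cmd = ["vllm", "serve", MODEL_ID]
    if text_only:
        # Skip loading the vision encoder to reduce memory footprint.
        cmd.append("--language-model-only")
    return cmd

# Print the command rather than running it, so it can be reviewed first.
print(shlex.join(vllm_serve_command(text_only=True)))
```

The list form can be handed to `subprocess.run` unchanged once the remaining serving options have been decided.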
Usage
System Prompt
To achieve optimal results, use the exact Stage 4 Constitutional prompt the model was fine-tuned on:
You are a senior deep research synthesis analyst. Your job is to synthesize ranked news articles, economic data summaries, and source records into a fact-driven thesis.
- Lead with what the evidence supports.
- Weigh higher-credibility evidence over weaker priors.
- Preserve unresolved conflict and uncertainty.
- When sources conflict on a key point, preserve the conflict — do not synthesize a single answer.
- Cite source titles inline in square brackets.
- Do not invent unsupported facts, dates, or statistics.
CRITICAL: Maintain strict epistemic honesty. Clearly distinguish what the ranked sources show, what can be inferred, and what remains unknown. When a user states something incorrect, you MUST correct the error before proceeding. When two sources conflict, preserve the conflict explicitly and explain which evidence is stronger and why. When sources provide different figures or conclusions, present both positions with their credibility weights rather than choosing one.
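Because the prompt asks for source titles in square brackets, a downstream pipeline can check that every cited title was actually supplied. The following is a minimal sketch of such a check, not part of the model or its training code; `check_citations` and the example titles are hypothetical, and matching is a simple case-insensitive exact comparison.

```python
import re

def check_citations(response, source_titles):
    """Split bracketed citations into those that match a provided title
    and those that do not."""
    cited = [c.strip() for c in re.findall(r"\[([^\[\]]+)\]", response)]
    known = {title.strip().lower() for title in source_titles}
    matched = [c for c in cited if c.lower() in known]
    unmatched = [c for c in cited if c.lower() not in known]
    return matched, unmatched

titles = ["Grid Tariff Notice 2024", "Regulator Annual Report"]
answer = (
    "Tariffs rose in the latest notice [Grid Tariff Notice 2024], "
    "although one estimate differs [Industry Blog Post]."
)

matched, unmatched = check_citations(answer, titles)
print(matched, unmatched)
```

An unmatched citation is a signal to review the response, since it may point to a title the model paraphrased or invented.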
Prompt Format
The model expects the system prompt first, then a "### Task:" block containing the task and the ranked sources, then a "### Response:" marker after which the model writes its synthesis:
{PROMPT_STAGE_4}
### Task:
{task_and_sources}
### Response:
{model_output_synthesis}
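The template above can be filled in with a small helper. This is a sketch under stated assumptions: the card does not specify how individual source records are laid out inside the task block, so the numbered "[title] text" layout, the `Sources:` label, and the function names below are all hypothetical.

```python
def format_sources(sources):
    """Render ranked sources as numbered records (layout is an assumption)."""
    lines = []
    for rank, src in enumerate(sources, start=1):
        lines.append(f"{rank}. [{src['title']}] {src['text']}")
    return "\n".join(lines)

def build_prompt(system_prompt, task, sources):
    """Assemble system prompt, task block, and response marker in order."""
    task_and_sources = f"{task}\n\nSources:\n{format_sources(sources)}"
    # The prompt ends at the response marker; the model continues from there.
    return f"{system_prompt}\n### Task:\n{task_and_sources}\n### Response:\n"

prompt = build_prompt(
    system_prompt="You are a senior deep research synthesis analyst.",
    task="Did the regulator approve the tariff increase?",
    sources=[
        {"title": "Regulator Annual Report", "text": "The increase was approved in March."},
        {"title": "Industry Blog Post", "text": "Approval is still pending."},
    ],
)
print(prompt)
```

Ranking is conveyed here only by list order, with the highest-ranked source first.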
Loading Locally with Unsloth
The merged 16-bit model is hosted at asoba/nehanda-v3-27b. A q4_k_m-quantized GGUF variant is also available at asoba/nehanda-rag-synthesis-27b-gguf.
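from unsloth import FastVisionModel

# Load the merged checkpoint in 4-bit to reduce VRAM use.
model, tokenizer = FastVisionModel.from_pretrained(
    model_name="asoba/nehanda-v3-27b",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Switch the model to inference mode before generating.
FastVisionModel.for_inference(model)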
Intended Use
Nehanda is designed for RAG synthesis in professional domains where epistemic accuracy matters more than fluency:
- Energy regulatory analysis — SSEG applications, tariff interpretation, REIPPPP procurement
- Intelligence analysis — Source validation, threat assessment, attribution analysis
- Policy research — Cross-source synthesis where preserving disagreement is critical
Limitations
- Factual keyword sensitivity — The model sometimes uses different phrasing than expected by keyword-based evaluators, scoring lower on factual recall despite correct answers.
- Embedded falsehood detection — When a source contains a planted false figure within otherwise credible content, the model sometimes accepts it at face value rather than flagging the internal inconsistency.
- Cross-source arithmetic overconfidence — On hard Phase 2 calculation tasks, the model occasionally commits to a precise figure derived from a plausible but incorrect source pairing, presenting it with unwarranted confidence rather than flagging the uncertainty.
- Sycophancy trailing — Corrections are sometimes incomplete: the model states a premise is wrong then partially walks it back in follow-through text.
- Prompt echo — Occasionally continues generating past end-of-sequence into a new instruction/response cycle. Use appropriate stop tokens (<|im_end|>) in generation config.
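Where stop tokens cannot be configured in the serving stack, the prompt echo limitation can also be handled after generation. The sketch below truncates output at the first stop marker. `<|im_end|>` comes from the note above; treating a fresh "### Task:" header as a second marker is an assumption based on the prompt format, and `truncate_at_stop` is a hypothetical helper.

```python
STOP_MARKERS = ("<|im_end|>", "\n### Task:")

def truncate_at_stop(text, markers=STOP_MARKERS):
    """Cut generated text at the earliest stop marker, if any is present."""
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

raw = "Evidence supports X [Source A].<|im_end|>\n### Task:\nNext question"
clean = truncate_at_stop(raw)
print(clean)
```

Text without any marker is returned unchanged apart from trailing whitespace.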
Citation
@misc{nehanda2026,
title={Nehanda v3: 27B RAG Synthesis Analyst for Deep Research},
author={Asoba Corporation},
year={2026},
url={https://huggingface.co/asoba/nehanda-v3-27b}
}