---
language: en
license: apache-2.0
tags:
- doc-to-lora
- lora
- hypernetwork
- context-distillation
- needle-in-a-haystack
- perceiver
base_model: Qwen/Qwen3-1.7B
---
# Doc-to-LoRA — NIAH Proof of Concept

A 144M-parameter Perceiver hypernetwork trained on top of Qwen/Qwen3-1.7B. It reads a document once, emits LoRA weight deltas for the base model, and lets the base LLM answer questions about that document without it ever appearing in the context window.

Based on Doc-to-LoRA (Charakorn et al., 2026).
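The core mechanism is the standard LoRA update: the hypernetwork produces low-rank factors whose product is added to a frozen weight matrix. A minimal sketch of that update (tensor names and toy shapes are illustrative, not taken from the checkpoint):

```python
import torch

def apply_lora_delta(W, A, B, alpha=8.0):
    """Return W + (alpha/r) * B @ A — the standard LoRA weight update.

    W: frozen base weight, shape (out, in)
    A: down-projection factor, shape (r, in)
    B: up-projection factor,  shape (out, r)
    """
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

# Toy example with rank r = 8 (matching the model card's rank/alpha of 8/8.0).
hidden, intermediate, r = 16, 64, 8
W = torch.zeros(hidden, intermediate)       # stand-in for a down_proj weight
A = torch.randn(r, intermediate)
B = torch.randn(hidden, r)
W_prime = apply_lora_delta(W, A, B)
assert W_prime.shape == W.shape
```

With alpha equal to the rank (8/8.0), the scaling factor alpha/r is 1, so the delta is applied unscaled.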
## Results

| Metric | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B |
| Perceiver params | 144M |
| LoRA rank / alpha | 8 / 8.0 |
| Target module | `down_proj` |
| Training steps | 8,000 |
| Final CE loss | 0.2165 |
| Exact-match accuracy (NIAH) | 80.0% |
| Training context length | 32–256 tokens |
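For reference, the exact-match metric above can be computed as in the sketch below. The normalization (strip + lowercase) is an assumption about the evaluation, not taken from the training code:

```python
def exact_match(pred: str, gold: str) -> bool:
    """Case- and whitespace-insensitive exact match (assumed normalization)."""
    return pred.strip().lower() == gold.strip().lower()

def niah_accuracy(preds: list[str], golds: list[str]) -> float:
    """Fraction of needle-in-a-haystack queries answered exactly."""
    hits = sum(exact_match(p, g) for p, g in zip(preds, golds))
    return hits / len(golds)
```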
## Files

| File | Description |
|---|---|
| `hypernet.pt` | Perceiver weights + full config to rebuild the class |
| `inference_example.py` | Self-contained script (download and run) |
| `training_config.json` | Training hyperparameters |
| `curves.png` | Loss and accuracy curves |
## Quick start

```bash
pip install "transformers>=4.51.0" huggingface_hub torch
```

```python
from huggingface_hub import hf_hub_download
import torch

ckpt = torch.load(
    hf_hub_download("farpluto/doc-to-lora-niah", "hypernet.pt"),
    map_location="cuda",
    weights_only=False,
)
# See inference_example.py for the complete working script.
```
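Once the hypernetwork has produced low-rank factors for a document, they need to be attached to the base model's `down_proj` layers. One common way is to wrap each target `nn.Linear` so the delta is applied at forward time without modifying the frozen weights. This is a hedged sketch of that pattern; the checkpoint's actual attachment code lives in `inference_example.py`:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen Linear and add a document-specific low-rank delta.

    A: shape (r, in_features), B: shape (out_features, r) — illustrative
    names; the real tensors come from the hypernetwork's output.
    """
    def __init__(self, base: nn.Linear, A: torch.Tensor, B: torch.Tensor, alpha: float = 8.0):
        super().__init__()
        self.base = base
        self.A, self.B = A, B
        self.scale = alpha / A.shape[0]  # alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output plus the scaled low-rank correction (x A^T) B^T
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```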
## Qwen3 note

Chain-of-thought "thinking" is suppressed by appending `/no_think` to every query, and any residual `<think>` tokens are stripped from the generated output. Both steps are harmless no-ops on non-Qwen3 models.
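The two steps above can be sketched in a few lines; the exact helpers used by `inference_example.py` may differ, but the idea is a suffix on the prompt plus a regex cleanup of the response:

```python
import re

def prepare_query(query: str) -> str:
    """Append Qwen3's /no_think directive to suppress chain-of-thought."""
    return query.rstrip() + " /no_think"

def strip_think(text: str) -> str:
    """Remove residual <think>...</think> spans (and stray tags) from output."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    return text.replace("<think>", "").replace("</think>", "").strip()
```

On a non-Qwen3 model the suffix is just extra prompt text and the regex matches nothing, which is why both steps are safe no-ops.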
