---
language: en
license: apache-2.0
tags:
  - doc-to-lora
  - lora
  - hypernetwork
  - context-distillation
  - needle-in-a-haystack
  - perceiver
base_model: Qwen/Qwen3-1.7B
---

# Doc-to-LoRA — NIAH Proof of Concept

A **144 M-parameter Perceiver hypernetwork** trained on [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B). It reads a document once, outputs LoRA weight deltas, and lets the base LLM answer questions about the document without it ever appearing in the context window.

> Based on [Doc-to-LoRA (Charakorn et al., 2026)](https://arxiv.org/abs/2602.15902).

![curves](curves.png)

## Results

| Metric | Value |
|---|---|
| Base model | `Qwen/Qwen3-1.7B` |
| Perceiver params | 144 M |
| LoRA rank / alpha | 8 / 8.0 |
| Target module | `down_proj` |
| Training steps | 8,000 |
| Final CE loss | 0.2165 |
| Exact-match accuracy (NIAH) | **80.0%** |
| Training ctx length | 32–256 tokens |

## Files

| File | Description |
|---|---|
| `hypernet.pt` | Perceiver weights + full config to rebuild the class |
| `inference_example.py` | Self-contained script (download and run) |
| `training_config.json` | Training hyperparameters |
| `curves.png` | Loss and accuracy curves |

## Quick start

```bash
pip install "transformers>=4.51.0" huggingface_hub torch
```

```python
from huggingface_hub import hf_hub_download
import torch

ckpt = torch.load(
    hf_hub_download("farpluto/doc-to-lora-niah", "hypernet.pt"),
    map_location="cuda",
    weights_only=False,
)
# See inference_example.py for the complete working script.
```

## Qwen3 note

Chain-of-thought thinking is suppressed by appending `/no_think` to every query. Residual `<think>` tokens are stripped from the generated output. Both techniques are harmless no-ops on non-Qwen3 models.
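
The two Qwen3 workarounds can be sketched as a pair of small helpers. The function names here are illustrative (they are not part of this repo's API); `/no_think` is Qwen3's soft switch for disabling thinking, and the regex removes any `<think>...</think>` block the model emits anyway:

```python
import re


def suppress_thinking(query: str) -> str:
    """Append Qwen3's soft switch that disables chain-of-thought."""
    return query.rstrip() + " /no_think"


def strip_think_tokens(text: str) -> str:
    """Remove residual <think>...</think> blocks (and stray unpaired tags)
    from generated output. A no-op when the tags are absent."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    text = text.replace("<think>", "").replace("</think>", "")
    return text.strip()
```

With `/no_think`, Qwen3 still tends to emit an empty `<think>\n\n</think>` pair, so `strip_think_tokens("<think>\n\n</think>\nParis")` returns `"Paris"`; on a non-Qwen3 model that never produces the tags, both helpers leave the text semantically unchanged.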
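
For intuition on what the hypernetwork's output does, here is a minimal sketch of how a rank-8 LoRA delta perturbs a target weight: W' = W + (alpha / r) · B A, with r = 8 and alpha = 8.0 as in the table above. The matrix shapes for `down_proj` and the random factors are our own assumptions for illustration, not values read from `hypernet.pt`:

```python
import torch

r, alpha = 8, 8.0
d_in, d_out = 6144, 2048  # assumed down_proj shape (intermediate -> hidden)

W = torch.randn(d_out, d_in)       # frozen base weight
A = torch.randn(r, d_in) * 0.01    # stand-ins for hypernetwork-emitted factors
B = torch.randn(d_out, r) * 0.01

# LoRA update: low-rank product scaled by alpha / r (= 1.0 here)
W_adapted = W + (alpha / r) * (B @ A)
```

Because rank/alpha are 8/8.0, the scaling factor is exactly 1, so the delta `B @ A` is added unscaled; only `r * (d_in + d_out)` numbers per target matrix need to be predicted rather than the full `d_out * d_in`.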