---
language: en
license: apache-2.0
tags: [doc-to-lora, lora, hypernetwork, context-distillation, needle-in-a-haystack, perceiver]
base_model: Qwen/Qwen3-1.7B
---
# Doc-to-LoRA — NIAH Proof of Concept
A **144M-parameter Perceiver hypernetwork** trained on top of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B).
It reads a document once, outputs LoRA weight deltas for the base model, and lets the base LLM answer
questions about that document without the document ever appearing in the context window.
> Based on [Doc-to-LoRA (Charakorn et al., 2026)](https://arxiv.org/abs/2602.15902).
![curves](curves.png)
## Results
| Metric | Value |
|---|---|
| Base model | `Qwen/Qwen3-1.7B` |
| Perceiver params | 144 M |
| LoRA rank / alpha | 8 / 8.0 |
| Target module | `down_proj` |
| Training steps | 8,000 |
| Final CE loss | 0.2165 |
| Exact-match accuracy (NIAH) | **80.0%** |
| Training ctx length | 32–256 tokens |
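With rank 8 and alpha 8.0, the effective LoRA scaling `alpha / r` is 1.0, so the predicted deltas are applied unscaled. A quick arithmetic check, plus the per-layer delta parameter count `r * (d_in + d_out)` (layer shapes here are illustrative, not taken from the checkpoint):

```python
r, alpha = 8, 8.0
scaling = alpha / r          # LoRA applies delta = scaling * (B @ A)
assert scaling == 1.0        # deltas land in the weights at full strength

# Low-rank parameter count for one down_proj of shape (d_out, d_in):
d_in, d_out = 2048, 6144     # illustrative Qwen3-1.7B-ish shapes
lora_params = r * (d_in + d_out)
print(lora_params)  # 65536
```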
## Files
| File | Description |
|---|---|
| `hypernet.pt` | Perceiver weights + full config to rebuild the class |
| `inference_example.py` | Self-contained script (download and run) |
| `training_config.json` | Training hyperparameters |
| `curves.png` | Loss and accuracy curves |
## Quick start
```bash
pip install "transformers>=4.51.0" huggingface_hub torch
```
```python
from huggingface_hub import hf_hub_download
import torch
ckpt = torch.load(
    hf_hub_download("farpluto/doc-to-lora-niah", "hypernet.pt"),
    map_location="cuda" if torch.cuda.is_available() else "cpu",
    weights_only=False,  # the checkpoint stores the config object alongside the tensors
)
# See inference_example.py for the complete working script.
```
## Qwen3 note
Chain-of-thought thinking is suppressed via `/no_think` appended to every query.
Residual `<think>` tokens are stripped from generated output.
Both techniques are harmless no-ops on non-Qwen3 models.
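Both steps are plain string handling around the model call. A minimal sketch (helper names are mine; `/no_think` is Qwen3's soft switch for disabling thinking):

```python
import re

def prepare_query(query: str) -> str:
    # Append Qwen3's soft switch that suppresses chain-of-thought.
    return query.rstrip() + " /no_think"

def strip_think(text: str) -> str:
    # Remove any residual <think>...</think> blocks (Qwen3 often emits an
    # empty pair even in no-think mode); a no-op if none are present.
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()

print(prepare_query("Where is the needle?"))
# Where is the needle? /no_think
print(strip_think("<think>\n\n</think>\n\nThe needle is in line 42."))
# The needle is in line 42.
```

On models that never emit `<think>` tags, `strip_think` returns its input unchanged, which is why both steps are safe to leave in place.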