---
language: en
license: apache-2.0
tags: [doc-to-lora, lora, hypernetwork, context-distillation, needle-in-a-haystack, perceiver]
base_model: Qwen/Qwen3-1.7B
---

# Doc-to-LoRA — NIAH Proof of Concept

A **144M-parameter Perceiver hypernetwork** trained on [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B).
It reads a document once, outputs LoRA weight deltas, and lets the base LLM answer
questions about that document without the document ever appearing in the context window.

> Based on [Doc-to-LoRA (Charakorn et al., 2026)](https://arxiv.org/abs/2602.15902).

![Training curves](curves.png)

## Results

| Metric | Value |
|---|---|
| Base model | `Qwen/Qwen3-1.7B` |
| Perceiver params | 144M |
| LoRA rank / alpha | 8 / 8.0 |
| Target module | `down_proj` |
| Training steps | 8,000 |
| Final CE loss | 0.2165 |
| Exact-match accuracy (NIAH) | **80.0%** |
| Training context length | 32–256 tokens |

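With rank 8 and alpha 8.0, the generated deltas modify each `down_proj` weight via the standard LoRA update `W' = W + (alpha / r) * B @ A`. A minimal sketch of that arithmetic (the shapes below are illustrative placeholders, not values read from the real checkpoint):

```python
import torch

# Illustrative shapes only -- not read from the real checkpoint.
d_out, d_in, rank, alpha = 2048, 6144, 8, 8.0

W = torch.randn(d_out, d_in)         # frozen down_proj weight
A = torch.randn(rank, d_in) * 0.01   # hypernetwork-generated LoRA A factor
B = torch.randn(d_out, rank) * 0.01  # hypernetwork-generated LoRA B factor

# Standard LoRA update: W' = W + (alpha / r) * B @ A
W_adapted = W + (alpha / rank) * (B @ A)
assert W_adapted.shape == W.shape    # delta has full shape but only rank 8
```

Because the delta `B @ A` is a product of a tall and a wide matrix, it has at most rank 8 despite matching `W`'s full shape, which is what keeps the hypernetwork's output head small.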
## Files

| File | Description |
|---|---|
| `hypernet.pt` | Perceiver weights plus the full config needed to rebuild the class |
| `inference_example.py` | Self-contained script (download and run) |
| `training_config.json` | Training hyperparameters |
| `curves.png` | Loss and accuracy curves |

## Quick start

```bash
pip install "transformers>=4.51.0" huggingface_hub torch
```

```python
from huggingface_hub import hf_hub_download
import torch

ckpt = torch.load(hf_hub_download("farpluto/doc-to-lora-niah", "hypernet.pt"),
                  map_location="cuda", weights_only=False)
# See inference_example.py for the complete working script.
```

## Qwen3 note

Chain-of-thought reasoning is suppressed by appending `/no_think` to every query.
Residual `<think>` tokens are stripped from the generated output.
Both techniques are harmless no-ops on non-Qwen3 models.
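The two steps above amount to simple string handling around the model call. A minimal sketch (the sample strings are made up for illustration):

```python
import re

def prepare_query(query: str) -> str:
    # Qwen3 soft switch: appending /no_think suppresses chain-of-thought.
    return f"{query} /no_think"

def strip_think(output: str) -> str:
    # Drop any residual <think>...</think> span, then tidy whitespace.
    return re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL).strip()

# In /no_think mode Qwen3 typically still emits an empty think block.
answer = strip_think("<think>\n\n</think>\n\nThe needle is in section 7.")
print(answer)  # -> The needle is in section 7.
```

On non-Qwen3 models the `/no_think` suffix is just an ignored token and the regex matches nothing, which is why both steps are safe no-ops there.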