---
language: en
license: apache-2.0
tags: [doc-to-lora, lora, hypernetwork, context-distillation, needle-in-a-haystack, perceiver]
base_model: Qwen/Qwen3-1.7B
---

# Doc-to-LoRA — NIAH Proof of Concept

A **144M-parameter Perceiver hypernetwork** trained on [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B).
It reads a document once, outputs LoRA weight deltas, and lets the base LLM answer
questions without the document ever appearing in the context window.

> Based on [Doc-to-LoRA (Charakorn et al., 2026)](https://arxiv.org/abs/2602.15902).

![curves](curves.png)
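As a toy illustration of the doc-to-LoRA idea (this is *not* the repo's 144M Perceiver, whose real architecture and sizes live in `hypernet.pt`; all dimensions and weights below are made up): a hypernetwork maps a pooled document embedding to the low-rank factors A and B of a LoRA weight delta.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions for illustration only (the real model uses Qwen3-1.7B sizes).
d_doc, d_in, d_out, rank = 16, 32, 24, 8

# A minimal "hypernetwork": two linear heads mapping a pooled document
# embedding to the flattened LoRA factors A (rank x d_in) and B (d_out x rank).
W_a = rng.normal(0, 0.02, (d_doc, rank * d_in))
W_b = rng.normal(0, 0.02, (d_doc, d_out * rank))

doc_embedding = rng.normal(size=d_doc)           # stand-in for "read the doc once"
A = (doc_embedding @ W_a).reshape(rank, d_in)    # LoRA A factor
B = (doc_embedding @ W_b).reshape(d_out, rank)   # LoRA B factor

delta_W = B @ A                                  # low-rank weight delta
print(delta_W.shape)                             # (24, 32)
```

The point is that the delta has full weight-matrix shape but rank at most 8, so the hypernetwork only has to emit `rank * (d_in + d_out)` numbers per adapted matrix rather than `d_in * d_out`.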

## Results

| Metric | Value |
|---|---|
| Base model | `Qwen/Qwen3-1.7B` |
| Perceiver params | 144 M |
| LoRA rank / alpha | 8 / 8.0 |
| Target module | `down_proj` |
| Training steps | 8,000 |
| Final CE loss | 0.2165 |
| Exact-match accuracy (NIAH) | **80.0%** |
| Training context length | 32–256 tokens |
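A minimal sketch of what exact-match scoring typically means for NIAH, assuming a whitespace-normalized, case-insensitive comparison (the repo's precise scoring rule is defined by its own eval code; the predictions below are invented):

```python
def exact_match(prediction: str, answer: str) -> bool:
    """One common exact-match convention: strip surrounding whitespace
    and compare case-insensitively. Partial containment does not count."""
    return prediction.strip().lower() == answer.strip().lower()

# Hypothetical needle-retrieval outputs against a single ground-truth answer.
preds   = ["7421", " 7421 ", "7421", "7412", "7421"]
answers = ["7421"] * 5

acc = sum(exact_match(p, a) for p, a in zip(preds, answers)) / len(preds)
print(acc)  # 0.8
```

Under this rule an answer embedded in extra text (e.g. "the code is 7421") would score zero, which is why exact match is a deliberately strict metric.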

## Files

| File | Description |
|---|---|
| `hypernet.pt` | Perceiver weights + full config to rebuild the class |
| `inference_example.py` | Self-contained script (download and run) |
| `training_config.json` | Training hyperparameters |
| `curves.png` | Loss and accuracy curves |

## Quick start

```bash
pip install "transformers>=4.51.0" huggingface_hub torch
```

```python
from huggingface_hub import hf_hub_download
import torch

ckpt = torch.load(hf_hub_download("farpluto/doc-to-lora-niah", "hypernet.pt"),
                  map_location="cuda", weights_only=False)
# See inference_example.py for the complete working script.
```

## Qwen3 note

Chain-of-thought "thinking" is suppressed by appending `/no_think` to every query,
and any residual `<think>` tokens are stripped from the generated output.
Both techniques are harmless no-ops on non-Qwen3 models.
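A minimal sketch of those two steps; the helper names and example strings are illustrative, not taken from `inference_example.py`:

```python
import re

def prepare_query(q: str) -> str:
    # Qwen3 treats a trailing /no_think as a soft switch that disables
    # thinking mode; other models simply see it as extra text.
    return q.rstrip() + " /no_think"

def strip_think(text: str) -> str:
    # Remove any residual <think>...</think> block; Qwen3 can still emit
    # an empty one even when /no_think is used.
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()

print(prepare_query("What is the passkey?"))
# What is the passkey? /no_think
print(strip_think("<think>\n\n</think>\n\nThe passkey is 7421."))
# The passkey is 7421.
```

Text without `<think>` tags passes through `strip_think` unchanged, which is what makes both steps safe no-ops elsewhere.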