---
language: en
license: apache-2.0
tags: [doc-to-lora, lora, hypernetwork, context-distillation, needle-in-a-haystack, perceiver]
base_model: Qwen/Qwen3-1.7B
---

# Doc-to-LoRA — NIAH Proof of Concept

A **144M-parameter Perceiver hypernetwork** trained on [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B).
It reads a document once, outputs LoRA weight deltas, and lets the base LLM answer
questions without the document ever appearing in the context window.

> Based on [Doc-to-LoRA (Charakorn et al., 2026)](https://arxiv.org/abs/2602.15902).

![curves](curves.png)
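As a toy illustration of the doc-to-LoRA idea (this is *not* the repo's 144M Perceiver, whose real architecture and sizes live in `hypernet.pt`; all dimensions and weights below are made up): a hypernetwork maps a pooled document embedding to the low-rank factors A and B of a LoRA weight delta.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions for illustration only (the real model uses Qwen3-1.7B sizes).
d_doc, d_in, d_out, rank = 16, 32, 24, 8

# A minimal "hypernetwork": two linear heads mapping a pooled document
# embedding to the flattened LoRA factors A (rank x d_in) and B (d_out x rank).
W_a = rng.normal(0, 0.02, (d_doc, rank * d_in))
W_b = rng.normal(0, 0.02, (d_doc, d_out * rank))

doc_embedding = rng.normal(size=d_doc)           # stand-in for "read the doc once"
A = (doc_embedding @ W_a).reshape(rank, d_in)    # LoRA A factor
B = (doc_embedding @ W_b).reshape(d_out, rank)   # LoRA B factor

delta_W = B @ A                                  # low-rank weight delta
print(delta_W.shape)                             # (24, 32)
```

The point is that the delta has full weight-matrix shape but rank at most 8, so the hypernetwork only has to emit `rank * (d_in + d_out)` numbers per adapted matrix rather than `d_in * d_out`.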

## Results

| Metric | Value |
|---|---|
| Base model | `Qwen/Qwen3-1.7B` |
| Perceiver params | 144 M |
| LoRA rank / alpha | 8 / 8.0 |
| Target module | `down_proj` |
| Training steps | 8,000 |
| Final CE loss | 0.2165 |
| Exact-match accuracy (NIAH) | **80.0%** |
| Training context length | 32–256 tokens |
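A minimal sketch of what exact-match scoring typically means for NIAH, assuming a whitespace-normalized, case-insensitive comparison (the repo's precise scoring rule is defined by its own eval code; the predictions below are invented):

```python
def exact_match(prediction: str, answer: str) -> bool:
    """One common exact-match convention: strip surrounding whitespace
    and compare case-insensitively. Partial containment does not count."""
    return prediction.strip().lower() == answer.strip().lower()

# Hypothetical needle-retrieval outputs against a single ground-truth answer.
preds   = ["7421", " 7421 ", "7421", "7412", "7421"]
answers = ["7421"] * 5

acc = sum(exact_match(p, a) for p, a in zip(preds, answers)) / len(preds)
print(acc)  # 0.8
```

Under this rule an answer embedded in extra text (e.g. "the code is 7421") would score zero, which is why exact match is a deliberately strict metric.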

## Files

| File | Description |
|---|---|
| `hypernet.pt` | Perceiver weights + full config to rebuild the class |
| `inference_example.py` | Self-contained script (download and run) |
| `training_config.json` | Training hyperparameters |
| `curves.png` | Loss and accuracy curves |

## Quick start

```bash
pip install "transformers>=4.51.0" huggingface_hub torch
```

```python
from huggingface_hub import hf_hub_download
import torch

ckpt = torch.load(hf_hub_download("farpluto/doc-to-lora-niah", "hypernet.pt"),
                  map_location="cuda", weights_only=False)
# See inference_example.py for the complete working script.
```

## Qwen3 note

Chain-of-thought "thinking" is suppressed by appending `/no_think` to every query,
and any residual `<think>` tokens are stripped from the generated output.
Both techniques are harmless no-ops on non-Qwen3 models.
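A minimal sketch of those two steps; the helper names and example strings are illustrative, not taken from `inference_example.py`:

```python
import re

def prepare_query(q: str) -> str:
    # Qwen3 treats a trailing /no_think as a soft switch that disables
    # thinking mode; other models simply see it as extra text.
    return q.rstrip() + " /no_think"

def strip_think(text: str) -> str:
    # Remove any residual <think>...</think> block; Qwen3 can still emit
    # an empty one even when /no_think is used.
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()

print(prepare_query("What is the passkey?"))
# What is the passkey? /no_think
print(strip_think("<think>\n\n</think>\n\nThe passkey is 7421."))
# The passkey is 7421.
```

Text without `<think>` tags passes through `strip_think` unchanged, which is what makes both steps safe no-ops elsewhere.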