---
language: en
license: apache-2.0
tags: [doc-to-lora, lora, hypernetwork, context-distillation, needle-in-a-haystack, perceiver]
base_model: Qwen/Qwen3-1.7B
---
# Doc-to-LoRA — NIAH Proof of Concept
A **144M-parameter Perceiver hypernetwork** trained on top of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B).
It reads a document once, outputs LoRA weight deltas for the base model, and lets the base LLM answer
questions about that document without the document ever appearing in the context window.
> Based on [Doc-to-LoRA (Charakorn et al., 2026)](https://arxiv.org/abs/2602.15902).
![curves](curves.png)
## Results
| Metric | Value |
|---|---|
| Base model | `Qwen/Qwen3-1.7B` |
| Perceiver params | 144 M |
| LoRA rank / alpha | 8 / 8.0 |
| Target module | `down_proj` |
| Training steps | 8,000 |
| Final CE loss | 0.2165 |
| Exact-match accuracy (NIAH) | **80.0%** |
| Training ctx length | 32–256 tokens |
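With rank 8 and alpha 8.0, the effective LoRA scaling `alpha / r` is 1.0, so the predicted deltas are applied unscaled. A quick arithmetic check, plus the per-layer delta parameter count `r * (d_in + d_out)` (layer shapes here are illustrative, not taken from the checkpoint):

```python
r, alpha = 8, 8.0
scaling = alpha / r          # LoRA applies delta = scaling * (B @ A)
assert scaling == 1.0        # deltas land in the weights at full strength

# Low-rank parameter count for one down_proj of shape (d_out, d_in):
d_in, d_out = 2048, 6144     # illustrative Qwen3-1.7B-ish shapes
lora_params = r * (d_in + d_out)
print(lora_params)  # 65536
```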
## Files
| File | Description |
|---|---|
| `hypernet.pt` | Perceiver weights + full config to rebuild the class |
| `inference_example.py` | Self-contained script (download and run) |
| `training_config.json` | Training hyperparameters |
| `curves.png` | Loss and accuracy curves |
## Quick start
```bash
pip install "transformers>=4.51.0" huggingface_hub torch
```
```python
from huggingface_hub import hf_hub_download
import torch
ckpt = torch.load(
    hf_hub_download("farpluto/doc-to-lora-niah", "hypernet.pt"),
    map_location="cuda" if torch.cuda.is_available() else "cpu",
    weights_only=False,  # the checkpoint stores the config object alongside the tensors
)
# See inference_example.py for the complete working script.
```
## Qwen3 note
Chain-of-thought thinking is suppressed via `/no_think` appended to every query.
Residual `<think>` tokens are stripped from generated output.
Both techniques are harmless no-ops on non-Qwen3 models.
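Both steps are plain string handling around the model call. A minimal sketch (helper names are mine; `/no_think` is Qwen3's soft switch for disabling thinking):

```python
import re

def prepare_query(query: str) -> str:
    # Append Qwen3's soft switch that suppresses chain-of-thought.
    return query.rstrip() + " /no_think"

def strip_think(text: str) -> str:
    # Remove any residual <think>...</think> blocks (Qwen3 often emits an
    # empty pair even in no-think mode); a no-op if none are present.
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()

print(prepare_query("Where is the needle?"))
# Where is the needle? /no_think
print(strip_think("<think>\n\n</think>\n\nThe needle is in line 42."))
# The needle is in line 42.
```

On models that never emit `<think>` tags, `strip_think` returns its input unchanged, which is why both steps are safe to leave in place.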