---
language: en
license: apache-2.0
tags:
  - doc-to-lora
  - lora
  - hypernetwork
  - context-distillation
  - needle-in-a-haystack
  - perceiver
base_model: Qwen/Qwen3-1.7B
---

# Doc-to-LoRA — NIAH Proof of Concept

A **144 M-parameter Perceiver hypernetwork** trained on [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B). It reads a document once, outputs LoRA weight deltas, and lets the base LLM answer questions about the document without it ever appearing in the context window.

> Based on [Doc-to-LoRA (Charakorn et al., 2026)](https://arxiv.org/abs/2602.15902).

![curves](curves.png)

## Results

| Metric | Value |
|---|---|
| Base model | `Qwen/Qwen3-1.7B` |
| Perceiver params | 144 M |
| LoRA rank / alpha | 8 / 8.0 |
| Target module | `down_proj` |
| Training steps | 8,000 |
| Final CE loss | 0.2165 |
| Exact-match accuracy (NIAH) | **80.0%** |
| Training ctx length | 32–256 tokens |

## Files

| File | Description |
|---|---|
| `hypernet.pt` | Perceiver weights + full config to rebuild the class |
| `inference_example.py` | Self-contained script (download and run) |
| `training_config.json` | Training hyperparameters |
| `curves.png` | Loss and accuracy curves |

## Quick start

```bash
pip install "transformers>=4.51.0" huggingface_hub torch
```

```python
from huggingface_hub import hf_hub_download
import torch

ckpt = torch.load(
    hf_hub_download("farpluto/doc-to-lora-niah", "hypernet.pt"),
    map_location="cuda",
    weights_only=False,
)
# See inference_example.py for the complete working script.
```

## Qwen3 note

Chain-of-thought thinking is suppressed by appending `/no_think` to every query. Residual `<think>` tokens are stripped from the generated output. Both techniques are harmless no-ops on non-Qwen3 models.
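
The two Qwen3 workarounds can be sketched as a pair of small helpers. The function names here are illustrative (they are not part of this repo's API); `/no_think` is Qwen3's soft switch for disabling thinking, and the regex removes any `<think>...</think>` block the model emits anyway:

```python
import re


def suppress_thinking(query: str) -> str:
    """Append Qwen3's soft switch that disables chain-of-thought."""
    return query.rstrip() + " /no_think"


def strip_think_tokens(text: str) -> str:
    """Remove residual <think>...</think> blocks (and stray unpaired tags)
    from generated output. A no-op when the tags are absent."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    text = text.replace("<think>", "").replace("</think>", "")
    return text.strip()
```

With `/no_think`, Qwen3 still tends to emit an empty `<think>\n\n</think>` pair, so `strip_think_tokens("<think>\n\n</think>\nParis")` returns `"Paris"`; on a non-Qwen3 model that never produces the tags, both helpers leave the text semantically unchanged.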
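
For intuition on what the hypernetwork's output does, here is a minimal sketch of how a rank-8 LoRA delta perturbs a target weight: W' = W + (alpha / r) · B A, with r = 8 and alpha = 8.0 as in the table above. The matrix shapes for `down_proj` and the random factors are our own assumptions for illustration, not values read from `hypernet.pt`:

```python
import torch

r, alpha = 8, 8.0
d_in, d_out = 6144, 2048  # assumed down_proj shape (intermediate -> hidden)

W = torch.randn(d_out, d_in)       # frozen base weight
A = torch.randn(r, d_in) * 0.01    # stand-ins for hypernetwork-emitted factors
B = torch.randn(d_out, r) * 0.01

# LoRA update: low-rank product scaled by alpha / r (= 1.0 here)
W_adapted = W + (alpha / r) * (B @ A)
```

Because rank/alpha are 8/8.0, the scaling factor is exactly 1, so the delta `B @ A` is added unscaled; only `r * (d_in + d_out)` numbers per target matrix need to be predicted rather than the full `d_out * d_in`.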