---
language:
- en
license: apache-2.0
base_model:
- Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-generation
tags:
- distillation
- agentic-rag
- qasper
- scientific-qa
- react
- lora
datasets:
- allenai/qasper
---

# DistillAgent-PaperQA-3B

DistillAgent-PaperQA-3B is a compact agentic QA model distilled from tool-using trajectories for question answering over scientific papers (QASPER).

It is fine-tuned from `Qwen/Qwen2.5-3B-Instruct` using LoRA/rsLoRA with constrained Thought/Action/Observation/Final Answer trajectories.
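The card does not publish the exact adapter hyperparameters, but an rsLoRA setup of this kind is typically expressed with a `peft` `LoraConfig`. The rank, alpha, dropout, and target modules below are illustrative assumptions, not the values used for this model:

```python
from peft import LoraConfig

# Illustrative rsLoRA adapter config. The actual rank, alpha, dropout, and
# target modules for DistillAgent-PaperQA-3B are not stated on this card;
# these values are assumptions for the sketch.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    use_rslora=True,  # rank-stabilized scaling: lora_alpha / sqrt(r) instead of lora_alpha / r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

With `use_rslora=True`, the adapter scaling factor grows more gently with rank, which tends to stabilize training at higher ranks.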

## Highlights

- Compact model with practical agentic behavior on research-paper QA.
- Outperforms the base model on EM and F1 in a 200-sample QASPER evaluation.

## Model Details

- Base model: `Qwen/Qwen2.5-3B-Instruct`
- Training: LoRA / rsLoRA SFT
- Domain: scientific paper QA (QASPER)
- Inference style: constrained ReAct + section lookup
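The constrained ReAct + section-lookup loop can be sketched as a harness around any `generate` callable. The `lookup` tool, regex formats, and section store below are assumptions standing in for the actual runtime harness, which this card does not specify:

```python
import re

# Hypothetical section store standing in for a parsed QASPER paper.
SECTIONS = {
    "Abstract": "We propose X for scientific QA.",
    "Methods": "Our baseline method is BM25 retrieval over section text.",
}

def lookup(section_name):
    # Tool: return the text of a named paper section (assumed interface).
    return SECTIONS.get(section_name, "Section not found.")

def run_react(question, generate, max_hops=5):
    # Constrained Thought/Action/Observation loop around a generate() callable.
    transcript = f"QUESTION: {question}\n"
    for _ in range(max_hops):
        step = generate(transcript)  # model emits a Thought plus an Action or Final Answer
        transcript += step
        final = re.search(r"Final Answer:\s*(.+)", step)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*lookup\[(.+?)\]", step)
        if action:
            transcript += f"\nObservation: {lookup(action.group(1))}\n"
    return None  # hop budget exhausted without a final answer
```

The `max_hops` cap is what bounds the mean-hops and latency numbers reported below; a real harness would wrap the fine-tuned model's `generate` call in place of a stub.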

## Evaluation Summary (QASPER, 200 samples)

| Model | EM | Mean F1 | Mean hops | Mean latency |
|---|---:|---:|---:|---:|
| DistillAgent-PaperQA-3B (SFT) | 14.5% | 0.2425 | 2.36 | 37.28s |
| Base Qwen2.5-3B-Instruct | 9.0% | 0.1650 | 3.00 | 20.04s |

Notes:
- Hops and latency depend on runtime harness and hardware.
- Main quality outcome: the SFT model beats the base model on both EM and F1.
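For reference, token-level F1 and normalized exact match of the kind used in QASPER-style evaluation can be computed as below. The whitespace-and-lowercase normalization here is a simplification of the official scorer, not the exact scoring code used for the table above:

```python
from collections import Counter

def normalize(text):
    # Simplified normalization: lowercase and split on whitespace.
    # (The official QASPER scorer also strips punctuation and articles.)
    return text.lower().split()

def token_f1(prediction, reference):
    # Token-level F1 between a predicted and a gold answer string.
    pred, gold = normalize(prediction), normalize(reference)
    common = Counter(pred) & Counter(gold)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def exact_match(prediction, reference):
    return float(normalize(prediction) == normalize(reference))
```

Scores are averaged over all 200 evaluation samples to produce the EM and Mean F1 columns.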

## Intended Use

- QA over scientific/technical papers with section-level lookup or retrieval.
- Research and educational workflows for compact agentic model distillation.

## Limitations

- Sensitive to runtime prompt/harness format.
- Multi-hop behavior can increase latency.
- Should not be used as sole source for high-stakes scientific or medical decisions.

## Usage (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "QuantumCuddle/DistillAgent-PaperQA-3B"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "QUESTION: What baseline method is used?\nAVAILABLE PAPER SECTIONS:\n1. Abstract\n2. Methods\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy decoding
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## Citation

```bibtex
@misc{distillagent_paperqa_3b_2026,
  title={DistillAgent-PaperQA-3B},
  author={QuantumCuddle},
  year={2026},
  howpublished={\url{https://huggingface.co/QuantumCuddle/DistillAgent-PaperQA-3B}}
}
```