DistillAgent-PaperQA-3B is a compact agentic QA model distilled from tool-using trajectories for question answering over scientific papers (QASPER).
It is fine-tuned from Qwen/Qwen2.5-3B-Instruct using LoRA/rsLoRA with constrained Thought/Action/Observation/Final Answer trajectories.
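The constrained trajectory format above can be parsed back into structured steps. The sketch below assumes each field starts a line with a `Label:` prefix; the exact delimiters are an assumption, since the card only names the four fields:

```python
import re

# Hypothetical trajectory in the Thought/Action/Observation/Final Answer format.
trajectory = """Thought: I need the baseline from the Methods section.
Action: read_section(2)
Observation: The baseline is BERT-QA.
Final Answer: BERT-QA"""

# Split on the four field labels; line-initial "Label:" prefixes are assumed,
# not a documented spec.
pattern = r"^(Thought|Action|Observation|Final Answer):\s*(.*)$"
steps = re.findall(pattern, trajectory, flags=re.MULTILINE)
for label, content in steps:
    print(f"{label!r} -> {content!r}")
```

Each `(label, content)` pair can then be routed, e.g. `Action` lines to a tool dispatcher and `Final Answer` to the caller.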
| Model | EM | Mean F1 | Mean hops | Mean latency |
|---|---|---|---|---|
| DistillAgent-PaperQA-3B (SFT) | 14.5% | 0.2425 | 2.36 | 37.28s |
| Base Qwen2.5-3B-Instruct | 9.0% | 0.1650 | 3.00 | 20.04s |
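EM and token-level F1 in the table follow the usual span-QA conventions. A minimal sketch of both metrics (QASPER's official scorer also strips punctuation and articles during normalization, which this version omits):

```python
from collections import Counter

def normalize(s: str) -> list[str]:
    # Lowercase and whitespace-tokenize; a simplification of the full
    # SQuAD/QASPER-style normalization.
    return s.lower().split()

def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))

def token_f1(pred: str, gold: str) -> float:
    p, g = normalize(pred), normalize(gold)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("BERT-QA", "bert-qa"))                          # 1.0
print(token_f1("the baseline is BERT-QA", "bert-qa baseline"))    # ~0.667
```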
Usage:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "QuantumCuddle/DistillAgent-PaperQA-3B"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "QUESTION: What baseline method is used?\nAVAILABLE PAPER SECTIONS:\n1. Abstract\n2. Methods\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; `temperature` has no effect unless `do_sample=True`.
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
```bibtex
@misc{distillagent_paperqa_3b_2026,
  title={DistillAgent-PaperQA-3B},
  author={QuantumCuddle},
  year={2026},
  howpublished={\url{https://huggingface.co/QuantumCuddle/DistillAgent-PaperQA-3B}}
}
```