---
language:
  - en
license: apache-2.0
base_model:
  - Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-generation
tags:
  - distillation
  - agentic-rag
  - qasper
  - scientific-qa
  - react
  - lora
datasets:
  - allenai/qasper
---

# DistillAgent-PaperQA-3B

DistillAgent-PaperQA-3B is a compact agentic QA model distilled from tool-using trajectories for question answering over scientific papers (QASPER).

It is fine-tuned from Qwen/Qwen2.5-3B-Instruct with LoRA/rsLoRA on trajectories constrained to a Thought/Action/Observation/Final Answer format, illustrated below.
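The exact trajectory schema is not published with this card, so the transcript below is only an illustration of what a constrained section-lookup rollout might look like; the `lookup_section[...]` action name and the surrounding formatting are assumptions, not the verified training format:

```text
QUESTION: What baseline method is used?
AVAILABLE PAPER SECTIONS:
1. Abstract
2. Methods

Thought: The baseline is most likely described in the Methods section.
Action: lookup_section[2]
Observation: "... we compare against a BM25 retrieval baseline ..."
Final Answer: A BM25 retrieval baseline.
```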

## Highlights

- Small model with practical agentic behavior on research-paper QA.
- Outperforms the base model on our 200-sample QASPER evaluation (see table below).

## Model Details

- Base model: Qwen/Qwen2.5-3B-Instruct
- Training: LoRA / rsLoRA SFT
- Domain: scientific paper QA (QASPER)
- Inference style: constrained ReAct + section lookup

## Evaluation Summary (QASPER, 200 samples)

| Model | EM | Mean F1 | Mean hops | Mean latency (s) |
|---|---|---|---|---|
| DistillAgent-PaperQA-3B (SFT) | 14.5% | 0.2425 | 2.36 | 37.28 |
| Base Qwen2.5-3B-Instruct | 9.0% | 0.1650 | 3.00 | 20.04 |

Notes:

- Hops and latency depend on the runtime harness and hardware.
- Main quality result: the SFT model outperforms the base model on both EM and F1.
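The card does not specify the exact scorer. For reference, a minimal sketch of how EM and token-level F1 are commonly computed for QASPER-style extractive QA, assuming SQuAD-style answer normalization:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)

def token_f1(pred: str, gold: str) -> float:
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the BM25 baseline", "BM25 baseline"))                 # True
print(round(token_f1("a BM25 retrieval baseline", "BM25 baseline"), 3))  # 0.8
```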

## Intended Use

- QA over scientific/technical papers with section-level lookup or retrieval.
- Research and educational workflows for compact agentic model distillation.

## Limitations

- Sensitive to runtime prompt/harness format.
- Multi-hop behavior can increase latency.
- Should not be used as the sole source for high-stakes scientific or medical decisions.

## Usage (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "QuantumCuddle/DistillAgent-PaperQA-3B"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# The model expects the question followed by the list of paper sections it may look up.
prompt = "QUESTION: What baseline method is used?\nAVAILABLE PAPER SECTIONS:\n1. Abstract\n2. Methods\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding: transformers rejects temperature=0.0, so use do_sample=False instead.
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
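Because the model was trained on constrained ReAct trajectories, it is typically driven by a small harness that executes each Action and feeds the Observation back in. Below is a minimal sketch of such a loop; the `lookup_section[...]` action syntax, the parsing regex, and the `run_agent` helper are assumptions for illustration, not a documented API:

```python
import re

def run_agent(model, tokenizer, question, sections, max_hops=5):
    """Hypothetical driver loop: generate until an Action appears, execute the
    section lookup, append the Observation, and repeat until a Final Answer."""
    listing = "\n".join(f"{i + 1}. {title}" for i, (title, _) in enumerate(sections))
    prompt = f"QUESTION: {question}\nAVAILABLE PAPER SECTIONS:\n{listing}\n"
    for _ in range(max_hops):
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
        # Decode only the newly generated tokens.
        text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)
        if "Final Answer:" in text:
            return text.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action:\s*lookup_section\[(\d+)\]", text)
        if match is None:
            return text.strip()  # no parseable action; return the raw output
        idx = int(match.group(1)) - 1
        body = sections[idx][1] if 0 <= idx < len(sections) else "Section not found."
        # Keep the trajectory up to the Action, then append the Observation.
        prompt += text[: match.end()] + f"\nObservation: {body}\n"
    return "No final answer within hop budget."
```

A call might look like `run_agent(model, tokenizer, "What baseline method is used?", [("Abstract", "..."), ("Methods", "...")])`, where each section is a `(title, body)` pair.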

## Citation

```bibtex
@misc{distillagent_paperqa_3b_2026,
  title={DistillAgent-PaperQA-3B},
  author={QuantumCuddle},
  year={2026},
  howpublished={\url{https://huggingface.co/QuantumCuddle/DistillAgent-PaperQA-3B}}
}
```