---
language:
- en
license: apache-2.0
base_model:
- Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-generation
tags:
- distillation
- agentic-rag
- qasper
- scientific-qa
- react
- lora
datasets:
- allenai/qasper
---

# DistillAgent-PaperQA-3B
|
|
DistillAgent-PaperQA-3B is a compact agentic QA model distilled from tool-using trajectories for question answering over scientific papers (QASPER).
|
|
It is fine-tuned from `Qwen/Qwen2.5-3B-Instruct` with LoRA/rsLoRA (supervised fine-tuning) on constrained Thought/Action/Observation/Final Answer trajectories.
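The exact adapter hyperparameters are not listed on this card; as a rough illustration of what an rsLoRA setup with PEFT looks like (the rank, alpha, and target modules below are assumptions, not the values used for this checkpoint):

```python
from peft import LoraConfig

# Illustrative only: these hyperparameters are assumptions, not the ones used
# to train this checkpoint.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    use_rslora=True,  # rank-stabilized LoRA: scales updates by alpha / sqrt(r)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```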
|
|
## Highlights
|
|
- Small model with practical agentic behavior on research-paper QA.
- Outperforms the base model on a 200-sample QASPER evaluation (see below).
|
|
## Model Details
|
|
- Base model: `Qwen/Qwen2.5-3B-Instruct`
- Training: LoRA / rsLoRA supervised fine-tuning (SFT)
- Domain: scientific-paper QA (QASPER)
- Inference style: constrained ReAct with section lookup (see the sketch below)
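The exact action vocabulary and delimiters are set by the runtime harness rather than the checkpoint itself; the following is a minimal sketch of the kind of constrained trajectory the model is trained to emit, where the `lookup_section[...]` action name and the observation text are illustrative assumptions:

```text
Thought: The baseline is most likely described in the Methods section.
Action: lookup_section[2. Methods]
Observation: "... our primary baseline is a BM25 retrieval pipeline ..."
Thought: The observation names the baseline directly.
Final Answer: A BM25 retrieval pipeline.
```

In practice the harness should stop generation after each `Action:` line and inject the retrieved section text as the next `Observation:` before continuing.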
|
|
## Evaluation Summary (QASPER, 200 samples)
|
|
| | Model | EM | Mean F1 | Mean hops | Mean latency | |
| |---|---:|---:|---:|---:| |
| | DistillAgent-PaperQA-3B (SFT) | 14.5% | 0.2425 | 2.36 | 37.28s | |
| | Base Qwen2.5-3B-Instruct | 9.0% | 0.1650 | 3.00 | 20.04s | |
|
|
Notes:
- Hops and latency depend on the runtime harness and hardware.
- Main quality outcome: the SFT model outperforms the base model on both EM and F1.
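For context, QASPER-style answer F1 is typically token-level overlap between the predicted and gold answers; a minimal sketch of that metric follows (whether this evaluation used exactly this normalization is an assumption):

```python
import re
import string
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and articles, and split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()

def token_f1(prediction: str, gold: str) -> float:
    """SQuAD-style token F1: harmonic mean of token precision and recall."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```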
|
|
## Intended Use
|
|
- QA over scientific/technical papers with section-level lookup or retrieval (a minimal harness sketch follows this list).
- Research and educational workflows for compact agentic-model distillation.
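The released weights do not bundle a runtime harness. A minimal agent loop under the assumptions above might look like the sketch below; the `lookup_section[...]` action pattern, the `run_agent` helper, and the `generate_fn` callable are hypothetical, not a published API:

```python
import re

# Hypothetical action syntax; match it against whatever format your harness uses.
ACTION_RE = re.compile(r"Action:\s*lookup_section\[(.+?)\]")

def run_agent(question: str, sections: dict[str, str], generate_fn, max_hops: int = 4) -> str:
    """Alternate model generations with section lookups until a Final Answer appears.

    `generate_fn` is any callable mapping a prompt string to a completion string,
    ideally configured to stop after each "Action:" line.
    """
    prompt = f"QUESTION: {question}\nAVAILABLE PAPER SECTIONS:\n" + "\n".join(sections) + "\n"
    for _ in range(max_hops):
        completion = generate_fn(prompt)
        prompt += completion
        if "Final Answer:" in completion:
            return completion.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(completion)
        if match is None:
            break  # the model neither acted nor answered; give up
        observation = sections.get(match.group(1).strip(), "Section not found.")
        prompt += f"\nObservation: {observation}\n"
    return ""  # no Final Answer within the hop budget
```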
|
|
## Limitations
|
|
- Sensitive to the runtime prompt/harness format.
- Multi-hop behavior can increase latency.
- Should not be the sole source for high-stakes scientific or medical decisions.
|
|
## Usage (Transformers)
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "QuantumCuddle/DistillAgent-PaperQA-3B"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Task-style prompt: the question followed by the list of available paper sections.
prompt = "QUESTION: What baseline method is used?\nAVAILABLE PAPER SECTIONS:\n1. Abstract\n2. Methods\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; use do_sample=False rather than temperature=0.0.
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
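Qwen2.5-3B-Instruct is a chat model. Whether this checkpoint expects the raw task prompt above or a chat-templated version depends on how the distillation data was formatted, which this card does not specify; if your harness uses the chat template, a variant would be:

```python
# Assumption: wrapping the same task prompt in the Qwen chat template.
messages = [{"role": "user", "content": prompt}]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
out = model.generate(chat_inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][chat_inputs.shape[1]:], skip_special_tokens=True))
```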
|
|
## Citation
|
|
```bibtex
@misc{distillagent_paperqa_3b_2026,
  title={DistillAgent-PaperQA-3B},
  author={QuantumCuddle},
  year={2026},
  howpublished={\url{https://huggingface.co/QuantumCuddle/DistillAgent-PaperQA-3B}}
}
```
|
|