---
language:
- en
license: apache-2.0
base_model:
- Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-generation
tags:
- distillation
- agentic-rag
- qasper
- scientific-qa
- react
- lora
datasets:
- allenai/qasper
---

# DistillAgent-PaperQA-3B
|
|
DistillAgent-PaperQA-3B is a compact agentic QA model distilled from tool-using trajectories for question answering over scientific papers (QASPER).
|
|
It is fine-tuned from `Qwen/Qwen2.5-3B-Instruct` with LoRA/rsLoRA (supervised fine-tuning) on constrained Thought/Action/Observation/Final Answer trajectories.
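The exact adapter hyperparameters are not listed on this card; as a rough illustration of what an rsLoRA setup with PEFT looks like (the rank, alpha, and target modules below are assumptions, not the values used for this checkpoint):

```python
from peft import LoraConfig

# Illustrative only: these hyperparameters are assumptions, not the ones used
# to train this checkpoint.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    use_rslora=True,  # rank-stabilized LoRA: scales updates by alpha / sqrt(r)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```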
|
|
## Highlights
|
|
- Small model with practical agentic behavior on research-paper QA.
- Outperforms the base model on a 200-sample QASPER evaluation (see below).
|
|
## Model Details
|
|
- Base model: `Qwen/Qwen2.5-3B-Instruct`
- Training: LoRA / rsLoRA supervised fine-tuning (SFT)
- Domain: scientific-paper QA (QASPER)
- Inference style: constrained ReAct with section lookup (see the sketch below)
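The exact action vocabulary and delimiters are set by the runtime harness rather than the checkpoint itself; the following is a minimal sketch of the kind of constrained trajectory the model is trained to emit, where the `lookup_section[...]` action name and the observation text are illustrative assumptions:

```text
Thought: The baseline is most likely described in the Methods section.
Action: lookup_section[2. Methods]
Observation: "... our primary baseline is a BM25 retrieval pipeline ..."
Thought: The observation names the baseline directly.
Final Answer: A BM25 retrieval pipeline.
```

In practice the harness should stop generation after each `Action:` line and inject the retrieved section text as the next `Observation:` before continuing.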
|
|
## Evaluation Summary (QASPER, 200 samples)
|
|
| | Model | EM | Mean F1 | Mean hops | Mean latency | |
| |---|---:|---:|---:|---:| |
| | DistillAgent-PaperQA-3B (SFT) | 14.5% | 0.2425 | 2.36 | 37.28s | |
| | Base Qwen2.5-3B-Instruct | 9.0% | 0.1650 | 3.00 | 20.04s | |
|
|
Notes:
- Hops and latency depend on the runtime harness and hardware.
- Main quality outcome: the SFT model outperforms the base model on both EM and F1.
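For context, QASPER-style answer F1 is typically token-level overlap between the predicted and gold answers; a minimal sketch of that metric follows (whether this evaluation used exactly this normalization is an assumption):

```python
import re
import string
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and articles, and split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()

def token_f1(prediction: str, gold: str) -> float:
    """SQuAD-style token F1: harmonic mean of token precision and recall."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```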
|
|
## Intended Use
|
|
- QA over scientific/technical papers with section-level lookup or retrieval (a minimal harness sketch follows this list).
- Research and educational workflows for compact agentic-model distillation.
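The released weights do not bundle a runtime harness. A minimal agent loop under the assumptions above might look like the sketch below; the `lookup_section[...]` action pattern, the `run_agent` helper, and the `generate_fn` callable are hypothetical, not a published API:

```python
import re

# Hypothetical action syntax; match it against whatever format your harness uses.
ACTION_RE = re.compile(r"Action:\s*lookup_section\[(.+?)\]")

def run_agent(question: str, sections: dict[str, str], generate_fn, max_hops: int = 4) -> str:
    """Alternate model generations with section lookups until a Final Answer appears.

    `generate_fn` is any callable mapping a prompt string to a completion string,
    ideally configured to stop after each "Action:" line.
    """
    prompt = f"QUESTION: {question}\nAVAILABLE PAPER SECTIONS:\n" + "\n".join(sections) + "\n"
    for _ in range(max_hops):
        completion = generate_fn(prompt)
        prompt += completion
        if "Final Answer:" in completion:
            return completion.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(completion)
        if match is None:
            break  # the model neither acted nor answered; give up
        observation = sections.get(match.group(1).strip(), "Section not found.")
        prompt += f"\nObservation: {observation}\n"
    return ""  # no Final Answer within the hop budget
```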
|
|
## Limitations
|
|
- Sensitive to the runtime prompt/harness format.
- Multi-hop behavior can increase latency.
- Should not be the sole source for high-stakes scientific or medical decisions.
|
|
## Usage (Transformers)
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "QuantumCuddle/DistillAgent-PaperQA-3B"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Task-style prompt: the question followed by the list of available paper sections.
prompt = "QUESTION: What baseline method is used?\nAVAILABLE PAPER SECTIONS:\n1. Abstract\n2. Methods\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; use do_sample=False rather than temperature=0.0.
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
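Qwen2.5-3B-Instruct is a chat model. Whether this checkpoint expects the raw task prompt above or a chat-templated version depends on how the distillation data was formatted, which this card does not specify; if your harness uses the chat template, a variant would be:

```python
# Assumption: wrapping the same task prompt in the Qwen chat template.
messages = [{"role": "user", "content": prompt}]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
out = model.generate(chat_inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][chat_inputs.shape[1]:], skip_special_tokens=True))
```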
|
|
## Citation
|
|
```bibtex
@misc{distillagent_paperqa_3b_2026,
  title={DistillAgent-PaperQA-3B},
  author={QuantumCuddle},
  year={2026},
  howpublished={\url{https://huggingface.co/QuantumCuddle/DistillAgent-PaperQA-3B}}
}
```
|
|