B1ade-0.5B

Part of the B1ade family of efficient RAG models. B1ade-0.5B is a full fine-tune of Qwen2.5-0.5B-Instruct, trained with Group Relative Policy Optimization (GRPO) on a curated subset of the simpleCoT dataset.

Model Description

B1ade-0.5B is trained as a lightweight generation component for retrieval-augmented generation (RAG) pipelines, designed to work alongside B1ade-embed for end-to-end RAG. Despite its small size, GRPO training with a ROUGE-L reward improves factual QA performance over the base model across multiple benchmarks.

Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B-Instruct |
| Training method | GRPO (full fine-tuning) |
| Reward function | ROUGE-L (answer similarity) |
| Dataset | w601sxs/simplecot_subset_50k (50k examples) |
| Epochs | 1 |
| Learning rate | 2e-6 (cosine decay) |
| Batch size | 16 (grad_accum=2, effective=32) |
| Num generations | 2 |
| Hardware | Apple M4 Pro (48GB unified memory) |
| Training time | ~67 hours |
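The reward and advantage computation can be sketched in a few lines. The following is a minimal illustration, not the training code: a simplified ROUGE-L F1 computed from the longest common subsequence between a generated answer and the reference, and the group-relative advantage GRPO assigns by normalizing rewards within each group of sampled completions. The exact tokenization and text normalization used in the actual run are assumptions here.

```python
def lcs_len(a, b):
    # Longest common subsequence length via dynamic programming.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference):
    # ROUGE-L F1 over whitespace tokens (simplified; real evaluations may
    # lowercase, strip punctuation, or use a stemmer before matching).
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

def group_relative_advantages(rewards):
    # GRPO's core idea: each completion's advantage is its reward normalized
    # against the other completions sampled for the same prompt.
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

With `Num generations = 2`, each prompt's two sampled answers are scored with ROUGE-L against the reference and pushed apart by the normalized advantage.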

Benchmark Results

Results on standard QA benchmarks (Exact Match):

| Benchmark | B1ade-0.5B (this model) | Qwen-0.5B base | Llama-1B base |
|---|---|---|---|
| PopQA (14,267) | 8.8% | 7.9% | 9.6% |
| TriviaQA (17,944) | 14.1% | 13.4% | 31.0% |
| NQ Open (3,610) | 1.3% | 1.9% | 4.8% |
| ARC-Challenge (1,172) | 32.8% | 33.4% | 37.5% |
| PubMedQA (1,000) | 10.3% | 7.4% | 31.4% |
| FEVER (9,583) | 7.4% | 20.5% | 14.3% |
| WikiMultiHopQA (7,405) | 7.1% | 6.0% | 9.6% |

GRPO training yields relative improvements over the base model on PopQA (+11%), TriviaQA (+5%), PubMedQA (+39%), and WikiMultiHopQA (+18%); the base model remains ahead on NQ Open, ARC-Challenge, and FEVER.
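Exact Match above is the fraction of questions whose generated answer matches a gold answer after string normalization. The card doesn't specify the exact normalization used in these evaluations; a common convention (lowercasing, stripping punctuation and articles, collapsing whitespace, as in SQuAD-style scoring) looks like this sketch:

```python
import re
import string

def normalize(text):
    # Lowercase, strip punctuation, drop articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    # 1.0 if the normalized prediction equals any normalized gold answer.
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))
```

Under this convention, "The Eiffel Tower." matches the gold answer "eiffel tower", while a longer answer containing extra words does not.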

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("w601sxs/b1ade_0.5B", dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("w601sxs/b1ade_0.5B")

# Format a retrieved passage and a question into the training prompt template
context = "Your retrieved passage here..."
question = "Your question here?"
prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt")
# 151643 = <|endoftext|>, 151645 = <|im_end|> (Qwen2.5 stop tokens)
outputs = model.generate(**inputs, max_new_tokens=200, eos_token_id=[151643, 151645])
# Decode only the newly generated tokens, skipping the prompt
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```
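When a retriever such as b1ade-embed returns several passages, the single-passage template above can be extended by concatenating passages into the context. This helper is a hypothetical sketch, not part of the released code (`build_rag_prompt` and the `max_chars` budget are assumptions); since the model was presumably trained on single-context prompts, multi-passage quality should be validated empirically.

```python
def build_rag_prompt(question, passages, max_chars=4000):
    # Concatenate retrieved passages (highest-ranked first) into one context,
    # stopping once the character budget would be exceeded, then apply the
    # Context / Question / Answer template used above.
    context_parts, used = [], 0
    for p in passages:
        if used + len(p) > max_chars:
            break
        context_parts.append(p)
        used += len(p)
    context = "\n\n".join(context_parts)
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"
```

The resulting string can be passed to the tokenizer exactly as in the snippet above.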

B1ade Family

| Model | Parameters | Role |
|---|---|---|
| b1ade-embed | 335M | Dense retrieval |
| b1ade_0.5B | 494M | Generation (this model) |
| b1ade_1B | 1B | Generation (Llama-1B GRPO, in progress) |

Citation

@article{b1ade2026,
  title={Applying Occam's Razor to RAG: B1ade 335M Embedding and 1B Small Language Model},
  author={Anonymous},
  year={2026}
}