B1ade-0.5B

Part of the B1ade family of efficient RAG models. B1ade-0.5B is a full fine-tune of Qwen2.5-0.5B-Instruct, trained with Group Relative Policy Optimization (GRPO) on a curated subset of the simpleCoT dataset.

Model Description

B1ade-0.5B is trained as a lightweight generation component for retrieval-augmented generation (RAG) pipelines, designed to work alongside B1ade-embed for end-to-end RAG. Despite its small size, GRPO training with a ROUGE-L reward improves factual QA performance over the base model across multiple benchmarks.

Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B-Instruct |
| Training method | GRPO (full fine-tuning) |
| Reward function | ROUGE-L (answer similarity) |
| Dataset | w601sxs/simplecot_subset_50k (50k examples) |
| Epochs | 1 |
| Learning rate | 2e-6 (cosine decay) |
| Batch size | 16 (grad_accum=2, effective=32) |
| Num generations | 2 |
| Hardware | Apple M4 Pro (48GB unified memory) |
| Training time | ~67 hours |
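The reward and advantage computation can be sketched in a few lines. The following is a minimal illustration, not the training code: a simplified ROUGE-L F1 computed from the longest common subsequence between a generated answer and the reference, and the group-relative advantage GRPO assigns by normalizing rewards within each group of sampled completions. The exact tokenization and text normalization used in the actual run are assumptions here.

```python
def lcs_len(a, b):
    # Longest common subsequence length via dynamic programming.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference):
    # ROUGE-L F1 over whitespace tokens (simplified; real evaluations may
    # lowercase, strip punctuation, or use a stemmer before matching).
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

def group_relative_advantages(rewards):
    # GRPO's core idea: each completion's advantage is its reward normalized
    # against the other completions sampled for the same prompt.
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

With `Num generations = 2`, each prompt's two sampled answers are scored with ROUGE-L against the reference and pushed apart by the normalized advantage.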

Benchmark Results

Results on standard QA benchmarks (Exact Match):

| Benchmark | B1ade-0.5B (this model) | Qwen-0.5B base | Llama-1B base |
|---|---|---|---|
| PopQA (14,267) | 8.8% | 7.9% | 9.6% |
| TriviaQA (17,944) | 14.1% | 13.4% | 31.0% |
| NQ Open (3,610) | 1.3% | 1.9% | 4.8% |
| ARC-Challenge (1,172) | 32.8% | 33.4% | 37.5% |
| PubMedQA (1,000) | 10.3% | 7.4% | 31.4% |
| FEVER (9,583) | 7.4% | 20.5% | 14.3% |
| WikiMultiHopQA (7,405) | 7.1% | 6.0% | 9.6% |

GRPO training yields relative improvements over the base model on PopQA (+11%), TriviaQA (+5%), PubMedQA (+39%), and WikiMultiHopQA (+18%); the base model remains ahead on NQ Open, ARC-Challenge, and FEVER.
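Exact Match above is the fraction of questions whose generated answer matches a gold answer after string normalization. The card doesn't specify the exact normalization used in these evaluations; a common convention (lowercasing, stripping punctuation and articles, collapsing whitespace, as in SQuAD-style scoring) looks like this sketch:

```python
import re
import string

def normalize(text):
    # Lowercase, strip punctuation, drop articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    # 1.0 if the normalized prediction equals any normalized gold answer.
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))
```

Under this convention, "The Eiffel Tower." matches the gold answer "eiffel tower", while a longer answer containing extra words does not.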

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("w601sxs/b1ade_0.5B", dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("w601sxs/b1ade_0.5B")

# Format a retrieved passage and a question into the training prompt template
context = "Your retrieved passage here..."
question = "Your question here?"
prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt")
# 151643 = <|endoftext|>, 151645 = <|im_end|> (Qwen2.5 stop tokens)
outputs = model.generate(**inputs, max_new_tokens=200, eos_token_id=[151643, 151645])
# Decode only the newly generated tokens, skipping the prompt
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```
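When a retriever such as b1ade-embed returns several passages, the single-passage template above can be extended by concatenating passages into the context. This helper is a hypothetical sketch, not part of the released code (`build_rag_prompt` and the `max_chars` budget are assumptions); since the model was presumably trained on single-context prompts, multi-passage quality should be validated empirically.

```python
def build_rag_prompt(question, passages, max_chars=4000):
    # Concatenate retrieved passages (highest-ranked first) into one context,
    # stopping once the character budget would be exceeded, then apply the
    # Context / Question / Answer template used above.
    context_parts, used = [], 0
    for p in passages:
        if used + len(p) > max_chars:
            break
        context_parts.append(p)
        used += len(p)
    context = "\n\n".join(context_parts)
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"
```

The resulting string can be passed to the tokenizer exactly as in the snippet above.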

B1ade Family

| Model | Parameters | Role |
|---|---|---|
| b1ade-embed | 335M | Dense retrieval |
| b1ade_0.5B | 494M | Generation (this model) |
| b1ade_1B | 1B | Generation (Llama-1B GRPO, in progress) |

Citation

@article{b1ade2026,
  title={Applying Occam's Razor to RAG: B1ade 335M Embedding and 1B Small Language Model},
  author={Anonymous},
  year={2026}
}