b1ade
Part of the B1ade family of efficient RAG models. B1ade-0.5B is a full fine-tune of Qwen2.5-0.5B-Instruct trained with Group Relative Policy Optimization (GRPO) on a curated 50k-example subset of the simpleCoT dataset.

B1ade-0.5B is a lightweight generation component for retrieval-augmented generation (RAG) pipelines, designed to work alongside B1ade-embed for end-to-end RAG. Despite its small size, GRPO training with a ROUGE-L reward improves factual QA performance over the base model on several benchmarks.
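The ROUGE-L reward used during GRPO training scores the longest common subsequence (LCS) between a sampled answer and the reference. A minimal sketch of such a reward (hypothetical helper names; the actual training code is not shown here):

```python
def lcs_length(a: list, b: list) -> int:
    # Dynamic-programming longest common subsequence over token lists.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a, 1):
        for j, tb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ta == tb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_reward(completion: str, reference: str) -> float:
    """ROUGE-L F1 between a generated completion and the reference answer."""
    hyp, ref = completion.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    lcs = lcs_length(hyp, ref)
    precision, recall = lcs / len(hyp), lcs / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A completion identical to the reference scores 1.0; one sharing no subsequence scores 0.0, giving GRPO a dense answer-similarity signal.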
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B-Instruct |
| Training method | GRPO (full fine-tuning) |
| Reward function | ROUGE-L (answer similarity) |
| Dataset | w601sxs/simplecot_subset_50k (50k examples) |
| Epochs | 1 |
| Learning rate | 2e-6 (cosine decay) |
| Batch size | 16 (grad_accum=2, effective=32) |
| Num generations | 2 |
| Hardware | Apple M4 Pro (48GB unified memory) |
| Training time | ~67 hours |
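The hyperparameters above map naturally onto TRL's GRPO trainer. A hedged sketch of such a setup (assuming TRL's `GRPOTrainer`/`GRPOConfig` API and a dataset with `prompt` and `answer` columns; column names and the `rouge_score` scorer are assumptions, not the released training code):

```python
from datasets import load_dataset
from rouge_score import rouge_scorer
from trl import GRPOConfig, GRPOTrainer

scorer = rouge_scorer.RougeScorer(["rougeL"])

def reward_rouge_l(completions, answer, **kwargs):
    # One ROUGE-L F1 score per sampled completion in the group.
    return [scorer.score(ref, c)["rougeL"].fmeasure
            for c, ref in zip(completions, answer)]

config = GRPOConfig(
    output_dir="b1ade-0.5b-grpo",
    learning_rate=2e-6,
    lr_scheduler_type="cosine",      # cosine decay
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size 32
    num_train_epochs=1,
    num_generations=2,               # samples per prompt for the group baseline
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_rouge_l,
    args=config,
    train_dataset=load_dataset("w601sxs/simplecot_subset_50k", split="train"),
)
trainer.train()
```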
Results on standard QA benchmarks (Exact Match):
| Benchmark | B1ade-0.5B (this model) | Qwen-0.5B base | Llama-1B base |
|---|---|---|---|
| PopQA (14,267) | 8.8% | 7.9% | 9.6% |
| TriviaQA (17,944) | 14.1% | 13.4% | 31.0% |
| NQ Open (3,610) | 1.3% | 1.9% | 4.8% |
| ARC-Challenge (1,172) | 32.8% | 33.4% | 37.5% |
| PubMedQA (1,000) | 10.3% | 7.4% | 31.4% |
| FEVER (9,583) | 7.4% | 20.5% | 14.3% |
| WikiMultiHopQA (7,405) | 7.1% | 6.0% | 9.6% |
Relative to the base model's EM, GRPO training improves PopQA (+11%), TriviaQA (+5%), PubMedQA (+39%), and WikiMultiHopQA (+18%), while regressing on NQ Open, ARC-Challenge, and FEVER.
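The percentage gains quoted are relative improvements over the base model's EM, e.g. PopQA: (8.8 − 7.9) / 7.9 ≈ +11%. A quick check:

```python
def relative_gain(model_em: float, base_em: float) -> float:
    """Relative EM improvement over the base model, in percent."""
    return 100 * (model_em - base_em) / base_em

print(round(relative_gain(8.8, 7.9)))   # PopQA: 11
print(round(relative_gain(10.3, 7.4)))  # PubMedQA: 39
```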
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("w601sxs/b1ade_0.5B", dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("w601sxs/b1ade_0.5B")

# Build the Context/Question/Answer prompt format the model was trained on.
context = "Your retrieved passage here..."
question = "Your question here?"
prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt")
# Stop on Qwen's <|endoftext|> (151643) and <|im_end|> (151645) tokens.
outputs = model.generate(**inputs, max_new_tokens=200, eos_token_id=[151643, 151645])

# Decode only the newly generated tokens, skipping the prompt.
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```
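In a RAG pipeline the context is usually several retrieved passages rather than one. A small helper for assembling them (the double-newline joining format is an assumption, extrapolated from the single-passage prompt above):

```python
def build_prompt(passages: list[str], question: str) -> str:
    """Concatenate retrieved passages into the Context/Question/Answer prompt."""
    context = "\n\n".join(passages)
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"
```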
| Model | Parameters | Role |
|---|---|---|
| b1ade-embed | 335M | Dense retrieval |
| b1ade_0.5B | 494M | Generation (this model) |
| b1ade_1B | 1B | Generation (Llama-1B GRPO, in progress) |
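In the pipeline above, b1ade-embed supplies the retrieval step; the ranking itself is just cosine similarity over embeddings. A minimal sketch with toy vectors (in practice the vectors would come from b1ade-embed, e.g. via sentence-transformers; the 3-d vectors here are placeholders):

```python
import numpy as np

def top_k(query_vec: np.ndarray, passage_vecs: np.ndarray, k: int = 2) -> np.ndarray:
    """Rank passages by cosine similarity to the query; return indices, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q
    return np.argsort(-scores)[:k]

# Toy 3-d embeddings standing in for b1ade-embed output.
query = np.array([1.0, 0.0, 0.0])
passages = np.array([
    [0.9, 0.1, 0.0],   # most similar to the query
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
print(top_k(query, passages))  # → [0 2]
```

The returned indices select the passages fed into the `Context:` slot of the generation prompt.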
```bibtex
@article{b1ade2026,
  title={Applying Occam's Razor to RAG: B1ade 335M Embedding and 1B Small Language Model},
  author={Anonymous},
  year={2026}
}
```