# Luganda Translation Reward Model
A 1B parameter Gemma 3 reward model that scores English→Luganda translation quality. Outputs a scalar reward — higher = better translation.
**2026-04-09 update:** This repo was previously uploaded as a TRL `AutoModelForCausalLMWithValueHead` + PEFT checkpoint, which required manual LoRA merging and value-head wiring before it could be used. It has now been replaced with a merged `Gemma3ForSequenceClassification`, so users can load it with one line. If you have an old checkout, run `git pull` to get the new format.
## Quick start
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("CraneAILabs/luganda-reward-model")
model = AutoModelForSequenceClassification.from_pretrained(
    "CraneAILabs/luganda-reward-model",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

def score(prompt: str, response: str) -> float:
    """Higher score = better Luganda translation."""
    text = f"{prompt}\n\n{response}"
    inputs = tok(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        out = model(**inputs)
    return out.logits[0].item()

# Examples
print(score("Translate to Luganda: The children are playing.", "Abaana bazannya."))  # +8.0 ← good
print(score("Translate to Luganda: I love my mother.", "Njagala maama wange."))      # +5.5 ← good
print(score("Translate to Luganda: I love my mother.", "Mama love I."))              # +1.7 ← garbled
print(score("Translate to Luganda: I love my mother.", "Sssss xxxxx zzzzz."))        # +1.1 ← gibberish
```
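A common use for a scalar reward is best-of-n reranking: generate several candidate translations and keep the highest-scoring one. Here is a minimal sketch; `toy_scorer` is a stand-in for illustration only, and in practice you would pass the `score` function defined above.

```python
def best_translation(prompt, candidates, scorer):
    """Return (reward, candidate) for the highest-scoring candidate."""
    return max((scorer(prompt, c), c) for c in candidates)

# Toy scorer for illustration only; in practice pass the `score`
# function from the quick start, which queries the reward model.
def toy_scorer(prompt, response):
    return -response.count("x")  # pretend fewer x's = better

candidates = ["Abaana bazannya.", "xxxx zzzz"]
best = best_translation(
    "Translate to Luganda: The children are playing.", candidates, toy_scorer
)
print(best[1])  # prints "Abaana bazannya."
```

Because the reranker only depends on a `(prompt, response) -> float` callable, the same helper works unchanged with the real model-backed scorer.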
## What changed in the 2026-04-09 update
| | Old format (removed) | New format (current) |
|---|---|---|
| Class | TRL `AutoModelForCausalLMWithValueHead` + PEFT LoRA wrapper | `Gemma3ForSequenceClassification` |
| Loading | Manual PEFT load + LoRA merge + custom `value_head` wrapper | One line: `AutoModelForSequenceClassification.from_pretrained(...)` |
| State dict prefix | `base_model.base_model.model.model.layers.{N}.{module}.{base_layer\|lora_A.default\|lora_B.default}.weight` | Standard `model.layers.{N}.{module}.weight` |
| Score head | Loose `value_head.weight` tensor (shape `[1, 1152]`) | Wired in as `model.score` |
| Dtype | float32 weights | bfloat16 weights (half the size, same precision at inference) |
| File | `pytorch_model.bin` (4.0 GB) | `model.safetensors` (2.0 GB) |
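The key layout change above can be illustrated as a string transformation. This is a hypothetical sketch, not an official migration script: it only shows how the old key names map onto the new ones. The LoRA merge itself changes weight *values*, so adapter keys have no direct one-to-one mapping and are dropped here.

```python
OLD_PREFIX = "base_model.base_model.model."

def convert_key(key: str):
    """Map an old PEFT/value-head state-dict key to the new merged layout.

    Returns None for LoRA adapter keys (their values are folded into the
    base weights during merging, so they have no direct counterpart),
    and maps the loose value head onto the wired-in score head.
    """
    if ".lora_A." in key or ".lora_B." in key:
        return None  # merged into base weights; no renamed equivalent
    if key == "value_head.weight":
        return "score.weight"
    key = key.removeprefix(OLD_PREFIX)          # strip the PEFT wrapper path
    return key.replace(".base_layer.weight", ".weight")

old = "base_model.base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight"
print(convert_key(old))  # prints "model.layers.0.self_attn.q_proj.weight"
```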
## Score interpretation
After running on a small held-out set:
| Reward range | Interpretation |
|---|---|
| > 5.0 | Coherent, fluent Luganda translation |
| 2.0 – 5.0 | Recognizably Luganda, but meaning may be wrong or only partially correct |
| < 2.0 | Garbled, gibberish, or grossly incorrect |
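The bands above can be turned into a simple triage function. The thresholds come straight from the table; the label strings are my own choice, not part of the model's output.

```python
def interpret(reward: float) -> str:
    """Bucket a raw reward into the qualitative bands from the table."""
    if reward > 5.0:
        return "fluent"      # coherent, fluent Luganda translation
    if reward >= 2.0:
        return "partial"     # Luganda-shaped but possibly wrong
    return "garbled"         # garbled, gibberish, or grossly incorrect

print(interpret(8.0))  # prints "fluent"
print(interpret(1.1))  # prints "garbled"
```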
**Known weakness:** untranslated English text scores moderately high (~+6), because the training data did not explicitly penalize untranslated input. Don't rely on this model alone to detect whether the LLM actually translated; pair it with a language detector.
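One cheap way to guard against that weakness is to gate the reward behind a language check. The sketch below uses a crude English-stopword heuristic as a stand-in for a real language detector (e.g. a fastText language-ID model); the stopword list and threshold are illustrative assumptions, not tuned values.

```python
# Tiny stand-in for a real language detector; the stopword list and
# 0.3 threshold are arbitrary illustrative choices.
ENGLISH_STOPWORDS = {"the", "is", "are", "and", "to", "my", "i", "a", "of"}

def looks_untranslated(response: str, threshold: float = 0.3) -> bool:
    """Flag responses dominated by common English function words."""
    words = response.lower().split()
    if not words:
        return True
    hits = sum(w.strip(".,!?") in ENGLISH_STOPWORDS for w in words)
    return hits / len(words) >= threshold

print(looks_untranslated("The children are playing."))  # prints "True"
print(looks_untranslated("Abaana bazannya."))           # prints "False"
```

In a pipeline you would score a response only after it passes the gate, or force its reward to a floor value when the gate trips.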
## Training details

| | |
|---|---|
| Base model | `CraneAILabs/ganda-gemma-1b` (Luganda CPT of `google/gemma-3-1b-it`) |
| Dataset | `CraneAILabs/pedagogy-luganda-reviewed` (299 reviewed translation rows → 1,490 rated examples) |
| Eval set | `Sunbird/salt` (200 examples × 5 quality levels via rule-based degradation) |
| Method | LoRA SFT regression (rank=32, α=64), then merged into base |
| Loss | Weighted MSE on 1–5 ratings |
| Hyperparameters | LR 2e-5, bs 4 (effective 8), 5 epochs, 10% warmup |
For the full training writeup including a v1 failure analysis, see TRAINING_REPORT.md in the original repo.
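For reference, the weighted-MSE objective named above has the generic form below. The per-rating weights are an assumption on my part (the card does not state the weighting scheme); upweighting rare ratings is one common reason to weight a regression loss.

```python
def weighted_mse(preds, targets, weights):
    """Weighted mean squared error over (prediction, target) pairs.

    Weights let rarer ratings count more, so the regression head
    doesn't collapse onto the most frequent score. The actual
    per-rating weights used in training are not specified here.
    """
    num = sum(w * (p - t) ** 2 for p, t, w in zip(preds, targets, weights))
    return num / sum(weights)

print(weighted_mse([1.0, 5.0], [1.0, 3.0], [1.0, 1.0]))  # prints "2.0"
```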
## Citation

```bibtex
@misc{craneailabs2026rewardmodel,
  title={Luganda Translation Reward Model},
  author={Bakunga, Bronson and Mubiru, Kato Steven and Tukamushaba, Catherine},
  year={2026},
  publisher={Crane AI Labs},
  url={https://huggingface.co/CraneAILabs/luganda-reward-model}
}
```
## License
Apache 2.0. Built on Gemma 3 — see Gemma terms of use.