Bronsn committed on
Commit
a92f584
·
verified ·
1 Parent(s): f4af713

Replace PEFT+ValueHead format with merged Gemma3ForSequenceClassification (one-line load)

Files changed (5)
  1. README.md +74 -75
  2. config.json +25 -10
  3. model.safetensors +3 -0
  4. tokenizer.json +2 -2
  5. tokenizer_config.json +0 -0
README.md CHANGED
@@ -4,106 +4,105 @@ language:
4
  - lug
5
  - en
6
  tags:
7
- - reward-model
8
  - luganda
9
- - translation
 
10
  - rlhf
11
- - pairwise-ranking
12
- - low-resource
13
- - african-languages
 
 
 
14
  base_model: CraneAILabs/ganda-gemma-1b
15
  pipeline_tag: text-classification
 
16
  ---
17
 
18
- # Luganda Translation Reward Model V2
19
-
20
- A pairwise margin-ranking reward model for evaluating English-to-Luganda translation quality. Trained on the Ganda Gemma 1B base using LoRA (rank=32). Designed as the RLHF reward signal for improving Luganda translation models.
21
-
22
- ## Model Description
23
-
24
- - **Base model:** `CraneAILabs/ganda-gemma-1b`
25
- - **Method:** Pairwise margin ranking (Llama 2 style)
26
- - **Loss:** `-log(sigmoid(r_chosen - r_rejected - margin))`
27
- - **Parameters:** ~2.7% trainable via LoRA (rank=32, alpha=64)
28
- - **Training:** 1 epoch, LR=1e-5, dropout=0.2, weight_decay=0.1
29
- - **Best checkpoint:** Step 900 (eval_loss=0.6787)
30
-
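The margin-ranking loss listed above can be sketched in plain Python for a single pair (a minimal illustration, not the training code; batched training would use `torch.nn.functional.logsigmoid`):

```python
import math

def pairwise_margin_loss(r_chosen: float, r_rejected: float, margin: float) -> float:
    # -log(sigmoid(r_chosen - r_rejected - margin)): equals log(2) when the scores
    # tie at zero margin, and approaches 0 as the chosen score clears the
    # rejected score by more than the margin
    z = r_chosen - r_rejected - margin
    return -math.log(1.0 / (1.0 + math.exp(-z)))
```

At a tie with zero margin the loss is log 2 ≈ 0.693; the larger the chosen-minus-rejected gap relative to the margin, the smaller the loss.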
31
- ## Training Data
32
-
33
- **10,856 pairwise comparisons** constructed from 1,490 rated translation examples:
34
-
35
- - 299 English sentences × 5 translation variants each = 1,490 rated examples
36
- - Ratings from professional Luganda translators (1-5 scale)
37
- - 856 additional reviewer correction pairs
38
- - Cartesian cross-bucket pairing with gap ≥ 2 quality levels
39
- - Margins: 0.50 (gap=2), 0.75 (gap=3), 1.00 (gap=4)
40
- - Train/eval split: 9,770 / 1,086
41
-
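The cross-bucket pairing described above can be sketched as follows. This is a hypothetical reconstruction from the bullet points (the actual pairing script is not published); only the gap-to-margin mapping is taken directly from the list:

```python
from itertools import product

# Margin per quality-level gap, as listed above
MARGINS = {2: 0.50, 3: 0.75, 4: 1.00}

def cross_bucket_pairs(rated, min_gap=2):
    """rated: list of (text, rating) with ratings 1-5.
    Returns (chosen, rejected, margin) for every pair whose rating gap >= min_gap."""
    pairs = []
    for (a, ra), (b, rb) in product(rated, rated):
        gap = ra - rb
        if gap >= min_gap:
            pairs.append((a, b, MARGINS[gap]))
    return pairs
```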
42
- ### Version History
43
-
44
- **V1 (Failed):** Weighted MSE regression. 856 reviewer corrections were all labeled 5.0, inflating that class from 5.4% to ~40%. SALT quality-level separation *decreased* during training (0.51 → 0.30). Abandoned.
45
 
46
- **V2 (Current):** Pairwise margin ranking. Fixed the data distribution issue by using cross-bucket pairing instead of regression targets.
 
47
 
48
- ## Evaluation Results
49
 
50
- Evaluated on 200 SALT translation examples with tanh normalization (`tanh(raw_score / 4.0) * 5.0`):
51
 
52
- | Metric | Value | Target | Status |
53
- |--------|-------|--------|--------|
54
- | SALT pairwise accuracy | **87.4%** | >70% | ✓ Passed |
55
- | Held-out pairwise accuracy | **98.0%** | >80% | ✓ Passed |
56
- | Spearman correlation | **0.80** (p=5.5e-227) | >0.70 | ✓ Passed |
57
- | Gold−Wrong separation | **3.86** | >1.50 | ✓ Passed |
 
58
 
59
- ### Quality Level Scores (SALT, tanh normalized)
60
 
61
- | Quality Level | Mean Score |
62
- |---------------|-----------|
63
- | Gold (perfect) | 4.94 |
64
- | Minor errors | 4.89 |
65
- | Moderate errors | 3.83 |
66
- | Major errors | 1.22 |
67
- | Wrong language | 1.08 |
 
68
 
69
- ## Intended Use
70
 
71
- - RLHF reward signal for Luganda translation model training (rejection sampling, GRPO, PPO)
72
- - Automatic translation quality scoring (RL rejection threshold: score < 3.0)
73
- - Research on reward modeling for low-resource African languages
74
 
75
- ## Limitations
 
 
 
 
76
 
77
- - Trained on Luganda only; does not generalize to other Bantu languages without fine-tuning
78
- - Rating data from a limited pool of translators — may not capture all dialect preferences
79
- - Correction pair accuracy varies: 99.5% on rating pairs but only 61.1% on some correction categories
80
- - The model evaluates translation quality, not fluency or cultural appropriateness separately
81
 
82
- ## How to Use
83
 
84
- ```python
85
- from transformers import AutoModelForSequenceClassification, AutoTokenizer
86
- from peft import PeftModel
 
87
 
88
- base = AutoModelForSequenceClassification.from_pretrained("CraneAILabs/ganda-gemma-1b")
89
- model = PeftModel.from_pretrained(base, "CraneAILabs/luganda-reward-model")
90
- tokenizer = AutoTokenizer.from_pretrained("CraneAILabs/luganda-reward-model")
91
-
92
- # Score a translation
93
- text = "<start_of_turn>user\nTranslate to Luganda: Hello\n<end_of_turn>\n<start_of_turn>model\nOli otya<end_of_turn>"
94
- inputs = tokenizer(text, return_tensors="pt")
95
- score = model(**inputs).logits.item()
96
- # Apply tanh normalization: tanh(score / 4.0) * 5.0
97
- ```
98
 
99
  ## Citation
100
 
101
  ```bibtex
102
- @misc{craneailabs2026reward,
103
- title={Pairwise Margin-Ranking Reward Model for Luganda Translation Quality},
104
  author={Bakunga, Bronson and Mubiru, Kato Steven and Tukamushaba, Catherine},
105
  year={2026},
106
  publisher={Crane AI Labs},
107
- url={https://huggingface.co/CraneAILabs/luganda-reward-model}
108
  }
109
  ```
 
4
  - lug
5
  - en
6
  tags:
 
7
  - luganda
8
+ - reward-model
9
+ - reward-modeling
10
  - rlhf
11
+ - grpo
12
+ - dpo
13
+ - gemma
14
+ - gemma3
15
+ - translation-quality
16
+ - africa
17
  base_model: CraneAILabs/ganda-gemma-1b
18
  pipeline_tag: text-classification
19
+ library_name: transformers
20
  ---
21
 
22
+ # Luganda Translation Reward Model (merged)
 
23
 
24
+ A 1B parameter Gemma 3 reward model that scores English→Luganda translation quality.
25
+ Outputs a scalar reward — higher = better translation.
26
 
27
+ This is the **merged, ready-to-use version** of [`CraneAILabs/luganda-reward-model`](https://huggingface.co/CraneAILabs/luganda-reward-model). The original repo was uploaded as a TRL `AutoModelForCausalLMWithValueHead` PEFT checkpoint, which required manual LoRA merging + value-head wiring before it could be used. **This repo bakes those fixups in** so users can load it with one line.
28
 
29
+ ## Quick start
30
 
31
+ ```python
32
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
33
+ import torch
34
+
35
+ tok = AutoTokenizer.from_pretrained("CraneAILabs/luganda-reward-model-merged")
36
+ model = AutoModelForSequenceClassification.from_pretrained(
37
+     "CraneAILabs/luganda-reward-model-merged",
38
+     torch_dtype=torch.bfloat16,
39
+     device_map="auto",
40
+ )
41
+ model.eval()
42
+
43
+ def score(prompt: str, response: str) -> float:
44
+     """Higher score = better Luganda translation."""
45
+     text = f"{prompt}\n\n{response}"
46
+     inputs = tok(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)
47
+     with torch.no_grad():
48
+         out = model(**inputs)
49
+     return out.logits[0].item()
50
+
51
+ # Examples
52
+ print(score("Translate to Luganda: The children are playing.", "Abaana bazannya.")) # +8.0 ← good
53
+ print(score("Translate to Luganda: I love my mother.", "Njagala maama wange.")) # +5.5 ← good
54
+ print(score("Translate to Luganda: I love my mother.", "Mama love I.")) # +1.7 ← garbled
55
+ print(score("Translate to Luganda: I love my mother.", "Sssss xxxxx zzzzz.")) # +1.1 ← gibberish
56
+ ```
57
 
58
+ ## How it differs from the original repo
59
 
60
+ | | Original | Merged (this repo) |
61
+ |---|---|---|
62
+ | **Format** | TRL `AutoModelForCausalLMWithValueHead` + PEFT LoRA | `Gemma3ForSequenceClassification` |
63
+ | **Loading** | Requires manual PEFT load + LoRA merge + custom value_head wrapper | One line: `AutoModelForSequenceClassification.from_pretrained(...)` |
64
+ | **State dict prefix** | `base_model.base_model.model.model.layers.{N}.{module}.{base_layer\|lora_A.default\|lora_B.default}.weight` | Standard `model.layers.{N}.{module}.weight` |
65
+ | **Score head** | Loose `value_head.weight` tensor (shape `[1, 1152]`) | Wired in as `model.score` |
66
+ | **Weight dtype** | float32 | bfloat16 (half the size, negligible quality loss at inference) |
67
+ | **File size** | 4.0 GB pytorch_model.bin | 2.0 GB model.safetensors |
68
 
69
+ ## Score interpretation
70
 
71
+ Approximate ranges, observed on a small held-out set:
 
 
72
 
73
+ | Reward range | Interpretation |
74
+ |---|---|
75
+ | **> 5.0** | Coherent, fluent Luganda translation |
76
+ | **2.0 – 5.0** | Luganda-shaped, but the meaning may be wrong or only partially correct |
77
+ | **< 2.0** | Garbled, gibberish, or grossly incorrect |
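The table can be applied mechanically; the thresholds come from the rows above, while the label strings here are illustrative:

```python
def interpret(reward: float) -> str:
    # Bucket thresholds from the score-interpretation table above
    if reward > 5.0:
        return "coherent, fluent Luganda"
    if reward >= 2.0:
        return "Luganda-shaped, possibly wrong meaning"
    return "garbled or grossly incorrect"
```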
78
 
79
+ **Known weakness**: untranslated English text scores moderately high (~+6), because the training data did not explicitly penalize untranslated input. Don't use this model alone to detect "did the LLM actually translate?" — pair with a language detector.
 
 
 
80
 
81
+ ## Training details
82
 
83
+ | | |
84
+ |---|---|
85
+ | Base model | `CraneAILabs/ganda-gemma-1b` (Luganda CPT of `google/gemma-3-1b-it`) |
86
+ | Dataset | `CraneAILabs/pedagogy-luganda-reviewed` (299 reviewed translation rows → 1,490 rated examples) |
87
+ | Eval set | `Sunbird/salt` (200 examples × 5 quality levels via rule-based degradation) |
88
+ | Method | LoRA SFT regression (rank=32, α=64), then merged into base |
89
+ | Loss | Weighted MSE on 1–5 ratings |
90
+ | Hyperparameters | LR 2e-5, bs 4 (effective 8), 5 epochs, 10% warmup |
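The weighted-MSE objective in the table can be sketched as below. The per-class weights are an assumption for illustration — the actual weighting scheme is not documented here:

```python
def weighted_mse(preds, targets, weights):
    # Mean of w * (p - t)^2 over the batch; in training, weights would
    # upweight rare rating classes (actual weights undocumented)
    assert len(preds) == len(targets) == len(weights)
    return sum(w * (p - t) ** 2 for p, t, w in zip(preds, targets, weights)) / len(preds)
```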
91
 
92
+ For the full training writeup including a v1 failure analysis, see [`TRAINING_REPORT.md`](https://huggingface.co/CraneAILabs/luganda-reward-model/blob/main/TRAINING_REPORT.md) in the original repo.
 
93
 
94
  ## Citation
95
 
96
  ```bibtex
97
+ @misc{craneailabs2026rewardmodel,
98
+ title={Luganda Translation Reward Model},
99
  author={Bakunga, Bronson and Mubiru, Kato Steven and Tukamushaba, Catherine},
100
  year={2026},
101
  publisher={Crane AI Labs},
102
+ url={https://huggingface.co/CraneAILabs/luganda-reward-model-merged}
103
  }
104
  ```
105
+
106
+ ## License
107
+
108
+ Apache 2.0. Built on Gemma 3 — see [Gemma terms of use](https://ai.google.dev/gemma/terms).
config.json CHANGED
@@ -1,19 +1,27 @@
1
  {
 
2
  "architectures": [
3
- "Gemma3ForCausalLM"
4
  ],
5
  "attention_bias": false,
6
  "attention_dropout": 0.0,
7
  "attn_logit_softcapping": null,
8
  "bos_token_id": 2,
9
  "cache_implementation": "hybrid",
 
10
  "eos_token_id": 106,
11
  "final_logit_softcapping": null,
12
  "head_dim": 256,
13
  "hidden_activation": "gelu_pytorch_tanh",
14
  "hidden_size": 1152,
 
 
 
15
  "initializer_range": 0.02,
16
  "intermediate_size": 6912,
 
 
 
17
  "layer_types": [
18
  "sliding_attention",
19
  "sliding_attention",
@@ -48,19 +56,26 @@
48
  "num_hidden_layers": 26,
49
  "num_key_value_heads": 1,
50
  "pad_token_id": 0,
 
51
  "query_pre_attn_scalar": 256,
52
  "rms_norm_eps": 1e-06,
53
- "rope_local_base_freq": 10000,
54
- "rope_scaling": null,
55
- "rope_theta": 1000000,
 
56
  "sliding_window": 512,
57
  "sliding_window_pattern": 6,
58
- "torch_dtype": "bfloat16",
59
- "transformers_version": "4.53.0",
60
  "unsloth_fixed": true,
61
  "unsloth_version": "2025.6.7",
 
62
  "use_cache": true,
63
- "vocab_size": 262144,
64
- "num_labels": 1,
65
- "problem_type": "regression"
66
- }
 
1
  {
2
+ "_sliding_window_pattern": 6,
3
  "architectures": [
4
+ "Gemma3TextForSequenceClassification"
5
  ],
6
  "attention_bias": false,
7
  "attention_dropout": 0.0,
8
  "attn_logit_softcapping": null,
9
  "bos_token_id": 2,
10
  "cache_implementation": "hybrid",
11
+ "dtype": "bfloat16",
12
  "eos_token_id": 106,
13
  "final_logit_softcapping": null,
14
  "head_dim": 256,
15
  "hidden_activation": "gelu_pytorch_tanh",
16
  "hidden_size": 1152,
17
+ "id2label": {
18
+ "0": "LABEL_0"
19
+ },
20
  "initializer_range": 0.02,
21
  "intermediate_size": 6912,
22
+ "label2id": {
23
+ "LABEL_0": 0
24
+ },
25
  "layer_types": [
26
  "sliding_attention",
27
  "sliding_attention",
 
56
  "num_hidden_layers": 26,
57
  "num_key_value_heads": 1,
58
  "pad_token_id": 0,
59
+ "problem_type": "regression",
60
  "query_pre_attn_scalar": 256,
61
  "rms_norm_eps": 1e-06,
62
+ "rope_parameters": {
63
+ "full_attention": {
64
+ "rope_theta": 1000000,
65
+ "rope_type": "default"
66
+ },
67
+ "sliding_attention": {
68
+ "rope_theta": 10000,
69
+ "rope_type": "default"
70
+ }
71
+ },
72
  "sliding_window": 512,
73
  "sliding_window_pattern": 6,
74
+ "tie_word_embeddings": true,
75
+ "transformers_version": "5.5.0",
76
  "unsloth_fixed": true,
77
  "unsloth_version": "2025.6.7",
78
+ "use_bidirectional_attention": false,
79
  "use_cache": true,
80
+ "vocab_size": 262144
81
+ }
 
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e36807f690164ff6c7544f5880f95d4da7f87e2a4fc43861cf3b014eda6135fc
3
+ size 1999813600
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:3ff2eb2f470517c6123c6224cd75fa4b373953af438afa4b3114c9d7cf3309d8
3
- size 33384820
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f4708757955e49e5b23494815a523ffa5bdd0a7b67c09d16a093f6151245ec5b
3
+ size 33384665
tokenizer_config.json CHANGED
The diff for this file is too large to render. See raw diff