Bronsn committed on
Commit
a92f584
·
verified ·
1 Parent(s): f4af713

Replace PEFT+ValueHead format with merged Gemma3ForSequenceClassification (one-line load)

Files changed (5)
  1. README.md +74 -75
  2. config.json +25 -10
  3. model.safetensors +3 -0
  4. tokenizer.json +2 -2
  5. tokenizer_config.json +0 -0
README.md CHANGED
@@ -4,106 +4,105 @@ language:
4
  - lug
5
  - en
6
  tags:
7
- - reward-model
8
  - luganda
9
- - translation
 
10
  - rlhf
11
- - pairwise-ranking
12
- - low-resource
13
- - african-languages
 
 
 
14
  base_model: CraneAILabs/ganda-gemma-1b
15
  pipeline_tag: text-classification
 
16
  ---
17
 
18
- # Luganda Translation Reward Model V2
19
-
20
- A pairwise margin-ranking reward model for evaluating English-to-Luganda translation quality. Trained on the Ganda Gemma 1B base using LoRA (rank=32). Designed as the RLHF reward signal for improving Luganda translation models.
21
-
22
- ## Model Description
23
-
24
- - **Base model:** `CraneAILabs/ganda-gemma-1b`
25
- - **Method:** Pairwise margin ranking (Llama 2 style)
26
- - **Loss:** `-log(sigmoid(r_chosen - r_rejected - margin))`
27
- - **Parameters:** ~2.7% trainable via LoRA (rank=32, alpha=64)
28
- - **Training:** 1 epoch, LR=1e-5, dropout=0.2, weight_decay=0.1
29
- - **Best checkpoint:** Step 900 (eval_loss=0.6787)
30
-
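The margin-ranking loss listed above can be sketched in plain Python for a single pair (a minimal illustration, not the training code; batched training would use `torch.nn.functional.logsigmoid`):

```python
import math

def pairwise_margin_loss(r_chosen: float, r_rejected: float, margin: float) -> float:
    # -log(sigmoid(r_chosen - r_rejected - margin)): equals log(2) when the scores
    # tie at zero margin, and approaches 0 as the chosen score clears the
    # rejected score by more than the margin
    z = r_chosen - r_rejected - margin
    return -math.log(1.0 / (1.0 + math.exp(-z)))
```

At a tie with zero margin the loss is log 2 ≈ 0.693; the larger the chosen-minus-rejected gap relative to the margin, the smaller the loss.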
31
- ## Training Data
32
-
33
- **10,856 pairwise comparisons** constructed from 1,490 rated translation examples:
34
-
35
- - 299 English sentences × 5 translation variants each = 1,490 rated examples
36
- - Ratings from professional Luganda translators (1-5 scale)
37
- - 856 additional reviewer correction pairs
38
- - Cartesian cross-bucket pairing with gap ≥ 2 quality levels
39
- - Margins: 0.50 (gap=2), 0.75 (gap=3), 1.00 (gap=4)
40
- - Train/eval split: 9,770 / 1,086
41
-
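The cross-bucket pairing described above can be sketched as follows. This is a hypothetical reconstruction from the bullet points (the actual pairing script is not published); only the gap-to-margin mapping is taken directly from the list:

```python
from itertools import product

# Margin per quality-level gap, as listed above
MARGINS = {2: 0.50, 3: 0.75, 4: 1.00}

def cross_bucket_pairs(rated, min_gap=2):
    """rated: list of (text, rating) with ratings 1-5.
    Returns (chosen, rejected, margin) for every pair whose rating gap >= min_gap."""
    pairs = []
    for (a, ra), (b, rb) in product(rated, rated):
        gap = ra - rb
        if gap >= min_gap:
            pairs.append((a, b, MARGINS[gap]))
    return pairs
```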
42
- ### Version History
43
-
44
- **V1 (Failed):** Weighted MSE regression. 856 reviewer corrections were all labeled 5.0, inflating that class from 5.4% to ~40%. SALT quality-level separation *decreased* during training (0.51 → 0.30). Abandoned.
45
 
46
- **V2 (Current):** Pairwise margin ranking. Fixed the data distribution issue by using cross-bucket pairing instead of regression targets.
 
47
 
48
- ## Evaluation Results
49
 
50
- Evaluated on 200 SALT translation examples with tanh normalization (`tanh(raw_score / 4.0) * 5.0`):
51
 
52
- | Metric | Value | Target | Status |
53
- |--------|-------|--------|--------|
54
- | SALT pairwise accuracy | **87.4%** | >70% | ✓ Passed |
55
- | Held-out pairwise accuracy | **98.0%** | >80% | ✓ Passed |
56
- | Spearman correlation | **0.80** (p=5.5e-227) | >0.70 | ✓ Passed |
57
- | Gold−Wrong separation | **3.86** | >1.50 | ✓ Passed |
 
58
 
59
- ### Quality Level Scores (SALT, tanh normalized)
60
 
61
- | Quality Level | Mean Score |
62
- |---------------|-----------|
63
- | Gold (perfect) | 4.94 |
64
- | Minor errors | 4.89 |
65
- | Moderate errors | 3.83 |
66
- | Major errors | 1.22 |
67
- | Wrong language | 1.08 |
 
68
 
69
- ## Intended Use
70
 
71
- - RLHF reward signal for Luganda translation model training (rejection sampling, GRPO, PPO)
72
- - Automatic translation quality scoring (RL rejection threshold: score < 3.0)
73
- - Research on reward modeling for low-resource African languages
74
 
75
- ## Limitations
 
 
 
 
76
 
77
- - Trained on Luganda only; does not generalize to other Bantu languages without fine-tuning
78
- - Rating data from a limited pool of translators — may not capture all dialect preferences
79
- - Correction pair accuracy varies: 99.5% on rating pairs but only 61.1% on some correction categories
80
- - The model evaluates translation quality, not fluency or cultural appropriateness separately
81
 
82
- ## How to Use
83
 
84
- ```python
85
- from transformers import AutoModelForSequenceClassification, AutoTokenizer
86
- from peft import PeftModel
 
87
 
88
- base = AutoModelForSequenceClassification.from_pretrained("CraneAILabs/ganda-gemma-1b")
89
- model = PeftModel.from_pretrained(base, "CraneAILabs/luganda-reward-model")
90
- tokenizer = AutoTokenizer.from_pretrained("CraneAILabs/luganda-reward-model")
91
-
92
- # Score a translation
93
- text = "<start_of_turn>user\nTranslate to Luganda: Hello\n<end_of_turn>\n<start_of_turn>model\nOli otya<end_of_turn>"
94
- inputs = tokenizer(text, return_tensors="pt")
95
- score = model(**inputs).logits.item()
96
- # Apply tanh normalization: tanh(score / 4.0) * 5.0
97
- ```
98
 
99
  ## Citation
100
 
101
  ```bibtex
102
- @misc{craneailabs2026reward,
103
- title={Pairwise Margin-Ranking Reward Model for Luganda Translation Quality},
104
  author={Bakunga, Bronson and Mubiru, Kato Steven and Tukamushaba, Catherine},
105
  year={2026},
106
  publisher={Crane AI Labs},
107
- url={https://huggingface.co/CraneAILabs/luganda-reward-model}
108
  }
109
  ```
 
4
  - lug
5
  - en
6
  tags:
 
7
  - luganda
8
+ - reward-model
9
+ - reward-modeling
10
  - rlhf
11
+ - grpo
12
+ - dpo
13
+ - gemma
14
+ - gemma3
15
+ - translation-quality
16
+ - africa
17
  base_model: CraneAILabs/ganda-gemma-1b
18
  pipeline_tag: text-classification
19
+ library_name: transformers
20
  ---
21
 
22
+ # Luganda Translation Reward Model (merged)
 
23
 
24
+ A 1B parameter Gemma 3 reward model that scores English→Luganda translation quality.
25
+ Outputs a scalar reward — higher = better translation.
26
 
27
+ This is the **merged, ready-to-use version** of [`CraneAILabs/luganda-reward-model`](https://huggingface.co/CraneAILabs/luganda-reward-model). The original repo was uploaded as a TRL `AutoModelForCausalLMWithValueHead` PEFT checkpoint, which required manual LoRA merging + value-head wiring before it could be used. **This repo bakes those fixups in** so users can load it with one line.
28
 
29
+ ## Quick start
30
 
31
+ ```python
32
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
33
+ import torch
34
+
35
+ tok = AutoTokenizer.from_pretrained("CraneAILabs/luganda-reward-model-merged")
36
+ model = AutoModelForSequenceClassification.from_pretrained(
37
+     "CraneAILabs/luganda-reward-model-merged",
38
+     torch_dtype=torch.bfloat16,
39
+     device_map="auto",
40
+ )
41
+ model.eval()
42
+
43
+ def score(prompt: str, response: str) -> float:
44
+     """Higher score = better Luganda translation."""
45
+     text = f"{prompt}\n\n{response}"
46
+     inputs = tok(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)
47
+     with torch.no_grad():
48
+         out = model(**inputs)
49
+     return out.logits[0].item()
50
+
51
+ # Examples
52
+ print(score("Translate to Luganda: The children are playing.", "Abaana bazannya.")) # +8.0 ← good
53
+ print(score("Translate to Luganda: I love my mother.", "Njagala maama wange.")) # +5.5 ← good
54
+ print(score("Translate to Luganda: I love my mother.", "Mama love I.")) # +1.7 ← garbled
55
+ print(score("Translate to Luganda: I love my mother.", "Sssss xxxxx zzzzz.")) # +1.1 ← gibberish
56
+ ```
57
 
58
+ ## How it differs from the original repo
59
 
60
+ | | Original | Merged (this repo) |
61
+ |---|---|---|
62
+ | **Format** | TRL `AutoModelForCausalLMWithValueHead` + PEFT LoRA | `Gemma3ForSequenceClassification` |
63
+ | **Loading** | Requires manual PEFT load + LoRA merge + custom value_head wrapper | One line: `AutoModelForSequenceClassification.from_pretrained(...)` |
64
+ | **State dict prefix** | `base_model.base_model.model.model.layers.{N}.{module}.{base_layer\|lora_A.default\|lora_B.default}.weight` | Standard `model.layers.{N}.{module}.weight` |
65
+ | **Score head** | Loose `value_head.weight` tensor (shape `[1, 1152]`) | Wired in as `model.score` |
66
+ | **Weight dtype** | float32 | bfloat16 (half the size, negligible quality loss at inference) |
67
+ | **File size** | 4.0 GB pytorch_model.bin | 2.0 GB model.safetensors |
68
 
69
+ ## Score interpretation
70
 
71
+ Approximate ranges, observed on a small held-out set:
 
 
72
 
73
+ | Reward range | Interpretation |
74
+ |---|---|
75
+ | **> 5.0** | Coherent, fluent Luganda translation |
76
+ | **2.0 – 5.0** | Luganda-shaped, but the meaning may be wrong or only partially correct |
77
+ | **< 2.0** | Garbled, gibberish, or grossly incorrect |
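The table can be applied mechanically; the thresholds come from the rows above, while the label strings here are illustrative:

```python
def interpret(reward: float) -> str:
    # Bucket thresholds from the score-interpretation table above
    if reward > 5.0:
        return "coherent, fluent Luganda"
    if reward >= 2.0:
        return "Luganda-shaped, possibly wrong meaning"
    return "garbled or grossly incorrect"
```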
78
 
79
+ **Known weakness**: untranslated English text scores moderately high (~+6), because the training data did not explicitly penalize untranslated input. Don't use this model alone to detect "did the LLM actually translate?" — pair with a language detector.
 
 
 
80
 
81
+ ## Training details
82
 
83
+ | | |
84
+ |---|---|
85
+ | Base model | `CraneAILabs/ganda-gemma-1b` (Luganda CPT of `google/gemma-3-1b-it`) |
86
+ | Dataset | `CraneAILabs/pedagogy-luganda-reviewed` (299 reviewed translation rows → 1,490 rated examples) |
87
+ | Eval set | `Sunbird/salt` (200 examples × 5 quality levels via rule-based degradation) |
88
+ | Method | LoRA SFT regression (rank=32, α=64), then merged into base |
89
+ | Loss | Weighted MSE on 1–5 ratings |
90
+ | Hyperparameters | LR 2e-5, bs 4 (effective 8), 5 epochs, 10% warmup |
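The weighted-MSE objective in the table can be sketched as below. The per-class weights are an assumption for illustration — the actual weighting scheme is not documented here:

```python
def weighted_mse(preds, targets, weights):
    # Mean of w * (p - t)^2 over the batch; in training, weights would
    # upweight rare rating classes (actual weights undocumented)
    assert len(preds) == len(targets) == len(weights)
    return sum(w * (p - t) ** 2 for p, t, w in zip(preds, targets, weights)) / len(preds)
```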
91
 
92
+ For the full training writeup including a v1 failure analysis, see [`TRAINING_REPORT.md`](https://huggingface.co/CraneAILabs/luganda-reward-model/blob/main/TRAINING_REPORT.md) in the original repo.
 
93
 
94
  ## Citation
95
 
96
  ```bibtex
97
+ @misc{craneailabs2026rewardmodel,
98
+ title={Luganda Translation Reward Model},
99
  author={Bakunga, Bronson and Mubiru, Kato Steven and Tukamushaba, Catherine},
100
  year={2026},
101
  publisher={Crane AI Labs},
102
+ url={https://huggingface.co/CraneAILabs/luganda-reward-model-merged}
103
  }
104
  ```
105
+
106
+ ## License
107
+
108
+ Apache 2.0. Built on Gemma 3 — see [Gemma terms of use](https://ai.google.dev/gemma/terms).
config.json CHANGED
@@ -1,19 +1,27 @@
1
  {
 
2
  "architectures": [
3
- "Gemma3ForCausalLM"
4
  ],
5
  "attention_bias": false,
6
  "attention_dropout": 0.0,
7
  "attn_logit_softcapping": null,
8
  "bos_token_id": 2,
9
  "cache_implementation": "hybrid",
 
10
  "eos_token_id": 106,
11
  "final_logit_softcapping": null,
12
  "head_dim": 256,
13
  "hidden_activation": "gelu_pytorch_tanh",
14
  "hidden_size": 1152,
 
 
 
15
  "initializer_range": 0.02,
16
  "intermediate_size": 6912,
 
 
 
17
  "layer_types": [
18
  "sliding_attention",
19
  "sliding_attention",
@@ -48,19 +56,26 @@
48
  "num_hidden_layers": 26,
49
  "num_key_value_heads": 1,
50
  "pad_token_id": 0,
 
51
  "query_pre_attn_scalar": 256,
52
  "rms_norm_eps": 1e-06,
53
- "rope_local_base_freq": 10000,
54
- "rope_scaling": null,
55
- "rope_theta": 1000000,
 
56
  "sliding_window": 512,
57
  "sliding_window_pattern": 6,
58
- "torch_dtype": "bfloat16",
59
- "transformers_version": "4.53.0",
60
  "unsloth_fixed": true,
61
  "unsloth_version": "2025.6.7",
 
62
  "use_cache": true,
63
- "vocab_size": 262144,
64
- "num_labels": 1,
65
- "problem_type": "regression"
66
- }
 
1
  {
2
+ "_sliding_window_pattern": 6,
3
  "architectures": [
4
+ "Gemma3TextForSequenceClassification"
5
  ],
6
  "attention_bias": false,
7
  "attention_dropout": 0.0,
8
  "attn_logit_softcapping": null,
9
  "bos_token_id": 2,
10
  "cache_implementation": "hybrid",
11
+ "dtype": "bfloat16",
12
  "eos_token_id": 106,
13
  "final_logit_softcapping": null,
14
  "head_dim": 256,
15
  "hidden_activation": "gelu_pytorch_tanh",
16
  "hidden_size": 1152,
17
+ "id2label": {
18
+ "0": "LABEL_0"
19
+ },
20
  "initializer_range": 0.02,
21
  "intermediate_size": 6912,
22
+ "label2id": {
23
+ "LABEL_0": 0
24
+ },
25
  "layer_types": [
26
  "sliding_attention",
27
  "sliding_attention",
 
56
  "num_hidden_layers": 26,
57
  "num_key_value_heads": 1,
58
  "pad_token_id": 0,
59
+ "problem_type": "regression",
60
  "query_pre_attn_scalar": 256,
61
  "rms_norm_eps": 1e-06,
62
+ "rope_parameters": {
63
+ "full_attention": {
64
+ "rope_theta": 1000000,
65
+ "rope_type": "default"
66
+ },
67
+ "sliding_attention": {
68
+ "rope_theta": 10000,
69
+ "rope_type": "default"
70
+ }
71
+ },
72
  "sliding_window": 512,
73
  "sliding_window_pattern": 6,
74
+ "tie_word_embeddings": true,
75
+ "transformers_version": "5.5.0",
76
  "unsloth_fixed": true,
77
  "unsloth_version": "2025.6.7",
78
+ "use_bidirectional_attention": false,
79
  "use_cache": true,
80
+ "vocab_size": 262144
81
+ }
 
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e36807f690164ff6c7544f5880f95d4da7f87e2a4fc43861cf3b014eda6135fc
3
+ size 1999813600
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:3ff2eb2f470517c6123c6224cd75fa4b373953af438afa4b3114c9d7cf3309d8
3
- size 33384820
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f4708757955e49e5b23494815a523ffa5bdd0a7b67c09d16a093f6151245ec5b
3
+ size 33384665
tokenizer_config.json CHANGED
The diff for this file is too large to render. See raw diff