Translation
PEFT
Safetensors
Turkish
Laz
laz
lazuri
turkish
endangered-language
kartvelian
low-resource
Instructions to use CidQuLimited/LazuriMT with libraries, inference providers, notebooks, and local apps.

- Libraries
  - PEFT

How to use CidQuLimited/LazuriMT with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Pre-quantized 4-bit (bitsandbytes) Gemma 4 E4B base model
base_model = AutoModelForCausalLM.from_pretrained("unsloth/gemma-4-e4b-it-unsloth-bnb-4bit")
# Attach the LazuriMT LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, "CidQuLimited/LazuriMT")
```

- Notebooks
  - Google Colab
  - Kaggle
v0.2: chrF 26.97 on 200 TR->LZ test pairs (LoRA r=64, 18k steps, A100)
- README.md +20 -19
- adapter_config.json +5 -3
- adapter_model.safetensors +2 -2
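The headline figure in this commit is chrF, a character n-gram F-score. Neither the commit message nor the README diff says which implementation produced 26.97, so the sketch below assumes sacrebleu-style defaults: character n-grams of order 1 to 6, β = 2 (recall weighted more than precision), whitespace ignored, and counts summed over the whole test set before scoring. It illustrates what the number measures and may differ from sacrebleu in edge cases; `char_ngrams` and `chrf` are illustrative names, not part of this repository.

```python
from collections import Counter


def char_ngrams(text, n):
    """Character n-gram counts with whitespace removed (chrF ignores spaces by default)."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypotheses, references, max_order=6, beta=2.0):
    """Corpus-level chrF in [0, 100]; n-gram counts are summed over all pairs before scoring."""
    if len(hypotheses) != len(references):
        raise ValueError("need one reference per hypothesis")
    hyp_tot = [0] * max_order
    ref_tot = [0] * max_order
    match_tot = [0] * max_order
    for hyp, ref in zip(hypotheses, references):
        for n in range(1, max_order + 1):
            hc, rc = char_ngrams(hyp, n), char_ngrams(ref, n)
            hyp_tot[n - 1] += sum(hc.values())
            ref_tot[n - 1] += sum(rc.values())
            match_tot[n - 1] += sum((hc & rc).values())  # clipped matches
    # Average precision and recall over the n-gram orders both sides actually have.
    precs, recs = [], []
    for n in range(max_order):
        if hyp_tot[n] and ref_tot[n]:
            precs.append(match_tot[n] / hyp_tot[n])
            recs.append(match_tot[n] / ref_tot[n])
    if not precs:
        return 0.0
    p, r = sum(precs) / len(precs), sum(recs) / len(recs)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2  # beta = 2 weights recall more heavily than precision
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(round(chrf(["abcd"], ["abce"]), 2))
```

Scoring the 200 held-out pairs would amount to `chrf(model_outputs, laz_references)` (hypothetical variable names), compared against the README's rough scale (about 10 garbled, 40+ useful, 50+ professional-level).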
README.md (changed)

````diff
@@ -27,11 +27,11 @@ pipeline_tag: translation

 ---

-LoRA adapter for Gemma 4 E4B. **v0.
+LoRA adapter for Gemma 4 E4B. **v0.2 research preview.**

 ## ⚠️ Status: research preview, not production-quality

-- **chrF on 200 held-out test pairs (TR→LZ):
+- **chrF on 200 held-out test pairs (TR→LZ): 26.97** (v0.1 was 24.66)
 - Real Laz output for natural sentences, but uneven on rare vocabulary and dialect conditioning.
 - Built for endangered-language preservation, research, and community use.
 - Full training pipeline + iteration log: <https://github.com/CidQu/lazca_ai>
@@ -64,10 +64,10 @@ def translate(text, to="lzz"):
 print(translate("Su içmek istiyorum."))
 ```

-Pin to a specific release with `revision="v0.1"`:
+Pin to a specific release with `revision="v0.2"` (or `"v0.1"` for the older one):

 ```python
-model = PeftModel.from_pretrained(model, "CidQuLimited/LazuriMT", revision="v0.
+model = PeftModel.from_pretrained(model, "CidQuLimited/LazuriMT", revision="v0.2")
 ```

 ## Performance
@@ -77,7 +77,8 @@ chrF computed on 200 held-out TR→LZ pairs from the corpus's test split (5%), w
 | Version | chrF (TR→LZ) | Notes |
 |---|---:|---|
 | baseline Gemma 4 E4B (no adapter) | ≈ 0 | does not translate Laz |
-| v0.1
+| v0.1 | 24.66 | LoRA r=32, 10,500 masked-loss steps (~2.15 epochs), Kaggle T4 |
+| **v0.2 (this release)** | **26.97** | LoRA r=64, 18,000 steps (3 epochs), A100, cosine-restart LR, 3× dialect upweight |

 For context, chrF roughly maps:
 - ~10: garbled
@@ -85,29 +86,29 @@ For context, chrF roughly maps:
 - ~40+: useful translations
 - ~50+: professional-level

-LazuriMT v0.
+LazuriMT v0.2 is in the "readable but flawed" range — a real but early baseline for a language with almost no prior MT.

 ## Training setup

 - **Base model**: `unsloth/gemma-4-e4b-it-unsloth-bnb-4bit` (Gemma 4 E4B, pre-quantized to 4-bit)
-- **Adapter**: LoRA on language layers (attention + MLP), `r=
-- **Trainable params**:
+- **Adapter**: LoRA on language layers (attention + MLP), `r=64`, `α=64`, dropout 0
+- **Trainable params**: 146,800,640 of 8,142,957,088 (1.80 %)
 - **Loss masking**: response-only (loss computed on Laz output tokens, instruction prompt masked)
-- **Optimizer**: 8-bit AdamW, `lr=2e-4`,
-- **Batch**: 16 effective (8 per-device × 2 grad-accum
-- **Steps**:
-- **Hardware**: 1× NVIDIA
-- **Training time**: ~
+- **Optimizer**: 8-bit AdamW, `lr=2e-4`, cosine-with-restarts (2 cycles), warmup_ratio 0.03, bf16
+- **Batch**: 16 effective (8 per-device × 2 grad-accum)
+- **Steps**: 18,000 (3 epochs over 102,461 conversations, incl. 3× dialect upweighting + grammar examples)
+- **Hardware**: 1× NVIDIA A100-40GB (Modal), Unsloth runtime
+- **Training time**: ~8 h (full run, no timeout)
 - **Bidirectional**: every TR↔LZ pair is presented in both directions during training

-## Known limitations (and v0.
+## Known limitations (and v0.3 roadmap)

-1. **Dialect conditioning doesn't differentiate output
-"Atina (Pazar)" vs "Xopa (Hopa)" prompts
+1. **Dialect conditioning still doesn't differentiate output.**
+"Atina (Pazar)" vs "Xopa (Hopa)" prompts produce near-identical translations. v0.2 *attempted* a fix — 3× upweighting of dialect-tagged pairs plus a front-loaded `[Laz dialect: X]` label in the prompt — but it did not meaningfully change behavior. The likely cause: even at 3×, dialect-tagged pairs are only ~9 % of the training mix, so the model defaults to general-form Laz. v0.3 will try a **dialect-balanced sampler** (equal exposure per dialect rather than blunt upweighting) plus additional dialect-tagged parallel data.
 2. **Short single-word queries collapse onto plausible-wrong tokens** (e.g. dictionary-style TR words sometimes yield a wrong Laz lemma). The corpus's still-dominant vocab slice teaches vocabulary lookup imperfectly.
-3. **Long sentences
+3. **Long, content-dense sentences degrade** — they can diverge substantially from the reference (more a coverage/data-volume issue than a decoding one).
 4. **Vocabulary edge cases** — some real Laz words are mistranslated (model emits a wrong-but-plausible Laz word).
-5. **Single dialect bias in output** — the corpus is mostly general-form Laz with the largest single-dialect contribution being Atina (Pazar)
+5. **Single dialect bias in output** — the corpus is mostly general-form Laz with the largest single-dialect contribution being Atina (Pazar); expect output to lean general / Atina.

 ## Bias and intended use

@@ -130,7 +131,7 @@ The training corpus mixes open-license sources (Wikipedia CC-BY-SA, Mozilla Comm
 year = {2026},
 publisher = {Hugging Face},
 howpublished = {\url{https://huggingface.co/CidQuLimited/LazuriMT}},
-note = {v0.
+note = {v0.2 research preview, chrF 26.97 on 200 TR→LZ test pairs}
 }
 ```

````
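The v0.2 optimizer line specifies `lr=2e-4`, cosine-with-restarts (2 cycles) and `warmup_ratio 0.03` over 18,000 steps. The diff does not name the scheduler function, so the sketch below assumes the Hugging Face Trainer's `cosine_with_restarts` scheduler (`get_cosine_with_hard_restarts_schedule_with_warmup` with `num_cycles=2`) and the Trainer's rounding of the warmup ratio up to whole steps (540 here). `lr_at` is a hypothetical helper that re-implements that multiplier in plain Python, so it can be inspected without torch.

```python
import math


def lr_at(step, base_lr=2e-4, total_steps=18_000, warmup_ratio=0.03, num_cycles=2):
    """Learning rate at `step`: linear warmup, then cosine decay with hard restarts.

    Assumes the Trainer's "cosine_with_restarts" schedule, i.e. the multiplier of
    transformers' get_cosine_with_hard_restarts_schedule_with_warmup.
    """
    # Assumed: warmup_ratio is converted to steps with math.ceil -> 540 steps here.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # (num_cycles * progress) % 1.0 jumps back to 0 at each cycle boundary,
    # which restarts the cosine at its peak ("hard" restart).
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


if __name__ == "__main__":
    for step in (0, 270, 540, 4_905, 9_269, 9_270, 13_635, 18_000):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate climbs to 2e-4 by step 540, decays to nearly zero just before step 9,270, jumps back to 2e-4 there, and decays to zero again by step 18,000.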
adapter_config.json (changed)

```diff
@@ -20,22 +20,24 @@
   "layers_pattern": null,
   "layers_to_transform": null,
   "loftq_config": {},
-  "lora_alpha":
+  "lora_alpha": 64,
   "lora_bias": false,
   "lora_dropout": 0,
+  "lora_ga_config": null,
   "megatron_config": null,
   "megatron_core": "megatron.core",
   "modules_to_save": null,
   "peft_type": "LORA",
-  "peft_version": "0.
+  "peft_version": "0.19.1",
   "qalora_group_size": 16,
-  "r":
+  "r": 64,
   "rank_pattern": {},
   "revision": null,
   "target_modules": "(?:.*?(?:language|text).*?(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense).*?(?:k_proj|q_proj|v_proj|o_proj|gate_proj|up_proj|down_proj|per_layer_input_gate|per_layer_projection|linear|embedding_projection|relative_k_proj).*?)|(?:\\bmodel\\.layers\\.[\\d]{1,}\\.(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)\\.(?:(?:k_proj|q_proj|v_proj|o_proj|gate_proj|up_proj|down_proj|per_layer_input_gate|per_layer_projection|linear|embedding_projection|relative_k_proj)))",
   "target_parameters": null,
   "task_type": "CAUSAL_LM",
   "trainable_token_indices": null,
+  "use_bdlora": null,
   "use_dora": false,
   "use_qalora": false,
   "use_rslora": false
```
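In this config `target_modules` is a single regular-expression string rather than a list. PEFT matches a string `target_modules` against each full module name (a `re.fullmatch`), so the pattern selects attention and MLP projections whose path contains `language` or `text`, plus plain `model.layers.N.<block>.<proj>` names. The sketch below applies the pattern copied from the v0.2 config to a few module names; those names are hypothetical and the real Gemma 4 E4B module paths may differ. It also records the LoRA scaling factor: with `use_rslora: false`, PEFT scales the update by `lora_alpha / r`, which is 64 / 64 = 1.0 for v0.2.

```python
import re

# "target_modules" from the v0.2 adapter_config.json (JSON "\\b" becomes regex "\b"),
# split across string literals for readability; the concatenation is the full pattern.
TARGET_MODULES = (
    r"(?:.*?(?:language|text).*?(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)"
    r".*?(?:k_proj|q_proj|v_proj|o_proj|gate_proj|up_proj|down_proj|per_layer_input_gate"
    r"|per_layer_projection|linear|embedding_projection|relative_k_proj).*?)"
    r"|(?:\bmodel\.layers\.[\d]{1,}\.(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)"
    r"\.(?:(?:k_proj|q_proj|v_proj|o_proj|gate_proj|up_proj|down_proj|per_layer_input_gate"
    r"|per_layer_projection|linear|embedding_projection|relative_k_proj)))"
)


def is_lora_target(module_name):
    """True if the whole module name matches the pattern (string targets use fullmatch)."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


# Standard LoRA scaling when use_rslora is false: lora_alpha / r.
LORA_ALPHA, R = 64, 64
SCALING = LORA_ALPHA / R

if __name__ == "__main__":
    # Hypothetical module names, for illustration only.
    for name in (
        "model.language_model.layers.0.self_attn.q_proj",
        "model.language_model.layers.0.mlp.down_proj",
        "model.language_model.layers.0.self_attn.q_norm",
        "model.vision_tower.blocks.0.attn.qkv",
    ):
        print(f"{name}: {is_lora_target(name)}")
    print("scaling =", SCALING)
```

Because the pattern requires `language` or `text` in the path (or the bare `model.layers.` prefix), vision- or audio-tower projections would not be wrapped, which matches the README's "LoRA on language layers (attention + MLP)".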
adapter_model.safetensors (changed)

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:2b55e8108a806a355feaea4b55cba23d55ab7261cc1ee3b75b3c56960a66c3e1
+size 587290752
```
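The LFS pointer records the new adapter file as 587,290,752 bytes. If the 146,800,640 trainable parameters listed in the README were stored as 4-byte floats, the tensor data alone would be 587,202,560 bytes, leaving 88,192 bytes for the safetensors header (including its 8-byte length prefix). That is consistent with fp32 storage, but the commit does not state the dtype. A safetensors file begins with an 8-byte little-endian header length followed by a JSON header giving each tensor's dtype, shape, and byte offsets, so the dtype can be checked without loading any weights. The helper names below are hypothetical, and the demo writes a tiny synthetic file rather than reading the real adapter.

```python
import json
import os
import struct
import tempfile

LFS_SIZE = 587_290_752          # "size" from the v0.2 LFS pointer
TRAINABLE_PARAMS = 146_800_640  # "Trainable params" from the v0.2 README


def read_header(path):
    """Return (header_length, header_dict) without reading any tensor data."""
    with open(path, "rb") as f:
        (length,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        return length, json.loads(f.read(length))


def summarize(path):
    """Count parameters, dtypes, and byte sizes recorded in a safetensors header."""
    length, header = read_header(path)
    params, data_bytes, dtypes = 0, 0, set()
    for name, info in header.items():
        if name == "__metadata__":  # optional string-to-string map, not a tensor
            continue
        count = 1
        for dim in info["shape"]:
            count *= dim
        start, end = info["data_offsets"]
        params += count
        data_bytes += end - start
        dtypes.add(info["dtype"])
    return {
        "params": params,
        "dtypes": sorted(dtypes),
        "header_bytes": 8 + length,
        "data_bytes": data_bytes,
        "file_bytes": os.path.getsize(path),
    }


def write_tiny_safetensors(path):
    """Write a synthetic F32 file holding a hypothetical rank-2 LoRA pair (A: 2x3, B: 4x2)."""
    shapes = {"lora_A.weight": [2, 3], "lora_B.weight": [4, 2]}
    header, offset = {"__metadata__": {"format": "pt"}}, 0
    for name, shape in shapes.items():
        nbytes = 4 * shape[0] * shape[1]  # F32 = 4 bytes per element
        header[name] = {"dtype": "F32", "shape": shape, "data_offsets": [offset, offset + nbytes]}
        offset += nbytes
    raw = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(raw)))
        f.write(raw)
        f.write(bytes(offset))  # zero-filled tensor data


# Bytes left for the header if every trainable parameter were stored as F32.
implied_header_bytes = LFS_SIZE - 4 * TRAINABLE_PARAMS

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tiny.safetensors")
        write_tiny_safetensors(path)
        print(summarize(path))
    print(implied_header_bytes)
```

Running `summarize("adapter_model.safetensors")` on the downloaded v0.2 file would show whether its tensors are `F32` and whether `params` equals 146,800,640.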