# NLLB-200 SENCOTEN to English Translation Model
This model is a fine-tuned version of facebook/nllb-200-distilled-600M for translating from SENCOTEN (a Coast Salish language) to English.
## Model Description
- Base Model: facebook/nllb-200-distilled-600M
- Custom Tokenizer: jwarrenbc/nllb-200-sencoten-tokenizer
- Language Pair: SENCOTEN → English
- Language Codes: `sen_Latn` (SENCOTEN), `eng_Latn` (English)
## Training Data

The model was trained on a combined set of SENCOTEN dictionary phrases and grammar sentences.
- Training samples: 16,783
- Validation samples: 1,865
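The sample counts above correspond to roughly a 90/10 train/validation split, which a quick calculation confirms:

```python
# Train/validation split reported in the model card
train_samples = 16_783
val_samples = 1_865
total = train_samples + val_samples

# Fraction of the combined data held out for validation
val_fraction = val_samples / total
print(f"{val_fraction:.1%}")  # ~10.0%
```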
## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("jwarrenbc/nllb-200-sencoten-english")
tokenizer = AutoTokenizer.from_pretrained("jwarrenbc/nllb-200-sencoten-english")

# Set the source language to SENCOTEN
tokenizer.src_lang = "sen_Latn"

# Translate SENCOTEN to English
text = "kw'unnuhw sun kwsu sway'qu' i' kwsu slheni'."
inputs = tokenizer(text, return_tensors="pt")
translated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_length=128,
)
translation = tokenizer.decode(translated_tokens[0], skip_special_tokens=True)
print(translation)
```
## Training Procedure
- Epochs: 10
- Batch Size: 4 (effective: 32)
- Learning Rate: 5e-05
- Warmup Steps: 500
- FP16: True
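The hyperparameters above can be expressed as a `Seq2SeqTrainingArguments` configuration. This is an illustrative sketch, not the exact training script: `output_dir` and `predict_with_generate` are assumptions, and `gradient_accumulation_steps=8` is inferred from the per-device batch size of 4 versus the effective batch size of 32.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative configuration matching the hyperparameters listed above.
# gradient_accumulation_steps is inferred: 4 (per device) x 8 = 32 (effective).
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-200-sencoten-english",  # hypothetical output path
    num_train_epochs=10,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    warmup_steps=500,
    fp16=True,
    predict_with_generate=True,  # assumption: generation-based eval for BLEU
)
```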
## Evaluation Results

- Eval Loss: 1.2282
- BLEU Score: 23.41
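BLEU measures n-gram overlap between a model's output and a reference translation. The score above was presumably computed with a standard tool such as sacreBLEU; the minimal sentence-level sketch below (geometric mean of 1- to 4-gram precisions with a brevity penalty, no smoothing) only illustrates the mechanics:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed sentence-level BLEU on whitespace tokens, scaled to 0-100."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped n-gram matches: each hypothesis n-gram counts at most
        # as often as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # any zero precision zeroes the geometric mean
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return 100 * bp * geo_mean

print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 100.0
```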
## Acknowledgments
This model was developed to support SENCOTEN language preservation and revitalization efforts.