mBART-50 fine-tuned for English ⇄ Krio translation
A bidirectional machine-translation model for Krio (the English-lexified
creole spoken in Sierra Leone), fine-tuned from
facebook/mbart-large-50-many-to-many-mmt.
A single model translates English → Krio and Krio → English.
Krio is not one of mBART-50's 50 supported languages, so a dedicated language
token kri_SL was added to the tokenizer (warm-started from the English
embedding en_XX) before fine-tuning.
Model Details
Model Description
- Developed by: Moses Joshua Coker
- Model type: Sequence-to-sequence Transformer (mBART-50) for translation
- Language(s) (NLP): English (en), Krio (kri)
- License: MIT (inherited from the base model)
- Finetuned from model: facebook/mbart-large-50-many-to-many-mmt
Model Sources
- Repository: https://huggingface.co/MosesJoshuaCoker/mbart-large-50-krio
- Base model: https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt
- Training data: https://huggingface.co/datasets/MosesJoshuaCoker/krio_dataset_novax
Uses
Direct Use
Translating short, everyday text between English and Krio — greetings, common phrases, basic conversational and informational sentences.
Downstream Use
A starting checkpoint for further fine-tuning on larger or domain-specific English–Krio parallel data, or for back-translation pipelines that generate synthetic data to expand Krio resources.
Out-of-Scope Use
Not suitable for high-stakes settings (medical, legal, safety-critical) without human review. Quality degrades on long, technical, or out-of-domain text, and on code-switched input. It does not translate languages other than English and Krio.
Bias, Risks, and Limitations
- Small training set (~1,943 pairs) of mostly short phrases and everyday vocabulary, so coverage is narrow and the model may be fluent-but-wrong on unfamiliar inputs.
- Krio has no fully standardized orthography; the model reflects the spelling conventions of this dataset (including characters such as ɛ, ɔ) and may not match other written conventions.
- Like all NMT models, it can hallucinate, omit content, or carry over social biases present in the training data.
Recommendations
Use human review for anything consequential, prefer short/simple inputs, and report chrF alongside BLEU since chrF is more reliable for low-resource, morphologically varied text.
How to Get Started with the Model
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

REPO = "MosesJoshuaCoker/mbart-large-50-krio"  # update to your repo id
EN, KRI = "en_XX", "kri_SL"

tok = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForSeq2SeqLM.from_pretrained(REPO)

# kri_SL is a custom language token, so it is not in lang_code_to_id after a
# fresh load; register it (and pass the id directly to forced_bos_token_id).
if KRI not in tok.lang_code_to_id:
    tok.lang_code_to_id[KRI] = tok.convert_tokens_to_ids(KRI)


def translate(text, src_lang, tgt_lang, num_beams=5, max_new_tokens=96):
    # The source language token is prepended by the tokenizer.
    tok.src_lang = src_lang
    enc = tok(text, return_tensors="pt")
    out = model.generate(
        **enc,
        # The target language is selected by forcing its token as the first decoded token.
        forced_bos_token_id=tok.convert_tokens_to_ids(tgt_lang),
        num_beams=num_beams,
        max_new_tokens=max_new_tokens,
    )
    return tok.batch_decode(out, skip_special_tokens=True)[0]


print(translate("Good morning, how are you?", EN, KRI))  # English -> Krio
print(translate("Aw yu de do?", KRI, EN))  # Krio -> English
Training Details
Training Data
MosesJoshuaCoker/krio_dataset_novax: 1,943 parallel English↔Krio pairs (columns English, Krio), mostly short phrases and everyday vocabulary.
Training Procedure
Preprocessing
- Empty/whitespace-only rows filtered out.
- Split at the pair level into train / validation / test (≈90% / 5% / 5%) so no pair leaks across splits.
- Each training pair was used in both directions (en→kri and kri→en), so a single model is bidirectional. Direction is controlled at inference via forced_bos_token_id.
- Added the kri_SL language token, resized embeddings, and warm-started it from en_XX. Any orthographic characters not representable by the SentencePiece vocab were added as tokens.
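The filtering, pair-level split, and bidirectional expansion described above can be sketched in plain Python. This is a minimal illustration, not the training notebook's code: the function name prepare_splits, the seed, and the dictionary keys (src_lang, tgt_lang, src, tgt) are assumptions made for the example. The point it shows is the ordering: splitting happens before each pair is expanded into two directions, so both directions of a pair always land in the same split.

```python
import random

EN, KRI = "en_XX", "kri_SL"  # language codes named in this model card


def prepare_splits(pairs, val_frac=0.05, test_frac=0.05, seed=0):
    """Filter, split, and expand (english, krio) pairs.

    Hypothetical helper: name, seed, and output keys are illustrative only.
    """
    # 1. Drop rows where either side is empty or whitespace-only.
    clean = [(en, kri) for en, kri in pairs if en.strip() and kri.strip()]

    # 2. Shuffle and split at the pair level (before any expansion).
    rng = random.Random(seed)
    rng.shuffle(clean)
    n_val = int(len(clean) * val_frac)
    n_test = int(len(clean) * test_frac)
    by_split = {
        "validation": clean[:n_val],
        "test": clean[n_val:n_val + n_test],
        "train": clean[n_val + n_test:],
    }

    # 3. Expand every pair into both translation directions.
    def both_directions(split_pairs):
        examples = []
        for en, kri in split_pairs:
            examples.append({"src_lang": EN, "tgt_lang": KRI, "src": en, "tgt": kri})
            examples.append({"src_lang": KRI, "tgt_lang": EN, "src": kri, "tgt": en})
        return examples

    return {name: both_directions(p) for name, p in by_split.items()}
```

Expanding first and splitting afterwards would let the en→kri half of a pair fall in train while its kri→en half falls in test, which is the leak the pair-level split avoids.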
Training Hyperparameters
- Training regime: fp16 mixed precision
- Base model: facebook/mbart-large-50-many-to-many-mmt (~610M params)
- Optimizer: AdamW (fused), learning rate 3e-5, weight decay 0.01, warmup ratio 0.1
- Epochs: 15, best checkpoint selected by validation loss
- Batch: per-device 8 × gradient accumulation 4 = effective batch size 32
- Max sequence length: 96 tokens
- Other: gradient checkpointing enabled, label smoothing 0.0
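As a rough sanity check on the hyperparameters above, the sketch below works out the effective batch size and an estimate of the number of optimizer steps and warmup steps. The helper name training_schedule, the exact 0.90 train fraction, and the single-device setting are assumptions for illustration. The real counts depend on how many rows survive filtering and on how the training loop rounds partial batches, so treat the step figures as approximate.

```python
import math


def training_schedule(n_pairs, train_frac, per_device_batch, grad_accum,
                      epochs, warmup_ratio, n_devices=1):
    """Estimate step counts from the card's hyperparameters (hypothetical helper)."""
    # Each training pair is used in both directions, doubling the example count.
    train_examples = int(n_pairs * train_frac) * 2
    effective_batch = per_device_batch * grad_accum * n_devices
    steps_per_epoch = math.ceil(train_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "train_examples": train_examples,
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Values from this card; 0.90 is the approximate train fraction.
estimate = training_schedule(
    n_pairs=1943, train_frac=0.90,
    per_device_batch=8, grad_accum=4,
    epochs=15, warmup_ratio=0.1,
)
for name, value in estimate.items():
    print(f"{name}: {value}")
```

Under these assumptions the estimate comes to an effective batch of 32, about 110 optimizer steps per epoch, and about 1,650 steps in total, of which roughly the first 165 are warmup.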
Speeds, Sizes, Times
- Hardware: single NVIDIA Tesla T4 (16 GB) on a Kaggle notebook.
- Fine-tuning runs in roughly an hour for this dataset size.
Evaluation
Testing Data, Factors & Metrics
Testing Data
The held-out ~5% test split of MosesJoshuaCoker/krio_dataset_novax, evaluated
in both translation directions.
Metrics
- chrF (sacreBLEU) — primary metric; character-level F-score, well suited to low-resource and morphologically varied text.
- BLEU (sacreBLEU) — reported for comparability.
Decoding: beam search (num_beams=5).
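To make the character-level F-score idea concrete, here is a simplified, sentence-level chrF-style score in plain Python. It is a teaching sketch, not the sacreBLEU implementation used for the reported metrics: the function names are hypothetical, and details such as smoothing, corpus-level aggregation, and handling of short segments differ, so its numbers should not be compared with sacreBLEU output. It assumes character n-grams of order 1 to 6 with whitespace removed and a recall weight of beta = 2.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF-style score in [0, 100] (illustrative, not sacreBLEU)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta: beta > 1 weights recall more heavily than precision.
    return 100 * (1 + b2) * p * r / (b2 * p + r)


print(simple_chrf("Aw yu de du?", "Aw yu de do?"))
```

Because matching happens on characters rather than whole words, a near-miss spelling still earns partial credit, which is why a character-level metric suits a language without a fully standardized orthography.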
Results
TODO: paste the numbers printed by the notebook's evaluation cell.
| Direction | chrF | BLEU |
|---|---|---|
| English → Krio | TODO | TODO |
| Krio → English | TODO | TODO |
Summary
A compact bidirectional EN⇄Krio model. Given the small training set, treat chrF as the headline metric and expect best results on short, in-domain inputs.
Environmental Impact
Carbon emissions can be estimated with the Machine Learning Impact calculator (Lacoste et al., 2019).
- Hardware Type: NVIDIA Tesla T4 (16 GB)
- Hours used: ~1
- Cloud Provider: Kaggle (Google Cloud Platform)
- Compute Region: Unknown
Technical Specifications
Model Architecture and Objective
mBART-50, a multilingual sequence-to-sequence Transformer (12-layer encoder /
12-layer decoder), trained with a token-level cross-entropy translation
objective. Source and target languages are specified with language-code tokens
(en_XX, kri_SL); the target language is forced at decode time via
forced_bos_token_id.
Compute Infrastructure
- Hardware: 1× NVIDIA Tesla T4 (16 GB)
- Software: PyTorch, 🤗 Transformers, Datasets, and Evaluate (sacreBLEU)
Citation
If you use this model, please cite the base model and dataset.
mBART-50 (base model):
@article{tang2020multilingual,
title={Multilingual Translation with Extensible Multilingual Pretraining and Finetuning},
author={Tang, Yuqing and Tran, Chau and Li, Xian and Chen, Peng-Jen and Goyal, Naman and Chaudhary, Vishrav and Gu, Jiatao and Fan, Angela},
journal={arXiv preprint arXiv:2008.00401},
year={2020}
}
Model Card Authors
Moses Joshua Coker
Model Card Contact
Via the Hugging Face repository: https://huggingface.co/MosesJoshuaCoker