NLLB-200 CoreML (128 tokens)
On-device neural machine translation for 200 languages using CoreML on Apple devices (iPhone, iPad, Mac).
This is a CoreML conversion of facebook/nllb-200-distilled-600M optimized for:
- Fast on-device inference
- GPU/Neural Engine acceleration
- 128-token context (≈80-100 words)
What's Included
.
├── NLLB_Encoder_128.mlpackage   # Encoder model (~1.5 GB)
├── NLLB_Decoder_128.mlpackage   # Decoder model (~1.7 GB)
├── tokenizer/                   # Tokenizer files
├── example.py                   # Ready-to-run example
└── language_codes.json          # Language code reference
Quick Start
Installation
pip install coremltools transformers
Download Models
# Clone this repo
git lfs install
git clone https://huggingface.co/cstr/nllb-200-coreml-128
cd nllb-200-coreml-128
Run Translation
from example import translate_text

# English to German
result = translate_text(
    "Hello, how are you today?",
    source_lang="eng_Latn",
    target_lang="deu_Latn"
)
print(result)  # "Hallo, wie geht es dir heute?"
Usage Examples
Multiple Languages
from example import translate_text
# English → Spanish
translate_text("Good morning!", "eng_Latn", "spa_Latn")
# → "¡Buenos días!"

# French → English
translate_text("Bonjour le monde", "fra_Latn", "eng_Latn")
# → "Hello world"

# Japanese → English
translate_text("こんにちは", "jpn_Jpan", "eng_Latn")
# → "Hello"
Production Usage
import coremltools as ct
from transformers import AutoTokenizer

class Translator:
    def __init__(self):
        # Load once, reuse for all translations
        self.encoder = ct.models.MLModel(
            "NLLB_Encoder_128.mlpackage",
            compute_units=ct.ComputeUnit.ALL  # CPU, GPU, and Neural Engine
        )
        self.decoder = ct.models.MLModel(
            "NLLB_Decoder_128.mlpackage",
            compute_units=ct.ComputeUnit.ALL
        )
        self.tokenizer = AutoTokenizer.from_pretrained("./tokenizer")

    def translate(self, text, src_lang, tgt_lang):
        # Placeholder: add your translation logic here (see example.py)
        pass

# Create once
translator = Translator()

# Reuse many times (fast!)
translator.translate("Hello", "eng_Latn", "deu_Latn")
translator.translate("Goodbye", "eng_Latn", "fra_Latn")
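The `translate` method above is left as a placeholder. One common way to fill it in for an encoder-decoder model is greedy decoding: run the encoder once, then call the decoder repeatedly and append the highest-scoring token each time. The sketch below shows only that loop. `encode` and `decode_step` are hypothetical callables that you would write around `self.encoder.predict(...)` and `self.decoder.predict(...)`; the input and output names of the two .mlpackage files are not documented here, so check example.py for them.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    encode: Callable[[Sequence[int]], object],
    decode_step: Callable[[object, List[int]], Sequence[float]],
    input_ids: Sequence[int],
    start_ids: Sequence[int],
    eos_id: int,
    max_tokens: int = 128,
) -> List[int]:
    """Generic greedy decoding loop (a sketch, not the code in example.py).

    encode      -- hypothetical wrapper: source token ids -> encoder output
    decode_step -- hypothetical wrapper: (encoder output, tokens so far)
                   -> scores over the vocabulary for the next token
    start_ids   -- tokens the decoder is primed with before generation
    """
    memory = encode(input_ids)  # the encoder runs exactly once
    output = list(start_ids)
    while len(output) < max_tokens:
        scores = decode_step(memory, output)
        # Greedy choice: index of the highest score.
        next_id = max(range(len(scores)), key=lambda i: scores[i])
        output.append(next_id)
        if next_id == eos_id:  # stop at end-of-sequence
            break
    return output
```

The returned ids would then go through `self.tokenizer.decode(..., skip_special_tokens=True)`. Which tokens belong in `start_ids` (for NLLB-style models, usually including the target-language token) is an assumption to verify against example.py.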
Supported Languages
See language_codes.json for the full list of 200+ languages. Common examples:
| Language | Code |
|---|---|
| English | eng_Latn |
| German | deu_Latn |
| French | fra_Latn |
| Spanish | spa_Latn |
| Chinese (Simplified) | zho_Hans |
| Japanese | jpn_Jpan |
| Arabic | arb_Arab |
| Russian | rus_Cyrl |
Full list: NLLB Language Codes
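The codes in the table share one shape: a three-letter lowercase language code, an underscore, and a four-letter script code with a leading capital. A quick shape check can catch typos such as `en` or `deu_latn` before a model is loaded. This is only a sketch: `looks_like_nllb_code` is a hypothetical helper, and passing the check does not prove the code is listed in language_codes.json.

```python
import re

# Shape of the codes shown in the table: "lll_Ssss"
# (three lowercase letters, underscore, capitalized four-letter script).
_CODE_SHAPE = re.compile(r"[a-z]{3}_[A-Z][a-z]{3}")


def looks_like_nllb_code(code: str) -> bool:
    """Return True if `code` has the lang_Script shape, e.g. "eng_Latn"."""
    return _CODE_SHAPE.fullmatch(code) is not None


for candidate in ["eng_Latn", "zho_Hans", "en", "deu_latn"]:
    print(candidate, looks_like_nllb_code(candidate))
```

For a strict check, compare against the entries in language_codes.json instead; its exact layout is not described here, so that lookup is left out.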
Technical Details
- Max Tokens: 128 (≈80-100 words depending on language)
- Precision: FLOAT16
- Compute: CPU + GPU + Neural Engine
- Base Model: facebook/nllb-200-distilled-600M
Advanced Options
CPU-Only Mode
import coremltools as ct

encoder = ct.models.MLModel(
    "NLLB_Encoder_128.mlpackage",
    compute_units=ct.ComputeUnit.CPU_ONLY
)
Batch Processing
from example import translate_text
# Texts are translated one at a time, in order
texts = ["Hello", "Goodbye", "Thank you"]
translations = [translate_text(t, "eng_Latn", "deu_Latn") for t in texts]
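Because the loop above calls the model once per string, repeated inputs are translated again each time. When a batch is likely to contain duplicates (UI labels, log lines), a small cache avoids the extra calls. The helper below is a sketch: `translate_many` is a hypothetical name, and `translate` stands for any one-argument function you supply.

```python
from typing import Callable, Dict, Iterable, List


def translate_many(
    texts: Iterable[str],
    translate: Callable[[str], str],
) -> List[str]:
    """Translate texts in order, calling `translate` once per distinct string."""
    cache: Dict[str, str] = {}
    results: List[str] = []
    for text in texts:
        if text not in cache:
            cache[text] = translate(text)  # only the first occurrence hits the model
        results.append(cache[text])
    return results
```

With this repository it could be called as `translate_many(texts, lambda t: translate_text(t, "eng_Latn", "deu_Latn"))`.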
Limitations
- 128-token limit: input longer than 128 tokens (≈80-100 words) is truncated
- Quality: distilled model, so slightly lower quality than the full NLLB-200 3.3B model
- Low-resource languages: accuracy may be reduced
- No streaming: output is returned only after the whole input has been processed
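One way to work within the 128-token limit is to split long text at sentence boundaries and translate each piece separately. The sketch below groups sentences greedily until the next one would exceed the budget. `split_for_limit` and `count_tokens` are hypothetical names; with the bundled tokenizer, `count_tokens` could be `lambda s: len(tokenizer(s)["input_ids"])`, which also counts the special tokens the tokenizer adds.

```python
import re
from typing import Callable, List


def split_for_limit(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 128,
) -> List[str]:
    """Group sentences into chunks of at most `max_tokens` tokens.

    A single sentence that is longer than the limit is kept as its own
    chunk; it would still be truncated by the model.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is translated without the context of its neighbours, so quality can drop at chunk boundaries. The regex split also assumes Latin-style punctuation and would need adjusting for languages that mark sentence ends differently.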
License
- Models: CC-BY-NC-4.0 (inherited from NLLB-200)
- Code: MIT
Note: non-commercial use only, per the NLLB license.