NLLB-200 CoreML (128 tokens)

On-device neural machine translation for 200 languages using CoreML on Apple devices (iPhone, iPad, Mac).

This is a CoreML conversion of facebook/nllb-200-distilled-600M optimized for:

  • βœ… Fast on-device inference
  • βœ… GPU/Neural Engine acceleration
  • βœ… 128-token context (β‰ˆ80-100 words)

πŸ“¦ What's Included

.
β”œβ”€β”€ NLLB_Encoder_128.mlpackage    # Encoder model (~1.5 GB)
β”œβ”€β”€ NLLB_Decoder_128.mlpackage    # Decoder model (~1.7 GB)
β”œβ”€β”€ tokenizer/                     # Tokenizer files
β”œβ”€β”€ example.py                     # Ready-to-run example
└── language_codes.json            # Language code reference

πŸš€ Quick Start

Installation

pip install coremltools transformers

Download Models

# Clone this repo
git lfs install
git clone https://huggingface.co/cstr/nllb-200-coreml-128
cd nllb-200-coreml-128

Run Translation

from example import translate_text

# English to German
result = translate_text(
    "Hello, how are you today?",
    source_lang="eng_Latn",
    target_lang="deu_Latn"
)
print(result)  # "Hallo, wie geht es dir heute?"

πŸ’‘ Usage Examples

Multiple Languages

from example import translate_text

# English β†’ Spanish
translate_text("Good morning!", "eng_Latn", "spa_Latn")
# β†’ "Β‘Buenos dΓ­as!"

# French β†’ English
translate_text("Bonjour le monde", "fra_Latn", "eng_Latn")
# β†’ "Hello world"

# Japanese β†’ English
translate_text("こんにけは", "jpn_Jpan", "eng_Latn")
# β†’ "Hello"

Production Usage

import coremltools as ct
from transformers import AutoTokenizer

class Translator:
    def __init__(self):
        # Load once, reuse for all translations
        self.encoder = ct.models.MLModel(
            "NLLB_Encoder_128.mlpackage",
            compute_units=ct.ComputeUnit.ALL  # CPU + GPU + Neural Engine
        )
        self.decoder = ct.models.MLModel(
            "NLLB_Decoder_128.mlpackage",
            compute_units=ct.ComputeUnit.ALL
        )
        self.tokenizer = AutoTokenizer.from_pretrained("./tokenizer")
        
    def translate(self, text, src_lang, tgt_lang):
        # Your translation logic here
        pass

# Create once
translator = Translator()

# Reuse many times (fast!)
translator.translate("Hello", "eng_Latn", "deu_Latn")
translator.translate("Goodbye", "eng_Latn", "fra_Latn")
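The `translate` method above is left as a stub. A minimal sketch of the greedy decoding loop it would need is below; the CoreML `predict` calls are abstracted behind plain callables, because the real input/output feature names depend on how the model was converted. In practice you would wire `encode` to `self.encoder.predict(...)` and `decode_step` to `self.decoder.predict(...)` with your model's actual feature names.

```python
# Sketch of a greedy encoder-decoder loop. The model calls are stubs here;
# replace them with the CoreML predict calls and your converted model's
# actual input/output feature names.

def greedy_decode(encode, decode_step, bos_id, eos_id, max_tokens=128):
    """encode() returns the encoder hidden states; decode_step(states, ids)
    returns the next token id given the tokens generated so far."""
    states = encode()
    ids = [bos_id]
    for _ in range(max_tokens):
        next_id = decode_step(states, ids)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

# Toy stand-ins: the "decoder" emits a canned sequence, then EOS.
canned = iter([5, 7, 2])  # 2 plays the role of EOS here
result = greedy_decode(
    encode=lambda: None,
    decode_step=lambda states, ids: next(canned),
    bos_id=0,
    eos_id=2,
)
print(result)  # [0, 5, 7, 2]
```

The generated ids are then decoded back to text with `tokenizer.decode(ids, skip_special_tokens=True)`.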

🌍 Supported Languages

See language_codes.json for the full list of 200+ languages. Common examples:

Language              Code
English               eng_Latn
German                deu_Latn
French                fra_Latn
Spanish               spa_Latn
Chinese (Simplified)  zho_Hans
Japanese              jpn_Jpan
Arabic                arb_Arab
Russian               rus_Cyrl

Full list: NLLB Language Codes
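For convenience, the common codes above can be kept as a small lookup dict in code. `LANG_CODES` here is just an illustration built from the rows shown, not a shipped artifact; the complete mapping lives in language_codes.json.

```python
# Common subset of NLLB language codes, taken from the table above.
# The full 200-language mapping ships in language_codes.json.
LANG_CODES = {
    "English": "eng_Latn",
    "German": "deu_Latn",
    "French": "fra_Latn",
    "Spanish": "spa_Latn",
    "Chinese (Simplified)": "zho_Hans",
    "Japanese": "jpn_Jpan",
    "Arabic": "arb_Arab",
    "Russian": "rus_Cyrl",
}

print(LANG_CODES["German"])  # deu_Latn
```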

βš™οΈ Technical Details

  • Max Tokens: 128 (β‰ˆ80-100 words depending on language)
  • Precision: FLOAT16
  • Compute: CPU + GPU + Neural Engine
  • Base Model: facebook/nllb-200-distilled-600M

πŸ”§ Advanced Options

CPU-Only Mode

encoder = ct.models.MLModel(
    "NLLB_Encoder_128.mlpackage",
    compute_units=ct.ComputeUnit.CPU_ONLY
)

Batch Processing

texts = ["Hello", "Goodbye", "Thank you"]
translations = [translate_text(t, "eng_Latn", "deu_Latn") for t in texts]

⚠️ Limitations

  • 128-token limit: longer input is truncated (β‰ˆ80-100 words, depending on language)
  • Quality: Distilled model, slightly lower quality than full NLLB-3.3B
  • Low-resource languages: May have reduced accuracy
  • No streaming: Complete sentence processing only
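Because input beyond 128 tokens is silently truncated, longer text has to be split before translation. A rough sketch of sentence-boundary chunking, with the token count abstracted as a callable (in practice something like `lambda s: len(tokenizer.encode(s))`):

```python
import re

# Split long input into chunks that each fit the 128-token window, so each
# chunk can be translated separately. count_tokens is a stand-in for the
# real tokenizer. Note: a single sentence longer than the window still
# ends up as one oversized chunk and would be truncated.

def chunk_text(text, count_tokens, max_tokens=128):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        candidate = (current + " " + sent).strip()
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)
            current = sent
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Crude word-count stand-in for the tokenizer:
words = lambda s: len(s.split())
text = "One two three. Four five. Six seven eight nine."
print(chunk_text(text, words, max_tokens=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```

Each chunk can then be passed to `translate_text` and the results joined.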

πŸ“ License

  • Models: CC-BY-NC-4.0 (inherited from NLLB-200)
  • Code: MIT

⚠️ Non-commercial use only per NLLB license

