NLLB-200 CoreML (128 tokens)

On-device neural machine translation for 200 languages using CoreML on Apple devices (iPhone, iPad, Mac).

This is a CoreML conversion of facebook/nllb-200-distilled-600M optimized for:

  • βœ… Fast on-device inference
  • βœ… GPU/Neural Engine acceleration
  • βœ… 128-token context (β‰ˆ80-100 words)

πŸ“¦ What's Included

.
β”œβ”€β”€ NLLB_Encoder_128.mlpackage    # Encoder model (~1.5 GB)
β”œβ”€β”€ NLLB_Decoder_128.mlpackage    # Decoder model (~1.7 GB)
β”œβ”€β”€ tokenizer/                     # Tokenizer files
β”œβ”€β”€ example.py                     # Ready-to-run example
└── language_codes.json            # Language code reference

πŸš€ Quick Start

Installation

pip install coremltools transformers

Download Models

# Clone this repo
git lfs install
git clone https://huggingface.co/cstr/nllb-200-coreml-128
cd nllb-200-coreml-128

Run Translation

from example import translate_text

# English to German
result = translate_text(
    "Hello, how are you today?",
    source_lang="eng_Latn",
    target_lang="deu_Latn"
)
print(result)  # "Hallo, wie geht es dir heute?"

πŸ’‘ Usage Examples

Multiple Languages

from example import translate_text

# English β†’ Spanish
translate_text("Good morning!", "eng_Latn", "spa_Latn")
# β†’ "Β‘Buenos dΓ­as!"

# French β†’ English
translate_text("Bonjour le monde", "fra_Latn", "eng_Latn")
# β†’ "Hello world"

# Japanese β†’ English
translate_text("こんにけは", "jpn_Jpan", "eng_Latn")
# β†’ "Hello"

Production Usage

import coremltools as ct
from transformers import AutoTokenizer

class Translator:
    def __init__(self):
        # Load once, reuse for all translations
        self.encoder = ct.models.MLModel(
            "NLLB_Encoder_128.mlpackage",
            compute_units=ct.ComputeUnit.ALL  # CPU + GPU + Neural Engine
        )
        self.decoder = ct.models.MLModel(
            "NLLB_Decoder_128.mlpackage",
            compute_units=ct.ComputeUnit.ALL
        )
        self.tokenizer = AutoTokenizer.from_pretrained("./tokenizer")
        
    def translate(self, text, src_lang, tgt_lang):
        # Your translation logic here
        pass

# Create once
translator = Translator()

# Reuse many times (fast!)
translator.translate("Hello", "eng_Latn", "deu_Latn")
translator.translate("Goodbye", "eng_Latn", "fra_Latn")
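The `translate` method above is left as a stub. A minimal sketch of the greedy decoding loop it would need is below; the CoreML `predict` calls are abstracted behind plain callables, because the real input/output feature names depend on how the model was converted. In practice you would wire `encode` to `self.encoder.predict(...)` and `decode_step` to `self.decoder.predict(...)` with your model's actual feature names.

```python
# Sketch of a greedy encoder-decoder loop. The model calls are stubs here;
# replace them with the CoreML predict calls and your converted model's
# actual input/output feature names.

def greedy_decode(encode, decode_step, bos_id, eos_id, max_tokens=128):
    """encode() returns the encoder hidden states; decode_step(states, ids)
    returns the next token id given the tokens generated so far."""
    states = encode()
    ids = [bos_id]
    for _ in range(max_tokens):
        next_id = decode_step(states, ids)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

# Toy stand-ins: the "decoder" emits a canned sequence, then EOS.
canned = iter([5, 7, 2])  # 2 plays the role of EOS here
result = greedy_decode(
    encode=lambda: None,
    decode_step=lambda states, ids: next(canned),
    bos_id=0,
    eos_id=2,
)
print(result)  # [0, 5, 7, 2]
```

The generated ids are then decoded back to text with `tokenizer.decode(ids, skip_special_tokens=True)`.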

🌍 Supported Languages

See language_codes.json for the full list of 200+ languages. Common examples:

Language              Code
English               eng_Latn
German                deu_Latn
French                fra_Latn
Spanish               spa_Latn
Chinese (Simplified)  zho_Hans
Japanese              jpn_Jpan
Arabic                arb_Arab
Russian               rus_Cyrl

Full list: NLLB Language Codes
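For convenience, the common codes above can be kept as a small lookup dict in code. `LANG_CODES` here is just an illustration built from the rows shown, not a shipped artifact; the complete mapping lives in language_codes.json.

```python
# Common subset of NLLB language codes, taken from the table above.
# The full 200-language mapping ships in language_codes.json.
LANG_CODES = {
    "English": "eng_Latn",
    "German": "deu_Latn",
    "French": "fra_Latn",
    "Spanish": "spa_Latn",
    "Chinese (Simplified)": "zho_Hans",
    "Japanese": "jpn_Jpan",
    "Arabic": "arb_Arab",
    "Russian": "rus_Cyrl",
}

print(LANG_CODES["German"])  # deu_Latn
```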

βš™οΈ Technical Details

  • Max Tokens: 128 (β‰ˆ80-100 words depending on language)
  • Precision: FLOAT16
  • Compute: CPU + GPU + Neural Engine
  • Base Model: facebook/nllb-200-distilled-600M

πŸ”§ Advanced Options

CPU-Only Mode

encoder = ct.models.MLModel(
    "NLLB_Encoder_128.mlpackage",
    compute_units=ct.ComputeUnit.CPU_ONLY
)

Batch Processing

texts = ["Hello", "Goodbye", "Thank you"]
translations = [translate_text(t, "eng_Latn", "deu_Latn") for t in texts]

⚠️ Limitations

  • 128-token limit: longer input is truncated (β‰ˆ80-100 words, depending on language)
  • Quality: Distilled model, slightly lower quality than full NLLB-3.3B
  • Low-resource languages: May have reduced accuracy
  • No streaming: Complete sentence processing only
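Because input beyond 128 tokens is silently truncated, longer text has to be split before translation. A rough sketch of sentence-boundary chunking, with the token count abstracted as a callable (in practice something like `lambda s: len(tokenizer.encode(s))`):

```python
import re

# Split long input into chunks that each fit the 128-token window, so each
# chunk can be translated separately. count_tokens is a stand-in for the
# real tokenizer. Note: a single sentence longer than the window still
# ends up as one oversized chunk and would be truncated.

def chunk_text(text, count_tokens, max_tokens=128):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        candidate = (current + " " + sent).strip()
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)
            current = sent
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Crude word-count stand-in for the tokenizer:
words = lambda s: len(s.split())
text = "One two three. Four five. Six seven eight nine."
print(chunk_text(text, words, max_tokens=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```

Each chunk can then be passed to `translate_text` and the results joined.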

πŸ“ License

  • Models: CC-BY-NC-4.0 (inherited from NLLB-200)
  • Code: MIT

⚠️ Non-commercial use only per NLLB license

