---
license: cc-by-nc-4.0
tags:
- coreml
- translation
- nllb
- multilingual
- on-device
- iOS
- macOS
library_name: coremltools
base_model: facebook/nllb-200-distilled-600M
---

# NLLB-200 CoreML (128 tokens)

On-device neural machine translation for **200 languages** using CoreML on Apple devices (iPhone, iPad, Mac).

This is a CoreML conversion of [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) optimized for:

- ✅ Fast on-device inference
- ✅ GPU/Neural Engine acceleration
- ✅ 128-token context (≈80-100 words)

## 📦 What's Included

```
.
├── NLLB_Encoder_128.mlpackage   # Encoder model (~1.5 GB)
├── NLLB_Decoder_128.mlpackage   # Decoder model (~1.7 GB)
├── tokenizer/                   # Tokenizer files
├── example.py                   # Ready-to-run example
└── language_codes.json          # Language code reference
```

## 🚀 Quick Start

### Installation

```bash
pip install coremltools transformers
```

### Download Models

```bash
# Clone this repo
git lfs install
git clone https://huggingface.co/cstr/nllb-200-coreml-128
cd nllb-200-coreml-128
```

### Run Translation

```python
from example import translate_text

# English to German
result = translate_text(
    "Hello, how are you today?",
    source_lang="eng_Latn",
    target_lang="deu_Latn"
)
print(result)  # "Hallo, wie geht es dir heute?"
```

## 💡 Usage Examples

### Multiple Languages

```python
from example import translate_text

# English → Spanish
translate_text("Good morning!", "eng_Latn", "spa_Latn")
# → "¡Buenos días!"
# French → English
translate_text("Bonjour le monde", "fra_Latn", "eng_Latn")
# → "Hello world"

# Japanese → English
translate_text("こんにちは", "jpn_Jpan", "eng_Latn")
# → "Hello"
```

### Production Usage

```python
import coremltools as ct
from transformers import AutoTokenizer


class Translator:
    def __init__(self):
        # Load once, reuse for all translations
        self.encoder = ct.models.MLModel(
            "NLLB_Encoder_128.mlpackage",
            compute_units=ct.ComputeUnit.ALL  # Use GPU/Neural Engine
        )
        self.decoder = ct.models.MLModel(
            "NLLB_Decoder_128.mlpackage",
            compute_units=ct.ComputeUnit.ALL
        )
        self.tokenizer = AutoTokenizer.from_pretrained("./tokenizer")

    def translate(self, text, src_lang, tgt_lang):
        # Your translation logic here
        pass


# Create once
translator = Translator()

# Reuse many times (fast!)
translator.translate("Hello", "eng_Latn", "deu_Latn")
translator.translate("Goodbye", "eng_Latn", "fra_Latn")
```

## 🌍 Supported Languages

See `language_codes.json` for the full list of 200+ languages. Common examples:

| Language | Code |
|----------|------|
| English | `eng_Latn` |
| German | `deu_Latn` |
| French | `fra_Latn` |
| Spanish | `spa_Latn` |
| Chinese (Simplified) | `zho_Hans` |
| Japanese | `jpn_Jpan` |
| Arabic | `arb_Arab` |
| Russian | `rus_Cyrl` |

Full list: [NLLB Language Codes](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200)

## ⚙️ Technical Details

- **Max Tokens**: 128 (≈80-100 words depending on language)
- **Precision**: FLOAT16
- **Compute**: CPU + GPU + Neural Engine
- **Base Model**: facebook/nllb-200-distilled-600M

## 🔧 Advanced Options

### CPU-Only Mode

```python
encoder = ct.models.MLModel(
    "NLLB_Encoder_128.mlpackage",
    compute_units=ct.ComputeUnit.CPU_ONLY
)
```

### Batch Processing

```python
texts = ["Hello", "Goodbye", "Thank you"]
translations = [translate_text(t, "eng_Latn", "deu_Latn") for t in texts]
```

## ⚠️ Limitations

- **128 token limit**: Longer text is truncated (≈80-100 words)
- **Quality**:
Distilled model, slightly lower quality than the full NLLB-3.3B
- **Low-resource languages**: May have reduced accuracy
- **No streaming**: Complete sentence processing only

## 📝 License

- **Models**: CC-BY-NC-4.0 (inherited from NLLB-200)
- **Code**: MIT

⚠️ **Non-commercial use only** per the NLLB license
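## ✂️ Appendix: Chunking Longer Text

Because the converted models accept at most 128 tokens, longer input must be split before translation rather than silently truncated. Below is a minimal sentence-aligned chunking sketch; `chunk_text` and its `max_words` budget are illustrative helpers (not part of `example.py`), and the word count is only a rough stand-in for the true token count, which should be checked with the NLLB tokenizer:

```python
import re


def chunk_text(text, max_words=80):
    """Split text into sentence-aligned chunks of at most max_words words.

    Approximation only: the real 128-token budget should be verified with
    the NLLB tokenizer; 80 words is a conservative stand-in. A single
    sentence longer than max_words still becomes its own (oversized) chunk
    and would need truncation or further splitting.
    """
    # Split on sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > max_words:
            # Budget exceeded: flush the current chunk and start a new one
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be passed to `translate_text()` individually and the results joined back together.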