---
license: cc-by-nc-4.0
tags:
- coreml
- translation
- nllb
- multilingual
- on-device
- iOS
- macOS
library_name: coremltools
base_model: facebook/nllb-200-distilled-600M
---
# NLLB-200 CoreML (128 tokens)
On-device neural machine translation for **200 languages** using CoreML on Apple devices (iPhone, iPad, Mac).
This is a CoreML conversion of [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) optimized for:
- βœ… Fast on-device inference
- βœ… GPU/Neural Engine acceleration
- βœ… 128-token context (β‰ˆ80-100 words)
## πŸ“¦ What's Included
```
.
β”œβ”€β”€ NLLB_Encoder_128.mlpackage # Encoder model (~1.5 GB)
β”œβ”€β”€ NLLB_Decoder_128.mlpackage # Decoder model (~1.7 GB)
β”œβ”€β”€ tokenizer/ # Tokenizer files
β”œβ”€β”€ example.py # Ready-to-run example
└── language_codes.json # Language code reference
```
## πŸš€ Quick Start
### Installation
```bash
pip install coremltools transformers
```
### Download Models
```bash
# Clone this repo
git lfs install
git clone https://huggingface.co/cstr/nllb-200-coreml-128
cd nllb-200-coreml-128
```
### Run Translation
```python
from example import translate_text
# English to German
result = translate_text(
    "Hello, how are you today?",
    source_lang="eng_Latn",
    target_lang="deu_Latn"
)
print(result) # "Hallo, wie geht es dir heute?"
```
## πŸ’‘ Usage Examples
### Multiple Languages
```python
from example import translate_text
# English β†’ Spanish
translate_text("Good morning!", "eng_Latn", "spa_Latn")
# β†’ "Β‘Buenos dΓ­as!"
# French β†’ English
translate_text("Bonjour le monde", "fra_Latn", "eng_Latn")
# β†’ "Hello world"
# Japanese β†’ English
translate_text("こんにけは", "jpn_Jpan", "eng_Latn")
# β†’ "Hello"
```
### Production Usage
```python
import coremltools as ct
from transformers import AutoTokenizer
class Translator:
    def __init__(self):
        # Load once, reuse for all translations
        self.encoder = ct.models.MLModel(
            "NLLB_Encoder_128.mlpackage",
            compute_units=ct.ComputeUnit.ALL  # CPU, GPU, and Neural Engine
        )
        self.decoder = ct.models.MLModel(
            "NLLB_Decoder_128.mlpackage",
            compute_units=ct.ComputeUnit.ALL
        )
        self.tokenizer = AutoTokenizer.from_pretrained("./tokenizer")

    def translate(self, text, src_lang, tgt_lang):
        # Your translation logic here
        pass

# Create once
translator = Translator()

# Reuse many times (fast!)
translator.translate("Hello", "eng_Latn", "deu_Latn")
translator.translate("Goodbye", "eng_Latn", "fra_Latn")
```
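The `translate` stub above boils down to two steps: run the encoder once on the source tokens, then call the decoder one token at a time, feeding back each predicted token until end-of-sequence. The loop itself is model-agnostic, so it can be sketched without the CoreML specifics. This is an illustrative sketch, not code from `example.py`; `greedy_decode` and `step_fn` are hypothetical names, and the actual CoreML input/output feature names should be checked against the `.mlpackage` spec (`model.get_spec()`).

```python
def greedy_decode(step_fn, bos_id, eos_id, max_len=128):
    """Generate token ids one at a time (greedy search).

    step_fn(tokens) must return the next-token logits (a sequence of
    vocabulary scores) given the tokens generated so far. In practice,
    step_fn would wrap decoder.predict(...) on the cached encoder output.
    """
    tokens = [bos_id]
    for _ in range(max_len - 1):
        logits = step_fn(tokens)
        # Pick the highest-scoring token (greedy choice, no sampling)
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens
```

For NLLB, the decoder is typically primed with the *target* language token (e.g. `deu_Latn`) so the model knows which language to generate; the returned ids are then decoded back to text with `tokenizer.decode(...)`.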
## 🌍 Supported Languages
See `language_codes.json` for the full list of 200+ languages. Common examples:
| Language | Code |
|----------|------|
| English | `eng_Latn` |
| German | `deu_Latn` |
| French | `fra_Latn` |
| Spanish | `spa_Latn` |
| Chinese (Simplified) | `zho_Hans` |
| Japanese | `jpn_Jpan` |
| Arabic | `arb_Arab` |
| Russian | `rus_Cyrl` |
Full list: [NLLB Language Codes](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200)
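NLLB codes combine an ISO 639-3 language tag with an ISO 15924 script tag (`eng_Latn`, `zho_Hans`). If you need to look up or validate codes programmatically, a small helper can do it. This is an illustrative sketch, not part of `example.py`; `find_code` assumes `language_codes.json` maps codes to language names (e.g. `{"eng_Latn": "English", ...}`), so adjust the lookup if your copy is structured differently.

```python
import re

# NLLB codes: 3-letter lowercase language tag + 4-letter script tag
NLLB_CODE = re.compile(r"^[a-z]{3}_[A-Z][a-z]{3}$")  # e.g. "eng_Latn"

def find_code(name, codes):
    """Return the NLLB code for a language name (case-insensitive).

    codes: dict mapping NLLB code -> language name, e.g. as loaded
    via json.load(open("language_codes.json")) -- structure assumed.
    """
    for code, lang in codes.items():
        if lang.lower() == name.lower():
            return code
    raise KeyError(f"unknown language: {name!r}")
```

Under the assumed JSON structure, `find_code("German", codes)` would return `"deu_Latn"`.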
## βš™οΈ Technical Details
- **Max Tokens**: 128 (β‰ˆ80-100 words depending on language)
- **Precision**: FLOAT16
- **Compute**: CPU + GPU + Neural Engine
- **Base Model**: facebook/nllb-200-distilled-600M
## πŸ”§ Advanced Options
### CPU-Only Mode
```python
encoder = ct.models.MLModel(
    "NLLB_Encoder_128.mlpackage",
    compute_units=ct.ComputeUnit.CPU_ONLY
)
```
### Batch Processing
```python
texts = ["Hello", "Goodbye", "Thank you"]
translations = [translate_text(t, "eng_Latn", "deu_Latn") for t in texts]
```
## ⚠️ Limitations
- **128 token limit**: Longer text is truncated (~80-100 words)
- **Quality**: Distilled model, slightly lower quality than full NLLB-3.3B
- **Low-resource languages**: May have reduced accuracy
- **No streaming**: Complete sentence processing only
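One practical workaround for the 128-token limit is to split long input into sentence-aligned chunks and translate each chunk separately. The sketch below (a hypothetical helper, not part of `example.py`) uses word count as a rough proxy for token count; subword tokenization can produce noticeably more tokens than words, so keep a generous margin below 128.

```python
import re

def split_for_translation(text, max_words=80):
    """Split text into sentence-aligned chunks of at most max_words words.

    Word count only approximates the 128-token limit: subword tokenizers
    may emit several tokens per word, especially for rare words.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        words = len(sent.split())
        # Start a new chunk when adding this sentence would exceed the budget
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be passed through `translate_text` and the results joined; sentence-level splitting keeps each chunk self-contained, which matters because the model sees no context across chunks.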
## πŸ“ License
- **Models**: CC-BY-NC-4.0 (inherited from NLLB-200)
- **Code**: MIT
⚠️ **Non-commercial use only** per NLLB license