# NLLB-200-Distilled-600M: CTranslate2 Optimized (Float16)
This repository contains a converted and quantized version of the NLLB-200-Distilled-600M model, optimized for high-performance inference using the CTranslate2 engine.
## Model Description
- Original Model: `facebook/nllb-200-distilled-600M`
- Architecture: Encoder-Decoder Transformer (Dense)
- Quantization: Float16 (Half-Precision)
- Format: CTranslate2 Binary Serialization
## Optimization Details
The model has been converted to the CTranslate2 format, which significantly reduces Python overhead during inference by running the computation graph in a custom C++ engine. Float16 precision halves the model size relative to FP32 (to **1.2 GB**) with negligible loss in BLEU/chrF++ scores compared to the original weights.
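For reference, this kind of conversion is performed with the `ct2-transformers-converter` tool that ships with CTranslate2. The exact flags used to produce this repository are an assumption, but a float16 conversion of the original checkpoint would look like:

```shell
ct2-transformers-converter \
  --model facebook/nllb-200-distilled-600M \
  --output_dir nllb-200-600M-ct2-float16 \
  --quantization float16
```

The tokenizer files (`sentencepiece` model and configs) should be copied alongside the converted weights so the repository can be loaded with a single path.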
## Technical Specifications
| Parameter | Value |
|---|---|
| Parameters | 600 Million |
| Precision | Float16 |
| Vocabulary Size | 256,206 |
| Inference Engine | CTranslate2 (v4.0+) |
| Hardware Target | NVIDIA GPU (Tensor Core optimized) or modern CPU |
## Usage and Implementation
To run this model, install the `ctranslate2`, `transformers`, and `sentencepiece` packages:

```shell
pip install ctranslate2 transformers sentencepiece
```
The following example demonstrates how to perform translation using beam search decoding in CTranslate2.
```python
import ctranslate2
import transformers

# Load the tokenizer and the CTranslate2 translator
model_path = "AlaminI/nllb-200-600M-ct2-float16"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)
translator = ctranslate2.Translator(model_path, device="cpu")  # or device="cuda"

def translate(text, src_lang="eng_Latn", tgt_lang="hau_Latn"):
    # Tokenize the input with the NLLB source-language tag
    tokenizer.src_lang = src_lang
    source = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))

    # Execute translation via the C++ backend
    results = translator.translate_batch(
        [source],
        target_prefix=[[tgt_lang]],
        beam_size=4,
        max_decoding_length=128,
        repetition_penalty=1.2,
    )

    # Strip the target-language tag before detokenizing
    output_tokens = results[0].hypotheses[0]
    if output_tokens and output_tokens[0] == tgt_lang:
        output_tokens = output_tokens[1:]
    return tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))

# Example execution
result = translate("The scientific method is a systematic way of learning about the world.")
print(f"Result: {result}")
```