# NLLB-200-Distilled-600M: CTranslate2 Optimized (Float16)
This repository contains a converted and quantized version of the NLLB-200-Distilled-600M model, optimized for high-performance inference using the CTranslate2 engine.
## Model Description
- Original Model: `facebook/nllb-200-distilled-600M`
- Architecture: Encoder-Decoder Transformer (Dense)
- Quantization: Float16 (Half-Precision)
- Format: CTranslate2 Binary Serialization
## Optimization Details
The model has been converted to the CTranslate2 format, which significantly reduces Python overhead during inference by running the computation graph in a custom C++ engine. Float16 precision halves the model size relative to FP32 (to **1.2 GB**) with negligible loss in BLEU/chrF++ scores compared to the original weights.
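For reference, this kind of conversion is performed with the `ct2-transformers-converter` tool that ships with CTranslate2. The exact flags used to produce this repository are an assumption, but a float16 conversion of the original checkpoint would look like:

```shell
ct2-transformers-converter \
  --model facebook/nllb-200-distilled-600M \
  --output_dir nllb-200-600M-ct2-float16 \
  --quantization float16
```

The tokenizer files (`sentencepiece` model and configs) should be copied alongside the converted weights so the repository can be loaded with a single path.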
## Technical Specifications
| Parameter | Value |
|---|---|
| Parameters | 600 Million |
| Precision | Float16 |
| Vocabulary Size | 256,206 |
| Inference Engine | CTranslate2 (v4.0+) |
| Hardware Target | NVIDIA GPU (Tensor Core optimized) or modern CPU |
## Usage and Implementation
To run this model, install the `ctranslate2`, `transformers`, and `sentencepiece` packages:

```shell
pip install ctranslate2 transformers sentencepiece
```
The following example demonstrates how to perform translation using beam search decoding in CTranslate2.
```python
import ctranslate2
import transformers

# Load the tokenizer and the CTranslate2 translator
model_path = "AlaminI/nllb-200-600M-ct2-float16"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)
translator = ctranslate2.Translator(model_path, device="cpu")  # or device="cuda"

def translate(text, src_lang="eng_Latn", tgt_lang="hau_Latn"):
    # Tokenize the input with the NLLB source-language tag
    tokenizer.src_lang = src_lang
    source = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))

    # Execute translation via the C++ backend
    results = translator.translate_batch(
        [source],
        target_prefix=[[tgt_lang]],
        beam_size=4,
        max_decoding_length=128,
        repetition_penalty=1.2,
    )

    # Strip the target-language tag before detokenizing
    output_tokens = results[0].hypotheses[0]
    if output_tokens and output_tokens[0] == tgt_lang:
        output_tokens = output_tokens[1:]
    return tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))

# Example execution
result = translate("The scientific method is a systematic way of learning about the world.")
print(f"Result: {result}")
```