NLLB-3.3B English→Azerbaijani
A fine-tuned NLLB-200-3.3B model for English→Azerbaijani translation. It was trained with LoRA on a ~1.4M-pair corpus combining high-quality distilled translations across multiple domains (formal/encyclopedic, instruction-style, dialogue, and conversational registers), then merged into a self-contained model.
On a 1,012-sentence FLORES-based benchmark it outperforms Google Translate and several commercial LLM APIs on English→Azerbaijani, and approaches the strongest proprietary models — at a fraction of their size and cost.
Benchmark Results
Evaluated on LocalDoc/en_az_translate_benchmark (1,012 sentences, EN→AZ). All metrics are reference-based: chrF++ (word_order=2), BLEU (sacreBLEU), and COMET-DA (Unbabel/wmt22-comet-da). Higher is better.
| Model | chrF++ | BLEU | COMET-DA |
|---|---|---|---|
| GPT-5.4-mini | 70.08 | 45.61 | 92.86 |
| Gemini-2.5-flash | 69.61 | 45.71 | 92.70 |
| This model (NLLB-3.3B EN→AZ) | 69.30 | 44.42 | 92.70 |
| DeepSeek-V4-Pro | 68.67 | 43.88 | 92.78 |
| DeepSeek-V4-Flash | 67.96 | 42.82 | 92.58 |
| GPT-4.1 | 67.76 | 43.03 | 92.71 |
| Google Translate | 66.90 | 41.64 | 92.37 |
| Gemma-4-31B-it | 66.22 | 40.46 | 92.40 |
| GPT-5.4-nano | 62.10 | 33.87 | 91.41 |
| Qwen3.6-35B-A3B | 60.39 | 33.57 | 91.23 |
| NLLB-200-3.3B (base, zero-shot) | 59.03 | 31.76 | 89.86 |
Fine-tuning improved the base NLLB-200-3.3B by +10.3 chrF++ (59.03 → 69.30) and +2.84 COMET-DA (89.86 → 92.70). On chrF++ and BLEU the result surpasses Google Translate, DeepSeek-V4-Pro, GPT-4.1, Gemma-4-31B-it, and Qwen3.6-35B-A3B, and comes within 0.8 chrF++ of GPT-5.4-mini and Gemini-2.5-flash. On COMET-DA it ties Gemini-2.5-flash and trails DeepSeek-V4-Pro and GPT-4.1 by less than 0.1.
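The deltas quoted above can be rechecked from the table. The sketch below copies the table scores by hand; the names `SCORES`, `delta`, and `beaten_on` are illustrative helpers introduced here, not part of any released evaluation code.

```python
# Scores copied from the benchmark table: (chrF++, BLEU, COMET-DA).
SCORES = {
    "GPT-5.4-mini": (70.08, 45.61, 92.86),
    "Gemini-2.5-flash": (69.61, 45.71, 92.70),
    "This model": (69.30, 44.42, 92.70),
    "DeepSeek-V4-Pro": (68.67, 43.88, 92.78),
    "DeepSeek-V4-Flash": (67.96, 42.82, 92.58),
    "GPT-4.1": (67.76, 43.03, 92.71),
    "Google Translate": (66.90, 41.64, 92.37),
    "Gemma-4-31B-it": (66.22, 40.46, 92.40),
    "GPT-5.4-nano": (62.10, 33.87, 91.41),
    "Qwen3.6-35B-A3B": (60.39, 33.57, 91.23),
    "NLLB-200-3.3B (base)": (59.03, 31.76, 89.86),
}
METRICS = ("chrF++", "BLEU", "COMET-DA")
THIS, BASE = "This model", "NLLB-200-3.3B (base)"


def delta(a, b, metric):
    """Score of model `a` minus score of model `b`, rounded to 2 decimals."""
    i = METRICS.index(metric)
    return round(SCORES[a][i] - SCORES[b][i], 2)


def beaten_on(metric):
    """Models scoring strictly below this model on `metric`."""
    i = METRICS.index(metric)
    return [name for name, s in SCORES.items() if s[i] < SCORES[THIS][i]]


for m in METRICS:
    print(m, "gain over base:", delta(THIS, BASE, m))
print("Gap to GPT-5.4-mini (chrF++):", delta("GPT-5.4-mini", THIS, "chrF++"))
print("Beaten on chrF++:", beaten_on("chrF++"))
print("Beaten on COMET-DA:", beaten_on("COMET-DA"))
```

Comparing per metric makes it visible that the ranking depends on which metric is used: the ordering by chrF++ and BLEU differs from the ordering by COMET-DA for the closely clustered systems.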
Example Translations
A few cases where this model produces more accurate or more natural Azerbaijani than the base NLLB-3.3B and/or Google Translate. Full sentences shown.
EN: No worries, take your time. There's really no rush at all. — idiomatic, not literal
- This model: Narahat olmayın, tələsməyin. Həqiqətən tələsmək lazım deyil.
- Google: Narahat olmayın, tələsməyin. Əslində heç bir tələskənlik yoxdur. — literal and stilted
EN: Explain like I'm five: how does the internet actually work? — natural idiom vs. base
- This model: Beş yaşım varmış kimi izah edin: internet əslində necə işləyir?
- Base NLLB-3.3B: Beş yaşında kimi izah edin: İnternet həqiqətən necə işləyir? — Beş yaşında kimi is ungrammatical
Usage
```python
import torch
from transformers import AutoModelForSeq2SeqLM, NllbTokenizer

MODEL = "LocalDoc/nllb-3.3b-en-az"
SRC, TGT = "eng_Latn", "azj_Latn"

tokenizer = NllbTokenizer.from_pretrained(MODEL, src_lang=SRC, tgt_lang=TGT)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16).eval()

# ID of the target-language tag; forcing it as the first generated token selects Azerbaijani.
bos = tokenizer.convert_tokens_to_ids(TGT)

def translate(texts, num_beams=4, max_length=256):
    # Accept a single string or a list of strings.
    if isinstance(texts, str):
        texts = [texts]
    tokenizer.src_lang = SRC
    enc = tokenizer(texts, return_tensors="pt", padding=True,
                    truncation=True, max_length=max_length)
    with torch.no_grad():
        gen = model.generate(**enc, forced_bos_token_id=bos,
                             num_beams=num_beams, max_length=max_length)
    return tokenizer.batch_decode(gen, skip_special_tokens=True)

print(translate("The agreement is expected to be signed by the end of the month."))
```
For best results, translate one sentence at a time (the model is sentence-level). Split long texts into sentences before translating.
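One way to follow this advice is sketched below. It is a minimal illustration, not part of the model's tooling: `split_sentences`, `translate_document`, `translate_fn`, and `batch_size` are hypothetical names, and the regex splitter is a naive assumption that will mis-split abbreviations such as "e.g." or "Dr.". For real documents a dedicated sentence segmenter is likely a better choice.

```python
import re
from typing import Callable, List

# Naive boundary: whitespace that follows '.', '!' or '?'.
# Assumption: clean prose without abbreviations or decimal-heavy text.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, dropping empty fragments."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def translate_document(
    text: str,
    translate_fn: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> str:
    """Translate sentence by sentence in small batches, then rejoin with spaces.

    `translate_fn` takes a list of sentences and returns a list of
    translations of the same length (hypothetical interface).
    """
    sentences = split_sentences(text)
    translated: List[str] = []
    for start in range(0, len(sentences), batch_size):
        translated.extend(translate_fn(sentences[start:start + batch_size]))
    return " ".join(translated)
```

A function with the same list-in, list-out shape as the `translate` helper in the Usage section can be passed as `translate_fn`. Paragraph breaks are not preserved by this sketch, since every sentence is rejoined with a single space.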
Training Details
- Base model: facebook/nllb-200-3.3B
- Method: LoRA (r=32, alpha=64) on attention and FFN projections, then merged
- Training data: ~1.4M EN→AZ pairs, distilled and filtered across domains:
  - Formal / encyclopedic / news
  - Instruction-style (assistant tasks, Q&A)
  - Dialogue and conversational speech
- Direction: English → Azerbaijani (direct, no pivot language)
- Sequence length: 256 tokens
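For context on the "LoRA, then merged" step, the standard LoRA formulation is written out below. This is an assumption about how r=32 and alpha=64 enter the final weights, since the card does not say which tooling or scaling variant was used; the symbols W_0, A, B, d, and k are notation introduced here.

```latex
% Standard LoRA update for one adapted projection (notation assumed, not from the card).
% W_0 is the frozen base weight; A and B are the trained low-rank factors.
W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},
\qquad \frac{\alpha}{r} = \frac{64}{32} = 2
```

Under this formulation, merging replaces each adapted W_0 with W, so the published checkpoint loads like an ordinary NLLB model and needs no separate adapter weights at inference time.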
Limitations
- Sentence-level: translate sentence by sentence; long documents should be split first.
- Direction: trained for English→Azerbaijani only.
- Rare lexical gaps: very specialized vocabulary (e.g. exotic culinary terms) may occasionally be less precise than large general-purpose systems.
- Latin script: outputs standard literary Azerbaijani in Latin script (azj_Latn).
Citation
If you use this model, please cite the LocalDoc organization on Hugging Face.