---
library_name: transformers
license: cc-by-nc-4.0
tags:
- translation
- nllb
---

# My NLLB-200 Translator

This repository contains a copy of Meta AI's **NLLB-200-distilled-600M** model, cloned here for personal access and application deployment.

### 🌟 Model Details
- **Original Developer:** Meta AI (Facebook)
- **Model Type:** Seq2Seq Language Model (Machine Translation)
- **Model Size:** About 600 million parameters
- **License:** CC-BY-NC-4.0 (Non-commercial use only)

### 🌍 Language Support
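This model translates directly between 200 languages, without pivoting through English. Each language is identified by a code that combines a three-letter language tag with a four-letter script tag. For example: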
- English: `eng_Latn`
- Telugu: `tel_Telu`
- Hindi: `hin_Deva`
- French: `fra_Latn`
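
Because a mistyped language code is an easy error to make, a small lookup helper can catch it before the model is called. The following is a minimal sketch, not part of the original model: `LANGUAGE_CODES`, `looks_like_nllb_code`, and `resolve_language` are hypothetical names, the table holds only the four codes listed above, and the pattern checks only the shape of a code, not whether the model actually supports it.

```python
import re

# Illustrative subset only: the four codes listed above (hypothetical table).
LANGUAGE_CODES = {
    "English": "eng_Latn",
    "Telugu": "tel_Telu",
    "Hindi": "hin_Deva",
    "French": "fra_Latn",
}

# Three lowercase letters, an underscore, then a capitalized four-letter script tag.
_CODE_PATTERN = re.compile(r"[a-z]{3}_[A-Z][a-z]{3}")


def looks_like_nllb_code(code: str) -> bool:
    """Return True if `code` has the shape of a language code such as `eng_Latn`."""
    return _CODE_PATTERN.fullmatch(code) is not None


def resolve_language(name_or_code: str) -> str:
    """Map a known language name, or a well-formed raw code, to a language code."""
    if name_or_code in LANGUAGE_CODES:
        return LANGUAGE_CODES[name_or_code]
    if looks_like_nllb_code(name_or_code):
        return name_or_code
    raise ValueError(f"Unrecognized language: {name_or_code!r}")
```

The returned string can be assigned to `tokenizer.src_lang` or passed to `tokenizer.convert_tokens_to_ids` for the target language.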

### 🚀 How to Get Started

You can use this model directly with the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Replace with your actual repository path
model_name = "YOUR_USERNAME/YOUR_REPO_NAME"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Set source language
tokenizer.src_lang = "eng_Latn"

text = "Hello, how are you today?"
inputs = tokenizer(text, return_tensors="pt")

# Generate the translation, forcing the target language (here: Telugu)
translated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("tel_Telu"),
    max_length=50
)

output = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
print("Translation:", output)
```

## Citation

```bibtex
@article{nllbteam2022neglected,
  title={No Language Left Behind: Scaling Human-Centered Machine Translation},
  author={NLLB Team and Marta R. Costa-jussà and James Cross and Onur Çelebi and Maha Elbayad and Kenneth Heafield and others},
  journal={arXiv preprint arXiv:2207.04672},
  year={2022}
}
```