This is a fine-tuned NLLB-200 model for Chechen-Russian translation, presented in the paper [The first open machine translation system for the Chechen language](https://www.arxiv.org/abs/2507.12672).
The language token for Chechen is `ce_Cyrl`, while for all the other languages included in NLLB-200 the tokens use a three-letter language code (e.g. `rus_Cyrl` for Russian).

Here is an example of how the model can be used in code:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, NllbTokenizer

model_nllb = AutoModelForSeq2SeqLM.from_pretrained('NM-development/nllb-ce-rus-v0').cuda()
tokenizer_nllb = NllbTokenizer.from_pretrained('NM-development/nllb-ce-rus-v0')

def translate(text, model, tokenizer, src_lang='rus_Cyrl', tgt_lang='eng_Latn', a=16, b=1.5, max_input_length=1024, **kwargs):
    model.eval()
    with torch.no_grad():
        tokenizer.src_lang = src_lang
        tokenizer.tgt_lang = tgt_lang
        inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=max_input_length)
        result = model.generate(
            **inputs.to(model.device),
            # Force the first generated token to be the target-language token.
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
            # Cap the output length proportionally to the input length.
            max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
            **kwargs
        )
    return tokenizer.batch_decode(result, skip_special_tokens=True)

text = "Стигална кӀел къахьоьгуш, ша мел динчу хӀуманах буьсун болу хӀун пайда оьцу адамо?"
translate(text, model_nllb, tokenizer_nllb, src_lang='ce_Cyrl', tgt_lang='rus_Cyrl')[0]
# 'Что пользы человеку от того, что он трудился под солнцем и что сделал?'
```
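The `max_new_tokens` value in `translate()` caps the output length as a linear function of the input length, controlled by the `a` and `b` parameters. A minimal sketch of that heuristic in isolation (the function name `length_cap` is ours, used only for illustration):

```python
def length_cap(input_len: int, a: int = 16, b: float = 1.5) -> int:
    """Linear cap on generated tokens: a fixed allowance of `a` tokens
    plus `b` tokens per input token, as in translate() above."""
    return int(a + b * input_len)

# A 20-token source sentence allows up to int(16 + 1.5 * 20) = 46 new tokens.
print(length_cap(20))  # 46
```

Raising `b` gives the model more room when the target language tends to be longer than the source; the defaults are what the example above uses.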