sleepygargoyle committed · Commit bf4029c · verified · 1 Parent(s): db2a051

Update README.md

Files changed (1)
  1. README.md +31 -1
@@ -15,4 +15,34 @@ library_name: transformers
  This is a fine-tuned NLLB-200 model for Chechen-Russian translation, presented in the paper [The first open machine translation system for the Chechen language](https://www.arxiv.org/abs/2507.12672).

- Language token for the Chechen language is `ce_Cyrl`.
+ The language token for Chechen is `ce_Cyrl`. All other languages included in NLLB-200 use a three-letter language code followed by the script (e.g. `rus_Cyrl` for Russian).
+
+ Here is an example of how to use the model in code:
+
+ ```python
+ import torch
+ from transformers import AutoModelForSeq2SeqLM
+ from transformers import NllbTokenizer
+
+ model_nllb = AutoModelForSeq2SeqLM.from_pretrained('NM-development/nllb-ce-rus-v0').cuda()
+ tokenizer_nllb = NllbTokenizer.from_pretrained('NM-development/nllb-ce-rus-v0')
+
+ def translate(text, model, tokenizer, src_lang='rus_Cyrl', tgt_lang='eng_Latn', a=16, b=1.5, max_input_length=1024, **kwargs):
+     model.eval()
+     with torch.no_grad():
+         tokenizer.src_lang = src_lang
+         tokenizer.tgt_lang = tgt_lang
+         inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=max_input_length)
+         result = model.generate(
+             **inputs.to(model.device),
+             forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
+             max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
+             **kwargs
+         )
+     return tokenizer.batch_decode(result, skip_special_tokens=True)
+
+
+ text = "Стигална кӀел къахьоьгуш, ша мел динчу хӀуманах буьсун болу хӀун пайда оьцу адамо?"
+ translate(text, model_nllb, tokenizer_nllb, src_lang='ce_Cyrl', tgt_lang='rus_Cyrl')[0]
+ # 'Что пользы человеку от того, что он трудился под солнцем и что сделал?'
+ ```
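
The `a` and `b` parameters of `translate` cap the output length as a linear function of the tokenized input length. The sketch below isolates that expression so its effect is easy to inspect without loading the model. The helper name `output_budget` is hypothetical and not part of the README; only the formula and the default values are taken from the example above.

```python
def output_budget(input_len: int, a: int = 16, b: float = 1.5) -> int:
    """Mirror the max_new_tokens expression used in translate().

    input_len stands in for inputs.input_ids.shape[1], i.e. the padded
    token length of the batch. The name output_budget is an illustrative
    assumption, not something defined by the model repository.
    """
    return int(a + b * input_len)


# Print the generation budget for a few input lengths.
for n in (1, 10, 100, 1024):
    print(n, output_budget(n))
```

If translations appear cut off, passing a larger `a` or `b` to `translate` raises this cap; whether that is needed depends on the language pair and the text.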