Commit c91cd92 (verified), committed by suronek
1 Parent(s): 00d9731

Update README.md

Files changed (1)
  1. README.md +41 -3
README.md CHANGED
@@ -1,3 +1,41 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - ru
+ - sah
+ datasets:
+ - lab-ii/yakut-translate
+ base_model:
+ - google/mt5-base
+ library_name: transformers
+ ---
+ ### Model Prefixes
+ `"translate Russian to Sakha: "` - Ru-sah
+ `"translate Sakha to Russian: "` - sah-Ru
+
+ ## How to Get Started with the Model
+
+ ```python
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+ model = AutoModelForSeq2SeqLM.from_pretrained("lab-ii/mt5-yakut")
+ tokenizer = AutoTokenizer.from_pretrained("lab-ii/mt5-yakut")
+
+ def predict(text, prefix, a=32, b=3, max_input_length=1024, num_beams=3, **kwargs):
+     inputs = tokenizer(prefix + text, return_tensors='pt', padding=True, truncation=True, max_length=max_input_length)
+     result = model.generate(
+         **inputs.to(model.device),
+         max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
+         num_beams=num_beams,
+         **kwargs
+     )
+     return tokenizer.batch_decode(result, skip_special_tokens=True)
+
+ sentence: str = "Фотограф опубликовал снимки с прошедшего феста."
+
+ translation = predict(sentence, prefix="translate Russian to Sakha: ")
+
+ print(translation)
+
+ # ['Бэрэограф ааспыт фесттан хаартыскалары ыытан көрдөрбүт.']
+ ```
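
The README added in this commit leaves two things implicit: which prefix string belongs to which direction, and how `predict` sizes its output through `a` and `b`. The sketch below is not part of the commit. It is a minimal illustration that assumes hypothetical helper names (`PREFIXES`, `build_prompt`, `output_budget`) and only reuses the two prefix strings and the `a + b * input_length` rule shown in the diff.

```python
# Hypothetical helpers for illustration; none of these names exist in the
# README or in the transformers library.

# The two task prefixes listed under "Model Prefixes", keyed by (source, target).
PREFIXES = {
    ("ru", "sah"): "translate Russian to Sakha: ",
    ("sah", "ru"): "translate Sakha to Russian: ",
}


def build_prompt(text: str, src: str, tgt: str) -> str:
    """Prepend the task prefix for the requested translation direction."""
    try:
        prefix = PREFIXES[(src, tgt)]
    except KeyError:
        raise ValueError(f"unsupported direction: {src}-{tgt}") from None
    return prefix + text


def output_budget(n_input_tokens: int, a: int = 32, b: int = 3) -> int:
    """Same rule as predict(): max_new_tokens = a + b * input length in tokens."""
    return int(a + b * n_input_tokens)
```

With the defaults from the README, a 10-token input would be allowed `32 + 3 * 10 = 62` new tokens. A prompt built with `build_prompt(sentence, "ru", "sah")` already contains the prefix, so it would be passed to `predict` with `prefix=""`.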