IlyaGusev committed
Commit d771c97
Parent(s): ed8f723

Update README.md

Files changed (1): README.md (+61 -1)
```diff
@@ -8,4 +8,64 @@ license: apache-2.0
 
 # RuBertTelegramHeadlines
 
-Dataset: https://www.dropbox.com/s/ykqk49a8avlmnaf/ru_all_split.tar.gz
```
# RuBertTelegramHeadlines

## Model description

An example model for the [Headline generation competition](https://competitions.codalab.org/competitions/29905).

## Intended uses & limitations

#### How to use

```python
from transformers import AutoTokenizer, EncoderDecoderModel

model_name = "IlyaGusev/rubert_telegram_headlines"
tokenizer = AutoTokenizer.from_pretrained(model_name)
hg_model = EncoderDecoderModel.from_pretrained(model_name)

article_text = "..."

# `prepare_seq2seq_batch` is deprecated in recent transformers releases;
# calling the tokenizer directly is equivalent here.
input_ids = tokenizer(
    [article_text],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=256
)["input_ids"]

output_ids = hg_model.generate(
    input_ids=input_ids,
    max_length=64,
    no_repeat_ngram_size=3,
    num_beams=10,
    top_p=0.95  # note: top_p only takes effect with sampling (do_sample=True)
)

headline = tokenizer.decode(output_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(headline)
```
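The `generate` call above passes `no_repeat_ngram_size=3`, which forbids the decoder from producing any token that would complete a 3-gram already present in the output. As a rough illustration of that constraint, here is a self-contained sketch (not the transformers implementation; the function name is made up for this example):

```python
def banned_next_tokens(generated, n=3):
    """Tokens that may not come next, because appending them would
    repeat an n-gram already present in `generated`."""
    if len(generated) < n - 1:
        return set()
    prefix = tuple(generated[-(n - 1):])  # the last n-1 generated tokens
    banned = set()
    # Scan every n-gram seen so far; if its first n-1 tokens match the
    # current prefix, its final token is banned.
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned

print(banned_next_tokens([1, 2, 3, 1, 2]))  # {3}: would recreate the 3-gram (1, 2, 3)
```

In transformers, an equivalent filter is applied to the logits at every decoding step, per beam, before the next token is chosen.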

## Training data

- Dataset: [ru_all_split.tar.gz](https://www.dropbox.com/s/ykqk49a8avlmnaf/ru_all_split.tar.gz)

## Training procedure

TBA