bikmish committed
Commit 510c501 · verified · 1 Parent(s): 993920c

Update README.md

Files changed (1):
  1. README.md +69 -5
README.md CHANGED
@@ -1,9 +1,73 @@
  ---
+ language:
+ - ru
+ license: apache-2.0
  tags:
- - model_hub_mixin
- - pytorch_model_hub_mixin
+ - pytorch
+ - text-generation
+ - transformer
+ - russian
+ - jokes
+ datasets:
+ - IgorVolochay/russian_jokes
+ widget:
+ - text: "Why do programmers"
  ---

- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Library: [More Information Needed]
- - Docs: [More Information Needed]
+ # Russian Jokes Transformer Model
+
+ A model for generating Russian jokes, built on a modified Transformer architecture.
+
+ ## Model Features
+
+ 1. **Specialization**: trained on a dataset of Russian jokes (135k examples)
+ 2. **Tokenization**: byte-level BPE with a vocabulary size of 1024
+ 3. **Architecture Features** (see the sketch after this list):
+    - ALiBi (Attention with Linear Biases) for positional encoding
+    - GQA (Grouped-Query Attention)
+    - SwiGLU in the FFN layers
+    - RMSNorm instead of LayerNorm
+ 4. **Configurations**:
+    - Nano (3 layers, 4 heads, hidden size 96)
+    - Mini (6 layers, 6 heads, hidden size 384)
+    - Small (12 layers, 12 heads, hidden size 768)
+
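+ The card doesn't include the layer implementations themselves, so below is a minimal PyTorch sketch of two of the components listed above, RMSNorm and the ALiBi attention bias; the class names, shapes, and defaults here are illustrative assumptions, not the repository's actual code.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ # Model sizes from the "Configurations" list above.
+ CONFIGS = {
+     "nano":  dict(layers=3,  heads=4,  hidden=96),
+     "mini":  dict(layers=6,  heads=6,  hidden=384),
+     "small": dict(layers=12, heads=12, hidden=768),
+ }
+
+ class RMSNorm(nn.Module):
+     """Rescales by the root-mean-square of the features; unlike
+     LayerNorm there is no mean subtraction and no bias term."""
+     def __init__(self, dim: int, eps: float = 1e-6):
+         super().__init__()
+         self.eps = eps
+         self.weight = nn.Parameter(torch.ones(dim))
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
+         return x * rms * self.weight
+
+ def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
+     """ALiBi adds a per-head linear penalty to attention scores that
+     grows with query-key distance, replacing positional embeddings."""
+     # Geometric slope sequence from the ALiBi paper (num_heads a power of two).
+     slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
+     pos = torch.arange(seq_len)
+     # (j - i) is zero on the diagonal, negative for past keys; future
+     # positions are clamped to zero (the causal mask hides them anyway).
+     distance = (pos[None, :] - pos[:, None]).clamp(max=0).float()
+     return slopes[:, None, None] * distance[None, :, :]  # (heads, seq, seq)
+
+ # The bias is added directly to the pre-softmax attention scores:
+ # scores = q @ k.transpose(-2, -1) / head_dim**0.5 + alibi_bias(num_heads, seq_len)
+ ```
+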
+ ## Technical Specifications
+
+ - **Context Window**: 128 tokens
+ - **Special Tokens**: [EOS] marks the end of a sequence
+ - **Average Example Length**: ~70 tokens
+ - **Regularization**: dropout 0.1
+ - **Optimizer**: AdamW with weight decay 0.01
+ - **Training**: 10k steps with linear warmup
+
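+ The training settings above map onto a standard PyTorch loop. A minimal sketch, assuming a warmup length and peak learning rate that the card does not state:
+
+ ```python
+ import torch.nn as nn
+ from torch.optim import AdamW
+ from torch.optim.lr_scheduler import LambdaLR
+
+ model = nn.Linear(96, 1024)  # stand-in for the transformer described above
+
+ TOTAL_STEPS = 10_000  # from the card: 10k training steps
+ WARMUP_STEPS = 1_000  # assumption: the card only says "linear warmup"
+ PEAK_LR = 3e-4        # assumption: the peak learning rate is not stated
+
+ optimizer = AdamW(model.parameters(), lr=PEAK_LR, weight_decay=0.01)
+
+ def lr_lambda(step: int) -> float:
+     # Ramp linearly up to the peak LR during warmup, then hold steady
+     # (the card does not describe a decay phase).
+     return min(1.0, (step + 1) / WARMUP_STEPS)
+
+ scheduler = LambdaLR(optimizer, lr_lambda)
+
+ for step in range(TOTAL_STEPS):
+     ...  # forward pass, cross-entropy loss, backward(), optimizer.step()
+     scheduler.step()
+ ```
+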
+ ## Usage
+
+ ```python
+ import torch
+
+ # ByteLevelBPETokenizer and TransformerForCausalLM are the course's own
+ # classes; their import path is not shown in this card.
+ REPO_NAME = "bikmish/llm-course-hw1"
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the tokenizer and model weights from the Hub.
+ tokenizer = ByteLevelBPETokenizer.from_pretrained(REPO_NAME)
+ model = TransformerForCausalLM.from_pretrained(REPO_NAME).to(device).eval()
+
+ # "Штирлиц пришел домой" ("Stierlitz came home") is a classic Russian joke opener.
+ text = "Штирлиц пришел домой"
+ input_ids = torch.tensor(tokenizer.encode(text), device=device)
+
+ # Sample up to 200 new tokens, stopping early at [EOS].
+ model_output = model.generate(
+     input_ids[None, :], max_new_tokens=200,
+     eos_token_id=tokenizer.eos_token_id, do_sample=True, top_k=10,
+ )
+ print(tokenizer.decode(model_output[0].tolist()))
+ ```
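+
+ With `do_sample=True` and `top_k=10`, each next token is sampled from the ten highest-probability candidates, so repeated runs produce different jokes; generation stops at [EOS] or after 200 new tokens.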
+
+ ## Example Output (an absolute riot)
+ ```
+ Штирлиц пришел домой с работы, приехал.
+ Преподаватель к себе и вижу: - Давай зайдем сегодня на работу!
+ - А как ты думаешь, что мы тебя не пьем?
+ - Дык нет.
+ - А ты что, тогда находишься?
+ - А ты не знаешь - кто?
+ - Дверь откроется!
+ ```