Update README.md
---
language:
- ru
license: apache-2.0
tags:
- pytorch
- text-generation
- transformer
- russian
- jokes
datasets:
- IgorVolochay/russian_jokes
widget:
- text: "Why do programmers"
---

# Russian Jokes Transformer Model

A model for generating Russian jokes, based on a modified Transformer architecture.

## Model Features

1. **Specialization**: trained on a dataset of Russian jokes (135k examples)
2. **Tokenization**: Byte-Level BPE with a vocabulary size of 1024
3. **Architecture Features** (sketched in code after this list):
   - ALiBi (Attention with Linear Biases) for positional encoding
   - GQA (Grouped-Query Attention)
   - SwiGLU in FFN layers
   - RMSNorm instead of LayerNorm
4. **Configurations**:
   - Nano (3 layers, 4 heads, 96 hidden)
   - Mini (6 layers, 6 heads, 384 hidden)
   - Small (12 layers, 12 heads, 768 hidden)
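
To make the architecture list concrete, here is a minimal, self-contained PyTorch sketch of one decoder block combining all four components. It is an illustration written for this README, not the model's actual source: every class and parameter name is hypothetical, and the defaults mirror the Nano configuration.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """RMSNorm: rescale by the root-mean-square, without the centering LayerNorm does."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric slope schedule from the ALiBi paper (exact for power-of-two head counts).
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])


class DecoderBlock(nn.Module):
    def __init__(self, dim: int = 96, n_heads: int = 4, n_kv_heads: int = 2, dropout: float = 0.1):
        super().__init__()
        assert dim % n_heads == 0 and n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        # GQA: fewer K/V heads than query heads; each K/V head serves a group of queries.
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(dim, dim, bias=False)
        # SwiGLU FFN: down(SiLU(gate(x)) * up(x)).
        hidden = 4 * dim
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)
        self.attn_norm = RMSNorm(dim)
        self.ffn_norm = RMSNorm(dim)
        self.drop = nn.Dropout(dropout)
        self.register_buffer("slopes", alibi_slopes(n_heads), persistent=False)

    def forward(self, x):
        B, T, C = x.shape
        h = self.attn_norm(x)  # pre-norm
        q = self.wq(h).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(h).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(h).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # GQA: replicate K/V heads to match the number of query heads.
        rep = self.n_heads // self.n_kv_heads
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        # ALiBi: no positional embeddings; bias score (i, j) by -slope * (i - j),
        # then apply the causal mask.
        pos = torch.arange(T, device=x.device)
        dist = (pos[:, None] - pos[None, :]).clamp(min=0)   # (T, T)
        bias = -self.slopes[:, None, None] * dist           # (n_heads, T, T)
        bias = bias.masked_fill(pos[None, :] > pos[:, None], float("-inf"))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim) + bias
        attn = F.softmax(scores, dim=-1)
        x = x + self.drop(self.wo((attn @ v).transpose(1, 2).reshape(B, T, C)))
        # SwiGLU FFN with its own residual connection.
        h = self.ffn_norm(x)
        return x + self.drop(self.w_down(F.silu(self.w_gate(h)) * self.w_up(h)))
```

As a quick smoke test, `DecoderBlock()(torch.randn(1, 16, 96))` returns a tensor of the same `(1, 16, 96)` shape.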

## Technical Specifications

- **Context Window**: 128 tokens
- **Special Tokens**: [EOS] marks the end of a sequence
- **Average Example Length**: ~70 tokens
- **Regularization**: Dropout 0.1
- **Optimizer**: AdamW with weight decay 0.01
- **Training**: 10k steps with linear warmup (sketched below)
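
The optimizer line translates directly into code. In the sketch below, the peak learning rate, the warmup length, and the stand-in module are assumptions; only AdamW, weight decay 0.01, and the 10k-step budget with linear warmup come from the spec above.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 8)  # stand-in for the transformer (assumption)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

warmup_steps = 1_000  # assumed slice of the 10_000 training steps
# Linear warmup to the peak learning rate, then constant.
scheduler = LambdaLR(optimizer, lambda step: min((step + 1) / warmup_steps, 1.0))
# Call scheduler.step() once per training step, after optimizer.step().
```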

## Usage

```python
import torch

# ByteLevelBPETokenizer and TransformerForCausalLM are the custom classes
# shipped with this repository, not Hugging Face `transformers` classes.
REPO_NAME = 'bikmish/llm-course-hw1'
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = ByteLevelBPETokenizer.from_pretrained(REPO_NAME)
check_model = TransformerForCausalLM.from_pretrained(REPO_NAME)
check_model = check_model.to(device)
check_model = check_model.eval()

text = "Штирлиц пришел домой"  # "Stierlitz came home": a stock Russian joke opener
input_ids = torch.tensor(tokenizer.encode(text), device=device)
model_output = check_model.generate(
    input_ids[None, :],  # add a batch dimension
    max_new_tokens=200,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=10,
)
print(tokenizer.decode(model_output[0].tolist()))
```
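
With `do_sample=True` and `top_k=10`, each next token is sampled from the ten most probable candidates, which keeps the short jokes varied; generation stops after `max_new_tokens=200` or at the first `[EOS]` token, whichever comes first.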

## Example Output (side-splitting stuff)

A raw sample from the model, left in the original Russian:

```
Штирлиц пришел домой с работы, приехал.
Преподаватель к себе и вижу: - Давай зайдем сегодня на работу!
- А как ты думаешь, что мы тебя не пьем?
- Дык нет.
- А ты что, тогда находишься?
- А ты не знаешь - кто?
- Дверь откроется!
```