Create README.md
README.md
ADDED
|
@@ -0,0 +1,37 @@
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
datasets:
|
| 4 |
+
- IgorVolochay/russian_jokes
|
| 5 |
+
language:
|
| 6 |
+
- ru
|
| 7 |
+
---
|
| 8 |
+
A small LLM for generating unfunny jokes (for now). Trained on the [RussianJokes](https://huggingface.co/datasets/IgorVolochay/russian_jokes) dataset. Created as part of a VK Education course project.
|
| 9 |
+
|
| 10 |
+
# Architecture
|
| 11 |
+
10.55M parameters, SwiGLU feed-forward layers, grouped-query attention (GQA), ALiBi positional biases, byte-level BPE tokenizer
|
| 12 |
+
- n_layer=6
|
| 13 |
+
- n_head=6
|
| 14 |
+
- n_kv_head=3
|
| 15 |
+
- hidden_dim=384
|
| 16 |
+
- intermediate_dim=1024
|
| 17 |
+
|
| 18 |
+
# How to use
|
| 19 |
+
```python
|
| 20 |
+
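import torch

# ByteLevelBPETokenizer and TransformerForCausalLM are assumed to be the project's
# own classes (not shown in this README); REPO_NAME must be set to the model's Hub repo id.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")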
|
| 21 |
+
|
| 22 |
+
tokenizer = ByteLevelBPETokenizer.from_pretrained(REPO_NAME)
|
| 23 |
+
check_model = TransformerForCausalLM.from_pretrained(REPO_NAME)
|
| 24 |
+
check_model = check_model.to(device)
|
| 25 |
+
check_model = check_model.eval()
|
| 26 |
+
|
| 27 |
+
text = "Штирлиц пришел домой"
|
| 28 |
+
input_ids = torch.tensor(tokenizer.encode(text), device=device)
|
| 29 |
+
model_output = check_model.generate(
|
| 30 |
+
input_ids[None, :], max_new_tokens=200, eos_token_id=tokenizer.eos_token_id, do_sample=True, top_k=10
|
| 31 |
+
)
|
| 32 |
+
print(tokenizer.decode(model_output[0].tolist()))
|
| 33 |
+
```
|
| 34 |
+
Output:
|
| 35 |
+
```
|
| 36 |
+
Штирлиц пришел домой к врачу и видит, что он пришел с ней.
|
| 37 |
+
```
|
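The `generate` call above samples with `do_sample=True, top_k=10`. The model's own `generate` implementation is not shown, so the following is only a generic, pure-Python sketch of what top-k sampling usually means: keep the `k` highest-scoring tokens, renormalize with a softmax, and draw one of them. The `sample_top_k` helper and the toy logits are illustrative assumptions.

```python
import math
import random


def sample_top_k(logits, k, rng):
    """Sample a token id from the k highest-scoring entries of `logits`."""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # Draw one index with probability proportional to its weight.
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 0.5, -1.0, 3.0, 0.0]  # toy scores for a 5-token vocabulary
samples = [sample_top_k(logits, k=2, rng=rng) for _ in range(1000)]
greedy = sample_top_k(logits, k=1, rng=rng)
```

With `k=2` only token ids 3 and 0 can ever be drawn, and `k=1` reduces to greedy decoding. A small `k` such as 10 trades diversity for coherence, which matters for a model of this size.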