---
language:
- en
- ru
license: mit
tags:
- causal-lm
- text-generation
- chatbot
- experimental
model_type: gpt
datasets:
...
library_name: transformers
---
Discord: https://discord.gg/DUzP7CXqJt, https://discord.gg/jzwR7jFfSB
Website: https://calmacatai.draklor.ru
## License
This model is licensed under the MIT License.
# CalmaCatLM-1.5-mini
🚧 **Experimental, under-trained model** (~**12M** parameters) based on a custom 6-layer / 6-head Transformer decoder architecture.
Supports **English** 🇬🇧 and **Russian** 🇷🇺. This is my third model.
## 📖 Description
CalmaCatLM is an **experimental generative language model** designed for text generation and dialogue tasks.
The main goal of this project is to test the full pipeline: **from implementing the architecture and training from scratch** to uploading models to the Hugging Face Hub.
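Since the card declares `library_name: transformers` and `model_type: gpt`, the standard text-generation pipeline should apply. The sketch below is an assumption, not a confirmed usage snippet: the repo id `ViorikaAI/CalmaCatLM-1.5-mini` and the generation settings are guesses based on this card's author and model name.

```python
MODEL_ID = "ViorikaAI/CalmaCatLM-1.5-mini"  # assumed repo id; adjust to the actual Hub path

def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a continuation with the Hugging Face text-generation pipeline."""
    from transformers import pipeline  # requires `pip install transformers`
    generator = pipeline("text-generation", model=MODEL_ID)
    out = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    return out[0]["generated_text"]

# Example call (downloads the checkpoint on first use):
# print(generate("Once upon a time"))
```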
## βš™οΈ Model Details
- **Architecture:** Custom Transformer decoder (6 layers, 6 attention heads)
- **Model size:** ~12M parameters
- **Training approach:** Pre-trained from scratch on my own dataset
- **Languages:** English, Russian
- **License:** MIT
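As a sanity check on the stated size, a rough parameter count for a 6-layer decoder can be computed from standard Transformer formulas. The hidden size (384 = 6 heads × 64) and vocabulary size (~5,000) below are assumptions chosen to be consistent with the ~12M figure; the card does not state them.

```python
def decoder_param_estimate(n_layers: int, d_model: int,
                           vocab_size: int, max_seq_len: int) -> int:
    """Rough parameter count for a GPT-style decoder (biases/norms ignored)."""
    # Token embeddings + learned position embeddings
    embeddings = vocab_size * d_model + max_seq_len * d_model
    # Per block: attention Q/K/V/O projections (4*d^2) + 4x-wide MLP (8*d^2)
    per_layer = 12 * d_model ** 2
    return embeddings + n_layers * per_layer

# Assumed dims: d_model = 6 heads * 64 = 384, vocab ~5k, context 128
total = decoder_param_estimate(6, 384, 5000, 128)
print(f"~{total / 1e6:.1f}M parameters")  # prints "~12.6M parameters"
```

With these assumed dimensions the estimate lands near the stated ~12M, which is why 6 layers (rather than a larger stack) is consistent with the parameter budget.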
## πŸ‹οΈ Training Details
- **Dataset:** `My`
- **Hardware:** Single AMD Radeon RX 7700 XT (12 GB VRAM)
- **Training status:** Very early checkpoint (under-trained)
- **Epochs:** 100
- **Batch size:** 32
- **Optimizer:** AdamW, lr = 3e-4
- **Max sequence length:** 128 tokens
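The hyperparameters above map onto a standard PyTorch training setup. The following is an illustrative sketch, not the author's actual training script: only the numeric values come from the card, and the `make_optimizer` helper is hypothetical.

```python
# Values taken from the training details above.
LEARNING_RATE = 3e-4
BATCH_SIZE = 32
MAX_SEQ_LEN = 128
EPOCHS = 100

# Each optimizer step sees batch_size * seq_len tokens.
TOKENS_PER_STEP = BATCH_SIZE * MAX_SEQ_LEN  # 4096 tokens per step

def make_optimizer(model):
    """AdamW at the card's stated learning rate (hypothetical helper)."""
    import torch  # requires `pip install torch`
    return torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
```

At 4,096 tokens per step, the 12 GB of VRAM on a single RX 7700 XT comfortably fits a ~12M-parameter model at this batch size and sequence length.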