---
language:
- en
- ru
license: mit
tags:
- causal-lm
- text-generation
- chatbot
- experimental
model_type: gpt
datasets: ...
library_name: transformers
---

# CalmaCatLM-1.5-mini

🚧 **Experimental, under-trained model** (~**12M** parameters) based on a **custom Transformer decoder architecture**. **Primarily supports English** 🇬🇧, with some Russian 🇷🇺. **This is my third model.**

Discord: https://discord.gg/DUzP7CXqJt , https://discord.gg/jzwR7jFfSB

Website: https://calmacatai.draklor.ru

## License

This model is licensed under the MIT License.

## 📖 Description

CalmaCatLM is an **experimental generative language model** designed for text generation and dialogue tasks. The main goal of this project is to test the full pipeline: **from implementing the architecture and training from scratch** to uploading the model to the Hugging Face Hub.

## ⚙️ Model Details

- **Architecture:** Custom Transformer decoder (6 layers, 6 attention heads)
- **Model size:** ~12M parameters
- **Training approach:** Pre-trained from scratch on my own dataset
- **Languages:** English and Russian
- **License:** MIT

## 🏋️ Training Details

- **Dataset:** `My`
- **Hardware:** Single AMD RX 7700 XT (12 GB VRAM)
- **Training status:** Very early checkpoint (under-trained)
- **Epochs:** 100
- **Batch size:** 32
- **Optimizer:** AdamW, lr = 3e-4
- **Max sequence length:** 128 tokens
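As a rough sanity check on the ~12M figure, the parameter count of a 6-layer decoder can be estimated from standard Transformer sizing. Note that the hidden size of 256 and vocabulary size of 30,000 below are illustrative assumptions, not published details of this model:

```python
# Back-of-the-envelope parameter count for a small Transformer decoder.
# NOTE: hidden and vocab are assumed values for illustration; the card
# only states 6 layers, 6 heads, and ~12M total parameters.

def estimate_params(n_layers: int, hidden: int, vocab: int) -> int:
    embeddings = vocab * hidden          # token embedding table
    attention = 4 * hidden * hidden      # Q, K, V and output projections
    mlp = 8 * hidden * hidden            # 4x-expansion MLP: up + down projections
    per_layer = attention + mlp          # ignoring biases and LayerNorms
    return embeddings + n_layers * per_layer

total = estimate_params(n_layers=6, hidden=256, vocab=30_000)
print(f"~{total / 1e6:.1f}M parameters")  # ~12.4M, near the card's ~12M figure
```

With these assumed dimensions, the embedding table dominates the count, which is typical for models this small.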
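For readers unfamiliar with the optimizer listed above, a single AdamW update with the card's learning rate can be sketched in plain Python. The betas, epsilon, and weight-decay values below are common defaults (as in PyTorch's `torch.optim.AdamW`), not settings confirmed for this model:

```python
import math

# One AdamW update for a single scalar weight.
# lr matches the card (3e-4); beta1, beta2, eps, and weight_decay are
# assumed defaults, not values confirmed by the author.

def adamw_step(w, g, m, v, t, lr=3e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    m = beta1 * m + (1 - beta1) * g      # first-moment EMA of gradients
    v = beta2 * v + (1 - beta2) * g * g  # second-moment EMA
    m_hat = m / (1 - beta1 ** t)         # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied directly to w, not through the gradient.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

w, m, v = adamw_step(w=1.0, g=1.0, m=0.0, v=0.0, t=1)
```

The decoupled decay term is what distinguishes AdamW from plain Adam with L2 regularization.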