---
language:
- en
- ru
license: mit
tags:
- causal-lm
- text-generation
- chatbot
- experimental
model_type: gpt
datasets: ...
library_name: transformers
---

# CalmaCatLM-1.5-mini

🚧 **Experimental, under-trained model** (~**12M** parameters) based on a **custom Transformer decoder architecture**. **Primarily supports English** 🇬🇧, with some Russian 🇷🇺. **This is my third model.**

Discord: https://discord.gg/DUzP7CXqJt , https://discord.gg/jzwR7jFfSB

Website: https://calmacatai.draklor.ru

## License

This model is licensed under the MIT License.

## 📖 Description

CalmaCatLM is an **experimental generative language model** designed for text generation and dialogue tasks. The main goal of this project is to test the full pipeline: **from implementing the architecture and training from scratch** to uploading the model to the Hugging Face Hub.

## ⚙️ Model Details

- **Architecture:** Custom Transformer decoder (6 layers, 6 attention heads)
- **Model size:** ~12M parameters
- **Training approach:** Pre-trained from scratch on my own dataset
- **Languages:** English and Russian
- **License:** MIT

## 🏋️ Training Details

- **Dataset:** `My`
- **Hardware:** Single AMD RX 7700 XT (12 GB VRAM)
- **Training status:** Very early checkpoint (under-trained)
- **Epochs:** 100
- **Batch size:** 32
- **Optimizer:** AdamW, lr = 3e-4
- **Max sequence length:** 128 tokens
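As a rough sanity check on the ~12M figure, the parameter count of a 6-layer decoder can be estimated from standard Transformer sizing. Note that the hidden size of 256 and vocabulary size of 30,000 below are illustrative assumptions, not published details of this model:

```python
# Back-of-the-envelope parameter count for a small Transformer decoder.
# NOTE: hidden and vocab are assumed values for illustration; the card
# only states 6 layers, 6 heads, and ~12M total parameters.

def estimate_params(n_layers: int, hidden: int, vocab: int) -> int:
    embeddings = vocab * hidden          # token embedding table
    attention = 4 * hidden * hidden      # Q, K, V and output projections
    mlp = 8 * hidden * hidden            # 4x-expansion MLP: up + down projections
    per_layer = attention + mlp          # ignoring biases and LayerNorms
    return embeddings + n_layers * per_layer

total = estimate_params(n_layers=6, hidden=256, vocab=30_000)
print(f"~{total / 1e6:.1f}M parameters")  # ~12.4M, near the card's ~12M figure
```

With these assumed dimensions, the embedding table dominates the count, which is typical for models this small.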
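For readers unfamiliar with the optimizer listed above, a single AdamW update with the card's learning rate can be sketched in plain Python. The betas, epsilon, and weight-decay values below are common defaults (as in PyTorch's `torch.optim.AdamW`), not settings confirmed for this model:

```python
import math

# One AdamW update for a single scalar weight.
# lr matches the card (3e-4); beta1, beta2, eps, and weight_decay are
# assumed defaults, not values confirmed by the author.

def adamw_step(w, g, m, v, t, lr=3e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    m = beta1 * m + (1 - beta1) * g      # first-moment EMA of gradients
    v = beta2 * v + (1 - beta2) * g * g  # second-moment EMA
    m_hat = m / (1 - beta1 ** t)         # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied directly to w, not through the gradient.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

w, m, v = adamw_step(w=1.0, g=1.0, m=0.0, v=0.0, t=1)
```

The decoupled decay term is what distinguishes AdamW from plain Adam with L2 regularization.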