---
language:
- en
- ru
license: mit
tags:
- causal-lm
- text-generation
- chatbot
- experimental
model_type: gpt
datasets:
...
library_name: transformers
---

Discord: https://discord.gg/DUzP7CXqJt, https://discord.gg/jzwR7jFfSB
Website: https://calmacatai.draklor.ru

## License

This model is licensed under the MIT License.

# CalmaCatLM-1.5-mini

🚧 **Experimental under-trained model** (~**12M** parameters) **based on a custom 6-layer/6-head Transformer decoder architecture.**
**Primarily supports English** 🇬🇧. **This is my third model.**
|
## 📝 Description
CalmaCatLM is an **experimental generative language model** designed for text generation and dialogue tasks.
The main goal of this project is to test the full pipeline: **from implementing the architecture and training from scratch** to uploading models to the Hugging Face Hub.
|
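For a quick start, here is a minimal inference sketch using the `transformers` library declared in the card metadata. The repo id below is a placeholder assumption (the card does not state the exact Hub path), and `trust_remote_code=True` may be required if the checkpoint ships a custom architecture class.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the model's actual Hub path.
repo_id = "calmacat/CalmaCatLM-1.5-mini"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # add trust_remote_code=True if needed

prompt = "Hello! How are you today?"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampled decoding; the checkpoint is under-trained, so expect rough output.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
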
## ⚙️ Model Details
- **Architecture:** Custom Transformer decoder (6 layers, 6 attention heads; see the configuration sketch below)
- **Model size:** ~12M parameters
- **Training approach:** Pre-trained from scratch on a custom dataset (`My`)
- **Languages:** English and Russian (primarily English)
- **License:** MIT
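
For orientation, the numbers above map onto a GPT-style decoder configuration roughly as sketched below. The hidden size (384 = 6 heads × 64 dims) and vocabulary size are assumptions, chosen only so the total parameter count lands near the quoted ~12M; the real values may differ.

```python
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=5000,   # assumption: not stated in the card
    n_positions=128,   # "Max sequence length: 128 tokens"
    n_embd=384,        # assumption: 6 heads x 64 dims per head
    n_layer=6,         # "6 layers"
    n_head=6,          # "6 attention heads"
)
model = GPT2LMHeadModel(config)

# With these assumed sizes this prints roughly 12.6M, in line with ~12M.
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```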
|
## 🏋️ Training Details
- **Dataset:** `My`
- **Hardware:** Single AMD RX 7700 XT (12GB VRAM)
- **Training status:** Very early checkpoint (under-trained)
- **Epochs:** 100
- **Batch size:** 32
- **Optimizer:** AdamW, lr = 3e-4 (see the training-loop sketch after this list)
- **Max sequence length:** 128 tokens
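
To make the hyperparameters above concrete, here is a minimal pre-training loop sketch. The actual `My` dataset is not published, so random token ids stand in for real batches; only the optimizer, learning rate, batch size, epoch count, and sequence length come from the list above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import GPT2Config, GPT2LMHeadModel

# Model built from the (partly assumed) configuration sketched earlier.
config = GPT2Config(vocab_size=5000, n_positions=128, n_embd=384, n_layer=6, n_head=6)
model = GPT2LMHeadModel(config)

# Toy stand-in for the unpublished `My` dataset: random ids, 128 tokens each.
dataset = TensorDataset(torch.randint(0, config.vocab_size, (256, 128)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)   # batch size 32

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # AdamW, lr = 3e-4

model.train()
for epoch in range(100):                                    # 100 epochs
    for (input_ids,) in loader:
        input_ids = input_ids.to(device)
        # Causal LM objective: labels=input_ids makes the model compute
        # the shifted next-token cross-entropy loss internally.
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Final pipeline step from the Description: upload to the Hub
# (hypothetical repo id; requires `huggingface-cli login` first).
# model.push_to_hub("calmacat/CalmaCatLM-1.5-mini")
```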