---
language:
  - en
  - ru
license: mit
tags:
  - causal-lm
  - text-generation
  - chatbot
  - experimental
model_type: gpt
datasets: ...
library_name: transformers
---

Discord: https://discord.gg/DUzP7CXqJt , https://discord.gg/jzwR7jFfSB | Website: https://calmacatai.draklor.ru

## License

This model is licensed under the MIT License.

# CalmaCatLM-1.5-mini

🚧 Experimental, under-trained model (~12M parameters) built on a custom Transformer decoder architecture.
Primarily supports English 🇬🇧. This is my third model.

## 📖 Description

CalmaCatLM is an experimental generative language model designed for text generation and dialogue tasks.
The main goal of this project is to test the full pipeline: from implementing the architecture and training from scratch to uploading models to the Hugging Face Hub.
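A minimal usage sketch with the `transformers` library (the card's `library_name`). The repo id below is an assumption, since the card does not state the exact Hub path; adjust it to the actual repository. The sampling settings are likewise illustrative, not recommendations from the author.

```python
def generate(prompt: str,
             repo_id: str = "ViorikaAI/CalmaCatLM-1.5-mini",  # assumed repo id
             max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt` with a causal LM from the Hub."""
    # Lazy import so the sketch is importable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Sampled decoding; the card does not specify preferred generation settings.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=True, top_p=0.9)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Hello!"))
```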

βš™οΈ Model Details

  • Architecture: Custom Transformer decoder (6 layers, 6 attention heads)
  • Model size: ~12M parameters
  • Training approach: Pre-trained from scratch on a custom dataset
  • Languages: English and Russian
  • License: MIT
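
The card does not state the hidden size or vocabulary, but the ~12M figure is easy to sanity-check with the standard GPT-style parameter formulas. The sketch below assumes d_model = 288 (6 heads × head dim 48), a ~20k-token vocabulary, and tied input/output embeddings; these are illustrative guesses, not confirmed settings.

```python
def gpt_param_count(vocab_size: int, d_model: int,
                    n_layers: int, max_seq_len: int) -> int:
    """Rough parameter count for a GPT-style decoder with tied embeddings."""
    emb = vocab_size * d_model                  # token embeddings (tied LM head)
    pos = max_seq_len * d_model                 # learned positional embeddings
    attn = 4 * d_model * d_model + 4 * d_model  # Q, K, V, out projections + biases
    mlp = 8 * d_model * d_model + 5 * d_model   # 4x-expansion MLP + biases
    ln = 2 * (2 * d_model)                      # two LayerNorms per block
    final_ln = 2 * d_model
    return emb + pos + n_layers * (attn + mlp + ln) + final_ln

# Assumed sizes; 6 layers and the 128-token context come from this card.
total = gpt_param_count(vocab_size=20_000, d_model=288, n_layers=6, max_seq_len=128)
print(f"{total / 1e6:.1f}M parameters")  # prints "11.8M parameters", near the stated ~12M
```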

πŸ‹οΈ Training Details

  • Dataset: Custom dataset
  • Hardware: Single AMD RX 7700 XT (12GB VRAM)
  • Training Status: Very early checkpoint (Under-trained)
  • Epochs: 100
  • Batch size: 32
  • Optimizer: AdamW, lr = 3e-4
  • Max sequence length: 128 tokens
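
The AdamW optimizer listed above is Adam with decoupled weight decay. A self-contained illustration of one update step on a scalar toy problem, using lr = 3e-4 from the card; the remaining hyperparameters (betas, eps, weight decay) are PyTorch-style defaults and an assumption, since the card does not list them:

```python
import math

def adamw_step(p, g, m, v, t, lr=3e-4, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update for a scalar parameter p with gradient g at step t."""
    m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g    # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)        # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    # Decoupled weight decay: applied directly to p, not mixed into the gradient.
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)
    return p, m, v

# Toy problem: minimize (p - 3)^2, whose gradient is 2 * (p - 3).
p, m, v = 0.0, 0.0, 0.0
for t in range(1, 20_001):
    g = 2.0 * (p - 3.0)
    p, m, v = adamw_step(p, g, m, v, t)
# p ends close to 3 (weight decay pulls it slightly below the exact minimum)
```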