---
language:
  - en
  - ru
license: mit
tags:
  - causal-lm
  - text-generation
  - chatbot
  - experimental
model_type: gpt
datasets: ...
library_name: transformers
---

Discord: https://discord.gg/DUzP7CXqJt , https://discord.gg/jzwR7jFfSB | Website: https://calmacatai.draklor.ru

## License

This model is licensed under the MIT License.

# CalmaCatLM-1.5-mini

🚧 Experimental, under-trained model (~12M parameters) built on a custom Transformer decoder architecture.
Primarily supports English 🇬🇧. This is my third model.

## 📖 Description

CalmaCatLM is an experimental generative language model designed for text generation and dialogue tasks.
The main goal of this project is to test the full pipeline: from implementing the architecture and training from scratch to uploading models to the Hugging Face Hub.
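A minimal usage sketch with the `transformers` library (the card's `library_name`). The repo id below is an assumption, since the card does not state the exact Hub path; adjust it to the actual repository. The sampling settings are likewise illustrative, not recommendations from the author.

```python
def generate(prompt: str,
             repo_id: str = "ViorikaAI/CalmaCatLM-1.5-mini",  # assumed repo id
             max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt` with a causal LM from the Hub."""
    # Lazy import so the sketch is importable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Sampled decoding; the card does not specify preferred generation settings.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=True, top_p=0.9)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Hello!"))
```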

βš™οΈ Model Details

  • Architecture: Custom Transformer decoder (6 layers, 6 attention heads)
  • Model size: ~12M parameters
  • Training approach: Pre-trained from scratch on a custom dataset
  • Languages: English and Russian
  • License: MIT
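
The card does not state the hidden size or vocabulary, but the ~12M figure is easy to sanity-check with the standard GPT-style parameter formulas. The sketch below assumes d_model = 288 (6 heads × head dim 48), a ~20k-token vocabulary, and tied input/output embeddings; these are illustrative guesses, not confirmed settings.

```python
def gpt_param_count(vocab_size: int, d_model: int,
                    n_layers: int, max_seq_len: int) -> int:
    """Rough parameter count for a GPT-style decoder with tied embeddings."""
    emb = vocab_size * d_model                  # token embeddings (tied LM head)
    pos = max_seq_len * d_model                 # learned positional embeddings
    attn = 4 * d_model * d_model + 4 * d_model  # Q, K, V, out projections + biases
    mlp = 8 * d_model * d_model + 5 * d_model   # 4x-expansion MLP + biases
    ln = 2 * (2 * d_model)                      # two LayerNorms per block
    final_ln = 2 * d_model
    return emb + pos + n_layers * (attn + mlp + ln) + final_ln

# Assumed sizes; 6 layers and the 128-token context come from this card.
total = gpt_param_count(vocab_size=20_000, d_model=288, n_layers=6, max_seq_len=128)
print(f"{total / 1e6:.1f}M parameters")  # prints "11.8M parameters", near the stated ~12M
```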

πŸ‹οΈ Training Details

  • Dataset: Custom dataset
  • Hardware: Single AMD RX 7700 XT (12GB VRAM)
  • Training Status: Very early checkpoint (Under-trained)
  • Epochs: 100
  • Batch size: 32
  • Optimizer: AdamW, lr = 3e-4
  • Max sequence length: 128 tokens
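
The AdamW optimizer listed above is Adam with decoupled weight decay. A self-contained illustration of one update step on a scalar toy problem, using lr = 3e-4 from the card; the remaining hyperparameters (betas, eps, weight decay) are PyTorch-style defaults and an assumption, since the card does not list them:

```python
import math

def adamw_step(p, g, m, v, t, lr=3e-4, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update for a scalar parameter p with gradient g at step t."""
    m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g    # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)        # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    # Decoupled weight decay: applied directly to p, not mixed into the gradient.
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)
    return p, m, v

# Toy problem: minimize (p - 3)^2, whose gradient is 2 * (p - 3).
p, m, v = 0.0, 0.0, 0.0
for t in range(1, 20_001):
    g = 2.0 * (p - 3.0)
    p, m, v = adamw_step(p, g, m, v, t)
# p ends close to 3 (weight decay pulls it slightly below the exact minimum)
```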