πŸŒͺ️ Khazri — Azerbaijani Language Model

A lightweight, efficient, fully custom Azerbaijani language model designed for text generation, chat applications, education, and research. Khazri is trained from scratch on a custom 10M-sample Azerbaijani dataset and optimized to run on consumer GPUs while maintaining competitive performance.

🌟 Features

  • πŸ‡¦πŸ‡Ώ Native Azerbaijani language support
  • ⚑ Lightweight architecture (≈36M parameters)
  • πŸš€ Supports fast inference with GGUF + llama.cpp
  • πŸ“¦ Available on Hugging Face
  • 🎯 Optimized for chatbots, WebRTC real-time assistants, and low-latency deployment

πŸ—οΈ Model Architecture

| Version    | Parameters | Type                 | Context Length | Notes          |
|------------|------------|----------------------|----------------|----------------|
| Khazri-36M | ~36.6M     | GPT-2 Small variant  | 1024           | Higher quality |

Architecture:

  • Transformer decoder-only
  • Multi-head self-attention
  • Rotary positional embeddings (RoPE)
  • GELU activation
  • Layer normalization
  • Tied embeddings
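As a rough sanity check, a GPT-2-style decoder's parameter count can be estimated from its depth and width. The sketch below uses illustrative values — `n_layer`, `d_model`, and `vocab_size` are assumptions, not Khazri's published configuration — that happen to land near the ~36M range:

```python
# Back-of-the-envelope parameter count for a GPT-2-style decoder with
# tied embeddings. The arguments below are illustrative assumptions,
# not Khazri's published configuration.
def gpt2_params(n_layer, d_model, vocab_size):
    attn = 4 * d_model * d_model   # Q, K, V, and output projections
    mlp = 8 * d_model * d_model    # two linear layers with 4x expansion
    block = attn + mlp             # per transformer block (biases/LayerNorm omitted)
    embed = vocab_size * d_model   # token embeddings, tied with the output head
    return n_layer * block + embed

print(gpt2_params(8, 512, 20000) / 1e6)  # ≈ 35.4M
```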

πŸ“š Dataset

Khazri is trained on a 10 million-sample Azerbaijani dataset including:

  • News, books, conversations, social media, web articles, educational content

Preprocessing:

  • Unicode normalization, deduplication, tokenizer preprocessing, length filtering
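A minimal sketch of the preprocessing steps listed above (the exact Khazri pipeline is not published; the length thresholds here are assumptions for illustration):

```python
# Illustrative preprocessing pass: Unicode normalization, exact
# deduplication, and length filtering. Thresholds are assumed values,
# not Khazri's actual settings.
import unicodedata

def preprocess(samples, min_chars=20, max_chars=4096):
    seen = set()
    cleaned = []
    for text in samples:
        # Unicode normalization to canonical NFC form
        text = unicodedata.normalize("NFC", text).strip()
        # Length filtering
        if not (min_chars <= len(text) <= max_chars):
            continue
        # Exact deduplication on the normalized text
        if text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned
```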

πŸ‹οΈ Training Details

Hardware

  • NVIDIA RTX 3090 24GB
  • PyTorch 2.x + CUDA 12
  • bf16 mixed precision

Hyperparameters

epochs = 1
batch_size = 32
gradient_accumulation = 4
learning_rate = 3e-4
warmup_steps = 500
weight_decay = 0.1
sequence_length = 512
optimizer = AdamW
precision = bf16
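With gradient accumulation, the effective batch per optimizer step is batch_size * gradient_accumulation sequences. A quick check of the token throughput these settings imply:

```python
# Tokens consumed per optimizer step under the hyperparameters above.
batch_size = 32
gradient_accumulation = 4
sequence_length = 512

effective_batch = batch_size * gradient_accumulation  # sequences per update
tokens_per_step = effective_batch * sequence_length   # tokens per update
print(effective_batch, tokens_per_step)  # 128 65536
```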

πŸ“ˆ Training Challenges & Solutions

Bottleneck: Memory Bandwidth

At this scale the model is small enough that training is limited by VRAM memory bandwidth rather than compute, capping throughput at ~4.2 it/s.
Solution: shrink the model, rebalance batch size against gradient accumulation, and optimize data loading.
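The batch/accumulation adjustment follows the standard gradient-accumulation pattern, sketched below. This is illustrative only, not the actual Khazri training script: a toy linear classifier stands in for the language model, and it runs on CPU (on the RTX 3090 the same loop would move tensors to CUDA under bf16 autocast).

```python
# Gradient-accumulation sketch: 4 micro-batches per optimizer update,
# matching gradient_accumulation = 4 above. Toy model, illustrative only.
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # stand-in for the LM
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
loss_fn = nn.CrossEntropyLoss()
grad_accum = 4

opt.zero_grad()
for step in range(8):
    x = torch.randn(32, 16)                   # micro-batch of 32
    y = torch.randint(0, 4, (32,))
    loss = loss_fn(model(x), y) / grad_accum  # scale so gradients average
    loss.backward()                           # gradients accumulate in .grad
    if (step + 1) % grad_accum == 0:
        opt.step()                            # one update per 4 micro-batches
        opt.zero_grad()
```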

Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Yusiko/Khazri")
model = AutoModelForCausalLM.from_pretrained("Yusiko/Khazri")

# Generate a short continuation from an Azerbaijani prompt
inputs = tok("AzΙ™rbaycan", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```

🌐 Hugging Face

Available at: https://huggingface.co/Yusiko/Khazri

πŸ“¦ License

GPL-3.0 License

🌍 Future Plans

  • 1B+ model
  • Better tokenizer
  • Instruction-tuning
  • WebGPU inference
  • Community fine-tuning tools

🀝 Contact

Created by Yusiko
GitHub: Yusiko99
Website: https://yusi.xo.je
Hugging Face: Yusiko
