---
license: gpl-3.0
language:
- az
base_model:
- Yusiko/Khazri
tags:
- aze
- mini
- yusiko
---

# 🌪️ Khazri — Azerbaijani Language Model

**A lightweight, efficient, and fully custom Azerbaijani language model designed for text generation, chat applications, education, and research.**

Khazri is trained from scratch on a custom 10M-sample Azerbaijani dataset and optimized to run on consumer GPUs while maintaining competitive performance.

## 🌟 Features

- 🇦🇿 Native Azerbaijani language support
- ⚡ Lightweight architecture (≈36M parameters)
- 🚀 Fast inference via GGUF + llama.cpp
- 📦 Available on Hugging Face
- 🎯 Optimized for chatbots, WebRTC real-time assistants, and low-latency deployment

## 🏗️ Model Architecture

| Version | Parameters | Type | Context Length | Notes |
|---------|------------|------|----------------|-------|
| Khazri-36M | ~36.6M | GPT-2 Small variant | 1024 | Higher quality |

Architecture:

- Decoder-only Transformer
- Multi-head self-attention
- Rotary positional embeddings (RoPE)
- GELU activation
- Layer normalization
- Tied embeddings

## 📚 Dataset

Khazri is trained on a 10-million-sample Azerbaijani dataset including:

- News, books, conversations, social media, web articles, and educational content

Preprocessing:

- Unicode normalization, deduplication, tokenizer preprocessing, and length filtering

## 🏋️ Training Details

### Hardware

- NVIDIA RTX 3090 (24 GB)
- PyTorch 2.x + CUDA 12
- bf16 mixed precision

### Hyperparameters

```
epochs = 1
batch_size = 32
gradient_accumulation = 4
learning_rate = 3e-4
warmup_steps = 500
weight_decay = 0.1
sequence_length = 512
optimizer = AdamW
precision = bf16
```

## 📈 Training Challenges & Solutions

### Bottleneck: Memory Bandwidth

Small models are memory-bandwidth-bound rather than compute-bound, so throughput saturated at ~4.2 it/s.

Solution: shrink the model size, tune the batch size and gradient accumulation, and optimize data loading.

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Yusiko/Khazri")
model = AutoModelForCausalLM.from_pretrained("Yusiko/Khazri")
```

## 🌐 Hugging Face

Available at: https://huggingface.co/Yusiko/Khazri

## 📦 License

GPL-3.0 License

## 🌍 Future Plans

- 1B+ parameter model
- Improved tokenizer
- Instruction tuning
- WebGPU inference
- Community fine-tuning tools

## 🤝 Contact

Created by **Yusiko**

GitHub: [Yusiko99](https://github.com/Yusiko99)
Website: https://yusi.xo.je
Hugging Face: Yusiko
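
## 📐 Effective Batch Size

As a quick sanity check on the training configuration, the effective batch size and tokens processed per optimizer step follow directly from the hyperparameters listed above (`batch_size = 32`, `gradient_accumulation = 4`, `sequence_length = 512`); the variable names below are illustrative, not taken from the actual training script:

```python
# Values copied from the Hyperparameters section of this card.
batch_size = 32
gradient_accumulation = 4
sequence_length = 512

# Sequences contributing to each optimizer update.
effective_batch = batch_size * gradient_accumulation

# Tokens seen per optimizer update.
tokens_per_step = effective_batch * sequence_length

print(effective_batch)   # 128
print(tokens_per_step)   # 65536
```

So each optimizer step consumes 128 sequences, or 65,536 tokens, which is the figure to keep in mind when comparing learning-rate and warmup settings against other small-model training runs.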