---
license: gpl-3.0
language:
- az
base_model:
- Yusiko/Khazri
tags:
- aze
- mini
- yusiko
---
# 🌪️ Khazri — Azerbaijani Language Model
**A lightweight, efficient, and fully custom Azerbaijani language model designed for text generation, chat applications, education, and research.**

Khazri is trained from scratch on a custom 10-million-sample Azerbaijani dataset and optimized to run on consumer GPUs while maintaining competitive performance.
## 🌟 Features
- 🇦🇿 Native Azerbaijani language support
- ⚡ Lightweight architecture (≈36M parameters)
- 🚀 Fast inference with GGUF + llama.cpp
- 📦 Available on Hugging Face
- 🎯 Optimized for chatbots, WebRTC real-time assistants, and low-latency deployment
## πŸ—οΈ Model Architecture
| Version | Parameters | Type | Context Length | Notes |
|--------|------------|------|----------------|-------|
| Khazri-36M | ~36.6M | GPT-2 Small variant | 1024 | Higher quality |
Architecture:
- Transformer decoder-only
- Multi-head self-attention
- Rotary positional embeddings (RoPE)
- GELU activation
- Layer normalization
- Tied embeddings
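As a rough sanity check on the ~36M figure, the parameter count of a GPT-2-small-style decoder with tied embeddings and RoPE (no learned positional embeddings) can be estimated as below. All dimensions here (vocabulary size, hidden size, layer count) are illustrative assumptions, not the published Khazri configuration:

```python
def count_params(vocab_size: int, d_model: int, n_layers: int) -> int:
    """Rough parameter count for a decoder-only transformer with tied
    embeddings and RoPE (so no positional-embedding parameters)."""
    embed = vocab_size * d_model           # tied with the output head
    attn = 4 * d_model * d_model           # Q, K, V, and output projections
    mlp = 2 * d_model * (4 * d_model)      # up- and down-projection, 4x expansion
    layer_norms = 2 * 2 * d_model          # two LayerNorms per block (weight + bias)
    per_layer = attn + mlp + layer_norms
    final_ln = 2 * d_model
    return embed + n_layers * per_layer + final_ln

# One plausible shape that lands in the mid-30M range
total = count_params(vocab_size=32_000, d_model=512, n_layers=6)
print(f"{total / 1e6:.1f}M parameters")
```

Biases and the tokenizer's special tokens shift this slightly, which is consistent with the ~36.6M reported in the table above.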
## 📚 Dataset
Khazri is trained on a 10-million-sample Azerbaijani dataset including:
- News
- Books
- Conversations
- Social media
- Web articles
- Educational content

Preprocessing:
- Unicode normalization
- Deduplication
- Tokenizer preprocessing
- Length filtering
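The preprocessing steps above can be sketched as a minimal pipeline. The NFC normalization form and the length thresholds are assumptions for illustration; the actual pipeline is not published:

```python
import unicodedata

def preprocess(samples, min_chars=32, max_chars=4096):
    """Normalize, deduplicate, and length-filter raw text samples."""
    seen = set()
    cleaned = []
    for text in samples:
        # Unicode normalization (NFC keeps Azerbaijani letters like ə, ğ, ş intact)
        text = unicodedata.normalize("NFC", text).strip()
        # Length filtering
        if not (min_chars <= len(text) <= max_chars):
            continue
        # Exact deduplication
        if text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned
```

In practice, near-duplicate detection (e.g. MinHash) would catch more redundancy than exact matching, at higher cost.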
## πŸ‹οΈ Training Details
### Hardware
- NVIDIA RTX 3090 24GB
- PyTorch 2.x + CUDA 12
- bf16 mixed precision
### Hyperparameters
```
epochs = 1
batch_size = 32
gradient_accumulation = 4
learning_rate = 3e-4
warmup_steps = 500
weight_decay = 0.1
sequence_length = 512
optimizer = AdamW
precision = bf16
```
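With `batch_size = 32` and `gradient_accumulation = 4`, each optimizer step sees an effective batch of 128 sequences, i.e. 65,536 tokens at sequence length 512. A warmup schedule matching the numbers above might look like the sketch below; the post-warmup behavior (constant vs. decay) is an assumption, as it is not stated here:

```python
def lr_at(step, base_lr=3e-4, warmup_steps=500):
    """Linear warmup to base_lr over warmup_steps, then constant.
    (The post-warmup schedule is an assumption, not a published detail.)"""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

effective_batch = 32 * 4                  # batch_size * gradient_accumulation
tokens_per_step = effective_batch * 512   # tokens per optimizer step
```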
## 📈 Training Challenges & Solutions
### Bottleneck: Memory Bandwidth
At ~36M parameters the model is memory-bandwidth-bound rather than compute-bound, so training throughput plateaued at roughly 4.2 it/s.

Solution: shrink the model, tune batch size and gradient accumulation, and optimize the data-loading pipeline.
## 💬 Usage
### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Yusiko/Khazri")
model = AutoModelForCausalLM.from_pretrained("Yusiko/Khazri")

# Generate a short continuation
inputs = tok("Salam, necəsən?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```
## 🌐 Hugging Face
Available at: https://huggingface.co/Yusiko/Khazri
## 📦 License
Released under the GPL-3.0 license.
## 🌍 Future Plans
- 1B+ model
- Better tokenizer
- Instruction-tuning
- WebGPU inference
- Community fine-tuning tools
## 🤝 Contact
Created by **Yusiko**
GitHub: [Yusiko99](https://github.com/Yusiko99)
Website: https://yusi.xo.je
Hugging Face: Yusiko