---
license: gpl-3.0
language:
- az
base_model:
- Yusiko/Khazri
tags:
- aze
- mini
- yusiko
---


# 🌪️ Khazri — Azerbaijani Language Model
**A lightweight, efficient, fully custom Azerbaijani language model for text generation, chat applications, education, and research.**
Khazri is trained from scratch on a custom 10M-sample Azerbaijani dataset and optimized to run on consumer GPUs while maintaining competitive performance.

## 🌟 Features
- 🇦🇿 Native Azerbaijani language support
- ⚡ Lightweight architecture (≈36M parameters)
- 🚀 Fast inference via GGUF + llama.cpp
- 📦 Available on Hugging Face
- 🎯 Optimized for chatbots, WebRTC real-time assistants, and low-latency deployment

## 🏗️ Model Architecture
| Version | Parameters | Type | Context Length | Notes |
|--------|------------|------|----------------|-------|
| Khazri-36M | ~36.6M | GPT-2 Small variant | 1024 | Higher quality |

Architecture:
- Transformer decoder-only
- Multi-head self-attention
- Rotary positional embeddings (RoPE)
- GELU activation
- Layer normalization
- Tied embeddings
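For intuition, the architecture list above can be turned into a rough parameter budget. The card does not publish Khazri's hidden size, layer count, or vocabulary size, so the dimensions below are illustrative assumptions chosen to land near the stated ~36.6M parameters:

```python
def count_params(vocab_size, d_model, n_layers, d_ff):
    """Rough parameter count for a decoder-only transformer with tied
    embeddings and RoPE (no learned positional embeddings)."""
    embed = vocab_size * d_model            # shared input/output embedding
    attn = 4 * d_model * d_model            # Q, K, V and output projections
    mlp = 2 * d_model * d_ff                # up- and down-projection
    return embed + n_layers * (attn + mlp)  # biases/LayerNorms omitted

# Assumed (unconfirmed) dimensions:
total = count_params(vocab_size=32000, d_model=512, n_layers=6, d_ff=2048)
print(f"~{total / 1e6:.1f}M parameters")  # prints ~35.3M parameters
```

Biases and LayerNorm weights add roughly another million parameters on top of this estimate, which is why the sketch lands slightly under the reported 36.6M.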

## 📚 Dataset
Khazri is trained on a 10 million-sample Azerbaijani dataset including:
- News, books, conversations, social media, web articles, educational content

Preprocessing:
- Unicode normalization, deduplication, tokenizer preprocessing, length filtering
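A minimal sketch of those preprocessing steps, in the order listed above. The actual Khazri pipeline is not published, so the function name and length thresholds here are hypothetical:

```python
import unicodedata

def preprocess(samples, min_chars=20, max_chars=100_000):
    """Hypothetical cleaning pass: Unicode normalization, length
    filtering, and exact deduplication."""
    seen, kept = set(), []
    for text in samples:
        # NFC keeps Azerbaijani letters (ə, ş, ğ) in one canonical form.
        text = unicodedata.normalize("NFC", text).strip()
        if not (min_chars <= len(text) <= max_chars):
            continue  # length filtering
        if text in seen:
            continue  # exact deduplication
        seen.add(text)
        kept.append(text)
    return kept
```

Real corpus pipelines usually layer near-duplicate detection (e.g. MinHash) on top of exact deduplication like this.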

## 🏋️ Training Details
### Hardware
- NVIDIA RTX 3090 24GB
- PyTorch 2.x + CUDA 12
- bf16 mixed precision

### Hyperparameters
```
epochs = 1
batch_size = 32
gradient_accumulation = 4
learning_rate = 3e-4
warmup_steps = 500
weight_decay = 0.1
sequence_length = 512
optimizer = AdamW
precision = bf16
```
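Two consequences of the numbers above, sketched below. The card does not say what schedule follows warmup, so the constant tail is an assumption:

```python
def lr_at(step, peak_lr=3e-4, warmup_steps=500):
    # Linear warmup to the peak learning rate over warmup_steps.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr  # assumption: post-warmup decay is not specified

# batch_size=32 with gradient_accumulation=4 gives the optimizer an
# effective batch of 128 sequences, i.e. 128 * 512 = 65,536 tokens
# per weight update at sequence_length=512.
effective_tokens_per_update = 32 * 4 * 512
```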

## 📈 Training Challenges & Solutions
### Bottleneck: Memory Bandwidth
At this scale the model is memory-bandwidth-bound rather than compute-bound: small models saturate VRAM bandwidth, and throughput plateaued at ~4.2 it/s.
Solutions: shrink the model, tune batch size and gradient accumulation, and optimize data loading.
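As a back-of-envelope check, assuming each iteration is one micro-batch of batch_size=32 at sequence_length=512 (the card does not specify what "it/s" counts):

```python
# Token throughput implied by ~4.2 it/s at the hyperparameters above.
its_per_sec = 4.2
tokens_per_iter = 32 * 512  # batch_size * sequence_length
tokens_per_sec = its_per_sec * tokens_per_iter
print(f"{tokens_per_sec:,.0f} tokens/s")  # roughly 68,813 tokens/s
```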

## 💻 Usage
### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Yusiko/Khazri")
model = AutoModelForCausalLM.from_pretrained("Yusiko/Khazri")

inputs = tok("Azərbaycan", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```

## 🌐 Hugging Face
Available at: https://huggingface.co/Yusiko/Khazri

## 📦 License
GPL-3.0

## 🌍 Future Plans
- 1B+ model
- Better tokenizer
- Instruction-tuning
- WebGPU inference
- Community fine-tuning tools

## 🤝 Contact
Created by **Yusiko**  
GitHub: [Yusiko99](https://github.com/Yusiko99)  
Website: https://yusi.xo.je  
Hugging Face: Yusiko