---
license: gpl-3.0
language:
- az
base_model:
- Yusiko/Khazri
tags:
- aze
- mini
- yusiko
---
# Khazri – Azerbaijani Language Model

**A lightweight, efficient, and fully custom Azerbaijani language model designed for text generation, chat applications, education, and research.**

Khazri is trained from scratch on a custom 10M-sample Azerbaijani dataset and optimized to run on consumer GPUs while maintaining competitive performance.

## Features

- 🇦🇿 Native Azerbaijani language support
- ⚡ Lightweight architecture (≈36M parameters)
- Supports fast inference with GGUF + llama.cpp (see the sketch after this list)
- Available on Hugging Face
- Optimized for chatbots, WebRTC real-time assistants, and low-latency deployment
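For the GGUF + llama.cpp path, a minimal sketch using the `llama-cpp-python` bindings might look like the following. It assumes you have a GGUF export of Khazri available locally; the file name `khazri-36m.gguf` is a hypothetical placeholder, since this card does not state where an official GGUF file is published.

```python
# Minimal sketch: running a GGUF export of Khazri with the llama-cpp-python
# bindings (pip install llama-cpp-python). The model path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="khazri-36m.gguf", n_ctx=1024)  # hypothetical local path

out = llm("Salam, necəsən?", max_tokens=64)
print(out["choices"][0]["text"])
```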
## Model Architecture

| Version | Parameters | Type | Context Length | Notes |
|---------|------------|------|----------------|-------|
| Khazri-36M | ~36.6M | GPT-2 Small variant | 1024 | Higher quality |

Architecture (a rough sizing sketch follows the list):

- Transformer decoder-only
- Multi-head self-attention
- Rotary positional embeddings (RoPE)
- GELU activation
- Layer normalization
- Tied embeddings
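As a sanity check on the ~36M figure, here is a small back-of-the-envelope script. The dimensions used (vocabulary size, hidden size, layer count, feed-forward width) are assumptions chosen for illustration, not the published Khazri configuration, which this card does not spell out.

```python
# Rough parameter-count estimate for a small decoder-only Transformer with tied
# embeddings, ignoring biases. RoPE adds no learned positional parameters.
# All dimensions below are illustrative assumptions, not Khazri's actual config.
def count_params(vocab=32_000, d_model=512, n_layers=6, d_ff=2048):
    emb = vocab * d_model            # token embeddings (tied with the LM head)
    attn = 4 * d_model * d_model     # Q, K, V and output projections per layer
    mlp = 2 * d_model * d_ff         # up- and down-projection per layer
    norms = 4 * d_model              # two LayerNorms (weight + bias) per layer
    return emb + n_layers * (attn + mlp + norms)

print(f"{count_params() / 1e6:.1f}M parameters")  # ≈ 35.3M with these assumptions
```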
## Dataset

Khazri is trained on a 10-million-sample Azerbaijani dataset including:

- News, books, conversations, social media, web articles, educational content

Preprocessing (a minimal sketch of the pipeline follows):

- Unicode normalization, deduplication, tokenizer preprocessing, length filtering
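The exact preprocessing code is not published in this card, so the snippet below is only a minimal sketch of the steps listed above (Unicode normalization, deduplication, and length filtering); the thresholds and the hash-based deduplication are assumptions.

```python
# Minimal sketch of the preprocessing steps listed above. Thresholds and the
# hash-based exact deduplication are illustrative, not Khazri's exact pipeline.
import hashlib
import unicodedata

def normalize(text: str) -> str:
    # Unicode normalization (NFC) plus whitespace cleanup.
    return " ".join(unicodedata.normalize("NFC", text).split())

def preprocess(corpus, min_chars=32, max_chars=8_192):
    seen = set()
    for raw in corpus:
        text = normalize(raw)
        if not (min_chars <= len(text) <= max_chars):  # length filtering
            continue
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in seen:                              # exact deduplication
            continue
        seen.add(digest)
        yield text

print(list(preprocess(["Salam  dünya!", "Salam dünya!", "qısa"], min_chars=8)))
# -> ['Salam dünya!']  (duplicate and too-short samples dropped)
```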
## Training Details

### Hardware

- NVIDIA RTX 3090 (24 GB)
- PyTorch 2.x + CUDA 12
- bf16 mixed precision

### Hyperparameters

```
epochs = 1
batch_size = 32
gradient_accumulation = 4
learning_rate = 3e-4
warmup_steps = 500
weight_decay = 0.1
sequence_length = 512
optimizer = AdamW
precision = bf16
```
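The training code itself is not included in this card; the loop below is only a sketch of how the hyperparameters above combine (AdamW with weight decay 0.1, linear warmup, bf16 autocast, gradient accumulation of 4). The tiny model and random token batches are stand-ins so the snippet runs on its own.

```python
# Sketch of the configuration above: AdamW, linear warmup, bf16 autocast, and
# gradient accumulation of 4. The model and data are placeholders, not Khazri.
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 1000)).to(device)  # placeholder

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / 500)  # linear warmup over 500 steps
)
grad_accum, seq_len, batch_size = 4, 512, 32
loss_fn = nn.CrossEntropyLoss()

for step in range(8):  # a few demo steps; the real run is one epoch over the corpus
    tokens = torch.randint(0, 1000, (batch_size, seq_len), device=device)
    with torch.autocast(device_type=device, dtype=torch.bfloat16, enabled=device == "cuda"):
        logits = model(tokens[:, :-1])                        # next-token prediction
        loss = loss_fn(logits.reshape(-1, 1000), tokens[:, 1:].reshape(-1))
    (loss / grad_accum).backward()       # accumulate gradients over 4 micro-batches
    if (step + 1) % grad_accum == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad(set_to_none=True)
```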
## Training Challenges & Solutions

### Bottleneck: Memory Bandwidth

With a model this small, training is bound by VRAM memory bandwidth rather than compute, and throughput plateaued at roughly 4.2 it/s.

Solution: shrink the model size, tune the batch size and gradient accumulation, and optimize data loading (see the sketch below).
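The specific data-loading changes are not described here, so the snippet below is only a generic illustration of the "optimize data loading" point: pinned host memory, multiple worker processes, and non-blocking host-to-device copies. The dataset and all settings are placeholders.

```python
# Generic illustration of keeping the GPU fed: pinned memory, worker processes,
# and asynchronous host-to-device copies. Values are placeholders, not the
# exact settings used for Khazri.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randint(0, 1000, (4096, 512)))  # stand-in token batches

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,          # overlap batch preparation with GPU work
    pin_memory=True,        # page-locked buffers enable async copies to the GPU
    persistent_workers=True,
    prefetch_factor=2,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
for (tokens,) in loader:
    tokens = tokens.to(device, non_blocking=True)  # async copy when pin_memory=True
    break  # the training step would go here
```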
## Usage

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Yusiko/Khazri")
model = AutoModelForCausalLM.from_pretrained("Yusiko/Khazri")
```
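A short generation example building on the snippet above; the prompt and sampling settings are arbitrary choices, not recommended defaults from this card.

```python
# Generate a short continuation with the tokenizer and model loaded above.
inputs = tok("Salam, necəsən?", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tok.decode(output_ids[0], skip_special_tokens=True))
```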
## Hugging Face

Available at: https://huggingface.co/Yusiko/Khazri

## License

GPL-3.0

## Future Plans

- 1B+ model
- Better tokenizer
- Instruction tuning
- WebGPU inference
- Community fine-tuning tools

## Contact

Created by **Yusiko**

GitHub: [Yusiko99](https://github.com/Yusiko99)

Website: https://yusi.xo.je

Hugging Face: [Yusiko](https://huggingface.co/Yusiko)