# Jazeera Alpha — Dhivehi Language Model
Jazeera Alpha is a full fine-tune of Qwen3-8B optimized for the Dhivehi language (ދިވެހި), the official language of the Maldives. This is the first public release in the Jazeera series, developed by Annukh with the help of Javaabu to advance Dhivehi language AI.
This model represents one of the first dedicated efforts to create a high-quality, conversational large language model for Dhivehi — a language spoken by approximately 400,000 people and severely underrepresented in existing LLM training data.
## Model Details

| | |
|---|---|
| Base Model | Qwen/Qwen3-8B (8.2B parameters) |
| Architecture | Dense Transformer, 36 layers, GQA (32Q / 8KV) |
| Fine-tuning Method | Full fine-tuning (100% of parameters trained) |
| Precision | bfloat16 |
| Context Length | 2,048 tokens (training) / 32,768 tokens (native inference) |
| Training Hardware | 1× NVIDIA H200 (141 GB) |
| Framework | Unsloth + HuggingFace TRL (SFTTrainer) |
| Optimizer | AdamW 8-bit with cosine learning rate schedule |
| License | Apache 2.0 (inherited from Qwen3-8B) |
## Training Details
Jazeera Alpha was trained on a large-scale, curated Dhivehi instruction-following dataset compiled from various sources including translated instruction pairs, native Dhivehi conversational data, cultural and historical content, and multilingual parallel text. The dataset comprises over 2.4 million conversation examples in ShareGPT format, covering a wide range of tasks:
- Factual question answering about the Maldives and general knowledge in Dhivehi
- English ↔ Dhivehi translation
- Long-form Dhivehi text generation (stories, essays, articles)
- Reading comprehension and summarization in Dhivehi
- Casual conversation and cultural knowledge
- Grammar and morphology tasks
- Code-switching between Dhivehi and English
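For reference, a single ShareGPT-format record looks roughly like this (the conversation content here is illustrative, not an actual dataset entry):

```python
import json

# Illustrative ShareGPT-style record (hypothetical content, not from the actual dataset)
example = {
    "conversations": [
        # "What is the capital of the Maldives?"
        {"from": "human", "value": "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ ކޮބައި؟"},
        # "The capital of the Maldives is Malé."
        {"from": "gpt", "value": "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ މާލެ އެވެ."},
    ]
}

print(json.dumps(example, ensure_ascii=False, indent=2))
```

Each record is a list of alternating `human`/`gpt` turns, which trainers such as TRL's SFTTrainer can map onto the model's chat template.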
### Training Configuration

```
Base Model: Qwen/Qwen3-8B
Max Sequence Length: 2,048
Effective Batch Size: 96 (24 per-device × 4 gradient accumulation steps)
Learning Rate: 1e-5
LR Schedule: Cosine with 3% warmup
```
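Collected as a config dict, the settings above look like this (field names follow TRL's `TrainingArguments`/`SFTConfig` conventions; this is a sketch, not the exact training script):

```python
# Hyperparameters from the configuration above, as a TRL-style config dict.
# Field names are assumptions based on common TrainingArguments/SFTConfig naming.
config = {
    "model_name": "Qwen/Qwen3-8B",
    "max_seq_length": 2048,
    "per_device_train_batch_size": 24,
    "gradient_accumulation_steps": 4,
    "learning_rate": 1e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.03,
    "optim": "adamw_8bit",
    "bf16": True,
}

# Effective batch size = per-device batch × gradient accumulation steps
effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)
print(effective_batch)  # → 96
```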
### Training Progress
The model was trained starting from the base Qwen3-8B weights. Training loss dropped from ~1.3 to ~0.45 over the training period, indicating strong acquisition of Dhivehi language patterns. The loss curve showed consistent improvement without signs of overfitting.
## Capabilities
Jazeera Alpha demonstrates competency in the following areas:
Strong:
- Factual question answering in Dhivehi (e.g., questions about Maldivian geography, history, and culture)
- English to Dhivehi translation
- Dhivehi to English translation
- Reading comprehension — extracting answers from Dhivehi passages
- Summarization of Dhivehi text
- Cultural knowledge about the Maldives (traditional food, customs, history)
Developing:
- Long-form Dhivehi creative writing (stories, essays)
- Casual Dhivehi conversation
- Grammar correction and morphological tasks
- Code-switching between Dhivehi and English
## Limitations
- Alpha release: This model has been trained on a partial epoch of the full dataset. Quality will improve in subsequent releases.
- Thinking mode leakage: The base Qwen3 model has a built-in thinking mode (`<think>` tags). These may occasionally leak into responses. For best results, use non-thinking mode (`enable_thinking=False`) or strip `<think>` blocks from outputs.
- Packing artifacts: Since training used sequence packing, some edge cases in very long generations may show minor inconsistencies.
- Thaana script: While the model handles Thaana script (the writing system for Dhivehi) well overall, complex morphological constructions may occasionally produce errors.
- Hallucination: Like all LLMs, the model may generate plausible-sounding but incorrect information, particularly for specific factual claims about the Maldives.
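The `<think>`-stripping mitigation mentioned above can be sketched with a small post-processing helper (a minimal sketch; `strip_think` is a hypothetical name, not part of this model's tooling):

```python
import re

def strip_think(text: str) -> str:
    """Remove leaked <think>...</think> blocks, including an unclosed trailing one."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)  # closed blocks
    text = re.sub(r"<think>.*\Z", "", text, flags=re.DOTALL)         # dangling open tag
    return text.strip()

print(strip_think("<think>internal reasoning</think>Malé is the capital of the Maldives."))
# → Malé is the capital of the Maldives.
```

Apply this to decoded model output before displaying it to end users.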
## Intended Use
Jazeera Alpha is intended for:
- Research and development of Dhivehi language technology
- Building Dhivehi conversational AI applications
- Dhivehi-English translation tools
- Cultural preservation and accessibility of Dhivehi language content
- Prototyping Dhivehi NLP applications
This model is not intended for production deployment without further evaluation and safety testing.
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Annukh/jazeera-alpha"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Dhivehi: "What is the capital of the Maldives?"
messages = [
    {"role": "user", "content": "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ ކޮބައި؟"}
]

# Build the prompt with thinking mode disabled (recommended for this model)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```
## Recommended Inference Settings
| Mode | Temperature | Top-P | Top-K | Min-P |
|---|---|---|---|---|
| Non-thinking (recommended) | 0.7 | 0.8 | 20 | 0.0 |
| Thinking | 0.6 | 0.95 | 20 | 0.0 |
To suppress repetitive outputs, set presence_penalty between 0.0 and 1.5.
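Note that presence_penalty is not a transformers `generate()` argument; it is exposed by OpenAI-compatible inference servers and engines such as vLLM. As a request-payload sketch (the settings mirror the non-thinking row above; the payload shape is an assumption about your serving setup):

```python
# Request payload for an OpenAI-compatible inference server (sketch; server setup assumed).
# top_k is accepted by vLLM's OpenAI-compatible server as an extra sampling parameter.
payload = {
    "model": "Annukh/jazeera-alpha",
    "messages": [{"role": "user", "content": "..."}],
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "presence_penalty": 1.0,  # 0.0 to 1.5 to reduce repetition
    "max_tokens": 256,
}
```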
## Citation

```bibtex
@misc{jazeera-alpha-2026,
  title={Jazeera Alpha: A Dhivehi Language Model},
  author={Annukh},
  year={2026},
  url={https://huggingface.co/Annukh/jazeera-alpha}
}
```
## Acknowledgments
- Ahmed Yameen (@yaambe) — for helping collect and curate the training datasets and providing ongoing support throughout the model training process
- Athfan Khaleel (@athphane) — for helping with infrastructure setup and training support
- Javaabu (@javaabu) — for contributing to the broader Dhivehi AI ecosystem
- Qwen Team for the excellent Qwen3 base model
- Unsloth for efficient fine-tuning infrastructure
- The Maldivian community for preserving and advancing the Dhivehi language
Jazeera Alpha — bringing AI to the islands. 🏝️