
Jazeera Alpha — Dhivehi Language Model

Jazeera Alpha is a full fine-tune of Qwen3-8B optimized for the Dhivehi language (ދިވެހި), the official language of the Maldives. This is the first public release in the Jazeera series, developed by Annukh with the help of Javaabu to advance Dhivehi language AI.

This model represents one of the first dedicated efforts to create a high-quality, conversational large language model for Dhivehi — a language spoken by approximately 400,000 people and severely underrepresented in existing LLM training data.

Model Details

Base Model:          Qwen/Qwen3-8B (8.2B parameters)
Architecture:        Dense Transformer, 36 layers, GQA (32Q / 8KV)
Fine-tuning Method:  Full fine-tuning (100% of parameters trained)
Precision:           bfloat16
Context Length:      2,048 tokens (training) / 32,768 tokens (native inference)
Training Hardware:   1× NVIDIA H200 (141 GB)
Framework:           Unsloth + Hugging Face TRL (SFTTrainer)
Optimizer:           AdamW 8-bit with cosine learning rate schedule
License:             Apache 2.0 (inherited from Qwen3-8B)

Training Details

Jazeera Alpha was trained on a large-scale, curated Dhivehi instruction-following dataset compiled from various sources including translated instruction pairs, native Dhivehi conversational data, cultural and historical content, and multilingual parallel text. The dataset comprises over 2.4 million conversation examples in ShareGPT format, covering a wide range of tasks:

  • Factual question answering about the Maldives and general knowledge in Dhivehi
  • English ↔ Dhivehi translation
  • Long-form Dhivehi text generation (stories, essays, articles)
  • Reading comprehension and summarization in Dhivehi
  • Casual conversation and cultural knowledge
  • Grammar and morphology tasks
  • Code-switching between Dhivehi and English
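The ShareGPT format mentioned above stores each example as a list of turn objects. A minimal illustrative record (hypothetical content, not drawn from the actual dataset) might look like:

```python
import json

# Hypothetical ShareGPT-format record; the real dataset's field names
# and content may differ.
record = {
    "conversations": [
        # "from" identifies the speaker, "value" holds the turn text
        {"from": "human", "value": "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ ކޮބައި؟"},   # "What is the capital of the Maldives?"
        {"from": "gpt", "value": "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ މާލެ އެވެ."},  # "The capital of the Maldives is Malé."
    ]
}

# Serialize without escaping the Thaana script
print(json.dumps(record, ensure_ascii=False, indent=2))
```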

Training Configuration

Base Model:              Qwen/Qwen3-8B
Max Sequence Length:     2,048
Effective Batch Size:    96 (per-device batch 24 × 4 gradient accumulation steps)
Learning Rate:           1e-5
LR Schedule:             Cosine with 3% warmup
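Under Unsloth + TRL, the configuration above maps onto `SFTConfig` roughly as follows. This is a reconstruction under the assumption that the run used TRL's `SFTTrainer`; the exact training script is not published.

```python
# Hyperparameters from the table above, expressed as TRL SFTConfig kwargs.
# A sketch, not the actual training configuration.
config_kwargs = dict(
    per_device_train_batch_size=24,  # 24 x 4 accumulation = effective batch 96
    gradient_accumulation_steps=4,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,               # 3% warmup
    bf16=True,
    optim="adamw_8bit",
)

# With TRL installed it would be wired up roughly as:
# from trl import SFTConfig, SFTTrainer
# trainer = SFTTrainer(model=model, train_dataset=dataset,
#                      args=SFTConfig(output_dir="jazeera-alpha", **config_kwargs))

effective_batch = (config_kwargs["per_device_train_batch_size"]
                   * config_kwargs["gradient_accumulation_steps"])
print(effective_batch)  # 96
```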

Training Progress

The model was trained starting from the base Qwen3-8B weights. Training loss dropped from ~1.3 to ~0.45 over the training period, indicating strong acquisition of Dhivehi language patterns. The loss curve showed consistent improvement without signs of overfitting.

Capabilities

Jazeera Alpha demonstrates competency in the following areas:

Strong:

  • Factual question answering in Dhivehi (e.g., questions about Maldivian geography, history, and culture)
  • English to Dhivehi translation
  • Dhivehi to English translation
  • Reading comprehension — extracting answers from Dhivehi passages
  • Summarization of Dhivehi text
  • Cultural knowledge about the Maldives (traditional food, customs, history)

Developing:

  • Long-form Dhivehi creative writing (stories, essays)
  • Casual Dhivehi conversation
  • Grammar correction and morphological tasks
  • Code-switching between Dhivehi and English

Limitations

  • Alpha release: This model has been trained on a partial epoch of the full dataset. Quality will improve in subsequent releases.
  • Thinking mode leakage: The base Qwen3 model has a built-in thinking mode (<think> tags). These may occasionally leak into responses. For best results, use non-thinking mode (enable_thinking=False) or strip <think> blocks from outputs.
  • Packing artifacts: Since training used sequence packing, some edge cases in very long generations may show minor inconsistencies.
  • Thaana script: While the model handles Thaana script (the writing system for Dhivehi) well overall, complex morphological constructions may occasionally produce errors.
  • Hallucination: Like all LLMs, the model may generate plausible-sounding but incorrect information, particularly for specific factual claims about the Maldives.
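If `<think>` blocks do leak into generations, they can be removed post hoc with a small regex filter. This is a minimal sketch; it also handles the case where the generation is truncated before the closing tag appears.

```python
import re

def strip_think_blocks(text: str) -> str:
    """Remove complete <think>...</think> blocks, plus any dangling
    unclosed <think> tail left by a truncated generation."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    text = re.sub(r"<think>.*$", "", text, flags=re.DOTALL)  # unclosed tail
    return text.strip()

print(strip_think_blocks("<think>reasoning...</think>މާލެ"))  # -> "މާލެ"
```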

Intended Use

Jazeera Alpha is intended for:

  • Research and development of Dhivehi language technology
  • Building Dhivehi conversational AI applications
  • Dhivehi-English translation tools
  • Cultural preservation and accessibility of Dhivehi language content
  • Prototyping Dhivehi NLP applications

This model is not intended for production deployment without further evaluation and safety testing.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Annukh/jazeera-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {"role": "user", "content": "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ ކޮބައި؟"}  # "What is the capital of the Maldives?"
]

# Build the prompt with Qwen3's chat template, disabling thinking mode
# so <think> blocks do not appear in the output
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

Recommended Inference Settings

Mode                        Temperature  Top-P  Top-K  Min-P
Non-thinking (recommended)  0.7          0.8    20     0.0
Thinking                    0.6          0.95   20     0.0

To reduce repetitive outputs, set presence_penalty between 0.0 and 1.5.
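When serving through an OpenAI-compatible endpoint (for example vLLM; the endpoint URL is a placeholder and availability is an assumption), the recommended non-thinking settings translate into a request payload like:

```python
# Request payload for an OpenAI-compatible chat endpoint.
# The endpoint is hypothetical; parameter values come from the
# "Recommended Inference Settings" table above.
payload = {
    "model": "Annukh/jazeera-alpha",
    "messages": [
        {"role": "user", "content": "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ ކޮބައި؟"}
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,              # vLLM extension; not in the base OpenAI schema
    "presence_penalty": 1.0,  # within the suggested 0.0-1.5 range
    "max_tokens": 256,
}

# With a server running, the request would be sent as, e.g.:
# import requests
# r = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
```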

Citation

@misc{jazeera-alpha-2026,
    title={Jazeera Alpha: A Dhivehi Language Model},
    author={Annukh},
    year={2026},
    url={https://huggingface.co/Annukh/jazeera-alpha}
}

Acknowledgments

  • Ahmed Yameen (@yaambe) — for helping collect and curate the training datasets and providing ongoing support throughout the model training process
  • Athfan Khaleel (@athphane) — for helping with infrastructure setup and training support
  • Javaabu (@javaabu) — for contributing to the broader Dhivehi AI ecosystem
  • Qwen Team for the excellent Qwen3 base model
  • Unsloth for efficient fine-tuning infrastructure
  • The Maldivian community for preserving and advancing the Dhivehi language

Jazeera Alpha — bringing AI to the islands. 🏝️
