# Jazeera Alpha — Dhivehi Language Model
Jazeera Alpha is a full fine-tune of Qwen3-8B optimized for the Dhivehi language (ދިވެހި), the official language of the Maldives. This is the first public release in the Jazeera series, developed by Annukh with the help of Javaabu to advance Dhivehi language AI.
This model represents one of the first dedicated efforts to create a high-quality, conversational large language model for Dhivehi — a language spoken by approximately 400,000 people and severely underrepresented in existing LLM training data.
## Model Details

| | |
|---|---|
| Base Model | Qwen/Qwen3-8B (8.2B parameters) |
| Architecture | Dense Transformer, 36 layers, GQA (32Q / 8KV) |
| Fine-tuning Method | Full fine-tuning (100% of parameters trained) |
| Precision | bfloat16 |
| Context Length | 2,048 tokens (training) / 32,768 tokens (native inference) |
| Training Hardware | 1× NVIDIA H200 (141 GB) |
| Framework | Unsloth + HuggingFace TRL (SFTTrainer) |
| Optimizer | AdamW 8-bit with cosine learning rate schedule |
| License | Apache 2.0 (inherited from Qwen3-8B) |
## Training Details
Jazeera Alpha was trained on a large-scale, curated Dhivehi instruction-following dataset compiled from various sources including translated instruction pairs, native Dhivehi conversational data, cultural and historical content, and multilingual parallel text. The dataset comprises over 2.4 million conversation examples in ShareGPT format, covering a wide range of tasks:
- Factual question answering about the Maldives and general knowledge in Dhivehi
- English ↔ Dhivehi translation
- Long-form Dhivehi text generation (stories, essays, articles)
- Reading comprehension and summarization in Dhivehi
- Casual conversation and cultural knowledge
- Grammar and morphology tasks
- Code-switching between Dhivehi and English
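For reference, a single ShareGPT-format record looks roughly like this (the conversation content here is illustrative, not an actual dataset entry):

```python
import json

# Illustrative ShareGPT-style record (hypothetical content, not from the actual dataset)
example = {
    "conversations": [
        # "What is the capital of the Maldives?"
        {"from": "human", "value": "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ ކޮބައި؟"},
        # "The capital of the Maldives is Malé."
        {"from": "gpt", "value": "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ މާލެ އެވެ."},
    ]
}

print(json.dumps(example, ensure_ascii=False, indent=2))
```

Each record is a list of alternating `human`/`gpt` turns, which trainers such as TRL's SFTTrainer can map onto the model's chat template.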
### Training Configuration

```
Base Model: Qwen/Qwen3-8B
Max Sequence Length: 2,048
Effective Batch Size: 96 (24 per-device × 4 gradient accumulation steps)
Learning Rate: 1e-5
LR Schedule: Cosine with 3% warmup
```
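Collected as a config dict, the settings above look like this (field names follow TRL's `TrainingArguments`/`SFTConfig` conventions; this is a sketch, not the exact training script):

```python
# Hyperparameters from the configuration above, as a TRL-style config dict.
# Field names are assumptions based on common TrainingArguments/SFTConfig naming.
config = {
    "model_name": "Qwen/Qwen3-8B",
    "max_seq_length": 2048,
    "per_device_train_batch_size": 24,
    "gradient_accumulation_steps": 4,
    "learning_rate": 1e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.03,
    "optim": "adamw_8bit",
    "bf16": True,
}

# Effective batch size = per-device batch × gradient accumulation steps
effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)
print(effective_batch)  # → 96
```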
### Training Progress
The model was trained starting from the base Qwen3-8B weights. Training loss dropped from ~1.3 to ~0.45 over the training period, indicating strong acquisition of Dhivehi language patterns. The loss curve showed consistent improvement without signs of overfitting.
## Capabilities
Jazeera Alpha demonstrates competency in the following areas:
Strong:
- Factual question answering in Dhivehi (e.g., questions about Maldivian geography, history, and culture)
- English to Dhivehi translation
- Dhivehi to English translation
- Reading comprehension — extracting answers from Dhivehi passages
- Summarization of Dhivehi text
- Cultural knowledge about the Maldives (traditional food, customs, history)
Developing:
- Long-form Dhivehi creative writing (stories, essays)
- Casual Dhivehi conversation
- Grammar correction and morphological tasks
- Code-switching between Dhivehi and English
## Limitations
- Alpha release: This model has been trained on a partial epoch of the full dataset. Quality will improve in subsequent releases.
- Thinking mode leakage: The base Qwen3 model has a built-in thinking mode (`<think>` tags). These may occasionally leak into responses. For best results, use non-thinking mode (`enable_thinking=False`) or strip `<think>` blocks from outputs.
- Packing artifacts: Since training used sequence packing, some edge cases in very long generations may show minor inconsistencies.
- Thaana script: While the model handles Thaana script (the writing system for Dhivehi) well overall, complex morphological constructions may occasionally produce errors.
- Hallucination: Like all LLMs, the model may generate plausible-sounding but incorrect information, particularly for specific factual claims about the Maldives.
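The `<think>`-stripping mitigation mentioned above can be sketched with a small post-processing helper (a minimal sketch; `strip_think` is a hypothetical name, not part of this model's tooling):

```python
import re

def strip_think(text: str) -> str:
    """Remove leaked <think>...</think> blocks, including an unclosed trailing one."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)  # closed blocks
    text = re.sub(r"<think>.*\Z", "", text, flags=re.DOTALL)         # dangling open tag
    return text.strip()

print(strip_think("<think>internal reasoning</think>Malé is the capital of the Maldives."))
# → Malé is the capital of the Maldives.
```

Apply this to decoded model output before displaying it to end users.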
## Intended Use
Jazeera Alpha is intended for:
- Research and development of Dhivehi language technology
- Building Dhivehi conversational AI applications
- Dhivehi-English translation tools
- Cultural preservation and accessibility of Dhivehi language content
- Prototyping Dhivehi NLP applications
This model is not intended for production deployment without further evaluation and safety testing.
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Annukh/jazeera-alpha"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Dhivehi: "What is the capital of the Maldives?"
messages = [
    {"role": "user", "content": "ދިވެހިރާއްޖޭގެ ވެރިރަށަކީ ކޮބައި؟"}
]

# Build the prompt with thinking mode disabled (recommended for this model)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```
## Recommended Inference Settings
| Mode | Temperature | Top-P | Top-K | Min-P |
|---|---|---|---|---|
| Non-thinking (recommended) | 0.7 | 0.8 | 20 | 0.0 |
| Thinking | 0.6 | 0.95 | 20 | 0.0 |
To suppress repetitive outputs, set presence_penalty between 0.0 and 1.5.
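Note that presence_penalty is not a transformers `generate()` argument; it is exposed by OpenAI-compatible inference servers and engines such as vLLM. As a request-payload sketch (the settings mirror the non-thinking row above; the payload shape is an assumption about your serving setup):

```python
# Request payload for an OpenAI-compatible inference server (sketch; server setup assumed).
# top_k is accepted by vLLM's OpenAI-compatible server as an extra sampling parameter.
payload = {
    "model": "Annukh/jazeera-alpha",
    "messages": [{"role": "user", "content": "..."}],
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "presence_penalty": 1.0,  # 0.0 to 1.5 to reduce repetition
    "max_tokens": 256,
}
```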
## Citation

```bibtex
@misc{jazeera-alpha-2026,
  title={Jazeera Alpha: A Dhivehi Language Model},
  author={Annukh},
  year={2026},
  url={https://huggingface.co/Annukh/jazeera-alpha}
}
```
## Acknowledgments
- Ahmed Yameen (@yaambe) — for helping collect and curate the training datasets and providing ongoing support throughout the model training process
- Athfan Khaleel (@athphane) — for helping with infrastructure setup and training support
- Javaabu (@javaabu) — for contributing to the broader Dhivehi AI ecosystem
- Qwen Team for the excellent Qwen3 base model
- Unsloth for efficient fine-tuning infrastructure
- The Maldivian community for preserving and advancing the Dhivehi language
Jazeera Alpha — bringing AI to the islands. 🏝️