Instructions to use yasserrmd/kallamni-4b-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use yasserrmd/kallamni-4b-v1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="yasserrmd/kallamni-4b-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("yasserrmd/kallamni-4b-v1")
model = AutoModelForCausalLM.from_pretrained("yasserrmd/kallamni-4b-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use yasserrmd/kallamni-4b-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "yasserrmd/kallamni-4b-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "yasserrmd/kallamni-4b-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
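
The same call can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the default `base_url` assumes the vLLM server started above is listening on port 8000.

```python
import json
import urllib.request


def build_chat_request(user_text, base_url="http://localhost:8000",
                       model="yasserrmd/kallamni-4b-v1"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text, **kwargs):
    """Send the request to a running server and return the assistant's reply text."""
    request = build_chat_request(user_text, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` should return the reply text. Passing `base_url="http://localhost:30000"` would target a server listening on port 30000 instead.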

Use Docker

docker model run hf.co/yasserrmd/kallamni-4b-v1

SGLang

How to use yasserrmd/kallamni-4b-v1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "yasserrmd/kallamni-4b-v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "yasserrmd/kallamni-4b-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "yasserrmd/kallamni-4b-v1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "yasserrmd/kallamni-4b-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use yasserrmd/kallamni-4b-v1 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for yasserrmd/kallamni-4b-v1 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for yasserrmd/kallamni-4b-v1 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for yasserrmd/kallamni-4b-v1 to start chatting

Load model with FastModel

# Install first (shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="yasserrmd/kallamni-4b-v1",
    max_seq_length=2048,
)

Docker Model Runner
How to use yasserrmd/kallamni-4b-v1 with Docker Model Runner:
```
docker model run hf.co/yasserrmd/kallamni-4b-v1
```

Kallamni-4B (kallamni-4b-v1)

A conversational Arabic language model fine-tuned for Emirati dialect (اللهجة الإماراتية).

Model Description

Kallamni-4B is a model fine-tuned to understand and generate natural spoken Emirati Arabic. It is designed to capture the vocabulary, phrasing, and emotional tone of everyday UAE speech, avoiding Modern Standard Arabic (MSA) constructs.

This version builds upon the previous releases (1.2B, 2.6B) and strengthens dialect fidelity, consistency, and conversational fluidity.

System Prompt & Generation Style

For generating text (posts, dialogues), we use a system instruction that enforces Emirati dialect style:

You are an Emirati assistant who always speaks in authentic Emirati spoken Arabic.
Your responses must sound like daily UAE conversation — not MSA or foreign dialects.
Use words like “وايد”, “هيه”, “سرت”, “عقب”, “الربع”, “القعدة”, “نغير جو”.
Avoid MSA connectors like “ذلك”, “إنه”, “لقد”.
Respond casually, warmly, with cultural references (Ramadan, البحر، البر، العائلة).
Output must remain in Emirati dialect unless asked otherwise.

During generation, the parameters used are:

temperature = 0.7  
top_p = 0.8  
top_k = 20
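
To give a feel for what these three settings do, here is a small pure-Python sketch of temperature scaling followed by top-k and top-p (nucleus) filtering over a toy list of logits. It illustrates the general technique only; it is not the code path used by any particular inference library, and the function name `filter_distribution` is hypothetical.

```python
import math


def filter_distribution(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_index: probability} after temperature, top-k, and top-p filtering."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most probable tokens, then renormalize.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    top_mass = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / top_mass
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens; sampling would draw from this.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

For example, `filter_distribution([2.0, 1.0, 0.0, -1.0])` keeps only the two highest-scoring tokens, because together they already cover more than 80% of the probability mass.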

Data & Training

Training Data: 58,000 synthetic Emirati conversation samples
Data Source: Generated via API (with assistance) + manual filtering for dialect accuracy
Training Framework: Fine-tuned using Unsloth
Instruction Tuning / Conversational Format: Via TRL
Tokenizer: Extended to include Emirati-specific tokens and preserve dialect word merges
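
The card does not publish the dataset schema. As an assumption, the sketch below uses the `messages` list of role/content dictionaries that TRL accepts as a conversational format, and adds a small validator and JSONL writer. The helper names are hypothetical and the assistant text is a placeholder, not a real training sample.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def make_record(user_text, assistant_text, system_text=None):
    """Build one conversational record with an optional system turn."""
    messages = []
    if system_text is not None:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})
    return {"messages": messages}


def is_valid_record(record):
    """Check roles, non-empty string content, and that the last turn is the assistant's."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    for message in messages:
        if not isinstance(message, dict):
            return False
        if message.get("role") not in ALLOWED_ROLES:
            return False
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            return False
    return messages[-1]["role"] == "assistant"


def to_jsonl(records):
    """Serialize records one per line, keeping Arabic text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Placeholder sample: the user turn is taken from the usage example in this card.
sample = make_record("ها، وين كنت البارحة؟", "<assistant reply in Emirati dialect>")
```

Keeping `ensure_ascii=False` means the Arabic text is stored as-is rather than as escape sequences, which makes manual filtering for dialect accuracy easier.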

Evaluation & Comparisons

Human evaluators consistently rated generated text as > 90% authentic Emirati dialect
Compared to 1.2B and 2.6B versions, Kallamni-4B reduces fallback to MSA and yields more expressive, fluent dialect responses
Performs robustly on conversational benchmarks focused on dialect contexts

Usage Example

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/kallamni-4b-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
# Optional: move the model to a GPU first with model.to("cuda")

# System instruction that enforces the Emirati dialect style
system_prompt = """You are an Emirati assistant who always speaks in **authentic Emirati spoken Arabic**.
Your responses must sound like daily UAE conversation — not MSA or foreign dialects.
Use words like “وايد”, “هيه”, “سرت”, “عقب”, “الربع”, “القعدة”, “نغير جو”.
Avoid MSA connectors like “ذلك”, “إنه”, “لقد”.
Respond casually, warmly, with cultural references (Ramadan, البحر، البر، العائلة).
Output must remain in Emirati dialect unless asked otherwise."""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "ها، وين كنت البارحة؟"},
]

# Render the chat template to text, then tokenize onto the model's device
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, not the prompt
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(reply)

Contribution & Feedback

Submit issues or dialogue examples where the dialect slips
Contribute real Emirati conversation pairs for refinement
Provide evaluation prompts and comparative results
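
One rough way to find candidate dialect slips before reporting them is to scan replies for the MSA connectors named in the system prompt. The sketch below is a hypothetical heuristic, not the evaluation method used for this model; the function name and the word list (copied from the system prompt) are assumptions, and it only matches exact whole words, so prefixed forms are missed.

```python
import re

# Connectors the system prompt asks the model to avoid (taken from this card).
MSA_CONNECTORS = {"ذلك", "إنه", "لقد"}

# Runs of Arabic letters and diacritics; punctuation such as "،" and "؟" is excluded.
ARABIC_WORD = re.compile(r"[\u0621-\u0652]+")


def find_msa_connectors(reply):
    """Return the MSA connectors that appear as whole words in reply, in order."""
    return [word for word in ARABIC_WORD.findall(reply) if word in MSA_CONNECTORS]
```

A reply for which `find_msa_connectors` returns a non-empty list is only a candidate for review; a human reader still needs to judge whether the dialect actually slipped.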

License & Ethical Use

License: CC-BY-NC-4.0
The model does not collect personal user data
Use responsibly; avoid generating misinformation, impersonation, or harmful content
When publishing outputs publicly, disclose that the text was AI-generated

Downloads last month: 27

Safetensors

Model size: 4B params
Tensor type: BF16

Model tree for yasserrmd/kallamni-4b-v1

Base model: Qwen/Qwen3-4B-Base
Finetuned: Qwen/Qwen3-4B
Finetuned: this model
Quantizations: 2 models