Kallamni-4B (kallamni-4b-v1)
A conversational Arabic language model fine-tuned for Emirati dialect (اللهجة الإماراتية).
Model Description
Kallamni-4B is a model fine-tuned to understand and generate natural spoken Emirati Arabic. It is designed to capture the vocabulary, phrasing, and emotional tone of everyday UAE dialect while avoiding Modern Standard Arabic (MSA) constructions.
This version builds on the previous releases (1.2B, 2.6B) and improves dialect fidelity, consistency, and conversational fluency.
System Prompt & Generation Style
For generating text (posts, dialogues), we use a system instruction that enforces Emirati dialect style:
You are an Emirati assistant who always speaks in authentic Emirati spoken Arabic.
Your responses must sound like daily UAE conversation — not MSA or foreign dialects.
Use words like “وايد”, “هيه”, “سرت”, “عقب”, “الربع”, “القعدة”, “نغير جو”.
Avoid MSA connectors like “ذلك”, “إنه”, “لقد”.
Respond casually, warmly, with cultural references (Ramadan, البحر، البر، العائلة).
Output must remain in Emirati dialect unless asked otherwise.
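As a minimal sketch of how this instruction could be attached to a conversation, the helper below assembles a chat-format message list with the system prompt first. The function name `build_messages` and its parameters are hypothetical and are not part of the model's tooling; the example passes only the first sentence of the prompt for brevity.

```python
def build_messages(system_prompt, user_text, history=None):
    """Return a chat message list: system prompt, earlier turns, new user turn."""
    messages = [{"role": "system", "content": system_prompt}]
    if history:
        # Earlier turns are assumed to already be {"role": ..., "content": ...} dicts.
        messages.extend(history)
    messages.append({"role": "user", "content": user_text})
    return messages


# First sentence only, for brevity; paste the full prompt in practice.
system_prompt = (
    "You are an Emirati assistant who always speaks in "
    "authentic Emirati spoken Arabic."
)
messages = build_messages(system_prompt, "ها، وين كنت البارحة؟")
print([m["role"] for m in messages])
```

The resulting list can be passed to `tokenizer.apply_chat_template`, as in the usage example further down.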
During generation, the parameters used are:
temperature = 0.7
top_p = 0.8
top_k = 20
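One way to keep these settings in a single place is a small dictionary that is unpacked into `model.generate(...)`. This is only a sketch: the names `GENERATION_KWARGS` and `generation_kwargs` are illustrative, and the default `max_new_tokens=100` is taken from the usage example rather than from this list.

```python
# Sampling settings from the list above; the dict name is illustrative.
GENERATION_KWARGS = {
    "do_sample": True,  # sampling must be enabled for the three settings below to apply
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
}


def generation_kwargs(max_new_tokens=100, **overrides):
    """Return a fresh kwargs dict for model.generate(), with optional overrides."""
    kwargs = dict(GENERATION_KWARGS, max_new_tokens=max_new_tokens)
    kwargs.update(overrides)
    return kwargs


print(generation_kwargs(temperature=0.5))
```

A call such as `model.generate(**inputs, **generation_kwargs())` would then apply the documented defaults, and individual values can be overridden per request without editing the shared dictionary.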
Data & Training
- Training Data: 58,000 synthetic Emirati conversation samples
- Data Source: Generated via API (with assistance) + manual filtering for dialect accuracy
- Training Framework: Fine-tuned using Unsloth
- Instruction Tuning / Conversational Format: Via TRL
- Tokenizer: Extended to include Emirati-specific tokens and preserve dialect word merges
Evaluation & Comparisons
- Human evaluators consistently rated generated text as > 90% authentic Emirati dialect
- Compared to 1.2B and 2.6B versions, Kallamni-4B reduces fallback to MSA and yields more expressive, fluent dialect responses
- Performs robustly on conversational benchmarks focused on dialect contexts
Usage Example
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model; use the GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("yasserrmd/kallamni-4b-v1")
model = AutoModelForCausalLM.from_pretrained("yasserrmd/kallamni-4b-v1").to(device)

messages = [
    {"role": "system", "content": """You are an Emirati assistant who always speaks in **authentic Emirati spoken Arabic**.
Your responses must sound like daily UAE conversation — not MSA or foreign dialects.
Use words like “وايد”, “هيه”, “سرت”, “عقب”, “الربع”, “القعدة”, “نغير جو”.
Avoid MSA connectors like “ذلك”, “إنه”, “لقد”.
Respond casually, warmly, with cultural references (Ramadan, البحر، البر، العائلة).
Output must remain in Emirati dialect unless asked otherwise."""},
    {"role": "user", "content": "ها، وين كنت البارحة؟"},
]

# Render the chat template to text, then tokenize on the same device as the model.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, not the echoed prompt.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(reply)
Contribution & Feedback
- Submit issues or dialogue examples where the model slips out of Emirati dialect
- Contribute real Emirati conversation pairs for refinement
- Provide evaluation prompts and comparative results
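When collecting examples of dialect slips, a rough first-pass filter can flag replies that contain the MSA connectors the system prompt asks the model to avoid. The sketch below is a simple heuristic and not the evaluation method used for this model; the name `find_msa_connectors` is hypothetical, and the check only catches exact whole-word matches, so prefixed forms such as "وذلك" are missed.

```python
import re

# Connectors the system prompt asks the model to avoid; extend as needed.
MSA_CONNECTORS = ("ذلك", "إنه", "لقد")


def find_msa_connectors(reply, connectors=MSA_CONNECTORS):
    """Return the listed connectors that appear as whole words in `reply`."""
    # \w is Unicode-aware for str patterns, so Arabic words are split correctly.
    words = set(re.findall(r"\w+", reply))
    return [c for c in connectors if c in words]


print(find_msa_connectors("لقد ذهبت إلى السوق"))
print(find_msa_connectors("سرت البحر ويا الربع"))
```

A non-empty result only suggests that a reply deserves a closer look; a native speaker should still confirm whether it is a genuine slip before it is reported.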
License & Ethical Use
- License: CC-BY-NC-4.0
- The model does not collect personal user data
- Use responsibly; avoid generating misinformation, impersonation, or harmful content
- When publishing outputs, disclose that the text was AI-generated