# Qalb-1.0-8B-Instruct (Urdu Llama 3.1)
Qalb-1.0-8B-Instruct is a state-of-the-art Urdu language model designed to bridge the gap in low-resource language processing. Built on the Llama-3.1-8B architecture, it was adapted to Urdu in two stages: continued pre-training on a 1.97-billion-token Urdu corpus, followed by supervised fine-tuning for instruction following.
Unlike general multilingual models that struggle with Urdu grammar and cultural nuance, Qalb delivers fluent, culturally accurate, and context-aware responses.
## 🌟 Key Features
- State-of-the-Art Performance: Outperforms the previous best Urdu model, Alif-1.0-Instruct, on 6 of 7 benchmarks, and LLaMA-3.1-8B-Instruct on all 7.
- Deep Urdu Understanding: Pre-trained on a diverse mix of news, literature, government documents, and social media to capture the depth of the language.
- Ethical & Safe: Fine-tuned to be a helpful, harmless, and honest assistant, refusing to generate toxic or misleading content.
- Reasoning Capable: Excellent performance on logical reasoning, mathematical word problems, and commonsense tasks in Urdu.
- Bilingual Proficiency: Retains strong English capabilities while excelling in Urdu, making it ideal for translation and code-switching tasks.
## 📊 Performance Benchmarks
Qalb establishes a new standard for Urdu LLMs, achieving an overall score of 90.34, significantly ahead of both the LLaMA-3.1-8B-Instruct baseline and the previous state-of-the-art, Alif-1.0-Instruct.
### 🏆 Comparison vs. SOTA Models
| Task | Qalb (Ours) | Alif-1.0-Instruct | LLaMA-3.1-8B-Instruct |
|---|---|---|---|
| Overall Score | 90.34 | 87.1 | 45.7 |
| Translation | 94.41 | 89.3 | 58.9 |
| Classification | 96.38 | 93.9 | 61.4 |
| Sentiment Analysis | 95.79 | 94.3 | 54.3 |
| Ethics | 90.83 | 85.7 | 27.3 |
| Reasoning | 88.59 | 83.5 | 45.6 |
| QA (Question Answering) | 80.40 | 73.8 | 30.5 |
| Generation | 85.97 | 90.2 | 42.8 |
> Note: Scores are on a 0-100 scale. Qalb outperforms the previous best model (Alif) in 6 out of 7 categories.
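For reference, the Overall Score is consistent with the unweighted mean of the seven per-task scores, which you can verify from the table above (a quick sanity check, not part of the official evaluation code):

```python
# Sanity check: the overall score matches the unweighted mean of the
# seven per-task scores reported in the table above.
qalb_scores = {
    "Translation": 94.41,
    "Classification": 96.38,
    "Sentiment Analysis": 95.79,
    "Ethics": 90.83,
    "Reasoning": 88.59,
    "QA": 80.40,
    "Generation": 85.97,
}
overall = sum(qalb_scores.values()) / len(qalb_scores)
print(f"Overall: {overall:.2f}")  # Overall: 90.34
```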
## 🚀 How to Use
Google Colab
### Method 1: Using Unsloth (Recommended - Fast & Efficient)
The easiest way to run Qalb is with the Unsloth library, which provides up to 2x faster inference.
```python
from unsloth import FastLanguageModel
import torch

# Load the model with 4-bit quantization (fits on a single consumer GPU)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="enstazao/Qalb-1.0-8B-Instruct",
    max_seq_length=2048,
    dtype=None,          # auto-detect (bfloat16 on supported GPUs)
    load_in_4bit=True,   # 4-bit quantization
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

urdu_system_prompt = "آپ ایک مددگار اور بے ضرر مصنوعی ذہانت کے اسسٹنٹ ہیں۔ آپ اردو میں سوالات کے درست جوابات دیتے ہیں۔"

questions = [
    "پاکستان کا قومی کھیل کیا ہے؟",
    "لاہور شہر کیوں مشہور ہے؟ مختصر وضاحت کریں۔",
    "سوال: لیاقت علی خان کون تھے؟",
    "کراچی کو روشنیوں کا شہر کیوں کہا جاتا ہے؟",
    "انگریزی میں ترجمہ کریں: 'محنت کامیابی کی کنجی ہے۔'"
]

print("🚀 Starting Batch Generation...\n")
for user_input in questions:
    print(f"🔹 Question: {user_input}")

    # Manually format the prompt (Llama-3 chat style)
    prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{urdu_system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.1,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
        # Stop on either the regular EOS token or the Llama-3 turn terminator
        eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")],
    )
    # Decode only the newly generated tokens (skip the echoed prompt)
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
    print(f"✅ Answer: {response}")
    print("-" * 50)
```
### Method 2: Using Hugging Face Transformers
If Unsloth is not available, Qalb runs with the standard transformers library (4-bit loading requires bitsandbytes).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "enstazao/Qalb-1.0-8B-Instruct"
urdu_system_prompt = "آپ ایک مددگار اور بے ضرر مصنوعی ذہانت کے اسسٹنٹ ہیں۔ آپ اردو میں سوالات کے درست جوابات دیتے ہیں۔"

# 4-bit NF4 quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

print("⏳ Loading model in 4-bit...")
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,  # apply 4-bit quantization here
    device_map="auto",               # required for quantized loading
)

# Stop on either the regular EOS token or the Llama-3 turn terminator
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

questions = [
    "پاکستان کا قومی کھیل کیا ہے؟",
    "لاہور شہر کیوں مشہور ہے؟ مختصر وضاحت کریں۔",
    "سوال: لیاقت علی خان کون تھے؟",
    "سوال: اسلام آباد شہر کے بارے میں بتائیں۔",
    "انگریزی میں ترجمہ کریں: 'محنت کامیابی کی کنجی ہے۔'"
]

print("Model Loaded. Starting Generation...\n")
for user_input in questions:
    print(f"🔹 Question: {user_input}")

    # Manually format the prompt (Llama-3 chat style)
    prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{urdu_system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.1,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
        eos_token_id=terminators,
    )
    # Decode only the newly generated tokens (skip the echoed prompt)
    response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(f"✅ Answer: {response}")
    print("-" * 50)
```
## Limitations & Bias
While Qalb has been trained to be helpful and harmless, it may still reflect biases present in the training data. Users should fact-check critical information, especially in medical, legal, or religious contexts.
## Citation
If you use Qalb in your research, please cite:
```bibtex
@article{qalb2025,
  title={Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training},
  author={Hassan, Muhammad Taimoor and Ahmed, Jawad and Awais, Muhammad},
  journal={arXiv preprint arXiv:2601.08141},
  year={2026},
  eprint={2601.08141},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.08141},
  doi={10.48550/arXiv.2601.08141}
}
```