Qalb-1.0-8B-Instruct (Urdu Llama 3.1)

Unsloth Urdu License

Qalb-1.0-8B-Instruct is a state-of-the-art Urdu language model designed to bridge the gap in low-resource language processing. Built on the powerful Llama-3.1-8B architecture, Qalb has been rigorously adapted for the Urdu language through a two-stage process: Continued Pre-training on a massive Urdu corpus of 1.97 billion tokens, followed by Supervised Fine-Tuning for instruction following.

Unlike general multilingual models that struggle with Urdu grammar and cultural nuance, Qalb delivers fluent, culturally accurate, and context-aware responses.

🌟 Key Features

  • State-of-the-Art Performance: Outperforms previous best models (Alif-1.0 and LLaMA-3.1 Base) on 6 out of 7 benchmarks.
  • Deep Urdu Understanding: Pre-trained on a diverse mix of news, literature, government documents, and social media to capture the depth of the language.
  • Ethical & Safe: Fine-tuned to provide helpful, harmless, and honest responses, refusing to generate toxic or misleading content.
  • Reasoning Capable: Excellent performance on logical reasoning, mathematical word problems, and commonsense tasks in Urdu.
  • Bilingual Proficiency: Retains strong English capabilities while excelling in Urdu, making it ideal for translation and code-switching tasks.

📊 Performance Benchmarks

Qalb establishes a new standard for Urdu LLMs, achieving an Overall Score of 90.34. It significantly outperforms the base model and the previous state-of-the-art.

🏆 Comparison vs. SOTA Models

| Task | Qalb (Ours) | Alif-1.0-Instruct | LLaMA-3.1-8B-Instruct |
|---|---|---|---|
| Overall Score | 90.34 | 87.1 | 45.7 |
| Translation | 94.41 | 89.3 | 58.9 |
| Classification | 96.38 | 93.9 | 61.4 |
| Sentiment Analysis | 95.79 | 94.3 | 54.3 |
| Ethics | 90.83 | 85.7 | 27.3 |
| Reasoning | 88.59 | 83.5 | 45.6 |
| QA (Question Answering) | 80.40 | 73.8 | 30.5 |
| Generation | 85.97 | 90.2 | 42.8 |

> Note: Scores are on a 0-100 scale. Qalb outperforms the previous best model (Alif) in 6 out of 7 categories.
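For reference, the reported Overall Score is consistent with an unweighted mean of the seven category scores (a sketch; the paper may use a different aggregation):

```python
# Qalb's category scores from the table above
qalb_scores = {
    "Translation": 94.41,
    "Classification": 96.38,
    "Sentiment Analysis": 95.79,
    "Ethics": 90.83,
    "Reasoning": 88.59,
    "QA": 80.40,
    "Generation": 85.97,
}

# The unweighted mean reproduces the reported Overall Score
overall = round(sum(qalb_scores.values()) / len(qalb_scores), 2)
print(overall)  # 90.34
```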

🚀 How to Use

Google Colab

Open In Colab

Method 1: Using Unsloth (Recommended - Fast & Efficient)

The easiest way to run Qalb is using the Unsloth library, which provides 2x faster inference.

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "enstazao/Qalb-1.0-8B-Instruct",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True, # <--- Currently set to use 4-bit quantization
)
FastLanguageModel.for_inference(model)


urdu_system_prompt = "آپ ایک مددگار اور بے ضرر مصنوعی ذہانت کے اسسٹنٹ ہیں۔ آپ اردو میں سوالات کے درست جوابات دیتے ہیں۔"

questions = [
    "پاکستان کا قومی کھیل کیا ہے؟",                         
    "لاہور شہر کیوں مشہور ہے؟ مختصر وضاحت کریں۔",
    "سوال: لیاقت علی خان کون تھے؟",
    "کراچی کو روشنیوں کا شہر کیوں کہا جاتا ہے؟",             
    "انگریزی میں ترجمہ کریں: 'محنت کامیابی کی کنجی ہے۔'"
]

print("🚀 Starting Batch Generation...\n")


for user_input in questions:
    print(f"🔹 Question: {user_input}")

    # Manually Format Prompt (Llama-3 Style)
    prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{urdu_system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

    inputs = tokenizer([prompt], return_tensors = "pt").to("cuda")

    outputs = model.generate(
        **inputs,
        max_new_tokens = 256,
        temperature = 0.1,
        top_p = 0.9,
        repetition_penalty = 1.1,
        do_sample = True,
        eos_token_id = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
    )

    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
    
    print(f"✅ Answer: {response}")
    print("-" * 50)

Method 2: Using Hugging Face Transformers

Qalb is also compatible with the standard transformers library if Unsloth is not available.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch


model_name = "enstazao/Qalb-1.0-8B-Instruct"
urdu_system_prompt = "آپ ایک مددگار اور بے ضرر مصنوعی ذہانت کے اسسٹنٹ ہیں۔ آپ اردو میں سوالات کے درست جوابات دیتے ہیں۔"


bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
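As a rough back-of-the-envelope estimate (ignoring per-block quantization constants, the KV cache, and activation memory), NF4 stores each weight in about 4 bits, so the 8B weights occupy roughly 4 GB instead of the ~16 GB needed in BF16:

```python
params = 8e9                  # approximate parameter count
bf16_gb = params * 2 / 1e9    # 2 bytes per BF16 weight
nf4_gb = params * 0.5 / 1e9   # ~0.5 bytes per NF4 (4-bit) weight
print(bf16_gb, nf4_gb)  # 16.0 4.0
```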



print("⏳ Loading model in 4-bit...")
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config, # <--- Apply 4-bit here
    device_map="auto"               # <--- Required for quantization
)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]


questions = [
    "پاکستان کا قومی کھیل کیا ہے؟",                         
    "لاہور شہر کیوں مشہور ہے؟ مختصر وضاحت کریں۔",            
    "سوال: لیاقت علی خان کون تھے؟",                     
    "سوال: اسلام آباد شہر کے بارے میں بتائیں۔",  
    "انگریزی میں ترجمہ کریں: 'محنت کامیابی کی کنجی ہے۔'"
]

print("Model Loaded. Starting Generation...\n")

# Loop through questions
for user_input in questions:
    print(f"🔹 Question: {user_input}")
    
    prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{urdu_system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens = 256,
        temperature = 0.1,
        top_p = 0.9,
        repetition_penalty = 1.1,
        do_sample = True,
        eos_token_id = terminators
    )

    # Decode only the newly generated tokens, skipping the prompt
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
    
    print(f"✅ Answer: {response}")
    print("-" * 50)

Limitations & Bias

While Qalb has been trained to be helpful and harmless, it may still reflect biases present in the training data. Users should fact-check critical information, especially in medical, legal, or religious contexts.

Citation

If you use Qalb in your research, please cite:

@article{qalb2025,
  title={Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training},
  author={Hassan, Muhammad Taimoor and Ahmed, Jawad and Awais, Muhammad},
  journal={arXiv preprint arXiv:2601.08141},
  year={2026},
  eprint={2601.08141},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.08141},
  doi={10.48550/arXiv.2601.08141}
}