Phi3-UrduInstruct

A fine-tuned version of Microsoft's Phi-3-mini-4k-instruct for Urdu-language instruction following.

Model Description

Phi3-UrduInstruct is fine-tuned on a custom Urdu instruction dataset of 578 manually curated and verified examples. The model is designed to follow instructions in Urdu across multiple NLP tasks.

This work addresses the lack of instruction-tuned language models for Urdu, a low-resource language spoken by over 230 million people worldwide.

Training Data

A custom dataset of 578 Urdu instruction-response pairs was created for this project, covering 6 task categories:

Category	Examples
Translation (Urdu → English)	105
Grammar Correction	100
Question Answering	100
Text Summarization	107
Text Completion	91
Formal/Informal Conversion	75
Total	578

All examples were manually written and verified by a native Urdu speaker to ensure linguistic quality and cultural accuracy.
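
To make the category balance easier to see, the short sketch below recomputes the total and each category's share from the counts in the table above. The variable and function names are illustrative only and are not part of any released dataset tooling.

```python
# Category counts copied from the table above.
CATEGORY_COUNTS = {
    "Translation (Urdu → English)": 105,
    "Grammar Correction": 100,
    "Question Answering": 100,
    "Text Summarization": 107,
    "Text Completion": 91,
    "Formal/Informal Conversion": 75,
}


def category_shares(counts):
    """Return each category's share of the dataset as a percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total = sum(CATEGORY_COUNTS.values())
shares = category_shares(CATEGORY_COUNTS)

print(f"Total examples: {total}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The counts range from 75 to 107 per category, so no single task dominates the training mix.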

Training Details

Parameter	Value
Base Model	unsloth/Phi-3-mini-4k-instruct-bnb-4bit (4-bit)
Fine-tuning Method	LoRA (r=16, alpha=16)
Training Epochs	3
Learning Rate	2e-4
Training Loss	1.37 → 0.47
Framework	Unsloth + TRL
Hardware	Google Colab T4 GPU
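
As a rough illustration of what r=16 and alpha=16 mean in the standard LoRA formulation, the sketch below computes the scaling factor (alpha / r) and the number of trainable parameters LoRA adds to a single weight matrix. The table does not say which modules were adapted, so the 3072 x 3072 projection used here (3072 being the Phi-3-mini hidden size) is an assumption chosen only to make the arithmetic concrete.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA: alpha / r."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    The update is B @ A, with A of shape (r, d_in) and B of shape (d_out, r).
    """
    return r * (d_in + d_out)


R, ALPHA = 16, 16   # values from the table above
HIDDEN = 3072       # assumption: a square hidden-size projection, for illustration

scaling = lora_scaling(R, ALPHA)
per_matrix = lora_params(HIDDEN, HIDDEN, R)

print(f"LoRA scaling (alpha / r): {scaling}")
print(f"Trainable parameters per {HIDDEN}x{HIDDEN} matrix: {per_matrix}")
```

With alpha equal to r, the low-rank update is applied without extra scaling. The adapter adds far fewer trainable parameters than the full matrix holds, which helps make fine-tuning practical on a single T4.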

Usage

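from unsloth import FastLanguageModel

# Load the fine-tuned model in 4-bit precision.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Almanships/Phi3-UrduInstruct",
    max_seq_length = 2048,
    dtype = None,  # let Unsloth pick a dtype suited to the GPU
    load_in_4bit = True,
)

# Switch the model to Unsloth's inference mode.
FastLanguageModel.for_inference(model)

# The instruction goes on the first line and the input text on the second.
messages = [
    {"role": "user", "content":
     "اس جملے کا انگریزی میں ترجمہ کریں\nپاکستان ایک خوبصورت ملک ہے"}
]

# Apply the chat template and move the token IDs to the GPU.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))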
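
The usage example places the instruction on the first line of the user message and the input text on the second. A small helper along the following lines could build that message for any of the six task categories. The function name `build_messages` is hypothetical, and the instruction wording the model responds to best for other tasks is not documented here.

```python
def build_messages(instruction: str, text: str) -> list[dict]:
    """Build a single-turn chat: instruction, newline, then the input text."""
    content = f"{instruction.strip()}\n{text.strip()}"
    return [{"role": "user", "content": content}]


# Same translation prompt as in the usage example.
messages = build_messages(
    "اس جملے کا انگریزی میں ترجمہ کریں",
    "پاکستان ایک خوبصورت ملک ہے",
)

print(messages[0]["content"])
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` exactly as in the usage example.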

Example Outputs

Translation:

Input: پاکستان ایک خوبصورت ملک ہے
Output: Pakistan is a beautiful country

Grammar Correction:

Input: وہ گیا بازار آج
Output: وہ آج بازار گیا

Question Answering:

Input: پاکستان کا دارالحکومت کون سا ہے؟
Output: پاکستان کا دارالحکومت اسلام آباد ہے

Limitations

Trained on only 578 examples; a larger dataset would likely improve performance
Evaluation is currently qualitative; formal benchmarks are pending
Performs best on the 6 task categories seen in training; results on other tasks may be weaker
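
Until formal benchmarks are available, one simple way to put a number on the qualitative checks is a normalized exact-match score against reference answers. The sketch below is only an illustration of that idea, not the evaluation used for this model. The references are taken from the example outputs above, and the predictions are hypothetical stand-ins for model output.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def exact_match(predictions, references) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# References from the example outputs; predictions are hypothetical.
references = ["وہ آج بازار گیا", "پاکستان کا دارالحکومت اسلام آباد ہے"]
predictions = ["وہ آج  بازار گیا ", "اسلام آباد"]

score = exact_match(predictions, references)
print(f"Exact match: {score:.2f}")
```

Exact match is strict: the second prediction is a reasonable short answer but still counts as a miss. Tasks such as summarization or translation would need softer, overlap-based metrics.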

Future Work

Expand dataset to 2000+ examples
Add formal evaluation benchmarks for Urdu NLP
Extend instruction tuning to Punjabi
Compare against other multilingual models on Urdu tasks

Citation

If you use this model, please cite:

@misc{phi3-urduinstruct-2026,
  author = {Almanships},
  title = {Phi3-UrduInstruct: Instruction Tuning of 
           Phi-3 for Urdu Language},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Almanships/Phi3-UrduInstruct}
}
