SindhiLM-Qwen-0.5B (v1.0)
Developed by Aakash Meghwar (Founder, Text Tech Solutions)
Project Overview
This model is a specialized large language model (LLM) for the Sindhi language, developed as part of the broader SindhiLM Project. It is a fine-tuned version of the Qwen2.5-0.5B architecture, optimized to understand and generate Sindhi text using a custom-built BPE tokenizer.
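A BPE tokenizer builds its vocabulary by repeatedly merging the most frequent adjacent symbol pair in a corpus. Below is a minimal, self-contained sketch of that merge loop on a toy English corpus; it illustrates the idea only and is not the actual SindhiLM tokenizer or its training data:

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a corpus of symbol sequences."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: character-level words with frequencies.
words = {tuple("low"): 5, tuple("lower"): 2, tuple("lowest"): 3}
merges = []
for _ in range(3):  # learn 3 merges
    pair = most_frequent_pair(words)
    merges.append(pair)
    words = merge_pair(words, pair)
print(merges)  # learned merge rules, most frequent first
```

A production tokenizer (e.g. one built with the Hugging Face `tokenizers` library) adds normalization, pre-tokenization, and special tokens on top of this core loop, which matters for an Arabic-script language like Sindhi.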
The model was trained for 10,000 steps using a 4-bit LoRA (Low-Rank Adaptation) approach. This project adheres to the "Green AI" philosophy, providing a high-performance, compact model that is accessible for low-resource languages and deployable on consumer-grade hardware.
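LoRA keeps the base weights frozen and learns a low-rank update ΔW = B·A, so only 2·d·r parameters per adapted d×d projection are trained instead of d². A back-of-the-envelope sketch of the savings (896 is Qwen2.5-0.5B's hidden size; rank 16 is an illustrative choice, not this project's actual training config):

```python
# Parameter savings from a rank-r LoRA update on one d x d weight matrix.
d = 896   # Qwen2.5-0.5B hidden size
r = 16    # illustrative LoRA rank (assumption, not the actual config)

full_params = d * d       # full fine-tune: update every weight
lora_params = 2 * d * r   # LoRA: A is r x d, B is d x r

print(full_params)                # 802816
print(lora_params)                # 28672
print(full_params / lora_params)  # 28.0
```

Combined with 4-bit quantization of the frozen base weights, this is what makes fine-tuning a model of this size feasible on consumer-grade hardware.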
About the Author
I am Aakash Meghwar, a Computational Linguist specializing in South Asian languages.
- M.S. in Applied Linguistics & Text Analytics (HSE, Russia; graduating June 2026)
- B.S. in English Language & Literature (NUML, Islamabad)
- Founder: Text Tech Solutions
- Published Researcher: author of "Compact Transformer Models for Classical Urdu Poetry" (Corporum Journal)
Open for Collaboration & PhD Opportunities
I am actively seeking PhD opportunities and Research Collaborations in the following areas:
- Low-Resource NLP: Developing efficient models for Sindhi, Urdu, and Siraiki.
- Literary Informatics: Computational stylistics and affective registers in classical poetry.
- SindhiLM Evolution: I am currently developing SindhiLM-v2 (featuring an improved Sindhi-BPE tokenizer and knowledge distillation).
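Knowledge distillation, as planned for SindhiLM-v2, trains a small student to match a larger teacher's softened output distribution. A minimal stdlib sketch of the temperature-scaled soft-target loss (toy logits and an illustrative temperature, not SindhiLM-v2's actual setup):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Cross-entropy between softened teacher targets and student predictions."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # student predictions
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.5]  # toy teacher logits for one token
student = [3.0, 1.5, 0.5]  # toy student logits
print(distill_loss(teacher, student))
```

In practice this term is blended with the ordinary hard-label cross-entropy, and the temperature softens the teacher's distribution so the student also learns from the relative probabilities of wrong tokens.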
Are you looking for a researcher to join your NLP lab or organization? I am open to discussing projects involving model compression, cross-lingual transfer, and South Asian language technology.
Contact & Links
- Email: aakashmeghwar01@gmail.com
- LinkedIn: Aakash Meghwar
- Projects: SindhiLM | Urdu Poetry Research
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
model_id = "Qwen/Qwen2.5-0.5B"
adapter_id = "aakashMeghwar01/SindhiLM-Qwen-0.5B"
# Load the custom Sindhi tokenizer (shipped in the adapter repo) and the base model
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
# Load Sindhi Adapters
model = PeftModel.from_pretrained(model, adapter_id)
# Example Sindhi Input
prompt = "سنڌي ٻوليءَ جي اهميت"  # "The importance of the Sindhi language"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))