SindhiLM-Qwen-0.5B (v1.0)

Developed by Aakash Meghwar (Founder, Text Tech Solutions)

πŸ“Œ Project Overview

This model is a specialized Large Language Model (LLM) for the Sindhi language, developed as part of the broader SindhiLM Project. It is a fine-tuned version of Qwen2.5-0.5B, optimized to understand and generate Sindhi text using a custom-built BPE tokenizer.

The model was trained for 10,000 steps using 4-bit LoRA (Low-Rank Adaptation). In keeping with the "Green AI" philosophy, the result is a compact, high-performance model that remains accessible for low-resource languages and deployable on consumer-grade hardware.
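LoRA keeps the base weights frozen and trains only a pair of small low-rank matrices per adapted projection, which is what makes 4-bit fine-tuning of a 0.5B model feasible on consumer hardware. A minimal numpy sketch of the parameter savings for a single projection (Qwen2.5-0.5B's hidden size is 896; the rank `r = 16` and `alpha = 32` here are illustrative assumptions, not the values used to train SindhiLM):

```python
import numpy as np

d = 896      # hidden size of Qwen2.5-0.5B
r = 16       # LoRA rank (illustrative; not necessarily the rank used for SindhiLM)
alpha = 32   # LoRA scaling factor (also illustrative)

# The frozen base weight W is never updated; only A and B are trained.
W = np.random.randn(d, d).astype(np.float32)
A = np.random.randn(r, d).astype(np.float32) * 0.01  # initialized small
B = np.zeros((d, r), dtype=np.float32)               # initialized to zero

# Effective weight at inference time: W + (alpha / r) * B @ A
W_adapted = W + (alpha / r) * (B @ A)

full_params = W.size            # 802,816 parameters per d x d projection
lora_params = A.size + B.size   # 28,672 trainable parameters instead
print(f"trainable fraction: {lora_params / full_params:.4f}")  # 0.0357
```

Because `B` starts at zero, the adapted weight equals the base weight at step 0, so fine-tuning begins from the pretrained model's behavior.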

πŸš€ About the Author

I am Aakash Meghwar, a Computational Linguist specializing in South Asian languages.

  • πŸŽ“ M.S. in Applied Linguistics & Text Analytics (HSE, Russia - Graduating June 2026)
  • πŸŽ“ B.S. in English Language & Literature (NUML, Islamabad)
  • πŸ’‘ Founder: Text Tech Solutions
  • ✍️ Published Researcher: Author of "Compact Transformer Models for Classical Urdu Poetry" (Corporum Journal).

🀝 Open for Collaboration & PhD Opportunities

I am actively seeking PhD opportunities and Research Collaborations in the following areas:

  • Low-Resource NLP: Developing efficient models for Sindhi, Urdu, and Siraiki.
  • Literary Informatics: Computational stylistics and affective registers in classical poetry.
  • SindhiLM Evolution: I am currently developing SindhiLM-v2 (featuring an improved Sindhi-BPE tokenizer and knowledge distillation).

Are you looking for a researcher to join your NLP lab or organization? I am open to discussing projects involving Model Compression, Cross-Lingual Transfer, and South Asian Language Tech.

πŸ“© Contact & Links

πŸ’» How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

model_id = "Qwen/Qwen2.5-0.5B"
adapter_id = "aakashMeghwar01/SindhiLM-Qwen-0.5B"

# Load the custom Sindhi BPE tokenizer (shipped with the adapter repo) and the base model
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Attach the fine-tuned Sindhi LoRA adapters
model = PeftModel.from_pretrained(model, adapter_id)

# Example Sindhi input ("The importance of the Sindhi language")
prompt = "Ψ³Ω†ΪŒΩŠ Ω»ΩˆΩ„ΩŠΨ‘ΩŽ جي Ψ§Ω‡Ω…ΩŠΨͺ"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
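The `temperature=0.7` passed to `generate` rescales the logits before sampling: values below 1 sharpen the next-token distribution toward the most likely tokens, while values above 1 flatten it. A minimal standalone numpy sketch of temperature scaling (the logits here are made up for illustration):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature, then normalize to probabilities
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = [2.0, 1.0, 0.5, 0.1]
sharp = softmax_with_temperature(logits, 0.7)  # same temperature as the example above
flat = softmax_with_temperature(logits, 2.0)

# Lower temperature concentrates probability mass on the top token
print(sharp.max() > flat.max())  # True
```

With `do_sample=True`, the model draws the next token from this scaled distribution instead of always taking the argmax, which keeps the generated Sindhi text varied across runs.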