# 🇮🇳 Hinglish Qwen2.5-7B Instruct — LoRA Fine-Tune

A bilingual (Hindi-English / Hinglish) fine-tune of Qwen2.5-7B-Instruct using QLoRA + Unsloth, designed to fluently converse in Hinglish — the natural code-mixed language spoken by 350M+ people across India.

## ⚡ Key Results

| Metric | Baseline | Fine-tuned | Change |
|---|---|---|---|
| Eval Loss | 2.57 | 0.90 | ↓ 65% |
| Training Loss | 1.63 | 0.25 | |
| Hinglish Response Rate | 10% | 80% | ↑ 8× |
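The "Change" column follows directly from the raw numbers; a quick arithmetic check (plain Python, values copied from the table):

```python
# Sanity-check the "Change" column from the results table.
baseline_loss, finetuned_loss = 2.57, 0.90
loss_drop_pct = (baseline_loss - finetuned_loss) / baseline_loss * 100

baseline_rate, finetuned_rate = 0.10, 0.80
rate_multiplier = finetuned_rate / baseline_rate

print(f"Eval loss down {loss_drop_pct:.0f}%, Hinglish rate up {rate_multiplier:.0f}x")
# -> Eval loss down 65%, Hinglish rate up 8x
```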

## 🔍 Before vs After

| Prompt | Baseline (Qwen2.5-7B) | Fine-tuned (Ours) |
|---|---|---|
| "Yaar, Python seekhni hai" | Pure English bullet points, broken numbering | Natural Hinglish advice, like a friend would give |
| "Ghar pe pizza banana sikhao" | Textbook English recipe format | Casual Hinglish with Indian context ("Domino's se bhi better!") |
| "Stock market invest karna chahiye?" | Generic English financial advice | Practical Hinglish with real talk ("Loan lekar mat karna!") |

## Model Details

| Field | Value |
|---|---|
| Base Model | Qwen2.5-7B-Instruct (4-bit NF4) |
| Method | QLoRA — LoRA adapters on a quantized base |
| Framework | Unsloth + Hugging Face TRL `SFTTrainer` |
| Hardware | NVIDIA RTX 3060 (12 GB VRAM) |
| Precision | BF16 mixed precision |
| LoRA Config | r=32, α=64, RSLoRA, 7 target modules |
| Trainable Params | 80.7M / 7.7B (1.05%) |
| Training Time | ~2 hours (675 steps, 3 epochs) |
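A sketch of what the LoRA row above translates to in code. This is a reconstruction, not the repo's actual training script: the field names mirror `peft.LoraConfig`, and the seven target modules are assumed to be Qwen2.5's standard attention and MLP projections.

```python
# Hypothetical reconstruction of the LoRA setup described in the table above;
# the target_modules list is an assumption, not confirmed by the repo.
lora_cfg = dict(
    r=32,
    lora_alpha=64,
    use_rslora=True,  # rank-stabilized LoRA scaling
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)

# Cross-check the reported trainable fraction: 80.7M of 7.7B parameters.
trainable_pct = 80.7e6 / 7.7e9 * 100
print(f"{trainable_pct:.2f}% trainable")  # -> 1.05% trainable
```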

## Training

| Parameter | Value |
|---|---|
| Epochs | 3 (675 total steps) |
| Effective Batch Size | 16 (1 per device × 16 grad accumulation) |
| Learning Rate | 2e-4 with cosine decay |
| Warmup | 50 steps |
| Optimizer | AdamW (8-bit) |
| Eval Strategy | every 100 steps |
| Best Checkpoint | step 400 (eval loss 0.899) |
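The 675-step figure is consistent with the batch settings and the dataset split; a small arithmetic sketch (variable names are illustrative):

```python
# Reconstructing the total step count from the hyperparameters above.
train_samples = int(4000 * 0.90)  # 90% train split of 4,000 samples -> 3600
effective_batch = 1 * 16          # per-device batch x gradient accumulation
steps_per_epoch = train_samples // effective_batch  # 225
total_steps = steps_per_epoch * 3                   # 3 epochs
print(total_steps)  # -> 675, matching the table
```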

## Loss Curve

*(Figure: training loss curve; image not reproduced here.)*


## Dataset

4,000 curated samples drawn from three sources:

| Source | Samples |
|---|---|
| HuggingFace (HydraLM, NebulaByte, findnitai) | ~2,000 |
| Gemini Synthetic (5 categories) | ~1,000 |
| Alpaca → Hinglish Translation | ~1,000 |

**Format:** ChatML with a Hinglish system prompt. **Split:** 90% train / 10% eval.
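A minimal sketch of what one ChatML-formatted sample looks like, assuming Qwen's standard `<|im_start|>` / `<|im_end|>` markers (the actual preprocessing script may differ; the example texts are illustrative):

```python
# Sketch of ChatML formatting with a Hinglish system prompt (illustrative).
def to_chatml(system: str, user: str, assistant: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>\n"
    )

sample = to_chatml(
    "Aap ek helpful AI assistant hain jo Hinglish mein baat kar sakta hai.",
    "Yaar, mujhe Python seekhni hai.",
    "Bhai, Python ke liye basics se start karo!",
)

# 90/10 train/eval split over the 4,000 samples
n_train = int(4000 * 0.9)  # 3600
n_eval = 4000 - n_train    # 400
print(n_train, n_eval)  # -> 3600 400
```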


## Usage

```python
from unsloth import FastLanguageModel
from peft import PeftModel

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach the Hinglish LoRA adapter
model = PeftModel.from_pretrained(model, "psyka-101/hinglish-qwen2.5-7b-instruct")
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

messages = [
    {"role": "system", "content": "Aap ek helpful AI assistant hain jo Hinglish mein baat kar sakta hai."},
    {"role": "user", "content": "Yaar, mujhe Python seekhni hai. Kahan se start karun?"},
]
# add_generation_prompt=True appends the assistant header so the model replies
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- Trained on ~4K samples — may not generalize to all dialects or specialized topics
- Use checkpoint 400: eval loss begins rising during epoch 3
- May still default to pure English on some prompts
- Roman-script Hinglish only; no Devanagari
- Not evaluated for safety or toxicity

## License

Inherits the Qwen2.5 license of the base model.


Built with 🦥 Unsloth • 🤗 Transformers • 🇮🇳 Hinglish
