# Qwen3-8B Sindhi CPT (Continued Pre-Training)

This is a LoRA adapter for Qwen3-8B, produced by continued pre-training on ~164M tokens of Sindhi text.


## Model Details

| Property | Value |
|---|---|
| Base Model | unsloth/Qwen3-8B-bnb-4bit |
| Training Type | Continued Pre-Training (CPT) |
| Training Tokens | ~164M Sindhi tokens |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Sequence Length | 2048 |
| Quantization | 4-bit (bitsandbytes) |
| Framework | Unsloth + Hugging Face PEFT |
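
As a rough sanity check on adapter size, LoRA adds `r * (d_in + d_out)` trainable parameters per adapted linear layer (an `(r, d_in)` A matrix plus a `(d_out, r)` B matrix). A minimal sketch; the helper name and the example dimensions are illustrative, not taken from this adapter's config:

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer:
    an A matrix of shape (r, d_in) plus a B matrix of shape (d_out, r)."""
    return r * d_in + d_out * r

# Illustrative example: one square 4096x4096 projection at rank 32
print(lora_param_count(4096, 4096, r=32))  # 262144 extra parameters
```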

## Usage

### Option 1 - Load with Unsloth (recommended, faster)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "hellosindh/qwen3-sindhi-cpt",
    load_in_4bit   = True,
    max_seq_length = 2048,
)

# Enable fast inference mode
FastLanguageModel.for_inference(model)
```

### Option 2 - Load base + adapter separately with PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model in 4-bit (quantization_config is the current
# transformers API; passing load_in_4bit directly is deprecated)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map = "auto",
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("hellosindh/qwen3-sindhi-cpt")

# Apply the Sindhi adapter on top
model = PeftModel.from_pretrained(base_model, "hellosindh/qwen3-sindhi-cpt")
```
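
If you want a standalone checkpoint without a PEFT dependency at inference time, the adapter can be folded into the base weights with PEFT's `merge_and_unload`. A sketch continuing from the variables above; note that merging into 4-bit quantized weights is lossy, so for an exact merge load the base in bf16 first:

```python
# Fold the LoRA weights into the base model (assumes the base was loaded
# in full/half precision, not 4-bit, so the merge is exact)
merged = model.merge_and_unload()
merged.save_pretrained("qwen3-sindhi-cpt-merged")
tokenizer.save_pretrained("qwen3-sindhi-cpt-merged")
```

The output directory name is illustrative; afterwards the merged checkpoint loads with plain `AutoModelForCausalLM.from_pretrained`.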

### Generate Sindhi text

```python
# Prompt: "the people of Sindh"
inputs = tokenizer("سنڌ جي ماڻهو", return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens     = 200,
    temperature        = 0.8,
    do_sample          = True,
    repetition_penalty = 1.1,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

- Dataset: ~164M Sindhi tokens from multiple sources
- Tokenizer: original Qwen3 tokenizer (no vocabulary modifications)
- Hardware: NVIDIA A100 40GB
- Framework: Unsloth for memory-efficient training
- Optimizer: AdamW 8-bit
- Learning Rate: 5e-5 with cosine scheduler
- Final Loss: ~1.20
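
The card does not state the batch size, but given ~164M tokens at sequence length 2048, the optimizer step count follows directly from the effective (global) batch size. A quick back-of-the-envelope helper; the batch size below is a hypothetical example, not the actual training config:

```python
import math

def cpt_steps(total_tokens: int, seq_len: int, global_batch: int) -> int:
    """Optimizer steps for one pass over the data:
    each step consumes seq_len * global_batch tokens."""
    return math.ceil(total_tokens / (seq_len * global_batch))

# e.g. 164M tokens, 2048-token sequences, hypothetical global batch of 32
print(cpt_steps(164_000_000, 2048, 32))  # 2503 steps per epoch
```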

## Intended Use

- Sindhi text generation
- Synthetic data generation for low-resource Sindhi NLP
- Base for further fine-tuning on Sindhi tasks (NER, QA, summarization)
- Pre-training data augmentation for encoder models such as SindhiBERT
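
For the synthetic-data use case, one common pattern is to sample several continuations per seed prompt. A sketch continuing from the `model`/`tokenizer` loaded above; the seed prompts ("history of Sindh", "Sindhi literature") and sampling settings are illustrative:

```python
seed_prompts = ["سنڌ جي تاريخ", "سنڌي ادب"]  # illustrative seed topics

synthetic = []
for prompt in seed_prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        num_return_sequences=4,  # several samples per seed prompt
    )
    synthetic += [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```

Since this is a CPT model rather than an instruction-tuned one, seed prompts work best as plain text openings to be continued, not as questions or instructions.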

## Limitations

- This is a continued pre-training adapter, not an instruction-tuned model; it will not reliably follow chat-style instructions
- Outputs may not be factually accurate; the model is intended for learning linguistic patterns
- Best used as a base for task-specific fine-tuning