# Qwen3-8B Sindhi CPT (Continued Pre-Training)

This is a LoRA adapter for Qwen3-8B, produced by continued pre-training on ~164M tokens of Sindhi text.


## Model Details

| Property | Value |
|---|---|
| Base Model | unsloth/Qwen3-8B-bnb-4bit |
| Training Type | Continued Pre-Training (CPT) |
| Training Tokens | ~164M Sindhi tokens |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Sequence Length | 2048 |
| Quantization | 4-bit (bitsandbytes) |
| Framework | Unsloth + Hugging Face PEFT |
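
As a rough sanity check on adapter size, LoRA adds `r * (d_in + d_out)` trainable parameters per adapted linear layer (an `(r, d_in)` A matrix plus a `(d_out, r)` B matrix). A minimal sketch; the helper name and the example dimensions are illustrative, not taken from this adapter's config:

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer:
    an A matrix of shape (r, d_in) plus a B matrix of shape (d_out, r)."""
    return r * d_in + d_out * r

# Illustrative example: one square 4096x4096 projection at rank 32
print(lora_param_count(4096, 4096, r=32))  # 262144 extra parameters
```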

## Usage

### Option 1 - Load with Unsloth (recommended, faster)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "hellosindh/qwen3-sindhi-cpt",
    load_in_4bit   = True,
    max_seq_length = 2048,
)

# Enable fast inference mode
FastLanguageModel.for_inference(model)
```

### Option 2 - Load base + adapter separately with PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model in 4-bit (quantization_config is the current
# transformers API; passing load_in_4bit directly is deprecated)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map = "auto",
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("hellosindh/qwen3-sindhi-cpt")

# Apply the Sindhi adapter on top
model = PeftModel.from_pretrained(base_model, "hellosindh/qwen3-sindhi-cpt")
```
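
If you want a standalone checkpoint without a PEFT dependency at inference time, the adapter can be folded into the base weights with PEFT's `merge_and_unload`. A sketch continuing from the variables above; note that merging into 4-bit quantized weights is lossy, so for an exact merge load the base in bf16 first:

```python
# Fold the LoRA weights into the base model (assumes the base was loaded
# in full/half precision, not 4-bit, so the merge is exact)
merged = model.merge_and_unload()
merged.save_pretrained("qwen3-sindhi-cpt-merged")
tokenizer.save_pretrained("qwen3-sindhi-cpt-merged")
```

The output directory name is illustrative; afterwards the merged checkpoint loads with plain `AutoModelForCausalLM.from_pretrained`.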

### Generate Sindhi text

```python
# Prompt: "the people of Sindh"
inputs = tokenizer("سنڌ جي ماڻهو", return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens     = 200,
    temperature        = 0.8,
    do_sample          = True,
    repetition_penalty = 1.1,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

- Dataset: ~164M Sindhi tokens from multiple sources
- Tokenizer: original Qwen3 tokenizer (no vocabulary modifications)
- Hardware: NVIDIA A100 40GB
- Framework: Unsloth for memory-efficient training
- Optimizer: AdamW 8-bit
- Learning Rate: 5e-5 with cosine scheduler
- Final Loss: ~1.20
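
The card does not state the batch size, but given ~164M tokens at sequence length 2048, the optimizer step count follows directly from the effective (global) batch size. A quick back-of-the-envelope helper; the batch size below is a hypothetical example, not the actual training config:

```python
import math

def cpt_steps(total_tokens: int, seq_len: int, global_batch: int) -> int:
    """Optimizer steps for one pass over the data:
    each step consumes seq_len * global_batch tokens."""
    return math.ceil(total_tokens / (seq_len * global_batch))

# e.g. 164M tokens, 2048-token sequences, hypothetical global batch of 32
print(cpt_steps(164_000_000, 2048, 32))  # 2503 steps per epoch
```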

## Intended Use

- Sindhi text generation
- Synthetic data generation for low-resource Sindhi NLP
- Base for further fine-tuning on Sindhi tasks (NER, QA, summarization)
- Pre-training data augmentation for encoder models such as SindhiBERT
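
For the synthetic-data use case, one common pattern is to sample several continuations per seed prompt. A sketch continuing from the `model`/`tokenizer` loaded above; the seed prompts ("history of Sindh", "Sindhi literature") and sampling settings are illustrative:

```python
seed_prompts = ["سنڌ جي تاريخ", "سنڌي ادب"]  # illustrative seed topics

synthetic = []
for prompt in seed_prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        num_return_sequences=4,  # several samples per seed prompt
    )
    synthetic += [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```

Since this is a CPT model rather than an instruction-tuned one, seed prompts work best as plain text openings to be continued, not as questions or instructions.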

## Limitations

- This is a continued pre-training adapter, not an instruction-tuned model; it will not reliably follow chat-style instructions
- Outputs may not be factually accurate; the model is intended for learning linguistic patterns
- Best used as a base for task-specific fine-tuning