Model Loading

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "iCIIT/mmlu-fine-tuned-model-ris-sinhala-qwen2.5-1.5b-ft"

# Define 4-bit quantization config
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",  
    bnb_4bit_compute_dtype="float16"  # Use "bfloat16" if your GPU supports it
)

# Load tokenizer and quantized model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto"
)
model.eval()
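
A minimal inference sketch follows, assuming the quantized model and tokenizer loaded above; the prompt text and generation settings (max_new_tokens, greedy decoding) are illustrative placeholders, not part of the original card.

import torch

# Illustrative prompt; replace with your own input (e.g. a Sinhala MMLU-style question)
prompt = "Explain what 4-bit NF4 quantization does to a language model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding with a modest new-token budget
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=False,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))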