# GemmaBoolQ-270M-Finetuned
This model is a fine-tuned version of `google/gemma-3-270m` on the BoolQ dataset.
It achieves 63.98% accuracy on the validation set, a significant improvement over the baseline accuracy of 37.83%.
## Performance
| Metric | Baseline | This Model | Improvement |
|---|---|---|---|
| Accuracy | 37.83% | 63.98% | +26.15 pts |
## Usage
### Installation
```bash
pip install transformers peft bitsandbytes accelerate
```
### Inference Pipeline
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 1. Load the base model in 4-bit (NF4 quantization)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m",
    quantization_config=bnb_config,
    device_map="auto",
)

# 2. Load the fine-tuned LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "ViswanthSai/GemmaBoolQ-270M-Finetuned")
model.eval()
tokenizer = AutoTokenizer.from_pretrained("ViswanthSai/GemmaBoolQ-270M-Finetuned")

# 3. Define a helper for yes/no classification
def classify(question):
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Token ids for " yes" / " no", used to check the generated token below
    yes_token = tokenizer.encode(" yes", add_special_tokens=False)[0]
    no_token = tokenizer.encode(" no", add_special_tokens=False)[0]

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=1,
            do_sample=False,
        )

    # Simple check on the single generated token
    # (in production, use constrained decoding -- see the sketch below)
    token_id = outputs[0, -1].item()
    if token_id == yes_token:
        return "yes"
    if token_id == no_token:
        return "no"
    return "unknown"

# 4. Run
print(classify("is the sky blue?"))
# Output: yes
```
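The greedy `generate` call above can in principle emit a token other than "yes" or "no". A more robust option, hinted at in the comments, is to constrain the decision to the two candidate tokens. Below is a minimal sketch under the same model/tokenizer setup; the helper name `classify_constrained` is illustrative, not part of the released code:

```python
def classify_constrained(question):
    # Score only the two candidate answer tokens from a single forward
    # pass, instead of free-running generation.
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    yes_token = tokenizer.encode(" yes", add_special_tokens=False)[0]
    no_token = tokenizer.encode(" no", add_special_tokens=False)[0]

    with torch.no_grad():
        # Logits for the token that would follow the prompt
        logits = model(**inputs).logits[0, -1]

    # Pick whichever allowed token scores higher
    return "yes" if logits[yes_token] > logits[no_token] else "no"
```

This guarantees a valid yes/no answer and removes the `"unknown"` fallback path entirely.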
## Training Details
- Method: QLoRA (4-bit quantization + LoRA); a setup sketch follows this list
- Base Model: google/gemma-3-270m
- Dataset: BoolQ
- Epochs: 3
- Learning Rate: 2e-4
- Optimizer: paged_adamw_8bit
- Hardware: Trained on single NVIDIA RTX 3050 (4GB VRAM)
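For reference, here is a minimal sketch of a QLoRA training setup consistent with the hyperparameters above. The `LoraConfig` values (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) and the batch-size settings are assumptions for illustration, not the exact values used in the actual run:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, matching the inference example above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m",
    quantization_config=bnb_config,
    device_map="auto",
)

# Casts norm layers to fp32 and prepares the quantized model for training
# (the "FP32 Casting" fix described below)
model = prepare_model_for_kbit_training(model)

# LoRA adapter -- r / alpha / dropout / target modules are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Matches the card: 3 epochs, lr 2e-4, paged 8-bit AdamW
training_args = TrainingArguments(
    output_dir="gemma-boolq-qlora",
    num_train_epochs=3,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=4,   # assumption, sized for 4GB VRAM
    gradient_accumulation_steps=4,   # assumption
    fp16=True,
)
```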
### Critical Training Fixes
To achieve this performance, the following techniques were used:
- Label Masking: Computing the loss only on the answer tokens, masking the instruction tokens with `-100` (see the sketch after this list).
- FP32 Casting: Using `prepare_model_for_kbit_training` to cast norm layers to fp32 and prevent NaNs during training.
- Constrained Generation: Forcing the output to the valid yes/no tokens.
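A minimal sketch of the label-masking step: the prompt and answer are tokenized separately so the prompt positions can be set to `-100`, which Hugging Face's causal-LM loss ignores. The function name and prompt template here mirror the inference code; the exact preprocessing used in training is not published, so treat this as illustrative:

```python
def build_example(question, answer, tokenizer, max_length=256):
    # Tokenize prompt and answer separately so we know where the answer starts
    prompt_ids = tokenizer(f"Question: {question}\nAnswer:",
                           add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(f" {answer}", add_special_tokens=False)["input_ids"]

    input_ids = prompt_ids + answer_ids + [tokenizer.eos_token_id]

    # Loss is computed only on the answer (and EOS): prompt positions get -100,
    # which the cross-entropy loss skips
    labels = [-100] * len(prompt_ids) + answer_ids + [tokenizer.eos_token_id]

    return {"input_ids": input_ids[:max_length], "labels": labels[:max_length]}
```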
## License
Apache 2.0