GemmaBoolQ-270M-Finetuned

This model is a fine-tuned version of google/gemma-3-270m on the BoolQ dataset.

It achieves 63.98% accuracy on the BoolQ validation set, a 26.15-point improvement over the 37.83% accuracy of the untuned base model.

πŸ† Performance

| Metric   | Baseline | This Model | Improvement |
|----------|----------|------------|-------------|
| Accuracy | 37.83%   | 63.98%     | +26.15%     |

🚀 Usage

Installation

pip install transformers peft bitsandbytes accelerate

Inference Pipeline

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 1. Load Base Model (Quantized)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m",
    quantization_config=bnb_config,
    device_map="auto",
)

# 2. Load Fine-tuned Adapter
model = PeftModel.from_pretrained(base_model, "ViswanthSai/GemmaBoolQ-270M-Finetuned")
model.eval()

tokenizer = AutoTokenizer.from_pretrained("ViswanthSai/GemmaBoolQ-270M-Finetuned")

# 3. Define Helper for Yes/No Classification
def classify(question):
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    # Token IDs for the single-token answers " yes" / " no"
    yes_token = tokenizer.encode(" yes", add_special_tokens=False)[0]
    no_token = tokenizer.encode(" no", add_special_tokens=False)[0]
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs, 
            max_new_tokens=1, 
            do_sample=False
        )
    
    # Simple check (in production use constrained decoding)
    token_id = outputs[0, -1].item()
    if token_id == yes_token: return "yes"
    if token_id == no_token: return "no"
    return "unknown"

# 4. Run
print(classify("is the sky blue?"))
# Output: yes
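As the comment in `classify` notes, free generation can drift off the "yes"/"no" vocabulary. A simple form of constrained decoding is to score only the two answer tokens from the next-token logits and pick the higher one. The sketch below assumes the `model`/`tokenizer` from the snippet above; `pick_label` and `classify_constrained` are illustrative helpers, not part of this repo:

```python
import torch

def pick_label(next_token_logits, yes_id, no_id):
    """Return whichever of the two answer tokens scores higher."""
    return "yes" if next_token_logits[yes_id] > next_token_logits[no_id] else "no"

def classify_constrained(model, tokenizer, question):
    """Constrained variant of classify(): can never return 'unknown'."""
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    yes_id = tokenizer.encode(" yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode(" no", add_special_tokens=False)[0]
    with torch.no_grad():
        # Logits for the position following the prompt (the answer token)
        logits = model(**inputs).logits[0, -1]
    return pick_label(logits, yes_id, no_id)
```

Because it compares logits directly instead of sampling a token, this variant always returns a valid label, which is how the 63.98% accuracy figure was computed per the "Constrained Generation" note below.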

πŸ› οΈ Training Details

  • Method: QLoRA (4-bit quantization + LoRA)
  • Base Model: google/gemma-3-270m
  • Dataset: BoolQ
  • Epochs: 3
  • Learning Rate: 2e-4
  • Optimizer: paged_adamw_8bit
  • Hardware: Trained on single NVIDIA RTX 3050 (4GB VRAM)
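The QLoRA setup above might look like the following sketch using `peft`. The LoRA rank, alpha, dropout, and target modules are illustrative assumptions — this card does not list them — and `base_model` is the 4-bit quantized model from the usage section:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Hypothetical LoRA hyperparameters -- not stated on this card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Casts norm layers to FP32 and enables gradient checkpointing,
# stabilizing 4-bit training (see "Critical Training Fixes" below).
model = prepare_model_for_kbit_training(base_model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```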

Critical Training Fixes

To achieve this performance, the following techniques were used:

  1. Label Masking: Only training on the answer tokens (masking instruction with -100).
  2. FP32 Casting: Using prepare_model_for_kbit_training to cast norm layers to FP32 and prevent NaN losses during 4-bit training.
  3. Constrained Generation: Forcing output to valid yes/no tokens.
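The label-masking fix (point 1) can be sketched in plain Python. `build_labels` is an illustrative helper, not code from this repo; `-100` is the default `ignore_index` of PyTorch's cross-entropy loss:

```python
def build_labels(prompt_ids, answer_ids, ignore_index=-100):
    """Concatenate prompt and answer token IDs, masking the prompt
    positions with ignore_index so the cross-entropy loss is computed
    only on the answer tokens."""
    input_ids = list(prompt_ids) + list(answer_ids)
    labels = [ignore_index] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels
```

Without this masking, the model spends most of its loss budget re-predicting the question text instead of learning the yes/no decision.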

📜 License

Apache 2.0
