# GemmaBoolQ-270M-Finetuned
This model is a fine-tuned version of `google/gemma-3-270m` on the BoolQ dataset.
It achieves 63.98% accuracy on the validation set, a significant improvement over the baseline accuracy of 37.83%.
## Performance
| Metric | Baseline | This Model | Improvement |
|---|---|---|---|
| Accuracy | 37.83% | 63.98% | +26.15 pts |
## Usage
### Installation
```bash
pip install transformers peft bitsandbytes accelerate
```
### Inference Pipeline
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 1. Load the base model in 4-bit (NF4 quantization)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m",
    quantization_config=bnb_config,
    device_map="auto",
)

# 2. Load the fine-tuned LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "ViswanthSai/GemmaBoolQ-270M-Finetuned")
model.eval()
tokenizer = AutoTokenizer.from_pretrained("ViswanthSai/GemmaBoolQ-270M-Finetuned")

# 3. Define a helper for yes/no classification
def classify(question):
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Token ids for " yes" / " no", used to check the generated token below
    yes_token = tokenizer.encode(" yes", add_special_tokens=False)[0]
    no_token = tokenizer.encode(" no", add_special_tokens=False)[0]

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=1,
            do_sample=False,
        )

    # Simple check on the single generated token
    # (in production, use constrained decoding -- see the sketch below)
    token_id = outputs[0, -1].item()
    if token_id == yes_token:
        return "yes"
    if token_id == no_token:
        return "no"
    return "unknown"

# 4. Run
print(classify("is the sky blue?"))
# Output: yes
```
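The greedy `generate` call above can in principle emit a token other than "yes" or "no". A more robust option, hinted at in the comments, is to constrain the decision to the two candidate tokens. Below is a minimal sketch under the same model/tokenizer setup; the helper name `classify_constrained` is illustrative, not part of the released code:

```python
def classify_constrained(question):
    # Score only the two candidate answer tokens from a single forward
    # pass, instead of free-running generation.
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    yes_token = tokenizer.encode(" yes", add_special_tokens=False)[0]
    no_token = tokenizer.encode(" no", add_special_tokens=False)[0]

    with torch.no_grad():
        # Logits for the token that would follow the prompt
        logits = model(**inputs).logits[0, -1]

    # Pick whichever allowed token scores higher
    return "yes" if logits[yes_token] > logits[no_token] else "no"
```

This guarantees a valid yes/no answer and removes the `"unknown"` fallback path entirely.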
## Training Details
- Method: QLoRA (4-bit quantization + LoRA); a setup sketch follows this list
- Base Model: google/gemma-3-270m
- Dataset: BoolQ
- Epochs: 3
- Learning Rate: 2e-4
- Optimizer: paged_adamw_8bit
- Hardware: Trained on single NVIDIA RTX 3050 (4GB VRAM)
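For reference, here is a minimal sketch of a QLoRA training setup consistent with the hyperparameters above. The `LoraConfig` values (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) and the batch-size settings are assumptions for illustration, not the exact values used in the actual run:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, matching the inference example above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m",
    quantization_config=bnb_config,
    device_map="auto",
)

# Casts norm layers to fp32 and prepares the quantized model for training
# (the "FP32 Casting" fix described below)
model = prepare_model_for_kbit_training(model)

# LoRA adapter -- r / alpha / dropout / target modules are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Matches the card: 3 epochs, lr 2e-4, paged 8-bit AdamW
training_args = TrainingArguments(
    output_dir="gemma-boolq-qlora",
    num_train_epochs=3,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=4,   # assumption, sized for 4GB VRAM
    gradient_accumulation_steps=4,   # assumption
    fp16=True,
)
```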
### Critical Training Fixes
To achieve this performance, the following techniques were used:
- Label Masking: Computing the loss only on the answer tokens, masking the instruction tokens with `-100` (see the sketch after this list).
- FP32 Casting: Using `prepare_model_for_kbit_training` to cast norm layers to fp32 and prevent NaNs during training.
- Constrained Generation: Forcing the output to the valid yes/no tokens.
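A minimal sketch of the label-masking step: the prompt and answer are tokenized separately so the prompt positions can be set to `-100`, which Hugging Face's causal-LM loss ignores. The function name and prompt template here mirror the inference code; the exact preprocessing used in training is not published, so treat this as illustrative:

```python
def build_example(question, answer, tokenizer, max_length=256):
    # Tokenize prompt and answer separately so we know where the answer starts
    prompt_ids = tokenizer(f"Question: {question}\nAnswer:",
                           add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(f" {answer}", add_special_tokens=False)["input_ids"]

    input_ids = prompt_ids + answer_ids + [tokenizer.eos_token_id]

    # Loss is computed only on the answer (and EOS): prompt positions get -100,
    # which the cross-entropy loss skips
    labels = [-100] * len(prompt_ids) + answer_ids + [tokenizer.eos_token_id]

    return {"input_ids": input_ids[:max_length], "labels": labels[:max_length]}
```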
## License
Apache 2.0