Gemma-2B Dolly QLoRA (LoRA Adapter)

This repository contains LoRA/QLoRA adapter weights fine-tuned from google/gemma-2b on a subset of Databricks Dolly 15k.
It is not a full model checkpoint — you load it on top of the base Gemma-2B model.

Important (Gemma is gated / restricted): You must accept Google’s Gemma terms to access the base model on Hugging Face, and downstream use/distribution must comply with those terms.

Model Details

  • Adapter type: LoRA (PEFT)
  • Base model: google/gemma-2b (gated)
  • Training method: QLoRA (4-bit NF4) + LoRA adapters
  • Language: English
  • Files in this repo: adapter_model.safetensors, adapter_config.json, tokenizer files, run_metadata.json

Intended Uses

Direct use

  • Lightweight instruction-following adaptation for general assistant-style tasks (email writing, summarization, extraction, checklists).

Training Data

  • Dataset: databricks/databricks-dolly-15k
  • Filtering: only rows with empty context (no-context subset)
  • Split: train/eval split with test_size=0.05 (5% held out for evaluation)
  • Subsampled for Colab-friendly run: max_train_samples=2000, max_eval_samples=200
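The no-context filter and subsampling can be illustrated without downloading the dataset. The toy rows below only mimic the Dolly schema (`instruction`, `context`, `response` fields); the actual run used `databricks/databricks-dolly-15k`:

```python
import random

# Toy rows mimicking the databricks-dolly-15k schema
rows = [
    {"instruction": "Summarize X.", "context": "", "response": "..."},
    {"instruction": "Answer from passage.", "context": "Some passage.", "response": "..."},
    {"instruction": "Write an email.", "context": "", "response": "..."},
]

# Keep only rows with empty context (the "no-context subset")
no_context = [r for r in rows if not r["context"]]

# Shuffle with a fixed seed and cap at max_train_samples, as in the run above
random.seed(42)
random.shuffle(no_context)
max_train_samples = 2
subset = no_context[:max_train_samples]
```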

Training Procedure

Prompt format

Each example is formatted as:

Instruction:
{instruction}

Response:
{response}

An EOS token is appended during preprocessing.
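The template above can be sketched as a small helper. `format_example` is a hypothetical name, and `"<eos>"` stands in for the tokenizer's actual EOS token:

```python
def format_example(instruction: str, response: str, eos_token: str) -> str:
    # Mirrors the prompt template above; the EOS token is appended
    # so the model learns where a response should stop.
    return f"Instruction:\n{instruction}\n\nResponse:\n{response}{eos_token}"

text = format_example("List three colors.", "Red, green, blue.", "<eos>")
```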

Quantization (QLoRA)

  • 4-bit quantization: NF4
  • Double quantization: enabled
  • Compute dtype: bf16 if supported, else fp16

LoRA configuration

LoRA is applied to common attention + MLP projection modules:

  • q_proj, k_proj, v_proj, o_proj
  • gate_proj, up_proj, down_proj
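A `LoraConfig` consistent with this module list and the r/alpha/dropout values listed under Hyperparameters might look like the following. This is a sketch, not the exact training script:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)
```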

Hyperparameters

From run_metadata.json:

  • max_seq_length: 512
  • num_train_epochs: 1.0
  • per_device_train_batch_size: 1
  • gradient_accumulation_steps: 4
  • learning_rate: 2e-4
  • warmup_steps: 10
  • logging_steps: 10
  • save_steps: 100
  • LoRA r: 8
  • LoRA alpha: 16
  • LoRA dropout: 0.05
  • optimizer: paged_adamw_8bit
  • seed: 42
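Under these settings, the effective batch size and optimizer steps per epoch follow directly from the sample cap:

```python
max_train_samples = 2000
per_device_train_batch_size = 1
gradient_accumulation_steps = 4

# Gradients accumulate over 4 micro-batches before each optimizer step
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = max_train_samples // effective_batch_size  # 500 optimizer steps
```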

Evaluation

Quantitative

From run_metadata.json:

  • Train loss: 1.8468
  • Eval loss: 1.8447
  • Perplexity: 6.3264
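The reported perplexity is consistent with exponentiating the eval loss:

```python
import math

eval_loss = 1.8447
perplexity = math.exp(eval_loss)  # ≈ 6.3264, matching the reported value
```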

Qualitative (prompt suite)

A small “before vs after” prompt suite is stored in:

  • run_metadata.json (baseline_outputs and after_outputs)

In general, the adapter improves:

  • Instruction adherence and cleaner formatting for short assistant tasks (emails, lists, extraction).

Known limitation observed in the same suite:

  • May regress on coding-style prompts (always validate code outputs).

How to Use

Load base model + adapter

import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL_ID = "google/gemma-2b"
ADAPTER_ID = "ash001/gemma-2b-dolly-qlora-adapter"
HF_TOKEN = os.environ.get("HF_TOKEN")  # required for gated Gemma

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    # bf16 if the GPU supports it, else fp16 (matches the training setup above)
    bnb_4bit_compute_dtype=(
        torch.bfloat16
        if torch.cuda.is_available() and torch.cuda.is_bf16_supported()
        else torch.float16
    ),
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, token=HF_TOKEN)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config if torch.cuda.is_available() else None,  # 4-bit on GPU only
    device_map="auto" if torch.cuda.is_available() else None,
    token=HF_TOKEN,
)

# Attach the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

prompt = "Instruction:\nExtract action items from: 'Finalize the agenda, book the room, share notes by Friday.'\n\nResponse:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,  # temperature/top_p are ignored without sampling
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))

License / Terms

  • Base model: google/gemma-2b is gated and subject to Google’s Gemma Terms of Use.
  • Dataset: Databricks Dolly 15k is licensed under CC BY-SA 3.0 (see dataset card).
  • Adapter weights: released as a derivative intended for use with the base model, and therefore subject to the Gemma Terms of Use.
