---
base_model: google/gemma-3-270m
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:google/gemma-3-270m
- lora
- transformers
license: apache-2.0
datasets:
- bio-nlp-umass/bioinstruct
language:
- en
---

# Model Card: Gemma-3-270M BioInstruct LoRA (POC)

## Model Details

### Model Description

This model is a **proof-of-concept fine-tune** of **Gemma-3 270M** on biomedical instruction data. It was fine-tuned on the [bio-nlp-umass/bioinstruct](https://huggingface.co/datasets/bio-nlp-umass/bioinstruct) dataset, reformatted into a chat-like structure (`Instruction` / `Input` / `Answer`) to elicit instruction-following behavior.

* **Developed by:** Kunj Shah
* **Model type:** Decoder-only causal LM (LoRA fine-tuned)
* **Language(s):** English (biomedical domain)
* **License:** Apache-2.0 (adapter weights; the base Gemma-3 model is distributed under its own terms)
* **Base model:** `google/gemma-3-270m`
* **Fine-tuning method:** Parameter-efficient LoRA adapters (attention + MLP projections)
* **Status:** Minimal proof of concept (not production-ready)

### Model Sources

* **Repository:** (fill with your HF repo link once pushed)
* **Demo / Endpoint:** Served via [vLLM](https://github.com/vllm-project/vllm) for efficient inference

---

## Uses

### Direct Use

* Biomedical text simplification
* Summarization of clinical notes into lay terms
* Identifying medications or clinical entities
* General instruction following on medical prompts

### Downstream Use

* Further fine-tuning on specialized biomedical tasks (NER, relation extraction, QA)
* Integration into biomedical RAG (Retrieval-Augmented Generation) systems

### Out-of-Scope Use

* Production clinical decision support
* Any diagnostic or therapeutic use without human oversight
* General-domain tasks outside biomedical text (not aligned for non-medical use)

---

## Bias, Risks, and Limitations

* **Domain bias:** Trained only on biomedical instructions; may hallucinate outside the domain.
* **Not reliable for clinical care:** Outputs must not be used for patient-facing decisions.
* **Small model size (270M):** Limited reasoning and factual accuracy compared to larger LMs.

### Recommendations

Use strictly for **research and experimentation**. Do **not** deploy in production medical settings. Pair with RAG or external validation for any downstream pipeline.

---

## How to Get Started

### Inference with Transformers + PEFT

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

model_name = "google/gemma-3-270m"
adapter_dir = "kunj/gemma3-270m-bioinstruct-lora"  # replace with your HF repo

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the frozen base model, then attach the LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_dir)
model.eval()

prompt = (
    "Instruction: Summarize this clinical note.\n"
    "Input: Patient with hypertension and diabetes admitted with dyspnea. "
    "Echocardiogram shows EF 30%.\n"
    "Answer: "
)
enc = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**enc, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True))
```

### Inference with vLLM

vLLM loads LoRA adapters alongside a base model rather than as standalone checkpoints, so serve the base model and register the adapter as a LoRA module:

```bash
vllm serve google/gemma-3-270m \
  --enable-lora \
  --lora-modules bioinstruct-lora=kunj/gemma3-270m-bioinstruct-lora \
  --dtype bfloat16 \
  --max-model-len 2048
```

Then query the endpoint with your biomedical instruction.
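For example, a minimal query sketch against vLLM's OpenAI-compatible server, assuming the default port 8000 and the `bioinstruct-lora` module name registered above (the prompt text is illustrative):

```python
import requests

# Hypothetical local vLLM server started with the command above.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "bioinstruct-lora",  # name registered via --lora-modules
        "prompt": "Instruction: List the medications mentioned.\n"
                  "Input: Started on lisinopril 10 mg daily and metformin 500 mg BID.\n"
                  "Answer: ",
        "max_tokens": 128,
        "temperature": 0.2,
    },
)
print(resp.json()["choices"][0]["text"])
```

The adapter is addressed by its `--lora-modules` name; a request naming `google/gemma-3-270m` instead would hit the base model without the adapter.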
---

## Training Details

### Training Data

* **Dataset:** [bio-nlp-umass/bioinstruct](https://huggingface.co/datasets/bio-nlp-umass/bioinstruct)
* **Preprocessing:** Reformatted into a chat-style layout with explicit `Instruction`, `Input`, and `Answer` fields.

### Training Procedure

* **Method:** LoRA fine-tuning (attention + MLP projections)
* **Sequence length:** 2048 (with packing)
* **Batching:** 16 per device × 8 gradient-accumulation steps = effective 128 sequences/step
* **Epochs:** 3
* **Optimizer:** AdamW (fused), cosine LR schedule
* **Learning rate:** 5e-5
* **Precision:** bf16 mixed precision on A100
* **Gradient checkpointing:** Enabled
* **Attention implementation:** FlashAttention-2

---

## Evaluation

This POC was not benchmarked on standard biomedical leaderboards. It was sanity-checked on held-out examples from *bioinstruct*: it produces coherent simplifications and medication extraction, but exhibits the hallucinations typical of small LMs.

---

## Environmental Impact

* **Hardware:** NVIDIA A100 40GB
* **Sequence length:** 2048
* **Training epochs:** 3

---

## Technical Specifications

* **Architecture:** Gemma-3 (decoder-only transformer, 270M parameters)
* **Objective:** Causal LM loss with masked labels (prompt tokens ignored, answer span supervised); see the sketch in the appendix below
* **Compute infrastructure:** Single A100 GPU, Hugging Face Transformers + PEFT + FlashAttention-2

---

## Model Card Contact

* **Author:** Kunj Shah
* **Contact:** [Portfolio](https://kunjcr2.github.io)
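---

## Appendix: Label-Masking Sketch

For reference, a minimal sketch of the preprocessing and loss masking described under Training Details and Technical Specifications. The `instruction` / `input` / `output` field names are an assumption about the bioinstruct schema, not confirmed by this card. Only the `Answer` span is supervised; prompt tokens receive the label `-100`, which the cross-entropy loss ignores.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m")

def build_example(record, max_len=2048):
    # Assumed bioinstruct field names: instruction / input / output.
    prompt = (
        f"Instruction: {record['instruction']}\n"
        f"Input: {record['input']}\n"
        "Answer: "
    )
    answer = record["output"] + tokenizer.eos_token

    prompt_ids = tokenizer(prompt, add_special_tokens=True)["input_ids"]
    answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + answer_ids)[:max_len]
    # Supervise only the answer: -100 marks positions excluded from the loss.
    labels = ([-100] * len(prompt_ids) + answer_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}
```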