---
base_model: google/gemma-3-270m
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:google/gemma-3-270m
- lora
- transformers
license: apache-2.0
datasets:
- bio-nlp-umass/bioinstruct
language:
- en
---

# Model Card: Gemma-3-270M BioInstruct LoRA (POC)

## Model Details

### Model Description

This model is a **proof-of-concept fine-tune** of **Gemma-3 270M** on biomedical instruction data. It was fine-tuned on the [bio-nlp-umass/bioinstruct](https://huggingface.co/datasets/bio-nlp-umass/bioinstruct) dataset, reformatted into a chat-like structure (`Instruction` / `Input` / `Answer`) to elicit instruction-following behavior.

* **Developed by:** Kunj Shah
* **Model type:** Decoder-only causal LM (LoRA fine-tuned)
* **Language(s):** English (biomedical domain)
* **License:** Apache-2.0 (adapter weights; the base Gemma-3 model is distributed under its own terms)
* **Base model:** `google/gemma-3-270m`
* **Fine-tuning method:** Parameter-efficient LoRA adapters (attention + MLP projections)
* **Status:** Minimal proof of concept (not production-ready)

### Model Sources

* **Repository:** (fill with your HF repo link once pushed)
* **Demo / Endpoint:** Served via [vLLM](https://github.com/vllm-project/vllm) for efficient inference

---

## Uses

### Direct Use

* Biomedical text simplification
* Summarization of clinical notes into lay terms
* Identifying medications or clinical entities
* General instruction following on medical prompts

### Downstream Use

* Further fine-tuning on specialized biomedical tasks (NER, relation extraction, QA)
* Integration into biomedical RAG (Retrieval-Augmented Generation) systems

### Out-of-Scope Use

* Production clinical decision support
* Any diagnostic or therapeutic use without human oversight
* General-domain tasks outside biomedical text (not aligned for non-medical use)

---

## Bias, Risks, and Limitations

* **Domain bias:** Trained only on biomedical instructions; may hallucinate outside the domain.
* **Not reliable for clinical care:** Outputs must not be used for patient-facing decisions.
* **Small model size (270M):** Limited reasoning and factual accuracy compared to larger LMs.

### Recommendations

Use strictly for **research and experimentation**. Do **not** deploy in production medical settings. Pair with RAG or external validation for any downstream pipeline.

---

## How to Get Started

### Inference with Transformers + PEFT

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

model_name = "google/gemma-3-270m"
adapter_dir = "kunj/gemma3-270m-bioinstruct-lora"  # replace with your HF repo

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the frozen base model, then attach the LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_dir)
model.eval()

prompt = (
    "Instruction: Summarize this clinical note.\n"
    "Input: Patient with hypertension and diabetes admitted with dyspnea. "
    "Echocardiogram shows EF 30%.\n"
    "Answer: "
)
enc = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**enc, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True))
```

### Inference with vLLM

vLLM loads LoRA adapters alongside a base model rather than as standalone checkpoints, so serve the base model and register the adapter as a LoRA module:

```bash
vllm serve google/gemma-3-270m \
  --enable-lora \
  --lora-modules bioinstruct-lora=kunj/gemma3-270m-bioinstruct-lora \
  --dtype bfloat16 \
  --max-model-len 2048
```

Then query the endpoint with your biomedical instruction.
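For example, a minimal query sketch against vLLM's OpenAI-compatible server, assuming the default port 8000 and the `bioinstruct-lora` module name registered above (the prompt text is illustrative):

```python
import requests

# Hypothetical local vLLM server started with the command above.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "bioinstruct-lora",  # name registered via --lora-modules
        "prompt": "Instruction: List the medications mentioned.\n"
                  "Input: Started on lisinopril 10 mg daily and metformin 500 mg BID.\n"
                  "Answer: ",
        "max_tokens": 128,
        "temperature": 0.2,
    },
)
print(resp.json()["choices"][0]["text"])
```

The adapter is addressed by its `--lora-modules` name; a request naming `google/gemma-3-270m` instead would hit the base model without the adapter.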
---

## Training Details

### Training Data

* **Dataset:** [bio-nlp-umass/bioinstruct](https://huggingface.co/datasets/bio-nlp-umass/bioinstruct)
* **Preprocessing:** Reformatted into a chat-style layout with explicit `Instruction`, `Input`, and `Answer` fields.

### Training Procedure

* **Method:** LoRA fine-tuning (attention + MLP projections)
* **Sequence length:** 2048 (with packing)
* **Batching:** 16 per device × 8 gradient-accumulation steps = effective 128 sequences/step
* **Epochs:** 3
* **Optimizer:** AdamW (fused), cosine LR schedule
* **Learning rate:** 5e-5
* **Precision:** bf16 mixed precision on A100
* **Gradient checkpointing:** Enabled
* **Attention implementation:** FlashAttention-2

---

## Evaluation

This POC was not benchmarked on standard biomedical leaderboards. It was sanity-checked on held-out examples from *bioinstruct*: it produces coherent simplifications and medication extraction, but exhibits the hallucinations typical of small LMs.

---

## Environmental Impact

* **Hardware:** NVIDIA A100 40GB
* **Sequence length:** 2048
* **Training epochs:** 3

---

## Technical Specifications

* **Architecture:** Gemma-3 (decoder-only transformer, 270M parameters)
* **Objective:** Causal LM loss with masked labels (prompt tokens ignored, answer span supervised); see the sketch in the appendix below
* **Compute infrastructure:** Single A100 GPU, Hugging Face Transformers + PEFT + FlashAttention-2

---

## Model Card Contact

* **Author:** Kunj Shah
* **Contact:** [Portfolio](https://kunjcr2.github.io)
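---

## Appendix: Label-Masking Sketch

For reference, a minimal sketch of the preprocessing and loss masking described under Training Details and Technical Specifications. The `instruction` / `input` / `output` field names are an assumption about the bioinstruct schema, not confirmed by this card. Only the `Answer` span is supervised; prompt tokens receive the label `-100`, which the cross-entropy loss ignores.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m")

def build_example(record, max_len=2048):
    # Assumed bioinstruct field names: instruction / input / output.
    prompt = (
        f"Instruction: {record['instruction']}\n"
        f"Input: {record['input']}\n"
        "Answer: "
    )
    answer = record["output"] + tokenizer.eos_token

    prompt_ids = tokenizer(prompt, add_special_tokens=True)["input_ids"]
    answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + answer_ids)[:max_len]
    # Supervise only the answer: -100 marks positions excluded from the loss.
    labels = ([-100] * len(prompt_ids) + answer_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}
```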