---
license: mit
---

### 📘 Model Overview

This model is a **LoRA fine-tuned version** of Microsoft’s [BioGPT](https://huggingface.co/microsoft/biogpt), specialized for **instruction-style question answering and reasoning** in the **biomedical and healthcare domain**.

It was trained using **2,000 medical instruction–response pairs** to enhance BioGPT’s ability to:

* Follow instructions,
* Generate medically coherent explanations,
* Answer clinical or biomedical reasoning questions in natural language.
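
To give a sense of why LoRA keeps fine-tuning cheap, the sketch below counts the extra trainable parameters it would add to BioGPT's attention projections. It assumes BioGPT's published configuration (hidden size 1024, 24 decoder layers). The rank `r = 8` and the choice to adapt two projection matrices per layer are illustrative assumptions, not the settings used for this model, which the card does not state.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # LoRA freezes W and learns B (d_out x rank) and A (rank x d_in);
    # the weight update is the low-rank product B @ A.
    return rank * (d_out + d_in)


hidden_size = 1024       # BioGPT hidden size
num_layers = 24          # BioGPT decoder layers
rank = 8                 # ASSUMPTION: illustrative rank, not taken from this card
adapted_per_layer = 2    # ASSUMPTION: e.g. query and value projections

full_matrix = hidden_size * hidden_size
per_matrix = lora_param_count(hidden_size, hidden_size, rank)
total = per_matrix * adapted_per_layer * num_layers

print(per_matrix)                  # 16384
print(total)                       # 786432
print(per_matrix / full_matrix)    # 0.015625
```

Under these assumptions each adapted matrix trains about 1.6% as many parameters as a full update of that matrix would.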

---

### 🧠 Model Details

| Feature                | Description                                                                                                                                           |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Base Model**         | [microsoft/biogpt](https://huggingface.co/microsoft/biogpt)                                                                                           |
| **Fine-tuning Method** | LoRA (Low-Rank Adaptation) using PEFT                                                                                                                 |
| **Dataset Used**       | [FreedomIntelligence/medical-o1-reasoning-SFT](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) (2,000-sample subset)    |
| **Training Objective** | Causal Language Modeling (Instruction → Response)                                                                                                     |
| **Frameworks**         | 🤗 Transformers, PEFT, PyTorch                                                                                                                        |
| **Hardware**           | Trained on a single NVIDIA GPU (e.g., T4 or A100)                                                                                                     |
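
The "Instruction → Response" objective in the table can be pictured as serializing each pair into one string that the model learns to continue. The sketch below is a minimal illustration that reuses the prompt template from the usage example. The helper names, the sample record, and the use of `</s>` as the end-of-sequence marker are assumptions for illustration. The actual preprocessing used for training is not documented here.

```python
PROMPT_TEMPLATE = "### Instruction: {instruction}\n### Response:"


def build_prompt(instruction: str) -> str:
    """Format an instruction the way the usage example does at inference time."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def build_training_text(instruction: str, response: str, eos_token: str = "</s>") -> str:
    """Join prompt and response into one causal-LM training string.

    ASSUMPTION: the response follows the prompt after a single space and the
    sequence ends with an end-of-sequence token so the model learns to stop.
    """
    return f"{build_prompt(instruction)} {response.strip()}{eos_token}"


# Hypothetical record; field names are illustrative, not the dataset's schema.
sample = {
    "instruction": "What does HbA1c measure?",
    "response": "Average blood glucose over roughly the past three months.",
}
print(build_training_text(sample["instruction"], sample["response"]))
```

Keeping the training and inference templates identical matters: a prompt that differs from what the model saw during fine-tuning tends to produce weaker instruction following.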


---

### 💬 Example Usage

```python
import torch
from transformers import BioGptTokenizer, BioGptForCausalLM, set_seed

# Load the fine-tuned model; fall back to CPU (in float32) when no GPU is available
model_name = "CloveAI/clov-bio-0.3b-instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
tokenizer = BioGptTokenizer.from_pretrained(model_name)
model = BioGptForCausalLM.from_pretrained(model_name, torch_dtype=dtype).to(device)

# Function to get a clean model response
def generate_response(instruction):
    # Wrap the instruction in the prompt template
    prompt = f"### Instruction: {instruction}\n### Response:"
    
    # Tokenize and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    
    # Reproducibility
    set_seed(42)
    
    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
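            min_length=100,    # total tokens (prompt included) before EOS is allowed
            max_length=1024,   # total tokens (prompt included); BioGPT's context limit
            temperature=0.5,   # lower = less random, more conservative sampling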
            top_p=0.9,
            do_sample=True,
            eos_token_id=tokenizer.eos_token_id,
        )
    
    # Decode only the newly generated tokens, skipping the prompt
    prompt_length = inputs["input_ids"].shape[1]
    text = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
    # Drop any follow-up instruction the model started to generate
    if "### Instruction:" in text:
        text = text.split("### Instruction:")[0].strip()
    
    return text

# 🧍‍♂️ User Input
print("🧠 BioGPT Instruct — Medical Query Assistant\n")
user_query = input("Enter your medical question or instruction:\n> ")

# Get and display the response
response = generate_response(user_query)
print("\n🧠 Model Response:\n")
print(response)
```
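
Marker-based cleanup of a full decoded sequence (prompt plus generation) can also be isolated as a pure function, which makes it testable without loading the model. The sketch below is one possible version; `clean_response` is a hypothetical helper name. It keeps the text after the first `### Response:` marker and cuts at the next `### Instruction:` marker, so a follow-up turn invented by the model is discarded.

```python
RESPONSE_MARKER = "### Response:"
INSTRUCTION_MARKER = "### Instruction:"


def clean_response(decoded: str) -> str:
    """Extract the first response from decoded text that may contain extra turns."""
    # Keep what follows the first response marker; if the marker is absent,
    # fall back to the whole string.
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    text = tail if found else decoded
    # Cut at the next instruction marker, if the model began a new turn.
    text = text.split(INSTRUCTION_MARKER, 1)[0]
    return text.strip()


raw = (
    "### Instruction: Q1\n### Response: A1\n"
    "### Instruction: Q2\n### Response: A2"
)
print(clean_response(raw))  # A1
```

Splitting on the first response marker is a deliberate choice: taking the text after the last marker would return the answer to a question the model made up.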