---
base_model: microsoft/phi-2
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:microsoft/phi-2
- lora
- transformers
license: cc-by-nc-4.0
datasets:
- Gaykar/DrugData
---
# Model Card for Phi2-drug_data
This model is a LoRA-based fine-tuned variant of Microsoft Phi-2, designed to generate concise, medical-style textual descriptions of drugs.
Given a drug name as input, the model produces a short, single-paragraph description following an instruction-style prompt format.
The training pipeline consists of two stages:
1. **Continued Pretraining (CPT)** on domain-relevant medical and pharmaceutical text to adapt the base model to the language and terminology of the domain.
2. **Supervised Fine-Tuning (SFT)** using structured drug name–description pairs to guide the model toward consistent formatting and domain-specific writing style.
This model is intended **strictly for educational and research purposes** and must not be used for real-world medical, clinical, or decision-making applications.
---
## Model Details
### Model Description
This model is a parameter-efficient fine-tuned version of the Microsoft Phi-2 language model, adapted to generate concise medical drug descriptions from drug names. The training pipeline consists of two stages:
1. **Continued Pretraining (CPT)** to adapt the base model to drug and medical terminology.
2. **Supervised Fine-Tuning (SFT)** using instruction-style input–output pairs.
LoRA adapters were used during fine-tuning to reduce memory usage and training cost while preserving base model knowledge.
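To give a rough sense of why LoRA lowers training cost, the sketch below counts trainable parameters for a single square projection matrix. It assumes Phi-2's hidden size of 2560 and a hypothetical rank of 16; the rank and target modules actually used for this adapter are not stated in this card.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 2560  # Phi-2 hidden dimension
RANK = 16           # hypothetical value, not taken from this card

full = full_param_count(HIDDEN_SIZE, HIDDEN_SIZE)
lora = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

Under these assumptions, the adapter trains about 1.25% of the parameters of the matrix it wraps, while the original weights stay frozen.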
- **Developed by:** Atharva Gaykar
- **Funded by:** Not applicable
- **Shared by:** Atharva Gaykar
- **Model type:** Causal Language Model (LoRA-adapted)
- **Language(s) (NLP):** English
- **License:** CC-BY-NC 4.0
- **Finetuned from model:** microsoft/phi-2
---
## Uses
This model is designed to generate concise medical-style descriptions of drugs given their names.
### Direct Use
- Educational demonstrations of instruction-following language models
- Academic research on medical-domain adaptation
- Experimentation with CPT + SFT pipelines
- Studying hallucination behavior in domain-specific LLMs
The model should only be used in **non-production, educational, or research settings**.
### Out-of-Scope Use
This model is **not designed or validated** for:
- Medical diagnosis or treatment planning
- Clinical decision support systems
- Dosage recommendations or prescribing guidance
- Patient-facing healthcare applications
- Professional medical, pharmaceutical, or regulatory use
- Any real-world deployment where incorrect medical information could cause harm
---
## Bias, Risks, and Limitations
This model was developed **solely for educational purposes** and **must not be used in real-world medical or clinical decision-making**.
### Known Limitations
- May hallucinate incorrect drug indications or mechanisms
- Generated descriptions may be incomplete or outdated
- Does not verify outputs against authoritative medical sources
- Does not understand patient context, dosage, or drug interactions
- Output quality is sensitive to prompt phrasing
### Risks
- Misinterpretation of outputs as medical advice
- Overconfidence in fluent but inaccurate responses
- Potential propagation of misinformation if misused
### Recommendations
- Always verify outputs using trusted medical references
- Use only in controlled, non-production environments
- Clearly disclose limitations in any downstream use
- Avoid deployment in safety-critical or healthcare systems
---
## How to Get Started with the Model
This repository contains **LoRA adapter weights**, not a full model.
Example usage (conceptual):
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Load LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Gaykar/Phi2-drug_data")
model.eval()

# Drug to evaluate
drug_name = "Paracetamol"

# Build evaluation prompt
eval_prompt = (
    "Generate exactly ONE sentence describing the drug.\n"
    "Do not include headings or extra information.\n\n"
    f"Drug Name: {drug_name}\n"
    "Description:"
)

# Tokenize prompt
model_input = tokenizer(
    eval_prompt,
    return_tensors="pt"
).to(model.device)

# Generate output with greedy decoding (do_sample=False, num_beams=1).
# Greedy decoding is critical for this model: in the medical domain, factual
# consistency and determinism matter more than linguistic diversity.
with torch.no_grad():
    output = model.generate(
        **model_input,
        do_sample=False,
        num_beams=1,
        max_new_tokens=120,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
    )

# Remove prompt tokens
prompt_length = model_input["input_ids"].shape[1]
generated_tokens = output[0][prompt_length:]

# Decode generated text only
generated_text = tokenizer.decode(
    generated_tokens,
    skip_special_tokens=True
).strip()

# Enforce single-sentence output
if "." in generated_text:
    generated_text = generated_text.split(".")[0] + "."

print("DRUG NAME:", drug_name)
print("MODEL GENERATED DESCRIPTION:")
print(generated_text)

# Example output:
# DRUG NAME: Paracetamol
# MODEL GENERATED DESCRIPTION:
# Paracetamol (acetaminophen) is a non-narcotic analgesic and antipyretic used to relieve mild to moderate pain and reduce fever.
```
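The truncation step in the example above cuts at the first period, which would also cut inside a decimal such as "0.5 mg". A slightly more tolerant variant, together with a reusable prompt builder, might look like the sketch below. Both helper names are hypothetical and not part of this repository.

```python
import re


def build_prompt(drug_name: str) -> str:
    """Reproduce the instruction-style prompt used in the example above."""
    return (
        "Generate exactly ONE sentence describing the drug.\n"
        "Do not include headings or extra information.\n\n"
        f"Drug Name: {drug_name}\n"
        "Description:"
    )


def first_sentence(text: str) -> str:
    """Keep text up to the first period followed by whitespace or end of string."""
    match = re.search(r"\.(?=\s|$)", text)
    return text[: match.end()] if match else text


print(first_sentence("The usual dose is 0.5 mg daily. Extra text follows."))
```

This still splits on abbreviations such as "e.g. ", so it is a heuristic rather than a real sentence segmenter.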
---
## Training Details
### Training Data
* **Dataset:** Gaykar/DrugData
* Structured drug name–description pairs
* Used for both CPT (domain adaptation) and SFT (instruction following)
### Training Procedure
#### Continued Pretraining (CPT)
The base model was further trained on domain-relevant medical and drug-related text to improve familiarity with terminology and style. CPT focused on next-token prediction without instruction formatting.
#### Supervised Fine-Tuning (SFT)
After CPT, the model was fine-tuned using instruction-style prompts to generate concise medical descriptions from drug names.
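The difference between the two stages can be illustrated by how a single record might be turned into training text. This is a minimal sketch, not the author's preprocessing code: the field names `drug_name` and `description` are assumptions, and the SFT template simply reuses the inference prompt shown earlier in this card, since the exact training template is not documented here.

```python
def format_for_cpt(description: str) -> str:
    """CPT sample: raw domain text with no instruction wrapper."""
    return description.strip()


def format_for_sft(drug_name: str, description: str) -> str:
    """SFT sample: instruction-style prompt followed by the target description."""
    prompt = (
        "Generate exactly ONE sentence describing the drug.\n"
        "Do not include headings or extra information.\n\n"
        f"Drug Name: {drug_name}\n"
        "Description:"
    )
    return f"{prompt} {description.strip()}"


# Hypothetical record; the real dataset schema may differ.
record = {
    "drug_name": "Paracetamol",
    "description": "Paracetamol is an analgesic and antipyretic used to relieve pain and reduce fever.",
}

cpt_text = format_for_cpt(record["description"])
sft_text = format_for_sft(record["drug_name"], record["description"])
print(sft_text)
```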
#### Training Hyperparameters
**CPT Hyperparameters**
| Hyperparameter | Value |
| ----------------------- | ------------------- |
| Batch size (per device) | 1 |
| Effective batch size | 8 |
| Epochs | 4 |
| Learning rate | 2e-4 |
| Precision | FP16 |
| Optimizer | Paged AdamW (8-bit) |
| Logging steps | 10 |
| Checkpoint saving | Every 500 steps |
| Checkpoint limit | 2 |
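The CPT table lists a per-device batch size of 1 and an effective batch size of 8 but no gradient accumulation value. Assuming a single GPU (the card lists one NVIDIA T4), the implied accumulation can be derived as in the sketch below; the helper name is hypothetical.

```python
def implied_grad_accum(effective_batch: int, per_device_batch: int, num_devices: int = 1) -> int:
    """Solve effective = per_device * grad_accum * num_devices for grad_accum."""
    per_step = per_device_batch * num_devices
    if effective_batch % per_step != 0:
        raise ValueError("effective batch size must be a multiple of the per-step batch size")
    return effective_batch // per_step


# Values from the CPT table; num_devices=1 is an assumption.
cpt_accum = implied_grad_accum(effective_batch=8, per_device_batch=1)
print(cpt_accum)
```

Under the single-GPU assumption this gives 8 accumulation steps per optimizer update.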
**SFT Hyperparameters**
| Hyperparameter | Value |
| ----------------------- | ------------------- |
| Batch size (per device) | 4 |
| Gradient accumulation | 1 |
| Effective batch size | 4 |
| Epochs | 5 |
| Learning rate | 2e-5 |
| LR scheduler | Linear |
| Warmup ratio | 6% |
| Weight decay | 1e-4 |
| Max gradient norm | 1.0 |
| Precision | FP16 |
| Optimizer | Paged AdamW (8-bit) |
| Checkpoint saving | Every 50 steps |
| Checkpoint limit | 2 |
| Experiment tracking | Weights & Biases |
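For readers who want to reproduce a similar setup, the SFT table could map onto `transformers.TrainingArguments` keyword names roughly as follows. This is a sketch of the mapping only; the author's training script is not included in this card, so the exact arguments used are an assumption.

```python
# Keyword names follow transformers.TrainingArguments; values come from the SFT table.
sft_training_kwargs = {
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 5,
    "learning_rate": 2e-5,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.06,
    "weight_decay": 1e-4,
    "max_grad_norm": 1.0,
    "fp16": True,
    "optim": "paged_adamw_8bit",  # requires bitsandbytes
    "save_strategy": "steps",
    "save_steps": 50,
    "save_total_limit": 2,
    "report_to": "wandb",
}

# Sanity check against the "Effective batch size" row (single GPU assumed).
effective_batch = (
    sft_training_kwargs["per_device_train_batch_size"]
    * sft_training_kwargs["gradient_accumulation_steps"]
)
print(effective_batch)
```

With `transformers` installed, the dictionary could be passed as `TrainingArguments(output_dir="sft-out", **sft_training_kwargs)`, where `sft-out` is a placeholder path.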
---
## Evaluation
### Testing Data
Drug names sampled from the same dataset were used for evaluation. Outputs were assessed for factual correctness using an external LLM-based evaluation approach.
### Metrics
**Evaluation Method:** LLM-as-a-Judge (ChatGPT with web search enabled)
* Binary classification: Factually Correct / Hallucinated
* Three evaluation batches
### Results
**Batch 1**
| Category | Count | Percentage |
| --------------------- | ----- | ---------- |
| Total Drugs Evaluated | 25 | 100% |
| Factually Correct | 24 | 96% |
| Hallucinated / Failed | 1 | 4% |
**Batch 2**
| Category | Count | Percentage |
| --------------------- | ----- | ---------- |
| Total Drugs Evaluated | 25 | 100% |
| Factually Correct | 22 | 88% |
| Hallucinated / Failed | 3 | 12% |
**Batch 3**
| Category | Count | Percentage |
| --------------------- | ----- | ---------- |
| Total Drugs Evaluated | 22 | 100% |
| Factually Correct | 15 | 68% |
| Hallucinated / Failed | 7 | 32% |
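The per-batch percentages can be recomputed, and pooled across batches, from the "Total" and "Factually Correct" counts in the tables above. The sketch below uses only those counts; the pooled figure is derived here and was not reported by the author.

```python
# Counts copied from the "Total Drugs Evaluated" and "Factually Correct" rows.
batches = [
    {"name": "Batch 1", "total": 25, "correct": 24},
    {"name": "Batch 2", "total": 25, "correct": 22},
    {"name": "Batch 3", "total": 22, "correct": 15},
]


def accuracy(correct: int, total: int) -> float:
    """Fraction of outputs judged factually correct."""
    return correct / total


per_batch = {b["name"]: accuracy(b["correct"], b["total"]) for b in batches}
pooled_correct = sum(b["correct"] for b in batches)
pooled_total = sum(b["total"] for b in batches)
pooled = accuracy(pooled_correct, pooled_total)

for name, acc in per_batch.items():
    print(f"{name}: {acc:.1%}")
print(f"Pooled: {pooled_correct}/{pooled_total} = {pooled:.1%}")
```

With these counts the pooled rate is 61 of 72, or about 84.7%. The sample is small and was judged by a single LLM-based evaluator, so the figure is indicative only.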
#### Summary
Because this model was adapted with LoRA (CPT followed by SFT) rather than full-parameter fine-tuning, eliminating hallucinations entirely is challenging. LoRA enables efficient training and strong instruction-following behavior, but it does not fully overwrite the base model’s internal knowledge. Despite this limitation, the majority of outputs in each evaluation batch were judged factually correct, which is adequate for educational and research-oriented drug description generation tasks.
---
## Environmental Impact
* **Hardware Type:** NVIDIA T4 GPU
* **Hours used:** Not recorded
* **Cloud Provider:** Google Colab
* **Compute Region:** Not specified
* **Carbon Emitted:** Not estimated
---
## Technical Specifications
### Model Architecture and Objective
* Base model: Microsoft Phi-2
* Objective: Instruction-following text generation
* Adaptation method: LoRA (PEFT)
### Compute Infrastructure
#### Hardware
* NVIDIA T4 GPU
#### Software
* Transformers
* PEFT
* PyTorch
---
## Model Card Contact
Atharva Gaykar
### Framework Versions
* PEFT 0.18.0