|
|
--- |
|
|
base_model: microsoft/phi-2 |
|
|
library_name: peft |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- base_model:adapter:microsoft/phi-2 |
|
|
- lora |
|
|
- transformers |
|
|
license: cc-by-nc-4.0 |
|
|
datasets: |
|
|
- Gaykar/DrugData |
|
|
--- |
|
|
|
|
|
# Model Card for Gaykar/phi2-drug-lora
|
|
|
|
|
This model is a LoRA-based fine-tuned variant of Microsoft Phi-2, designed to generate concise, medical-style textual descriptions of drugs. |
|
|
Given a drug name as input, the model produces a short, single-paragraph description following an instruction-style prompt format. |
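
The expected prompt shape (the same template built programmatically in the usage example below, with `<drug name>` as the input) is:

```text
Generate exactly ONE sentence describing the drug.
Do not include headings or extra information.

Drug Name: <drug name>
Description:
```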
|
|
|
|
|
The training pipeline consists of two stages:

1. **Continued Pretraining (CPT)** on domain-relevant medical and pharmaceutical text, to adapt the base model to the language and terminology of the domain.
2. **Supervised Fine-Tuning (SFT)** on structured drug name–description pairs, to guide the model toward consistent formatting and a domain-specific writing style.
|
|
|
|
|
Importantly, this fine-tuning process is intended to capture the style and structure of medical descriptions only. |
|
|
It is not designed to inject, verify, or guarantee factual medical knowledge, and the model may produce incomplete, outdated, or inaccurate information. |
|
|
|
|
|
This model is intended **strictly for educational and research purposes** and must not be used for real-world medical, clinical, or decision-making applications. |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
This model is a parameter-efficient fine-tuned version of the Microsoft Phi-2 language model, adapted to generate concise medical drug descriptions from drug names. The training pipeline consists of two stages: |
|
|
|
|
|
1. **Continued Pretraining (CPT)** to adapt the base model to drug and medical terminology. |
|
|
2. **Supervised Fine-Tuning (SFT)** using instruction-style input–output pairs. |
|
|
|
|
|
LoRA adapters were used during fine-tuning to reduce memory usage and training cost while preserving base model knowledge. |
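
For reference, a minimal peft LoRA configuration for Phi-2 might look like the sketch below. The rank, alpha, dropout, and target modules are illustrative assumptions; the exact adapter settings used for this repository are not listed in this card.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# All hyperparameters below are illustrative, not the exact values used for this adapter.
lora_config = LoraConfig(
    r=16,                     # low-rank bottleneck dimension
    lora_alpha=32,            # scaling factor applied to the LoRA update
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],  # Phi-2 attention projections (assumed)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```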
|
|
|
|
|
- **Developed by:** Atharva Gaykar |
|
|
- **Funded by:** Not applicable |
|
|
- **Shared by:** Atharva Gaykar |
|
|
- **Model type:** Causal Language Model (LoRA-adapted) |
|
|
- **Language(s) (NLP):** English |
|
|
- **License:** CC BY-NC 4.0
|
|
- **Finetuned from model:** microsoft/phi-2 |
|
|
|
|
|
--- |
|
|
|
|
|
## Uses |
|
|
|
|
|
This model is designed to generate concise medical-style descriptions of drugs given their names. |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
- Educational demonstrations of instruction-following language models |
|
|
- Academic research on medical-domain adaptation |
|
|
- Experimentation with CPT + SFT pipelines |
|
|
- Studying hallucination behavior in domain-specific LLMs |
|
|
|
|
|
The model should only be used in **non-production, educational, or research settings**. |
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
This model is **not designed or validated** for: |
|
|
|
|
|
- Medical diagnosis or treatment planning |
|
|
- Clinical decision support systems |
|
|
- Dosage recommendations or prescribing guidance |
|
|
- Patient-facing healthcare applications |
|
|
- Professional medical, pharmaceutical, or regulatory use |
|
|
- Any real-world deployment where incorrect medical information could cause harm |
|
|
|
|
|
--- |
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
This model was developed **solely for educational purposes** and **must not be used in real-world medical or clinical decision-making**. |
|
|
|
|
|
### Known Limitations |
|
|
|
|
|
- May hallucinate incorrect drug indications or mechanisms |
|
|
- Generated descriptions may be incomplete or outdated |
|
|
- Does not verify outputs against authoritative medical sources |
|
|
- Does not understand patient context, dosage, or drug interactions |
|
|
- Output quality is sensitive to prompt phrasing |
|
|
|
|
|
### Risks |
|
|
|
|
|
- Misinterpretation of outputs as medical advice |
|
|
- Overconfidence in fluent but inaccurate responses |
|
|
- Potential propagation of misinformation if misused |
|
|
|
|
|
### Recommendations |
|
|
|
|
|
- Always verify outputs using trusted medical references |
|
|
- Use only in controlled, non-production environments |
|
|
- Clearly disclose limitations in any downstream use |
|
|
- Avoid deployment in safety-critical or healthcare systems |
|
|
|
|
|
--- |
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
This repository contains **LoRA adapter weights**, not a full model. |
|
|
|
|
|
Example usage (conceptual): |
|
|
|
|
|
```python |
|
|
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Gaykar/phi2-drug-lora")
model.eval()
|
|
|
|
|
# Drug to evaluate |
|
|
drug_name = "Anavip" |
|
|
|
|
|
# Build evaluation prompt |
|
|
eval_prompt = (
    "Generate exactly ONE sentence describing the drug.\n"
    "Do not include headings or extra information.\n\n"
    f"Drug Name: {drug_name}\n"
    "Description:"
)
|
|
|
|
|
# Tokenize prompt |
|
|
model_input = tokenizer(
    eval_prompt,
    return_tensors="pt",
).to(model.device)
|
|
|
|
|
# Generate output (greedy decoding) |
|
|
with torch.no_grad():
    output = model.generate(
        **model_input,
        do_sample=False,          # Greedy decoding: determinism over diversity for medical text
        max_new_tokens=120,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
    )
|
|
|
|
|
# Remove prompt tokens |
|
|
prompt_length = model_input["input_ids"].shape[1] |
|
|
generated_tokens = output[0][prompt_length:] |
|
|
|
|
|
# Decode generated text only |
|
|
generated_text = tokenizer.decode(
    generated_tokens,
    skip_special_tokens=True,
).strip()
|
|
|
|
|
# Enforce single-sentence output |
|
|
if "." in generated_text: |
|
|
generated_text = generated_text.split(".")[0] + "." |
|
|
|
|
|
print(" DRUG NAME:", drug_name) |
|
|
print(" MODEL GENERATED DESCRIPTION:") |
|
|
print(generated_text) |
|
|
|
|
|
#Example output |
|
|
DRUG NAME (EVAL): Anavip |
|
|
|
|
|
MODEL GENERATED DESCRIPTION: |
|
|
Anavip (Crotalidae immune $F(ab')_{2}$ equine) is an antivenin used to treat adults and children with crotalid snake envenomation (rattlesnake, copperhead, or cottonmouth/water moccasin). |
|
|
|
|
|
```` |
|
|
|
|
|
--- |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
* **Dataset:** Gaykar/DrugData |
|
|
* Structured drug name–description pairs |
|
|
* Used for both CPT (domain adaptation) and SFT (instruction following) |
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
#### Continued Pretraining (CPT) |
|
|
|
|
|
The base model was further trained on domain-relevant medical and drug-related text to improve familiarity with terminology and style. CPT focused on next-token prediction without instruction formatting. |
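
A minimal sketch of what CPT data preparation could look like: raw domain text tokenized for plain next-token prediction, with labels equal to the inputs. The `text` column name and maximum length are assumptions, since the exact preprocessing is not recorded in this card.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

def tokenize_for_cpt(batch, max_length=512):
    # No instruction template: raw domain text only.
    tokens = tokenizer(batch["text"], truncation=True, max_length=max_length)
    # For causal LM training, labels are the input ids themselves
    # (the shift happens inside the model's loss computation).
    tokens["labels"] = [ids.copy() for ids in tokens["input_ids"]]
    return tokens

# e.g. dataset = dataset.map(tokenize_for_cpt, batched=True, remove_columns=["text"])
```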
|
|
|
|
|
#### Supervised Fine-Tuning (SFT) |
|
|
|
|
|
After CPT, the model was fine-tuned using instruction-style prompts to generate concise medical descriptions from drug names. |
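
As a hedged sketch, each training example might be rendered into the same template used at inference, with the target description appended; the field names here are assumptions about the dataset schema.

```python
def build_sft_example(drug_name: str, description: str) -> str:
    # Same template as the inference prompt, with the gold description appended.
    return (
        "Generate exactly ONE sentence describing the drug.\n"
        "Do not include headings or extra information.\n\n"
        f"Drug Name: {drug_name}\n"
        f"Description: {description}"
    )

# Example:
# build_sft_example("Anavip", "Anavip is an antivenin used to treat crotalid snake envenomation.")
```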
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
|
|
**CPT Hyperparameters** |
|
|
|
|
|
| Hyperparameter | Value | |
|
|
| ----------------------- | ------------------- | |
|
|
| Batch size (per device) | 1 | |
|
|
| Effective batch size | 8 | |
|
|
| Epochs | 4 | |
|
|
| Learning rate | 2e-4 | |
|
|
| Precision | FP16 | |
|
|
| Optimizer | Paged AdamW (8-bit) | |
|
|
| Logging steps | 10 | |
|
|
| Checkpoint saving | Every 500 steps | |
|
|
| Checkpoint limit | 2 | |
|
|
|
|
|
**SFT Hyperparameters** |
|
|
|
|
|
| Hyperparameter | Value | |
|
|
| ----------------------- | ------------------- | |
|
|
| Batch size (per device) | 4 | |
|
|
| Gradient accumulation | 1 | |
|
|
| Effective batch size | 4 | |
|
|
| Epochs | 5 | |
|
|
| Learning rate | 5e-5 | |
|
|
| LR scheduler | Linear | |
|
|
| Warmup ratio | 6% | |
|
|
| Weight decay | 1e-4 | |
|
|
| Max gradient norm | 1.0 | |
|
|
| Precision | FP16 | |
|
|
| Optimizer | Paged AdamW (8-bit) | |
|
|
| Checkpoint saving | Every 50 steps | |
|
|
| Checkpoint limit | 2 | |
|
|
| Experiment tracking | Weights & Biases | |
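
For reproducibility, the SFT table above maps onto Hugging Face `TrainingArguments` roughly as sketched below; `output_dir` is illustrative, and the CPT stage would differ mainly in batch size (1), gradient accumulation (8), epochs (4), and learning rate (2e-4).

```python
from transformers import TrainingArguments

# A sketch of the SFT settings from the table above; output_dir is illustrative.
sft_args = TrainingArguments(
    output_dir="phi2-drug-sft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    num_train_epochs=5,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.06,
    weight_decay=1e-4,
    max_grad_norm=1.0,
    fp16=True,
    optim="paged_adamw_8bit",   # Paged AdamW (8-bit); requires bitsandbytes
    save_steps=50,
    save_total_limit=2,
    report_to="wandb",          # Weights & Biases tracking
)
```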
|
|
|
|
|
--- |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Testing Data |
|
|
|
|
|
Drug names sampled from the same dataset were used for evaluation. Outputs were assessed for factual correctness using an external LLM-based evaluation approach. |
|
|
|
|
|
### Metrics |
|
|
|
|
|
**Evaluation Method:** LLM-as-a-Judge (Google Gemini), with a sketch of the judge prompt shown below.
|
|
|
|
|
* Binary classification: Factually Correct / Hallucinated |
|
|
* Three evaluation batches |
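
The exact judge prompt is not recorded in this card; a minimal sketch of the binary judging setup, with assumed wording, might look like:

```python
def build_judge_prompt(drug_name: str, generated_text: str) -> str:
    # Binary verdict matching the categories reported below (assumed wording).
    return (
        "You are a medical fact-checker. Given a drug name and a generated "
        "description, answer with exactly one word: CORRECT or HALLUCINATED.\n\n"
        f"Drug Name: {drug_name}\n"
        f"Generated Description: {generated_text}\n"
        "Verdict:"
    )
```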
|
|
|
|
|
### Results |
|
|
|
|
|
**Batch 1** |
|
|
|
|
|
| Category | Count | Percentage | |
|
|
| --------------------- | ----- | ---------- | |
|
|
| Total Drugs Evaluated | 25 | 100% | |
|
|
| Factually Correct | 24 | 96% | |
|
|
| Hallucinated / Failed | 1 | 4% | |
|
|
|
|
|
**Batch 2** |
|
|
|
|
|
| Category | Count | Percentage | |
|
|
| --------------------- | ----- | ---------- | |
|
|
| Total Drugs Evaluated | 25 | 100% | |
|
|
| Factually Correct | 22 | 88% | |
|
|
| Hallucinated / Failed | 3 | 12% | |
|
|
|
|
|
**Batch 3** |
|
|
|
|
|
| Category | Count | Percentage | |
|
|
| --------------------- | ----- | ---------- | |
|
|
| Total Drugs Evaluated | 10 | 100% | |
|
|
| Factually Correct | 10 | 100% | |
|
|
| Hallucinated / Failed | 0 | 0% | |
|
|
|
|
|
#### Summary |
|
|
|
|
|
Across the three evaluation batches, 56 of 60 generated descriptions (≈93%) were judged factually correct. Because this model was fine-tuned with LoRA rather than full-parameter fine-tuning, eliminating hallucinations entirely is difficult: LoRA enables efficient training and strong instruction-following behavior, but it does not fully overwrite the base model’s internal knowledge. Despite this limitation, the model performs well for educational and research-oriented drug description generation.
|
|
|
|
|
--- |
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
* **Hardware Type:** NVIDIA T4 GPU |
|
|
* **Hours used:** Not recorded |
|
|
* **Cloud Provider:** Google Colab |
|
|
* **Compute Region:** Not specified |
|
|
* **Carbon Emitted:** Not estimated |
|
|
|
|
|
--- |
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
### Model Architecture and Objective |
|
|
|
|
|
* Base model: Microsoft Phi-2 |
|
|
* Objective: Instruction-following text generation |
|
|
* Adaptation method: LoRA (PEFT) |
|
|
|
|
|
### Compute Infrastructure |
|
|
|
|
|
#### Hardware |
|
|
|
|
|
* NVIDIA T4 GPU |
|
|
|
|
|
#### Software |
|
|
|
|
|
* Transformers |
|
|
* PEFT |
|
|
* PyTorch |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
Atharva Gaykar |
|
|
|
|
|
### Framework Versions |
|
|
|
|
|
* PEFT 0.18.0 |
|
|
|
|
|
|
|
|
|