	---
	base_model: microsoft/phi-2
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- base_model:adapter:microsoft/phi-2
	- lora
	- transformers
	license: cc-by-nc-4.0
	datasets:
	- Gaykar/DrugData
	---

	# Model Card for Gaykar/Phi2-drug_data

	This model is a LoRA-based fine-tuned variant of Microsoft Phi-2, designed to generate concise, medical-style textual descriptions of drugs.
	Given a drug name as input, the model produces a short, single-paragraph description following an instruction-style prompt format.

	The training pipeline consists of two stages:

	1. Continued Pretraining (CPT) on domain-relevant medical and pharmaceutical text to adapt the base model to the language and terminology of the domain.
	2. Supervised Fine-Tuning (SFT) using structured drug name–description pairs to guide the model toward consistent formatting and domain-specific writing style.

	This model is intended strictly for educational and research purposes and must not be used for real-world medical, clinical, or decision-making applications.

	---

	## Model Details

	### Model Description

	This model is a parameter-efficient fine-tuned version of the Microsoft Phi-2 language model, adapted to generate concise medical drug descriptions from drug names. The training pipeline consists of two stages:

	1. Continued Pretraining (CPT) to adapt the base model to drug and medical terminology.
	2. Supervised Fine-Tuning (SFT) using instruction-style input–output pairs.

	LoRA adapters were used during fine-tuning to reduce memory usage and training cost while preserving base model knowledge.

	- Developed by: Atharva Gaykar
	- Funded by: Not applicable
	- Shared by: Atharva Gaykar
	- Model type: Causal Language Model (LoRA-adapted)
	- Language(s) (NLP): English
	- License: CC-BY-NC 4.0
	- Finetuned from model: microsoft/phi-2

	---

	## Uses

	This model is designed to generate concise medical-style descriptions of drugs given their names.

	### Direct Use

	- Educational demonstrations of instruction-following language models
	- Academic research on medical-domain adaptation
	- Experimentation with CPT + SFT pipelines
	- Studying hallucination behavior in domain-specific LLMs

	The model should only be used in non-production, educational, or research settings.

	### Out-of-Scope Use

	This model is not designed or validated for:

	- Medical diagnosis or treatment planning
	- Clinical decision support systems
	- Dosage recommendations or prescribing guidance
	- Patient-facing healthcare applications
	- Professional medical, pharmaceutical, or regulatory use
	- Any real-world deployment where incorrect medical information could cause harm

	---

	## Bias, Risks, and Limitations

	This model was developed solely for educational purposes and must not be used in real-world medical or clinical decision-making.

	### Known Limitations

	- May hallucinate incorrect drug indications or mechanisms
	- Generated descriptions may be incomplete or outdated
	- Does not verify outputs against authoritative medical sources
	- Does not understand patient context, dosage, or drug interactions
	- Output quality is sensitive to prompt phrasing

	### Risks

	- Misinterpretation of outputs as medical advice
	- Overconfidence in fluent but inaccurate responses
	- Potential propagation of misinformation if misused

	### Recommendations

	- Always verify outputs using trusted medical references
	- Use only in controlled, non-production environments
	- Clearly disclose limitations in any downstream use
	- Avoid deployment in safety-critical or healthcare systems

	---

	## How to Get Started with the Model

	This repository contains LoRA adapter weights, not a full model, so the `microsoft/phi-2` base model must be loaded first and the adapter attached with PEFT.

	Example usage:

	```python
	import torch
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Use the GPU when available; FP16 roughly halves memory on GPUs such as the T4
	device = "cuda" if torch.cuda.is_available() else "cpu"
	dtype = torch.float16 if device == "cuda" else torch.float32

	# Load base model and tokenizer
	base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=dtype)
	tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

	# Attach the LoRA adapter from this repository
	model = PeftModel.from_pretrained(base_model, "Gaykar/Phi2-drug_data")
	model.to(device)
	model.eval()

	# Drug to describe
	drug_name = "Paracetamol"

	# Build the evaluation prompt
	eval_prompt = (
	    "Generate exactly ONE sentence describing the drug.\n"
	    "Do not include headings or extra information.\n\n"
	    f"Drug Name: {drug_name}\n"
	    "Description:"
	)

	# Tokenize the prompt and move it to the model's device
	model_input = tokenizer(eval_prompt, return_tensors="pt").to(device)

	# Greedy decoding: deterministic output is preferred over sampling because,
	# in the medical domain, factual consistency matters more than linguistic diversity
	with torch.no_grad():
	    output = model.generate(
	        **model_input,
	        do_sample=False,
	        num_beams=1,
	        max_new_tokens=120,
	        repetition_penalty=1.1,
	        eos_token_id=tokenizer.eos_token_id,
	        pad_token_id=tokenizer.eos_token_id,  # reuse EOS for padding to avoid a generate() warning
	    )

	# Remove the prompt tokens so only the generated continuation is decoded
	prompt_length = model_input["input_ids"].shape[1]
	generated_tokens = output[0][prompt_length:]
	generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

	# Enforce single-sentence output
	if "." in generated_text:
	    generated_text = generated_text.split(".")[0] + "."

	print("DRUG NAME:", drug_name)
	print("MODEL GENERATED DESCRIPTION:")
	print(generated_text)

	# Example output:
	# DRUG NAME: Paracetamol
	# MODEL GENERATED DESCRIPTION:
	# Paracetamol (acetaminophen) is a non-narcotic analgesic and antipyretic used to relieve mild to moderate pain and reduce fever.
	```
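
The `split(".")` step above ends the description at the first period of any kind, so a value such as "2.5" would cut the sentence short. The sketch below is one possible alternative, offered as an assumption rather than the author's method: it treats ".", "!" or "?" as a sentence boundary only when followed by whitespace or the end of the text. `first_sentence` is a hypothetical helper name, and abbreviations such as "e.g." are still (incorrectly) treated as boundaries.

```python
import re

# Hypothetical boundary rule: ".", "!" or "?" followed by whitespace or end of text.
# Unlike split("."), this keeps decimals such as "2.5" intact; abbreviations
# such as "e.g." are still treated as sentence ends.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def first_sentence(text: str) -> str:
    """Return the first sentence of `text`, or the whole stripped text if no boundary is found."""
    text = text.strip()
    match = _SENTENCE_END.search(text)
    if match is None:
        return text
    return text[: match.end()]


print(first_sentence("Sentence one mentions 2.5 units. Sentence two."))
```

To try it in the script above, replace the `if "." in generated_text:` block with `generated_text = first_sentence(generated_text)`.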

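With `do_sample=False` and `num_beams=1`, every step simply takes the highest-scoring next token, so the same prompt always produces the same text. `repetition_penalty=1.1` rescales the scores of tokens already in the sequence (prompt included) before that choice. The pure-Python sketch below mirrors the rule used by the Transformers repetition-penalty logits processor (positive scores are divided by the penalty, negative scores multiplied by it); the toy vocabulary and scores are made up for illustration and do not come from this model.

```python
from typing import Dict, List


def apply_repetition_penalty(scores: Dict[str, float], seen: List[str], penalty: float) -> Dict[str, float]:
    """Penalize tokens already in the sequence: divide positive scores, multiply negative ones."""
    adjusted = dict(scores)
    for token in set(seen):
        if token in adjusted:
            s = adjusted[token]
            adjusted[token] = s * penalty if s < 0 else s / penalty
    return adjusted


def greedy_step(scores: Dict[str, float], seen: List[str], penalty: float = 1.1) -> str:
    """One greedy decoding step: apply the penalty, then take the argmax (no sampling)."""
    adjusted = apply_repetition_penalty(scores, seen, penalty)
    return max(adjusted, key=adjusted.get)


# Toy next-token scores (hypothetical values, not taken from the model)
scores = {"analgesic": 2.00, "antipyretic": 1.95, "drug": -0.5}
print(greedy_step(scores, seen=[]))             # analgesic
print(greedy_step(scores, seen=["analgesic"]))  # antipyretic
```

In this toy case the penalty is just large enough (2.00 / 1.1 ≈ 1.82 < 1.95) to flip the choice; with real logits, 1.1 acts as a mild nudge against repeated tokens rather than a ban.
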
	---

	## Training Details

	### Training Data

	* Dataset: Gaykar/DrugData
	* Structured drug name–description pairs
	* Used for both CPT (domain adaptation) and SFT (instruction following)

	### Training Procedure

	#### Continued Pretraining (CPT)

	The base model was further trained on domain-relevant medical and drug-related text to improve familiarity with terminology and style. CPT focused on next-token prediction without instruction formatting.
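
The card does not say how the CPT text was batched. A common approach for next-token-prediction training is to concatenate tokenized documents, separated by an end-of-text token, and cut the stream into fixed-length blocks whose labels equal the input IDs (Transformers causal-LM models shift labels internally when computing the loss). The sketch below shows that packing step on plain lists of token IDs; the block size, the EOS ID, and the helper name `pack_for_cpt` are hypothetical and not taken from the actual training code.

```python
from typing import Dict, List

EOS_ID = 50256   # assumed end-of-text token ID (GPT-2 style vocabulary used by Phi-2)
BLOCK_SIZE = 8   # hypothetical; real CPT runs use much longer blocks


def pack_for_cpt(docs: List[List[int]], block_size: int = BLOCK_SIZE, eos_id: int = EOS_ID) -> List[Dict[str, List[int]]]:
    """Concatenate tokenized documents (EOS-separated) and split the stream into full blocks.

    Trailing tokens that do not fill a whole block are dropped.
    """
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)
    n_full = len(stream) // block_size * block_size
    blocks = []
    for start in range(0, n_full, block_size):
        ids = stream[start : start + block_size]
        # For causal LM training the labels equal the inputs; the model shifts them internally.
        blocks.append({"input_ids": ids, "labels": list(ids)})
    return blocks


docs = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12]]
for block in pack_for_cpt(docs):
    print(block["input_ids"])
```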

	#### Supervised Fine-Tuning (SFT)

	After CPT, the model was fine-tuned using instruction-style prompts to generate concise medical descriptions from drug names.
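
The exact SFT template is not published. The sketch below assumes that each training example reused the prompt from the usage example in this card, followed by the reference description and an end-of-text marker, and that the loss was computed only on the description by setting prompt-position labels to -100 (the default `ignore_index` of PyTorch's cross-entropy loss). Both choices are assumptions; `build_prompt`, `build_sft_text` and `mask_prompt_labels` are hypothetical helpers, and token IDs are shown as plain integers rather than real tokenizer output.

```python
from typing import List

IGNORE_INDEX = -100          # default ignore_index of torch.nn.CrossEntropyLoss
EOS_TEXT = "<|endoftext|>"   # assumed Phi-2 end-of-text string


def build_prompt(drug_name: str) -> str:
    """Prompt template taken from the usage example (assumed to match training)."""
    return (
        "Generate exactly ONE sentence describing the drug.\n"
        "Do not include headings or extra information.\n\n"
        f"Drug Name: {drug_name}\n"
        "Description:"
    )


def build_sft_text(drug_name: str, description: str) -> str:
    """Full training string: prompt, a space, the target description, then EOS."""
    return f"{build_prompt(drug_name)} {description}{EOS_TEXT}"


def mask_prompt_labels(prompt_ids: List[int], completion_ids: List[int]) -> List[int]:
    """Labels for a prompt+completion input: ignore the prompt, learn only the completion."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)


print(build_sft_text("ExampleDrug", "A placeholder description."))
print(mask_prompt_labels([11, 12, 13], [21, 22]))  # [-100, -100, -100, 21, 22]
```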

	#### Training Hyperparameters

	CPT Hyperparameters

	| Hyperparameter          | Value               |
	| ----------------------- | ------------------- |
	| Batch size (per device) | 1                   |
	| Effective batch size    | 8                   |
	| Epochs                  | 4                   |
	| Learning rate           | 2e-4                |
	| Precision               | FP16                |
	| Optimizer               | Paged AdamW (8-bit) |
	| Logging steps           | 10                  |
	| Checkpoint saving       | Every 500 steps     |
	| Checkpoint limit        | 2                   |
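
The CPT table lists a per-device batch size of 1 and an effective batch size of 8 but no gradient-accumulation value. Since effective batch size = per-device batch size × gradient-accumulation steps × number of devices, and assuming training ran on the single T4 listed under Hardware, CPT would have used 8 accumulation steps; the SFT table (4 × 1 × 1 = 4) follows the same relation. The sketch below encodes that relation plus an approximate count of optimizer steps; the dataset size is a hypothetical placeholder because the card does not report it, and the exact step count depends on how the trainer handles a partial final batch.

```python
import math


def grad_accum_steps(effective_batch: int, per_device_batch: int, num_devices: int = 1) -> int:
    """Solve effective = per_device * accumulation * devices for the accumulation steps."""
    per_step = per_device_batch * num_devices
    if effective_batch % per_step != 0:
        raise ValueError("effective batch size must be a multiple of per_device_batch * num_devices")
    return effective_batch // per_step


def optimizer_steps(num_examples: int, effective_batch: int, epochs: int) -> int:
    """Approximate optimizer updates: one per effective batch, partial batch rounded up."""
    return math.ceil(num_examples / effective_batch) * epochs


print(grad_accum_steps(effective_batch=8, per_device_batch=1))  # CPT on one GPU -> 8
print(grad_accum_steps(effective_batch=4, per_device_batch=4))  # SFT -> 1

N_EXAMPLES = 1000  # hypothetical dataset size (not reported in this card)
print(optimizer_steps(N_EXAMPLES, effective_batch=8, epochs=4))  # 125 * 4 = 500
```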

	SFT Hyperparameters

	| Hyperparameter          | Value               |
	| ----------------------- | ------------------- |
	| Batch size (per device) | 4                   |
	| Gradient accumulation   | 1                   |
	| Effective batch size    | 4                   |
	| Epochs                  | 5                   |
	| Learning rate           | 2e-5                |
	| LR scheduler            | Linear              |
	| Warmup ratio            | 6%                  |
	| Weight decay            | 1e-4                |
	| Max gradient norm       | 1.0                 |
	| Precision               | FP16                |
	| Optimizer               | Paged AdamW (8-bit) |
	| Checkpoint saving       | Every 50 steps      |
	| Checkpoint limit        | 2                   |
	| Experiment tracking     | Weights & Biases    |
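
With a linear scheduler and a 6% warmup ratio, the SFT learning rate climbs from 0 to the 2e-5 peak over the first 6% of optimizer steps, then decays linearly back to 0 at the final step. The function below is a minimal pure-Python sketch of that shape, written to follow the usual Transformers convention of rounding warmup steps up (ceil(total × ratio)); the total step count used in the example is hypothetical.

```python
import math


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of warmup steps, rounded up."""
    return math.ceil(total_steps * warmup_ratio)


def linear_lr(step: int, total_steps: int, peak_lr: float = 2e-5, warmup_ratio: float = 0.06) -> float:
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to zero."""
    warm = warmup_steps(total_steps, warmup_ratio)
    if step < warm:
        return peak_lr * step / max(1, warm)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warm))


TOTAL = 500  # hypothetical number of SFT optimizer steps
for s in (0, 15, 30, 265, 500):
    print(s, f"{linear_lr(s, TOTAL):.2e}")
```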

	---

	## Evaluation

	### Testing Data

	Drug names sampled from the same dataset were used for evaluation. Outputs were assessed for factual correctness using an external LLM-based evaluation approach.

	### Metrics

	Evaluation method: LLM-as-a-Judge (ChatGPT with web search enabled)

	* Binary classification: Factually Correct / Hallucinated
	* Three evaluation batches

	### Results

	Batch 1

	| Category              | Count | Percentage |
	| --------------------- | ----- | ---------- |
	| Total Drugs Evaluated | 25    | 100%       |
	| Factually Correct     | 24    | 96%        |
	| Hallucinated / Failed | 1     | 4%         |

	Batch 2

	| Category              | Count | Percentage |
	| --------------------- | ----- | ---------- |
	| Total Drugs Evaluated | 25    | 100%       |
	| Factually Correct     | 22    | 88%        |
	| Hallucinated / Failed | 3     | 12%        |

	Batch 3

	| Category              | Count | Percentage |
	| --------------------- | ----- | ---------- |
	| Total Drugs Evaluated | 22    | 100%       |
	| Factually Correct     | 15    | 68%        |
	| Hallucinated / Failed | 0     | 0%         |
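
Each batch contains only 22 to 25 drugs, so the per-batch percentages carry wide statistical uncertainty. The sketch below pools the "Factually Correct" counts from the three tables (24/25, 22/25, 15/22) and attaches a 95% Wilson score interval to each rate. The Wilson interval is an illustrative choice, not part of the original evaluation, and it does not account for mistakes made by the LLM judge itself.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Factually-correct counts and batch sizes from the tables above
batches = {"Batch 1": (24, 25), "Batch 2": (22, 25), "Batch 3": (15, 22)}
correct = sum(c for c, _ in batches.values())
total = sum(n for _, n in batches.values())
batches["Pooled"] = (correct, total)

for name, (c, n) in batches.items():
    lo, hi = wilson_interval(c, n)
    print(f"{name}: {c}/{n} = {c / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```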

	#### Summary

	Because this model was adapted with LoRA (CPT followed by SFT) rather than full-parameter fine-tuning, eliminating hallucinations entirely is difficult: LoRA enables efficient training and strong instruction-following behavior, but it does not fully overwrite the base model’s internal knowledge. Despite this limitation, the model performs well for educational and research-oriented drug description generation: across the three batches it was judged factually correct on 61 of 72 drugs (about 85%), with per-batch accuracy ranging from 68% to 96%.

	---

	## Environmental Impact

	* Hardware Type: NVIDIA T4 GPU
	* Hours used: Not recorded
	* Cloud Provider: Google Colab
	* Compute Region: Not specified
	* Carbon Emitted: Not estimated

	---

	## Technical Specifications

	### Model Architecture and Objective

	* Base model: Microsoft Phi-2
	* Objective: Causal language modeling (next-token prediction), specialized for instruction-following text generation
	* Adaptation method: LoRA (PEFT)
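
LoRA freezes each adapted weight matrix W (shape d × k) and learns a low-rank update ΔW = BA, where B is d × r and A is r × k, so each adapted matrix adds only r(d + k) trainable parameters instead of d·k. The sketch below makes that arithmetic concrete. It assumes Phi-2's hidden size of 2560 and 32 decoder layers; the rank r = 16 and the choice of adapting the four attention projections are hypothetical, since this card does not state the actual LoRA configuration.

```python
def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d x k weight: B (d x r) plus A (r x k)."""
    return d * r + r * k


def full_params(d: int, k: int) -> int:
    """Parameters in the frozen d x k weight itself."""
    return d * k


HIDDEN = 2560          # assumed Phi-2 hidden size
LAYERS = 32            # assumed number of Phi-2 decoder layers
RANK = 16              # hypothetical LoRA rank (not stated in this card)
MODULES_PER_LAYER = 4  # hypothetical targets: q_proj, k_proj, v_proj, dense (each HIDDEN x HIDDEN)

per_matrix_lora = lora_params(HIDDEN, HIDDEN, RANK)
per_matrix_full = full_params(HIDDEN, HIDDEN)
total_lora = per_matrix_lora * MODULES_PER_LAYER * LAYERS

print(f"LoRA parameters per adapted matrix:   {per_matrix_lora:,}")
print(f"Frozen parameters per adapted matrix: {per_matrix_full:,}")
print(f"Trainable fraction per matrix:        {per_matrix_lora / per_matrix_full:.2%}")
print(f"Total trainable LoRA parameters:      {total_lora:,}")
```

Under these assumptions, about 10.5 million parameters are trained, well under 1% of Phi-2's roughly 2.7 billion, which is the source of the memory and cost savings that motivate using LoRA here.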

	### Compute Infrastructure

	#### Hardware

	* NVIDIA T4 GPU

	#### Software

	* Transformers
	* PEFT
	* PyTorch

	---

	## Model Card Contact

	Atharva Gaykar

	### Framework Versions

	* PEFT 0.18.0