---
base_model: microsoft/phi-2
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:microsoft/phi-2
- lora
- transformers
license: cc-by-nc-4.0
datasets:
- Gaykar/DrugData
---

# Model Card for Gaykar/Phi2-drug_data

This model is a LoRA-based fine-tuned variant of Microsoft Phi-2, designed to generate concise, medical-style textual descriptions of drugs. Given a drug name as input, the model produces a short, single-paragraph description following an instruction-style prompt format.

The training pipeline consists of two stages:

1. **Continued Pretraining (CPT)** on domain-relevant medical and pharmaceutical text to adapt the base model to the language and terminology of the domain.
2. **Supervised Fine-Tuning (SFT)** using structured drug name–description pairs to guide the model toward consistent formatting and a domain-specific writing style.

This model is intended **strictly for educational and research purposes** and must not be used for real-world medical, clinical, or decision-making applications.
|
| | --- |
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| |
|
| | This model is a parameter-efficient fine-tuned version of the Microsoft Phi-2 language model, adapted to generate concise medical drug descriptions from drug names. The training pipeline consists of two stages: |
| |
|
| | 1. **Continued Pretraining (CPT)** to adapt the base model to drug and medical terminology. |
| | 2. **Supervised Fine-Tuning (SFT)** using instruction-style input–output pairs. |
| |
|
| | LoRA adapters were used during fine-tuning to reduce memory usage and training cost while preserving base model knowledge. |

- **Developed by:** Atharva Gaykar
- **Funded by:** Not applicable
- **Shared by:** Atharva Gaykar
- **Model type:** Causal language model (LoRA-adapted)
- **Language(s) (NLP):** English
- **License:** CC-BY-NC 4.0
- **Finetuned from model:** microsoft/phi-2

---

## Uses

This model is designed to generate concise medical-style descriptions of drugs given their names.

### Direct Use

- Educational demonstrations of instruction-following language models
- Academic research on medical-domain adaptation
- Experimentation with CPT + SFT pipelines
- Studying hallucination behavior in domain-specific LLMs

The model should only be used in **non-production, educational, or research settings**.

### Out-of-Scope Use

This model is **not designed or validated** for:

- Medical diagnosis or treatment planning
- Clinical decision support systems
- Dosage recommendations or prescribing guidance
- Patient-facing healthcare applications
- Professional medical, pharmaceutical, or regulatory use
- Any real-world deployment where incorrect medical information could cause harm

---

## Bias, Risks, and Limitations

This model was developed **solely for educational purposes** and **must not be used in real-world medical or clinical decision-making**.

### Known Limitations

- May hallucinate incorrect drug indications or mechanisms
- Generated descriptions may be incomplete or outdated
- Does not verify outputs against authoritative medical sources
- Does not understand patient context, dosage, or drug interactions
- Output quality is sensitive to prompt phrasing

### Risks

- Misinterpretation of outputs as medical advice
- Overconfidence in fluent but inaccurate responses
- Potential propagation of misinformation if misused

### Recommendations

- Always verify outputs using trusted medical references
- Use only in controlled, non-production environments
- Clearly disclose limitations in any downstream use
- Avoid deployment in safety-critical or healthcare systems

---

## How to Get Started with the Model

This repository contains **LoRA adapter weights**, not a full model. Load the microsoft/phi-2 base model first, then apply the adapter on top of it.

Example usage:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Load LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Gaykar/Phi2-drug_data")
model.eval()

# Drug to evaluate
drug_name = "Paracetamol"

# Build evaluation prompt
eval_prompt = (
    "Generate exactly ONE sentence describing the drug.\n"
    "Do not include headings or extra information.\n\n"
    f"Drug Name: {drug_name}\n"
    "Description:"
)

# Tokenize prompt
model_input = tokenizer(eval_prompt, return_tensors="pt").to(model.device)

# Generate with greedy decoding: in the medical domain, determinism and
# factual consistency matter more than linguistic diversity
with torch.no_grad():
    output = model.generate(
        **model_input,
        do_sample=False,
        num_beams=1,
        max_new_tokens=120,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
    )

# Strip the prompt tokens and decode only the generated continuation
prompt_length = model_input["input_ids"].shape[1]
generated_text = tokenizer.decode(
    output[0][prompt_length:],
    skip_special_tokens=True,
).strip()

# Enforce single-sentence output
if "." in generated_text:
    generated_text = generated_text.split(".")[0] + "."

print("DRUG NAME:", drug_name)
print("MODEL GENERATED DESCRIPTION:")
print(generated_text)
```

Example output:

```
DRUG NAME: Paracetamol
MODEL GENERATED DESCRIPTION:
Paracetamol (acetaminophen) is a non-narcotic analgesic and antipyretic used to relieve mild to moderate pain and reduce fever.
```

---

## Training Details

### Training Data

* **Dataset:** Gaykar/DrugData
* Structured drug name–description pairs
* Used for both CPT (domain adaptation) and SFT (instruction following)
|
| | ### Training Procedure |
| |
|
| | #### Continued Pretraining (CPT) |
| |
|
| | The base model was further trained on domain-relevant medical and drug-related text to improve familiarity with terminology and style. CPT focused on next-token prediction without instruction formatting. |
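
As a toy illustration of the CPT objective (plain next-token prediction, with no instruction template), hypothetical token ids stand in for a tokenized sentence:

```python
# Toy illustration of the causal-LM objective used during CPT.
# Hypothetical token ids standing in for a tokenized sentence.
input_ids = [101, 405, 88, 219, 7]

# Each position is trained to predict the NEXT token, so the
# (input, target) pairs are the sequence paired with itself shifted by one:
pairs = list(zip(input_ids[:-1], input_ids[1:]))
print(pairs)  # [(101, 405), (405, 88), (88, 219), (219, 7)]
```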

#### Supervised Fine-Tuning (SFT)

After CPT, the model was fine-tuned using instruction-style prompts to generate concise medical descriptions from drug names.
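
A sketch of what one SFT training pair might look like, assuming the same instruction template as the evaluation prompt shown in the usage example (the exact training template is not stated in the card):

```python
# Hypothetical SFT example, mirroring the evaluation prompt format.
drug_name = "Ibuprofen"
description = (
    "Ibuprofen is a nonsteroidal anti-inflammatory drug used to relieve "
    "pain, inflammation, and fever."
)

prompt = (
    "Generate exactly ONE sentence describing the drug.\n"
    "Do not include headings or extra information.\n\n"
    f"Drug Name: {drug_name}\n"
    "Description:"
)

# The supervised target is the prompt followed by the reference description.
training_text = f"{prompt} {description}"
print(training_text)
```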

#### Training Hyperparameters

**CPT Hyperparameters**

| Hyperparameter          | Value               |
| ----------------------- | ------------------- |
| Batch size (per device) | 1                   |
| Effective batch size    | 8                   |
| Epochs                  | 4                   |
| Learning rate           | 2e-4                |
| Precision               | FP16                |
| Optimizer               | Paged AdamW (8-bit) |
| Logging steps           | 10                  |
| Checkpoint saving       | Every 500 steps     |
| Checkpoint limit        | 2                   |
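
Expressed as keyword arguments one might pass to `transformers.TrainingArguments`, the CPT table reads roughly as follows (a sketch; a gradient accumulation of 8 is inferred from the per-device batch size of 1 and effective batch size of 8):

```python
# Sketch of Trainer settings mirroring the CPT table above.
cpt_args = dict(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # 1 device batch x 8 steps = effective batch 8
    num_train_epochs=4,
    learning_rate=2e-4,
    fp16=True,
    optim="paged_adamw_8bit",
    logging_steps=10,
    save_steps=500,
    save_total_limit=2,
)

effective_batch = (
    cpt_args["per_device_train_batch_size"] * cpt_args["gradient_accumulation_steps"]
)
print(effective_batch)  # 8
```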

**SFT Hyperparameters**

| Hyperparameter          | Value               |
| ----------------------- | ------------------- |
| Batch size (per device) | 4                   |
| Gradient accumulation   | 1                   |
| Effective batch size    | 4                   |
| Epochs                  | 5                   |
| Learning rate           | 2e-5                |
| LR scheduler            | Linear              |
| Warmup ratio            | 6%                  |
| Weight decay            | 1e-4                |
| Max gradient norm       | 1.0                 |
| Precision               | FP16                |
| Optimizer               | Paged AdamW (8-bit) |
| Checkpoint saving       | Every 50 steps      |
| Checkpoint limit        | 2                   |
| Experiment tracking     | Weights & Biases    |
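
The SFT stage, in the same `TrainingArguments`-style sketch (illustrative, not the author's exact training script):

```python
# Sketch of Trainer settings mirroring the SFT table above.
sft_args = dict(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    num_train_epochs=5,
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.06,
    weight_decay=1e-4,
    max_grad_norm=1.0,
    fp16=True,
    optim="paged_adamw_8bit",
    save_steps=50,
    save_total_limit=2,
    report_to="wandb",
)
print(sft_args["warmup_ratio"])  # 0.06
```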

---

## Evaluation

### Testing Data

Drug names sampled from the same dataset were used for evaluation. Outputs were assessed for factual correctness using an external LLM-based evaluation approach.

### Metrics

**Evaluation Method:** LLM-as-a-Judge (ChatGPT with web search available)

* Binary classification: Factually Correct / Hallucinated
* Three evaluation batches

### Results

**Batch 1**

| Category              | Count | Percentage |
| --------------------- | ----- | ---------- |
| Total Drugs Evaluated | 25    | 100%       |
| Factually Correct     | 24    | 96%        |
| Hallucinated / Failed | 1     | 4%         |

**Batch 2**

| Category              | Count | Percentage |
| --------------------- | ----- | ---------- |
| Total Drugs Evaluated | 25    | 100%       |
| Factually Correct     | 22    | 88%        |
| Hallucinated / Failed | 3     | 12%        |

**Batch 3**

| Category              | Count | Percentage |
| --------------------- | ----- | ---------- |
| Total Drugs Evaluated | 22    | 100%       |
| Factually Correct     | 15    | 68%        |
| Hallucinated / Failed | 7     | 32%        |
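
Aggregating the three batches gives an overall accuracy figure (using 22 − 15 = 7 failures for Batch 3, since the per-batch counts must sum to the batch total):

```python
# Combine the three evaluation batches into one overall accuracy figure.
batches = [(25, 24), (25, 22), (22, 15)]  # (drugs evaluated, factually correct)

total = sum(n for n, _ in batches)
correct = sum(c for _, c in batches)
print(f"{correct}/{total} factually correct ({correct / total:.1%})")  # 61/72 (84.7%)
```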

#### Summary

Because this model was fine-tuned (CPT + SFT) using LoRA rather than full-parameter fine-tuning, eliminating hallucinations entirely is challenging. While LoRA enables efficient training and strong instruction-following behavior, it does not fully overwrite the base model's internal knowledge. Despite this limitation, the model performs well for educational and research-oriented drug description generation tasks.

---

## Environmental Impact

* **Hardware Type:** NVIDIA T4 GPU
* **Hours used:** Not recorded
* **Cloud Provider:** Google Colab
* **Compute Region:** Not specified
* **Carbon Emitted:** Not estimated

---

## Technical Specifications

### Model Architecture and Objective

* Base model: Microsoft Phi-2
* Objective: Instruction-following text generation
* Adaptation method: LoRA (PEFT)

### Compute Infrastructure

#### Hardware

* NVIDIA T4 GPU

#### Software

* Transformers
* PEFT
* PyTorch

---

## Model Card Contact

Atharva Gaykar

### Framework Versions

* PEFT 0.18.0