---
base_model: microsoft/phi-2
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:microsoft/phi-2
- lora
- transformers
license: cc-by-nc-4.0
datasets:
- Gaykar/DrugData
---

# Model Card for Gaykar/Phi2-drug_data

This model is a LoRA-based fine-tuned variant of Microsoft Phi-2, designed to generate concise, medical-style textual descriptions of drugs. Given a drug name as input, the model produces a short, single-paragraph description following an instruction-style prompt format.

The training pipeline consists of two stages:

1. **Continued Pretraining (CPT)** on domain-relevant medical and pharmaceutical text, to adapt the base model to the language and terminology of the domain.
2. **Supervised Fine-Tuning (SFT)** using structured drug name–description pairs, to guide the model toward consistent formatting and a domain-specific writing style.

This model is intended **strictly for educational and research purposes** and must not be used for real-world medical, clinical, or decision-making applications.

---

## Model Details

### Model Description

This model is a parameter-efficient fine-tuned version of the Microsoft Phi-2 language model, adapted to generate concise medical drug descriptions from drug names. The training pipeline consists of two stages:

1. **Continued Pretraining (CPT)** to adapt the base model to drug and medical terminology.
2. **Supervised Fine-Tuning (SFT)** using instruction-style input–output pairs (a sketch of this format follows below).

LoRA adapters were used during fine-tuning to reduce memory usage and training cost while preserving base model knowledge.

- **Developed by:** Atharva Gaykar
- **Funded by:** Not applicable
- **Shared by:** Atharva Gaykar
- **Model type:** Causal Language Model (LoRA-adapted)
- **Language(s) (NLP):** English
- **License:** CC-BY-NC 4.0
- **Finetuned from model:** microsoft/phi-2
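For illustration, an SFT pair in this style might look like the sketch below. This is an assumption reconstructed from the inference prompt and example output shown in "How to Get Started with the Model"; the exact training template was not published.

```python
# Hypothetical SFT input–output pair, reconstructed from the inference
# prompt used later in this card; the actual training template may differ.
sft_example = {
    "prompt": (
        "Generate exactly ONE sentence describing the drug.\n"
        "Do not include headings or extra information.\n\n"
        "Drug Name: Paracetamol\n"
        "Description:"
    ),
    "completion": (
        " Paracetamol (acetaminophen) is a non-narcotic analgesic and "
        "antipyretic used to relieve mild to moderate pain and reduce fever."
    ),
}
```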
---

## Uses

This model is designed to generate concise, medical-style descriptions of drugs given their names.

### Direct Use

- Educational demonstrations of instruction-following language models
- Academic research on medical-domain adaptation
- Experimentation with CPT + SFT pipelines
- Studying hallucination behavior in domain-specific LLMs

The model should only be used in **non-production, educational, or research settings**.

### Out-of-Scope Use

This model is **not designed or validated** for:

- Medical diagnosis or treatment planning
- Clinical decision support systems
- Dosage recommendations or prescribing guidance
- Patient-facing healthcare applications
- Professional medical, pharmaceutical, or regulatory use
- Any real-world deployment where incorrect medical information could cause harm

---

## Bias, Risks, and Limitations

This model was developed **solely for educational purposes** and **must not be used in real-world medical or clinical decision-making**.

### Known Limitations

- May hallucinate incorrect drug indications or mechanisms
- Generated descriptions may be incomplete or outdated
- Does not verify outputs against authoritative medical sources
- Does not understand patient context, dosage, or drug interactions
- Output quality is sensitive to prompt phrasing

### Risks

- Misinterpretation of outputs as medical advice
- Overconfidence in fluent but inaccurate responses
- Potential propagation of misinformation if misused

### Recommendations

- Always verify outputs against trusted medical references
- Use only in controlled, non-production environments
- Clearly disclose limitations in any downstream use
- Avoid deployment in safety-critical or healthcare systems

---

## How to Get Started with the Model

This repository contains **LoRA adapter weights**, not a full model; the adapter must be loaded on top of the `microsoft/phi-2` base model. Example usage:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "Gaykar/Phi2-drug_data")
model.eval()

# Drug to evaluate
drug_name = "Paracetamol"

# Build the evaluation prompt
eval_prompt = (
    "Generate exactly ONE sentence describing the drug.\n"
    "Do not include headings or extra information.\n\n"
    f"Drug Name: {drug_name}\n"
    "Description:"
)

# Tokenize the prompt
model_input = tokenizer(eval_prompt, return_tensors="pt").to(model.device)

# Generate output. Greedy decoding (do_sample=False, num_beams=1) is used
# deliberately: this model operates in the medical domain, where factual
# consistency and determinism matter more than linguistic diversity.
with torch.no_grad():
    output = model.generate(
        **model_input,
        do_sample=False,
        num_beams=1,
        max_new_tokens=120,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
    )

# Strip the prompt tokens and decode only the generated continuation
prompt_length = model_input["input_ids"].shape[1]
generated_tokens = output[0][prompt_length:]
generated_text = tokenizer.decode(
    generated_tokens,
    skip_special_tokens=True,
).strip()

# Enforce single-sentence output
if "." in generated_text:
    generated_text = generated_text.split(".")[0] + "."

print("DRUG NAME:", drug_name)
print("MODEL GENERATED DESCRIPTION:")
print(generated_text)

# Example output:
# DRUG NAME: Paracetamol
# MODEL GENERATED DESCRIPTION:
# Paracetamol (acetaminophen) is a non-narcotic analgesic and antipyretic
# used to relieve mild to moderate pain and reduce fever.
```
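For repeated inference, the adapter can optionally be folded into the base weights with PEFT's `merge_and_unload`, so the result loads as a plain `transformers` model. A minimal sketch, continuing from the snippet above (the output directory name is arbitrary):

```python
# Fold the LoRA weights into the base model; the returned object is a
# regular transformers model with no PEFT dependency.
merged_model = model.merge_and_unload()

# Persist the merged weights and tokenizer (directory name is arbitrary).
merged_model.save_pretrained("phi2-drugdata-merged")
tokenizer.save_pretrained("phi2-drugdata-merged")

# Reload later without PEFT:
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained("phi2-drugdata-merged")
```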
---

## Training Details

### Training Data

- **Dataset:** Gaykar/DrugData
- Structured drug name–description pairs
- Used for both CPT (domain adaptation) and SFT (instruction following)

### Training Procedure

#### Continued Pretraining (CPT)

The base model was further trained on domain-relevant medical and drug-related text to improve familiarity with terminology and style. CPT focused on next-token prediction without instruction formatting.

#### Supervised Fine-Tuning (SFT)

After CPT, the model was fine-tuned using instruction-style prompts to generate concise medical descriptions from drug names.

#### Training Hyperparameters

**CPT Hyperparameters**

| Hyperparameter          | Value               |
| ----------------------- | ------------------- |
| Batch size (per device) | 1                   |
| Effective batch size    | 8                   |
| Epochs                  | 4                   |
| Learning rate           | 2e-4                |
| Precision               | FP16                |
| Optimizer               | Paged AdamW (8-bit) |
| Logging steps           | 10                  |
| Checkpoint saving       | Every 500 steps     |
| Checkpoint limit        | 2                   |

**SFT Hyperparameters**

| Hyperparameter          | Value               |
| ----------------------- | ------------------- |
| Batch size (per device) | 4                   |
| Gradient accumulation   | 1                   |
| Effective batch size    | 4                   |
| Epochs                  | 5                   |
| Learning rate           | 2e-5                |
| LR scheduler            | Linear              |
| Warmup ratio            | 6%                  |
| Weight decay            | 1e-4                |
| Max gradient norm       | 1.0                 |
| Precision               | FP16                |
| Optimizer               | Paged AdamW (8-bit) |
| Checkpoint saving       | Every 50 steps      |
| Checkpoint limit        | 2                   |
| Experiment tracking     | Weights & Biases    |

---

## Evaluation

### Testing Data

Drug names sampled from the same dataset were used for evaluation. Outputs were assessed for factual correctness using an external LLM-based evaluation approach.

### Metrics

**Evaluation Method:** LLM-as-a-Judge (ChatGPT, with web search available)

- Binary classification: Factually Correct / Hallucinated
- Three evaluation batches

### Results

**Batch 1**

| Category              | Count | Percentage |
| --------------------- | ----- | ---------- |
| Total Drugs Evaluated | 25    | 100%       |
| Factually Correct     | 24    | 96%        |
| Hallucinated / Failed | 1     | 4%         |

**Batch 2**

| Category              | Count | Percentage |
| --------------------- | ----- | ---------- |
| Total Drugs Evaluated | 25    | 100%       |
| Factually Correct     | 22    | 88%        |
| Hallucinated / Failed | 3     | 12%        |

**Batch 3**

| Category              | Count | Percentage |
| --------------------- | ----- | ---------- |
| Total Drugs Evaluated | 22    | 100%       |
| Factually Correct     | 15    | 68%        |
| Hallucinated / Failed | 7     | 32%        |

#### Summary

Because this model was fine-tuned (CPT + SFT) using LoRA rather than full-parameter fine-tuning, eliminating hallucinations entirely is challenging: LoRA enables efficient training and strong instruction-following behavior, but it does not fully overwrite the base model's internal knowledge. Despite this limitation, the model performs well for educational and research-oriented drug description generation tasks.

---

## Environmental Impact

- **Hardware Type:** NVIDIA T4 GPU
- **Hours used:** Not recorded
- **Cloud Provider:** Google Colab
- **Compute Region:** Not specified
- **Carbon Emitted:** Not estimated

---

## Technical Specifications

### Model Architecture and Objective

- Base model: Microsoft Phi-2
- Objective: Instruction-following text generation
- Adaptation method: LoRA (PEFT)

### Compute Infrastructure

#### Hardware

- NVIDIA T4 GPU

#### Software

- Transformers
- PEFT
- PyTorch

---

## Model Card Contact

Atharva Gaykar

### Framework Versions

- PEFT 0.18.0
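To check that a local environment matches, the installed versions can be printed. Only the PEFT version is recorded above; the exact transformers and torch versions used in training were not logged:

```python
from importlib.metadata import version

# Only PEFT 0.18.0 is recorded in this card; transformers and torch
# versions were not logged, so they are printed here for reference only.
for pkg in ("peft", "transformers", "torch"):
    print(pkg, version(pkg))
```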