---
license: apache-2.0
datasets:
- qiaojin/PubMedQA
- MedAI-COS30018/PubMedQA-map
- MedAI-COS30018/PubmedQA-u
- MedAI-COS30018/PubMedQA-l
- MedAI-COS30018/HealthCareMagic
- MedAI-COS30018/iCliniq
language:
- en
base_model:
- medalpaca/medalpaca-7b
- google/medgemma-27b-it
pipeline_tag: question-answering
metrics:
- bertscore: 0.8441
tags:
- medical
- knowledge-distillation
---

# Model Card

## Model Description

**MedSwin-7B-KD** is a high-performance 7B-parameter language model for medical question answering and clinical reasoning. It was created by applying a novel **Dual-Phase Knowledge Distillation (KD)** pipeline to the `medalpaca/medalpaca-7b` base model.

Unlike its SFT predecessor, this model leverages the superior knowledge and reasoning capabilities of the larger `google/medgemma-27b-it` model as a "teacher" to guide the training of the smaller, more efficient "student" model. The result is a compact model that captures much of the clinical acumen of its far larger counterpart.

- **Developed by:** Medical AI Team, Swinburne University of Technology
- **Funded by:** [Swinburne University of Technology](https://www.swinburne.edu.au)
- **Base Model (Student):** [medalpaca/medalpaca-7b](https://huggingface.co/medalpaca/medalpaca-7b)
- **Teacher Model:** [google/medgemma-27b-it](https://huggingface.co/google/medgemma-27b-it)
- **Language(s):** English
- **License:** Apache 2.0

### Intended Use

This model is intended for research purposes in the following domains:

* AI-assisted medicine and clinical decision support research.
* Biomedical natural language processing (NLP).
* Exploration of efficient knowledge distillation and model compression in specialized domains.
* Generating high-quality, clinically grounded synthetic data.

## Training Data

The model was trained on the same curated and augmented collection of medical QA datasets as the SFT version, but the *target outputs* were generated by the teacher model.

- **PubMedQA**: Original and processed (map, u, l) variants for factoid and research-oriented questions.
- **HealthCareMagic** & **iCliniq**: Real-world patient-doctor interactions from online portals.
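All source datasets are hosted on the Hugging Face Hub and can be inspected directly. The following is a minimal sketch using the dataset IDs listed above; the `pqa_labeled` configuration and the splits shown are assumptions, not the exact setup used to build the distillation corpus.

```python
from datasets import load_dataset

# Illustrative only: configuration names and splits below are assumptions.
pubmedqa = load_dataset("qiaojin/PubMedQA", "pqa_labeled")        # research-oriented QA
healthcaremagic = load_dataset("MedAI-COS30018/HealthCareMagic")  # patient-doctor dialogues
icliniq = load_dataset("MedAI-COS30018/iCliniq")                  # patient-doctor dialogues

# Inspect one raw example before augmentation and teacher generation
print(pubmedqa["train"][0])
```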
### Data Curation & Knowledge Distillation Pipeline

The training pipeline was fundamentally redesigned to center on knowledge distillation, moving beyond simple paraphrasing to focus on transferring deep reasoning patterns.

| Stage | Purpose | Methodology & Quality Control |
| :--- | :--- | :--- |
| **A. Augmented Query Generation** | Create a diverse set of high-quality input prompts. | Utilizes the same multi-model paraphrasing, back-translation, and style standardization pipeline from the SFT model to generate a rich variety of instructions and inputs. |
| **B. Teacher Forcing & Output Generation** | Generate "gold-standard" responses using the superior teacher model. | **Teacher Model:** `google/medgemma-27b-it`.<br>**Generation Strategy:** Low-temperature sampling with contrastive decoding to produce confident, factually dense, and well-structured answers.<br>**Input:** The entire augmented set of `(Instruction, Input)` pairs from Stage A. |
| **C. Response Filtering & Alignment** | Ensure the teacher's outputs are of the highest quality for student training. | **Factual Consistency Check:** Cross-referencing key medical claims against the original context.<br>**Style Alignment:** Enforcing a neutral, professional clinical tone.<br>**Complexity Pruning:** Removing outputs that are overly verbose or rely on reasoning chains too complex for the student model to learn effectively. |
| **D. Dual-Phase Knowledge Distillation** | Transfer knowledge from teacher to student. | **Phase 1 (Response Mimicking):** The student model is trained to directly reproduce the teacher's filtered outputs, learning its style and factual presentation.<br>**Phase 2 (Logit Matching):** The student is trained to align its internal probability distributions (logits) with the teacher's for the same input, capturing the teacher's "thinking process" and confidence calibration (see the sketch after this table). |
| **E. Quality Assurance** | Ensure the final training pairs are optimal for distillation. | **E1. Data Cleaning:** PHI removal; MD5-based deduplication.<br>**E2. KD-Specific Validation:** Checking for alignment between query complexity and response depth; ensuring student-trainable reasoning patterns. |
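The two distillation phases can be read as two loss terms: a standard cross-entropy loss on the teacher's filtered outputs (Phase 1) and a temperature-scaled KL-divergence loss between student and teacher token distributions (Phase 2). The PyTorch sketch below is illustrative only: the `temperature` and `alpha` values are assumptions, and it presumes the student and teacher logits have already been aligned to a shared vocabulary, which this card does not detail for the MedAlpaca (LLaMA) vs. MedGemma tokenizer mismatch.

```python
import torch.nn.functional as F

def dual_phase_kd_loss(student_logits, teacher_logits, labels,
                       temperature=2.0, alpha=0.5):
    """Illustrative sketch combining response mimicking (CE) and logit matching (KL)."""
    # Phase 1 - Response Mimicking: cross-entropy against the teacher-generated tokens.
    ce_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,  # mask prompt/padding positions
    )
    # Phase 2 - Logit Matching: KL divergence between temperature-softened distributions.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # standard rescaling so gradients are comparable across temperatures
    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

In a dual-phase schedule, Phase 1 corresponds to training with `alpha = 0` (pure response mimicking), after which the logit-matching term is introduced on the same inputs.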
## Output Format

All training data was formatted into the same standardized SFT structure, but the outputs are now teacher-generated:

```
### Instruction:
{Task descriptor and/or user question with context}

### Input:
{Additional user question or context, if any}

### Output:
{The teacher model's (MedGemma-27b) target response}
```

Each data point includes metadata tags for its augmentation source and a `distilled_from: medgemma-27b` tag.

## Usage

You can load and use the model with the Hugging Face `transformers` library exactly as you would the SFT version, with potentially improved performance.

```python
import transformers

model_id = "MedAI-COS30018/MedSwin-7B-KD"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    device_map="auto",  # Use GPU if available
)

# Format your input according to the training template
instruction = "Based on the provided context, what is the most likely diagnosis?"
context = "A 45-year-old male presents with acute, crushing substernal chest pain radiating to the left arm, associated with diaphoresis and nausea for the past hour."
formatted_prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Output:\n"

# Generate a response
sequences = pipeline(
    formatted_prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.3,
    top_p=0.9,
    eos_token_id=pipeline.tokenizer.eos_token_id,
)
print(sequences[0]['generated_text'])
```

## Bias, Risks, and Limitations

The model inherits and may amplify biases present in its base model, teacher model, and training data. These can include:

* **Demographic Biases:** Biases related to race, gender, age, or socioeconomic status based on patterns in the source data.
* **Clinical Biases:** Potential over-representation of certain conditions, treatments, or clinical perspectives.
* **Factual Accuracy:** While the teacher model is highly capable, it is not infallible. The distilled model may propagate or even amplify any errors made by the teacher. It is not a certified medical knowledge base and can generate incorrect or outdated information.
* **Safe Deployment:** Use a **Human-in-the-Loop** (HITL) system for any real-world application. Outputs **must** be verified by a qualified healthcare professional. **Do not use for direct patient care without rigorous clinical validation.**

## Technical Specifications & Evaluation

* **Model Architecture:** Based on LLaMA, fine-tuned via Dual-Phase Knowledge Distillation.
* **Model Size:** 7 billion parameters.
* **Teacher Model Size:** 27 billion parameters.
* **Input Format:** Instruction-Input-Output structure.
* **Key Metric:**
  * **BERTScore (F1):** 0.8441 (a minimal scoring sketch appears at the end of this card).
  * [Benchmark Dataset](https://huggingface.co/datasets/MedSwin/MedQuAD_Benchmark)
  * [Benchmark Logs](https://github.com/MedSwin/Finetuning/tree/main/benchmarks/MedQuAD_benchmark_runs)

> Review the full set of benchmark metrics in the [Benchmark Document Preview](https://hackmd.io/@ngFNmXW1RVOfNb7b3NYBJg/model_review).
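For reference, BERTScore F1 can be computed with the Hugging Face `evaluate` library. This is a minimal sketch only: the prediction and reference strings below are invented examples, and the exact scorer model and aggregation behind the reported benchmark figure are not specified in this card.

```python
# Minimal BERTScore F1 example; not the exact benchmark configuration.
import evaluate

bertscore = evaluate.load("bertscore")
predictions = ["The presentation is most consistent with an acute myocardial infarction."]
references = ["Acute myocardial infarction is the most likely diagnosis given the history."]

results = bertscore.compute(predictions=predictions, references=references, lang="en")
print(sum(results["f1"]) / len(results["f1"]))  # mean F1 over examples
```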