| | --- |
| | license: apache-2.0 |
| | datasets: |
| | - raidium/ECNQA_generated_questions |
| | library_name: transformers |
| | tags: |
| | - medical |
| | --- |
| | |
| | # Model Card for Raidium MQG model |
| |
|
| |
|
| | The model is introduced in the paper "Efficient Medical Question Answering with Knowledge-Augmented Question Generation". |
| |
|
| | Paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654) |
| |
|
| | MQG is a transformer language model pre-trained on a series of medical textbooks and on medical questions generated by GPT-4. The weights are initialized with |
| | [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM), then further pre-trained on those datasets. |
| |
|
| | The questions were generated from prompts containing medical data from the textbooks. |
| | They are available here: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions). |
| |
|
| | MQG is designed to be fine-tuned for Medical Question Answering tasks. |
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| |
|
| |  |
| |
|
| | In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain. |
| | Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind. |
| | In this work, we introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach. |
| | We first fine-tune the model on a corpus of medical textbooks. Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model. |
| | We show the benefits of our training strategy on a medical question answering dataset. |
| | The study's findings highlight the potential of small language models in the medical domain when appropriately fine-tuned. |
| |
|
| |
|
| | - **Developed by:** Raidium |
| | - **Model type:** Transformer |
| | - **License:** Apache 2.0 |
| | - **Finetuned from model:** [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM) |
| |
|
| | ### Model Sources |
| |
|
| | - **Repository:** [https://github.com/raidium-med/MQG](https://github.com/raidium-med/MQG) |
| | - **Paper:** [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654) |
| |
|
| | ## Uses |
| |
|
| | ### Direct Use |
| |
|
| | MQG is trained using next-token prediction on generated questions. |
| | It can therefore be used out of the box to generate candidate answers for medical question answering tasks. |
| | However, the generated training questions may contain errors, so we advise fine-tuning the model on your own dataset and using it to rank the candidate answers. |
| |
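A minimal sketch of out-of-the-box generation with the `transformers` library could look like the following. It assumes the checkpoint is published on the Hugging Face Hub under the id `raidium/MQG` (an assumption; this card does not state the id), and the prompt layout in `build_prompt` is hypothetical rather than the format used during training.

```python
def build_prompt(question: str, propositions: list[str]) -> str:
    """Hypothetical prompt layout: question, lettered propositions, answer cue."""
    lines = [f"Question: {question}"]
    for index, proposition in enumerate(propositions):
        lines.append(f"{chr(ord('A') + index)}. {proposition}")
    lines.append("Answer:")
    return "\n".join(lines)


def generate_answer(prompt: str, model_id: str = "raidium/MQG", max_new_tokens: int = 32) -> str:
    """Greedy continuation of the prompt; needs `torch` and `transformers` installed."""
    # Imported lazily so build_prompt stays usable without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)  # model_id is an assumption
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer(build_prompt(question, propositions))` returns free text, which is why ranking a fixed set of propositions after fine-tuning is usually the more reliable route.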
|
| | ### Downstream Use |
| |
|
| | MQG can be fine-tuned for Medical Question Answering tasks. |
| | For multiple-choice questions, a classification head should be appended to the model to rank the proposed answers. |
| |
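One possible way to attach such a head is sketched below, using the generic sequence-classification wrapper from `transformers` with a single output logit per (question, proposition) pair. This is an illustration, not necessarily the head used in the paper; the model id `raidium/MQG` and the way the question and proposition are concatenated are assumptions.

```python
def rank_propositions(scores: list[float]) -> list[int]:
    """Return proposition indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda index: scores[index], reverse=True)


def score_propositions(question: str, propositions: list[str], model_id: str = "raidium/MQG") -> list[float]:
    """Score each (question, proposition) pair with a single-logit head.

    The head is freshly initialized here, so scores are meaningless until fine-tuned.
    """
    # Imported lazily so rank_propositions works without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)  # model_id is an assumption
    model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
    # GPT-2 style tokenizers often lack a padding token; reuse EOS for batching.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.pad_token_id

    texts = [f"{question} {proposition}" for proposition in propositions]
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (num_propositions, 1)
    return logits.squeeze(-1).tolist()
```

After fine-tuning, `rank_propositions(score_propositions(question, propositions))` gives the propositions in order of decreasing score.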
|
| | ### Out-of-Scope Use |
| |
|
| | This model should not be used for tasks outside the medical domain. |
| |
|
| | ## Bias, Risks, and Limitations |
| |
|
| | There is no guarantee that the model answers medical questions correctly. It should only be used for academic purposes, and not in clinical care. |
| |
|
| | ## Training Details |
| |
|
| | ### Training Data |
| |
|
| | The model is trained on a corpus of medical textbooks, and further pre-trained on generated questions: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions). |
| |
|
| | ### Training Procedure |
| |
|
| | MQG is trained using next-token prediction on both datasets. |
| |
|
| | #### Training Hyperparameters |
| |
|
| | - **Training regime:** fp16 mixed-precision training. |
| |
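For readers reproducing this regime with DeepSpeed, fp16 mixed precision is switched on through the `fp16` section of the DeepSpeed configuration. The fragment below is a hypothetical minimal example, not the configuration used to train MQG; every other setting (batch size, optimizer, ZeRO stage) is deliberately left out because this card does not report it.

```python
import json

# Hypothetical fragment: only the fp16 switch reflects this card; all other keys are unset.
ds_config = {"fp16": {"enabled": True}}

# DeepSpeed reads its configuration as JSON; serialize the dict for writing to a file.
ds_config_json = json.dumps(ds_config, indent=2)
```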
|
| | ## Evaluation |
| |
|
| | ### Testing Data, Factors & Metrics |
| |
|
| | #### Testing Data |
| |
|
| | We tested the model on a medical question answering dataset, ECN-QA, based on the French medical residency examination. |
| | It is composed of "single" and "progressive" questions (i.e., a series of related questions). |
| | It is a multiple-choice question dataset, containing 5 propositions for each question. |
| |
|
| | #### Metrics |
| |
|
| | We use accuracy to evaluate the model on Medical Question Answering. |
| |
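As an illustration of the metric, the sketch below computes accuracy as the fraction of questions whose predicted proposition matches the reference. It assumes one predicted index and one reference index per question; the exact scoring of progressive questions is described in the paper and may differ from this simplification.

```python
def accuracy(predictions: list[int], references: list[int]) -> float:
    """Fraction of questions where the predicted index equals the reference index."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)
```

For example, `accuracy([0, 2, 4, 1], [0, 2, 3, 1])` counts three matches out of four questions.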
|
| | ### Results |
| |
|
| | See paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654) |
| |
|
| | ### Model Architecture and Objective |
| |
|
| | The model is based on BioMedLM's architecture, which is a modified GPT-2 architecture. |
| |
|
| | ### Compute Infrastructure |
| |
|
| | #### Hardware |
| |
|
| | The model was trained on the Jean-Zay supercomputer, on multiple nodes with 4 A100 GPUs. |
| |
|
| | #### Software |
| |
|
| | PyTorch, DeepSpeed |
| |
|
| | ## Citation |
| |
|
| |
|
| | **BibTeX:** |
| | ``` |
| | @article{khlaut2024efficient, |
| | title={Efficient Medical Question Answering with Knowledge-Augmented Question Generation}, |
| | author={Khlaut, Julien and Dancette, Corentin and Ferreres, Elodie and Bennani, Alaedine and H{\'e}rent, Paul and Manceron, Pierre}, |
| | journal={Clinical NLP Workshop, NAACL 2024}, |
| | year={2024} |
| | } |
| | ``` |
| |
|
| | ## Model Card Contact |
| |
|
| | julien.khlaut at raidium.fr |
| |
|