Update README.md
README.md CHANGED

@@ -39,7 +39,7 @@ Based on previous experimentation with LoRA, I think the method does decently we
| mistralai/Mistral-7B-Instruct-v0.2 | 0.77 | 0.37 | 0.77 | 0.85 |
| | | | | |

-To benchmark the model,
+To benchmark the model, general, medical-reasoning, and summarization-specific benchmarks were used. For the general benchmark, Massive Multitask Language Understanding (MMLU) Philosophy (Caballar & Stryker, 2025) was chosen, and for the summarization-specific benchmark, Extreme Summarization (XSum) was used to evaluate the model's ability to generate effective summaries/abstracts from long inputs that may be unstructured or contain heavy technical language (Narayan et al., 2018). My general benchmark plan was as follows:

medqa_4options (Domain / task specific): Assess the model's ability to perform medical reasoning on complex multiple-choice questions and to understand medical information

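For concreteness, the plan above could be run with EleutherAI's lm-evaluation-harness, which ships tasks matching the benchmark names used here (`mmlu_philosophy`, `medqa_4options`, `xsum`). This is only a sketch under that assumption: the diff does not name a runner, and task availability varies by harness version.

```python
# Sketch: running the benchmark plan with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). The harness choice, task names, and batch size are
# assumptions -- the README names the benchmarks but not a runner.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=mistralai/Mistral-7B-Instruct-v0.2",
    tasks=["mmlu_philosophy", "medqa_4options", "xsum"],  # general / medical / summarization
    batch_size=8,
)
# Per-task metrics (e.g., accuracy for MMLU/MedQA, ROUGE for XSum)
print(results["results"])
```

The equivalent CLI invocation would be `lm_eval --model hf --model_args pretrained=mistralai/Mistral-7B-Instruct-v0.2 --tasks mmlu_philosophy,medqa_4options,xsum --batch_size 8`.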