hamsinimk committed on
Commit 3354519 · verified · 1 Parent(s): 03875b1

Update README.md

Files changed (1):
  1. README.md (+1 −1)
README.md CHANGED
@@ -39,7 +39,7 @@ Based on previous experimentation with LoRA, I think the method does decently we
  | mistralai/Mistral-7B-Instruct-v0.2 | 0.77 | 0.37 | 0.77 | 0.85 |
  | | | | | |

- To benchmark the model, a general, medical reasoning, and summarization specific benchmarks were utilized. For the general benchmark, Massive Multitask Language Understanding (MMLU) Philosophy (Caballar & Stryker, 2025) was chosen and for the summarization specific benchmark, Extreme Summarization (XSum) was used to evaluate the model’s ability to generate effective summaries/abstracts when given long inputs that may be unstructured or have lots of technical language (Narayan et al., 2018). My general benchmark plan was as follows:
+ To benchmark the model, general, medical reasoning, and summarization specific benchmarks were utilized. For the general benchmark, Massive Multitask Language Understanding (MMLU) Philosophy (Caballar & Stryker, 2025) was chosen and for the summarization specific benchmark, Extreme Summarization (XSum) was used to evaluate the model’s ability to generate effective summaries/abstracts when given long inputs that may be unstructured or have lots of technical language (Narayan et al., 2018). My general benchmark plan was as follows:

  medqa_4options (Domain / task specific): Assess model’s ability to perform medical reasoning when given complex multiple choice questions and be able to understand medical information