🏆 Diagnostic-Reasoning-Q3 (Pentabrid architecture V9): 8B model ranks #3 on MedXpertQA, beating 70B and 671B models

by naturally-intuitive - opened Feb 18

naturally-intuitive

Clinical Reasoning Labs for Medical Diagnostic Accuracy org Feb 18

Pentabrid V9 Evaluation Results

Pentabrid V9, an 8B-parameter model, ranks #3 on MedXpertQA Text among the models listed below, behind only DeepSeek-R1 (671B) and o3-mini (proprietary).

MedXpertQA Text (ICML 2025 Benchmark)

Rank	Model	Parameters	Score
1	DeepSeek-R1	671B	37.8%
2	o3-mini	proprietary	37.3%
3	Pentabrid V9 (ours)	8B	24.9% (609/2450)
4	LLaMA-3.3-70B	70B	24.5%
5	DeepSeek-V3	671B	24.2%
6	Qwen2.5-72B	72B	18.9%

Reasoning subset: 473/1861 (25.4%)
Understanding subset: 136/589 (23.1%)
To our knowledge, the first sub-10B model evaluated on MedXpertQA
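
The subset counts can be cross-checked against the overall score. The following is a minimal Python sketch that uses only the figures reported above; the variable names are illustrative and not part of any evaluation harness.

```python
# Cross-check the MedXpertQA Text subsets against the reported overall score.
# All counts are copied from the results above; names are illustrative.
subsets = {
    "reasoning": (473, 1861),      # (correct, total)
    "understanding": (136, 589),
}

# Pool the two subsets to recover the overall count.
correct = sum(c for c, _ in subsets.values())
total = sum(n for _, n in subsets.values())
overall_pct = round(100 * correct / total, 1)

for name, (c, n) in subsets.items():
    print(f"{name}: {c}/{n} = {100 * c / n:.1f}%")
print(f"overall: {correct}/{total} = {overall_pct}%")
```

The pooled count should match the 609/2450 entry in the table.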

Generation-Based Scores

Benchmark	Score	Accuracy
MedQA (USMLE)	853/1273	67.0%
PubMedQA	695/1000	69.5%
MMLU Clinical Knowledge	226/265	85.3%
MedMCQA	1178/2000	58.9%

Log-Likelihood Scores (7-Benchmark Average: 76.4%)

Benchmark	Score	Accuracy
MMLU Professional Medicine	244/272	89.7%
MMLU Medical Genetics	88/100	88.0%
MMLU Clinical Knowledge	229/265	86.4%
MMLU Anatomy	107/135	79.3%
MedQA (USMLE)	844/1273	66.3%
PubMedQA	333/500	66.6%
MedMCQA	2451/4183	58.6%
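
The post does not say how the 7-benchmark average is computed. The sketch below assumes it is the unweighted mean of the per-benchmark accuracies (a macro average) and contrasts it with a pooled average over all questions. The counts come from the table above; the names are illustrative.

```python
# Log-likelihood results as (correct, total), copied from the table above.
log_likelihood = {
    "MMLU Professional Medicine": (244, 272),
    "MMLU Medical Genetics": (88, 100),
    "MMLU Clinical Knowledge": (229, 265),
    "MMLU Anatomy": (107, 135),
    "MedQA (USMLE)": (844, 1273),
    "PubMedQA": (333, 500),
    "MedMCQA": (2451, 4183),
}

# Per-benchmark accuracy in percent.
accuracies = {name: 100 * c / n for name, (c, n) in log_likelihood.items()}

# Macro average: every benchmark counts equally (assumed to be the headline figure).
macro_avg = sum(accuracies.values()) / len(accuracies)

# Pooled (micro) average: every question counts equally, so large sets dominate.
pooled_correct = sum(c for c, _ in log_likelihood.values())
pooled_total = sum(n for _, n in log_likelihood.values())
micro_avg = 100 * pooled_correct / pooled_total

print(f"macro average: {macro_avg:.1f}%")
print(f"pooled average: {micro_avg:.1f}%")
```

Under this assumption the macro average reproduces the headline figure. The pooled average is lower because MedMCQA, the largest and lowest-scoring set, carries most of the weight.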

Reasoning Tax Analysis

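Generation mode scores higher than log-likelihood scoring on the three benchmarks below; MMLU Clinical Knowledge is the exception (226/265 vs. 229/265). PubMedQA and MedMCQA were scored on different sample sizes in the two modes, so their deltas compare accuracy rates, not matched question sets: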

Benchmark	Generation	Log-Likelihood	Delta
MedQA	853/1273 (67.0%)	844/1273 (66.3%)	+0.7pp (+9 questions)
PubMedQA	695/1000 (69.5%)	333/500 (66.6%)	+2.9pp
MedMCQA	1178/2000 (58.9%)	2451/4183 (58.6%)	+0.3pp
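
The deltas can be recomputed from the raw counts in a single unit. This sketch expresses each one in percentage points and flags the rows where the two modes used different sample sizes. It uses only the counts in the table above; the helper `pct` is a hypothetical name.

```python
# (correct, total) per scoring mode, copied from the table above.
results = {
    "MedQA": {"generation": (853, 1273), "log_likelihood": (844, 1273)},
    "PubMedQA": {"generation": (695, 1000), "log_likelihood": (333, 500)},
    "MedMCQA": {"generation": (1178, 2000), "log_likelihood": (2451, 4183)},
}


def pct(correct, total):
    """Accuracy in percent."""
    return 100 * correct / total


deltas = {}        # generation minus log-likelihood, in percentage points
matched = {}       # True when both modes scored the same number of questions
for name, modes in results.items():
    gen = pct(*modes["generation"])
    ll = pct(*modes["log_likelihood"])
    deltas[name] = round(gen - ll, 1)
    matched[name] = modes["generation"][1] == modes["log_likelihood"][1]
    note = "" if matched[name] else " (different sample sizes)"
    print(f"{name}: {deltas[name]:+.1f}pp{note}")
```

Only MedQA uses the same 1273 questions in both modes, so it is the only row where the difference can also be read as a count of questions.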

Built with the Pentabrid clinical reasoning methodology, which integrates Bayesian likelihood ratios, diagnostic frameworks, and clinical behavior patterns. Fine-tuned from Qwen3-8B.

Dr Adnan Agha
College of Medicine & Health Sciences, United Arab Emirates University
IP Application #2442
