MaziyarPanahi 
posted an update 2 days ago
🚨 Day 8/8: OpenMed Medical Reasoning Dataset Release - THE GRAND FINALE

Today I complete my 8-day release series with Medical-Reasoning-SFT-Mega.
It is the largest open medical reasoning dataset, combining outputs from 7 state-of-the-art AI models with fair-distribution deduplication.

THE 7 SOURCE MODELS (Original Sample Counts):

1. Trinity-Mini: 810,284 samples
2. Qwen3-Next-80B: 604,249 samples
3. GPT-OSS-120B: 506,150 samples
4. Nemotron-Nano-30B: 444,544 samples
5. GLM-4.5-Air: 225,179 samples
6. MiniMax-M2.1: 204,773 samples
7. Baichuan-M3-235B: 124,520 samples

TOTAL BEFORE DEDUPLICATION: 2,919,699 samples
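
A quick sanity check on the numbers above, as a minimal sketch: it uses only the per-model counts listed in this post (nothing is read from the dataset itself) and shows how the pre-deduplication total and each model's share of it can be recomputed.

```python
# Original sample counts per source model, copied from the list above.
counts = {
    "Trinity-Mini": 810_284,
    "Qwen3-Next-80B": 604_249,
    "GPT-OSS-120B": 506_150,
    "Nemotron-Nano-30B": 444_544,
    "GLM-4.5-Air": 225_179,
    "MiniMax-M2.1": 204_773,
    "Baichuan-M3-235B": 124_520,
}

# Sum of all per-model counts, i.e. the total before deduplication.
total = sum(counts.values())

# Fraction of the pre-deduplication pool contributed by each model.
shares = {name: n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name:<20} {share:6.1%}")
print(f"{'TOTAL':<20} {total:,}")
```

These shares describe the pool before deduplication only; the post does not give per-model counts for the final Mega dataset.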

TOKEN COUNTS:
- Content tokens: 2.22 Billion
- Reasoning tokens: 1.56 Billion
- Total tokens: 3.78 Billion
- Samples with chain-of-thought: 100%
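
The token figures can be cross-checked the same way. This is only a sketch based on the rounded numbers in this post, so the share it prints is approximate.

```python
# Rounded token counts as stated in the post.
content_tokens = 2.22e9
reasoning_tokens = 1.56e9

# Total tokens and the fraction of them that is chain-of-thought reasoning.
total_tokens = content_tokens + reasoning_tokens
reasoning_share = reasoning_tokens / total_tokens

print(f"total: {total_tokens / 1e9:.2f}B, reasoning share: {reasoning_share:.1%}")
```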

Quick Start:
from datasets import load_dataset
ds = load_dataset("OpenMed/Medical-Reasoning-SFT-Mega")
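
If you only want to look at a few rows before committing to a full download, something like the sketch below may help. It relies on the `streaming=True` option of `load_dataset`; the split name `"train"` is an assumption (check the dataset card), and `peek` and `stream_mega` are hypothetical helper names, not part of the release.

```python
from itertools import islice

DATASET_ID = "OpenMed/Medical-Reasoning-SFT-Mega"


def peek(rows, n=3):
    """Return the first n rows of any iterable of rows."""
    return list(islice(rows, n))


def stream_mega(split="train"):
    """Open the dataset in streaming mode (split name is an assumption)."""
    # Imported lazily so peek() can be used without the library installed.
    from datasets import load_dataset

    # streaming=True iterates over the data without downloading it all first.
    return load_dataset(DATASET_ID, split=split, streaming=True)


# Example (needs network access and the `datasets` package):
# for row in peek(stream_mega()):
#     print(sorted(row.keys()))
```

Printing the keys first is deliberate: the column names are not listed in this post, so it is safer to inspect them than to guess.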


All datasets are Apache 2.0 licensed and free for research and commercial use.

Thank you for following OpenMed's release series. I can't wait to see what you build. 🔥

OpenMed/Medical-Reasoning-SFT-Mega
OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B-V2
OpenMed/Medical-Reasoning-SFT-Trinity-Mini
OpenMed/Medical-Reasoning-SFT-GLM_4.5_Air
OpenMed/Medical-Reasoning-SFT-MiniMax-M2.1
OpenMed/Medical-Reasoning-SFT-Qwen3-Next-80B
OpenMed/Medical-Reasoning-SFT-Nemotron-Nano-30B
https://huggingface.co/datasets/OpenMed/Medical-Reasonin

https://huggingface.co/collections/OpenMed/medical-datasets

Please follow OpenMed 🤗

I love it, that's really very helpful.

I want to train my LLMs on all of your medical datasets. Does OpenMed/Medical-Reasoning-SFT-Mega cover all of the medical datasets that were released separately per model, like Qwen, Nemotron, Baichuan, and GPT-OSS? Would you like to join our team? We are looking for people just like you!

You should also generate medical datasets with GLM 4.7, plus non-reasoning data from GPT 5.2 with 1k samples in structured, detailed output. Alternatively, we can provide you the OpenAI tools for free if you work with our team, so there is no need to spend your own money on API costs. My username is ujjwal_tyagi.shirova on Discord, so you can contact me there.

I always wanted to make my own model. Interesting datasets like these will allow me to make my first model, thanks!

You are welcome! Please follow OpenMed for future releases! 🤗