🩹 MedInjection-FR
A French biomedical instruction dataset and model suite for studying how data provenance (native, synthetic, translated) impacts instruction-tuning of LLMs.
📊 Dataset Stats
Total size: 571,436 instruction–response pairs
Components:
- Native: 77,247
- Synthetic: 76,506
- Translated: 417,674
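The relative weight of each provenance source can be worked out from the counts above. The following is a minimal sketch, not part of the official tooling: the names `components`, `stated_total`, and `component_shares` are illustrative assumptions. Note that the three listed components add up to 571,427, nine fewer than the stated total of 571,436, so the shares below are computed against the component sum.

```python
# Component sizes exactly as listed in the stats above.
components = {"native": 77_247, "synthetic": 76_506, "translated": 417_674}
stated_total = 571_436  # total size as stated in the README


def component_shares(counts):
    """Return each component's fraction of the summed component counts."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


shares = component_shares(components)
component_sum = sum(components.values())

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"sum of components: {component_sum:,} (stated total: {stated_total:,})")
```

Under these numbers, translated data makes up roughly 73% of the pairs, with native and synthetic data at about 13% each.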
Tasks:
- MCQU (multiple-choice questions, single correct answer)
- MCQ (multiple-choice questions, multiple correct answers)
- OEQ (open-ended questions)