MedInjection-FR committed "Update README.md"

README.md (changed)
@@ -6,5 +6,22 @@ colorTo: gray
sdk: static
pinned: false
---
# 🩹 MedInjection-FR
A **French biomedical instruction dataset and model suite** for studying how data provenance (**native**, **synthetic**, or **translated**) affects the instruction tuning of LLMs. [huggingface](https://huggingface.co/docs/hub/organizations-cards)

## 📊 Dataset Stats

**Total size**: 571,436 instruction–response pairs

**Components**:

- Native: 77,247
- Synthetic: 76,506
- Translated: 417,674

**Tasks**:

- MCQU (single-answer)
- MCQ (multi-answer)
- OEQ (open-ended)
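The provenance balance in these stats is easier to see as proportions. The sketch below is plain Python arithmetic over the counts listed above. The names `COMPONENTS`, `STATED_TOTAL`, and `provenance_shares` are illustrative only and are not part of any MedInjection-FR tooling. The three listed component counts add up to 571,427, which is nine fewer than the stated total of 571,436, so the sketch computes shares relative to the component sum rather than the stated total.

```python
# Illustrative only: provenance breakdown computed from the counts listed in
# the Dataset Stats section. Names here are hypothetical, not project tooling.
COMPONENTS: dict[str, int] = {
    "native": 77_247,
    "synthetic": 76_506,
    "translated": 417_674,
}
STATED_TOTAL = 571_436  # "Total size" as reported in the stats


def provenance_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each component's fraction of the summed component counts."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    component_sum = sum(COMPONENTS.values())
    # Show the component sum next to the stated total so any gap is visible.
    print(f"sum of components: {component_sum:,} (stated total: {STATED_TOTAL:,})")
    for name, share in provenance_shares(COMPONENTS).items():
        print(f"{name:>10}: {share:.1%}")
```

Based on the listed counts, translated pairs account for roughly 73.1% of the data, while native and synthetic pairs account for about 13.5% and 13.4%, respectively.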
***