Add library_name metadata and link to code
#1
by nielsr HF Staff - opened

README.md CHANGED

````diff
@@ -1,21 +1,27 @@
 ---
-license: apache-2.0
-language:
-- en
-tags:
-- science
-- hypothesis-generation
-- biomedical
-- deepseek
-- qwen2
 base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+language:
+- en
+license: apache-2.0
 pipeline_tag: text-generation
+library_name: transformers
+tags:
+- science
+- hypothesis-generation
+- biomedical
+- deepseek
+- qwen2
 ---
 
 # MOOSE-Star-HC-R1D-7B
 
 **MOOSE-Star Hypothesis Composition model** — a 7B model fine-tuned for generating scientific hypotheses from research questions, background surveys, and cross-paper inspirations.
 
+This model was introduced in the paper [MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier](https://arxiv.org/abs/2603.03756).
+
+- **Code**: [ZonglinY/MOOSE-Star](https://github.com/ZonglinY/MOOSE-Star)
+- **Paper**: [arXiv:2603.03756](https://arxiv.org/abs/2603.03756)
+
 > **Note**: This model is referred to as **MS-HC-7B (w/ 1x bounded)** in the paper. The full name includes "R1D" to indicate it is fine-tuned from DeepSeek-R1-Distill-Qwen-7B; the SFT data can be used to train other base models as well.
 
 ## Model Description
@@ -24,7 +30,6 @@ pipeline_tag: text-generation
 - **Training Method**: Full-parameter SFT (ZeRO-3)
 - **Training Data**: [TOMATO-Star-SFT-Data-R1D-32B](https://huggingface.co/datasets/ZonglinY/TOMATO-Star-SFT-Data-R1D-32B) HC split (114,548 samples = 96,879 normal + 17,669 bounded, mixed 1x)
 - **Teacher Model**: Training data generated via rejection sampling with [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B)
-- **Paper**: [MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier](https://arxiv.org/abs/2603.03756)
 
 ## Training Configuration
 
@@ -251,10 +256,11 @@ Scores on a rubric scale. "Total" aggregates Motivation (Mot), Mechanism (Mec),
 @article{yang2025moosestar,
 title={MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier},
 author={Yang, Zonglin and Bing, Lidong},
+journal={arXiv preprint arXiv:2603.03756},
 year={2025}
 }
 ```
 
 ## License
 
-This model is released under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license.
+This model is released under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license.
````
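The reordered metadata block in this PR is a flat YAML frontmatter of scalar keys and string lists. As an illustrative sanity check, here is a minimal hand-rolled sketch that parses exactly that subset; it is not the Hub's actual parser, and real tooling should use a full YAML library such as PyYAML.

```python
# Minimal parser for the flat key/list YAML subset used in the model-card
# frontmatter above. Illustrative only; nested structures are unsupported.
def parse_frontmatter(text: str) -> dict:
    meta = {}
    current_key = None
    for line in text.strip().splitlines():
        if line == "---":
            continue  # frontmatter delimiters
        if line.startswith("- ") and current_key is not None:
            # List item belonging to the most recent bare key (e.g. tags:)
            meta.setdefault(current_key, []).append(line[2:].strip())
        else:
            key, _, value = line.partition(":")
            current_key = key.strip()
            if value.strip():
                meta[current_key] = value.strip()
    return meta

# The updated frontmatter exactly as it appears after this PR.
FRONTMATTER = """\
---
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- science
- hypothesis-generation
- biomedical
- deepseek
- qwen2
---
"""

meta = parse_frontmatter(FRONTMATTER)
print(meta["library_name"])  # transformers
print(meta["language"])      # ['en']
```

The `setdefault` call keeps list accumulation to one line; a bare key (no value after the colon) simply primes `current_key` for the `- item` lines that follow it.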