---
language:
- en
license: apache-2.0
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- scientific-discovery
- hypothesis-generation
- inspiration-retrieval
- multi-task
datasets:
- ZonglinY/TOMATO-Star-SFT-Data-R1D-32B
library_name: transformers
pipeline_tag: text-generation
---

# MOOSE-Star-R1D-7B Model Card

## Overview

**MOOSE-Star-R1D-7B** (referred to as **MS-7B** in the paper) is a 7B-parameter multi-task language model fine-tuned for both **inspiration retrieval (IR)** and **hypothesis composition (HC)** in scientific discovery workflows. It matches the IR accuracy of the single-task model ([MOOSE-Star-IR-R1D-7B](https://huggingface.co/ZonglinY/MOOSE-Star-IR-R1D-7B)) while significantly outperforming the single-task HC model ([MOOSE-Star-HC-R1D-7B](https://huggingface.co/ZonglinY/MOOSE-Star-HC-R1D-7B)), all in a single unified model.

- **Paper**: [MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier](https://arxiv.org/abs/2603.03756) (arXiv:2603.03756)
- **Base Model**: [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)
- **License**: Apache 2.0
- **Code**: [ZonglinY/MOOSE-Star](https://github.com/ZonglinY/MOOSE-Star)

## Model Description

| | Parameter | Value | |
| |-----------|-------| |
| | **Base Model** | DeepSeek-R1-Distill-Qwen-7B | |
| | **Training Method** | Full-parameter SFT (ZeRO-3) | |
| | **Training Data** | TOMATO-Star-SFT-Data-R1D-32B: IR split (150,218 samples) + HC split with 1x bounded (114,548 samples) | |
| | **Chat Template** | deepseekr1 | |
| | **Cutoff Length** | 16384 | |
| | **Learning Rate** | 1e-5 | |
| | **Epochs** | 1 | |
| | **Batch Size** | 128 | |
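
The hyperparameters above (chat template name, cutoff length, full-parameter SFT with ZeRO-3) are the kind exposed by a LLaMA-Factory-style SFT config. A hypothetical sketch of such a config follows; the file layout, dataset key, and DeepSpeed config path are assumptions for illustration, not taken from the MOOSE-Star repo:

```yaml
# Hypothetical LLaMA-Factory-style SFT config mirroring the table above.
# All keys and paths are illustrative; consult the MOOSE-Star repo for the real setup.
model_name_or_path: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
stage: sft
finetuning_type: full
deepspeed: ds_z3_config.json        # ZeRO-3
template: deepseekr1
cutoff_len: 16384
dataset: tomato_star_sft_ir_hc       # assumed dataset registry key
learning_rate: 1.0e-5
num_train_epochs: 1
per_device_train_batch_size: 1       # global batch of 128 via accumulation x GPUs
gradient_accumulation_steps: 16
```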

## Task 1: Inspiration Retrieval (IR)

The model selects the most relevant **cross-paper inspiration** from a pool of 15 candidates (A-O) that contains 1 correct inspiration and 14 hard negatives.

### IR Prompt Format (Simplified Overview)

The full prompt template is constructed via `instruction_prompts()` in the code examples below. The general structure is:

```
[Task instruction preamble]

## Context

**Research Question:**
{research_question}

**Background Survey (existing methods for THIS task):**
{background_survey}

**Previous Hypothesis (if any):**
{previous_hypothesis_or_none}

## Candidate Inspiration Papers

### Candidate [A]
**Title:** {title_A}
**Abstract:** {abstract_A}

... (15 candidates total, A through O)

## Output Format

<think>
[reasoning process]
</think>

**Selected ID starts:** [X] **Selected ID ends**

**Selection Reason starts:** [reason] **Selection Reason ends**
```

### IR Usage

**Prerequisites**: Clone the [MOOSE-Star repo](https://github.com/ZonglinY/MOOSE-Star) for prompt templates and inference utilities:

```bash
git clone https://github.com/ZonglinY/MOOSE-Star.git && cd MOOSE-Star
# See requirements.txt for full dependencies; at minimum: pip install transformers torch
```

#### Option A: SGLang Deployment (Recommended)

```bash
# SGLang requires a separate environment; see https://github.com/sgl-project/sglang for installation
# Start the server
python -m sglang.launch_server --model-path ZonglinY/MOOSE-Star-R1D-7B --port 1235
```

```python
import sys
sys.path.insert(0, "./Inference")
from ir_probability_extractor import IRProbabilityExtractor

extractor = IRProbabilityExtractor(base_urls=["http://localhost:1235/v1"])
result = extractor.get_selection_probabilities(
    research_question="Your research question",
    background_survey="Your background survey",
    candidates=[
        {"title": "Candidate A title", "abstract": "Candidate A abstract"},
        {"title": "Candidate B title", "abstract": "Candidate B abstract"},
        # ... up to 15 candidates (labeled A-O)
    ],
)
print(f"Selected: [{result.selected_label}]")
print(f"Probabilities: {result.probabilities}")
```
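
The returned probabilities can also be used beyond top-1 selection, e.g. to shortlist candidates for downstream hypothesis composition. A minimal sketch, assuming `result.probabilities` is a dict mapping candidate labels to floats (the values below are illustrative, not real model output):

```python
# Rank candidate labels by selection probability (illustrative values).
# Assumes the extractor returns a dict like {"A": 0.62, "B": 0.21, ...}.
probabilities = {"A": 0.62, "B": 0.21, "C": 0.09, "D": 0.08}

def top_k(probs, k=3):
    """Return the k labels with the highest selection probability."""
    return sorted(probs, key=probs.get, reverse=True)[:k]

print(top_k(probabilities, k=2))  # ['A', 'B']
```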

#### Option B: Direct HuggingFace Inference

```python
import sys
sys.path.insert(0, "./utils")
from prompt_store import instruction_prompts
from transformers import AutoModelForCausalLM, AutoTokenizer
import re

model_name = "ZonglinY/MOOSE-Star-R1D-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

p = instruction_prompts("inspiration_retrieval_with_reasoning_with_alphabetical_candidates")

candidates = [
    {"title": "...", "abstract": "..."},
    # ... up to 15 candidates (labeled A-O)
]
candidates_text = "".join(
    f"### Candidate [{chr(ord('A') + i)}]\n**Title:** {c['title']}\n**Abstract:** {c['abstract']}\n\n"
    for i, c in enumerate(candidates)
)

research_question = "Your research question"
background_survey = "Your background survey"
prompt = (p[0] + research_question
          + p[1] + background_survey
          + p[2] + "No previous hypothesis."
          + p[3] + candidates_text
          + p[4])

messages = [{"role": "user", "content": prompt}]
formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
formatted += "<\uff5cAssistant\uff5c>"  # manually append the deepseekr1 assistant tag

inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8192, temperature=0.6, top_p=0.9, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

match = re.search(r"\*\*Selected ID starts:\*\*\s*\[(\w)\]\s*\*\*Selected ID ends\*\*", response)
if match:
    print(f"Selected: [{match.group(1)}]")
```
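
Both delimited fields of the IR output format can be recovered the same way. A small self-contained helper (the sample `response` string below is illustrative, not real model output):

```python
import re

def parse_ir_response(response):
    """Extract the selected candidate ID and the selection reason from the
    delimited IR output format; returns (id, reason), with None for missing fields."""
    id_match = re.search(r"\*\*Selected ID starts:\*\*\s*\[(\w)\]\s*\*\*Selected ID ends\*\*", response)
    reason_match = re.search(
        r"\*\*Selection Reason starts:\*\*\s*(.*?)\s*\*\*Selection Reason ends\*\*",
        response, re.DOTALL,
    )
    selected = id_match.group(1) if id_match else None
    reason = reason_match.group(1) if reason_match else None
    return selected, reason

sample = ("**Selected ID starts:** [C] **Selected ID ends**\n\n"
          "**Selection Reason starts:** Candidate C introduces the key mechanism. "
          "**Selection Reason ends**")
print(parse_ir_response(sample))  # ('C', 'Candidate C introduces the key mechanism.')
```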

## Task 2: Hypothesis Composition (HC)

The model generates **delta hypotheses** from inspiration papers: given a research question, a background survey, and a new inspiration paper, it outputs structured hypothesis components.

### HC Prompt Format (Simplified Overview)

The full prompt template is constructed via `instruction_prompts()` in the code examples below. The general structure is:

```
[Task instruction preamble]

## Information Provided

**Research Question**:
{research_question}

**Background Survey**:
{background_survey}

**Previous Hypothesis**:
{previous_hypothesis_or_none}

**New Inspiration Paper Title**:
{inspiration_title}

**New Inspiration Paper Abstract**:
{inspiration_abstract}

## Your Response

<think>
[reasoning process]
</think>

Inspiration: [Key concept]
- Motivation (WHY): [Why this addresses a gap]
- Mechanism (HOW IT WORKS): [How the concept works]
- Methodology (HOW IT'S INTEGRATED): [Implementation steps]
```

### HC Usage

```python
import sys
sys.path.insert(0, "./utils")
from prompt_store import instruction_prompts
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ZonglinY/MOOSE-Star-R1D-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

p = instruction_prompts("prepare_HC_sft_data_to_go_comprehensive_v2_delta")

research_question = "Your research question here"
background_survey = "Your background survey here"
inspiration_title = "Inspiration paper title"
inspiration_abstract = "Inspiration paper abstract"

prompt = (p[0] + research_question
          + p[1] + background_survey
          + p[2] + "No previous hypothesis."
          + p[3] + inspiration_title
          + p[4] + inspiration_abstract
          + p[5])

messages = [{"role": "user", "content": prompt}]
formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
formatted += "<\uff5cAssistant\uff5c>"  # manually append the deepseekr1 assistant tag

inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8192, temperature=0.6, top_p=0.9, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
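
The structured HC output can be split into its labeled components after stripping the `<think>` block. A minimal post-processing sketch following the output format above (the sample text is illustrative, not real model output):

```python
import re

def parse_hc_response(response):
    """Strip the <think> block and pull out the labeled hypothesis
    components from the HC output format."""
    body = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    patterns = {
        "inspiration": r"Inspiration:\s*(.*)",
        "motivation": r"-\s*Motivation \(WHY\):\s*(.*)",
        "mechanism": r"-\s*Mechanism \(HOW IT WORKS\):\s*(.*)",
        "methodology": r"-\s*Methodology \(HOW IT'S INTEGRATED\):\s*(.*)",
    }
    fields = {}
    for name, pat in patterns.items():
        m = re.search(pat, body)
        fields[name] = m.group(1).strip() if m else None
    return fields

sample = """<think>some reasoning</think>
Inspiration: Contrastive pretraining
- Motivation (WHY): Addresses the labeled-data gap
- Mechanism (HOW IT WORKS): Aligns paired views in embedding space
- Methodology (HOW IT'S INTEGRATED): Pretrain, then fine-tune on the task
"""
parsed = parse_hc_response(sample)
print(parsed["inspiration"])  # Contrastive pretraining
```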

## Evaluation Results

### Inspiration Retrieval (Table 1)

| | Model | Accuracy | |
| |-------|----------| |
| | Random Selection | 6.70% | |
| | R1-Distilled-Qwen-7B (base) | 28.42% | |
| | MS-IR-7B (single-task) | 54.37% | |
| | **MS-7B (this model)** | **54.34%** | |

### Hypothesis Composition - Normal (Table 2)

Rubric-based evaluation with ground-truth inspirations (Judge: GPT-4o):

| Model | Total | Motivation | Mechanism | Methodology | Length |
|-------|-------|------------|-----------|-------------|--------|
| | R1-Distilled-Qwen-7B (base) | 4.05 | 1.96 | 1.30 | 0.80 | 231.02 | |
| | MS-HC-7B (single-task) | 4.68 | 2.13 | 1.46 | 1.09 | 204.12 | |
| | MS-HC-7B w/ 1x bounded | 4.74 | 2.16 | 1.48 | 1.10 | 203.84 | |
| | **MS-7B (this model)** | **5.02** | **2.22** | **1.59** | **1.20** | 208.98 | |

### Hypothesis Composition - Bounded (Table 3)

Performance under varying levels of inspiration noise (Judge: GPT-4o):

| | Model | Easy Total | Medium Total | Hard Total | |
| |-------|-----------|-------------|-----------| |
| | R1-Distilled-Qwen-7B (base) | 2.72 | 2.27 | 2.00 | |
| | MS-HC-7B w/ 2x bounded | 3.18 | 2.74 | 2.56 | |
| | **MS-7B (this model)** | **3.37** | **2.86** | **2.78** | |

## Key Findings

- **IR performance preserved**: Multi-task training maintains full IR accuracy (54.34% vs. 54.37% single-task)
- **HC significantly improved**: Multi-task HC outperforms all single-task variants, including those with bounded-composition augmentation
- **Robust under noise**: Largest improvements on Hard bounded composition, suggesting IR reasoning skills transfer to HC

## Citation

```bibtex
@article{yang2025moosestar,
  title={MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier},
  author={Yang, Zonglin and Bing, Lidong},
  journal={arXiv preprint arXiv:2603.03756},
  year={2026}
}
```