---
language:
- en
license: apache-2.0
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- scientific-discovery
- hypothesis-generation
- inspiration-retrieval
- multi-task
datasets:
- ZonglinY/TOMATO-Star-SFT-Data-R1D-32B
library_name: transformers
pipeline_tag: text-generation
---

# MOOSE-Star-R1D-7B Model Card

## Overview

**MOOSE-Star-R1D-7B** (referred to as **MS-7B** in the paper) is a 7B-parameter multi-task language model fine-tuned for both **inspiration retrieval (IR)** and **hypothesis composition (HC)** in scientific discovery workflows. It matches the IR accuracy of the single-task model ([MOOSE-Star-IR-R1D-7B](https://huggingface.co/ZonglinY/MOOSE-Star-IR-R1D-7B)) while significantly outperforming the single-task HC model ([MOOSE-Star-HC-R1D-7B](https://huggingface.co/ZonglinY/MOOSE-Star-HC-R1D-7B)), all in a single unified model.

- **Paper**: [MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier](https://arxiv.org/abs/2603.03756) (arXiv:2603.03756)
- **Base Model**: [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)
- **License**: Apache 2.0
- **Code**: [ZonglinY/MOOSE-Star](https://github.com/ZonglinY/MOOSE-Star)

## Model Description

| | Parameter | Value | |
| |-----------|-------| |
| | **Base Model** | DeepSeek-R1-Distill-Qwen-7B | |
| | **Training Method** | Full-parameter SFT (ZeRO-3) | |
| | **Training Data** | TOMATO-Star-SFT-Data-R1D-32B: IR split (150,218 samples) + HC split with 1x bounded (114,548 samples) | |
| | **Chat Template** | deepseekr1 | |
| | **Cutoff Length** | 16384 | |
| | **Learning Rate** | 1e-5 | |
| | **Epochs** | 1 | |
| | **Batch Size** | 128 | |
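
The hyperparameters above (chat template name, cutoff length, full-parameter SFT with ZeRO-3) are the kind exposed by a LLaMA-Factory-style SFT config. A hypothetical sketch of such a config follows; the file layout, dataset key, and DeepSpeed config path are assumptions for illustration, not taken from the MOOSE-Star repo:

```yaml
# Hypothetical LLaMA-Factory-style SFT config mirroring the table above.
# All keys and paths are illustrative; consult the MOOSE-Star repo for the real setup.
model_name_or_path: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
stage: sft
finetuning_type: full
deepspeed: ds_z3_config.json        # ZeRO-3
template: deepseekr1
cutoff_len: 16384
dataset: tomato_star_sft_ir_hc       # assumed dataset registry key
learning_rate: 1.0e-5
num_train_epochs: 1
per_device_train_batch_size: 1       # global batch of 128 via accumulation x GPUs
gradient_accumulation_steps: 16
```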

## Task 1: Inspiration Retrieval (IR)

The model selects the most relevant **cross-paper inspiration** from a pool of 15 candidates (A-O) that contains 1 correct inspiration and 14 hard negatives.

### IR Prompt Format (Simplified Overview)

The full prompt template is constructed via `instruction_prompts()` in the code examples below. The general structure is:

```
[Task instruction preamble]

## Context

**Research Question:**
{research_question}

**Background Survey (existing methods for THIS task):**
{background_survey}

**Previous Hypothesis (if any):**
{previous_hypothesis_or_none}

## Candidate Inspiration Papers

### Candidate [A]
**Title:** {title_A}
**Abstract:** {abstract_A}

... (15 candidates total, A through O)

## Output Format

<think>
[reasoning process]
</think>

**Selected ID starts:** [X] **Selected ID ends**

**Selection Reason starts:** [reason] **Selection Reason ends**
```

### IR Usage

**Prerequisites**: Clone the [MOOSE-Star repo](https://github.com/ZonglinY/MOOSE-Star) for prompt templates and inference utilities:

```bash
git clone https://github.com/ZonglinY/MOOSE-Star.git && cd MOOSE-Star
# See requirements.txt for full dependencies; at minimum: pip install transformers torch
```

#### Option A: SGLang Deployment (Recommended)

```bash
# SGLang requires a separate environment; see https://github.com/sgl-project/sglang for installation
# Start the server
python -m sglang.launch_server --model-path ZonglinY/MOOSE-Star-R1D-7B --port 1235
```

```python
import sys
sys.path.insert(0, "./Inference")
from ir_probability_extractor import IRProbabilityExtractor

extractor = IRProbabilityExtractor(base_urls=["http://localhost:1235/v1"])
result = extractor.get_selection_probabilities(
    research_question="Your research question",
    background_survey="Your background survey",
    candidates=[
        {"title": "Candidate A title", "abstract": "Candidate A abstract"},
        {"title": "Candidate B title", "abstract": "Candidate B abstract"},
        # ... up to 15 candidates (labeled A-O)
    ],
)
print(f"Selected: [{result.selected_label}]")
print(f"Probabilities: {result.probabilities}")
```
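
The returned probabilities can also be used beyond top-1 selection, e.g. to shortlist candidates for downstream hypothesis composition. A minimal sketch, assuming `result.probabilities` is a dict mapping candidate labels to floats (the values below are illustrative, not real model output):

```python
# Rank candidate labels by selection probability (illustrative values).
# Assumes the extractor returns a dict like {"A": 0.62, "B": 0.21, ...}.
probabilities = {"A": 0.62, "B": 0.21, "C": 0.09, "D": 0.08}

def top_k(probs, k=3):
    """Return the k labels with the highest selection probability."""
    return sorted(probs, key=probs.get, reverse=True)[:k]

print(top_k(probabilities, k=2))  # ['A', 'B']
```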

#### Option B: Direct HuggingFace Inference

```python
import sys
sys.path.insert(0, "./utils")
from prompt_store import instruction_prompts
from transformers import AutoModelForCausalLM, AutoTokenizer
import re

model_name = "ZonglinY/MOOSE-Star-R1D-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

p = instruction_prompts("inspiration_retrieval_with_reasoning_with_alphabetical_candidates")

candidates = [
    {"title": "...", "abstract": "..."},
    # ... up to 15 candidates (labeled A-O)
]
candidates_text = "".join(
    f"### Candidate [{chr(ord('A') + i)}]\n**Title:** {c['title']}\n**Abstract:** {c['abstract']}\n\n"
    for i, c in enumerate(candidates)
)

research_question = "Your research question"
background_survey = "Your background survey"
prompt = (p[0] + research_question
          + p[1] + background_survey
          + p[2] + "No previous hypothesis."
          + p[3] + candidates_text
          + p[4])

messages = [{"role": "user", "content": prompt}]
formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
formatted += "<\uff5cAssistant\uff5c>"  # manually append the deepseekr1 assistant tag

inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8192, temperature=0.6, top_p=0.9, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

match = re.search(r"\*\*Selected ID starts:\*\*\s*\[(\w)\]\s*\*\*Selected ID ends\*\*", response)
if match:
    print(f"Selected: [{match.group(1)}]")
```
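
Both delimited fields of the IR output format can be recovered the same way. A small self-contained helper (the sample `response` string below is illustrative, not real model output):

```python
import re

def parse_ir_response(response):
    """Extract the selected candidate ID and the selection reason from the
    delimited IR output format; returns (id, reason), with None for missing fields."""
    id_match = re.search(r"\*\*Selected ID starts:\*\*\s*\[(\w)\]\s*\*\*Selected ID ends\*\*", response)
    reason_match = re.search(
        r"\*\*Selection Reason starts:\*\*\s*(.*?)\s*\*\*Selection Reason ends\*\*",
        response, re.DOTALL,
    )
    selected = id_match.group(1) if id_match else None
    reason = reason_match.group(1) if reason_match else None
    return selected, reason

sample = ("**Selected ID starts:** [C] **Selected ID ends**\n\n"
          "**Selection Reason starts:** Candidate C introduces the key mechanism. "
          "**Selection Reason ends**")
print(parse_ir_response(sample))  # ('C', 'Candidate C introduces the key mechanism.')
```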

## Task 2: Hypothesis Composition (HC)

The model generates **delta hypotheses** from inspiration papers: given a research question, a background survey, and a new inspiration paper, it outputs structured hypothesis components.

### HC Prompt Format (Simplified Overview)

The full prompt template is constructed via `instruction_prompts()` in the code examples below. The general structure is:

```
[Task instruction preamble]

## Information Provided

**Research Question**:
{research_question}

**Background Survey**:
{background_survey}

**Previous Hypothesis**:
{previous_hypothesis_or_none}

**New Inspiration Paper Title**:
{inspiration_title}

**New Inspiration Paper Abstract**:
{inspiration_abstract}

## Your Response

<think>
[reasoning process]
</think>

Inspiration: [Key concept]
- Motivation (WHY): [Why this addresses a gap]
- Mechanism (HOW IT WORKS): [How the concept works]
- Methodology (HOW IT'S INTEGRATED): [Implementation steps]
```

### HC Usage

```python
import sys
sys.path.insert(0, "./utils")
from prompt_store import instruction_prompts
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ZonglinY/MOOSE-Star-R1D-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

p = instruction_prompts("prepare_HC_sft_data_to_go_comprehensive_v2_delta")

research_question = "Your research question here"
background_survey = "Your background survey here"
inspiration_title = "Inspiration paper title"
inspiration_abstract = "Inspiration paper abstract"

prompt = (p[0] + research_question
          + p[1] + background_survey
          + p[2] + "No previous hypothesis."
          + p[3] + inspiration_title
          + p[4] + inspiration_abstract
          + p[5])

messages = [{"role": "user", "content": prompt}]
formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
formatted += "<\uff5cAssistant\uff5c>"  # manually append the deepseekr1 assistant tag

inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8192, temperature=0.6, top_p=0.9, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
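
The structured HC output can be split into its labeled components after stripping the `<think>` block. A minimal post-processing sketch following the output format above (the sample text is illustrative, not real model output):

```python
import re

def parse_hc_response(response):
    """Strip the <think> block and pull out the labeled hypothesis
    components from the HC output format."""
    body = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    patterns = {
        "inspiration": r"Inspiration:\s*(.*)",
        "motivation": r"-\s*Motivation \(WHY\):\s*(.*)",
        "mechanism": r"-\s*Mechanism \(HOW IT WORKS\):\s*(.*)",
        "methodology": r"-\s*Methodology \(HOW IT'S INTEGRATED\):\s*(.*)",
    }
    fields = {}
    for name, pat in patterns.items():
        m = re.search(pat, body)
        fields[name] = m.group(1).strip() if m else None
    return fields

sample = """<think>some reasoning</think>
Inspiration: Contrastive pretraining
- Motivation (WHY): Addresses the labeled-data gap
- Mechanism (HOW IT WORKS): Aligns paired views in embedding space
- Methodology (HOW IT'S INTEGRATED): Pretrain, then fine-tune on the task
"""
parsed = parse_hc_response(sample)
print(parsed["inspiration"])  # Contrastive pretraining
```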

## Evaluation Results

### Inspiration Retrieval (Table 1)

| | Model | Accuracy | |
| |-------|----------| |
| | Random Selection | 6.70% | |
| | R1-Distilled-Qwen-7B (base) | 28.42% | |
| | MS-IR-7B (single-task) | 54.37% | |
| | **MS-7B (this model)** | **54.34%** | |

### Hypothesis Composition - Normal (Table 2)

Rubric-based evaluation with ground-truth inspirations (Judge: GPT-4o):

| Model | Total | Motivation | Mechanism | Methodology | Length |
|-------|-------|------------|-----------|-------------|--------|
| | R1-Distilled-Qwen-7B (base) | 4.05 | 1.96 | 1.30 | 0.80 | 231.02 | |
| | MS-HC-7B (single-task) | 4.68 | 2.13 | 1.46 | 1.09 | 204.12 | |
| | MS-HC-7B w/ 1x bounded | 4.74 | 2.16 | 1.48 | 1.10 | 203.84 | |
| | **MS-7B (this model)** | **5.02** | **2.22** | **1.59** | **1.20** | 208.98 | |

### Hypothesis Composition - Bounded (Table 3)

Performance under varying levels of inspiration noise (Judge: GPT-4o):

| | Model | Easy Total | Medium Total | Hard Total | |
| |-------|-----------|-------------|-----------| |
| | R1-Distilled-Qwen-7B (base) | 2.72 | 2.27 | 2.00 | |
| | MS-HC-7B w/ 2x bounded | 3.18 | 2.74 | 2.56 | |
| | **MS-7B (this model)** | **3.37** | **2.86** | **2.78** | |

## Key Findings

- **IR performance preserved**: Multi-task training maintains full IR accuracy (54.34% vs. 54.37% single-task)
- **HC significantly improved**: Multi-task HC outperforms all single-task variants, including those with bounded-composition augmentation
- **Robust under noise**: Largest improvements on Hard bounded composition, suggesting IR reasoning skills transfer to HC

## Citation

```bibtex
@article{yang2025moosestar,
  title={MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier},
  author={Yang, Zonglin and Bing, Lidong},
  journal={arXiv preprint arXiv:2603.03756},
  year={2026}
}
```