SciReasoner-8B / README.md

Improve model card: Add library, pipeline tags, paper link, and GitHub link (#1)

772c4ad verified 4 months ago

5.95 kB

	---
	license: apache-2.0
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- qwen
	- scientific-reasoning
	---

	# SciReasoner 8B: Laying the Scientific Reasoning Ground Across Disciplines

	[![arXiv](https://img.shields.io/badge/arXiv-2509.21320-b31b1b.svg)](https://arxiv.org/abs/2509.21320)
	[![Hugging Face](https://img.shields.io/badge/HuggingFace-SciReason-FFAE1A)](https://huggingface.co/SciReason)
	[![License](https://img.shields.io/badge/License-Apache_2.0-2D7DB1.svg)](https://www.apache.org/licenses/LICENSE-2.0)

	This repository contains the weight of SciReasoner-8B, a scientific reasoning foundation model. It was presented in the paper [SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines](https://huggingface.co/papers/2509.21320).

	Code: https://github.com/open-sciencelab/SciReason

	---

	## Usage:

	## 🔧 Environment Setup

	```bash
	git clone https://github.com/open-sciencelab/SciReason.git
	cd SciReason
	conda create --name scireason python=3.10 -y
	conda activate scireason
	pip install -r requirements/training.txt
	pip install -e .
	```

	> Note:
	> The above instructions are for reference only.
	> You may need to adjust them depending on your operating system and environment.

	---

	## 🚀 Running Evaluation

	The evaluation script will automatically download the required datasets and models from [Hugging Face](https://huggingface.co/SciReason).
	Please ensure your environment has internet access.

	### Evaluate all datasets

	```bash
	opencompass examples_scireasoner/eval_all.py --max-num-worker 1
	```

	* Default model: [SciReasoner-8B](https://huggingface.co/SciReason/SciReasoner-8B)
	* You can replace it with your own model if needed.
	* The `--max-num-worker` option controls concurrency:

	* By default, each process uses one GPU.
	* Adjust it according to your available GPUs.

	---

	### Evaluate few-shot performance (e.g., for closed-source models like `o3`)

	```bash
	opencompass examples_scireasoner/eval_all_fewshot.py --max-num-worker 1
	```

	This script evaluates the few-shot capabilities of your model on all datasets.

	---

	### Evaluate specific datasets or custom models

	* To evaluate specific datasets:
	Modify the configuration file to set `datasets` as a list of the datasets you want to test.

	* To use custom models:
	Modify the configuration file to set `models` to your target model.

	* Reference format: `opencompass.configs.models.scireason.hf_scireasoner_8b`
	* For more model configuration options, please check the [OpenCompass documentation](https://opencompass.readthedocs.io/en/latest/).


	Got it! I’ll add a FAQ section with the issue and solution clearly explained. Here’s how it fits into your README:


	## ❓ FAQ

	### 1. `meteor_score` Error

	If you encounter an error related to `meteor_score`, you may need to download NLTK resources.

	Solution:
	In an environment with internet access, run:

	```python
	import nltk
	nltk.download('wordnet')
	```

	By default, the files are downloaded to `/root/nltk_data`.
	If you are using a conda environment and running on a compute node or container, download them into your conda environment instead:

	```python
	import nltk
	import os

	conda_path = os.path.join(os.environ["CONDA_PREFIX"], "nltk_data")
	nltk.download('wordnet', download_dir=conda_path)
	```

	You can check all search paths using:

	```python
	import nltk
	print(nltk.data.path)
	```

	### 2. Running on compute nodes without internet access

	If your compute node cannot access the internet due to security policies, you need to pre-download/cache the datasets and models on a node with internet access first.

	Recommended steps:

	1. Set the environment variable `HF_HOME` to a shared/public directory for Hugging Face cache.
	2. On a node with internet access, run a dummy model once to pre-cache everything:

	```bash
	opencompass examples_scireasoner/eval_all_debug.py --max-num-worker 16
	```
	3. Now, you can run the actual evaluation code on the compute node without needing internet access.

	### 3. Resuming from checkpoints & step-wise evaluation

	Because the datasets are large and evaluation can be time-consuming, OpenCompass supports resuming from checkpoints and running evaluations in separate stages.

	* To resume from a checkpoint, use the `-r` flag with the timestamp of the previous run:

	```bash
	opencompass examples_scireasoner/eval_all.py -r <timestamp>
	```

	* To run specific stages only, use the `--mode` flag with one of the following options:

	* `all` – Run the full pipeline (default)
	* `infer` – Run inference only
	* `eval` – Run evaluation only
	* `viz` – Run visualization only

	For more details, please refer to the [OpenCompass Quick Start Guide](https://opencompass.readthedocs.io/en/latest/get_started/quick_start.html).

	### 4. Dataset size cache issue

	If you only want to test a subset of a dataset by modifying the code to trim it, be aware that OpenCompass caches the dataset size.

	Before running the evaluation, it is recommended to either:
	- Delete the entire cache file:
	```
	rm .cache/dataset_size.json
	```
	- Or remove the corresponding line for the modified dataset from the cache file.

	This ensures that OpenCompass recalculates the dataset size correctly.


	---

	## 🏗️ Codebase and References

	This repository is built on top of [OpenCompass v0.4.2](https://github.com/open-compass/opencompass/tree/0.4.2) with custom modifications.
	We plan to merge the changes back into the main OpenCompass branch in the future.

	For more usage details, please refer to the [OpenCompass documentation](https://opencompass.readthedocs.io/en/latest/).


	---

	## 📜 License

	This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).



	You are free to use, modify, and distribute this project under the terms of the Apache 2.0 license.
	See the [LICENSE](LICENSE) file for full details.