---
pipeline_tag: text-generation
library_name: transformers
---

# [Test-Time Scaling with Reflective Generative Model](https://huggingface.co/papers/2507.01951)

**Project page:** [https://www.wenxiaobai.com/](https://www.wenxiaobai.com/)

**Code:** [https://github.com/MetaStone-AI/MetaStone-S1](https://github.com/MetaStone-AI/MetaStone-S1)

## Introduction
We release our first reflective generative model: MetaStone-S1.
With only 32B parameters, MetaStone-S1 performs comparably to the OpenAI-o3 series on mathematics, coding, and Chinese reasoning tasks.
<img src="./figures/performance.jpg" alt="Performance compared with OpenAI-o3-mini" width="800">

MetaStone-S1 is trained with our proposed **reflective generative form, which unifies “Long-CoT Reinforcement Learning” and “Process Reward Learning” in a single training form**.
This form lets a single model perform both deep reasoning and high-quality selection of reasoning trajectories.
By sharing the backbone network between the process reward model (PRM) and the policy model, MetaStone-S1 reduces the inference cost of the PRM by 99%, resulting in faster and higher-quality responses.
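
To make the idea of trajectory selection concrete, here is a minimal sketch of best-of-N selection driven by per-step scores. It is an illustration only: the function names `trajectory_score` and `select_best` are hypothetical, the toy scores are made up, and aggregating step scores with a geometric mean is an assumption rather than a confirmed detail of the released pipeline. The GitHub repository remains the authoritative implementation.

```python
import math
from typing import Callable, List, Sequence


def trajectory_score(step_scores: Sequence[float]) -> float:
    """Aggregate per-step scores in (0, 1] with a geometric mean.

    Assumption: the geometric mean is used here for illustration only.
    """
    if not step_scores:
        raise ValueError("a trajectory needs at least one step score")
    eps = 1e-12  # guard against log(0)
    log_sum = sum(math.log(max(s, eps)) for s in step_scores)
    return math.exp(log_sum / len(step_scores))


def select_best(
    candidates: List[str],
    score_steps: Callable[[str], Sequence[float]],
) -> str:
    """Return the candidate whose aggregated step score is highest."""
    return max(candidates, key=lambda c: trajectory_score(score_steps(c)))


# Toy example: hypothetical step scores for two sampled trajectories.
toy_scores = {
    "trajectory A": [0.9, 0.8, 0.2],  # one weak step drags the score down
    "trajectory B": [0.7, 0.7, 0.7],
}
best = select_best(list(toy_scores), toy_scores.__getitem__)
print(best)  # trajectory B
```

In a real setup, `score_steps` would be backed by the process reward head, which shares the backbone with the policy model, so scoring a candidate would add little overhead beyond generating it.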

<img src="./figures/intro.jpg" alt="Introduction" width="800">

This repo contains the training and evaluation code of MetaStone-S1. For full details, please refer to our [paper](https://arxiv.org/abs/2507.01951) and [official website](https://www.wenxiaobai.com/).

## Sample Usage

You can easily use MetaStone-S1 for text generation with the `transformers` library by setting `trust_remote_code=True`.
For full details on using the reflective generative model with its advanced features (SPRM inference, training, etc.), please refer to the [official GitHub repository](https://github.com/MetaStone-AI/MetaStone-S1).

```python
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "MetaStoneTec/MetaStone-S1-1.5B" # Or MetaStoneTec/MetaStone-S1-7B, MetaStoneTec/MetaStone-S1-32B
pipe = pipeline(
    "text-generation",
    model=model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name, trust_remote_code=True),
    torch_dtype=torch.bfloat16, # or torch.float16 depending on your hardware
    device_map="auto",
    trust_remote_code=True, # Allows running custom code shipped with the checkpoint, if any
)

# Example: Text Generation
input_text = "The key to life is"
generated_text = pipe(input_text, max_new_tokens=20, do_sample=True)[0]["generated_text"]
print(f"Input: {input_text}\nOutput: {generated_text}")

# Example: Using chat template for conversational models
# Note: Ensure the tokenizer for the specific model has a chat template configured.
# You might need to load the model and tokenizer separately for chat templates.
# tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True)
# messages = [{"role": "user", "content": "Hi! How are you?"}]
# text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
# inputs = tokenizer(text, return_tensors="pt").to(model.device)
# outputs = model.generate(**inputs, max_new_tokens=30)  # passes input_ids and attention_mask
# print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Performance

| Model                        | AIME24 | AIME25 | LiveCodeBench | C-EVAL |
|------------------------------|--------|--------|----------------|--------|
| DeepScaleR-1.5B-Preview      | 43.1   | 30.0   | -              | -      |
| R1-Distill-Qwen-1.5B         | 28.9   | 22.8   | 16.9           | 27.1   |
| R1-Distill-Qwen-7B           | <ins>55.5</ins>   | -      |  <ins>37.6</ins>           | -      |
| R1-Distill-Llama-8B          | 50.4   | -      | **39.6**           | -      |
| **MetaStone-S1-1.5B-low**    | 44.0   | 32.6   | 24.2           | 43.6   |
| **MetaStone-S1-1.5B-medium** | 53.1   | <ins>35.7</ins>   | 26.6           | 43.9   |
| **MetaStone-S1-1.5B-high**   | **57.9**   | **40.4**   | 28.1           | **44.1**   |


## Model

We save the parameters of the policy model and the SPRM (self-supervised process reward model) head in two files:

- `model.safetensors` is the checkpoint of the policy model.

- `score_module.pt` is the checkpoint of the SPRM head.


You can find other sizes of MetaStone‑S1 below:

| Model|Transformers(HF) | ModelScope |
|---------------|---------|---------|
|MetaStone-S1-1.5B|[MetaStone-S1-1.5B](https://huggingface.co/MetaStoneTec/MetaStone-S1-1.5B)|[MetaStone-S1-1.5B](https://modelscope.cn/models/MetaStoneTec/MetaStone-S1-1.5B)|
|MetaStone-S1-7B|[MetaStone-S1-7B](https://huggingface.co/MetaStoneTec/MetaStone-S1-7B)|[MetaStone-S1-7B](https://modelscope.cn/models/MetaStoneTec/MetaStone-S1-7B)|
|MetaStone-S1-32B|[MetaStone-S1-32B](https://huggingface.co/MetaStoneTec/MetaStone-S1-32B)|[MetaStone-S1-32B](https://modelscope.cn/models/MetaStoneTec/MetaStone-S1-32B)|


## Evaluation
Because Hugging Face models do not directly support SPRM inference, please refer to the [GitHub repository](https://github.com/MetaStone-AI/MetaStone-S1) for the detailed training and testing pipeline.


## Citation
If you find our work helpful, please consider citing it.
```bibtex
@misc{wang2025testtimescalingreflectivegenerative,
 title={Test-Time Scaling with Reflective Generative Model}, 
 author={Zixiao Wang and Yuxin Wang and Xiaorui Wang and Mengting Xing and Jie Gao and Jianjun Xu and Guangcan Liu and Chenhui Jin and Zhuo Wang and Shengzhuo Zhang and Hongtao Xie},
 year={2025},
 eprint={2507.01951},
 archivePrefix={arXiv},
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2507.01951}, 
}
```