Instructions to use RUC-AIBOX/STILL-3-1.5B-preview with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use RUC-AIBOX/STILL-3-1.5B-preview with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RUC-AIBOX/STILL-3-1.5B-preview")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RUC-AIBOX/STILL-3-1.5B-preview")
model = AutoModelForCausalLM.from_pretrained("RUC-AIBOX/STILL-3-1.5B-preview")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Inference
Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use RUC-AIBOX/STILL-3-1.5B-preview with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RUC-AIBOX/STILL-3-1.5B-preview"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RUC-AIBOX/STILL-3-1.5B-preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/RUC-AIBOX/STILL-3-1.5B-preview

SGLang

How to use RUC-AIBOX/STILL-3-1.5B-preview with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RUC-AIBOX/STILL-3-1.5B-preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RUC-AIBOX/STILL-3-1.5B-preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RUC-AIBOX/STILL-3-1.5B-preview" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RUC-AIBOX/STILL-3-1.5B-preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use RUC-AIBOX/STILL-3-1.5B-preview with Docker Model Runner:
```
docker model run hf.co/RUC-AIBOX/STILL-3-1.5B-preview
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Introduction

We release STILL-3-1.5B-preview, a slow-thinking reasoning model achieves 39.33% accuracy on AIME benchmark! We adapt reinforcement learning on 1.5B model and observe the continuous performance improvement as the number of training steps increased. For better reproducing our work and advancing research progress, we open-source our code, model, and data.

Code: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs

Evaluation

We evaluated the model on four benchmarks: MATH, AIME, OMNI, and LiveAOPS. For MATH and AIME, we employed a sampling decoding setup with a sampling temperature of 0.6 and a top-p sampling probability of 0.95. Each question was sampled 64 times, and the average score was calculated. For OMNI and LiveAOPS (August-November 2024), we randomly sampled a subset of answers as integers to facilitate automated evaluation, and used greedy search decoding for the evaluation. The trained model, STILL-3-1.5B-preview, achieved significant improvement. The accuracy on the AIME task increased from 28.67% to 39.33%, resulting in a relative improvement of 37.18%.

	MATH	AIME	OMNI	LiveAOPS	Avg.
Backbone	84.04	28.67	25.60	33.33	42.91
STILL-3-1.5B-preview	85.48	39.33	33.00	39.50	49.33

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
from vllm import LLM, SamplingParams

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("RUC-AIBOX/STILL-3-1.5B-preview")
model = AutoModelForCausalLM.from_pretrained("RUC-AIBOX/STILL-3-1.5B-preview")

# Input text
question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"

input_prompts = tokenizer.apply_chat_template(
                [
                {"role": "user", "content": question}],
                tokenize=False,
                add_generation_prompt=True
            )


# Params
llm = LLM(model=model_path, tensor_parallel_size=1, dtype='bfloat16')

sampling_params_gs = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=32768, stop=stop_words, seed=42, skip_special_tokens=False)


# Completion
responses = model.generate(input_prompts, sampling_params)
print(responses[0].outputs[0].text)

Reference

Please kindly cite our reports if they are helpful for your research.

@article{Slow_Thinking_with_LLMs_3_Preview,
  title={STILL-3-1.5B-preview: Enhancing Slow Thinking Abilities of Small Models through Reinforcement Learning
},
  author={RUCAIBox STILL Team},
  url={https://github.com/RUCAIBox/Slow_Thinking_with_LLMs},
  year={2025}
}

@article{Slow_Thinking_with_LLMs_1,
  title={Enhancing LLM Reasoning with Reward-guided Tree Search},
  author={Jiang, Jinhao and Chen, Zhipeng and Min, Yingqian and Chen, Jie and Cheng, Xiaoxue and Wang, Jiapeng and Tang, Yiru and Sun, Haoxiang and Deng, Jia and Zhao, Wayne Xin and Liu, Zheng and Yan, Dong and Xie, Jian and Wang, Zhongyuan and Wen, Ji-Rong},
  journal={arXiv preprint arXiv:2411.11694},
  year={2024}
}

@article{Slow_Thinking_with_LLMs_2,
  title={Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems},
  author={Min, Yingqian and Chen, Zhipeng and Jiang, Jinhao and Chen, Jie and Deng, Jia and Hu, Yiwen and Tang, Yiru and Wang, Jiapeng and Cheng, Xiaoxue and Song, Huatong and Zhao, Wayne Xin and Liu, Zheng and Wang, Zhongyuan and Wen, Ji-Rong},
  journal={arXiv preprint arXiv:2412.09413},
  year={2024}
}

Downloads last month: 91

Safetensors

Model size

2B params

Tensor type

BF16

Model tree for RUC-AIBOX/STILL-3-1.5B-preview

Quantizations

3 models

Space using RUC-AIBOX/STILL-3-1.5B-preview 1

Papers for RUC-AIBOX/STILL-3-1.5B-preview

Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems

Paper • 2412.09413 • Published Dec 12, 2024 • 1

Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search

Paper • 2411.11694 • Published Nov 18, 2024