Instructions to use Dat1710/nexus-1.5b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Dat1710/nexus-1.5b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Dat1710/nexus-1.5b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Dat1710/nexus-1.5b")
model = AutoModelForCausalLM.from_pretrained("Dat1710/nexus-1.5b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Dat1710/nexus-1.5b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Dat1710/nexus-1.5b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Dat1710/nexus-1.5b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
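
The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="Dat1710/nexus-1.5b"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` would point the same helper at an SGLang server on its default port used in this page.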

Use Docker

docker model run hf.co/Dat1710/nexus-1.5b

SGLang

How to use Dat1710/nexus-1.5b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Dat1710/nexus-1.5b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Dat1710/nexus-1.5b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Dat1710/nexus-1.5b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Dat1710/nexus-1.5b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Dat1710/nexus-1.5b with Docker Model Runner:
```
docker model run hf.co/Dat1710/nexus-1.5b
```



	---
	library_name: transformers
	tags:
	- math
	- reasoning
	- reinforcement-learning
	- qwen2
	- mathematics
	- chain-of-thought
	license: apache-2.0
	language:
	- en
	- zh
	base_model: Qwen/Qwen2.5-Math-1.5B-Instruct
	pipeline_tag: text-generation
	---

	# Nexus-1.5B

	<p align="center">
	<img src="https://img.shields.io/badge/Base%20Model-Qwen2.5--Math--1.5B--Instruct-orange" />
	<img src="https://img.shields.io/badge/Parameters-1.54B-blue" />
	<img src="https://img.shields.io/badge/Method-LPRO-green" />
	<img src="https://img.shields.io/badge/MATH--500-80.2-red" />
	<img src="https://img.shields.io/badge/GSM8K-85.2-red" />
	</p>

	Nexus-1.5B is a 1.54-billion-parameter mathematical reasoning model developed by [Neuriton](https://www.facebook.com/neuriton), trained via Length-Penalized Reward Optimization (LPRO) — a novel reinforcement learning alignment method that improves both accuracy and response conciseness simultaneously.

	Built on top of `Qwen2.5-Math-1.5B-Instruct`, Nexus-1.5B achieves 80.2 on MATH-500 and 85.2 on GSM8K (CoT), surpassing its base model by +4.4 points on MATH-500 while reducing average response length by 14%.

	---

	## What is LPRO?

	Standard GRPO (Group Relative Policy Optimization) suffers from two key problems:

	1. Length bias — short responses receive disproportionately large gradient signals, implicitly penalizing long correct derivations.
	2. Entropy collapse — symmetric probability-ratio clipping causes the policy to converge to a narrow set of solution patterns, limiting further improvement.

	LPRO fixes both with three targeted modifications:

	| Component | What it does |
	|---|---|
	| Asymmetric clipping | Decouples the lower and upper clip bounds (`ε_low=0.20`, `ε_high=0.28`) to preserve policy entropy |
	| Token-level normalization | Replaces GRPO's per-response averaging `(1/G)·(1/\|oᵢ\|)` with the global token weight `1/Σ\|oᵢ\|` to produce an unbiased gradient estimate |
	| Length-penalized advantage | Adds a group-standardized length penalty: `Aᵢ = (rᵢ - μᵣ)/(σᵣ + ε) - λ·(Lᵢ - μ_L)/(σ_L + ε)` |
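
The length-penalized advantage can be illustrated with a short sketch. It assumes population standard deviations and a stability constant `eps=1e-6`, neither of which is specified in this card; `length_penalized_advantages` is a hypothetical helper name, not a function from the training code.

```python
from statistics import mean, pstdev


def length_penalized_advantages(rewards, lengths, lam=0.10, eps=1e-6):
    """Reward z-score minus lam times the length z-score, per response in a group."""
    mu_r, sigma_r = mean(rewards), pstdev(rewards)  # assumption: population std
    mu_l, sigma_l = mean(lengths), pstdev(lengths)
    return [
        (r - mu_r) / (sigma_r + eps) - lam * (l - mu_l) / (sigma_l + eps)
        for r, l in zip(rewards, lengths)
    ]


# One group of G=4 responses: two correct (reward 1.1 with format bonus), two incorrect.
rewards = [1.1, 1.1, 0.0, 0.0]
lengths = [200, 400, 200, 400]
advantages = length_penalized_advantages(rewards, lengths)
print([round(a, 2) for a in advantages])  # [1.1, 0.9, -0.9, -1.1]
```

Among equally rewarded responses, the shorter one ends up with the larger advantage, while correctness still dominates the ordering at λ=0.10.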

	The final objective is:

	$$\mathcal{J}_{\text{LPRO}}(\theta) = \mathbb{E}\left[\frac{1}{\sum_{i=1}^{G}|o_i|} \sum_{i=1}^{G}\sum_{t=1}^{|o_i|} \min\!\left(r_{i,t}(\theta)\,\hat{A}_{i,t},\ \text{clip}_{\text{asym}}(r_{i,t}(\theta))\,\hat{A}_{i,t}\right)\right]$$
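
A plain-Python sketch of this objective for a single group follows. It assumes that `clip_asym` clamps the ratio to `[1 - ε_low, 1 + ε_high]` and that each response's advantage is broadcast to all of its tokens; both are readings of the formula rather than details confirmed by this card, and the function names are illustrative.

```python
def asym_clip(ratio, eps_low=0.20, eps_high=0.28):
    """Clamp a probability ratio to [1 - eps_low, 1 + eps_high]."""
    return max(1.0 - eps_low, min(ratio, 1.0 + eps_high))


def lpro_objective(ratios, advantages, eps_low=0.20, eps_high=0.28):
    """ratios[i][t] is the token-level probability ratio of response i.

    advantages[i] is applied to every token of response i (assumption).
    """
    total_tokens = sum(len(resp) for resp in ratios)
    total = 0.0
    for resp_ratios, adv in zip(ratios, advantages):
        for ratio in resp_ratios:
            clipped = asym_clip(ratio, eps_low, eps_high)
            total += min(ratio * adv, clipped * adv)
    # Token-level normalization: divide by the total token count of the group.
    return total / total_tokens


# Two responses (2 tokens and 1 token) with advantages +1 and -1.
value = lpro_objective([[1.5, 1.0], [0.5]], [1.0, -1.0])
print(round(value, 4))  # 0.4933
```

In the example, the ratio 1.5 is capped at 1.28 for the positive advantage and the ratio 0.5 is floored at 0.80 for the negative one, so the sum is 1.28 + 1.00 - 0.80 over 3 tokens.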

	---

	## Model Details

	| Property | Value |
	|---|---|
	| Base model | `Qwen/Qwen2.5-Math-1.5B-Instruct` |
	| Parameters | 1.54B |
	| Architecture | Transformer Decoder (28 layers, GQA, RoPE, SwiGLU, RMSNorm) |
	| Context length | 8,192 tokens |
	| Vocabulary size | 128,256 |
	| Training method | LPRO (RL fine-tuning, no distillation) |
	| Training data | 100 difficulty-filtered problems from MATH-500 |
	| Group size G | 4 |
	| Length penalty λ | 0.10 |
	| Learning rate | 1e-6 |
	| PPO epochs/iter | 4 |

	---

	## Benchmark Results

	### Chain-of-Thought (CoT)

	| Model | GSM8K | MATH-500 | MMLU-STEM | CMATH | GaoKao Cloze | GaoKao QA |
	|---|---|---|---|---|---|---|
	| Qwen2-Math-1.5B-Instruct | 84.2 | 69.4 | 54.9 | 79.6 | 59.7 | 50.7 |
	| Qwen2.5-Math-1.5B-Instruct | 84.8 | 75.8 | 57.5 | 83.0 | 65.5 | 54.1 |
	| Nexus-1.5B | 85.2 | 80.2 | 60.3 | 83.5 | 67.2 | 56.9 |

	### Tool-Integrated Reasoning (TIR)

	| Model | MATH-500 | Minerva Math | GaoKao 2023 EN | Olympiad Bench | College Math |
	|---|---|---|---|---|---|
	| Qwen2.5-Math-1.5B-Instruct | 80.0 | 34.0 | 68.0 | 49.0 | 54.0 |
	| Nexus-1.5B | 84.0 | 40.0 | 74.0 | 56.0 | 57.0 |

	### Ablation: Effect of Length Penalty (λ)

	| λ | MATH-500 Acc. | Avg. Response Length |
	|---|---|---|
	| 0.0 (GRPO baseline) | 77.4 | 312 tokens |
	| 0.1 (Nexus-1.5B) | 80.2 | 268 tokens |
	| 0.3 (over-penalized) | 78.0 | 201 tokens |

	> Key insight: At λ=0.1, accuracy and conciseness improve simultaneously. The length penalty acts as a de-noising regularizer — discouraging redundant steps rather than suppressing genuinely long derivations.
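
The trade-off in the ablation table can be checked with a few lines of arithmetic. The numbers below are copied from the table; the helper `vs_baseline` is only an illustration.

```python
# Rows of the ablation table, keyed by the length penalty λ.
ablation = {
    0.0: {"acc": 77.4, "avg_len": 312},
    0.1: {"acc": 80.2, "avg_len": 268},
    0.3: {"acc": 78.0, "avg_len": 201},
}
baseline = ablation[0.0]


def vs_baseline(row):
    """Return (accuracy gain in points, length reduction in percent) vs. λ=0."""
    acc_gain = row["acc"] - baseline["acc"]
    len_cut_pct = 100.0 * (baseline["avg_len"] - row["avg_len"]) / baseline["avg_len"]
    return round(acc_gain, 1), round(len_cut_pct, 1)


for lam in (0.1, 0.3):
    print(lam, vs_baseline(ablation[lam]))
# 0.1 (2.8, 14.1)
# 0.3 (0.6, 35.6)
```

Relative to the λ=0 run, λ=0.1 gains 2.8 points while cutting length by about 14%, whereas λ=0.3 cuts length by about 36% but keeps only 0.6 points of the gain.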

	---

	## How to Use

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "Dat1710/nexus-1.5b"

	# Load the tokenizer and model. device_map="auto" requires the accelerate package.
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)

	# Chain-of-Thought prompt
	system_prompt = "Please reason step by step, and put your final answer within \\boxed{}."

	messages = [
	    {"role": "system", "content": system_prompt},
	    {"role": "user", "content": "Find all functions f: ℝ⁺ → ℝ⁺ such that for each x ∈ ℝ⁺, there is exactly one y ∈ ℝ⁺ satisfying xf(y) + yf(x) ≤ 2."},
	]

	# Render the chat template to a string that ends with the assistant turn header.
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)

	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# Sample a response of up to 2048 new tokens.
	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=2048,
	    temperature=0.7,
	    do_sample=True,
	)

	# Drop the prompt tokens so that only the newly generated tokens are decoded.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```

	### Tool-Integrated Reasoning (TIR)

	```python
	system_prompt = (
	    "Please integrate natural language reasoning with programs to solve the problem above, "
	    "and put your final answer within \\boxed{}."
	)
	```

	---

	## Evaluation Prompt Format

	CoT (8-shot for GSM8K, 4-shot for MATH-500):
	```
	<|im_start|>system
	Please reason step by step, and put your final answer within \boxed{}.<|im_end|>
	<|im_start|>user
	{problem}<|im_end|>
	<|im_start|>assistant
	```

	TIR (zero-shot):
	```
	<|im_start|>system
	Please integrate natural language reasoning with programs to solve the problem above,
	and put your final answer within \boxed{}.<|im_end|>
	<|im_start|>user
	{problem}<|im_end|>
	<|im_start|>assistant
	```
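
For reference, the two templates above can be produced by plain string formatting. This is a sketch for inspecting or logging prompts; in practice `tokenizer.apply_chat_template` is the usual route, and `build_prompt` is a hypothetical helper. The TIR system prompt is written on one line here, as in the How to Use section.

```python
COT_SYSTEM = "Please reason step by step, and put your final answer within \\boxed{}."
TIR_SYSTEM = (
    "Please integrate natural language reasoning with programs to solve the problem above, "
    "and put your final answer within \\boxed{}."
)


def build_prompt(problem, system_prompt=COT_SYSTEM):
    """Fill the ChatML-style template with a system prompt and a problem."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{problem}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


print(build_prompt("What is 12 * 13?"))
print(build_prompt("What is 12 * 13?", system_prompt=TIR_SYSTEM))
```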

	---

	## Training Details

	### Data Curation

	Training problems are sourced from MATH-500 and filtered by difficulty using a learnable-zone criterion: a problem is retained if, among 8 sampled solutions from the base model, between 2 and 5 are correct. This yields 100 training problems that provide meaningful gradient signal — neither trivially easy nor intractably hard.
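
The learnable-zone criterion can be sketched as a simple filter. The sketch assumes the bounds 2 and 5 are inclusive and that each problem comes with 8 correctness flags from base-model samples; the function names and the example IDs are hypothetical.

```python
def in_learnable_zone(correct_flags, min_correct=2, max_correct=5):
    """True if the number of correct samples lies in [min_correct, max_correct]."""
    n_correct = sum(bool(flag) for flag in correct_flags)
    return min_correct <= n_correct <= max_correct  # assumption: inclusive bounds


def filter_problems(results):
    """results maps a problem ID to the correctness flags of its sampled solutions."""
    return [pid for pid, flags in results.items() if in_learnable_zone(flags)]


# Toy example with 8 samples per problem.
results = {
    "too_easy": [True] * 8,
    "too_hard": [False] * 8,
    "learnable": [True, True, True, False, False, False, False, False],
}
print(filter_problems(results))  # ['learnable']
```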

	### Training Procedure

	1. Group sampling: For each prompt, sample G=4 responses from the current policy.
	2. Reward computation: Rule-based binary reward (correctness via symbolic answer matching) + small format bonus (α=0.1) for well-formed `\boxed{}` output.
	3. Advantage computation: Compute length-penalized group z-score advantages.
	4. Policy update: Maximize LPRO objective for 4 epochs per iteration.
	5. Iterate: Set old policy ← new policy and repeat.
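
The steps above can be sketched as one iteration of a loop. This is only a skeleton under stated assumptions: `sample_fn`, `reward_fn`, `advantage_fn`, and `update_fn` are hypothetical callables supplied by the caller, responses are assumed to be token sequences so that `len` gives their length, and the actual training code may be organized differently.

```python
def lpro_iteration(prompts, sample_fn, reward_fn, advantage_fn, update_fn,
                   group_size=4, epochs=4):
    """Run steps 1-4 once; the caller repeats this function for step 5."""
    batch = []
    for prompt in prompts:
        # 1. Group sampling from the current policy.
        responses = [sample_fn(prompt) for _ in range(group_size)]
        # 2. Rule-based reward for each response.
        rewards = [reward_fn(prompt, response) for response in responses]
        # 3. Length-penalized, group-standardized advantages.
        lengths = [len(response) for response in responses]
        advantages = advantage_fn(rewards, lengths)
        batch.append((prompt, responses, advantages))
    # 4. Several optimization epochs on the same sampled batch.
    for _ in range(epochs):
        update_fn(batch)
    return batch
```

Sampling once and then updating for several epochs is what makes the probability ratio in the objective differ from 1, which is where the clipping bounds come into play.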

	### Reward Function

	$$r_i = \mathbf{1}[\hat{a}(o_i) = a^*] + 0.1 \cdot \mathbf{1}[\text{format}(o_i)]$$

	where $\hat{a}(o_i)$ is the extracted answer from the last `\boxed{}` expression, verified via symbolic equivalence.
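
A minimal sketch of this reward follows. Two simplifications are assumed: answers are compared by exact string match after stripping whitespace, which stands in for the symbolic equivalence check described above, and a response counts as well formed whenever a balanced `\boxed{}` expression is present. The helper names are illustrative.

```python
def extract_last_boxed(text):
    # Return the contents of the last \boxed{...}, allowing nested braces.
    # Returns None if no \boxed{ is found or the braces are unbalanced.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None


def reward(response, reference, format_bonus=0.1):
    # Binary correctness plus a small bonus for a well-formed boxed answer.
    answer = extract_last_boxed(response)
    well_formed = answer is not None
    # Assumption: exact string match replaces symbolic equivalence here.
    correct = well_formed and answer == reference.strip()
    return float(correct) + format_bonus * float(well_formed)


print(reward("So the answer is \\boxed{\\frac{1}{2}}.", "\\frac{1}{2}"))  # 1.1
print(reward("I think it is \\boxed{3}.", "4"))                           # 0.1
print(reward("No final answer given.", "4"))                              # 0.0
```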

	---

	## Limitations

	- Scale: Nexus-1.5B operates at 1.54B parameters. Hard olympiad problems (e.g., AIME) remain challenging for models at this scale.
	- Language: Primarily optimized for English and Chinese mathematical text. Performance on other languages is not evaluated.
	- Domain: Designed for mathematical reasoning. General language understanding or instruction-following tasks are outside the model's training distribution.
	- TIR dependency: Tool-integrated reasoning requires a sandboxed Python interpreter at inference time.

	---

	## Citation

	If you use Nexus-1.5B in your research, please cite:

	```bibtex
	@techreport{neuriton2026nexus,
	  title = {Nexus-1.5B: Length-Penalized Reward Optimization for Robust Mathematical Reasoning},
	  author = {Neuriton Team},
	  institution = {Neuriton},
	  year = {2026},
	  month = {Summer},
	  note = {Technical Report}
	}
	```

	---

	## Acknowledgements

	We thank the Qwen Team at Alibaba Group for open-sourcing the Qwen2.5-Math model family, and the authors of DAPO for the asymmetric clipping insight that is central to LPRO.

	---

	Developed by [Neuriton](https://neuriton.ai) · Summer 2026