---
library_name: transformers
tags:
- math
- reasoning
- reinforcement-learning
- qwen2
- mathematics
- chain-of-thought
license: apache-2.0
language:
- en
- zh
base_model: Qwen/Qwen2.5-Math-1.5B-Instruct
pipeline_tag: text-generation
---
# Nexus-1.5B
<p align="center">
<img src="https://img.shields.io/badge/Base%20Model-Qwen2.5--Math--1.5B--Instruct-orange" />
<img src="https://img.shields.io/badge/Parameters-1.54B-blue" />
<img src="https://img.shields.io/badge/Method-LPRO-green" />
<img src="https://img.shields.io/badge/MATH--500-80.2-red" />
<img src="https://img.shields.io/badge/GSM8K-85.2-red" />
</p>
**Nexus-1.5B** is a 1.54-billion-parameter mathematical reasoning model developed by [Neuriton](https://www.facebook.com/neuriton) and trained with **Length-Penalized Reward Optimization (LPRO)**, a novel reinforcement learning alignment method that improves accuracy and response conciseness together.
Built on `Qwen2.5-Math-1.5B-Instruct`, Nexus-1.5B scores **80.2 on MATH-500** and **85.2 on GSM8K** (CoT). That is **+4.4 points** over its base model on MATH-500 (75.8), with average responses **14%** shorter than in an unpenalized GRPO run (312 → 268 tokens, see the ablation below).
---
## What is LPRO?
Standard GRPO (Group Relative Policy Optimization) suffers from two key problems:
1. **Length bias** — short responses receive disproportionately large gradient signals, implicitly penalizing long correct derivations.
2. **Entropy collapse** — symmetric probability-ratio clipping causes the policy to converge to a narrow set of solution patterns, limiting further improvement.
**LPRO** fixes both with three targeted modifications:
| Component | What it does |
|---|---|
| **Asymmetric clipping** | Decouples the lower and upper clip bounds (`ε_low=0.20`, `ε_high=0.28`) to preserve policy entropy |
| **Token-level normalization** | Replaces the per-response weight `1/(G·\|oᵢ\|)` with the global weight `1/Σᵢ\|oᵢ\|`, so every token in the group carries equal weight and the gradient is not biased by response length |
| **Length-penalized advantage** | Adds a group-standardized length penalty: `Aᵢ = (rᵢ - μᵣ)/(σᵣ + ε) - λ·(Lᵢ - μ_L)/(σ_L + ε)` |
The final objective is:
$$\mathcal{J}_{\text{LPRO}}(\theta) = \mathbb{E}\left[\frac{1}{\sum_{i=1}^{G}|o_i|} \sum_{i=1}^{G}\sum_{t=1}^{|o_i|} \min\!\left(r_{i,t}(\theta)\,\hat{A}_{i,t},\ \text{clip}_{\text{asym}}(r_{i,t}(\theta))\,\hat{A}_{i,t}\right)\right]$$
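The objective above can be sketched in a few lines of plain Python for a single group. This is a minimal illustration under stated assumptions, not the training code: it assumes $\text{clip}_{\text{asym}}(r) = \text{clip}(r,\ 1-\varepsilon_{\text{low}},\ 1+\varepsilon_{\text{high}})$ and that each response-level advantage $A_i$ is shared by all of its tokens ($\hat{A}_{i,t} = A_i$). The function names `clip_asym` and `lpro_objective` are hypothetical.

```python
from typing import List

EPS_LOW, EPS_HIGH = 0.20, 0.28  # clip bounds quoted in the component table


def clip_asym(ratio: float, eps_low: float = EPS_LOW, eps_high: float = EPS_HIGH) -> float:
    """Clamp a probability ratio to [1 - eps_low, 1 + eps_high] (assumed form)."""
    return max(1.0 - eps_low, min(ratio, 1.0 + eps_high))


def lpro_objective(ratios: List[List[float]], advantages: List[float]) -> float:
    """Token-normalized clipped surrogate for one group of G responses.

    ratios[i][t] plays the role of r_{i,t}(theta); advantages[i] is applied
    to every token of response i (an assumption of this sketch).
    """
    total_tokens = sum(len(r) for r in ratios)  # sum over i of |o_i|
    total = 0.0
    for response_ratios, adv in zip(ratios, advantages):
        for r in response_ratios:
            # Pessimistic choice between the raw and the clipped term
            total += min(r * adv, clip_asym(r) * adv)
    return total / total_tokens


# Two responses: a 2-token one with positive advantage, a 1-token one with negative
value = lpro_objective([[1.5, 1.0], [0.5]], [1.0, -1.0])
```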
---
## Model Details
| Property | Value |
|---|---|
| **Base model** | `Qwen/Qwen2.5-Math-1.5B-Instruct` |
| **Parameters** | 1.54B |
| **Architecture** | Transformer Decoder (28 layers, GQA, RoPE, SwiGLU, RMSNorm) |
| **Context length** | 8,192 tokens |
| **Vocabulary size** | 128,256 |
| **Training method** | LPRO (RL fine-tuning, no distillation) |
| **Training data** | 100 difficulty-filtered problems from MATH-500 |
| **Group size G** | 4 |
| **Length penalty λ** | 0.10 |
| **Learning rate** | 1e-6 |
| **PPO epochs/iter** | 4 |
---
## Benchmark Results
### Chain-of-Thought (CoT)
| Model | GSM8K | MATH-500 | MMLU-STEM | CMATH | GaoKao Cloze | GaoKao QA |
|---|---|---|---|---|---|---|
| Qwen2-Math-1.5B-Instruct | 84.2 | 69.4 | 54.9 | 79.6 | 59.7 | 50.7 |
| Qwen2.5-Math-1.5B-Instruct | 84.8 | 75.8 | 57.5 | 83.0 | 65.5 | 54.1 |
| **Nexus-1.5B** | **85.2** | **80.2** | **60.3** | **83.5** | **67.2** | **56.9** |
### Tool-Integrated Reasoning (TIR)
| Model | MATH-500 | Minerva Math | GaoKao 2023 EN | Olympiad Bench | College Math |
|---|---|---|---|---|---|
| Qwen2.5-Math-1.5B-Instruct | 80.0 | 34.0 | 68.0 | 49.0 | 54.0 |
| **Nexus-1.5B** | **84.0** | **40.0** | **74.0** | **56.0** | **57.0** |
### Ablation: Effect of Length Penalty (λ)
| λ | MATH-500 Acc. | Avg. Response Length |
|---|---|---|
| 0.0 (GRPO baseline) | 77.4 | 312 tokens |
| **0.1 (Nexus-1.5B)** | **80.2** | **268 tokens** |
| 0.3 (over-penalized) | 78.0 | 201 tokens |
> **Key insight:** At λ=0.1, accuracy and conciseness improve simultaneously. The length penalty acts as a de-noising regularizer — discouraging redundant steps rather than suppressing genuinely long derivations.
---
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Dat1710/nexus-1.5b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Chain-of-Thought prompt
system_prompt = "Please reason step by step, and put your final answer within \\boxed{}."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Find all functions f: ℝ⁺ → ℝ⁺ such that for each x ∈ ℝ⁺, there is exactly one y ∈ ℝ⁺ satisfying xf(y) + yf(x) ≤ 2."},
]

# Render the chat template to a prompt string, then tokenize it
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True,
)

# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
### Tool-Integrated Reasoning (TIR)
```python
system_prompt = (
    "Please integrate natural language reasoning with programs to solve the problem above, "
    "and put your final answer within \\boxed{}."
)
```
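In TIR mode the generated programs have to be pulled out of the response before they can be run in a sandbox. The helper below is a rough sketch of that extraction step only. It assumes the model wraps its programs in Markdown-fenced `python` blocks, which this card does not specify, and the name `extract_python_blocks` is hypothetical. Executing the extracted code safely is left to whatever sandbox you use.

```python
import re
from typing import List

# The fence is built programmatically so this snippet nests cleanly in Markdown
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_python_blocks(response: str) -> List[str]:
    """Return the source of every fenced python block, in order of appearance."""
    return [block.strip() for block in CODE_BLOCK.findall(response)]


# Toy response standing in for real model output
demo = "Compute the sum:\n" + FENCE + "python\nprint(2 + 2)\n" + FENCE + "\nDone."
programs = extract_python_blocks(demo)
```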
---
## Evaluation Prompt Format
**CoT (8-shot for GSM8K, 4-shot for MATH-500):**
```
<|im_start|>system
Please reason step by step, and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
{problem}<|im_end|>
<|im_start|>assistant
```
**TIR (zero-shot):**
```
<|im_start|>system
Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
{problem}<|im_end|>
<|im_start|>assistant
```
---
## Training Details
### Data Curation
Training problems are sourced from **MATH-500** and filtered by difficulty using a learnable-zone criterion: a problem is retained if, among 8 sampled solutions from the base model, **between 2 and 5 are correct**. This yields 100 training problems that provide meaningful gradient signal — neither trivially easy nor intractably hard.
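The learnable-zone criterion reduces to a count check per problem. The sketch below assumes the 8 sampled solutions have already been graded into booleans. The names `in_learnable_zone` and `filter_problems` and the dictionary layout are illustrative, not taken from the training code.

```python
from typing import Dict, List


def in_learnable_zone(correct_flags: List[bool], low: int = 2, high: int = 5) -> bool:
    """Keep a problem when between `low` and `high` sampled solutions are correct."""
    return low <= sum(correct_flags) <= high


def filter_problems(results: Dict[str, List[bool]]) -> List[str]:
    """Return the IDs of problems whose graded samples fall in the learnable zone."""
    return [pid for pid, flags in results.items() if in_learnable_zone(flags)]


# Hypothetical grading results for three problems, 8 samples each
graded = {
    "always_solved": [True] * 8,
    "never_solved": [False] * 8,
    "sometimes_solved": [True] * 3 + [False] * 5,
}
kept = filter_problems(graded)
```

Problems that are always or never solved give every response in a group the same correctness reward, so their group-standardized reward term is zero. That is one reason to filter them out.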
### Training Procedure
1. **Group sampling:** For each prompt, sample G=4 responses from the current policy.
2. **Reward computation:** Rule-based binary reward (correctness via symbolic answer matching) + small format bonus (α=0.1) for well-formed `\boxed{}` output.
3. **Advantage computation:** Compute length-penalized group z-score advantages.
4. **Policy update:** Maximize LPRO objective for 4 epochs per iteration.
5. **Iterate:** Set old policy ← new policy and repeat.
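Step 3 can be sketched directly from the advantage formula in the LPRO table. This is a minimal example that assumes population standard deviations and a stabilizer of `eps=1e-6`, neither of which the card states. The function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def length_penalized_advantages(
    rewards: Sequence[float],
    lengths: Sequence[int],
    lam: float = 0.10,
    eps: float = 1e-6,
) -> List[float]:
    """A_i = (r_i - mu_r)/(sigma_r + eps) - lam * (L_i - mu_L)/(sigma_L + eps)."""
    mu_r, sigma_r = mean(rewards), pstdev(rewards)  # population std (assumption)
    mu_l, sigma_l = mean(lengths), pstdev(lengths)
    return [
        (r - mu_r) / (sigma_r + eps) - lam * (l - mu_l) / (sigma_l + eps)
        for r, l in zip(rewards, lengths)
    ]


# G = 4 responses: two correct (reward 1.1), two incorrect (reward 0.0)
advantages = length_penalized_advantages(
    rewards=[1.1, 1.1, 0.0, 0.0],
    lengths=[200, 400, 300, 300],
)
```

In this toy group the two correct responses share the same reward, so the shorter one (200 tokens) ends up with the larger advantage, while both still rank above the incorrect responses.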
### Reward Function
$$r_i = \mathbf{1}[\hat{a}(o_i) = a^*] + 0.1 \cdot \mathbf{1}[\text{format}(o_i)]$$
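where $\hat{a}(o_i)$ is the answer extracted from the last `\boxed{}` expression in response $o_i$, $a^*$ is the reference answer, and equality is checked by symbolic equivalence. The format indicator is 1 when the output contains a well-formed `\boxed{}`.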
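A simplified version of this reward is sketched below. Two things are assumptions rather than the actual implementation: a whitespace-insensitive string comparison stands in for symbolic equivalence, and "well-formed" is taken to mean that a brace-balanced `\boxed{}` exists. The function names are hypothetical.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth, chars = 1, []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces: treat the output as malformed


def reward(response: str, reference: str, alpha: float = 0.1) -> float:
    """Binary correctness reward plus a small format bonus."""
    answer = extract_last_boxed(response)
    well_formed = answer is not None
    # Stand-in for symbolic equivalence: whitespace-insensitive string match
    correct = well_formed and answer.replace(" ", "") == reference.replace(" ", "")
    return float(correct) + alpha * float(well_formed)


score = reward("So the answer is \\boxed{\\frac{1}{2}}.", "\\frac{1}{2}")
```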
---
## Limitations
- **Scale:** Nexus-1.5B operates at 1.54B parameters. Hard olympiad problems (e.g., AIME) remain challenging for models at this scale.
- **Language:** Primarily optimized for English and Chinese mathematical text. Performance on other languages is not evaluated.
- **Domain:** Designed for mathematical reasoning. General language understanding or instruction-following tasks are outside the model's training distribution.
- **TIR dependency:** Tool-integrated reasoning requires a sandboxed Python interpreter at inference time.
---
## Citation
If you use Nexus-1.5B in your research, please cite:
```bibtex
@techreport{neuriton2026nexus,
title = {Nexus-1.5B: Length-Penalized Reward Optimization for Robust Mathematical Reasoning},
author = {Neuriton Team},
institution = {Neuriton},
year = {2026},
month = {Summer},
note = {Technical Report}
}
```
---
## Acknowledgements
We thank the Qwen Team at Alibaba Group for open-sourcing the Qwen2.5-Math model family, and the authors of DAPO for the asymmetric clipping insight that is central to LPRO.
---
*Developed by [Neuriton](https://neuriton.ai) · Summer 2026*