---
library_name: transformers
tags:
  - math
  - reasoning
  - reinforcement-learning
  - qwen2
  - mathematics
  - chain-of-thought
license: apache-2.0
language:
  - en
  - zh
base_model: Qwen/Qwen2.5-Math-1.5B-Instruct
pipeline_tag: text-generation
---

# Nexus-1.5B

<p align="center">
  <img src="https://img.shields.io/badge/Base%20Model-Qwen2.5--Math--1.5B--Instruct-orange" />
  <img src="https://img.shields.io/badge/Parameters-1.54B-blue" />
  <img src="https://img.shields.io/badge/Method-LPRO-green" />
  <img src="https://img.shields.io/badge/MATH--500-80.2-red" />
  <img src="https://img.shields.io/badge/GSM8K-85.2-red" />
</p>

**Nexus-1.5B** is a 1.54-billion-parameter mathematical reasoning model developed by [Neuriton](https://www.facebook.com/neuriton). It is trained with **Length-Penalized Reward Optimization (LPRO)**, a novel reinforcement learning alignment method designed to improve accuracy and response conciseness at the same time.

Built on top of `Qwen2.5-Math-1.5B-Instruct`, Nexus-1.5B achieves **80.2 on MATH-500** and **85.2 on GSM8K** (CoT), surpassing its base model by **4.4 points** on MATH-500 (75.8 to 80.2). Compared with a GRPO baseline trained without the length penalty (λ=0), it also reduces average response length by **14%** (312 to 268 tokens).

---

## What is LPRO?

Standard GRPO (Group Relative Policy Optimization) suffers from two key problems:

1. **Length bias** — short responses receive disproportionately large gradient signals, implicitly penalizing long correct derivations.
2. **Entropy collapse** — symmetric probability-ratio clipping causes the policy to converge to a narrow set of solution patterns, limiting further improvement.

**LPRO** fixes both with three targeted modifications:

| Component | What it does |
|---|---|
| **Asymmetric clipping** | Decouples the lower and upper clip bounds (`ε_low=0.20`, `ε_high=0.28`) to preserve policy entropy |
| **Token-level normalization** | Replaces GRPO's per-response averaging `1/(G·\|oᵢ\|)` with the global token weight `1/Σᵢ\|oᵢ\|` to produce an unbiased gradient estimate |
| **Length-penalized advantage** | Adds a group-standardized length penalty: `Aᵢ = (rᵢ - μᵣ)/(σᵣ + ε) - λ·(Lᵢ - μ_L)/(σ_L + ε)` |
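
The length-penalized advantage in the last row can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the training code: the function name is hypothetical, `rewards` and `lengths` hold one entry per response in a sampled group (length in tokens), and the population standard deviation is assumed because the card does not say which form is used.

```python
import statistics


def length_penalized_advantages(rewards, lengths, lam=0.10, eps=1e-6):
    """Return one advantage per response in a sampled group.

    Implements A_i = (r_i - mu_r)/(sigma_r + eps) - lam*(L_i - mu_L)/(sigma_L + eps).
    """
    mu_r, mu_l = statistics.fmean(rewards), statistics.fmean(lengths)
    # Assumption: population standard deviation over the group.
    sigma_r, sigma_l = statistics.pstdev(rewards), statistics.pstdev(lengths)
    return [
        (r - mu_r) / (sigma_r + eps) - lam * (l - mu_l) / (sigma_l + eps)
        for r, l in zip(rewards, lengths)
    ]


# A group of G=4 responses: two correct (1.0 plus the 0.1 format bonus), two wrong.
rewards = [1.1, 1.1, 0.1, 0.1]
lengths = [200, 400, 200, 400]
advantages = length_penalized_advantages(rewards, lengths)
print(advantages)
```

In this toy group the shorter correct response gets a slightly larger advantage than the longer correct one, which is the intended effect of the penalty term.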

The final objective is:

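$$\mathcal{J}_{\text{LPRO}}(\theta) = \mathbb{E}\left[\frac{1}{\sum_{i=1}^{G}|o_i|} \sum_{i=1}^{G}\sum_{t=1}^{|o_i|} \min\!\left(r_{i,t}(\theta)\,\hat{A}_{i,t},\ \text{clip}_{\text{asym}}(r_{i,t}(\theta))\,\hat{A}_{i,t}\right)\right]$$

Here $r_{i,t}(\theta)$ is the probability ratio between the current and old policy for token $t$ of response $i$ (not to be confused with the scalar reward $r_i$), $\hat{A}_{i,t}$ is the length-penalized advantage assigned to that token, and $\text{clip}_{\text{asym}}(r) = \text{clip}(r,\ 1-\varepsilon_{\text{low}},\ 1+\varepsilon_{\text{high}})$.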
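
As a rough illustration of how this objective could be evaluated for one sampled group, the sketch below works on plain Python lists of per-token log-probabilities. The function name and argument layout are hypothetical, and it assumes each response's advantage is shared by all of its tokens, which the card does not state explicitly.

```python
import math


def lpro_surrogate(logp_new, logp_old, advantages, eps_low=0.20, eps_high=0.28):
    """Token-normalized, asymmetrically clipped surrogate for one group.

    logp_new / logp_old: one list of per-token log-probs per response.
    advantages: one scalar per response (assumed shared by all its tokens).
    """
    total_tokens = sum(len(tokens) for tokens in logp_new)
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        for lp_new, lp_old in zip(new, old):
            ratio = math.exp(lp_new - lp_old)
            # Asymmetric clipping: separate lower and upper bounds.
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            total += min(ratio * adv, clipped * adv)
    # Global 1 / sum_i |o_i| weight instead of a per-response average.
    return total / total_tokens


# With identical old and new log-probs every ratio is 1, so the value is
# just the token-weighted mean advantage: (2 * 1.0 + 1 * -0.5) / 3.
value = lpro_surrogate([[-1.0, -1.0], [-2.0]], [[-1.0, -1.0], [-2.0]], [1.0, -0.5])
print(value)
```

In actual training this quantity would be computed on tensors with automatic differentiation and maximized with respect to the policy parameters; the list version only makes the weighting and clipping explicit.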

---

## Model Details

| Property | Value |
|---|---|
| **Base model** | `Qwen/Qwen2.5-Math-1.5B-Instruct` |
| **Parameters** | 1.54B |
| **Architecture** | Transformer Decoder (28 layers, GQA, RoPE, SwiGLU, RMSNorm) |
| **Context length** | 8,192 tokens |
| **Vocabulary size** | 151,936 |
| **Training method** | LPRO (RL fine-tuning, no distillation) |
| **Training data** | 100 difficulty-filtered problems from MATH-500 |
| **Group size G** | 4 |
| **Length penalty λ** | 0.10 |
| **Learning rate** | 1e-6 |
| **PPO epochs/iter** | 4 |

---

## Benchmark Results

### Chain-of-Thought (CoT)

| Model | GSM8K | MATH-500 | MMLU-STEM | CMATH | GaoKao Cloze | GaoKao QA |
|---|---|---|---|---|---|---|
| Qwen2-Math-1.5B-Instruct | 84.2 | 69.4 | 54.9 | 79.6 | 59.7 | 50.7 |
| Qwen2.5-Math-1.5B-Instruct | 84.8 | 75.8 | 57.5 | 83.0 | 65.5 | 54.1 |
| **Nexus-1.5B** | **85.2** | **80.2** | **60.3** | **83.5** | **67.2** | **56.9** |

### Tool-Integrated Reasoning (TIR)

| Model | MATH-500 | Minerva Math | GaoKao 2023 EN | Olympiad Bench | College Math |
|---|---|---|---|---|---|
| Qwen2.5-Math-1.5B-Instruct | 80.0 | 34.0 | 68.0 | 49.0 | 54.0 |
| **Nexus-1.5B** | **84.0** | **40.0** | **74.0** | **56.0** | **57.0** |

### Ablation: Effect of Length Penalty (λ)

| λ | MATH-500 Acc. | Avg. Response Length |
|---|---|---|
| 0.0 (GRPO baseline) | 77.4 | 312 tokens |
| **0.1 (Nexus-1.5B)** | **80.2** | **268 tokens** |
| 0.3 (over-penalized) | 78.0 | 201 tokens |

> **Key insight:** At λ=0.1, accuracy and conciseness improve simultaneously. The length penalty acts as a de-noising regularizer — discouraging redundant steps rather than suppressing genuinely long derivations.
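
The relative changes implied by the ablation table can be checked with a few lines of arithmetic. The numbers are copied from the table; the helper name is only illustrative.

```python
def relative_change_pct(new, old):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# lambda -> (MATH-500 accuracy, average response length in tokens)
ablation = {0.0: (77.4, 312), 0.1: (80.2, 268), 0.3: (78.0, 201)}
base_acc, base_len = ablation[0.0]

for lam, (acc, length) in ablation.items():
    print(
        f"lambda={lam}: accuracy {acc - base_acc:+.1f} pts, "
        f"length {relative_change_pct(length, base_len):+.1f}%"
    )
```

Relative to λ=0, the λ=0.1 row corresponds to roughly a 14% shorter average response, and the λ=0.3 row to roughly 36% shorter.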

---

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Dat1710/nexus-1.5b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Chain-of-Thought prompt
system_prompt = "Please reason step by step, and put your final answer within \\boxed{}."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Find all functions f: ℝ⁺ → ℝ⁺ such that for each x ∈ ℝ⁺, there is exactly one y ∈ ℝ⁺ satisfying xf(y) + yf(x) ≤ 2."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True,
)

generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### Tool-Integrated Reasoning (TIR)

```python
system_prompt = (
    "Please integrate natural language reasoning with programs to solve the problem above, "
    "and put your final answer within \\boxed{}."
)
```
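
With the TIR prompt, the caller is responsible for running any program the model writes. The sketch below shows one possible way to pull the last fenced Python block out of a response and execute it in a separate interpreter process. It assumes the model wraps programs in Markdown-style `python` fences, and both helper names are hypothetical. A subprocess with a timeout is not a real sandbox, so untrusted output should be run inside a container or similar isolation.

```python
import re
import subprocess
import sys

FENCE = "`" * 3  # built indirectly so this example can sit inside a fenced block
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_program(response):
    """Return the last fenced Python block in a model response, or None."""
    blocks = CODE_BLOCK.findall(response)
    return blocks[-1] if blocks else None


def run_program(code, timeout_s=10):
    """Run code in a separate interpreter process and return its stdout.

    Assumption: isolation is handled elsewhere; this is NOT a sandbox.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout.strip()


demo = (
    "Let me compute it.\n"
    + FENCE + "python\nprint(6 * 7)\n" + FENCE
    + "\nSo the answer is 42."
)
program = extract_program(demo)
print(program)
```

Calling `run_program(program)` would then execute the extracted code, and its output can be appended to the conversation before asking the model to continue.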

---

## Evaluation Prompt Format

**CoT (8-shot for GSM8K, 4-shot for MATH-500):**
```
<|im_start|>system
Please reason step by step, and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
{problem}<|im_end|>
<|im_start|>assistant
```

**TIR (zero-shot):**
```
<|im_start|>system
Please integrate natural language reasoning with programs to solve the problem above,
and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
{problem}<|im_end|>
<|im_start|>assistant
```
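
For evaluation harnesses that work on raw strings rather than chat messages, the two templates above can be assembled by hand. The helper below is a minimal sketch with a hypothetical name; it reproduces the layout shown above and should match what `tokenizer.apply_chat_template(..., add_generation_prompt=True)` produces only if the repository's chat template follows this same format.

```python
COT_SYSTEM = "Please reason step by step, and put your final answer within \\boxed{}."
TIR_SYSTEM = (
    "Please integrate natural language reasoning with programs to solve the "
    "problem above, and put your final answer within \\boxed{}."
)


def build_prompt(problem, system_prompt=COT_SYSTEM):
    """Assemble a single-turn prompt in the layout shown above."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{problem}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt("What is 12 * 13?")
print(prompt)
```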

---

## Training Details

### Data Curation

Training problems are sourced from **MATH-500** and filtered by difficulty using a learnable-zone criterion: a problem is retained if, among 8 sampled solutions from the base model, **between 2 and 5 are correct**. This yields 100 training problems that provide meaningful gradient signal — neither trivially easy nor intractably hard.
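
The learnable-zone filter amounts to a simple range check on the number of correct samples per problem. The sketch below assumes the bounds are inclusive and that the per-problem correct counts have already been computed; the function names and problem ids are illustrative.

```python
def in_learnable_zone(num_correct, low=2, high=5):
    """True if the base model solved the problem in low..high of its samples."""
    return low <= num_correct <= high


def filter_problems(correct_counts, low=2, high=5):
    """correct_counts maps a problem id to how many of the 8 samples were correct."""
    return [
        pid for pid, n in correct_counts.items()
        if in_learnable_zone(n, low, high)
    ]


# Hypothetical counts out of 8 sampled solutions per problem.
counts = {"p1": 0, "p2": 2, "p3": 5, "p4": 6, "p5": 8}
kept = filter_problems(counts)
print(kept)
```

Problems the base model never solves (`p1`) or almost always solves (`p4`, `p5`) are dropped, since every response in their group would tend to receive the same reward and so carry little learning signal.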

### Training Procedure

1. **Group sampling:** For each prompt, sample G=4 responses from the current policy.
2. **Reward computation:** Rule-based binary reward (correctness via symbolic answer matching) + small format bonus (α=0.1) for well-formed `\boxed{}` output.
3. **Advantage computation:** Compute length-penalized group z-score advantages.
4. **Policy update:** Maximize LPRO objective for 4 epochs per iteration.
5. **Iterate:** Set old policy ← new policy and repeat.

### Reward Function

$$r_i = \mathbf{1}[\hat{a}(o_i) = a^*] + 0.1 \cdot \mathbf{1}[\text{format}(o_i)]$$

where $\hat{a}(o_i)$ is the extracted answer from the last `\boxed{}` expression, verified via symbolic equivalence.
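
A simplified version of this reward could look like the sketch below. It extracts the last `\boxed{}` expression with brace matching, but it swaps symbolic equivalence for a whitespace-insensitive string comparison, so it is stricter than the verifier described above. It also assumes the format bonus is granted whenever a well-formed `\boxed{}` is found. All names are hypothetical.

```python
def extract_last_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth, chars = 1, []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces


def reward(response, reference, format_bonus=0.1):
    """Binary correctness plus a small bonus for a well-formed boxed answer."""
    answer = extract_last_boxed(response)
    well_formed = answer is not None
    # Assumption: string match stands in for symbolic equivalence checking.
    correct = well_formed and answer.replace(" ", "") == reference.replace(" ", "")
    return float(correct) + format_bonus * float(well_formed)


print(reward("So the result is \\boxed{\\frac{1}{2}}.", "\\frac{1}{2}"))
```

A correct, well-formed answer scores 1.1, a well-formed but wrong answer 0.1, and a response without any `\boxed{}` expression 0.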

---

## Limitations

- **Scale:** Nexus-1.5B operates at 1.54B parameters. Hard olympiad problems (e.g., AIME) remain challenging for models at this scale.
- **Language:** Primarily optimized for English and Chinese mathematical text. Performance on other languages is not evaluated.
- **Domain:** Designed for mathematical reasoning. General language understanding or instruction-following tasks are outside the model's training distribution.
- **TIR dependency:** Tool-integrated reasoning requires a sandboxed Python interpreter at inference time.

---

## Citation

If you use Nexus-1.5B in your research, please cite:

```bibtex
@techreport{neuriton2026nexus,
  title       = {Nexus-1.5B: Length-Penalized Reward Optimization for Robust Mathematical Reasoning},
  author      = {Neuriton Team},
  institution = {Neuriton},
  year        = {2026},
  month       = {Summer},
  note        = {Technical Report}
}
```

---

## Acknowledgements

We thank the Qwen Team at Alibaba Group for open-sourcing the Qwen2.5-Math model family, and the authors of DAPO for the asymmetric clipping insight that is central to LPRO.

---

*Developed by [Neuriton](https://neuriton.ai) · Summer 2026*