# STOP-1.5B: Early Path Pruning Module

This repository contains the STOP module trained for prefix-level path pruning on top of a 1.5B reasoning model.

## Overview

STOP (Super TOken for Pruning) is a lightweight module that predicts whether a reasoning prefix is promising, enabling early pruning of unproductive paths.

It operates by:

- Appending a special `[STOP]` token to the prefix
- Reading internal KV-cache states
- Producing a scalar quality score
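
The three steps above can be sketched with toy stand-ins. This is a minimal illustration, not this repository's API: `STOP_TOKEN_ID`, `append_stop`, `read_stop_state`, and `score_prefix` are hypothetical names, the state vector is faked from the token ids instead of being read from a real KV cache, and the logistic head is an assumption.

```python
import math
from typing import List, Sequence

STOP_TOKEN_ID = -1  # hypothetical id reserved for the special [STOP] token


def append_stop(prefix_ids: Sequence[int]) -> List[int]:
    """Step 1: append the special [STOP] token to the reasoning prefix."""
    return list(prefix_ids) + [STOP_TOKEN_ID]


def read_stop_state(token_ids: Sequence[int], dim: int = 4) -> List[float]:
    """Step 2 (stand-in): return a feature vector for the [STOP] position.

    A real module would attend over the base model's cached keys and values.
    Here toy features are derived from the token ids so the sketch runs alone.
    """
    body = token_ids[:-1]  # everything before [STOP]
    mean = sum(body) / max(len(body), 1)
    return [math.tanh(mean / (k + 1)) for k in range(dim)]


def score_prefix(
    prefix_ids: Sequence[int], weights: Sequence[float], bias: float = 0.0
) -> float:
    """Step 3: map the [STOP] state to a scalar score in (0, 1)."""
    state = read_stop_state(append_stop(prefix_ids), dim=len(weights))
    logit = sum(w * h for w, h in zip(weights, state)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

In this sketch a higher score means a more promising prefix; the actual score range and orientation of the released module may differ.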

## Architecture

- Base model: frozen reasoning model (1.5B parameters)
- Adapter: LoRA-based critique module
- Head: lightweight classifier
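
For orientation, the components above can be written in the standard LoRA notation. This is a sketch under assumptions: the source only states that the adapter is LoRA-based and the head is a lightweight classifier, so the sigmoid head and all symbols below are illustrative.

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad s = \sigma\left(w^\top h_{\texttt{[STOP]}} + b\right)
$$

Here $W_0$ is a frozen base-model weight, $A \in \mathbb{R}^{r \times d}$ and $B \in \mathbb{R}^{d \times r}$ are the trainable low-rank factors with rank $r$ and scaling $\alpha$, $h_{\texttt{[STOP]}}$ is the hidden state at the `[STOP]` position, and $s$ is the scalar quality score.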

## Training

The model is trained using prefix–potential supervision constructed via Monte Carlo rollouts.
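
One common way to build such labels is to estimate a prefix's potential as the fraction of sampled continuations that reach a correct answer. The sketch below assumes that definition; `estimate_potential` and `toy_rollout` are hypothetical names, and the rollout function is a random stand-in for "continue with the base model and grade the answer".

```python
import random
from typing import Callable


def estimate_potential(
    prefix: str,
    rollout: Callable[[str, random.Random], bool],
    num_rollouts: int = 8,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of a prefix's potential.

    Runs `num_rollouts` sampled continuations from `prefix` and returns
    the fraction that `rollout` reports as correct.
    """
    if num_rollouts <= 0:
        raise ValueError("num_rollouts must be positive")
    rng = random.Random(seed)
    successes = sum(rollout(prefix, rng) for _ in range(num_rollouts))
    return successes / num_rollouts


def toy_rollout(prefix: str, rng: random.Random) -> bool:
    """Hypothetical stand-in: succeeds more often for 'good' prefixes."""
    p_correct = 0.9 if "correct idea" in prefix else 0.1
    return rng.random() < p_correct
```

Each `(prefix, estimate_potential(prefix, ...))` pair could then serve as one training example for the scoring head.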

## Usage

After generating prefixes, STOP can be used to:

1. Score each prefix
2. Select top-k candidates
3. Resume generation only on selected paths
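
The three steps above can be sketched as follows. This is an illustrative outline rather than the repository's interface: `select_top_k` and `prune_and_resume` are hypothetical helpers, and `score_fn` and `resume_fn` stand in for the STOP scorer and the base model's generation call.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def select_top_k(
    prefixes: Sequence[str],
    score_fn: Callable[[str], float],
    k: int,
) -> List[Tuple[float, str]]:
    """Score every prefix and keep the k highest-scoring ones."""
    scored = [(score_fn(p), p) for p in prefixes]
    # nlargest returns the results ordered from highest to lowest score.
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def prune_and_resume(
    prefixes: Sequence[str],
    score_fn: Callable[[str], float],
    resume_fn: Callable[[str], str],
    k: int,
) -> List[str]:
    """Resume generation only on the selected paths."""
    selected = select_top_k(prefixes, score_fn, k)
    return [resume_fn(prefix) for _, prefix in selected]
```

Prefixes outside the top k are never passed to `resume_fn`, so their continuation tokens are not generated.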

## Results

- Token reduction of up to 70%
- Improved reasoning accuracy
- Strong performance in tool-use settings (AIMO3)

## Citation