---
license: apache-2.0
language:
- en
library_name: mlx
pipeline_tag: text-generation
tags:
- rodan
- tiny-language-model
- mlx
- reasoning
- chain-of-thought
- dpo
base_model: bfuzzy1/Rodan-Chat
---
# Rodan-10M-Reasoning
A 10.41M-parameter reasoning model trained on a single Apple M2 with MLX. It stacks on the chat model and
adds **recurrent depth**: the same 8 transformer blocks run twice per forward pass, giving the effective
depth of a 16-layer network at **zero extra parameters**. The idea is to spend more compute per token on
hard problems without growing the model.
> What it is, honestly. The recurrence *mechanism* works, the probes show the second pass doing real
> compositional computation, and activation patching maps a genuine arithmetic circuit. The model does
> **accurate single-step arithmetic** and reads **natural-language word problems** into the right operation.
> A final **DPO** pass (verifiable preference pairs, KL-leashed) then fixed its restraint: it now answers
> simple facts directly instead of doing arithmetic on them (math-on-non-math prompts dropped from ~half to
> ~1 in 8), at no board cost. On the board it sits at **35.41**, about level with the base (35.80), because
> recurrent depth doesn't move discrimination benchmarks. The win is in *what it does*, not the board number.
> Part of the Rodan-10M series. Lineage: base v6 → v9 (PLE-free) → Chat (instruction fold) → **Reasoning
> (this model)**. Warm-started from Chat, so it keeps instruction-following and ChatML.
## Architecture
Same as the base/chat stack (dim 320, 8 layers, 8 heads, MQA with 1 KV head, SwiGLU 768, RMSNorm, RoPE base
200k, QK-norm, tied embeddings, value-residual, LRM, no PLE), with two changes:
- **`recurse=2`**: the 8 blocks run twice over the residual stream (16 effective layers, still 10.41M params).
- **ChatML + `<think>` template** for reasoning turns; direct answers for simple ones.
Trained in **bfloat16** (~8× faster than fp32 on this M2 at this depth/length), seq 512.
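The weight sharing behind `recurse=2` can be sketched in a few lines. This is a minimal illustration, not the model's code: `make_toy_block` is a hypothetical stand-in for a transformer block, and the only point is that the same 8 block objects are applied 16 times.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_toy_block(scale: float, calls: List[int]) -> Block:
    """Hypothetical stand-in for a transformer block: a residual update x + scale * x."""
    def block(x: Vector) -> Vector:
        calls[0] += 1  # count block applications so the effective depth is visible
        return [v + scale * v for v in x]
    return block


def forward(blocks: Sequence[Block], x: Vector, recurse: int = 2) -> Vector:
    """Run the whole stack `recurse` times over the same residual stream."""
    for _ in range(recurse):
        for block in blocks:
            x = block(x)
    return x


calls = [0]
blocks = [make_toy_block(0.01 * (i + 1), calls) for i in range(8)]  # 8 distinct blocks
out = forward(blocks, [1.0, -2.0, 0.5], recurse=2)
print(f"distinct blocks: {len(blocks)}, block applications: {calls[0]}")
```

Because the second pass reuses the same block objects, the parameter count is set by `len(blocks)` while compute scales with `recurse`.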
## Training recipe
Warm-started from Chat, then trained at `recurse=2` on a natural-language-reasoning mix. The key lesson from
the first attempt: an arithmetic-symbol-heavy fold made the model narrow (it tried to compute *everything*).
This version leads with word problems and adds a slice of direct-answer examples to teach restraint.
| share | source | mode |
|---|---|---|
| 24% | natural-language word problems (synthesized) | `<think>` → answer |
| 21% | symbolic arithmetic CoT | `<think>` → answer |
| 8% | answer-only facts | direct, no `<think>` |
| 2% | GSM8K | `<think>` → answer |
| 45% | replay (smol-smoltalk + curated: Cosmopedia / dolmino / FineMath / sci-QA) | mixed |
No web data anywhere: the curated-only lineage has held since v6. Optimizer: Muon + AdamW, LR 1.8e-3 / Muon 9e-3,
seq 512, 7000 steps, bf16.
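One way to realise a mix like the one in the table is weighted sampling per example. The sketch below assumes that approach; the card does not say how the mix was actually built, and the short source keys are hypothetical labels for the table rows.

```python
import random
from collections import Counter

# Shares copied from the table above; the short keys are hypothetical labels.
MIX = {
    "word_problems": 24,
    "symbolic_cot": 21,
    "answer_only": 8,
    "gsm8k": 2,
    "replay": 45,
}


def sample_sources(n: int, seed: int = 0) -> Counter:
    """Draw n source labels with probability proportional to their share."""
    rng = random.Random(seed)
    names = list(MIX)
    weights = [MIX[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n))


counts = sample_sources(10_000)
for name, share in MIX.items():
    print(f"{name:14s} target {share:2d}%  sampled {100 * counts[name] / 10_000:.1f}%")
```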

## Does the recursion work?
Measured directly, the same way we probed value-residual and LRM on the base. The second pass earns its keep:

The model leans hard on the second pass: run it at `recurse=1` and held-out loss is much worse (ppl 5.72 vs
4.29). The second pass flips the predicted token on ~23% of positions and raises the probability of the correct
next token almost everywhere (+0.26 log-prob on average). It sharpens digits (entropy drops 0.14) and, unlike the
first attempt, the **quantitative-language words recovered** (+0.23): the natural-language word problems taught it
to handle "more / less / total / twice", which symbolic arithmetic alone never did.
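The probe statistics above (flip rate, log-prob gain on the correct token, entropy change, perplexity) can be computed from the next-token distributions of the two settings. The following is a sketch under that assumption; the function name and the toy distributions are invented for illustration and are not the model's measurements.

```python
import math
from typing import Dict, List, Sequence

Dist = Sequence[float]


def entropy(p: Dist) -> float:
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)


def argmax(p: Dist) -> int:
    return max(range(len(p)), key=p.__getitem__)


def compare_passes(pass1: List[Dist], pass2: List[Dist], targets: List[int]) -> Dict[str, float]:
    """Compare per-position next-token distributions at recurse=1 vs recurse=2."""
    n = len(targets)
    flips = sum(argmax(a) != argmax(b) for a, b in zip(pass1, pass2))
    gain = sum(math.log(b[t]) - math.log(a[t]) for a, b, t in zip(pass1, pass2, targets))
    d_entropy = sum(entropy(b) - entropy(a) for a, b in zip(pass1, pass2))
    nll1 = -sum(math.log(a[t]) for a, t in zip(pass1, targets)) / n
    nll2 = -sum(math.log(b[t]) for b, t in zip(pass2, targets)) / n
    return {
        "flip_rate": flips / n,
        "logprob_gain": gain / n,
        "entropy_change": d_entropy / n,
        "ppl_recurse1": math.exp(nll1),  # perplexity = exp(mean negative log-likelihood)
        "ppl_recurse2": math.exp(nll2),
    }


# Toy 3-token vocabulary, two positions; values are made up.
pass1 = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]
pass2 = [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]
stats = compare_passes(pass1, pass2, targets=[1, 1])
for key, value in stats.items():
    print(f"{key}: {value:.3f}")
```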
Activation patching maps the arithmetic circuit causally: operands bind early, the computation resolves around
block 5, the answer is written at block 6, and multi-step problems unroll across depth (step 2 binds deeper
than step 1). Factual recall has a different shape: a single late lookup at block 6 with no early work. The
full circuit atlas is in `circuit.html`.
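For readers new to activation patching, the idea is to cache activations from a clean run, splice one of them into a corrupted run, and check whether the clean answer comes back. The toy below is a hypothetical 4-block "model" with named slots, not the Rodan circuit; it only shows why patching an operand works before the compute step and patching the result works after it.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, float]
Block = Callable[[State], State]


def identity(s: State) -> State:
    return dict(s)


def add_operands(s: State) -> State:
    out = dict(s)
    out["result"] = s["a"] + s["b"]
    return out


# Hypothetical 4-block toy: the sum is computed at block 2.
BLOCKS: List[Block] = [identity, identity, add_operands, identity]


def run(state: State, patch_layer: Optional[int] = None, patch_slot: str = "",
        cache: Optional[List[State]] = None) -> Tuple[State, List[State]]:
    """Run all blocks; optionally overwrite one slot after `patch_layer` with the cached clean value."""
    acts: List[State] = []
    s = dict(state)
    for i, block in enumerate(BLOCKS):
        s = block(s)
        if cache is not None and i == patch_layer:
            s[patch_slot] = cache[i][patch_slot]
        acts.append(dict(s))
    return s, acts


clean = {"a": 5.0, "b": 9.0, "result": 0.0}
corrupted = {"a": 2.0, "b": 9.0, "result": 0.0}
clean_out, clean_acts = run(clean)

# For each slot and layer: does patching the clean activation restore the clean answer?
restoration = {
    slot: [run(corrupted, layer, slot, clean_acts)[0]["result"] == clean_out["result"]
           for layer in range(len(BLOCKS))]
    for slot in ("a", "result")
}
for slot, row in restoration.items():
    print(slot, row)
```

In this toy, patching the operand restores the answer only before block 2, and patching the result restores it only from block 2 onward. That is the kind of layer-by-slot map the paragraph above describes.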
## Evaluation
Zero-shot lm-eval, limit 1000, recurse 2, raw.
| Task | Metric | Reasoning | Chat | v9 base | v6 base |
|---|---|---|---|---|---|
| HellaSwag | acc_norm | 31.9 | 30.1 | 30.1 | 31.8 |
| ARC-Easy | acc_norm | 36.7 | 35.3 | 35.4 | 35.6 |
| ARC-Challenge | acc_norm | 21.2 | 23.2 | 22.2 | 22.4 |
| PIQA | acc | 54.4 | 53.8 | 55.5 | 56.0 |
| ArithMark-2 | acc | 26.4 | 25.8 | 28.4 | 26.4 |
| LogicMark | acc | 43.3 | 48.5 | 44.8 | 44.8 |
| SciQ | acc | 67.4 | n/a | 67.8 | 67.5 |
| Winogrande | acc | 50.4 | n/a | 49.4 | 49.8 |
| **Board avg (÷4)** | | **35.41** | 35.04 | 35.70 | 35.80 |
(Numbers are the final DPO'd model. The pre-DPO fold scored 35.53; DPO held the board at 35.41, a noise-level
change, while fixing the restraint.)
Board 35.41, level with the base (v6 35.80) and above Chat. Recurrent depth doesn't move the board; that's
expected. What changed is behaviour, which the board can't see:
- **Arithmetic is accurate**: 4-5 of 6 on held-out single-step problems (`5+9=14`, `7×6=42`, `40−13=27`),
one step, stops cleanly. The earlier version mis-computed and over-reasoned.
- **Word problems translate**: "Sara has 12 apples and buys 7 more" → it sets up `12 + 7` and solves it.
- **Sometimes answers directly**: "capital of France → Paris", "opposite of hot → cold", no `<think>`.
**The restraint fix (DPO).** The fold alone left restraint unstable: it opened a `<think>` and did arithmetic
on ~half of non-math prompts (the 8% answer-only data couldn't settle it). A final DPO pass on synthesized,
verifiable preference pairs fixed it: *mode* pairs (non-math → direct answer ≻ spurious `<think>` math) and
*process* pairs (correct concise chain ≻ wrong/over-reasoned). LR 5e-7, β 0.1, 1 epoch, KL-leashed to the
frozen fold checkpoint. Result: **math-on-non-math dropped from ~4/8 to ~1/8**, board unchanged (35.53 → 35.41).
DPO steered the *behaviour* the model already had; it did not fix the residual 2-digit arithmetic slips (e.g. 25−9),
which are a capability limit, not a preference one. Fixing them needs more or harder arithmetic data, not preference tuning.
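For reference, the standard per-pair DPO objective can be written out directly. In that formulation the frozen reference model acts as the KL anchor. Whether this run used exactly this form is an assumption, and the log-probabilities in the example are made-up numbers.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), computed in a numerically stable way.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: no preference learned yet, loss = log(2).
at_reference = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy has raised the chosen (direct answer) and lowered the rejected (spurious <think>) sequence.
after_update = dpo_loss(-8.0, -14.0, -10.0, -12.0)

print(f"loss at reference: {at_reference:.4f}")
print(f"loss after update: {after_update:.4f}")
```

A small `beta` such as 0.1 keeps the loss gradient gentle for a given margin, which is consistent with the board barely moving while the behaviour shifts.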

The arithmetic-compute slips on harder problems (multi-digit carry) remain the honest weak point.
## Usage
```python
question = "What is 5 + 9?"  # example prompt
ctx = f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"
# Decode greedily with NO repetition penalty (it breaks the <think> format); stop on <|im_end|>.
```
Load at `recurse=2`. It emits `<think>` reasoning then the answer for math, and often answers directly for
simple facts. Trade quality for speed by lowering `recurse` at inference.
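Around the generation call, a small helper can build the ChatML prompt and separate the reasoning from the final answer. This is a sketch with assumptions: the closing `</think>` tag is not stated in this card, the function names are hypothetical, and the sample replies are invented strings rather than real model output.

```python
from typing import Optional, Tuple

IM_END = "<|im_end|>"
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"  # assumption: the card only names the opening tag


def build_prompt(question: str) -> str:
    """Wrap a user question in the ChatML turn format used above."""
    return f"<|im_start|>user\n{question}{IM_END}\n<|im_start|>assistant\n"


def split_reply(raw: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer); reasoning is None when the model answered directly."""
    text = raw.split(IM_END, 1)[0]  # stop at the end-of-turn marker
    start = text.find(THINK_OPEN)
    end = text.find(THINK_CLOSE, start + len(THINK_OPEN)) if start != -1 else -1
    if start == -1 or end == -1:
        return None, text.strip()
    reasoning = text[start + len(THINK_OPEN):end].strip()
    answer = text[end + len(THINK_CLOSE):].strip()
    return reasoning, answer


print(repr(build_prompt("What is 5 + 9?")))
print(split_reply("<think>5 + 9 = 14</think>14<|im_end|>"))  # invented reasoning reply
print(split_reply("Paris<|im_end|>"))                         # invented direct reply
```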
## Limitations
- ~10M params, English only, research/education. Not for production, facts, or advice.
- DPO fixed most of the over-reasoning, but it still opens a `<think>` on roughly 1 in 8 non-math prompts.
- Thin world knowledge. It answers directly now, but can be wrong on the fact itself.
- Arithmetic is reliable on simple problems and slips on harder multi-digit ones.
- No safety alignment.
## License
Weights open. Data under the respective dataset licenses (smol-smoltalk, GSM8K, Cosmopedia, dolmino-mix
ODC-By, AllenAI QA sets, FineMath).