Text Generation
MLX
Safetensors
English
rodan-modern
rodan
tiny-language-model
apple-silicon
byte-bpe
Instructions to use bfuzzy1/Rodan-Base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use bfuzzy1/Rodan-Base with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# if on a CUDA device, also pip install mlx[cuda]

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("bfuzzy1/Rodan-Base")
prompt = "Once upon a time in"
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- MLX LM
How to use bfuzzy1/Rodan-Base with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm

# Generate some text
mlx_lm.generate --model "bfuzzy1/Rodan-Base" --prompt "Once upon a time"
```
Upload README.md with huggingface_hub
README.md
CHANGED
@@ -22,10 +22,9 @@ that actually holds up for its size, scored on how much it gets per parameter ra
| **Rodan-10M-Base** | pretraining | foundation: commonsense + knowledge |
| Rodan-10M-Chat *(released)* | instruction fold | chat / instruction following |
| Rodan-10M-Reasoning *(released)* | recursive depth + CoT fold + DPO | verifiable math + reasoning |
-| Rodan-10M-Latent *(planned)* | latent reasoning | in-head compute, no CoT tokens |

-This card covers the base model only. The chat
-

## Architecture

@@ -106,9 +105,9 @@ Intelligence per parameter (board avg vs log-params; the shaded region is above

![Intelligence per parameter](assets/intelligence_per_parameter.png)

-The fit runs over
-sits +0.
-
models, which train on about 25B.

Training loss and data mix, v6 vs v9:

@@ -130,7 +129,9 @@ capability stages rather than more base pretraining. Unique tokens stay around 0
Zero-shot through lm-eval-harness, with a custom MLX backend for `loglikelihood`. We use acc_norm for the
length-sensitive multiple-choice tasks (HellaSwag, ARC, OpenBookQA) and plain acc otherwise.

-

| Task | Metric | Score | Random |
|---|---|---|---|

@@ -148,7 +149,8 @@ Zero-shot, limit 1000 examples per task. Board avg = (HellaSwag + (ARC-E + ARC-C
| CommonsenseQA | acc | 20.7 | 20 |
| **Board avg (÷4)** | | **35.80** | |

-For context

| Model | Params | Tokens | Board avg (÷4) |
|---|---|---|---|

@@ -158,10 +160,10 @@ For context, it beats the <10M leader on about 1/65th the tokens:

![Board vs tokens](assets/board_vs_tokens.png)

-v6
-
-
-

What the model is actually like: it holds up well for 11M on commonsense and science multiple-choice. SciQ
(67.5) beats GPT-2-124M, and PIQA (56.0), ARC-Easy (35.6), HellaSwag (31.8), and COPA (55.0) are all clearly

@@ -169,7 +171,7 @@ above random. Arithmetic has crept off the random floor (ArithMark 26.4) thanks
data, though it's a modest lift and actually generating arithmetic is still weak. On the harder abstract
reasoning tasks (Winogrande, CommonsenseQA, ARC-Challenge, OpenBookQA) and on open-ended generation it's near
chance, partly a capacity ceiling at this size and partly loglikelihood length-bias. It's a solid base for
-discrimination; the deeper reasoning is the job of the

## Limitations

@@ -22,10 +22,9 @@ that actually holds up for its size, scored on how much it gets per parameter ra
| **Rodan-10M-Base** | pretraining | foundation: commonsense + knowledge |
| Rodan-10M-Chat *(released)* | instruction fold | chat / instruction following |
| Rodan-10M-Reasoning *(released)* | recursive depth + CoT fold + DPO | verifiable math + reasoning |

+This card covers the base model only. The chat and reasoning stages are separate models with their own
+repos and cards.

## Architecture

@@ -106,9 +105,9 @@ Intelligence per parameter (board avg vs log-params; the shaded region is above

![Intelligence per parameter](assets/intelligence_per_parameter.png)

+The fit runs over the board models, with a residual σ of about 3.07 that matches the board's own. Rodan v6
+sits roughly +0.3σ above the size-fit line — above-trend per parameter, ahead of liodon, and well clear of
+the per-param underachievers lower in the field. It does this on roughly 1/65th the tokens of the leading
models, which train on about 25B.
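
One way to read the "+0.3σ above the size-fit line" figure: fit board average against log-parameters, take the spread of the residuals as σ, and express a model's residual in those units. The sketch below is only an illustration under stated assumptions. It assumes an ordinary least-squares line on log10(params) and a population standard deviation, neither of which the README spells out, and the five `(params, score)` pairs are made up for the example, not leaderboard entries.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residual_sigma(xs, ys, a, b):
    """Population standard deviation of the residuals around the fitted line."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# HYPOTHETICAL (parameter count, board avg) pairs, for illustration only.
models = [(5e6, 30.0), (1e7, 33.0), (3e7, 36.0), (6e7, 41.0), (1.2e8, 42.0)]

xs = [math.log10(params) for params, _ in models]
ys = [score for _, score in models]
a, b = fit_line(xs, ys)
sigma = residual_sigma(xs, ys, a, b)


def sigma_offset(params, score):
    """How many residual-sigmas a model sits above (+) or below (-) the fit."""
    predicted = a + b * math.log10(params)
    return (score - predicted) / sigma


for params, score in models:
    print(f"{params:>12,.0f} params  {sigma_offset(params, score):+.2f} sigma")
```

A positive offset means the model scores above what its size alone predicts under this fit; a negative one means it underperforms for its size.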

Training loss and data mix, v6 vs v9:
@@ -130,7 +129,9 @@ capability stages rather than more base pretraining. Unique tokens stay around 0
Zero-shot through lm-eval-harness, with a custom MLX backend for `loglikelihood`. We use acc_norm for the
length-sensitive multiple-choice tasks (HellaSwag, ARC, OpenBookQA) and plain acc otherwise.
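
The difference between acc and acc_norm can be sketched in a few lines. This is a minimal illustration, assuming acc picks the answer with the highest summed log-likelihood while acc_norm first divides each log-likelihood by the length of the answer string; the helper names and the log-likelihood values are hypothetical, and this is not the harness's own code or the custom MLX backend.

```python
def pick_acc(logliks):
    """Index of the choice with the highest raw log-likelihood."""
    return max(range(len(logliks)), key=lambda i: logliks[i])


def pick_acc_norm(logliks, choices):
    """Index of the choice with the highest length-normalized log-likelihood."""
    return max(range(len(choices)), key=lambda i: logliks[i] / len(choices[i]))


# HYPOTHETICAL example: the longer answer has a lower total log-likelihood
# simply because it has more characters to predict.
choices = ["yes", "a much longer but correct answer"]
logliks = [-4.0, -12.0]

raw_pick = pick_acc(logliks)                  # favors the short answer
norm_pick = pick_acc_norm(logliks, choices)   # favors the long answer
print(raw_pick, norm_pick)
```

Raw log-likelihood penalizes long answers, which is the length bias the card mentions; normalizing by length removes most of that penalty on tasks whose answer options vary a lot in length.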

+"The board" throughout is the [Open SLM Leaderboard](https://huggingface.co/spaces/AxiomicLabs/Open_SLM_Leaderboard)
+(AxiomicLabs, sub-150M tier). Zero-shot, limit 1000 examples per task.
+Board avg = (HellaSwag + (ARC-E + ARC-C)/2 + PIQA + ArithMark) / 4.
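
The board-average formula is simple enough to check by hand. The sketch below encodes it directly and uses the scores quoted elsewhere in this diff (HellaSwag 31.8, ARC-Easy 35.6, PIQA 56.0, ArithMark 26.4, board avg 35.80). The ARC-Challenge score is not visible in this excerpt, so the second helper back-solves the value implied by the reported average; treat that number as an inference from rounded figures, not a reported score. Both function names are made up for the example.

```python
def board_avg(hellaswag, arc_easy, arc_challenge, piqa, arithmark):
    """Board avg = (HellaSwag + (ARC-E + ARC-C)/2 + PIQA + ArithMark) / 4."""
    arc = (arc_easy + arc_challenge) / 2
    return (hellaswag + arc + piqa + arithmark) / 4


def implied_arc_challenge(avg, hellaswag, arc_easy, piqa, arithmark):
    """Invert the formula to recover the ARC-C score implied by a board avg."""
    arc = 4 * avg - hellaswag - piqa - arithmark
    return 2 * arc - arc_easy


# Scores quoted in the README diff; ARC-C is back-solved, not quoted.
arc_c = implied_arc_challenge(35.80, hellaswag=31.8, arc_easy=35.6,
                              piqa=56.0, arithmark=26.4)
avg = board_avg(31.8, 35.6, arc_c, 56.0, 26.4)
print(f"implied ARC-C: {arc_c:.1f}, board avg: {avg:.2f}")
```

Note that ARC-Easy and ARC-Challenge are averaged first, so ARC as a whole carries the same weight as each of the other three tasks.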

| Task | Metric | Score | Random |
|---|---|---|---|
@@ -148,7 +149,8 @@ Zero-shot, limit 1000 examples per task. Board avg = (HellaSwag + (ARC-E + ARC-C
| CommonsenseQA | acc | 20.7 | 20 |
| **Board avg (÷4)** | | **35.80** | |

+For context — at 11.46M it's just over the 10M line, but it outscores the sub-10M leader (liodon) on about
+1/65th the tokens:

| Model | Params | Tokens | Board avg (÷4) |
|---|---|---|---|
@@ -158,10 +160,10 @@ For context, it beats the <10M leader on about 1/65th the tokens:

![Board vs tokens](assets/board_vs_tokens.png)

+v6 sits above the size-fit line (~+0.3σ) — above-trend per parameter, ahead of liodon. The v9 challenger
+(PLE-free, 10.41M, pure-curated) tied it: 35.70 board avg at 9% fewer params, about even on per-param too.
+v9 confirmed the ~11M ceiling and that PLE was dead weight, but since it didn't move the board, v6 stays the
+base. From here the work moves to the capability stages (chat, reasoning).
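
The rounded figures in these paragraphs can be sanity-checked with a few lines of arithmetic. The inputs are the numbers quoted in the diff (11.46M and 10.41M parameters, about 25B tokens for the leading models, a 1/65 token ratio, σ of about 3.07); the derived token count and point offset are approximations implied by those rounded values, not separately reported results.

```python
# Figures quoted in the README diff.
v6_params = 11.46e6
v9_params = 10.41e6
leader_tokens = 25e9      # "about 25B"
token_ratio = 1 / 65      # "roughly 1/65th the tokens"
residual_sigma = 3.07     # residual sigma of the size fit
v6_offset_sigma = 0.3     # "roughly +0.3 sigma"

# v9's parameter saving relative to v6, in percent.
param_cut_pct = (1 - v9_params / v6_params) * 100

# Training tokens implied by the 1/65 ratio.
implied_tokens = leader_tokens * token_ratio

# The +0.3 sigma offset expressed in board-average points.
offset_points = v6_offset_sigma * residual_sigma

print(f"v9 has {param_cut_pct:.1f}% fewer params than v6")
print(f"implied training tokens: about {implied_tokens / 1e6:.0f}M")
print(f"+0.3 sigma is about {offset_points:.2f} board points")
```

The parameter saving works out to a little over 9%, consistent with the "9% fewer params" wording, and +0.3σ corresponds to a bit under one board point above the fit line.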

What the model is actually like: it holds up well for 11M on commonsense and science multiple-choice. SciQ
(67.5) beats GPT-2-124M, and PIQA (56.0), ARC-Easy (35.6), HellaSwag (31.8), and COPA (55.0) are all clearly
@@ -169,7 +171,7 @@ above random. Arithmetic has crept off the random floor (ArithMark 26.4) thanks
data, though it's a modest lift and actually generating arithmetic is still weak. On the harder abstract
reasoning tasks (Winogrande, CommonsenseQA, ARC-Challenge, OpenBookQA) and on open-ended generation it's near
chance, partly a capacity ceiling at this size and partly loglikelihood length-bias. It's a solid base for
+discrimination; the deeper reasoning is the job of the separate Chat and Reasoning models.

## Limitations