Text Generation
MLX
Safetensors
English
rodan-modern
rodan
tiny-language-model
apple-silicon
byte-bpe
Instructions to use bfuzzy1/Rodan-Base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use bfuzzy1/Rodan-Base with MLX:

    # Make sure mlx-lm is installed
    # pip install --upgrade mlx-lm
    # if on a CUDA device, also pip install mlx[cuda]

    # Generate text with mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("bfuzzy1/Rodan-Base")
    prompt = "Once upon a time in"
    text = generate(model, tokenizer, prompt=prompt, verbose=True)

- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- MLX LM
How to use bfuzzy1/Rodan-Base with MLX LM:
Generate or start a chat session

    # Install MLX LM
    uv tool install mlx-lm

    # Generate some text
    mlx_lm.generate --model "bfuzzy1/Rodan-Base" --prompt "Once upon a time"

Upload README.md with huggingface_hub
README.md
CHANGED
@@ -106,8 +106,8 @@ Intelligence per parameter (board avg vs log-params; the shaded region is above
 
 
 The fit runs over the board models, with a residual σ of about 3.07 that matches the board's own. Rodan v6
-sits roughly +0.3σ above the size-fit line — above-trend per parameter, ahead of liodon
-
+sits roughly +0.3σ above the size-fit line — above-trend per parameter, ahead of liodon and the other
+similar-size models that fall below the line. It does this on roughly 1/65th the tokens of the leading
 models, which train on about 25B.
 
 Training loss and data mix, v6 vs v9:
@@ -118,11 +118,10 @@ v9 starts from v6, drops the dead PLE down to 10.41M, and trains on the pure-cur
 tie: board avg 35.70 against v6's 35.80, a 0.10 gap that's well inside the noise, at 9% fewer parameters. It
 gave up about 1.7 points of HellaSwag and picked up 2.0 on ArithMark (28.4, the folded arithmetic finally
 showing), and the per-param number came out about even too (~+0.32σ vs v6's +0.31σ). Two conclusions fall
-out of that. PLE really was dead weight, since cutting 1.05M params changed nothing.
-
-
-
-1/50th of what the leaders use.
+out of that. PLE really was dead weight, since cutting 1.05M params changed nothing. Across the variants we
+ran, the board avg stayed near 35.8 — raw web lowered it, the leaner pure-curated mix matched v6 — so none of
+them beat the base, and v6 stays the packaged checkpoint. Unique tokens stay around 0.5B the whole way, a
+small fraction of what the leading models use, so there is likely more to gain from additional curated tokens.
 
 ## Evaluation
 
@@ -162,14 +161,14 @@ For context — at 11.46M it's just over the 10M line, but it outscores the sub-
 
 v6 sits above the size-fit line (~+0.3σ) — above-trend per parameter, ahead of liodon. The v9 challenger
 (PLE-free, 10.41M, pure-curated) tied it: 35.70 board avg at 9% fewer params, about even on per-param too.
-v9 confirmed
-
+v9 confirmed that PLE was dead weight, but since it didn't beat v6's board score, v6 stays the base. From
+here the work moved to the capability stages (chat, reasoning).
 
 What the model is actually like: it holds up well for 11M on commonsense and science multiple-choice. SciQ
 (67.5), PIQA (56.0), ARC-Easy (35.6), HellaSwag (31.8), and COPA (55.0) are all clearly above random. Arithmetic has crept off the random floor (ArithMark 26.4) thanks to the folded-in computation
 data, though it's a modest lift and actually generating arithmetic is still weak. On the harder abstract
 reasoning tasks (Winogrande, CommonsenseQA, ARC-Challenge, OpenBookQA) and on open-ended generation it's near
-chance, partly
+chance, partly the limited capacity at this size and partly loglikelihood length-bias. It's a solid base for
 discrimination; the deeper reasoning is the job of the separate Chat and Reasoning models.
 
 ## Limitations
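The README text in this change quotes per-parameter results in units of σ around a size-fit line (board avg against log-params). A minimal sketch of how such a number could be computed follows. It assumes an ordinary least-squares line over log10 of the parameter count and a population standard deviation of the residuals; the README does not say which fit or which σ convention was actually used. The function names `fit_line`, `residual_sigma`, and `z_above_trend` are hypothetical helpers, not part of any Rodan tooling. The constants at the bottom are taken from the text (3.07, 0.3, 11.46M, 10.41M, 35.80, 35.70).

```python
import math


def fit_line(log_params, scores):
    """Ordinary least-squares fit: score = slope * log_params + intercept."""
    n = len(log_params)
    mean_x = sum(log_params) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in log_params)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_params, scores))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_sigma(log_params, scores, slope, intercept):
    """Population standard deviation of residuals around the fitted line (assumed convention)."""
    residuals = [y - (slope * x + intercept) for x, y in zip(log_params, scores)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def z_above_trend(params, score, slope, intercept, sigma):
    """How many sigma a model's score sits above (+) or below (-) the size-fit line."""
    predicted = slope * math.log10(params) + intercept
    return (score - predicted) / sigma


# Sanity arithmetic using only figures quoted in the README text.
points_above_line = 0.3 * 3.07            # +0.3 sigma expressed in board-avg points
param_cut = (11.46 - 10.41) / 11.46       # fraction of parameters removed in v9
board_gap = 35.80 - 35.70                 # v6 minus v9 board average
```

Under these assumptions, +0.3σ corresponds to a little under one board-avg point, the 1.05M parameter cut is about 9% of 11.46M, and the 0.10 board gap is small relative to σ, which is consistent with the text calling v6 and v9 a tie.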