How to use bfuzzy1/Rodan-Chat with MLX:

```python
# Make sure mlx-lm is installed:
#   pip install --upgrade mlx-lm
# If on a CUDA device, also:
#   pip install mlx[cuda]

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("bfuzzy1/Rodan-Chat")
prompt = "Once upon a time in"
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
How to use bfuzzy1/Rodan-Chat with MLX LM:

Generate or start a chat session

```bash
# Install MLX LM
uv tool install mlx-lm

# Generate some text
mlx_lm.generate --model "bfuzzy1/Rodan-Chat" --prompt "Once upon a time"
```
---
license: apache-2.0
language:
- en
library_name: mlx
pipeline_tag: text-generation
tags:
- rodan
- tiny-language-model
- mlx
- chat
- chatml
base_model: bfuzzy1/Rodan-Base
---

# Rodan-10M-Chat

A 10.41M-parameter chat model trained on one Apple M2 with MLX. It starts from the leaner Rodan-10M v9 base
(PLE-free) and folds conversational data into a short continued-pretrain stage, which adds chat ability
without throwing away what the base already knew.

> Part of the Rodan-10M series. The base model (`Rodan-10M-Base`, v6) lives in its own repo; this is the
> chat stage built on the v9 checkpoint.

## What it is

A ~10M-parameter chat model. With the right decoding settings it answers simple instructions coherently and
stays on topic, which is more than I expected from something this small. It is not a production assistant.
Its world knowledge is thin, its arithmetic is shaky, and it will make things up. Treat it as a demo of how
much chat ability you can fit in 10M params, not as something to rely on.

```
USER: What is the capital of France?
RODAN: The capital of France is Paris. It's a city known as the country, which has its own unique
and cultural significance in Europe.
USER: Give me a tip for staying healthy.
RODAN: You can use a regular exercise routine to manage your body and improve overall well-being.
It's important ... maintaining healthy weight management by burning calories.
USER: Why do we sleep?
RODAN: Sleep can be used to improve cognitive function and memory.
```

## How it was built

At 10M params, ordinary supervised finetuning costs you base capability. We saw it in earlier Rodan
experiments: a masked-SFT stage dropped the board average from about 34 to 32 (the "SFT tax"). So instead
of a separate masked-SFT stage, Rodan-Chat folds the instruction data into a continued-pretrain run mixed
with 45% replay of the base's own domains (the approach Falcon used). The replay is what keeps the model
from forgetting: chat ability gets added while commonsense, science, and arithmetic stay roughly where
they were.

- Warm-start: Rodan-10M v9 (PLE-free, 10.41M). The tied embedding grows 8192→8194 for 2 ChatML tokens.
- Data (73M tokens): 40M smol-smoltalk conversations in ChatML, plus 33M curated replay, full-sequence LM loss.
- Optimizer: Muon on the 2D weights, AdamW elsewhere, low LR (1.2e-3, Muon 7e-3, below the base run), cosine schedule, 6000 steps.
- Result: perplexity dropped 24.9 → 14.6, and the board avg on the base benchmarks held at 35.04 (v9 base: 35.70).

| Source | Share | Role |
|---|---|---|
| smol-smoltalk (ChatML) | 55% | instruction / multi-turn chat |
| Cosmopedia (replay) | 9% | commonsense anchor |
| dolmino pes2o + StackExchange (replay) | 9% | knowledge anchor |
| synthetic arithmetic (replay) | 9% | computation anchor |
| FineMath (replay) | 9% | math anchor |
| science-QA (replay) | 9% | science-MC anchor |

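As a rough check on the mix above, the shares can be turned into per-source token budgets. This is a minimal sketch, not the training pipeline: the dictionary keys and the `token_budget` helper are hypothetical names, and only the 73M total and the percentages come from the table.

```python
# Shares copied from the table above; the keys are illustrative labels, not dataset IDs.
MIX = {
    "smol-smoltalk": 0.55,
    "cosmopedia": 0.09,
    "dolmino-pes2o-stackexchange": 0.09,
    "synthetic-arithmetic": 0.09,
    "finemath": 0.09,
    "science-qa": 0.09,
}
TOTAL_TOKENS = 73_000_000


def token_budget(mix, total):
    """Split a total token count across sources in proportion to their shares."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {name: round(share * total) for name, share in mix.items()}


budget = token_budget(MIX, TOTAL_TOKENS)
chat_tokens = budget["smol-smoltalk"]
replay_tokens = sum(v for k, v in budget.items() if k != "smol-smoltalk")
print(f"chat tokens:   {chat_tokens:,}")
print(f"replay tokens: {replay_tokens:,}")
```

The two totals come out near 40M and 33M, which is consistent with the rounded figures in the bullet list.
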
## Architecture

Same as the base: decoder-only, dim 320, 8 layers, 8 heads, MQA with 1 KV head, SwiGLU 768, RMSNorm, RoPE
base 200k, QK-norm, tied embeddings, value-residual, LRM. No PLE, since the probe on the base showed it was
dead. Vocab is 8194 (the 8k byte-BPE set plus `<|im_start|>` and `<|im_end|>`).

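A back-of-the-envelope parameter count from these dimensions lands close to the reported 10.41M. The sketch below is an estimate under assumptions the card does not state: head dim = dim / heads = 40, no biases, two RMSNorms per layer plus a final norm, and QK-norm gains of size head dim. It ignores whatever parameters value-residual and LRM add, so it should undercount slightly.

```python
def estimate_params(vocab=8194, dim=320, layers=8, heads=8, kv_heads=1, ffn=768):
    """Rough parameter count for the decoder described above (assumptions in comments)."""
    head_dim = dim // heads                   # assumed: 320 / 8 = 40
    embed = vocab * dim                       # tied: the embedding doubles as the output head
    attn = (
        dim * heads * head_dim                # Q projection
        + 2 * dim * kv_heads * head_dim       # K and V projections (MQA: one KV head)
        + heads * head_dim * dim              # output projection
    )
    mlp = 3 * dim * ffn                       # SwiGLU: gate, up, and down matrices
    norms = 2 * dim + 2 * head_dim            # assumed: 2 RMSNorms + Q/K norm gains per layer
    per_layer = attn + mlp + norms
    total = embed + layers * per_layer + dim  # + final RMSNorm (assumed)
    return {"embed": embed, "per_layer": per_layer, "total": total}


counts = estimate_params()
print(counts)
print(f"total ~ {counts['total'] / 1e6:.2f}M")
```

Under these assumptions the estimate is 10,369,600 parameters, about 0.4% below the reported figure, with roughly a quarter of the budget sitting in the tied embedding.
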
## Evaluation

The base capability held; there was no SFT-tax collapse. Zero-shot lm-eval, limit 1000, ChatML-wrapped:

| Task | Metric | Rodan-Chat | v9 base | Δ |
|---|---|---|---|---|
| HellaSwag | acc_norm | 31.7 | 30.1 | +1.6 |
| ARC-Easy | acc_norm | 35.3 | 35.4 | ≈ |
| ARC-Challenge | acc_norm | 22.4 | 22.2 | ≈ |
| PIQA | acc | 53.8 | 55.5 | −1.7 |
| ArithMark-2 | acc | 25.8 | 28.4 | −2.6 |
| **Board avg (÷4)** | | **35.04** | 35.70 | −0.66 |

The 0.66 dip is partly just the ChatML wrapper hurting multiple-choice loglikelihood, and it's nowhere near
the 34→32 drop a naive finetune would have caused. The replay did its job.

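The card does not spell out how the ÷4 board average is formed. One reading that reproduces both reported numbers is to average ARC-Easy and ARC-Challenge into a single ARC slot and then take the mean of four slots (HellaSwag, ARC, PIQA, ArithMark-2). The sketch below checks that reading; the grouping is an inference from the table, not something the source states.

```python
def board_avg(scores):
    """Mean of four slots, with the two ARC scores first averaged into one (inferred grouping)."""
    arc = (scores["ARC-Easy"] + scores["ARC-Challenge"]) / 2
    slots = [scores["HellaSwag"], arc, scores["PIQA"], scores["ArithMark-2"]]
    return sum(slots) / len(slots)


chat = {"HellaSwag": 31.7, "ARC-Easy": 35.3, "ARC-Challenge": 22.4, "PIQA": 53.8, "ArithMark-2": 25.8}
base = {"HellaSwag": 30.1, "ARC-Easy": 35.4, "ARC-Challenge": 22.2, "PIQA": 55.5, "ArithMark-2": 28.4}

print(f"Rodan-Chat: {board_avg(chat):.2f}")                      # 35.04
print(f"v9 base:    {board_avg(base):.2f}")                      # 35.70
print(f"delta:      {board_avg(chat) - board_avg(base):+.2f}")   # -0.66
```
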
For instruction following itself, IFEval is close to useless at 10M: it grades strict constraint compliance,
which really needs a model two or three orders of magnitude larger. A 10M model fails it near-uniformly, so
the number carries no signal and isn't worth the long generative eval. We skipped it on purpose and measured
the thing we actually care about instead. On 24 instruction prompts, an LLM judge compared Rodan-Chat against
the v9 base, both decoded with the same repetition penalty. Chat won 14, tied 9, and lost 1, for a 93%
win-rate excluding ties (14 of the 15 decided comparisons). The base tended to lose by sliding into code or
rambling, while Chat gave coherent on-topic answers, several of them correct (Paris, photosynthesis producing
glucose, the opposite of hot being cold, sleep helping memory).

The win-rate above is the instruction-following metric we trust at this scale.

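The 93% figure is wins divided by decided comparisons, with ties dropped. A minimal sketch of that arithmetic (the function name is mine, not taken from any eval code):

```python
def win_rate_excluding_ties(wins, ties, losses):
    """Fraction of decided (non-tied) comparisons that were wins."""
    decided = wins + losses
    if decided == 0:
        raise ValueError("no decided comparisons")
    return wins / decided


wins, ties, losses = 14, 9, 1
rate = win_rate_excluding_ties(wins, ties, losses)
print(f"{rate:.1%} over {wins + losses} decided, {ties} ties out of {wins + ties + losses}")
# 93.3% over 15 decided, 9 ties out of 24
```
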
## Usage

Wrap prompts in ChatML and decode with a repetition penalty. Tiny models loop badly under pure greedy
decoding, and the penalty is the difference between gibberish and readable answers.

```python
question = "What is the capital of France?"  # example prompt
ctx = f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"
# greedy + repetition_penalty 1.3 + no-repeat-3gram; stop on <|im_end|> (8193) or <|endoftext|> (0)
```
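For multi-turn prompts, the same template can be applied per message. A small sketch, assuming the standard ChatML layout (each turn wrapped in `<|im_start|>{role}` and `<|im_end|>`, with an open assistant turn at the end); `build_chatml` is a hypothetical helper, not part of any released Rodan code.

```python
def build_chatml(messages):
    """Render [{'role': ..., 'content': ...}, ...] as a ChatML prompt with an open assistant turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # the model continues from here
    return "".join(parts)


prompt = build_chatml([
    {"role": "user", "content": "What is the capital of France?"},
])
print(prompt)
```

For a single user message this produces the same string as the one-line template above.
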
The settings I'd recommend: greedy decoding (or a very low temperature), `repetition_penalty=1.3`,
`no_repeat_ngram=3`, and `max_new≈70`.

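The recommended settings can be sketched as a plain greedy loop. This is an illustration, not the author's decoder: `logits_fn` is a hypothetical callable that maps a list of token ids to next-token logits (one float per vocab entry), the repetition penalty follows the common divide-positive / multiply-negative convention and is applied to every token already in the context (prompt included), and the stop ids 8193 and 0 are taken from the comment in the snippet above.

```python
def banned_by_ngram(ids, n=3):
    """Tokens that would complete an n-gram (n >= 2) already present in ids."""
    if n < 2 or len(ids) < n:
        return set()
    prefix = tuple(ids[-(n - 1):])
    return {
        ids[i + n - 1]
        for i in range(len(ids) - n + 1)
        if tuple(ids[i:i + n - 1]) == prefix
    }


def greedy_decode(logits_fn, prompt_ids, max_new=70, repetition_penalty=1.3,
                  no_repeat_ngram=3, stop_ids=(8193, 0)):
    """Greedy decoding with a repetition penalty and a no-repeat n-gram ban."""
    ids = list(prompt_ids)
    out = []
    for _ in range(max_new):
        logits = list(logits_fn(ids))
        # Repetition penalty: make every token already in the context less likely.
        for t in set(ids):
            if logits[t] > 0:
                logits[t] /= repetition_penalty
            else:
                logits[t] *= repetition_penalty
        # No-repeat n-gram: forbid tokens that would repeat an existing n-gram.
        for t in banned_by_ngram(ids, no_repeat_ngram):
            logits[t] = float("-inf")
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id in stop_ids:
            break
        ids.append(next_id)
        out.append(next_id)
    return out


# Toy check with a fixed 4-token "model" that always prefers token 3.
def toy_logits(ids):
    return [0.0, 1.0, 2.0, 3.0]


print(greedy_decode(toy_logits, prompt_ids=[1], max_new=10))
```

In the toy run, the n-gram ban forces the loop off token 3 as soon as emitting it again would repeat a trigram, which is the behavior that keeps a tiny model from looping.
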
## Limitations

- ~10M params, English only, for research and teaching. Don't use it in production, for factual queries, or for advice.
- Thin world knowledge, weak arithmetic, prone to making things up, near chance on abstract reasoning.
- It needs a repetition penalty to stay coherent; pure greedy decoding loops.
- No safety alignment. It imitates the shape of a chat reply without being a reliable assistant.

## License

Weights are open, released under Apache-2.0. Data falls under the respective dataset licenses (smol-smoltalk,
Cosmopedia, dolmino-mix ODC-By, AllenAI QA sets, FineMath).