---
license: apache-2.0
language:
- pt
datasets:
- AxionLab-Co/ThinkSet-PTBR
metrics:
- accuracy: 16.9%
pipeline_tag: text-generation
---

**🧠 MiniAxion1.5-3M**

**Emergent reasoning in a 2.7M-parameter model: a tiny Portuguese-first language model that learns how to think before it learns how to be correct.**

**Overview**

MiniAxion1.5-3M is an ultra-compact (~2.7M-parameter) GPT-style language model designed to investigate how reasoning emerges at extremely small scale.

Unlike typical small models, which are optimized for fluency, MiniAxion is explicitly trained to produce:

- Structured reasoning traces
- Step-by-step thinking (`<THINK><STEP>`)
- Deterministic answer formatting

It operates primarily in Portuguese, making it a rare example of a non-English, reasoning-first nano model.

**⚡ Why This Model Is Interesting**
|
|
| Most models follow this trajectory: |
|
|
Language → Knowledge → Reasoning
|
|
| MiniAxion flips part of that: |
|
|
Structure → Reasoning format → (still learning correctness)
|
|
**💡 Key insight:**
|
|
| The model demonstrates that reasoning structure can emerge independently of reasoning accuracy. |
|
|
**🧪 Evaluation**

**Task Performance**

| Task | Accuracy |
| --- | --- |
| Addition | 10% |
| Subtraction | 10% |
| Multiplication | 0% |
| Even/Odd | 100% |
| Comparison | 5% |
| Sequence Completion | 0% |
| Word Problems (Addition) | 10% |
| Word Problems (Subtraction) | 0% |
| Word Problems (Multiplication) | 10% |
| True/False | 100% |
| Chat/Greetings | 100% |
|
|
**🧠 Reasoning Behavior Metrics**

| Metric | Score |
| --- | --- |
| Thinking Rate | 100% |
| Step Format | 100% |
| Answer Completion | 100% |
|
|
- ✅ The model always thinks
- ✅ The model always structures its reasoning
- ✅ The model always produces an answer
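A minimal sketch of how these three behavior metrics could be computed from raw generations. The definitions are inferred from the metric names, not taken from a published script.

```python
import re

def behavior_metrics(outputs: list[str]) -> dict[str, float]:
    n = len(outputs)
    # Thinking Rate: fraction of outputs containing a <THINK> block.
    thinking = sum("<THINK>" in o for o in outputs) / n
    # Step Format: fraction whose <THINK> block holds at least one <STEP>.
    steps = sum(
        bool(re.search(r"<THINK>.*<STEP>.*</THINK>", o, re.DOTALL)) for o in outputs
    ) / n
    # Answer Completion: fraction with a closed <ANSWER>...</ANSWER> span.
    answered = sum(
        bool(re.search(r"<ANSWER>.*?</ANSWER>", o, re.DOTALL)) for o in outputs
    ) / n
    return {"thinking_rate": thinking, "step_format": steps, "answer_completion": answered}

out = "<THINK>\n<STEP> Identifico os numeros\n</THINK>\n<ANSWER> 74 </ANSWER>"
print(behavior_metrics([out]))  # all three metrics are 1.0 for this trace
```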
|
|
| **๐ Interpretation** |
|
|
| MiniAxion exhibits a clear dissociation: |
|
|
**✅ What it learned**

- Reasoning format
- Step-by-step decomposition
- Logical task patterns (parity, boolean)

**❌ What it did NOT learn**

- Arithmetic correctness
- Numerical reasoning
- Multi-step computation
|
|
**🔬 Core Finding**
|
|
Reasoning ≠ Correctness
|
|
| MiniAxion shows that: |
|
|
Models can internalize thinking patterns without actually learning how to solve problems.
|
|
| This makes it a strong candidate for studying: |
|
|
- Emergent reasoning
- Tiny Recursive Models (TRMs)
- Reasoning distillation
|
|
**🏗️ Architecture**

- Type: GPT-style Transformer
- Parameters: ~2.7M
- Objective: Next-token prediction
- Language: Portuguese (primary)
- Specialization: Structured reasoning traces
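The card does not list layer count, width, or vocabulary size. The sketch below only illustrates how a GPT-style configuration lands in the ~2.7M range; every dimension here is assumed for illustration and is not MiniAxion's actual hyperparameter set.

```python
def gpt_param_count(vocab: int, ctx: int, d: int, layers: int) -> int:
    # Token + positional embeddings (token embeddings tied with the output head).
    embed = vocab * d + ctx * d
    # Per block: QKV + attention output (4*d*d), 4x MLP (8*d*d), biases, 2 LayerNorms.
    block = 12 * d * d + 13 * d
    # Final LayerNorm adds 2*d.
    return embed + layers * block + 2 * d

# Illustrative config only; NOT published MiniAxion hyperparameters.
n = gpt_param_count(vocab=10_000, ctx=256, d=128, layers=6)
print(f"{n / 1e6:.2f}M")  # 2.50M, in the right ballpark for ~2.7M
```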
|
|
**🧠 Training Strategy**
|
|
| The model was trained with a reasoning-first approach: |
|
|
- Portuguese language grounding
- Structured reasoning data (`<THINK><STEP>`)
- Emphasis on:
  - Deterministic formats
  - Multi-step thinking
  - Explicit reasoning tokens
|
|
- 🚫 No RLHF
- 🚫 No instruction tuning at scale
- 🚫 No large-model distillation (yet)
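The exact corpus format is not published; this is a hedged sketch of what a ThinkSet-style training sample might look like, assembled from the tags the card documents. `format_example` and the example question are hypothetical.

```python
def format_example(question: str, steps: list[str], answer: str) -> str:
    # Assemble one training sample in the card's <THINK>/<STEP>/<ANSWER> format.
    think = "\n".join(f"<STEP> {s}" for s in steps)
    return f"{question}\n<THINK>\n{think}\n</THINK>\n<ANSWER> {answer} </ANSWER>"

sample = format_example(
    "Quanto e 2 + 3?",  # "What is 2 + 3?"
    ["Identifico os numeros 2 e 3", "Somo os valores"],
    "5",
)
print(sample)
```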
|
|
**⚠️ Limitations**

**1. Arithmetic Collapse**

Near-random performance on:

- Addition
- Subtraction
- Multiplication

→ Indicates a lack of numerical representation learning.

**2. Format Dependence**

Strong dependence on:

- Prompt format
- Token patterns
- Seen reasoning templates
|
|
**🔮 Future Work**

This model is just the beginning.

**Scaling**

- 5M / 10M / 20M versions
- Track the emergence of correctness

**🧪 Distillation**

- Inject reasoning from larger models
- Improve accuracy without scaling parameters

**Self-Play / Synthetic Data**

- Generate reasoning loops
- Reinforce correct chains

**🧩 Hybrid Reasoning**

- Combine symbolic and neural learning
- Fix the arithmetic weakness
|
|
**🧾 Example Output**

```
<THINK>
<STEP> Identifico os números
<STEP> Tento somar os valores
<STEP> Ajusto o resultado
</THINK>
<ANSWER> 74 </ANSWER>
```

(The steps read: "I identify the numbers", "I try to add the values", "I adjust the result".)

- ✅ Perfect reasoning structure
- ❌ Incorrect answer
|
|
**💡 Takeaway**


MiniAxion1.5-3M demonstrates something important:
|
|
| Even a 2.7M model can learn to simulate thinking before it learns to actually think correctly. |
|
|
**Use Cases**

- Research on emergent reasoning
- Tiny-model experimentation (CPU-friendly)
- Educational demos of:
  - Chain-of-Thought
  - Reasoning failure modes
- Base model for:
  - Distillation
  - NRM experiments