AxionLab-official committed 8b2c269 (verified; parent: 79360eb): Update README.md

---
license: mit
datasets:
- wikimedia/wikipedia
language:
- pt
pipeline_tag: text-generation
library_name: transformers
---
# 🧠 NanoThink-5M

> A 5M-parameter language model trained from scratch on Portuguese text and a synthetic reasoning dataset to simulate structured reasoning.

---
## 🚀 Overview

**NanoThink-5M** is an ultra-lightweight (~5M parameters) transformer model designed to explore the limits of **reasoning behavior in small-scale neural networks**.

Built entirely from scratch, it runs efficiently on CPU and focuses on generating structured reasoning outputs in Portuguese.

---
## 💡 Key Idea

> How far can a tiny model go in *simulating reasoning*?

NanoThink-5M does not truly reason; instead, it learns to **imitate reasoning patterns** through structured training.

---
## 🧠 Capabilities

* Generates step-by-step reasoning (`<THINK>`)
* Produces structured answers (`<ANSWER>`)
* Handles simple arithmetic and logic patterns
* Fully CPU-compatible

---
## ⚙️ Model Details

* Architecture: causal Transformer (GPT-style)
* Parameters: ~5M
* Layers: 4
* Attention heads: 4
* Embedding size: 128
* Context length: 256 tokens

---
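For reference, the hyperparameters above can be collected into a small config object. This is only an illustrative sketch; the field names are assumptions, not the actual `model.py` API.

```python
from dataclasses import dataclass


@dataclass
class NanoThinkConfig:
    # Values taken from the Model Details list above; vocab_size matches
    # the tokenizer used in the Usage snippet. Field names are illustrative.
    n_layers: int = 4
    n_heads: int = 4
    d_model: int = 128      # embedding size
    context_len: int = 256  # maximum sequence length in tokens
    vocab_size: int = 1229
```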
## 🏗️ Training Pipeline

### 1. Tokenizer

Custom tokenizer trained from scratch.

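As a sketch of this step (assuming a BPE tokenizer built with the `tokenizers` library; the exact algorithm and special tokens are assumptions, only the vocabulary size of 1229 comes from the Usage snippet):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a small BPE tokenizer from scratch on a Portuguese corpus.
# The special tokens mirror the structured fine-tuning format (assumed).
tokenizer = Tokenizer(models.BPE(unk_token="<UNK>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=1229,
    special_tokens=["<UNK>", "<USER>", "</USER>", "<THINK>", "</THINK>",
                    "<ANSWER>", "</ANSWER>", "<END>"],
)

corpus = ["João tem 3 maçãs.", "Ele ganhou 2 maçãs."]  # toy stand-in corpus
tokenizer.train_from_iterator(corpus, trainer)
tokenizer.save("tokenizer.json")
```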
### 2. Pretraining

* Portuguese text corpus
* Language modeling objective

### 3. Fine-tuning

* Synthetic reasoning dataset
* Tasks include:
  * Arithmetic
  * Logical comparisons
  * Multi-step problems

Structured format:

```text
<USER> ... </USER>
<THINK> ... </THINK>
<ANSWER> ... </ANSWER>
<END>
```

---
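A single fine-tuning example can be serialized into this format with a helper along these lines (the tag layout follows the block above; the function name is illustrative):

```python
def format_example(question: str, thought: str, answer: str) -> str:
    # Wrap a (question, reasoning, answer) triple in the structured
    # training format, terminated by the <END> marker.
    return (
        f"<USER> {question} </USER>\n"
        f"<THINK> {thought} </THINK>\n"
        f"<ANSWER> {answer} </ANSWER>\n"
        "<END>"
    )
```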
## 📊 Example

**Input:**

```text
João tem 3 maçãs e ganhou 2, quantas ele tem agora?
```

("João has 3 apples and gained 2; how many does he have now?")

**Output:**

```text
<THINK>
3 + 2 = 5
</THINK>
<ANSWER>
João tem 5 maçãs.
</ANSWER>
<END>
```

("João has 5 apples.")
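A downstream caller would typically strip the tags and keep only the final answer. A minimal helper for that (the function name is illustrative; only the tag names come from the format above):

```python
import re


def extract_answer(generation: str) -> str:
    # Pull the text between <ANSWER> ... </ANSWER>; fall back to the raw
    # generation if the model failed to emit well-formed tags.
    m = re.search(r"<ANSWER>(.*?)</ANSWER>", generation, re.S)
    return m.group(1).strip() if m else generation.strip()
```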

---

## ⚠️ Limitations

* Not reliable for precise mathematical reasoning
* May generate inconsistent intermediate steps
* Reasoning is **simulated, not grounded**

> This model demonstrates *the appearance of reasoning*, not true reasoning.

---
## 🧪 Research Insight

NanoThink-5M highlights an important phenomenon:

> Small models can learn to **look intelligent before being intelligent**.

This reinforces the distinction between:

* Simulated reasoning
* Actual reasoning

---
## 💻 Usage

```python
import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer

from model import NanoThink  # model definition shipped with this repo

# Load the tokenizer trained for this model.
tokenizer = Tokenizer.from_file("tokenizer.json")

# Instantiate the model and load the pretrained weights.
model = NanoThink(vocab_size=1229)
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)

# Switch to inference mode (disables dropout, etc.).
model.eval()
```
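The loaded model can then be sampled with a simple decoding loop. This is only a sketch: the README does not document `NanoThink`'s forward signature, so the helper assumes a GPT-style call where `model(input_ids)` returns logits of shape `(batch, seq, vocab)`, and a `tokenizers` Tokenizer as loaded above. All names are illustrative.

```python
import torch


@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=64, end_token="<END>"):
    # Greedy decoding: repeatedly pick the most likely next token until
    # the <END> marker or the token budget is reached.
    ids = list(tokenizer.encode(prompt).ids)
    end_id = tokenizer.token_to_id(end_token)
    for _ in range(max_new_tokens):
        x = torch.tensor([ids[-256:]])         # clip to the 256-token context
        logits = model(x)                      # assumed shape: (1, seq, vocab)
        next_id = int(logits[0, -1].argmax())  # greedy pick
        ids.append(next_id)
        if end_id is not None and next_id == end_id:
            break                              # stop at the <END> marker
    return tokenizer.decode(ids)
```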

---

## 🔮 Future Work

* Scaling to 10M–50M parameters
* Improving dataset quality
* Enhancing reasoning consistency
* Multilingual support

---
## 🤝 Contributions

This is an experimental project; contributions and ideas are welcome.

---

## 📜 License

MIT

---

## 🧠 Author

AxionLab Co.
Independent research project exploring the limits of small language models.

---

## ⭐ Final Thought

> Intelligence can be mimicked at small scale, but not yet achieved.

NanoThink-5M is a step toward understanding that boundary.