---
license: mit
datasets:
- wikimedia/wikipedia
- AxionLab-official/ThinkSet-PTBR
language:
- pt
pipeline_tag: text-generation
library_name: transformers
---
# 🧠 NanoThink-5M
> A 5M-parameter language model trained from scratch on Portuguese text and a synthetic reasoning dataset to simulate structured reasoning.
---
## 🚀 Overview
**NanoThink-5M** is an ultra-lightweight (~5M parameters) transformer model designed to explore the limits of **reasoning behavior in small-scale neural networks**.
Built entirely from scratch, it runs efficiently on CPU and focuses on generating structured reasoning outputs in Portuguese.
---
## 💡 Key Idea
> How far can a tiny model go in *simulating reasoning*?
NanoThink-5M does not truly reason — instead, it learns to **imitate reasoning patterns** through structured training.
---
## 🧠 Capabilities
* Generates step-by-step reasoning (`<THINK>`)
* Produces structured answers (`<ANSWER>`)
* Handles simple arithmetic and logic patterns
* Fully CPU-compatible
---
## ⚙️ Model Details
* Architecture: Causal Transformer (GPT-style)
* Parameters: ~5M
* Layers: 4
* Heads: 4
* Embedding size: 128
* Context length: 256 tokens
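
The ~5M figure is consistent with this configuration. As a rough sanity check (the vocabulary size is an assumption here, since the card does not state it; ~30k entries is a plausible guess for a custom tokenizer, and the sketch assumes learned positional embeddings with a weight-tied output head):

```python
# Rough parameter-count estimate for a GPT-style decoder with the
# configuration above. Vocab size is an assumption (not stated in the card).

def estimate_params(vocab_size: int, d_model: int = 128,
                    n_layers: int = 4, ctx_len: int = 256) -> int:
    """Approximate parameter count: embeddings + positions + transformer blocks."""
    embeddings = vocab_size * d_model   # token embedding table (tied output head)
    positions = ctx_len * d_model       # learned positional embeddings
    per_layer = 12 * d_model ** 2       # attention (~4*d^2) + MLP (~8*d^2)
    return embeddings + positions + n_layers * per_layer

print(f"~{estimate_params(30_000) / 1e6:.1f}M parameters")  # → ~4.7M parameters
```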
---
## 🏗️ Training Pipeline
### 1. Tokenizer
Custom tokenizer trained from scratch.
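
The card does not describe the tokenizer's training procedure; a minimal sketch using the Hugging Face `tokenizers` library might look like the following, where the BPE choice, vocabulary size, and tiny stand-in corpus are all assumptions:

```python
# Hypothetical sketch: training a BPE tokenizer from scratch with the
# `tokenizers` library. Corpus, vocab size, and special tokens are assumptions.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="<UNK>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=30_000,
    special_tokens=["<UNK>", "<USER>", "</USER>", "<THINK>", "</THINK>",
                    "<ANSWER>", "</ANSWER>", "<END>"],
)

corpus = ["João tem 3 maçãs.", "Pedro ganhou 2 laranjas."]  # tiny stand-in corpus
tokenizer.train_from_iterator(corpus, trainer=trainer)
tokenizer.save("tokenizer.json")
```

The special tokens reserve IDs for the structured reasoning tags used during fine-tuning.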
### 2. Pretraining
* Portuguese text corpus
* Language modeling objective
### 3. Fine-tuning
* Synthetic reasoning dataset
* Tasks include:
* Arithmetic
* Logical comparisons
* Multi-step problems
Structured format:
```text
<USER> ... </USER>
<THINK> ... </THINK>
<ANSWER> ... </ANSWER>
<END>
```
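
Rendering a synthetic example into this format can be sketched with a small helper (this function is illustrative, not part of the released training code):

```python
# Illustrative helper: render one synthetic example into the structured
# training format shown above.

def format_example(question: str, steps: str, answer: str) -> str:
    return (
        f"<USER>\n{question}\n</USER>\n"
        f"<THINK>\n{steps}\n</THINK>\n"
        f"<ANSWER>\n{answer}\n</ANSWER>\n"
        f"<END>"
    )

sample = format_example(
    "João tem 3 maçãs e ganhou 2, quantas ele tem agora?",
    "3 + 2 = 5",
    "João tem 5 maçãs.",
)
print(sample)
```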
---
## 📊 Example
**Input:**
```text
João tem 3 maçãs e ganhou 2, quantas ele tem agora?
```
**Output:**
```text
<THINK>
3 + 2 = 5
</THINK>
<ANSWER>
João tem 5 maçãs.
</ANSWER>
```
---
## ⚠️ Limitations
* Not reliable for precise mathematical reasoning
* May generate inconsistent intermediate steps
* Reasoning is **simulated, not grounded**
> This model demonstrates *the appearance of reasoning*, not true reasoning.
---
## 🧪 Research Insight
NanoThink-5M highlights an important phenomenon:
> Small models can learn to **look intelligent before being intelligent**.
This reinforces the distinction between:
* Simulated reasoning
* Actual reasoning
---
## 💻 Usage
```python
import torch
from tokenizers import Tokenizer
from model import NanoThink
from safetensors.torch import load_file

MODEL_PATH = "model.safetensors"
TOKENIZER_PATH = "tokenizer.json"

# Load tokenizer and weights
tokenizer = Tokenizer.from_file(TOKENIZER_PATH)
model = NanoThink(vocab_size=tokenizer.get_vocab_size())
model.load_state_dict(load_file(MODEL_PATH))
model.eval()

history = ""
while True:
    user_input = input("You: ")
    if user_input.lower() in ["get out", "exit", "quit"]:
        break

    prompt = history + f"\n<USER>\n{user_input}\n</USER>\n"
    input_ids = torch.tensor([tokenizer.encode(prompt).ids])

    output_ids = []
    with torch.no_grad():
        for _ in range(120):
            logits = model(input_ids)
            next_token = torch.multinomial(
                torch.softmax(logits[0, -1], dim=-1), 1
            ).item()
            input_ids = torch.cat([input_ids, torch.tensor([[next_token]])], dim=1)
            output_ids.append(next_token)

            # Stop as soon as the model closes the answer block
            text = tokenizer.decode(output_ids)
            if "</ANSWER>" in text:
                break

    output = tokenizer.decode(output_ids)
    if "<ANSWER>" in output:
        output = output.split("<ANSWER>")[1].split("</ANSWER>")[0]

    print("\n💬 Answer:")
    print(output.strip())
    print("\n" + "-" * 50 + "\n")

    history += f"\n<USER>\n{user_input}\n</USER>\n<ANSWER>\n{output.strip()}\n</ANSWER>\n"
```
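
The loop above samples directly from the full softmax distribution. A common refinement, sketched here as an assumption rather than part of the released code, is temperature scaling with top-k filtering, which often yields more coherent output from small models:

```python
# Hypothetical sampling refinement: temperature scaling + top-k filtering.
# Not part of the released NanoThink code; shown as a common alternative.
import torch

def sample_next(logits: torch.Tensor, temperature: float = 0.8,
                top_k: int = 40) -> int:
    """Sample a token id from last-position logits with temperature and top-k."""
    logits = logits / temperature
    topk = torch.topk(logits, k=min(top_k, logits.size(-1)))
    probs = torch.softmax(topk.values, dim=-1)   # renormalize over top-k only
    idx = torch.multinomial(probs, 1).item()
    return topk.indices[idx].item()
```

In the generation loop, `next_token = sample_next(logits[0, -1])` would replace the plain multinomial draw.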
---
## 🔮 Future Work
* Scaling to 10M–50M parameters
* Improving dataset quality and training techniques
* Enhancing reasoning consistency
* Multilingual support
---
## 🤝 Contributions
This is an experimental project; contributions and ideas are welcome.
---
## 📜 License
MIT
---
## 🧠 Author
AxionLab Co.
Independent research project exploring the limits of small language models.
---
## ⭐ Final Thought
> Intelligence can be mimicked at small scale — but not yet achieved.
NanoThink-5M is a step toward understanding that boundary.