| --- |
| license: mit |
| datasets: |
| - wikimedia/wikipedia |
| - AxionLab-official/ThinkSet-PTBR |
| language: |
| - pt |
| pipeline_tag: text-generation |
| library_name: transformers |
| --- |
| # 🧠 NanoThink-5M |
|
|
| > A 5M-parameter language model trained from scratch on Portuguese text and a reasoning ("thinking") dataset to simulate structured reasoning. |
|
|
| --- |
|
|
| ## 🚀 Overview |
|
|
| **NanoThink-5M** is an ultra-lightweight (~5M parameters) transformer model designed to explore the limits of **reasoning behavior in small-scale neural networks**. |
|
|
| Built entirely from scratch, it runs efficiently on CPU and focuses on generating structured reasoning outputs in Portuguese. |
|
|
| --- |
|
|
| ## 💡 Key Idea |
|
|
| > How far can a tiny model go in *simulating reasoning*? |
|
|
| NanoThink-5M does not truly reason. Instead, it learns to **imitate reasoning patterns** by training on examples written in a fixed `<THINK>` / `<ANSWER>` format. |
|
|
| --- |
|
|
| ## 🧠 Capabilities |
|
|
| * Generates step-by-step reasoning (`<THINK>`) |
| * Produces structured answers (`<ANSWER>`) |
| * Handles simple arithmetic and logic patterns |
| * Fully CPU-compatible |
|
|
| --- |
|
|
| ## ⚙️ Model Details |
|
|
| * Architecture: Causal Transformer (GPT-style) |
| * Parameters: ~5M |
| * Layers: 4 |
| * Heads: 4 |
| * Embedding size: 128 |
| * Context length: 256 tokens |
|
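As a rough sanity check on the ~5M figure, the sketch below counts parameters for a GPT-style stack with the dimensions listed above. The vocabulary size, learned positional embeddings, a 4x MLP expansion, bias terms, and tied input/output embeddings are all assumptions (this card does not state them), so the result is an estimate and not the model's actual count.

```python
def estimate_params(vocab_size, d_model=128, n_layers=4, context_len=256,
                    mlp_ratio=4, tied_embeddings=True):
    """Rough parameter count for a GPT-style decoder (assumed layout)."""
    token_emb = vocab_size * d_model
    pos_emb = context_len * d_model  # assumes learned positional embeddings

    # Attention: fused Q/K/V projection plus output projection, with biases.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: up-projection and down-projection, with biases.
    hidden = mlp_ratio * d_model
    mlp = (d_model * hidden + hidden) + (hidden * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * (2 * d_model)
    per_layer = attn + mlp + norms

    final_norm = 2 * d_model
    # An untied output head adds a second vocab_size x d_model matrix.
    lm_head = 0 if tied_embeddings else vocab_size * d_model

    return token_emb + pos_emb + n_layers * per_layer + final_norm + lm_head


# HYPOTHETICAL vocabulary size, chosen only to illustrate the arithmetic.
total = estimate_params(vocab_size=32_000)
print(f"{total:,} parameters")
```

Under these assumptions a 32,000-token vocabulary gives 4,922,112 parameters, of which 4,096,000 sit in the token embedding table. At this scale the vocabulary size dominates the total, so the real figure depends mostly on the tokenizer.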
|
| --- |
|
|
| ## 🏗️ Training Pipeline |
|
|
| ### 1. Tokenizer |
|
|
| A custom tokenizer trained from scratch, stored as `tokenizer.json` and loaded with the `tokenizers` library. |
|
|
| ### 2. Pretraining |
|
|
| * Portuguese text corpus |
| * Language modeling objective |
|
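The language modeling objective is next-token prediction: at each position the model is scored on the probability it assigns to the token that actually follows. The sketch below computes that average cross-entropy in plain Python for a toy sequence. The logits and token ids are made up for illustration, and a real training loop would presumably use a framework loss such as PyTorch's `torch.nn.functional.cross_entropy` instead.

```python
import math


def next_token_loss(logits, token_ids):
    """Average cross-entropy for next-token prediction.

    logits[t] holds one unnormalized score per vocabulary entry, produced
    after reading token_ids[: t + 1]; its target is token_ids[t + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a target")

    total = 0.0
    count = 0
    for t in range(len(token_ids) - 1):
        scores = logits[t]
        target = token_ids[t + 1]
        # Numerically stable log-softmax: subtract the max before exponentiating.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]  # negative log-probability of the target
        count += 1
    return total / count


# Toy example with a 3-token vocabulary (all values are illustrative).
token_ids = [0, 2, 1]
logits = [
    [0.0, 0.0, 0.0],  # uniform guess for the token after id 0
    [0.0, 5.0, 0.0],  # confident, correct guess for the token after id 2
    [0.0, 0.0, 0.0],  # last position has no target and is ignored
]
print(round(next_token_loss(logits, token_ids), 4))
```

The uniform guess contributes a loss of ln(3), while the confident correct guess contributes almost nothing, so the average falls well below that of a model that guesses uniformly everywhere.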
|
| ### 3. Fine-tuning |
|
|
| * Synthetic reasoning dataset |
| * Tasks include: |
|   * Arithmetic |
|   * Logical comparisons |
|   * Multi-step problems |
|
|
| Structured format: |
|
|
| ```text |
| <USER> ... </USER> |
| <THINK> ... </THINK> |
| <ANSWER> ... </ANSWER> |
| <END> |
| ``` |
|
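To make the format concrete, here is a minimal sketch of how one synthetic arithmetic sample could be rendered. The function names, the Portuguese question template, the number ranges, and the one-tag-per-line layout are hypothetical; the actual dataset generator is not described in this card. Closing tags are written with a forward slash, matching the example output and usage code elsewhere in this card.

```python
import random


def format_sample(question, steps, answer):
    """Render one training example in the <USER>/<THINK>/<ANSWER> layout."""
    return (
        f"<USER>\n{question}\n</USER>\n"
        f"<THINK>\n{steps}\n</THINK>\n"
        f"<ANSWER>\n{answer}\n</ANSWER>\n"
        "<END>"
    )


def make_addition_sample(rng):
    """Build a single-step addition problem (hypothetical template)."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    question = f"João tem {a} maçãs e ganhou {b}, quantas ele tem agora?"
    steps = f"{a} + {b} = {a + b}"
    answer = f"João tem {a + b} maçãs."
    return format_sample(question, steps, answer)


rng = random.Random(0)  # fixed seed so the sample is reproducible
print(make_addition_sample(rng))
```

Generating the reasoning line and the answer from the same numbers keeps each sample internally consistent, which matters because the model can only imitate the patterns it is shown.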
|
| --- |
|
|
| ## 📊 Example |
|
|
| **Input:** |
|
|
| ```text |
| João tem 3 maçãs e ganhou 2, quantas ele tem agora? |
| ``` |
|
|
| **Output:** |
|
|
| ```text |
| <THINK> |
| 3 + 2 = 5 |
| </THINK> |
| <ANSWER> |
| João tem 5 maçãs. |
| </ANSWER> |
| ``` |
|
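Downstream code usually needs the reasoning and the answer separately. A minimal sketch, assuming the tags appear literally in the decoded text as in the example above (the helper name `split_response` is hypothetical):

```python
import re


def split_response(text):
    """Return (thinking, answer) from a tagged response; missing parts are None."""
    def grab(tag):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else None

    return grab("THINK"), grab("ANSWER")


response = """<THINK>
3 + 2 = 5
</THINK>
<ANSWER>
João tem 5 maçãs.
</ANSWER>"""

thinking, answer = split_response(response)
print(thinking)  # 3 + 2 = 5
print(answer)    # João tem 5 maçãs.
```

Returning `None` for a missing section lets the caller distinguish a truncated generation from an empty answer.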
|
| --- |
|
|
| ## ⚠️ Limitations |
|
|
| * Not reliable for precise mathematical reasoning |
| * May generate inconsistent intermediate steps |
| * Reasoning is **simulated, not grounded** |
|
|
| > This model demonstrates *the appearance of reasoning*, not true reasoning. |
|
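Because intermediate steps may be inconsistent, it can help to check them mechanically before trusting an answer. The sketch below verifies simple `a op b = c` lines taken from a `<THINK>` block. The helper name and the supported step syntax (integers with `+`, `-`, `*`) are assumptions, and any line it cannot parse is simply skipped.

```python
import re

# One integer operation per line, e.g. "3 + 2 = 5".
STEP = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def find_bad_steps(think_text):
    """Return the arithmetic lines whose stated result does not match."""
    bad = []
    for line in think_text.splitlines():
        match = STEP.match(line)
        if not match:
            continue  # not a recognizable integer arithmetic step
        a, op, b, claimed = match.groups()
        if OPS[op](int(a), int(b)) != int(claimed):
            bad.append(line.strip())
    return bad


print(find_bad_steps("3 + 2 = 5"))             # []
print(find_bad_steps("3 + 2 = 6\n6 - 1 = 5"))  # ['3 + 2 = 6']
```

A check like this only catches arithmetic slips in the written steps. It says nothing about whether the steps are the right ones for the question.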
|
| --- |
|
|
| ## 🧪 Research Insight |
|
|
| NanoThink-5M highlights an important phenomenon: |
|
|
| > Small models can learn to **look intelligent before being intelligent**. |
|
|
| This reinforces the distinction between: |
|
|
| * Simulated reasoning |
| * Actual reasoning |
|
|
| --- |
|
|
| ## 💻 Usage |
|
|
| ```python |
| import torch |
| from tokenizers import Tokenizer |
| from model import NanoThink  # expects a local model.py that defines NanoThink |
| from safetensors.torch import load_file |
| |
| MODEL_PATH = "model.safetensors" |
| TOKENIZER_PATH = "tokenizer.json" |
| CONTEXT_LENGTH = 256  # context length listed under Model Details |
| MAX_NEW_TOKENS = 120 |
| |
| tokenizer = Tokenizer.from_file(TOKENIZER_PATH) |
| |
| model = NanoThink(vocab_size=tokenizer.get_vocab_size()) |
| model.load_state_dict(load_file(MODEL_PATH)) |
| model.eval() |
| |
| history = "" |
| |
| while True: |
|     user_input = input("You: ") |
| |
|     if user_input.strip().lower() in ["get out", "exit", "quit"]: |
|         break |
| |
|     # Wrap the new turn in the same tags used during fine-tuning. |
|     prompt = history + f"\n<USER>\n{user_input}\n</USER>\n" |
|     input_ids = torch.tensor([tokenizer.encode(prompt).ids]) |
|     output_ids = [] |
| |
|     with torch.no_grad():  # inference only, no gradients needed |
|         for _ in range(MAX_NEW_TOKENS): |
|             # Feed only the most recent tokens so the growing history |
|             # never exceeds the model's context window. |
|             logits = model(input_ids[:, -CONTEXT_LENGTH:]) |
|             # Sample the next token from the softmax distribution. |
|             probs = torch.softmax(logits[0, -1], dim=-1) |
|             next_token = torch.multinomial(probs, 1).item() |
| |
|             input_ids = torch.cat([input_ids, torch.tensor([[next_token]])], dim=1) |
|             output_ids.append(next_token) |
| |
|             # Stop as soon as the answer block is closed. |
|             if "</ANSWER>" in tokenizer.decode(output_ids): |
|                 break |
| |
|     output = tokenizer.decode(output_ids) |
| |
|     # Keep only the text between the answer tags, if present. |
|     if "<ANSWER>" in output: |
|         output = output.split("<ANSWER>")[1].split("</ANSWER>")[0] |
| |
|     print("\n💬 Answer:") |
|     print(output.strip()) |
|     print("\n" + "-" * 50 + "\n") |
| |
|     # Store the turn (without the thinking block) for the next prompt. |
|     history += f"\n<USER>\n{user_input}\n</USER>\n<ANSWER>\n{output.strip()}\n</ANSWER>\n" |
| ``` |
|
|
| --- |
|
|
| ## 🔮 Future Work |
|
|
| * Scaling to 10M–50M parameters |
| * Improving dataset quality and training techniques |
| * Enhancing reasoning consistency |
| * Multilingual support |
|
|
|
|
| --- |
|
|
| ## 🤝 Contributions |
|
|
| This is an experimental project; contributions and ideas are welcome. |
|
|
| --- |
|
|
| ## 📜 License |
|
|
| MIT |
|
|
| --- |
|
|
| ## 🧠 Author |
|
|
| AxionLab Co. |
|
|
| Independent research project exploring the limits of small language models. |
|
|
| --- |
|
|
| ## ⭐ Final Thought |
|
|
| > Intelligence can be mimicked at small scale — but not yet achieved. |
|
|
| NanoThink-5M is a step toward understanding that boundary. |