---
license: mit
datasets:
- wikimedia/wikipedia
- AxionLab-official/ThinkSet-PTBR
language:
- pt
pipeline_tag: text-generation
library_name: transformers
---

# 🧠 NanoThink-5M

> A 5M-parameter language model trained from scratch on Portuguese text and a reasoning dataset to simulate structured reasoning.

---

## 🚀 Overview

**NanoThink-5M** is an ultra-lightweight (~5M parameters) transformer model designed to explore the limits of **reasoning behavior in small-scale neural networks**. Built entirely from scratch, it runs efficiently on CPU and focuses on generating structured reasoning outputs in Portuguese.

---

## 💡 Key Idea

> How far can a tiny model go in *simulating reasoning*?

NanoThink-5M does not truly reason; instead, it learns to **imitate reasoning patterns** through structured training.

---

## 🧠 Capabilities

* Generates step-by-step reasoning (`<THINK> ... </THINK>`)
* Produces structured answers (`<ANSWER> ... </ANSWER>`)
* Handles simple arithmetic and logic patterns
* Fully CPU-compatible

---

## ⚙️ Model Details

* Architecture: Causal Transformer (GPT-style)
* Parameters: ~5M
* Layers: 4
* Heads: 4
* Embedding size: 128
* Context length: 256 tokens

---

## 🏗️ Training Pipeline

### 1. Tokenizer

Custom tokenizer trained from scratch.

### 2. Pretraining

* Portuguese text corpus
* Language modeling objective

### 3. Fine-tuning

* Synthetic reasoning dataset
* Tasks include:
  * Arithmetic
  * Logical comparisons
  * Multi-step problems

Structured format:

```text
<USER> ... </USER>
<THINK> ... </THINK>
<ANSWER> ... </ANSWER>
```

---

## 📊 Example

**Input:**

```text
João tem 3 maçãs e ganhou 2, quantas ele tem agora?
```

**Output:**

```text
<THINK>
3 + 2 = 5
</THINK>
<ANSWER>
João tem 5 maçãs.
</ANSWER>
```

---

## ⚠️ Limitations

* Not reliable for precise mathematical reasoning
* May generate inconsistent intermediate steps
* Reasoning is **simulated, not grounded**

> This model demonstrates *the appearance of reasoning*, not true reasoning.
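The hyperparameters listed under Model Details pin down most of the parameter budget. A back-of-envelope count, assuming a ~32k-token vocabulary, learned positional embeddings, 4x-wide MLPs, and a weight-tied LM head (none of these are stated on the card):

```python
# Rough parameter count for a GPT-style model with the card's stated
# hyperparameters. Vocabulary size, positional-embedding type, MLP width,
# and head tying are assumptions; biases and LayerNorm weights are ignored.
d_model, n_layers, n_ctx, vocab = 128, 4, 256, 32_000

embed = vocab * d_model            # token embeddings (tied with the LM head)
pos = n_ctx * d_model              # learned positional embeddings
attn = 4 * d_model * d_model       # Q, K, V and output projections
mlp = 2 * (d_model * 4 * d_model)  # up- and down-projections at 4x width
total = embed + pos + n_layers * (attn + mlp)

print(f"~{total / 1e6:.2f}M parameters")  # ~4.92M, close to the stated ~5M
```

Under these assumptions the count lands within 2% of the advertised ~5M, and the token embedding table dominates the budget at this scale.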
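The fine-tuning examples described under Training Pipeline can be assembled programmatically. A minimal sketch, assuming the markers are `<USER>`/`</USER>`-style tag pairs; the `format_sample` helper and exact whitespace are illustrative, not part of the released code:

```python
# Hypothetical helper that assembles one fine-tuning example in the
# card's <USER>/<THINK>/<ANSWER> structured format. Exact whitespace
# and tag placement are assumptions.
def format_sample(question: str, reasoning: str, answer: str) -> str:
    return (
        f"<USER>\n{question}\n</USER>\n"
        f"<THINK>\n{reasoning}\n</THINK>\n"
        f"<ANSWER>\n{answer}\n</ANSWER>"
    )

sample = format_sample(
    "João tem 3 maçãs e ganhou 2, quantas ele tem agora?",
    "3 + 2 = 5",
    "João tem 5 maçãs.",
)
print(sample)
```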
---

## 🧪 Research Insight

NanoThink-5M highlights an important phenomenon:

> Small models can learn to **look intelligent before being intelligent**.

This reinforces the distinction between:

* Simulated reasoning
* Actual reasoning

---

## 💻 Usage

```python
import torch
from tokenizers import Tokenizer
from model import NanoThink
from safetensors.torch import load_file

MODEL_PATH = "model.safetensors"
TOKENIZER_PATH = "tokenizer.json"

tokenizer = Tokenizer.from_file(TOKENIZER_PATH)
model = NanoThink(vocab_size=tokenizer.get_vocab_size())
model.load_state_dict(load_file(MODEL_PATH))
model.eval()

history = ""

while True:
    user_input = input("You: ")
    if user_input.lower() in ["get out", "exit", "quit"]:
        break

    # Wrap the user turn in the structured format the model was trained on
    prompt = history + f"\n<USER>\n{user_input}\n</USER>\n"
    input_ids = torch.tensor([tokenizer.encode(prompt).ids])
    output_ids = []

    with torch.no_grad():
        for _ in range(120):
            logits = model(input_ids)
            # Sample the next token from the softmax distribution
            next_token = torch.multinomial(torch.softmax(logits[0, -1], dim=-1), 1).item()
            input_ids = torch.cat([input_ids, torch.tensor([[next_token]])], dim=1)
            output_ids.append(next_token)
            text = tokenizer.decode(output_ids)
            if "</ANSWER>" in text:
                break

    output = tokenizer.decode(output_ids)
    # Keep only the final answer, dropping the reasoning trace
    if "<ANSWER>" in output:
        output = output.split("<ANSWER>")[1].split("</ANSWER>")[0]

    print("\n💬 Answer:")
    print(output.strip())
    print("\n" + "-" * 50 + "\n")

    history += f"\n<USER>\n{user_input}\n</USER>\n<ANSWER>\n{output.strip()}\n</ANSWER>\n"
```

---

## 🔮 Future Work

* Scaling to 10M–50M parameters
* Improving dataset quality and training techniques
* Enhancing reasoning consistency
* Multilingual support

---

## 🤝 Contributions

This is an experimental project; contributions and ideas are welcome.

---

## 📜 License

MIT

---

## 🧠 Author

AxionLab Co.
Independent research project exploring the limits of small language models.

---

## ⭐ Final Thought

> Intelligence can be mimicked at small scale, but not yet achieved.

NanoThink-5M is a step toward understanding that boundary.