AxionLab-official committed 8b2c269 (verified; parent: 79360eb): Update README.md

---
license: mit
datasets:
- wikimedia/wikipedia
language:
- pt
pipeline_tag: text-generation
library_name: transformers
---
# 🧠 NanoThink-5M

> A 5M-parameter language model trained from scratch on Portuguese text and a synthetic reasoning dataset to simulate structured reasoning.

---
## 🚀 Overview

**NanoThink-5M** is an ultra-lightweight (~5M parameters) transformer model designed to explore the limits of **reasoning behavior in small-scale neural networks**.

Built entirely from scratch, it runs efficiently on CPU and focuses on generating structured reasoning outputs in Portuguese.

---
## 💡 Key Idea

> How far can a tiny model go in *simulating reasoning*?

NanoThink-5M does not truly reason; instead, it learns to **imitate reasoning patterns** through structured training.

---
## 🧠 Capabilities

* Generates step-by-step reasoning (`<THINK>`)
* Produces structured answers (`<ANSWER>`)
* Handles simple arithmetic and logic patterns
* Fully CPU-compatible

---
## ⚙️ Model Details

* Architecture: causal Transformer (GPT-style)
* Parameters: ~5M
* Layers: 4
* Attention heads: 4
* Embedding size: 128
* Context length: 256 tokens

---
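For reference, the hyperparameters above can be collected into a small config object. This is only an illustrative sketch; the field names are assumptions, not the actual `model.py` API.

```python
from dataclasses import dataclass


@dataclass
class NanoThinkConfig:
    # Values taken from the Model Details list above; vocab_size matches
    # the tokenizer used in the Usage snippet. Field names are illustrative.
    n_layers: int = 4
    n_heads: int = 4
    d_model: int = 128      # embedding size
    context_len: int = 256  # maximum sequence length in tokens
    vocab_size: int = 1229
```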
## 🏗️ Training Pipeline

### 1. Tokenizer

Custom tokenizer trained from scratch.

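As a sketch of this step (assuming a BPE tokenizer built with the `tokenizers` library; the exact algorithm and special tokens are assumptions, only the vocabulary size of 1229 comes from the Usage snippet):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a small BPE tokenizer from scratch on a Portuguese corpus.
# The special tokens mirror the structured fine-tuning format (assumed).
tokenizer = Tokenizer(models.BPE(unk_token="<UNK>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=1229,
    special_tokens=["<UNK>", "<USER>", "</USER>", "<THINK>", "</THINK>",
                    "<ANSWER>", "</ANSWER>", "<END>"],
)

corpus = ["João tem 3 maçãs.", "Ele ganhou 2 maçãs."]  # toy stand-in corpus
tokenizer.train_from_iterator(corpus, trainer)
tokenizer.save("tokenizer.json")
```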
### 2. Pretraining

* Portuguese text corpus
* Language modeling objective

### 3. Fine-tuning

* Synthetic reasoning dataset
* Tasks include:
  * Arithmetic
  * Logical comparisons
  * Multi-step problems

Structured format:

```text
<USER> ... </USER>
<THINK> ... </THINK>
<ANSWER> ... </ANSWER>
<END>
```

---
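A single fine-tuning example can be serialized into this format with a helper along these lines (the tag layout follows the block above; the function name is illustrative):

```python
def format_example(question: str, thought: str, answer: str) -> str:
    # Wrap a (question, reasoning, answer) triple in the structured
    # training format, terminated by the <END> marker.
    return (
        f"<USER> {question} </USER>\n"
        f"<THINK> {thought} </THINK>\n"
        f"<ANSWER> {answer} </ANSWER>\n"
        "<END>"
    )
```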
## 📊 Example

**Input:**

```text
João tem 3 maçãs e ganhou 2, quantas ele tem agora?
```

("João has 3 apples and gained 2; how many does he have now?")

**Output:**

```text
<THINK>
3 + 2 = 5
</THINK>
<ANSWER>
João tem 5 maçãs.
</ANSWER>
<END>
```

("João has 5 apples.")
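A downstream caller would typically strip the tags and keep only the final answer. A minimal helper for that (the function name is illustrative; only the tag names come from the format above):

```python
import re


def extract_answer(generation: str) -> str:
    # Pull the text between <ANSWER> ... </ANSWER>; fall back to the raw
    # generation if the model failed to emit well-formed tags.
    m = re.search(r"<ANSWER>(.*?)</ANSWER>", generation, re.S)
    return m.group(1).strip() if m else generation.strip()
```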

---

## ⚠️ Limitations

* Not reliable for precise mathematical reasoning
* May generate inconsistent intermediate steps
* Reasoning is **simulated, not grounded**

> This model demonstrates *the appearance of reasoning*, not true reasoning.

---
## 🧪 Research Insight

NanoThink-5M highlights an important phenomenon:

> Small models can learn to **look intelligent before being intelligent**.

This reinforces the distinction between:

* Simulated reasoning
* Actual reasoning

---
## 💻 Usage

```python
import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer

from model import NanoThink  # model definition shipped with this repo

# Load the tokenizer trained for this model.
tokenizer = Tokenizer.from_file("tokenizer.json")

# Instantiate the model and load the pretrained weights.
model = NanoThink(vocab_size=1229)
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)

# Switch to inference mode (disables dropout, etc.).
model.eval()
```
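The loaded model can then be sampled with a simple decoding loop. This is only a sketch: the README does not document `NanoThink`'s forward signature, so the helper assumes a GPT-style call where `model(input_ids)` returns logits of shape `(batch, seq, vocab)`, and a `tokenizers` Tokenizer as loaded above. All names are illustrative.

```python
import torch


@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=64, end_token="<END>"):
    # Greedy decoding: repeatedly pick the most likely next token until
    # the <END> marker or the token budget is reached.
    ids = list(tokenizer.encode(prompt).ids)
    end_id = tokenizer.token_to_id(end_token)
    for _ in range(max_new_tokens):
        x = torch.tensor([ids[-256:]])         # clip to the 256-token context
        logits = model(x)                      # assumed shape: (1, seq, vocab)
        next_id = int(logits[0, -1].argmax())  # greedy pick
        ids.append(next_id)
        if end_id is not None and next_id == end_id:
            break                              # stop at the <END> marker
    return tokenizer.decode(ids)
```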

---

## 🔮 Future Work

* Scaling to 10M–50M parameters
* Improving dataset quality
* Enhancing reasoning consistency
* Multilingual support

---
## 🤝 Contributions

This is an experimental project; contributions and ideas are welcome.

---

## 📜 License

MIT

---

## 🧠 Author

AxionLab Co.
Independent research project exploring the limits of small language models.

---

## ⭐ Final Thought

> Intelligence can be mimicked at small scale, but not yet achieved.

NanoThink-5M is a step toward understanding that boundary.