AxionLab-official committed on
Commit 5fe3a00 · verified · 1 Parent(s): a355081

Update README.md

Files changed (1):
  1. README.md +115 -78
README.md CHANGED
@@ -12,76 +12,93 @@ tags:
  - chatbot
  ---
 
- 🧠 MiniBot-0.9M-Base
 
- Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.
 
- 📌 Model Overview
 
- MiniBot-0.9M-Base is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in Portuguese.
 
- This model is a base (pretrained) model, meaning it was trained for next-token prediction without instruction tuning or alignment.
 
- It is intended primarily for:
 
- 🧪 Fine-tuning experiments
- 🎮 Playground usage
- ⚡ Ultra-fast local inference
- 🧠 Research on small-scale language models
- 🎯 Key Characteristics
- 🇧🇷 Language: Portuguese (primary)
- 🧠 Architecture: GPT-2 style (decoder-only Transformer)
- 🔤 Embeddings: GPT-2 compatible embeddings
- 📉 Parameters: ~900,000
- ⚙️ Objective: Causal Language Modeling (next-token prediction)
- 🚫 Alignment: None (base model)
- 🏗️ Architecture Details
 
- MiniBot-0.9M follows a scaled-down GPT-2 design, including:
 
- Token + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding
 
- Despite its small size, it preserves the core inductive biases of GPT-2, making it ideal for experimentation and educational purposes.
 
- 📚 Training
- Dataset
 
- The model was trained on a Portuguese conversational dataset, including:
 
- Pure text
 
- Training Notes
 
- Focused on language pattern learning, not reasoning
 
- No instruction tuning (no RLHF, no alignment)
 
- Lightweight training pipeline
- Optimized for small-scale experimentation
- 💡 Capabilities
 
- ✅ Strengths:
 
- Text generation in Portuguese
- Basic dialogue structure
- Continuation of simple prompts
- Learning of linguistic patterns
 
- ❌ Limitations:
 
- Very limited reasoning
- Loss of context in long conversations
- Inconsistent responses
- Possible repetition or incoherence
 
- 👉 This model behaves as a statistical language generator, not a reasoning system.
 
- 🚀 Usage
- Hugging Face Transformers
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
 
@@ -90,7 +107,7 @@ model_name = "AxionLab-official/MiniBot-0.9M-Base"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name)
 
- prompt = "The cat "
  inputs = tokenizer(prompt, return_tensors="pt")
 
  outputs = model.generate(
@@ -98,49 +115,69 @@ outputs = model.generate(
      **inputs,
      max_new_tokens=50,
      temperature=0.8,
      top_p=0.95,
-     do_sample=True
  )
 
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
 
- ⚙️ Recommended Generation Settings
 
- For better results:
 
- temperature: 0.7 – 1.0
- top_p: 0.9 – 0.95
- do_sample: True
- max_new_tokens: 30 – 80
- 🧪 Intended Use
 
- This is a foundation model, ideal for:
 
- 🧠 Fine-tuning (chat, instruction, roleplay, tools)
- 🎮 Prompt playground experimentation
- 🔬 Research in tiny LLMs
- 📉 Benchmarking small architectures
- ⚠️ Limitations
 
- Due to its extremely small size:
 
- Limited world knowledge
- Weak generalization
- No safety alignment
- Not suitable for production use
- 🔮 Future Work
 
- Planned directions:
 
- 🧠 Instruction-tuned version (MiniBot-Instruct)
- 📚 Larger dataset scaling
- 🔤 Tokenizer improvements
- 📈 Larger variants (1M–10M params)
- 🤖 Experimental reasoning fine-tuning
- 📜 License
 
- MIT
 
- 👤 Author
 
- Developed by AxionLab
 
 
 
  - chatbot
  ---
 
+ # 🧠 MiniBot-0.9M-Base
 
+ > **Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.**
 
+ [![Model](https://img.shields.io/badge/🤗%20Hugging%20Face-MiniBot--0.9M--Base-yellow)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
+ [![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
+ [![Language](https://img.shields.io/badge/Language-Portuguese-blue)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
+ [![Parameters](https://img.shields.io/badge/Parameters-~900K-orange)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
 
+ ---
+
+ ## 📌 Overview
+
+ **MiniBot-0.9M-Base** is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in **Portuguese**.
+
+ This is a **base (pretrained) model**, trained purely for next-token prediction with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as [MiniBot-0.9M-Instruct](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct).
 
+ ---
 
+ ## 🎯 Key Characteristics
 
+ | Attribute | Detail |
+ |---|---|
+ | 🇧🇷 **Language** | Portuguese (primary) |
+ | 🧠 **Architecture** | GPT-2 style (decoder-only Transformer) |
+ | 🔤 **Embeddings** | GPT-2 compatible |
+ | 📉 **Parameters** | ~900K |
+ | ⚙️ **Objective** | Causal Language Modeling (next-token prediction) |
+ | 🚫 **Alignment** | None (base model) |
 
+ ---
 
+ ## 🏗️ Architecture
 
+ MiniBot-0.9M follows a scaled-down GPT-2 design:
 
+ - Token embeddings + positional embeddings
+ - Multi-head self-attention
+ - Feed-forward (MLP) layers
+ - Autoregressive decoding
 
+ Despite its small size, it preserves the core inductive biases of GPT-2, making it well-suited for experimentation and educational purposes.
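A rough parameter budget for such a scaled-down GPT-2 can be sketched from the components listed above. The hyperparameters below (vocabulary size, context length, hidden size, layer count) are illustrative assumptions chosen to land near ~0.9M; the model card does not publish the actual configuration:

```python
def gpt2_param_count(vocab_size: int, n_ctx: int, d_model: int, n_layer: int) -> int:
    """Approximate parameter count for a GPT-2 style decoder with tied embeddings."""
    embeddings = vocab_size * d_model + n_ctx * d_model  # token + positional tables
    # Per block: 2 LayerNorms (4d), QKV projection (3d^2 + 3d), attention output
    # projection (d^2 + d), MLP up (4d^2 + 4d) and down (4d^2 + d) projections.
    per_block = 12 * d_model**2 + 13 * d_model
    final_layernorm = 2 * d_model
    return embeddings + n_layer * per_block + final_layernorm

# Illustrative configuration only (not the published MiniBot config)
total = gpt2_param_count(vocab_size=10_000, n_ctx=1024, d_model=64, n_layer=4)
print(f"{total:,}")  # 905,600 -- roughly the advertised ~0.9M
```

Note how the embedding tables dominate at this scale, which is why tiny models tend to shrink the vocabulary or hidden size before cutting layers.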
 
+ ---
 
+ ## 📚 Training Dataset
 
+ The model was trained on a Portuguese conversational dataset focused on language pattern learning.
 
+ **Training notes:**
+ - Pure next-token prediction objective
+ - No instruction tuning (no SFT, no RLHF, no alignment)
+ - Lightweight training pipeline
+ - Optimized for small-scale experimentation
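The "pure next-token prediction objective" above means the training labels are simply the input tokens shifted one position: position t predicts token t+1. A minimal sketch of that loss computation (pure Python, with made-up toy probabilities standing in for real model outputs):

```python
import math

def causal_lm_loss(token_ids, probs):
    """Average negative log-likelihood of each token given its prefix.

    token_ids: the training sequence
    probs: probs[t][v] = model probability of vocab id v at position t
           (toy numbers here; a real model derives these from its logits)
    """
    # Labels are the inputs shifted left by one position.
    labels = token_ids[1:]
    nll = [-math.log(probs[t][label]) for t, label in enumerate(labels)]
    return sum(nll) / len(nll)

# Toy vocabulary of 3 ids; training sequence [0, 2, 1]
probs = [
    {0: 0.2, 1: 0.3, 2: 0.5},  # distribution after seeing token 0
    {0: 0.1, 1: 0.8, 2: 0.1},  # distribution after seeing tokens 0, 2
]
loss = causal_lm_loss([0, 2, 1], probs)  # ~0.458
```

Minimizing this quantity over a large corpus is the entire training signal for a base model; no human preference or instruction data is involved.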
 
+ ---
 
+ ## 💡 Capabilities
 
+ ### ✅ Strengths
 
+ - Portuguese text generation
+ - Basic dialogue structure
+ - Simple prompt continuation
+ - Linguistic pattern learning
 
+ ### ❌ Limitations
 
+ - Very limited reasoning ability
+ - Loses context in long conversations
+ - Inconsistent outputs
+ - Prone to repetition or incoherence
+
+ > ⚠️ This model behaves as a statistical language generator, not a reasoning system.
+
+ ---
+
+ ## 🚀 Getting Started
+
+ ### Installation
+
+ ```bash
+ pip install transformers torch
+ ```
 
+ ### Usage with Hugging Face Transformers
 
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
 
  model_name = "AxionLab-official/MiniBot-0.9M-Base"
 
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name)
 
+ prompt = "User: Me explique o que é gravidade\nBot:"
  inputs = tokenizer(prompt, return_tensors="pt")
 
  outputs = model.generate(
      **inputs,
      max_new_tokens=50,
      temperature=0.8,
      top_p=0.95,
+     do_sample=True,
  )
 
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
 
+ ### ⚙️ Recommended Settings
 
+ | Parameter | Recommended Value | Description |
+ |---|---|---|
+ | `temperature` | `0.7 – 1.0` | Controls randomness |
+ | `top_p` | `0.9 – 0.95` | Nucleus sampling |
+ | `do_sample` | `True` | Enable sampling |
+ | `max_new_tokens` | `30 – 80` | Response length |
 
+ > 💡 Base models generally benefit from higher temperature values than instruct variants, since there is no fine-tuning to constrain the output distribution.
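Nucleus sampling, which `top_p` controls, keeps only the smallest set of tokens whose cumulative probability reaches `p` and renormalizes before sampling. A minimal sketch of that filtering step (pure Python; the example distribution is made up for illustration):

```python
def top_p_filter(probs: dict, p: float) -> dict:
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches p, then renormalize. probs maps token -> probability."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}

# Made-up next-token distribution for illustration
dist = {"gato": 0.5, "cachorro": 0.3, "casa": 0.15, "xyz": 0.05}
filtered = top_p_filter(dist, p=0.9)
# The low-probability tail ("xyz") is dropped; the rest is renormalized.
```

Lowering `p` trims more of the unreliable tail, which is one practical lever against the repetition and incoherence noted in the limitations above.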
 
 
 
 
 
+ ---
+
+ ## 🧪 Intended Use Cases
+
+ | Use Case | Suitability |
+ |---|---|
+ | 🧠 Fine-tuning (chat, instruction, roleplay) | ✅ Ideal |
+ | 🎮 Prompt playground & experimentation | ✅ Ideal |
+ | 🔬 Research on tiny LLMs | ✅ Ideal |
+ | 📉 Benchmarking small architectures | ✅ Ideal |
+ | ⚡ Local / CPU-only applications | ✅ Ideal |
+ | 🏭 Critical production environments | ❌ Not recommended |
+
+ ---
+
+ ## ⚠️ Disclaimer
+
+ - Extremely small model (~900K parameters)
+ - Limited world knowledge and weak generalization
+ - No safety or alignment measures
+ - **Not suitable for production use**
 
+ ---
+
+ ## 🔮 Future Work
 
+ - [x] 🎯 Instruction-tuned version → [`MiniBot-0.9M-Instruct`](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct)
+ - [ ] 📚 Larger and more diverse dataset
+ - [ ] 🔤 Tokenizer improvements
+ - [ ] 📈 Scaling to 1M–10M parameters
+ - [ ] 🧠 Experimental reasoning fine-tuning
 
+ ---
 
+ ## 📜 License
 
+ Distributed under the **MIT License**. See [`LICENSE`](LICENSE) for details.
 
+ ---
 
+ ## 👤 Author
+
+ Developed by **[AxionLab](https://huggingface.co/AxionLab-official)** 🔬
+
+ ---
 
+ <div align="center">
+ <sub>MiniBot-0.9M-Base · AxionLab · MIT License</sub>
+ </div>