---
license: mit
language:
- pt
pipeline_tag: text-generation
tags:
- base
- pretrain
- pretrained
- nano
- mini
- chatbot
library_name: transformers
---

# 🧠 MiniBot-0.9M-Base

> **Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.**

[![Model](https://img.shields.io/badge/🤗%20Hugging%20Face-MiniBot--0.9M--Base-yellow)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Language](https://img.shields.io/badge/Language-Portuguese-blue)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
[![Parameters](https://img.shields.io/badge/Parameters-~900K-orange)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)

---

## ๐Ÿ“Œ Overview

**MiniBot-0.9M-Base** is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in **Portuguese**.

This is a **base (pretrained) model**: trained purely for next-token prediction, with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as [MiniBot-0.9M-Instruct](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct).

---

## 🎯 Key Characteristics

| Attribute | Detail |
|---|---|
| 🇧🇷 **Language** | Portuguese (primary) |
| 🧠 **Architecture** | GPT-2 style (Transformer decoder-only) |
| 🔤 **Embeddings** | GPT-2 compatible |
| 📉 **Parameters** | ~900K |
| ⚙️ **Objective** | Causal Language Modeling (next-token prediction) |
| 🚫 **Alignment** | None (base model) |

---

## ๐Ÿ—๏ธ Architecture

MiniBot-0.9M follows a scaled-down GPT-2 design:

- Token embeddings + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding

Despite its small size, it preserves the core inductive biases of GPT-2, making it well-suited for experimentation and educational purposes.
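As an illustration of how a GPT-2 style stack can land near the 1M mark, the sketch below counts parameters for a hypothetical tiny configuration. The vocabulary size, context length, width, and depth here are assumptions for the sake of example, not the model's published configuration:

```python
def gpt2_param_count(vocab_size, n_ctx, n_embd, n_layer, tied_lm_head=True):
    """Approximate parameter count for a GPT-2 style decoder."""
    # Token embeddings + learned positional embeddings
    params = vocab_size * n_embd + n_ctx * n_embd
    # Per block: attention (4*n^2 + 4n), MLP (8*n^2 + 5n), two LayerNorms (4n)
    params += n_layer * (12 * n_embd**2 + 13 * n_embd)
    # Final LayerNorm
    params += 2 * n_embd
    if not tied_lm_head:
        # Untied output projection adds a second vocab-sized matrix
        params += vocab_size * n_embd
    return params

# Hypothetical tiny configuration (illustrative only)
print(gpt2_param_count(vocab_size=50257, n_ctx=256, n_embd=16, n_layer=2))
# → 814800, i.e. roughly 0.8M parameters
```

With a GPT-2 sized vocabulary, the embedding table dominates the count, which is why width and tying the output head matter most at this scale.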

---

## 📚 Training Dataset

The model was trained on a Portuguese conversational dataset, with the goal of learning basic linguistic patterns rather than broad world knowledge.

**Training notes:**
- Pure next-token prediction objective
- No instruction tuning (no SFT, no RLHF, no alignment)
- Lightweight training pipeline
- Optimized for small-scale experimentation
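The next-token objective above can be made concrete with a toy example: each position in a sequence is paired with the token that follows it, and the model learns to predict the right-hand side from the left context. A minimal sketch (the token IDs are invented, not real tokenizer output):

```python
# Toy token-ID sequence (hypothetical IDs, not the real tokenizer's output)
tokens = [15, 7, 42, 3, 99]

# For causal LM training, inputs are tokens[:-1] and targets are tokens[1:]
inputs = tokens[:-1]   # [15, 7, 42, 3]
targets = tokens[1:]   # [7, 42, 3, 99]

pairs = list(zip(inputs, targets))
print(pairs)  # → [(15, 7), (7, 42), (42, 3), (3, 99)]
```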

---

## 💡 Capabilities

### ✅ Strengths

- Portuguese text generation
- Basic dialogue structure
- Simple prompt continuation
- Linguistic pattern learning

### โŒ Limitations

- Very limited reasoning ability
- Loses context in long conversations
- Inconsistent outputs
- Prone to repetition or incoherence

> โš ๏ธ This model behaves as a statistical language generator, not a reasoning system.

---

## 🚀 Getting Started

### Installation

```bash
pip install transformers torch
```

### Usage with Hugging Face Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AxionLab-official/MiniBot-0.9M-Base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "User: Me explique o que é gravidade\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### โš™๏ธ Recommended Settings

| Parameter | Recommended Value | Description |
|---|---|---|
| `temperature` | `0.7 – 1.0` | Controls randomness |
| `top_p` | `0.9 โ€“ 0.95` | Nucleus sampling |
| `do_sample` | `True` | Enable sampling |
| `max_new_tokens` | `30 โ€“ 80` | Response length |

> 💡 Base models generally benefit from higher temperature values compared to instruct variants, since there is no fine-tuning to constrain the output distribution.
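To see what `top_p` does mechanically, here is a plain-Python sketch of nucleus filtering. The token probabilities below are invented for illustration; in practice `model.generate` applies this internally over the model's logits:

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalize."""
    sorted_items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = {}, 0.0
    for tok, p in sorted_items:
        kept[tok] = p
        cum += p
        if cum >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Invented next-token distribution: top_p=0.9 keeps "a", "b", "c"
# and drops the low-probability tail ("d") before sampling.
print(top_p_filter({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, top_p=0.9))
```

Lower `top_p` trims more of the tail, trading diversity for coherence; `temperature` instead reshapes the whole distribution before this cutoff is applied.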

---

## 🧪 Intended Use Cases

| Use Case | Suitability |
|---|---|
| 🧠 Fine-tuning (chat, instruction, roleplay) | ✅ Ideal |
| 🎮 Prompt playground & experimentation | ✅ Ideal |
| 🔬 Research on tiny LLMs | ✅ Ideal |
| 📉 Benchmarking small architectures | ✅ Ideal |
| ⚡ Local / CPU-only applications | ✅ Ideal |
| 🏭 Critical production environments | ❌ Not recommended |

---

## โš ๏ธ Disclaimer

- Extremely small model (~900K parameters)
- Limited world knowledge and weak generalization
- No safety or alignment measures
- **Not suitable for production use**

---

## 🔮 Future Work

- [x] 🎯 Instruction-tuned version → [`MiniBot-0.9M-Instruct`](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct)
- [ ] 📚 Larger and more diverse dataset
- [ ] 🔤 Tokenizer improvements
- [ ] 📈 Scaling to 1M–10M parameters
- [ ] 🧠 Experimental reasoning fine-tuning

---

## 📜 License

Distributed under the **MIT License**. See [`LICENSE`](LICENSE) for more details.

---

## 👤 Author

Developed by **[AxionLab](https://huggingface.co/AxionLab-official)** 🔬

---

<div align="center">
  <sub>MiniBot-0.9M-Base · AxionLab · MIT License</sub>
</div>