---
license: mit
language:
- pt
pipeline_tag: text-generation
tags:
- base
- pretrain
- pretrained
- nano
- mini
- chatbot
library_name: transformers
---
# MiniBot-0.9M-Base
> **Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.**
---
## Overview
**MiniBot-0.9M-Base** is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in **Portuguese**.
This is a **base (pretrained) model** โ trained purely for next-token prediction, with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as [MiniBot-0.9M-Instruct](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct).
---
## Key Characteristics
| Attribute | Detail |
|---|---|
| **Language** | Portuguese (primary) |
| **Architecture** | GPT-2 style (decoder-only Transformer) |
| **Embeddings** | GPT-2 compatible |
| **Parameters** | ~900K |
| **Objective** | Causal language modeling (next-token prediction) |
| **Alignment** | None (base model) |
---
## Architecture
MiniBot-0.9M follows a scaled-down GPT-2 design:
- Token embeddings + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding
Despite its small size, it preserves the core inductive biases of GPT-2, making it well-suited for experimentation and educational purposes.
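The parameter budget of such a scaled-down design can be estimated directly from the standard GPT-2 layer shapes. The sketch below is illustrative only: the vocabulary size, context length, embedding width, and layer count used in the example call are hypothetical, not the model's published configuration.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Estimate the parameter count of a GPT-2 style decoder (tied LM head)."""
    # Token embeddings plus learned positional embeddings
    emb = vocab_size * n_embd + n_positions * n_embd
    # Attention: fused QKV projection and output projection (weights + biases)
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP with the standard 4x hidden expansion
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    # Two LayerNorms per block (weight + bias each)
    ln = 2 * 2 * n_embd
    # Final LayerNorm; the LM head shares weights with the token embeddings
    return emb + n_layer * (attn + mlp + ln) + 2 * n_embd

# Hypothetical small config: vocab 4096, context 256, width 128, 2 layers
print(gpt2_param_count(4096, 256, 128, 2))  # 953856, i.e. roughly 0.95M
```

Plugging in GPT-2 small's real shapes (50257 vocab, 1024 context, 768 width, 12 layers) yields about 124M parameters, which matches the well-known figure and sanity-checks the formula.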
---
## Training Dataset
The model was trained on a Portuguese conversational dataset focused on language pattern learning.
**Training notes:**
- Pure next-token prediction objective
- No instruction tuning (no SFT, no RLHF, no alignment)
- Lightweight training pipeline
- Optimized for small-scale experimentation
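The pure next-token objective means training examples are just shifted copies of the same sequence: position *t* is asked to predict token *t+1*. A minimal sketch of how input/target pairs are formed (the token IDs are made up):

```python
def causal_lm_pairs(token_ids):
    """Build (input, target) pairs for next-token prediction:
    the token at position t is the input, the token at t+1 is the target."""
    return list(zip(token_ids[:-1], token_ids[1:]))

# Toy sequence of token IDs
print(causal_lm_pairs([12, 7, 99, 3]))  # [(12, 7), (7, 99), (99, 3)]
```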
---
## Capabilities
### ✅ Strengths
- Portuguese text generation
- Basic dialogue structure
- Simple prompt continuation
- Linguistic pattern learning
### ❌ Limitations
- Very limited reasoning ability
- Loses context in long conversations
- Inconsistent outputs
- Prone to repetition or incoherence
> ⚠️ This model behaves as a statistical language generator, not a reasoning system.
---
## Getting Started
### Installation
```bash
pip install transformers torch
```
### Usage with Hugging Face Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "AxionLab-official/MiniBot-0.9M-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
prompt = "User: Me explique o que รฉ gravidade\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
**inputs,
max_new_tokens=50,
temperature=0.8,
top_p=0.95,
do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Recommended Settings
| Parameter | Recommended Value | Description |
|---|---|---|
| `temperature` | `0.7–1.0` | Controls randomness |
| `top_p` | `0.9–0.95` | Nucleus sampling |
| `do_sample` | `True` | Enable sampling |
| `max_new_tokens` | `30–80` | Response length |
> Base models generally benefit from higher temperature values compared to instruct variants, since there is no fine-tuning to constrain the output distribution.
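To make the `top_p` column concrete: nucleus sampling keeps the smallest set of highest-probability tokens whose cumulative mass reaches the threshold, then renormalizes and samples only from that set. A standalone sketch (the probability values below are made up for illustration):

```python
def top_p_filter(probs, top_p=0.95):
    """Nucleus sampling filter: keep the smallest high-probability set of
    tokens whose cumulative mass reaches top_p, then renormalize."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, cum = {}, 0.0
    for idx, p in ranked:
        kept[idx] = p
        cum += p
        if cum >= top_p:
            break  # nucleus is complete; drop the remaining tail
    total = sum(kept.values())
    return {idx: p / total for idx, p in kept.items()}

# Toy distribution over 4 tokens; top_p=0.9 drops the 0.05 tail token
print(top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.9))
```

Lowering `top_p` shrinks the nucleus and makes output more conservative; raising `temperature` flattens the distribution before this filter is applied.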
---
## Intended Use Cases
| Use Case | Suitability |
|---|---|
| Fine-tuning (chat, instruction, roleplay) | ✅ Ideal |
| Prompt playground & experimentation | ✅ Ideal |
| Research on tiny LLMs | ✅ Ideal |
| Benchmarking small architectures | ✅ Ideal |
| Local / CPU-only applications | ✅ Ideal |
| Critical production environments | ❌ Not recommended |
---
## ⚠️ Disclaimer
- Extremely small model (~900K parameters)
- Limited world knowledge and weak generalization
- No safety or alignment measures
- **Not suitable for production use**
---
## Future Work
- [x] Instruction-tuned version → [`MiniBot-0.9M-Instruct`](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct)
- [ ] Larger and more diverse dataset
- [ ] Tokenizer improvements
- [ ] Scaling to 1M–10M parameters
- [ ] Experimental reasoning fine-tuning
---
## License
Distributed under the **MIT License**. See [`LICENSE`](LICENSE) for more details.
---
## Author
Developed by **[AxionLab](https://huggingface.co/AxionLab-official)**
---
<div align="center">
  <sub>MiniBot-0.9M-Base · AxionLab · MIT License</sub>
</div>