---
license: mit
language:
- pt
pipeline_tag: text-generation
tags:
- base
- pretrain
- pretrained
- nano
- mini
- chatbot
---
# 🧠 MiniBot-0.9M-Base

> **Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.**

[![Model](https://img.shields.io/badge/🤗%20Hugging%20Face-MiniBot--0.9M--Base-yellow)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Language](https://img.shields.io/badge/Language-Portuguese-blue)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
[![Parameters](https://img.shields.io/badge/Parameters-~900K-orange)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
---
## 📌 Overview
**MiniBot-0.9M-Base** is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in **Portuguese**.

This is a **base (pretrained) model**: it was trained purely for next-token prediction, with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as [MiniBot-0.9M-Instruct](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct).
---
## 🎯 Key Characteristics
| Attribute | Detail |
|---|---|
| 🇧🇷 **Language** | Portuguese (primary) |
| 🧠 **Architecture** | GPT-2 style (decoder-only Transformer) |
| 🔤 **Embeddings** | GPT-2 compatible |
| 📉 **Parameters** | ~900K |
| ⚙️ **Objective** | Causal language modeling (next-token prediction) |
| 🚫 **Alignment** | None (base model) |
---
## ๐Ÿ—๏ธ Architecture
MiniBot-0.9M follows a scaled-down GPT-2 design:
- Token embeddings + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding
Despite its small size, it preserves the core inductive biases of GPT-2, making it well-suited for experimentation and educational purposes.
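As a back-of-the-envelope check, the parameter count of a scaled-down GPT-2 can be estimated directly from its dimensions. The configuration values below (hidden size, layer count, context length) are hypothetical illustrations, not this model's published config:

```python
def gpt2_param_count(vocab_size: int, n_positions: int, n_embd: int, n_layer: int) -> int:
    """Approximate parameter count of a GPT-2 style decoder with a tied output head."""
    # Token + positional embedding tables
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # Per block: QKV projection (3d*d + 3d), attention output projection (d*d + d),
    # MLP up/down projections (d*4d + 4d and 4d*d + d), two LayerNorms (2 * 2d)
    per_block = 12 * n_embd**2 + 13 * n_embd
    # Final LayerNorm before the (tied) language-model head
    final_ln = 2 * n_embd
    return embeddings + n_layer * per_block + final_ln

# Hypothetical tiny config: with the full GPT-2 vocabulary, the embedding
# table dominates the budget, forcing a very small hidden size
print(gpt2_param_count(vocab_size=50257, n_positions=256, n_embd=16, n_layer=2))
```

At this scale almost the entire budget sits in the embedding table, which is one reason sub-million-parameter models can afford only a handful of narrow Transformer blocks.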
---
## 📚 Training Dataset
The model was trained on a Portuguese conversational dataset focused on language pattern learning.
**Training notes:**
- Pure next-token prediction objective
- No instruction tuning (no SFT, no RLHF, no alignment)
- Lightweight training pipeline
- Optimized for small-scale experimentation
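The next-token objective amounts to shifting the sequence by one position: the label at each position is the following token, and training minimizes the cross-entropy between the predicted distribution and that label. A minimal sketch with toy numbers (pure Python, no real model):

```python
import math

def next_token_loss(logits: list[list[float]], tokens: list[int]) -> float:
    """Average cross-entropy of predicting tokens[t + 1] from the logits at position t."""
    total = 0.0
    steps = len(tokens) - 1  # the last token has no successor to predict
    for t in range(steps):
        row = logits[t]
        # Softmax over the vocabulary (max-subtracted for numerical stability),
        # then negative log-likelihood of the true next token
        z = max(row)
        probs = [math.exp(x - z) for x in row]
        total += -math.log(probs[tokens[t + 1]] / sum(probs))
    return total / steps

# Toy example: vocabulary of 3 tokens, sequence [0, 2, 1]
logits = [[0.1, 0.2, 3.0],   # position 0 strongly predicts token 2 (correct)
          [2.5, 0.3, 0.1]]   # position 1 predicts token 0, but the target is 1
print(next_token_loss(logits, [0, 2, 1]))
```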
---
## 💡 Capabilities

### ✅ Strengths
- Portuguese text generation
- Basic dialogue structure
- Simple prompt continuation
- Linguistic pattern learning

### ❌ Limitations
- Very limited reasoning ability
- Loses context in long conversations
- Inconsistent outputs
- Prone to repetition or incoherence

> ⚠️ This model behaves as a statistical language generator, not a reasoning system.
---
## 🚀 Getting Started
### Installation
```bash
pip install transformers torch
```
### Usage with Hugging Face Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AxionLab-official/MiniBot-0.9M-Base"

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Base model: there is no chat template, so prompt with a raw "User:/Bot:" pattern
prompt = "User: Me explique o que é gravidade\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### โš™๏ธ Recommended Settings
| Parameter | Recommended Value | Description |
|---|---|---|
| `temperature` | `0.7 โ€“ 1.0` | Controls randomness |
| `top_p` | `0.9 โ€“ 0.95` | Nucleus sampling |
| `do_sample` | `True` | Enable sampling |
| `max_new_tokens` | `30 โ€“ 80` | Response length |
> ๐Ÿ’ก Base models generally benefit from higher temperature values compared to instruct variants, since there is no fine-tuning to constrain the output distribution.
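Temperature acts before the softmax: logits are divided by `temperature`, so values above 1.0 flatten the distribution (more diverse samples) while values below 1.0 sharpen it toward the top token. A small pure-Python illustration:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities, rescaling by the sampling temperature."""
    scaled = [x / temperature for x in logits]
    z = max(scaled)  # max-subtraction for numerical stability
    exps = [math.exp(x - z) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top-token probability {max(probs):.2f}")
```

Running this shows the top token's probability shrinking as the temperature rises, which is exactly the extra diversity the table above recommends exploiting for a base model.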
---
## 🧪 Intended Use Cases
| Use Case | Suitability |
|---|---|
| 🧠 Fine-tuning (chat, instruction, roleplay) | ✅ Ideal |
| 🎮 Prompt playground & experimentation | ✅ Ideal |
| 🔬 Research on tiny LLMs | ✅ Ideal |
| 📉 Benchmarking small architectures | ✅ Ideal |
| ⚡ Local / CPU-only applications | ✅ Ideal |
| 🏭 Critical production environments | ❌ Not recommended |
---
## โš ๏ธ Disclaimer
- Extremely small model (~900K parameters)
- Limited world knowledge and weak generalization
- No safety or alignment measures
- **Not suitable for production use**
---
## 🔮 Future Work
- [x] 🎯 Instruction-tuned version → [`MiniBot-0.9M-Instruct`](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct)
- [ ] 📚 Larger and more diverse dataset
- [ ] 🔤 Tokenizer improvements
- [ ] 📈 Scaling to 1M–10M parameters
- [ ] 🧠 Experimental reasoning fine-tuning
---
## 📜 License
Distributed under the **MIT License**. See [`LICENSE`](LICENSE) for more details.
---
## 👤 Author
Developed by **[AxionLab](https://huggingface.co/AxionLab-official)** 🔬
---
<div align="center">
  <sub>MiniBot-0.9M-Base · AxionLab · MIT License</sub>
</div>