---
license: mit
language:
- pt
pipeline_tag: text-generation
tags:
- base
- pretrain
- pretrained
- nano
- mini
- chatbot
---

# MiniBot-0.9M-Base

> **Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.**

[Model page](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base) · [MIT License](https://opensource.org/licenses/MIT)

---

## Overview

**MiniBot-0.9M-Base** is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in **Portuguese**.

This is a **base (pretrained) model**: it was trained purely for next-token prediction, with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as [MiniBot-0.9M-Instruct](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct).

---

## Key Characteristics

| Attribute | Detail |
|---|---|
| **Language** | Portuguese (primary) |
| **Architecture** | GPT-2 style (decoder-only Transformer) |
| **Embeddings** | GPT-2 compatible |
| **Parameters** | ~900K |
| **Objective** | Causal language modeling (next-token prediction) |
| **Alignment** | None (base model) |

---

## Architecture

MiniBot-0.9M follows a scaled-down GPT-2 design:

- Token embeddings + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding

Despite its small size, it preserves the core inductive biases of GPT-2, making it well suited for experimentation and educational purposes.

---

## Training Dataset

The model was trained on a Portuguese conversational dataset focused on language pattern learning.
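The exact model configuration (embedding width, layer count, context length) is not published in this card, but a GPT-2-style parameter budget is easy to reproduce with plain arithmetic. The sketch below uses a **hypothetical** tiny configuration chosen only to land in the sub-million range claimed above; the real MiniBot dimensions may differ.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer, tied_lm_head=True):
    """Approximate parameter count of a GPT-2-style decoder-only Transformer."""
    # Embedding tables.
    token_emb = vocab_size * n_embd
    pos_emb = n_positions * n_embd

    # One transformer block:
    #   attention: fused QKV projection + output projection (both with biases)
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    #   MLP: expand to 4*n_embd, then project back (both with biases)
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    #   two LayerNorms (weight + bias each)
    ln = 2 * (2 * n_embd)
    block = attn + mlp + ln

    # Final LayerNorm; a tied LM head reuses the token embedding matrix,
    # so it adds no extra parameters.
    final_ln = 2 * n_embd
    head = 0 if tied_lm_head else vocab_size * n_embd

    return token_emb + pos_emb + n_layer * block + final_ln + head

# Hypothetical tiny config (assumed, not the published one):
print(gpt2_param_count(vocab_size=50257, n_positions=256, n_embd=16, n_layer=2))
# → 814800 for this config
```

Note that at this scale the token embedding table dominates the budget, which is why tying the LM head to the embeddings matters so much for tiny models.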
**Training notes:**

- Pure next-token prediction objective
- No instruction tuning (no SFT, no RLHF, no alignment)
- Lightweight training pipeline
- Optimized for small-scale experimentation

---

## Capabilities

### Strengths

- Portuguese text generation
- Basic dialogue structure
- Simple prompt continuation
- Linguistic pattern learning

### Limitations

- Very limited reasoning ability
- Loses context in long conversations
- Inconsistent outputs
- Prone to repetition or incoherence

> **Warning:** This model behaves as a statistical language generator, not a reasoning system.

---

## Getting Started

### Installation

```bash
pip install transformers torch
```

### Usage with Hugging Face Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AxionLab-official/MiniBot-0.9M-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# "Explain gravity to me"
prompt = "User: Me explique o que é gravidade\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Recommended Settings

| Parameter | Recommended Value | Description |
|---|---|---|
| `temperature` | `0.7–1.0` | Controls randomness |
| `top_p` | `0.9–0.95` | Nucleus sampling |
| `do_sample` | `True` | Enables sampling |
| `max_new_tokens` | `30–80` | Response length |

> Base models generally benefit from higher temperature values than instruct variants, since there is no fine-tuning to constrain the output distribution.
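To make the `temperature` and `top_p` settings above concrete, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a logit vector. This illustrates the sampling idea only; it is not the `transformers` library's internal implementation.

```python
import math
import random

def top_p_filter(logits, top_p=0.95, temperature=0.8):
    """Return a probability distribution keeping only the smallest set of
    tokens whose cumulative probability reaches top_p."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least probable until the nucleus is full.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; everything else gets probability 0.
    norm = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / norm
    return filtered

def sample(filtered_probs, rng=random):
    """Draw one token index from the filtered distribution."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(filtered_probs):
        acc += p
        if r <= acc:
            return i
    return len(filtered_probs) - 1
```

Raising `temperature` flattens the distribution before the cutoff, while lowering `top_p` shrinks the nucleus; together they trade diversity against coherence, which is why a loosely trained base model like this one is usually run with both dials slightly higher than an instruct model.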
---

## Intended Use Cases

| Use Case | Suitability |
|---|---|
| Fine-tuning (chat, instruction, roleplay) | Ideal |
| Prompt playground & experimentation | Ideal |
| Research on tiny LLMs | Ideal |
| Benchmarking small architectures | Ideal |
| Local / CPU-only applications | Ideal |
| Critical production environments | Not recommended |

---

## Disclaimer

- Extremely small model (~900K parameters)
- Limited world knowledge and weak generalization
- No safety or alignment measures
- **Not suitable for production use**

---

## Future Work

- [x] Instruction-tuned version: [`MiniBot-0.9M-Instruct`](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct)
- [ ] Larger and more diverse dataset
- [ ] Tokenizer improvements
- [ ] Scaling to 1M–10M parameters
- [ ] Experimental reasoning fine-tuning

---

## License

Distributed under the **MIT License**. See [`LICENSE`](LICENSE) for details.

---

## Author

Developed by **[AxionLab](https://huggingface.co/AxionLab-official)**