---
license: mit
language:
- pt
pipeline_tag: text-generation
tags:
- base
- pretrain
- pretrained
- nano
- mini
- chatbot
library_name: transformers
---

# 🧠 MiniBot-0.9M-Base

> **Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.**

[![Model](https://img.shields.io/badge/🤗%20Hugging%20Face-MiniBot--0.9M--Base-yellow)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Language](https://img.shields.io/badge/Language-Portuguese-blue)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
[![Parameters](https://img.shields.io/badge/Parameters-~900K-orange)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)

---

## ๐Ÿ“Œ Overview

**MiniBot-0.9M-Base** is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in **Portuguese**.

This is a **base (pretrained) model**: trained purely for next-token prediction, with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as [MiniBot-0.9M-Instruct](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct).

---

## 🎯 Key Characteristics

| Attribute | Detail |
|---|---|
| 🇧🇷 **Language** | Portuguese (primary) |
| 🧠 **Architecture** | GPT-2 style (Transformer decoder-only) |
| 🔤 **Embeddings** | GPT-2 compatible |
| 📉 **Parameters** | ~900K |
| ⚙️ **Objective** | Causal Language Modeling (next-token prediction) |
| 🚫 **Alignment** | None (base model) |

---

## ๐Ÿ—๏ธ Architecture

MiniBot-0.9M follows a scaled-down GPT-2 design:

- Token embeddings + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding

Despite its small size, it preserves the core inductive biases of GPT-2, making it well-suited for experimentation and educational purposes.
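As an illustration of how a GPT-2 style stack can land near the 1M mark, the sketch below counts parameters for a hypothetical tiny configuration. The vocabulary size, context length, width, and depth here are assumptions for the sake of example, not the model's published configuration:

```python
def gpt2_param_count(vocab_size, n_ctx, n_embd, n_layer, tied_lm_head=True):
    """Approximate parameter count for a GPT-2 style decoder."""
    # Token embeddings + learned positional embeddings
    params = vocab_size * n_embd + n_ctx * n_embd
    # Per block: attention (4*n^2 + 4n), MLP (8*n^2 + 5n), two LayerNorms (4n)
    params += n_layer * (12 * n_embd**2 + 13 * n_embd)
    # Final LayerNorm
    params += 2 * n_embd
    if not tied_lm_head:
        # Untied output projection adds a second vocab-sized matrix
        params += vocab_size * n_embd
    return params

# Hypothetical tiny configuration (illustrative only)
print(gpt2_param_count(vocab_size=50257, n_ctx=256, n_embd=16, n_layer=2))
# → 814800, i.e. roughly 0.8M parameters
```

With a GPT-2 sized vocabulary, the embedding table dominates the count, which is why width and tying the output head matter most at this scale.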

---

## 📚 Training Dataset

The model was trained on a Portuguese conversational dataset, with the goal of learning basic linguistic patterns rather than broad world knowledge.

**Training notes:**
- Pure next-token prediction objective
- No instruction tuning (no SFT, no RLHF, no alignment)
- Lightweight training pipeline
- Optimized for small-scale experimentation
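The next-token objective above can be made concrete with a toy example: each position in a sequence is paired with the token that follows it, and the model learns to predict the right-hand side from the left context. A minimal sketch (the token IDs are invented, not real tokenizer output):

```python
# Toy token-ID sequence (hypothetical IDs, not the real tokenizer's output)
tokens = [15, 7, 42, 3, 99]

# For causal LM training, inputs are tokens[:-1] and targets are tokens[1:]
inputs = tokens[:-1]   # [15, 7, 42, 3]
targets = tokens[1:]   # [7, 42, 3, 99]

pairs = list(zip(inputs, targets))
print(pairs)  # → [(15, 7), (7, 42), (42, 3), (3, 99)]
```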

---

## 💡 Capabilities

### ✅ Strengths

- Portuguese text generation
- Basic dialogue structure
- Simple prompt continuation
- Linguistic pattern learning

### โŒ Limitations

- Very limited reasoning ability
- Loses context in long conversations
- Inconsistent outputs
- Prone to repetition or incoherence

> โš ๏ธ This model behaves as a statistical language generator, not a reasoning system.

---

## 🚀 Getting Started

### Installation

```bash
pip install transformers torch
```

### Usage with Hugging Face Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AxionLab-official/MiniBot-0.9M-Base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "User: Me explique o que é gravidade\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### โš™๏ธ Recommended Settings

| Parameter | Recommended Value | Description |
|---|---|---|
| `temperature` | `0.7 – 1.0` | Controls randomness |
| `top_p` | `0.9 โ€“ 0.95` | Nucleus sampling |
| `do_sample` | `True` | Enable sampling |
| `max_new_tokens` | `30 โ€“ 80` | Response length |

> 💡 Base models generally benefit from higher temperature values compared to instruct variants, since there is no fine-tuning to constrain the output distribution.
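To see what `top_p` does mechanically, here is a plain-Python sketch of nucleus filtering. The token probabilities below are invented for illustration; in practice `model.generate` applies this internally over the model's logits:

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalize."""
    sorted_items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = {}, 0.0
    for tok, p in sorted_items:
        kept[tok] = p
        cum += p
        if cum >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Invented next-token distribution: top_p=0.9 keeps "a", "b", "c"
# and drops the low-probability tail ("d") before sampling.
print(top_p_filter({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, top_p=0.9))
```

Lower `top_p` trims more of the tail, trading diversity for coherence; `temperature` instead reshapes the whole distribution before this cutoff is applied.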

---

## 🧪 Intended Use Cases

| Use Case | Suitability |
|---|---|
| 🧠 Fine-tuning (chat, instruction, roleplay) | ✅ Ideal |
| 🎮 Prompt playground & experimentation | ✅ Ideal |
| 🔬 Research on tiny LLMs | ✅ Ideal |
| 📉 Benchmarking small architectures | ✅ Ideal |
| ⚡ Local / CPU-only applications | ✅ Ideal |
| 🏭 Critical production environments | ❌ Not recommended |

---

## โš ๏ธ Disclaimer

- Extremely small model (~900K parameters)
- Limited world knowledge and weak generalization
- No safety or alignment measures
- **Not suitable for production use**

---

## 🔮 Future Work

- [x] 🎯 Instruction-tuned version → [`MiniBot-0.9M-Instruct`](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct)
- [ ] 📚 Larger and more diverse dataset
- [ ] 🔤 Tokenizer improvements
- [ ] 📈 Scaling to 1M–10M parameters
- [ ] 🧠 Experimental reasoning fine-tuning

---

## 📜 License

Distributed under the **MIT License**. See [`LICENSE`](LICENSE) for more details.

---

## 👤 Author

Developed by **[AxionLab](https://huggingface.co/AxionLab-official)** 🔬

---

<div align="center">
  <sub>MiniBot-0.9M-Base · AxionLab · MIT License</sub>
</div>