# MiniBot-0.9M-Base

Ultra-lightweight GPT-2-style language model (~900K parameters) specialized in Portuguese conversational text.
## Overview

MiniBot-0.9M-Base is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in Portuguese.

This is a base (pretrained) model, trained purely for next-token prediction, with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as MiniBot-0.9M-Instruct.
## Key Characteristics

| Attribute | Detail |
|---|---|
| Language | Portuguese (primary) |
| Architecture | GPT-2 style (decoder-only Transformer) |
| Embeddings | GPT-2 compatible |
| Parameters | ~900K |
| Objective | Causal language modeling (next-token prediction) |
| Alignment | None (base model) |
## Architecture
MiniBot-0.9M follows a scaled-down GPT-2 design:
- Token embeddings + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding
Despite its small size, it preserves the core inductive biases of GPT-2, making it well-suited for experimentation and educational purposes.
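To make the parameter budget concrete, the sketch below counts the weights of a generic GPT-2-style decoder (token and positional embeddings, QKV and output projections, 4x MLP, LayerNorms, with tied input/output embeddings as in GPT-2). The model card does not publish MiniBot's actual hyperparameters, so the configuration in the example is purely illustrative; it merely shows one combination that lands near ~0.9M total parameters.

```python
def gpt2_param_count(vocab_size, d_model, n_layer, n_ctx):
    """Parameter count for a GPT-2 style decoder with tied embeddings."""
    embeddings = vocab_size * d_model + n_ctx * d_model  # token + positional
    # Per block: QKV (3*d*d + 3d), attn proj (d*d + d),
    # MLP up (4*d*d + 4d), MLP down (4*d*d + d), two LayerNorms (2 * 2d)
    per_block = 12 * d_model**2 + 13 * d_model
    final_ln = 2 * d_model  # final LayerNorm before the (tied) output head
    return embeddings + n_layer * per_block + final_ln

# Hypothetical tiny configuration (NOT the published one):
total = gpt2_param_count(vocab_size=12288, d_model=64, n_layer=2, n_ctx=256)
print(total)  # 902912, i.e. roughly 0.9M
```

Note how the embedding table dominates at this scale, which is why tiny models often shrink the vocabulary rather than the depth.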
## Training Dataset
The model was trained on a Portuguese conversational dataset focused on language pattern learning.
Training notes:
- Pure next-token prediction objective
- No instruction tuning (no SFT, no RLHF, no alignment)
- Lightweight training pipeline
- Optimized for small-scale experimentation
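The pure next-token objective above amounts to shifting the token sequence by one position: at every step the model sees a prefix and must predict the token that follows. A minimal sketch, using made-up token IDs:

```python
# Toy token-ID sequence; real training uses GPT-2-compatible token IDs.
tokens = [15496, 11, 995, 0]

# Causal language modeling shifts the sequence by one position:
inputs  = tokens[:-1]   # what the model conditions on
targets = tokens[1:]    # what it must predict at each position
pairs = list(zip(inputs, targets))
print(pairs)  # [(15496, 11), (11, 995), (995, 0)]
```

Every prefix/next-token pair contributes one cross-entropy term to the loss; no instruction labels or preference signals are involved.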
## Capabilities

### Strengths
- Portuguese text generation
- Basic dialogue structure
- Simple prompt continuation
- Linguistic pattern learning
### Limitations
- Very limited reasoning ability
- Loses context in long conversations
- Inconsistent outputs
- Prone to repetition or incoherence
> **Warning:** This model behaves as a statistical language generator, not a reasoning system.
## Getting Started

### Installation

```bash
pip install transformers torch
```
### Usage with Hugging Face Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AxionLab-official/MiniBot-0.9M-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "User: Me explique o que é gravidade\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Recommended Settings

| Parameter | Recommended Value | Description |
|---|---|---|
| `temperature` | 0.7–1.0 | Controls randomness |
| `top_p` | 0.9–0.95 | Nucleus sampling |
| `do_sample` | `True` | Enables sampling |
| `max_new_tokens` | 30–80 | Response length |
> **Tip:** Base models generally benefit from higher temperature values compared to instruct variants, since there is no fine-tuning to constrain the output distribution.
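What `temperature` and `top_p` actually do can be illustrated without the model, in plain Python. The helper names and logit values below are illustrative (not part of the Transformers API): `softmax` shows how temperature reshapes next-token probabilities, and `top_p_filter` mimics the truncation step of nucleus sampling.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by 1/temperature, then normalize.
    # Higher temperature flattens the distribution; lower sharpens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def top_p_filter(probs, top_p=0.9):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches top_p, zero out the rest, and renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

logits = [2.0, 1.0, 0.2, -1.0]            # made-up next-token logits
cool = softmax(logits, temperature=0.7)   # sharper: more mass on the top token
warm = softmax(logits, temperature=1.0)   # flatter distribution
filtered = top_p_filter(warm, top_p=0.9)  # low-probability tail zeroed out
```

With these logits, lowering the temperature to 0.7 pushes more probability onto the top token, while `top_p=0.9` drops the least likely token entirely before sampling.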
## Intended Use Cases

| Use Case | Suitability |
|---|---|
| Fine-tuning (chat, instruction, roleplay) | Ideal |
| Prompt playground & experimentation | Ideal |
| Research on tiny LLMs | Ideal |
| Benchmarking small architectures | Ideal |
| Local / CPU-only applications | Ideal |
| Critical production environments | Not recommended |
## Disclaimer
- Extremely small model (~900K parameters)
- Limited world knowledge and weak generalization
- No safety or alignment measures
- Not suitable for production use
## Future Work

- Instruction-tuned version: `MiniBot-0.9M-Instruct`
- Larger and more diverse dataset
- Tokenizer improvements
- Scaling to 1M–10M parameters
- Experimental reasoning fine-tuning
## License
Distributed under the MIT License. See LICENSE for more details.
## Author

Developed by AxionLab.