| --- |
| license: mit |
| language: |
| - pt |
| pipeline_tag: text-generation |
| tags: |
| - base |
| - pretrain |
| - pretrained |
| - nano |
| - mini |
| - chatbot |
| --- |
| |
| # MiniBot-0.9M-Base |
|
|
| > **Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.** |
|
|
| --- |
|
|
| ## Overview |
|
|
| **MiniBot-0.9M-Base** is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in **Portuguese**. |
|
|
| This is a **base (pretrained) model**, trained purely for next-token prediction with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as [MiniBot-0.9M-Instruct](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct). |
|
|
| --- |
|
|
| ## Key Characteristics |
|
|
| | Attribute | Detail | |
| |---|---| |
| | ๐ง๐ท **Language** | Portuguese (primary) | |
| | ๐ง **Architecture** | GPT-2 style (Transformer decoder-only) | |
| | ๐ค **Embeddings** | GPT-2 compatible | |
| | ๐ **Parameters** | ~900K | |
| | โ๏ธ **Objective** | Causal Language Modeling (next-token prediction) | |
| | ๐ซ **Alignment** | None (base model) | |
|
|
| --- |
|
|
| ## Architecture |
|
|
| MiniBot-0.9M follows a scaled-down GPT-2 design: |
|
|
| - Token embeddings + positional embeddings |
| - Multi-head self-attention |
| - Feed-forward (MLP) layers |
| - Autoregressive decoding |
|
|
| Despite its small size, it preserves the core inductive biases of GPT-2, making it well-suited for experimentation and educational purposes. |
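To make the size concrete, the parameter count of a GPT-2 style decoder can be estimated from four hyperparameters. The sketch below is illustrative only: the function name `gpt2_param_count` and the tiny example configuration are assumptions, not the published MiniBot hyperparameters. It also assumes the output layer shares its weights with the token embeddings, as in the reference GPT-2. In the hypothetical tiny configuration, the embedding tables account for most of the parameters.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Count the parameters of a GPT-2 style decoder with a tied output head."""
    d = n_embd
    embeddings = vocab_size * d + n_positions * d  # token + positional tables
    attention = (d * 3 * d + 3 * d) + (d * d + d)  # fused QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)    # expand to 4*d, then project back
    layer_norms = 2 * (2 * d)                      # two LayerNorms per block (scale + bias)
    block = attention + mlp + layer_norms
    final_norm = 2 * d
    return embeddings + n_layer * block + final_norm


# Sanity check against the original GPT-2 small configuration.
print(gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12))  # 124439808

# Hypothetical tiny configuration, chosen only to land in the sub-million range.
print(gpt2_param_count(vocab_size=8192, n_positions=256, n_embd=64, n_layer=4))  # 740736
```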
|
|
| --- |
|
|
| ## Training Dataset |
|
|
| The model was trained on a dataset of Portuguese conversational text, with the aim of learning basic language patterns. |
|
|
| **Training notes:** |
| - Pure next-token prediction objective |
| - No instruction tuning (no SFT, no RLHF, no alignment) |
| - Lightweight training pipeline |
| - Optimized for small-scale experimentation |
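The next-token prediction objective in the notes above amounts to the average cross-entropy between the model's prediction at each position and the token that actually follows. A minimal pure-Python sketch, assuming a hypothetical helper `causal_lm_loss` and toy logits rather than the actual training code:

```python
import math


def causal_lm_loss(logits, token_ids):
    """Mean cross-entropy where the logits at position t score token t + 1."""
    total = 0.0
    # The last position has no following token, so it contributes no target.
    for step_logits, target in zip(logits[:-1], token_ids[1:]):
        # Numerically stable log-sum-exp over the vocabulary.
        peak = max(step_logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in step_logits))
        total += log_norm - step_logits[target]
    return total / (len(token_ids) - 1)


# Toy example: 3 tokens, vocabulary of 4, uniform (untrained) predictions.
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
loss = causal_lm_loss(uniform_logits, token_ids=[2, 0, 3])
print(round(loss, 4))  # 1.3863, i.e. ln(4): a uniform guess over 4 tokens
```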
|
|
| --- |
|
|
| ## Capabilities |
|
|
| ### Strengths |
|
|
| - Portuguese text generation |
| - Basic dialogue structure |
| - Simple prompt continuation |
| - Linguistic pattern learning |
|
|
| ### Limitations |
|
|
| - Very limited reasoning ability |
| - Loses context in long conversations |
| - Inconsistent outputs |
| - Prone to repetition or incoherence |
|
|
| > This model behaves as a statistical language generator, not a reasoning system. |
|
|
| --- |
|
|
| ## Getting Started |
|
|
| ### Installation |
|
|
| ```bash |
| pip install transformers torch |
| ``` |
|
|
| ### Usage with Hugging Face Transformers |
|
|
| ```python |
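| from transformers import AutoTokenizer, AutoModelForCausalLM |
| |
| model_name = "AxionLab-official/MiniBot-0.9M-Base" |
| |
| # Load the tokenizer and the pretrained weights from the Hugging Face Hub |
| tokenizer = AutoTokenizer.from_pretrained(model_name) |
| model = AutoModelForCausalLM.from_pretrained(model_name) |
| |
| # A base model only continues text, so frame the prompt as a dialogue to complete. |
| # The Portuguese prompt means "Explain to me what gravity is". |
| prompt = "User: Me explique o que é gravidade\nBot:" |
| inputs = tokenizer(prompt, return_tensors="pt") |
| |
| # Sample up to 50 new tokens with temperature and nucleus (top-p) sampling |
| outputs = model.generate( |
|     **inputs, |
|     max_new_tokens=50, |
|     temperature=0.8, |
|     top_p=0.95, |
|     do_sample=True, |
| ) |
| |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |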
| ``` |
|
|
| ### Recommended Settings |
|
|
| | Parameter | Recommended Value | Description | |
| |---|---|---| |
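| `temperature` | `0.7–1.0` | Controls randomness; higher values give more varied output | |
| `top_p` | `0.9–0.95` | Nucleus sampling threshold (cumulative probability mass kept) | |
| `do_sample` | `True` | Enables sampling instead of greedy decoding | |
| `max_new_tokens` | `30–80` | Maximum number of generated tokens | |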
|
|
| > Base models generally benefit from higher temperature values than instruct variants, since there is no fine-tuning to constrain the output distribution. |
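To illustrate what `temperature` and `top_p` do to the next-token distribution, here is a small pure-Python sketch. The helper `filter_distribution` and the toy logits are hypothetical, and this is not the Transformers implementation. It only mirrors the usual recipe: divide the logits by the temperature, then keep the smallest set of most likely tokens whose cumulative probability reaches `top_p`. With these made-up scores, the lower temperature concentrates probability on fewer tokens, so fewer candidates survive the filter.

```python
import math


def filter_distribution(logits, temperature=0.8, top_p=0.95):
    """Return {token_id: probability} after temperature scaling and top-p filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    return {token_id: probs[token_id] / cumulative for token_id in kept}


toy_logits = [2.0, 1.0, 0.5, -3.0]  # made-up scores for a 4-token vocabulary
for t in (0.7, 1.0):
    survivors = filter_distribution(toy_logits, temperature=t, top_p=0.9)
    print(t, {k: round(v, 3) for k, v in survivors.items()})
```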
|
|
| --- |
|
|
| ## Intended Use Cases |
|
|
| | Use Case | Suitability | |
| |---|---| |
| | ๐ง Fine-tuning (chat, instruction, roleplay) | โ
Ideal | |
| | ๐ฎ Prompt playground & experimentation | โ
Ideal | |
| | ๐ฌ Research on tiny LLMs | โ
Ideal | |
| | ๐ Benchmarking small architectures | โ
Ideal | |
| | โก Local / CPU-only applications | โ
Ideal | |
| | ๐ญ Critical production environments | โ Not recommended | |
|
|
| --- |
|
|
| ## Disclaimer |
|
|
| - Extremely small model (~900K parameters) |
| - Limited world knowledge and weak generalization |
| - No safety or alignment measures |
| - **Not suitable for production use** |
|
|
| --- |
|
|
| ## Future Work |
|
|
| - [x] Instruction-tuned version: [`MiniBot-0.9M-Instruct`](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct) |
| - [ ] Larger and more diverse dataset |
| - [ ] Tokenizer improvements |
| - [ ] Scaling to 1M–10M parameters |
| - [ ] Experimental reasoning fine-tuning |
|
|
| --- |
|
|
| ## License |
|
|
| Distributed under the **MIT License**. See [`LICENSE`](LICENSE) for more details. |
|
|
| --- |
|
|
| ## Author |
|
|
| Developed by **[AxionLab](https://huggingface.co/AxionLab-official)** |
|
|
| --- |
|
|
| <div align="center"> |
| <sub>MiniBot-0.9M-Base · AxionLab · MIT License</sub> |
| </div> |