---
license: mit
language:
- pt
pipeline_tag: text-generation
tags:
- base
- pretrain
- pretrained
- nano
- mini
- chatbot
---

# MiniBot-0.9M-Base

> **Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.**

[Model page](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base) · [MIT License](https://opensource.org/licenses/MIT)

---

## Overview

**MiniBot-0.9M-Base** is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in **Portuguese**.

This is a **base (pretrained) model**: it was trained purely for next-token prediction, with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as [MiniBot-0.9M-Instruct](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct).

---

## Key Characteristics

| Attribute | Detail |
|---|---|
| **Language** | Portuguese (primary) |
| **Architecture** | GPT-2 style (decoder-only Transformer) |
| **Embeddings** | GPT-2 compatible |
| **Parameters** | ~900K |
| **Objective** | Causal language modeling (next-token prediction) |
| **Alignment** | None (base model) |

---

## Architecture

MiniBot-0.9M follows a scaled-down GPT-2 design:

- Token embeddings + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding

Despite its small size, it preserves the core inductive biases of GPT-2, making it well suited for experimentation and educational purposes.

---

## Training Dataset

The model was trained on a Portuguese conversational dataset focused on language pattern learning.
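The exact model configuration (embedding width, layer count, context length) is not published in this card, but a GPT-2-style parameter budget is easy to reproduce with plain arithmetic. The sketch below uses a **hypothetical** tiny configuration chosen only to land in the sub-million range claimed above; the real MiniBot dimensions may differ.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer, tied_lm_head=True):
    """Approximate parameter count of a GPT-2-style decoder-only Transformer."""
    # Embedding tables.
    token_emb = vocab_size * n_embd
    pos_emb = n_positions * n_embd

    # One transformer block:
    #   attention: fused QKV projection + output projection (both with biases)
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    #   MLP: expand to 4*n_embd, then project back (both with biases)
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    #   two LayerNorms (weight + bias each)
    ln = 2 * (2 * n_embd)
    block = attn + mlp + ln

    # Final LayerNorm; a tied LM head reuses the token embedding matrix,
    # so it adds no extra parameters.
    final_ln = 2 * n_embd
    head = 0 if tied_lm_head else vocab_size * n_embd

    return token_emb + pos_emb + n_layer * block + final_ln + head

# Hypothetical tiny config (assumed, not the published one):
print(gpt2_param_count(vocab_size=50257, n_positions=256, n_embd=16, n_layer=2))
# → 814800 for this config
```

Note that at this scale the token embedding table dominates the budget, which is why tying the LM head to the embeddings matters so much for tiny models.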
**Training notes:**

- Pure next-token prediction objective
- No instruction tuning (no SFT, no RLHF, no alignment)
- Lightweight training pipeline
- Optimized for small-scale experimentation

---

## Capabilities

### Strengths

- Portuguese text generation
- Basic dialogue structure
- Simple prompt continuation
- Linguistic pattern learning

### Limitations

- Very limited reasoning ability
- Loses context in long conversations
- Inconsistent outputs
- Prone to repetition or incoherence

> **Warning:** This model behaves as a statistical language generator, not a reasoning system.

---

## Getting Started

### Installation

```bash
pip install transformers torch
```

### Usage with Hugging Face Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AxionLab-official/MiniBot-0.9M-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# "Explain gravity to me"
prompt = "User: Me explique o que é gravidade\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Recommended Settings

| Parameter | Recommended Value | Description |
|---|---|---|
| `temperature` | `0.7–1.0` | Controls randomness |
| `top_p` | `0.9–0.95` | Nucleus sampling |
| `do_sample` | `True` | Enables sampling |
| `max_new_tokens` | `30–80` | Response length |

> Base models generally benefit from higher temperature values than instruct variants, since there is no fine-tuning to constrain the output distribution.
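To make the `temperature` and `top_p` settings above concrete, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a logit vector. This illustrates the sampling idea only; it is not the `transformers` library's internal implementation.

```python
import math
import random

def top_p_filter(logits, top_p=0.95, temperature=0.8):
    """Return a probability distribution keeping only the smallest set of
    tokens whose cumulative probability reaches top_p."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least probable until the nucleus is full.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; everything else gets probability 0.
    norm = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / norm
    return filtered

def sample(filtered_probs, rng=random):
    """Draw one token index from the filtered distribution."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(filtered_probs):
        acc += p
        if r <= acc:
            return i
    return len(filtered_probs) - 1
```

Raising `temperature` flattens the distribution before the cutoff, while lowering `top_p` shrinks the nucleus; together they trade diversity against coherence, which is why a loosely trained base model like this one is usually run with both dials slightly higher than an instruct model.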
---

## Intended Use Cases

| Use Case | Suitability |
|---|---|
| Fine-tuning (chat, instruction, roleplay) | Ideal |
| Prompt playground & experimentation | Ideal |
| Research on tiny LLMs | Ideal |
| Benchmarking small architectures | Ideal |
| Local / CPU-only applications | Ideal |
| Critical production environments | Not recommended |

---

## Disclaimer

- Extremely small model (~900K parameters)
- Limited world knowledge and weak generalization
- No safety or alignment measures
- **Not suitable for production use**

---

## Future Work

- [x] Instruction-tuned version: [`MiniBot-0.9M-Instruct`](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct)
- [ ] Larger and more diverse dataset
- [ ] Tokenizer improvements
- [ ] Scaling to 1M–10M parameters
- [ ] Experimental reasoning fine-tuning

---

## License

Distributed under the **MIT License**. See [`LICENSE`](LICENSE) for details.

---

## Author

Developed by **[AxionLab](https://huggingface.co/AxionLab-official)**