---
license: mit
language:
- pt
pipeline_tag: text-generation
tags:
- base
- pretrain
- pretrained
- nano
- mini
- chatbot
---

# MiniBot-0.9M-Base

> **Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.**

[![Model](https://img.shields.io/badge/🤗%20Hugging%20Face-MiniBot--0.9M--Base-yellow)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Language](https://img.shields.io/badge/Language-Portuguese-blue)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)
[![Parameters](https://img.shields.io/badge/Parameters-~900K-orange)](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Base)

---

## Overview

**MiniBot-0.9M-Base** is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in **Portuguese**.

This is a **base (pretrained) model**: it was trained purely for next-token prediction, with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as [MiniBot-0.9M-Instruct](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct).

---

## Key Characteristics

| Attribute | Detail |
|---|---|
| **Language** | Portuguese (primary) |
| **Architecture** | GPT-2 style (decoder-only Transformer) |
| **Embeddings** | GPT-2 compatible |
| **Parameters** | ~900K |
| **Objective** | Causal language modeling (next-token prediction) |
| **Alignment** | None (base model) |

---

## Architecture

MiniBot-0.9M follows a scaled-down GPT-2 design:

- Token embeddings + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding

Despite its small size, it preserves the core inductive biases of GPT-2, making it well suited for experimentation and educational purposes.
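The card does not publish the exact hyperparameters (hidden size, layer count, vocabulary size), but for a GPT-2 style decoder the parameter count follows directly from those choices. The helper below is a rough estimator under standard GPT-2 conventions (tied input/output embeddings, 4x MLP expansion, learned positional embeddings); the example configurations are illustrative, not MiniBot's actual setup.

```python
def gpt2_param_count(vocab_size, d_model, n_layers, n_positions):
    """Rough parameter count for a GPT-2 style decoder with tied
    input/output embeddings and a 4x MLP expansion."""
    embeddings = vocab_size * d_model + n_positions * d_model
    per_block = (
        12 * d_model**2   # QKV (3d^2) + attn out (d^2) + MLP up/down (8d^2)
        + 13 * d_model    # biases (9d) + two LayerNorms (4d)
    )
    final_ln = 2 * d_model
    return embeddings + n_layers * per_block + final_ln

# Sanity check against GPT-2 small (~124M parameters):
print(gpt2_param_count(50257, 768, 12, 1024))  # -> 124439808

# One hypothetical sub-1M configuration (NOT MiniBot's published config):
print(gpt2_param_count(50257, 16, 2, 256))     # -> 814800 (~0.8M)
```

At this scale, nearly all parameters live in the token embedding table, which is why tiny GPT-2 variants often pair a narrow hidden size with a reduced vocabulary.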
---

## Training Dataset

The model was trained on a Portuguese conversational dataset focused on language pattern learning.

**Training notes:**

- Pure next-token prediction objective
- No instruction tuning (no SFT, no RLHF, no alignment)
- Lightweight training pipeline
- Optimized for small-scale experimentation

---

## Capabilities

### ✅ Strengths

- Portuguese text generation
- Basic dialogue structure
- Simple prompt continuation
- Linguistic pattern learning

### ❌ Limitations

- Very limited reasoning ability
- Loses context in long conversations
- Inconsistent outputs
- Prone to repetition or incoherence

> ⚠️ This model behaves as a statistical language generator, not a reasoning system.

---

## Getting Started

### Installation

```bash
pip install transformers torch
```

### Usage with Hugging Face Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AxionLab-official/MiniBot-0.9M-Base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "User: Me explique o que é gravidade\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Recommended Settings

| Parameter | Recommended Value | Description |
|---|---|---|
| `temperature` | `0.7–1.0` | Controls randomness |
| `top_p` | `0.9–0.95` | Nucleus sampling |
| `do_sample` | `True` | Enables sampling |
| `max_new_tokens` | `30–80` | Response length |

> Base models generally benefit from higher temperature values than instruct variants, since there is no fine-tuning to constrain the output distribution.
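To build intuition for these two knobs, here is a small self-contained sketch (plain Python, no model download needed) of how temperature scaling and nucleus (top-p) filtering reshape a toy next-token distribution. The logits are made up for illustration.

```python
import math

def nucleus_filter(logits, temperature=1.0, top_p=0.9):
    """Apply temperature scaling and softmax, then keep the smallest set of
    tokens whose cumulative probability reaches top_p (renormalized)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort token indices by probability, highest first
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}

toy_logits = [4.0, 2.0, 0.0]  # made-up scores for a 3-token vocabulary
print(len(nucleus_filter(toy_logits, temperature=1.0)))  # 2 tokens survive
print(len(nucleus_filter(toy_logits, temperature=0.5)))  # sharper: only 1
```

Raising the temperature flattens the distribution so more tokens survive the top-p cut; lowering it concentrates mass on the top token. That is why a base model with a loose output distribution can tolerate temperatures up to 1.0.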
---

## Intended Use Cases

| Use Case | Suitability |
|---|---|
| Fine-tuning (chat, instruction, roleplay) | ✅ Ideal |
| Prompt playground & experimentation | ✅ Ideal |
| Research on tiny LLMs | ✅ Ideal |
| Benchmarking small architectures | ✅ Ideal |
| Local / CPU-only applications | ✅ Ideal |
| Critical production environments | ❌ Not recommended |

---

## Disclaimer

- Extremely small model (~900K parameters)
- Limited world knowledge and weak generalization
- No safety or alignment measures
- **Not suitable for production use**

---

## Future Work

- [x] Instruction-tuned version → [`MiniBot-0.9M-Instruct`](https://huggingface.co/AxionLab-official/MiniBot-0.9M-Instruct)
- [ ] Larger and more diverse dataset
- [ ] Tokenizer improvements
- [ ] Scaling to 1M–10M parameters
- [ ] Experimental reasoning fine-tuning

---

## License

Distributed under the **MIT License**. See [`LICENSE`](LICENSE) for more details.

---

## Author

Developed by **[AxionLab](https://huggingface.co/AxionLab-official)**

---
MiniBot-0.9M-Base · AxionLab · MIT License