---
license: mit
language:
  - pt
pipeline_tag: text-generation
tags:
  - base
  - pretrain
  - pretrained
  - nano
  - mini
  - chatbot
---

# 🧠 MiniBot-0.9M-Base

Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.



## 📌 Overview

MiniBot-0.9M-Base is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in Portuguese.

This is a base (pretrained) model: it was trained purely for next-token prediction, with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as MiniBot-0.9M-Instruct.


## 🎯 Key Characteristics

| Attribute | Detail |
|---|---|
| 🇧🇷 Language | Portuguese (primary) |
| 🧠 Architecture | GPT-2 style (decoder-only Transformer) |
| 🔤 Embeddings | GPT-2 compatible |
| 📉 Parameters | ~900K |
| ⚙️ Objective | Causal language modeling (next-token prediction) |
| 🚫 Alignment | None (base model) |

๐Ÿ—๏ธ Architecture

MiniBot-0.9M follows a scaled-down GPT-2 design:

- Token embeddings + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding

Despite its small size, it preserves the core inductive biases of GPT-2, making it well-suited for experimentation and educational purposes.
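As a sanity check on the ~0.9M figure, the parameter count of a GPT-2-style decoder can be estimated directly from its hyperparameters. The configuration below (GPT-2-sized vocabulary, embedding width 16, 2 layers) is a hypothetical illustration, not the model's published configuration:

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Estimate parameters of a GPT-2-style decoder with tied input/output embeddings."""
    # Token embedding table + learned positional embeddings
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # Attention: fused QKV projection (n_embd x 3*n_embd + bias) + output projection
    attn = 4 * n_embd ** 2 + 4 * n_embd
    # MLP: two projections with a 4x hidden expansion, plus biases
    mlp = 8 * n_embd ** 2 + 5 * n_embd
    # Two LayerNorms per block (weight + bias each)
    layer_norms = 4 * n_embd
    blocks = n_layer * (attn + mlp + layer_norms)
    final_norm = 2 * n_embd
    return embeddings + blocks + final_norm

# Hypothetical configuration: GPT-2 vocabulary, tiny width and depth
print(gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=16, n_layer=2))
```

With these assumed values the estimate lands near 0.83M parameters, and almost all of them sit in the embedding table, which is typical at this scale: the Transformer blocks themselves contribute only a few thousand parameters.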


## 📚 Training Dataset

The model was trained on a Portuguese conversational dataset focused on language pattern learning.

Training notes:

- Pure next-token prediction objective
- No instruction tuning (no SFT, no RLHF, no alignment)
- Lightweight training pipeline
- Optimized for small-scale experimentation
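Concretely, "pure next-token prediction" means minimizing the average negative log-likelihood of each token given everything before it. A minimal sketch of that loss, with made-up probabilities:

```python
import math

def causal_lm_loss(true_token_probs):
    """Average negative log-likelihood over a sequence.

    true_token_probs[t] is the probability the model assigned to the
    actual token at position t+1, given tokens 0..t.
    """
    nll = [-math.log(p) for p in true_token_probs]
    return sum(nll) / len(nll)

# A perfectly confident model has zero loss; uncertainty raises it
print(causal_lm_loss([1.0, 1.0, 1.0]))
print(causal_lm_loss([0.5, 0.25, 0.5]))
```

During training these probabilities come from a softmax over the model's logits, and cross-entropy on shifted labels computes exactly this quantity.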

## 💡 Capabilities

### ✅ Strengths

- Portuguese text generation
- Basic dialogue structure
- Simple prompt continuation
- Linguistic pattern learning

โŒ Limitations

- Very limited reasoning ability
- Loses context in long conversations
- Inconsistent outputs
- Prone to repetition or incoherence

> ⚠️ This model behaves as a statistical language generator, not a reasoning system.


## 🚀 Getting Started

### Installation

```bash
pip install transformers torch
```

### Usage with Hugging Face Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AxionLab-official/MiniBot-0.9M-Base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "User: Me explique o que é gravidade\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## ⚙️ Recommended Settings

| Parameter | Recommended value | Description |
|---|---|---|
| `temperature` | 0.7 – 1.0 | Controls randomness |
| `top_p` | 0.9 – 0.95 | Nucleus sampling |
| `do_sample` | `True` | Enables sampling |
| `max_new_tokens` | 30 – 80 | Response length |

> 💡 Base models generally benefit from higher temperature values than instruct variants, since there is no fine-tuning to constrain the output distribution.
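To make the `top_p` row concrete: nucleus sampling keeps the smallest set of most-probable tokens whose cumulative mass reaches `top_p`, renormalizes, and then samples only within that set. A toy sketch of the filtering step (the token names and probabilities are invented for illustration):

```python
def top_p_filter(probs, top_p=0.9):
    """Return the renormalized nucleus of a token distribution.

    probs maps token -> probability; the result keeps the smallest
    highest-probability set whose cumulative mass reaches top_p.
    """
    kept, cumulative = {}, 0.0
    # Walk tokens from most to least probable until top_p mass is covered
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving tokens form a proper distribution
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

# "sol" (0.5) and "lua" (0.3) cover only 0.8 < 0.9, so "mar" (0.15) is also kept
print(top_p_filter({"sol": 0.5, "lua": 0.3, "mar": 0.15, "ceu": 0.05}))
```

Lower `top_p` values prune more of the low-probability tail, which trades diversity for coherence; with a tiny base model like this one, that trade-off is especially noticeable.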


## 🧪 Intended Use Cases

| Use case | Suitability |
|---|---|
| 🧠 Fine-tuning (chat, instruction, roleplay) | ✅ Ideal |
| 🎮 Prompt playground & experimentation | ✅ Ideal |
| 🔬 Research on tiny LLMs | ✅ Ideal |
| 📉 Benchmarking small architectures | ✅ Ideal |
| ⚡ Local / CPU-only applications | ✅ Ideal |
| 🏭 Critical production environments | ❌ Not recommended |

## ⚠️ Disclaimer

- Extremely small model (~900K parameters)
- Limited world knowledge and weak generalization
- No safety or alignment measures
- Not suitable for production use

## 🔮 Future Work

- 🎯 Instruction-tuned version → MiniBot-0.9M-Instruct
- 📚 Larger and more diverse dataset
- 🔤 Tokenizer improvements
- 📈 Scaling to 1M–10M parameters
- 🧠 Experimental reasoning fine-tuning

## 📜 License

Distributed under the MIT License. See LICENSE for more details.


## 👤 Author

Developed by AxionLab 🔬


MiniBot-0.9M-Base · AxionLab · MIT License