---
license: mit
language:
  - pt
pipeline_tag: text-generation
tags:
  - base
  - pretrain
  - pretrained
  - nano
  - mini
  - chatbot
---

# 🧠 MiniBot-0.9M-Base

Ultra-lightweight GPT-2 style language model (~900K parameters) specialized in Portuguese conversational text.



## 📌 Overview

MiniBot-0.9M-Base is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in Portuguese.

This is a base (pretrained) model: it was trained purely for next-token prediction, with no instruction tuning or alignment of any kind. It serves as the foundation for fine-tuned variants such as MiniBot-0.9M-Instruct.


## 🎯 Key Characteristics

| Attribute | Detail |
|---|---|
| 🇧🇷 Language | Portuguese (primary) |
| 🧠 Architecture | GPT-2 style (decoder-only Transformer) |
| 🔤 Embeddings | GPT-2 compatible |
| 📉 Parameters | ~900K |
| ⚙️ Objective | Causal language modeling (next-token prediction) |
| 🚫 Alignment | None (base model) |

๐Ÿ—๏ธ Architecture

MiniBot-0.9M follows a scaled-down GPT-2 design:

- Token embeddings + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding

Despite its small size, it preserves the core inductive biases of GPT-2, making it well-suited for experimentation and educational purposes.
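As a sanity check on the ~0.9M figure, the parameter count of a GPT-2-style decoder can be estimated directly from its hyperparameters. The configuration below (GPT-2-sized vocabulary, embedding width 16, 2 layers) is a hypothetical illustration, not the model's published configuration:

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Estimate parameters of a GPT-2-style decoder with tied input/output embeddings."""
    # Token embedding table + learned positional embeddings
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # Attention: fused QKV projection (n_embd x 3*n_embd + bias) + output projection
    attn = 4 * n_embd ** 2 + 4 * n_embd
    # MLP: two projections with a 4x hidden expansion, plus biases
    mlp = 8 * n_embd ** 2 + 5 * n_embd
    # Two LayerNorms per block (weight + bias each)
    layer_norms = 4 * n_embd
    blocks = n_layer * (attn + mlp + layer_norms)
    final_norm = 2 * n_embd
    return embeddings + blocks + final_norm

# Hypothetical configuration: GPT-2 vocabulary, tiny width and depth
print(gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=16, n_layer=2))
```

With these assumed values the estimate lands near 0.83M parameters, and almost all of them sit in the embedding table, which is typical at this scale: the Transformer blocks themselves contribute only a few thousand parameters.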


## 📚 Training Dataset

The model was trained on a Portuguese conversational dataset focused on language pattern learning.

Training notes:

- Pure next-token prediction objective
- No instruction tuning (no SFT, no RLHF, no alignment)
- Lightweight training pipeline
- Optimized for small-scale experimentation
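Concretely, "pure next-token prediction" means minimizing the average negative log-likelihood of each token given everything before it. A minimal sketch of that loss, with made-up probabilities:

```python
import math

def causal_lm_loss(true_token_probs):
    """Average negative log-likelihood over a sequence.

    true_token_probs[t] is the probability the model assigned to the
    actual token at position t+1, given tokens 0..t.
    """
    nll = [-math.log(p) for p in true_token_probs]
    return sum(nll) / len(nll)

# A perfectly confident model has zero loss; uncertainty raises it
print(causal_lm_loss([1.0, 1.0, 1.0]))
print(causal_lm_loss([0.5, 0.25, 0.5]))
```

During training these probabilities come from a softmax over the model's logits, and cross-entropy on shifted labels computes exactly this quantity.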

## 💡 Capabilities

### ✅ Strengths

- Portuguese text generation
- Basic dialogue structure
- Simple prompt continuation
- Linguistic pattern learning

โŒ Limitations

- Very limited reasoning ability
- Loses context in long conversations
- Inconsistent outputs
- Prone to repetition or incoherence

> ⚠️ This model behaves as a statistical language generator, not a reasoning system.


## 🚀 Getting Started

### Installation

```bash
pip install transformers torch
```

### Usage with Hugging Face Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AxionLab-official/MiniBot-0.9M-Base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "User: Me explique o que é gravidade\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## ⚙️ Recommended Settings

| Parameter | Recommended value | Description |
|---|---|---|
| `temperature` | 0.7 – 1.0 | Controls randomness |
| `top_p` | 0.9 – 0.95 | Nucleus sampling |
| `do_sample` | `True` | Enables sampling |
| `max_new_tokens` | 30 – 80 | Response length |

> 💡 Base models generally benefit from higher temperature values than instruct variants, since there is no fine-tuning to constrain the output distribution.
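To make the `top_p` row concrete: nucleus sampling keeps the smallest set of most-probable tokens whose cumulative mass reaches `top_p`, renormalizes, and then samples only within that set. A toy sketch of the filtering step (the token names and probabilities are invented for illustration):

```python
def top_p_filter(probs, top_p=0.9):
    """Return the renormalized nucleus of a token distribution.

    probs maps token -> probability; the result keeps the smallest
    highest-probability set whose cumulative mass reaches top_p.
    """
    kept, cumulative = {}, 0.0
    # Walk tokens from most to least probable until top_p mass is covered
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving tokens form a proper distribution
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

# "sol" (0.5) and "lua" (0.3) cover only 0.8 < 0.9, so "mar" (0.15) is also kept
print(top_p_filter({"sol": 0.5, "lua": 0.3, "mar": 0.15, "ceu": 0.05}))
```

Lower `top_p` values prune more of the low-probability tail, which trades diversity for coherence; with a tiny base model like this one, that trade-off is especially noticeable.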


## 🧪 Intended Use Cases

| Use case | Suitability |
|---|---|
| 🧠 Fine-tuning (chat, instruction, roleplay) | ✅ Ideal |
| 🎮 Prompt playground & experimentation | ✅ Ideal |
| 🔬 Research on tiny LLMs | ✅ Ideal |
| 📉 Benchmarking small architectures | ✅ Ideal |
| ⚡ Local / CPU-only applications | ✅ Ideal |
| 🏭 Critical production environments | ❌ Not recommended |

## ⚠️ Disclaimer

- Extremely small model (~900K parameters)
- Limited world knowledge and weak generalization
- No safety or alignment measures
- Not suitable for production use

## 🔮 Future Work

- 🎯 Instruction-tuned version → MiniBot-0.9M-Instruct
- 📚 Larger and more diverse dataset
- 🔤 Tokenizer improvements
- 📈 Scaling to 1M–10M parameters
- 🧠 Experimental reasoning fine-tuning

## 📜 License

Distributed under the MIT License. See LICENSE for more details.


## 👤 Author

Developed by AxionLab 🔬


MiniBot-0.9M-Base · AxionLab · MIT License