Instructions for using AxionLab-official/MiniBot-0.9M-Instruct with libraries, inference providers, notebooks, and local apps. The sections below show how to get started.

Libraries

How to use AxionLab-official/MiniBot-0.9M-Instruct with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AxionLab-official/MiniBot-0.9M-Instruct")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AxionLab-official/MiniBot-0.9M-Instruct")
model = AutoModelForCausalLM.from_pretrained("AxionLab-official/MiniBot-0.9M-Instruct")
```

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use AxionLab-official/MiniBot-0.9M-Instruct with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AxionLab-official/MiniBot-0.9M-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AxionLab-official/MiniBot-0.9M-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
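To call the server from a Python script instead of curl, you can build the same request with the standard library. This is a minimal sketch that assumes the vLLM server started above is listening on `http://localhost:8000`; the SGLang server below listens on port 30000 instead. The helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "AxionLab-official/MiniBot-0.9M-Instruct"


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send the request and return the first choice's text (requires a running server)."""
    request = build_completion_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("User: Me diga uma curiosidade sobre o espaço\nBot:")` sends a prompt in the model's chat format. Passing `base_url="http://localhost:30000"` targets the SGLang server instead.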

Use Docker

```shell
docker model run hf.co/AxionLab-official/MiniBot-0.9M-Instruct
```

SGLang

How to use AxionLab-official/MiniBot-0.9M-Instruct with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AxionLab-official/MiniBot-0.9M-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AxionLab-official/MiniBot-0.9M-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AxionLab-official/MiniBot-0.9M-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AxionLab-official/MiniBot-0.9M-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Docker Model Runner
How to use AxionLab-official/MiniBot-0.9M-Instruct with Docker Model Runner:
```
docker model run hf.co/AxionLab-official/MiniBot-0.9M-Instruct
```

AxionLab-official committed on Apr 5

Commit 75b0ac8 · verified · 1 parent: 08e51a9

Update README.md


Files changed (1)

README.md +147 -3

README.md CHANGED

@@ -1,3 +1,147 @@
----
-license: mit
----

+---
+license: mit
+language:
+- pt
+pipeline_tag: text-generation
+---
+## 🧠 MiniBot-0.9M-Instruct
+Instruction-tuned GPT-2 style language model (~900K parameters) optimized for Portuguese conversational tasks.
+## 📌 Model Overview
+MiniBot-0.9M-Instruct is an instruction-tuned version of MiniBot-0.9M-Base, designed to better follow prompts, respond to user inputs, and generate more coherent conversational outputs in Portuguese.
+Built on a GPT-2 architecture (~0.9M parameters), this model was fine-tuned on conversational and instruction-style data to improve usability in real-world interactions.
+## 🎯 Key Characteristics
+- 🇧🇷 Language: Portuguese (primary)
+- 🧠 Architecture: GPT-2 style (decoder-only Transformer)
+- 🔤 Embeddings: GPT-2 compatible
+- 📉 Parameters: ~900K
+- ⚙️ Base Model: MiniBot-0.9M-Base
+- 🎯 Fine-tuning: Instruction tuning (supervised)
+- ✅ Alignment: Basic prompt-following behavior
+## 🧠 What Changed from Base?
+Compared to the base model:
+
+| Feature | Base | Instruct |
+|---|---|---|
+| Prompt understanding | ❌ | ✅ |
+| Conversational flow | ⚠️ | ✅ |
+| Instruction following | ❌ | ✅ |
+| Coherence | Low | Improved |
+| Usability | Experimental | Practical |
+
+👉 The model is now significantly more usable in chat scenarios.
+## 🏗️ Architecture
+Same core as the base model:
+- Decoder-only Transformer (GPT-2 style)
+- Token + positional embeddings
+- Self-attention + MLP blocks
+- Autoregressive generation
+
+No architectural changes: only the behavior was improved, through fine-tuning.
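The parameter count of a GPT-2-style decoder can be derived from its hyperparameters, which shows where a ~900K budget goes. This sketch assumes the standard GPT-2 layout: biases on every linear layer, two LayerNorms per block, an MLP that expands to 4× the hidden size, a final LayerNorm, and an output head tied to the token embedding. The model's actual `n_embd`, `n_layer`, `vocab_size`, and `n_positions` are not stated in this card, and `GPT2StyleConfig` is a hypothetical container.

```python
from dataclasses import dataclass


@dataclass
class GPT2StyleConfig:
    vocab_size: int
    n_positions: int  # maximum context length
    n_embd: int       # hidden size d
    n_layer: int      # number of Transformer blocks


def count_parameters(cfg: GPT2StyleConfig) -> int:
    """Count trainable parameters of a GPT-2-style decoder with a tied LM head."""
    d = cfg.n_embd
    # Token embedding (vocab_size x d) plus learned positional embedding (n_positions x d).
    embeddings = cfg.vocab_size * d + cfg.n_positions * d
    # One block: two LayerNorms (scale + bias each), fused QKV projection,
    # attention output projection, MLP up-projection to 4d and down-projection to d.
    ln = 2 * d
    attn = (d * 3 * d + 3 * d) + (d * d + d)
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    per_block = 2 * ln + attn + mlp  # equals 12*d**2 + 13*d
    final_ln = 2 * d
    # The LM head reuses the token-embedding matrix, so it adds no parameters.
    return embeddings + cfg.n_layer * per_block + final_ln
```

As a sanity check, the GPT-2 small configuration (`vocab_size=50257`, `n_positions=1024`, `n_embd=768`, `n_layer=12`) gives 124,439,808 parameters. With the full 50,257-token GPT-2 vocabulary, the token embedding alone passes 900K parameters at `n_embd=18` (50,257 × 18 = 904,626). If the ~900K figure includes embeddings, the real model must therefore be very narrow or use a smaller vocabulary. The repository's `config.json` holds the actual values.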
+## 📚 Fine-Tuning
+### Dataset
+The model was fine-tuned on a Portuguese instruction-style conversational dataset, including:
+- Questions and answers
+- Simple instructions
+- Assistant-style chat
+- Basic roleplay
+- Natural conversations
+### Format
+```
+User: Me explique o que é gravidade
+Bot: A gravidade é a força que atrai objetos com massa...
+```
+(In English: "User: Explain to me what gravity is" / "Bot: Gravity is the force that attracts objects with mass...")
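A multi-turn conversation can be serialized into the `User:` / `Bot:` layout shown above. This sketch assumes that turns are separated by single newlines and that the prompt ends with an open `Bot:` line for the model to complete. The exact separators used during fine-tuning are not documented in this card, and `build_prompt` is a hypothetical helper.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Serialize earlier (user, bot) exchanges plus a new user message.

    The result ends with "Bot:" so that generation continues as the bot's reply.
    """
    lines = []
    for user_text, bot_text in history:
        lines.append(f"User: {user_text.strip()}")
        lines.append(f"Bot: {bot_text.strip()}")
    lines.append(f"User: {user_message.strip()}")
    lines.append("Bot:")
    return "\n".join(lines)
```

With an empty history, `build_prompt([], "Me diga uma curiosidade sobre o espaço")` produces the same single-turn prompt used in the Transformers usage example.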
+### Strategy
+- Supervised fine-tuning (SFT)
+- Pattern learning for instruction following
+- No RLHF or preference optimization
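A common way to set up supervised fine-tuning on such pairs is to compute the loss only on the bot's reply. The card does not say whether MiniBot's training masked the prompt or appended an end-of-sequence token, so treat both as illustrative assumptions. The sketch masks prompt positions with `-100`, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss` and the value Hugging Face causal-LM models ignore in `labels`. Token IDs are plain integers from a hypothetical tokenizer, and `build_sft_example` is a hypothetical helper.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(
    prompt_ids: List[int],
    response_ids: List[int],
    eos_id: int,
    max_length: int,
) -> Dict[str, List[int]]:
    """Concatenate prompt and response, masking prompt tokens out of the loss.

    Labels stay aligned with input_ids (no shift): Hugging Face causal-LM
    models shift them internally when computing the next-token loss.
    """
    input_ids = (prompt_ids + response_ids + [eos_id])[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id])[:max_length]
    attention_mask = [1] * len(input_ids)
    return {"input_ids": input_ids, "labels": labels, "attention_mask": attention_mask}
```

Here `prompt_ids` would be the tokenized `User: ...\nBot:` prefix and `response_ids` the tokenized reply. Because of the mask, the model is trained to produce answers rather than to reproduce user messages.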
+## 💡 Capabilities
+✅ Strengths:
+- Following simple instructions
+- Answering basic questions
+- Conversing more naturally
+- Better coherence in short responses
+- More consistent dialogue structure
+
+❌ Limitations:
+- Reasoning is still limited
+- May get facts wrong
+- Does not maintain long context
+- Sensitive to poorly structured prompts
+
+👉 Even with instruction tuning, this is still an extremely small model.
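Because the model does not maintain long context, a chat front end will usually need to drop old turns before each request. This is a minimal sketch that assumes the caller supplies a token-counting function, for example `lambda s: len(tokenizer(s)["input_ids"])` with the tokenizer from the Transformers usage example. The helper name `trim_history` and any budget value are hypothetical.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user message, bot reply)


def trim_history(
    history: List[Turn],
    count_tokens: Callable[[str], int],
    budget: int,
) -> List[Turn]:
    """Keep the most recent turns whose combined size fits within `budget` tokens."""
    kept: List[Turn] = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for user_text, bot_text in reversed(history):
        cost = count_tokens(f"User: {user_text}\nBot: {bot_text}\n")
        if used + cost > budget:
            break
        kept.append((user_text, bot_text))
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

The budget covers only past turns. Reserve room for the new user message and for `max_new_tokens` when choosing it.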
+## 🚀 Usage
+### Hugging Face Transformers
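+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+model_name = "AxionLab-official/MiniBot-0.9M-Instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name)
+
+# Prompts follow the "User: ...\nBot:" format used during fine-tuning
+# ("Me diga uma curiosidade sobre o espaço" = "Tell me a fun fact about space").
+prompt = "User: Me diga uma curiosidade sobre o espaço\nBot:"
+inputs = tokenizer(prompt, return_tensors="pt")
+
+# Sample a short reply; temperature and top_p only take effect with do_sample=True.
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=80,
+    temperature=0.7,
+    top_p=0.9,
+    do_sample=True
+)
+# The decoded text contains the prompt followed by the generated reply.
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```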
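For a decoder-only model, `tokenizer.decode(outputs[0], ...)` returns the prompt together with the continuation, and a small model may go on to write the next `User:` turn itself. This post-processing sketch assumes the `User:` / `Bot:` format from the fine-tuning section; `extract_reply` is a hypothetical helper, not part of the model's code. Another option is to decode only the new tokens with `outputs[0][inputs["input_ids"].shape[-1]:]` and pass the result with an empty prompt.

```python
from typing import Tuple


def extract_reply(
    decoded: str,
    prompt: str,
    stop_markers: Tuple[str, ...] = ("\nUser:", "\nBot:"),
) -> str:
    """Return only the bot's reply from a decoded generation."""
    # Drop the echoed prompt if the decoded text starts with it.
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Cut at the earliest marker that would begin a new turn.
    cut = len(reply)
    for marker in stop_markers:
        index = reply.find(marker)
        if index != -1:
            cut = min(cut, index)
    return reply[:cut].strip()
```

In the example above this would be called as `extract_reply(tokenizer.decode(outputs[0], skip_special_tokens=True), prompt)`.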
+## ⚙️ Recommended Settings
+For best quality:
+- temperature: 0.6 – 0.8
+- top_p: 0.85 – 0.95
+- do_sample: True
+- max_new_tokens: 40 – 100
+
+👉 Instruct models tend to perform better with less randomness.
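The recommended ranges can be kept in one place and passed straight to `model.generate`. This sketch uses the midpoints of the ranges above as defaults, which is an arbitrary choice, and rejects overrides outside them. `RECOMMENDED_RANGES` and `generation_kwargs` are hypothetical names.

```python
from typing import Dict, Tuple, Union

Number = Union[int, float]

# Inclusive bounds copied from the recommended settings above.
RECOMMENDED_RANGES: Dict[str, Tuple[Number, Number]] = {
    "temperature": (0.6, 0.8),
    "top_p": (0.85, 0.95),
    "max_new_tokens": (40, 100),
}


def generation_kwargs(**overrides: Number) -> Dict[str, Union[Number, bool]]:
    """Build keyword arguments for model.generate within the recommended ranges."""
    settings: Dict[str, Union[Number, bool]] = {
        "temperature": 0.7,
        "top_p": 0.9,
        "max_new_tokens": 70,
        "do_sample": True,  # temperature and top_p are ignored without sampling
    }
    for name, value in overrides.items():
        if name not in RECOMMENDED_RANGES:
            raise KeyError(f"unsupported setting: {name}")
        low, high = RECOMMENDED_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the recommended range [{low}, {high}]")
        settings[name] = value
    return settings
```

Usage: `outputs = model.generate(**inputs, **generation_kwargs(max_new_tokens=60))`.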
+## 🧪 Intended Use
+- 💬 Lightweight chatbots in Portuguese
+- 🎮 NPCs and games
+- 🧠 Fine-tuning experiments
+- 📚 NLP education
+- ⚡ Local applications (CPU-only)
+## ⚠️ Limitations
+- Extremely small model
+- No robust alignment
+- May generate incorrect responses
+- Not suitable for critical production use
+## 🔮 Future Work
+- 🧠 Reasoning-tuned version (MiniBot-Reason)
+- 📈 Scaling to 1M–10M parameters
+- 📚 More diverse dataset
+- 🤖 Better response alignment
+- 🧩 Tool-use experiments
+## 📜 License
+MIT
+## 👤 Author
+Developed by AxionLab