AxionLab-official committed 75b02a4 (verified) · Parent(s): f41e2de

Update README.md

Files changed (1): README.md +140 −1
---
license: mit
language:
- pt
pipeline_tag: text-generation
tags:
- base
- pretrain
- pretrained
- nano
- mini
- chatbot
---

# 🧠 MiniBot-0.9M-Base

Ultra-lightweight GPT-2-style language model (~900K parameters) specialized in Portuguese conversational text.

## 📌 Model Overview

MiniBot-0.9M-Base is a tiny decoder-only Transformer (~0.9M parameters) based on the GPT-2 architecture, designed for efficient text generation in Portuguese.

This is a base (pretrained) model: it was trained only for next-token prediction, with no instruction tuning or alignment.

It is intended primarily for:

- 🧪 Fine-tuning experiments
- 🎮 Playground usage
- ⚡ Ultra-fast local inference
- 🧠 Research on small-scale language models

## 🎯 Key Characteristics

- 🇧🇷 Language: Portuguese (primary)
- 🧠 Architecture: GPT-2 style (decoder-only Transformer)
- 🔤 Embeddings: GPT-2-compatible embeddings
- 📉 Parameters: ~900,000
- ⚙️ Objective: Causal Language Modeling (next-token prediction)
- 🚫 Alignment: None (base model)

## 🏗️ Architecture Details

MiniBot-0.9M follows a scaled-down GPT-2 design, including:

- Token + positional embeddings
- Multi-head self-attention
- Feed-forward (MLP) layers
- Autoregressive decoding

Despite its small size, it preserves the core inductive biases of GPT-2, making it ideal for experimentation and educational purposes.
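To make the "~0.9M parameters" figure concrete, the count for a GPT-2-style decoder can be estimated directly from its hyperparameters. The configuration used in the example below is purely hypothetical (it simply happens to land near 0.9M) and is not MiniBot's published configuration:

```python
# Rough parameter-count formula for a GPT-2-style decoder with tied
# input/output embeddings. The hyperparameter values in the example
# are illustrative assumptions, NOT MiniBot's actual configuration.

def gpt2_param_count(vocab_size, d_model, n_layer, n_ctx):
    emb = vocab_size * d_model + n_ctx * d_model    # token + positional embeddings
    attn = 4 * d_model * d_model + 4 * d_model      # qkv + output projections (incl. biases)
    mlp = 8 * d_model * d_model + 5 * d_model       # two linear layers with 4x expansion
    ln = 4 * d_model                                # two LayerNorms per block
    block = attn + mlp + ln
    final_ln = 2 * d_model
    return emb + n_layer * block + final_ln

# A hypothetical tiny config that lands near ~0.9M parameters:
print(gpt2_param_count(vocab_size=8000, d_model=80, n_layer=3, n_ctx=256))  # → 894160
```

Note how the embedding table dominates the total at this scale, which is why tiny models are so sensitive to vocabulary size.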

## 📚 Training

### Dataset

The model was trained on a Portuguese conversational dataset, including:

- Dialogues (User ↔ Bot)
- Simple questions and answers
- Casual conversations
- Natural-language structures

### Format

    User: Oi!
    Bot: Olá! Como posso te ajudar?

### Training Notes

- Focused on language-pattern learning, not reasoning
- No instruction tuning (no RLHF, no alignment)
- Lightweight training pipeline
- Optimized for small-scale experimentation
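When preparing fine-tuning data in the format above, dialogue turns can be assembled programmatically. `format_dialogue` below is a hypothetical helper sketch, not part of the model's actual training pipeline:

```python
# Sketch: turn (speaker, text) pairs into a training string in the
# "User:/Bot:" format described above. The helper name and separator
# choice are illustrative assumptions.

def format_dialogue(turns, sep="\n"):
    """turns: list of (speaker, text) tuples -> one training string."""
    return sep.join(f"{speaker}: {text}" for speaker, text in turns)

sample = [("User", "Oi!"), ("Bot", "Olá! Como posso te ajudar?")]
print(format_dialogue(sample))
# User: Oi!
# Bot: Olá! Como posso te ajudar?
```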

## 💡 Capabilities

✅ Strengths:

- Portuguese text generation
- Basic dialogue structure
- Continuation of simple prompts
- Learning of linguistic patterns

❌ Limitations:

- Very limited reasoning
- Loss of context in long conversations
- Inconsistent responses
- Possible repetition or incoherence

👉 This model behaves as a statistical language generator, not a reasoning system.

## 🚀 Usage

### Hugging Face Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AxionLab-official/MiniBot-0.9M-Base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "User: Oi\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## ⚙️ Recommended Generation Settings

For better results:

- temperature: 0.7 – 1.0
- top_p: 0.9 – 0.95
- do_sample: True
- max_new_tokens: 30 – 80
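These settings can be kept in one place and reused across calls. The midpoint values chosen below are illustrative picks within the recommended ranges, not values the model card prescribes:

```python
# Recommended sampling settings bundled as reusable keyword arguments
# for model.generate(). The specific values are midpoints chosen from
# the recommended ranges above (illustrative assumptions).

GEN_KWARGS = {
    "do_sample": True,
    "temperature": 0.8,    # recommended range: 0.7 – 1.0
    "top_p": 0.95,         # recommended range: 0.9 – 0.95
    "max_new_tokens": 50,  # recommended range: 30 – 80
}

# Usage, with model/tokenizer loaded as in the example above:
# outputs = model.generate(**inputs, **GEN_KWARGS)
```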

## 🧪 Intended Use

This is a foundation model, ideal for:

- 🧠 Fine-tuning (chat, instruction, roleplay, tools)
- 🎮 Prompt playground experimentation
- 🔬 Research in tiny LLMs
- 📉 Benchmarking small architectures
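For the fine-tuning use case, a common preprocessing step for causal-LM training is packing tokenized text into fixed-length blocks. The sketch below illustrates the idea; the helper name and block size are assumptions, not part of this model's tooling:

```python
# Sketch: split a stream of token ids into equal fixed-length blocks
# for causal-LM fine-tuning, dropping the ragged tail. Block size here
# is an illustrative assumption.

def chunk_tokens(token_ids, block_size):
    """Return a list of equal-length blocks; the ragged tail is dropped."""
    n = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, n, block_size)]

blocks = chunk_tokens(list(range(10)), block_size=4)
print(blocks)  # → [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With a model this small, short blocks (well under the context length) are usually enough for the simple dialogue patterns it can learn.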

## ⚠️ Limitations

Due to its extremely small size:

- Limited world knowledge
- Weak generalization
- No safety alignment
- Not suitable for production use

## 🔮 Future Work

Planned directions:

- 🧠 Instruction-tuned version (MiniBot-Instruct)
- 📚 Larger dataset scaling
- 🔤 Tokenizer improvements
- 📈 Larger variants (1M–10M params)
- 🤖 Experimental reasoning fine-tuning

## 📜 License

MIT

## 👤 Author

Developed by AxionLab