# Zelin-4B – Argentine Spanish Minecraft Discord Bot LLM
Fine-tuned Qwen3-4B-Instruct for Zelin, the autonomous AI bot of the TomateSMP Minecraft server.
## What It Does
Zelin-4B is specialized for:
- Argentine Spanish chat – speaks natively with "vos", "che", "dale", "qué bajón"
- Minecraft server management – understands commands, server status, gameplay
- Intent detection – classifies what users want (JSON output)
- Moderation decisions – detects toxicity and suggests actions (JSON output)
- Sentiment analysis – reads emotional tone in Argentine context (JSON output)
- Short Discord responses – 1-3 lines, casual, no formal language
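As an illustration of the structured outputs, an intent-detection exchange might look like the sketch below. The prompt wording and JSON schema shown here are hypothetical examples, not the exact format Zelin-4B was trained on:

```python
import json

# Hypothetical intent-detection exchange; the field names below are
# illustrative, not the exact schema Zelin-4B was trained on.
user_message = "che zelin, está caído el server?"

# A structured response the model might return for an intent prompt:
raw_output = '{"intent": "server_status", "confidence": 0.92}'

parsed = json.loads(raw_output)
print(parsed["intent"])  # server_status
```

Because the model emits plain JSON text, the bot can `json.loads` the completion and branch on the `intent` field.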
## Model Details
| Attribute | Value |
|---|---|
| Base Model | Qwen3-4B-Instruct |
| Fine-tune Method | QLoRA (4-bit, r=16) |
| Training Framework | Unsloth |
| Training Data | 3,000 ChatML conversations |
| Languages | es-AR (Argentine Spanish) |
| Context Length | 2048 tokens |
| GGUF Quantization | Q4_K_M (~2.5 GB) |
## Quick Start
### llama.cpp (CPU, fastest)

```bash
# Download GGUF
huggingface-cli download TomatitoToho/Zelin-4B zelin-4b-Q4_K_M.gguf --local-dir .

# Run server
llama-server -m zelin-4b-Q4_K_M.gguf -c 2048 -t 4 --port 8080
```
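Once `llama-server` is running, it exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the chosen port. A minimal client sketch using only the standard library (the system prompt text here mirrors the example elsewhere in this README and is illustrative):

```python
import json
import urllib.request

# Build an OpenAI-style chat completion request for the local llama-server.
payload = {
    "messages": [
        {"role": "system", "content": "Sos Zelin, la IA del servidor TomateSMP..."},
        {"role": "user", "content": "hola zelin, qué onda"},
    ],
    "max_tokens": 100,
    "temperature": 0.7,
}

def query_zelin(payload, url="http://localhost:8080/v1/chat/completions"):
    """POST the payload to the local llama-server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# query_zelin(payload)  # requires llama-server running on port 8080
```

Since the endpoint is OpenAI-compatible, any OpenAI SDK pointed at `http://localhost:8080/v1` should also work.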
### Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(model_path="zelin-4b-Q4_K_M.gguf", n_ctx=2048)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Sos Zelin, la IA del servidor TomateSMP..."},
        {"role": "user", "content": "hola zelin, qué onda"},
    ],
    max_tokens=100,
    temperature=0.7,
)
print(result["choices"][0]["message"]["content"])
# → "holaa, qué onda che"
```
### HuggingFace Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("TomatitoToho/Zelin-4B")
tokenizer = AutoTokenizer.from_pretrained("TomatitoToho/Zelin-4B")
```
## Training Data
| Category | Count | Description |
|---|---|---|
| Casual Chat | 1,142 | Argentine Spanish conversations |
| Minecraft | 706 | Server management, gameplay |
| Intent Detection | 430 | Classification JSON |
| Moderation | 288 | Action decision JSON |
| Sentiment | 284 | Emotional analysis JSON |
| Total | 3,000 | 95% train / 5% validation |
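The 95/5 split works out as follows (a quick check, assuming a simple proportional split of the 3,000 conversations):

```python
# Reproduce the train/validation split sizes from the totals above.
TOTAL = 3000
VAL_FRACTION = 0.05

val_count = int(TOTAL * VAL_FRACTION)  # 150 held-out conversations
train_count = TOTAL - val_count        # 2850 training conversations

print(train_count, val_count)  # 2850 150
```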
## Training Configuration

```python
# QLoRA configuration
r = 16
alpha = 16
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
quantization = "4-bit"

# Training hyperparameters
batch_size = 4
gradient_accumulation = 4
learning_rate = 2e-4
max_steps = 500
optimizer = "adamw_8bit"
scheduler = "cosine"
```
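From these hyperparameters we can derive the effective batch size and an approximate epoch count, assuming the 95% training split of 2,850 conversations:

```python
# Derive effective batch size and approximate epochs from the config above.
batch_size = 4
gradient_accumulation = 4
max_steps = 500
train_examples = 2850  # 95% of the 3,000-conversation dataset

effective_batch = batch_size * gradient_accumulation  # 16 examples per optimizer step
examples_seen = max_steps * effective_batch           # 8000 examples processed in total
epochs = examples_seen / train_examples               # ~2.8 passes over the training data

print(effective_batch, examples_seen, round(epochs, 1))  # 16 8000 2.8
```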
## Architecture

```
Qwen3-4B-Instruct
├── GQA (Grouped Query Attention) – 2-3x faster inference
├── RoPE (Rotary Position Embeddings) – better length generalization
├── SwiGLU activation – better than GeLU
└── Hybrid thinking – toggle reasoning on/off
          │
     ┌────┴─────┐
     │  QLoRA   │  r=16, alpha=16
     │ Adapters │  7 target modules
     └────┬─────┘
          │
  Zelin-4B (Fine-tuned)
          │
     ┌────┴─────┐
     │   GGUF   │  Q4_K_M quantization
     │  Export  │  ~2.5 GB, 30-50 tok/s CPU
     └──────────┘
```
## Performance
| Metric | Value |
|---|---|
| Inference speed (CPU) | 30-50 tokens/second |
| 20-token response time | 400-670ms |
| Model size (Q4_K_M) | ~2.5 GB |
| RAM usage | ~4 GB |
| Context window | 2048 tokens |
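The 20-token response-time range follows directly from the throughput figures, as a quick sanity check:

```python
# Sanity-check the 20-token latency range against 30-50 tokens/second.
tokens = 20
low_ms = tokens / 50 * 1000   # fastest case: 400 ms
high_ms = tokens / 30 * 1000  # slowest case: ~667 ms

print(round(low_ms), round(high_ms))  # 400 667
```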
## Integration with Zelin Bot

```javascript
// In zelin-v6/src/local-ai.js
const ZELIN_CUSTOM_REPO = 'TomatitoToho/Zelin-4B';
const ZELIN_CUSTOM_FILE = 'zelin-4b-Q4_K_M.gguf';

// The custom model handles:
// - Fast intent detection (replaces callAIBackground)
// - Moderation classification
// - Sentiment analysis
// - Casual chat fallback
// RigoChat-7B-v2 handles: main conversation responses
```
## Repositories
- Model: TomatitoToho/Zelin-4B
- Dataset: TomatitoToho/zelin-conversations
- Inference Space: TomatitoToho/zelin-llm
- Training Space: TomatitoToho/zelin-train
- Zelin Bot: TomatitoToho/zelin-v6
## License

Apache 2.0 – based on Qwen3-4B-Instruct (Apache 2.0) plus custom training data.