Zelin-4B: Argentine Spanish Minecraft Discord Bot LLM
Fine-tuned Qwen3-4B-Instruct for Zelin, the autonomous AI bot of the TomateSMP Minecraft server.
What It Does
Zelin-4B is specialized for:
- Argentine Spanish chat: speaks natively with "vos", "che", "dale", "qué bajón"
- Minecraft server management: understands commands, server status, gameplay
- Intent detection: classifies what users want (JSON output)
- Moderation decisions: detects toxicity and suggests actions (JSON output)
- Sentiment analysis: reads emotional tone in an Argentine context (JSON output)
- Short Discord responses: 1-3 lines, casual, no formal language
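Three of these tasks return JSON, so the bot has to parse and validate the model's reply before acting on it. The sketch below shows one defensive way to do that. The field names (`intent`, `confidence`), the allowed intent labels and the `parse_intent` helper are hypothetical, because the card does not document the real schema. Small models sometimes wrap JSON in extra text, so the sketch takes the span from the first `{` to the last `}` before parsing and falls back to a safe default on anything malformed.

```python
import json
import re

# Hypothetical schema: the card does not specify the real field names or labels
ALLOWED_INTENTS = {"chat", "server_status", "minecraft_command", "moderation"}


def parse_intent(raw_reply, default="chat"):
    """Extract and validate an intent JSON object from a model reply.

    Returns (intent, confidence); falls back to (default, 0.0) on bad output.
    """
    # Greedy match: from the first "{" to the last "}" in the reply
    match = re.search(r"\{.*\}", raw_reply, re.DOTALL)
    if match is None:
        return default, 0.0
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return default, 0.0
    intent = data.get("intent")
    confidence = data.get("confidence", 0.0)
    # Reject unknown labels and non-numeric confidence values
    if intent not in ALLOWED_INTENTS or not isinstance(confidence, (int, float)):
        return default, 0.0
    return intent, float(confidence)
```

Falling back to plain chat keeps a malformed classification from triggering a server action or a moderation step.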
Model Details
| Attribute | Value |
|---|---|
| Base Model | Qwen3-4B-Instruct |
| Fine-tune Method | QLoRA (4-bit, r=16) |
| Training Framework | Unsloth |
| Training Data | 3,000 ChatML conversations |
| Languages | es-AR (Argentine Spanish) |
| Context Length | 2048 tokens |
| GGUF Quantization | Q4_K_M (~2.5 GB) |
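The ~2.5 GB figure can be sanity-checked from the parameter count and the average bits per weight of the quantization. Both inputs below are assumptions, not values from this card: roughly 4.0 billion parameters for Qwen3-4B, and about 4.85 bits per weight on average for Q4_K_M (it mixes 4-bit and 6-bit block formats). Treat the result as an estimate in decimal gigabytes that ignores file metadata.

```python
def gguf_size_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes), ignoring metadata."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumed inputs: ~4.0B parameters, ~4.85 average bits per weight for Q4_K_M
estimate = gguf_size_gb(4.0e9, 4.85)
# Same parameter count stored as 16-bit floats, for comparison
fp16 = gguf_size_gb(4.0e9, 16)

print(f"Q4_K_M estimate: ~{estimate:.2f} GB")
print(f"FP16 estimate:   ~{fp16:.1f} GB")
```

Under these assumptions the quantized file lands close to the stated ~2.5 GB, about a third of the 16-bit size.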
Quick Start
llama.cpp (CPU)
# Download GGUF
huggingface-cli download TomatitoToho/Zelin-4B zelin-4b-Q4_K_M.gguf --local-dir .
# Run server
llama-server -m zelin-4b-Q4_K_M.gguf -c 2048 -t 4 --port 8080
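llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so any HTTP client can talk to the server started above. The sketch below uses only the Python standard library and assumes the server is on `localhost:8080` as in the command. The helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

# Assumes llama-server from the command above is listening on port 8080
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(user_message, system_prompt=None, max_tokens=100, temperature=0.7):
    """Build an OpenAI-style chat completion payload."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_message})
    return {
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(user_message, system_prompt=None, url=SERVER_URL, timeout=30):
    """POST the payload to llama-server and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(user_message, system_prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("hola zelin, qué onda", system_prompt="Sos Zelin, la IA del servidor TomateSMP...")` returns the reply text.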
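Python (llama-cpp-python)
from llama_cpp import Llama
# Load the GGUF file downloaded above; n_ctx matches the 2048-token context length
llm = Llama(model_path="zelin-4b-Q4_K_M.gguf", n_ctx=2048)
result = llm.create_chat_completion(
    messages=[
        # System prompt (Spanish): "You're Zelin, the AI of the TomateSMP server..."
        {"role": "system", "content": "Sos Zelin, la IA del servidor TomateSMP..."},
        # User message (Spanish): "hi zelin, what's up"
        {"role": "user", "content": "hola zelin, qué onda"},
    ],
    max_tokens=100,
    temperature=0.7,
)
print(result["choices"][0]["message"]["content"])
# Example output (sampled, so it varies between runs): "holaa, qué onda che"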
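Both the training conversations and the messages above use ChatML, the chat format of Qwen models, where each turn is wrapped in `<|im_start|>{role}` and `<|im_end|>`. The sketch below renders a message list into that prompt string, roughly what the chat template does before generation. It is a simplified assumption: the real Qwen3 template also handles tool calls and thinking blocks, so rely on the tokenizer's or GGUF's own template in practice. `to_chatml` is a hypothetical helper.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render [{"role": ..., "content": ...}] into a ChatML prompt string (simplified)."""
    parts = []
    for message in messages:
        # Each turn: <|im_start|>role, newline, content, <|im_end|>, newline
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues as Zelin
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "Sos Zelin, la IA del servidor TomateSMP..."},
    {"role": "user", "content": "hola zelin, qué onda"},
])
print(prompt)
```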
HuggingFace Transformers
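from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("TomatitoToho/Zelin-4B", torch_dtype="auto")  # keep the checkpoint's dtype
tokenizer = AutoTokenizer.from_pretrained("TomatitoToho/Zelin-4B")
inputs = tokenizer.apply_chat_template([{"role": "user", "content": "hola zelin, qué onda"}], add_generation_prompt=True, return_dict=True, return_tensors="pt")  # ChatML prompt as tensors
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))  # decode only the new tokens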
Training Data
| Category | Count | Description |
|---|---|---|
| Casual Chat | 1,142 | Argentine Spanish conversations |
| Minecraft | 706 | Server management, gameplay |
| Intent Detection | 430 | Classification JSON |
| Moderation | 288 | Action decision JSON |
| Sentiment | 284 | Emotional analysis JSON |
| Total | 3,000 | 95% train / 5% validation |
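A quick check of the table's arithmetic: the five category rows sum to 2,850, not 3,000. That equals the 95% training split of 3,000 (leaving 150 for validation), so one possible reading, which is an assumption the card does not state, is that the per-category counts describe the training split only. The sketch below reproduces the numbers.

```python
# Category counts copied from the Training Data table
category_counts = {
    "Casual Chat": 1142,
    "Minecraft": 706,
    "Intent Detection": 430,
    "Moderation": 288,
    "Sentiment": 284,
}
stated_total = 3000
train_fraction = 0.95

row_sum = sum(category_counts.values())
train_size = round(stated_total * train_fraction)
val_size = stated_total - train_size

print(f"Sum of category rows: {row_sum}")
print(f"95% train / 5% validation of {stated_total}: {train_size} / {val_size}")
for name, count in category_counts.items():
    # Share of each category relative to the row sum
    print(f"{name:<17} {count:>5}  {count / row_sum:6.1%}")
```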
Training Configuration
# QLoRA Configuration
r = 16
alpha = 16
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
quantization = "4-bit"
# Training Hyperparameters
batch_size = 4
gradient_accumulation = 4
learning_rate = 2e-4
max_steps = 500
optimizer = "adamw_8bit"
scheduler = "cosine"
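The hyperparameters above imply a few derived numbers: the effective batch size (batch_size × gradient_accumulation), how many examples 500 steps cover, the LoRA scaling factor, and the adapter size. The layer dimensions are an assumption taken from the public Qwen3-4B config (36 layers, hidden size 2560, MLP size 9728, 32 query heads and 8 key/value heads of dimension 128); check the base model's config.json before relying on them. The epoch estimate assumes 2,850 training examples (95% of 3,000) and one conversation per sequence, with no packing.

```python
# Hyperparameters from the Training Configuration section
r = 16
alpha = 16
batch_size = 4
gradient_accumulation = 4
max_steps = 500
train_examples = 2850  # assumption: 95% of the 3,000 conversations, no packing

effective_batch = batch_size * gradient_accumulation
examples_seen = effective_batch * max_steps
epochs = examples_seen / train_examples
lora_scaling = alpha / r  # LoRA multiplies the adapter output by alpha / r

# Assumed Qwen3-4B dimensions (check the base model's config.json)
num_layers = 36
hidden = 2560
intermediate = 9728
q_dim = 32 * 128  # query heads * head_dim
kv_dim = 8 * 128  # key/value heads * head_dim (GQA)

# (in_features, out_features) for each of the 7 target modules in one layer
target_shapes = {
    "q_proj": (hidden, q_dim),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (q_dim, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# A LoRA adapter on a d_in x d_out weight adds A (r x d_in) and B (d_out x r)
per_layer = sum(r * (d_in + d_out) for d_in, d_out in target_shapes.values())
lora_params = per_layer * num_layers

print(f"effective batch size: {effective_batch}")
print(f"examples seen in {max_steps} steps: {examples_seen} (~{epochs:.1f} epochs)")
print(f"LoRA scaling alpha/r: {lora_scaling}")
print(f"trainable LoRA parameters: {lora_params:,}")
```

Under these assumptions the run makes close to three passes over the training split while updating about 33 million adapter parameters, under 1% of the base model.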
Architecture
Qwen3-4B-Instruct
├── GQA (Grouped Query Attention): shared key/value heads, smaller KV cache
├── RoPE (Rotary Position Embeddings): relative position encoding
├── SwiGLU activation: gated MLP (gate_proj, up_proj, down_proj)
└── Hybrid thinking: toggle reasoning on/off
        │
   ┌────┴─────┐
   │  QLoRA   │  r=16, alpha=16
   │ Adapters │  7 target modules
   └────┬─────┘
        │
Zelin-4B (Fine-tuned)
        │
   ┌────┴─────┐
   │   GGUF   │  Q4_K_M quantization
   │  Export  │  ~2.5 GB, 30-50 tok/s CPU
   └──────────┘
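To make the GQA point concrete, the sketch below estimates the KV cache for the full 2048-token context, comparing grouped key/value heads against a hypothetical variant with one key/value head per query head (standard multi-head attention). The attention dimensions are an assumption taken from the public Qwen3-4B config (36 layers, 32 query heads, 8 key/value heads, head dimension 128), and the cache is assumed to be stored as 16-bit floats, llama.cpp's default cache type.

```python
def kv_cache_bytes(num_layers, kv_heads, head_dim, context_len, bytes_per_value=2):
    """Keys and values (the factor 2) for every layer, KV head and token."""
    return 2 * num_layers * kv_heads * head_dim * context_len * bytes_per_value


# Assumed Qwen3-4B attention dimensions (check the base model's config.json)
NUM_LAYERS = 36
QUERY_HEADS = 32
KV_HEADS = 8
HEAD_DIM = 128
CONTEXT = 2048  # context length used by Zelin-4B

gqa = kv_cache_bytes(NUM_LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
mha = kv_cache_bytes(NUM_LAYERS, QUERY_HEADS, HEAD_DIM, CONTEXT)  # one KV head per query head

MIB = 1024 ** 2
print(f"GQA KV cache: {gqa / MIB:.0f} MiB")
print(f"MHA KV cache: {mha / MIB:.0f} MiB ({mha // gqa}x larger)")
```

With 4 query heads sharing each key/value head, the cache is a quarter of the multi-head size, a few hundred MiB next to the ~2.5 GB of Q4_K_M weights.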
Performance
| Metric | Value |
|---|---|
| Inference speed (CPU) | 30-50 tokens/second |
| 20-token response time | 400-670ms |
| Model size (Q4_K_M) | ~2.5 GB |
| RAM usage | ~4 GB |
| Context window | 2048 tokens |
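The response-time row follows from the speed row: generation time is the number of tokens divided by tokens per second. The sketch reproduces the 400-670 ms range. It counts token generation only; processing the prompt (system prompt and chat history) would add to this, and the table does not break that out. `generation_time_ms` is a hypothetical helper.

```python
def generation_time_ms(num_tokens, tokens_per_second):
    """Time to generate num_tokens at a steady decoding speed, in milliseconds."""
    return num_tokens / tokens_per_second * 1000


# 20-token reply at the two ends of the stated 30-50 tok/s range
fastest = generation_time_ms(20, 50)
slowest = generation_time_ms(20, 30)
print(f"{fastest:.0f}-{slowest:.0f} ms")
```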
Integration with Zelin Bot
// In zelin-v6/src/local-ai.js
const ZELIN_CUSTOM_REPO = 'TomatitoToho/Zelin-4B';
const ZELIN_CUSTOM_FILE = 'zelin-4b-Q4_K_M.gguf';
// The custom model handles:
// - Fast intent detection (replaces callAIBackground)
// - Moderation classification
// - Sentiment analysis
// - Casual chat fallback
// RigoChat-7B-v2 handles: main conversation responses
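The comments above describe a split of work between two local models. A minimal Python sketch of that routing follows. The task labels, the `pick_model` function and the reading of "casual chat fallback" as "used when the main model is unavailable" are hypothetical illustrations, not the actual logic in zelin-v6/src/local-ai.js.

```python
# Hypothetical task labels mirroring the comments in local-ai.js
ZELIN_TASKS = {"intent", "moderation", "sentiment"}

ZELIN_MODEL = "TomatitoToho/Zelin-4B"  # small fine-tuned model (GGUF)
MAIN_MODEL = "RigoChat-7B-v2"          # main conversation model


def pick_model(task, main_model_available=True):
    """Choose which local model should handle a given task."""
    if task in ZELIN_TASKS:
        # Classification-style tasks with JSON output go to the fast 4B model
        return ZELIN_MODEL
    if task == "chat":
        # Main replies use RigoChat; Zelin-4B is the casual chat fallback (assumed meaning)
        return MAIN_MODEL if main_model_available else ZELIN_MODEL
    raise ValueError(f"unknown task: {task!r}")
```

Keeping the short JSON tasks on the smaller model leaves the larger one free for full conversational replies.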
Repositories
- Model: TomatitoToho/Zelin-4B
- Dataset: TomatitoToho/zelin-conversations
- Inference Space: TomatitoToho/zelin-llm
- Training Space: TomatitoToho/zelin-train
- Zelin Bot: TomatitoToho/zelin-v6
License
Apache 2.0. Based on Qwen3-4B (Apache 2.0) plus custom training data.