# Zelin-4B – Argentine Spanish Minecraft Discord Bot LLM
Fine-tuned Qwen3-4B-Instruct for Zelin, the autonomous AI bot of the TomateSMP Minecraft server.
## What It Does
Zelin-4B is specialized for:
- Argentine Spanish chat – speaks natively with "vos", "che", "dale", "qué bajón"
- Minecraft server management – understands commands, server status, gameplay
- Intent detection – classifies what users want (JSON output)
- Moderation decisions – detects toxicity and suggests actions (JSON output)
- Sentiment analysis – reads emotional tone in Argentine context (JSON output)
- Short Discord responses – 1-3 lines, casual, no formal language
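As an illustration of the structured outputs, an intent-detection exchange might look like the sketch below. The prompt wording and JSON schema shown here are hypothetical examples, not the exact format Zelin-4B was trained on:

```python
import json

# Hypothetical intent-detection exchange; the field names below are
# illustrative, not the exact schema Zelin-4B was trained on.
user_message = "che zelin, está caído el server?"

# A structured response the model might return for an intent prompt:
raw_output = '{"intent": "server_status", "confidence": 0.92}'

parsed = json.loads(raw_output)
print(parsed["intent"])  # server_status
```

Because the model emits plain JSON text, the bot can `json.loads` the completion and branch on the `intent` field.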
## Model Details
| Attribute | Value |
|---|---|
| Base Model | Qwen3-4B-Instruct |
| Fine-tune Method | QLoRA (4-bit, r=16) |
| Training Framework | Unsloth |
| Training Data | 3,000 ChatML conversations |
| Languages | es-AR (Argentine Spanish) |
| Context Length | 2048 tokens |
| GGUF Quantization | Q4_K_M (~2.5 GB) |
## Quick Start
### llama.cpp (CPU, fastest)

```bash
# Download GGUF
huggingface-cli download TomatitoToho/Zelin-4B zelin-4b-Q4_K_M.gguf --local-dir .

# Run server
llama-server -m zelin-4b-Q4_K_M.gguf -c 2048 -t 4 --port 8080
```
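Once `llama-server` is running, it exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the chosen port. A minimal client sketch using only the standard library (the system prompt text here mirrors the example elsewhere in this README and is illustrative):

```python
import json
import urllib.request

# Build an OpenAI-style chat completion request for the local llama-server.
payload = {
    "messages": [
        {"role": "system", "content": "Sos Zelin, la IA del servidor TomateSMP..."},
        {"role": "user", "content": "hola zelin, qué onda"},
    ],
    "max_tokens": 100,
    "temperature": 0.7,
}

def query_zelin(payload, url="http://localhost:8080/v1/chat/completions"):
    """POST the payload to the local llama-server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# query_zelin(payload)  # requires llama-server running on port 8080
```

Since the endpoint is OpenAI-compatible, any OpenAI SDK pointed at `http://localhost:8080/v1` should also work.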
### Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(model_path="zelin-4b-Q4_K_M.gguf", n_ctx=2048)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Sos Zelin, la IA del servidor TomateSMP..."},
        {"role": "user", "content": "hola zelin, qué onda"},
    ],
    max_tokens=100,
    temperature=0.7,
)
print(result["choices"][0]["message"]["content"])
# → "holaa, qué onda che"
```
### HuggingFace Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("TomatitoToho/Zelin-4B")
tokenizer = AutoTokenizer.from_pretrained("TomatitoToho/Zelin-4B")
```
## Training Data
| Category | Count | Description |
|---|---|---|
| Casual Chat | 1,142 | Argentine Spanish conversations |
| Minecraft | 706 | Server management, gameplay |
| Intent Detection | 430 | Classification JSON |
| Moderation | 288 | Action decision JSON |
| Sentiment | 284 | Emotional analysis JSON |
| Total | 3,000 | 95% train / 5% validation |
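The 95/5 split works out as follows (a quick check, assuming a simple proportional split of the 3,000 conversations):

```python
# Reproduce the train/validation split sizes from the totals above.
TOTAL = 3000
VAL_FRACTION = 0.05

val_count = int(TOTAL * VAL_FRACTION)  # 150 held-out conversations
train_count = TOTAL - val_count        # 2850 training conversations

print(train_count, val_count)  # 2850 150
```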
## Training Configuration

```python
# QLoRA configuration
r = 16
alpha = 16
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
quantization = "4-bit"

# Training hyperparameters
batch_size = 4
gradient_accumulation = 4
learning_rate = 2e-4
max_steps = 500
optimizer = "adamw_8bit"
scheduler = "cosine"
```
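From these hyperparameters we can derive the effective batch size and an approximate epoch count, assuming the 95% training split of 2,850 conversations:

```python
# Derive effective batch size and approximate epochs from the config above.
batch_size = 4
gradient_accumulation = 4
max_steps = 500
train_examples = 2850  # 95% of the 3,000-conversation dataset

effective_batch = batch_size * gradient_accumulation  # 16 examples per optimizer step
examples_seen = max_steps * effective_batch           # 8000 examples processed in total
epochs = examples_seen / train_examples               # ~2.8 passes over the training data

print(effective_batch, examples_seen, round(epochs, 1))  # 16 8000 2.8
```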
## Architecture

```
Qwen3-4B-Instruct
├── GQA (Grouped Query Attention) – 2-3x faster inference
├── RoPE (Rotary Position Embeddings) – better length generalization
├── SwiGLU activation – better than GeLU
└── Hybrid thinking – toggle reasoning on/off
          │
     ┌────┴─────┐
     │  QLoRA   │  r=16, alpha=16
     │ Adapters │  7 target modules
     └────┬─────┘
          │
  Zelin-4B (Fine-tuned)
          │
     ┌────┴─────┐
     │   GGUF   │  Q4_K_M quantization
     │  Export  │  ~2.5 GB, 30-50 tok/s CPU
     └──────────┘
```
## Performance
| Metric | Value |
|---|---|
| Inference speed (CPU) | 30-50 tokens/second |
| 20-token response time | 400-670ms |
| Model size (Q4_K_M) | ~2.5 GB |
| RAM usage | ~4 GB |
| Context window | 2048 tokens |
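The 20-token response-time range follows directly from the throughput figures, as a quick sanity check:

```python
# Sanity-check the 20-token latency range against 30-50 tokens/second.
tokens = 20
low_ms = tokens / 50 * 1000   # fastest case: 400 ms
high_ms = tokens / 30 * 1000  # slowest case: ~667 ms

print(round(low_ms), round(high_ms))  # 400 667
```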
## Integration with Zelin Bot

```javascript
// In zelin-v6/src/local-ai.js
const ZELIN_CUSTOM_REPO = 'TomatitoToho/Zelin-4B';
const ZELIN_CUSTOM_FILE = 'zelin-4b-Q4_K_M.gguf';

// The custom model handles:
// - Fast intent detection (replaces callAIBackground)
// - Moderation classification
// - Sentiment analysis
// - Casual chat fallback
// RigoChat-7B-v2 handles: main conversation responses
```
## Repositories
- Model: TomatitoToho/Zelin-4B
- Dataset: TomatitoToho/zelin-conversations
- Inference Space: TomatitoToho/zelin-llm
- Training Space: TomatitoToho/zelin-train
- Zelin Bot: TomatitoToho/zelin-v6
## License

Apache 2.0 – based on Qwen3-4B-Instruct (Apache 2.0) plus custom training data.