Schakel Intent Router
A LoRA adapter fine-tuned on google/gemma-4-E2B-it for Spanish-language intent classification.
It classifies user input into one of three intents and responds with strict JSON, for example:
{"intent": "DOMOTICA"}
Intents
| Intent | Description |
|---|---|
| DOMOTICA | Home automation commands (lights, temperature, appliances) |
| MUSICA | Music playback requests |
| GENERAL | Everything else (time, jokes, general questions) |
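Because the adapter is meant to reply with strict JSON, downstream code will usually want to validate the reply before acting on it. The following is a minimal sketch of such a check. The helper name `parse_intent` and the choice to fall back to `GENERAL` on malformed output are assumptions made for illustration, not part of the adapter.

```python
import json

# The three labels listed in the Intents table.
VALID_INTENTS = {"DOMOTICA", "MUSICA", "GENERAL"}


def parse_intent(raw: str, fallback: str = "GENERAL") -> str:
    """Return the intent named in the model's JSON reply, or `fallback`.

    Hypothetical helper: the fallback policy is an assumption, not
    something the model card specifies.
    """
    try:
        payload = json.loads(raw.strip())
    except json.JSONDecodeError:
        return fallback
    if not isinstance(payload, dict):
        return fallback
    intent = payload.get("intent")
    # Accept only string values that match one of the known labels.
    if isinstance(intent, str) and intent in VALID_INTENTS:
        return intent
    return fallback


print(parse_intent('{"intent": "DOMOTICA"}'))
```

Falling back to `GENERAL` keeps a voice assistant responsive when the model emits something unexpected; raising an exception instead may be preferable while debugging.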
Usage
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-E2B-it"
adapter_path = "RagnarokReinier/schakel-gemma-router"

# Load the tokenizer from the adapter repo and attach the LoRA weights to the base model.
tokenizer = AutoTokenizer.from_pretrained(adapter_path)
base_model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

# System prompt (Spanish): "Classify the user's intent. Reply only with valid JSON
# with an intent key and a value among DOMOTICA, MUSICA or GENERAL."
# User message (Spanish): "turn on the living-room light".
messages = [
    {"role": "system", "content": "Clasifica la intención del usuario. Responde solo JSON válido con una clave intent y un valor entre DOMOTICA, MUSICA o GENERAL."},
    {"role": "user", "content": "enciende la luz del salón"},
]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(model.device)

# Greedy decoding keeps the JSON output deterministic.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, not the prompt.
new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# {"intent":"DOMOTICA"}
```
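In an application, the predicted intent typically selects which component handles the request. The sketch below shows one way to wire that up. `build_messages`, `route`, the handler functions, and the stub classifier are all hypothetical names introduced for illustration; only the system prompt is taken from the usage example above. A real `classify` callable would run the model and return the parsed intent label.

```python
from typing import Callable, Dict, List

# System prompt copied from the usage example; the model was trained with Spanish prompts.
SYSTEM_PROMPT = (
    "Clasifica la intención del usuario. Responde solo JSON válido con una "
    "clave intent y un valor entre DOMOTICA, MUSICA o GENERAL."
)


def build_messages(user_text: str) -> List[Dict[str, str]]:
    """Build the chat messages for one utterance (hypothetical helper)."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]


def route(
    user_text: str,
    classify: Callable[[str], str],
    handlers: Dict[str, Callable[[str], str]],
) -> str:
    """Send the utterance to the handler registered for its intent.

    Unknown labels go to the GENERAL handler (an assumed policy).
    """
    intent = classify(user_text)
    handler = handlers.get(intent, handlers["GENERAL"])
    return handler(user_text)


# Illustrative handlers; a real system would call smart-home or music APIs here.
handlers = {
    "DOMOTICA": lambda text: f"home-automation: {text}",
    "MUSICA": lambda text: f"music: {text}",
    "GENERAL": lambda text: f"general: {text}",
}


def stub_classify(text: str) -> str:
    """Stand-in for the model so the sketch runs without loading weights."""
    return "DOMOTICA" if "luz" in text else "GENERAL"


print(route("enciende la luz del salón", stub_classify, handlers))
```

Passing the classifier in as a callable keeps the routing logic testable without a GPU or a model download.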
Training Details
- Base model: google/gemma-4-E2B-it
- Method: LoRA (rank 16, alpha 32) via SFTTrainer
- Precision: bf16
- Target modules: language model attention + MLP projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- Dataset: 240 train / 60 validation examples (Spanish voice assistant text)
- Epochs: 3
- Learning rate: 2e-4
- Max sequence length: 256
- Batch size: 4 (gradient accumulation: 4)
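The hyperparameters above imply a fairly short training run. The arithmetic below is a sketch that assumes a single device, that the batch size of 4 is per device, that examples are not packed, and that the standard LoRA scaling of alpha divided by rank applies. The variable names are illustrative.

```python
import math

# Values taken from the Training Details list.
train_examples = 240
per_device_batch_size = 4
grad_accum_steps = 4
epochs = 3
lora_rank = 16
lora_alpha = 32

# Examples consumed per optimizer update (assumes a single device).
effective_batch = per_device_batch_size * grad_accum_steps

# Optimizer updates per epoch and over the whole run.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

# Standard LoRA scales the low-rank update by alpha / rank (assumed here).
lora_scale = lora_alpha / lora_rank

print(effective_batch, steps_per_epoch, total_steps, lora_scale)
# 16 15 45 2.0
```

Under these assumptions the run amounts to 45 optimizer updates.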
Results
| Metric | Value |
|---|---|
| Train loss | 4.07 |
| Eval loss | 0.48 |
| Token accuracy | ~90% |
| Training time | ~3.5 min (Apple Silicon, MPS) |
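Token accuracy measures how often each next token is predicted correctly, which is not the same as how often the final intent label is right. A sketch of an intent-level check follows. The function names are hypothetical, and the label lists are toy data for illustration only, not results from this model.

```python
from collections import Counter
from typing import Sequence


def intent_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of examples whose predicted intent equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one example is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def confusion_counts(predicted: Sequence[str], expected: Sequence[str]) -> Counter:
    """Count (expected, predicted) pairs to show which intents get confused."""
    return Counter(zip(expected, predicted))


# Toy labels, NOT measurements from the adapter.
toy_expected = ["DOMOTICA", "MUSICA", "GENERAL", "MUSICA"]
toy_predicted = ["DOMOTICA", "MUSICA", "GENERAL", "GENERAL"]

print(intent_accuracy(toy_predicted, toy_expected))
print(confusion_counts(toy_predicted, toy_expected))
```

Running the same functions over the parsed model outputs for the 60 validation examples would give an intent-level figure to report alongside the token-level metric.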
Framework versions
- PEFT 0.18.1
- TRL 1.1.0
- Transformers 5.5.4
- PyTorch 2.11.0