Patrimo Gemma 4 E2B — Fine-tuned (v3.3)
Model: patrimo-ai/gemma4-e2b-patrimo-v3.3
Base: google/gemma-4-E2B-it (2B params, multimodal)
Method: QLoRA r=64, lora_alpha=128, 4-bit quantization
Training data: 2,092 examples (chat + auto-tracker + phase-1 pre-screener)
LoRA fine-tune of Gemma 4 E2B specialized in conversational personal finance and bank-notification classification for the Patrimo app (Android).
Use cases
- Financial Chat IA: logging expenses and income via tool calling, balance queries, reports, month-end close
- Notification auto-tracker: classifies bank notifications as APPROVED/DISCARDED/REVIEW and extracts amount, merchant, and category
- Phase 1 pre-screener: fast extraction of locale and amount before full classification
Optimized to run on-device on Android via the LiteRT-LM SDK (also available as GGUF for llama.cpp).
Metrics (benchmark v3, n=208 cases)
Auto-Tracker (bank notifications)
| Metric | Score | Minimum target |
|---|---|---|
| Phase 1 accuracy | 100% ✅ | 95% |
| Phase 1 financial F1 | 100% ✅ | 95% |
| AT json_valid_rate | 100% ✅ | 98% |
| AT amount_accuracy | 100% ✅ | 90% |
| AT merchant_clean | 98% ✅ | 85% |
| AT verdict_accuracy | 70.8% | 88% |
| AT BILL_REMINDER | 100% ✅ | - |
| AT DUPLICATE | 100% ✅ | - |
| AT OWN_ACCOUNT_TRANSFER | 100% ✅ | - |
| AT MARKETING | 100% ✅ | - |
| AT OTP_OR_AUTH | 100% ✅ | - |
Chat IA (tool calling)
| Metric | Score | Minimum target |
|---|---|---|
| Chat refusal_accuracy | 92.3% ✅ | 85% |
| Chat check_duplicate_rate | 66.7% | 95% |
| Chat tool_accuracy | 48.8% | 85% |
| Chat json_valid_rate | 54.4% | 98% |
Note on chat metrics: the benchmark uses a minimal system prompt to evaluate the worst case. With the full (verbose) production system prompt, chat metrics improve significantly. In Android production with OnDeviceSystemPrompt.kt, expect +20-30% in check_duplicate_rate.
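Rows without a check mark in the two tables are the ones below their minimum target. The sketch below only restates the table values and computes the shortfall in percentage points; the dictionary layout and variable names are illustrative and are not part of the benchmark tooling.

```python
# (score %, minimum target %) for every metric that has a target, copied from the tables.
METRICS = {
    "Phase 1 accuracy": (100.0, 95.0),
    "Phase 1 financial F1": (100.0, 95.0),
    "AT json_valid_rate": (100.0, 98.0),
    "AT amount_accuracy": (100.0, 90.0),
    "AT merchant_clean": (98.0, 85.0),
    "AT verdict_accuracy": (70.8, 88.0),
    "Chat refusal_accuracy": (92.3, 85.0),
    "Chat check_duplicate_rate": (66.7, 95.0),
    "Chat tool_accuracy": (48.8, 85.0),
    "Chat json_valid_rate": (54.4, 98.0),
}

# Shortfall in percentage points for every metric below its target.
shortfalls = {
    name: round(target - score, 1)
    for name, (score, target) in METRICS.items()
    if score < target
}

for name, gap in shortfalls.items():
    print(f"{name}: {gap} points below target")
```

With the values above, four metrics miss their target: AT verdict_accuracy (17.2 points), Chat check_duplicate_rate (28.3), Chat tool_accuracy (36.2), and Chat json_valid_rate (43.6).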
Quickstart
Inference with Transformers + Unsloth
from unsloth import FastModel
from unsloth.chat_templates import get_chat_template
model, tokenizer = FastModel.from_pretrained(
    model_name="patrimo-ai/gemma4-e2b-patrimo-v3.3",
    max_seq_length=1024,
    load_in_4bit=True,
)
tokenizer = get_chat_template(tokenizer, chat_template="gemma-4")
FastModel.for_inference(model)
# Auto-tracker example
messages = [
    {
        "role": "system",
        "content": "Eres el revisor de transacciones de Patrimo. Responde con UNA SOLA LINEA de JSON valido.\n\nSCHEMA:\n(A) APPROVED: {\"verdict\":\"APPROVED\",\"amount\":NUM,\"type\":\"income\"|\"expense\",\"merchant\":\"STR\",\"category\":\"STR\",\"confidence\":0.0-1.0,\"reasoning\":\"1 linea\"}\n(B) DISCARDED: {\"verdict\":\"DISCARDED\",\"discard_reason_code\":\"CODE\",\"discard_reason_text\":\"STR\",\"reasoning\":\"1 linea\"}\n(C) REVIEW: {\"verdict\":\"REVIEW\",...}",
    },
    {
        "role": "user",
        "content": '<NOTIF>Titulo:"Bancolombia" Cuerpo:"Compra aprobada por COP 45900 en STARBUCKS CHAPINERO" App:co.com.bancolombia.personas.superapp Fecha:2026-05-07</NOTIF>\n\nClasifica.',
    },
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
# Expected: {"verdict":"APPROVED","amount":45900,"type":"expense","merchant":"Starbucks Chapinero","category":"Restaurantes","confidence":0.95,"reasoning":"Compra clara en cafeteria"}
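The user turn in the example wraps the notification in a `<NOTIF>` tag with `Titulo`, `Cuerpo`, `App`, and `Fecha` fields, followed by the instruction `Clasifica.`. A small helper can build that string from structured fields. This is a minimal sketch: the function name `build_notif_message` is hypothetical, and how the production app escapes quotes inside the title or body is not documented here, so the helper does no escaping.

```python
def build_notif_message(title: str, body: str, app: str, date: str) -> dict:
    """Format a notification as the user turn shown in the Quickstart example.

    Assumption: fields are inserted verbatim, with no escaping of embedded quotes.
    """
    content = (
        f'<NOTIF>Titulo:"{title}" Cuerpo:"{body}" '
        f"App:{app} Fecha:{date}</NOTIF>\n\nClasifica."
    )
    return {"role": "user", "content": content}


msg = build_notif_message(
    title="Bancolombia",
    body="Compra aprobada por COP 45900 en STARBUCKS CHAPINERO",
    app="co.com.bancolombia.personas.superapp",
    date="2026-05-07",
)
print(msg["content"])
```

The returned dict can replace the hand-written user entry in `messages`; for the inputs above it reproduces the example string exactly.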
Inference with llama.cpp (GGUF)
# Download GGUF Q4_K_M
huggingface-cli download patrimo-ai/gemma4-e2b-patrimo-v3.3 \
  --include "*.gguf" --local-dir ./model
# Run with llama.cpp
./llama-cli -m ./model/gemma4-e2b-patrimo-q4_k_m.gguf \
  -p "<start_of_turn>user\n<NOTIF>...</NOTIF><end_of_turn>\n<start_of_turn>model\n" \
  -n 200 --temp 0
On-device Android (LiteRT-LM)
Convert GGUF → .litertlm using Google's official LiteRT-LM tools, then load it via:
import com.google.ai.edge.litertlm.Backend
import com.google.ai.edge.litertlm.Engine
import com.google.ai.edge.litertlm.EngineConfig
val config = EngineConfig(
    modelPath = "/path/to/gemma4-e2b-patrimo-v3.3.litertlm",
    backend = Backend.GPU(),
)
val engine = Engine(config)
engine.initialize()
Production system prompts
For maximum accuracy, use the full production system prompts (included in prompts/):
- prompts/auto_tracker_phase1.txt: locale+amount pre-screener (Phase 1)
- prompts/auto_tracker_phase2.txt: full classification (Phase 2)
- prompts/chat_ia.txt: chat with tools
The prompts include anti-hallucination rules and few-shot examples that the model was trained to follow.
Tool schema (Chat IA)
Total: 33 tools. Critical subset:
check_duplicate(type: "income"|"expense", amount: number, date: string)
add_expense(amount, date, description, category, payment_method?, provider?, notes?)
add_income(gross_amount, date, description, category, source?, notes?)
list_expenses(period?)
list_income(period?)
get_balance_sheet()
generate_report(period?, type?)
get_budget_status(period?)
add_asset(name, asset_type, current_value, liquidity)
update_asset(asset_id, current_value)
add_liability(name, liability_type, current_balance, creditor)
close_month(period)
add_adjustment(period, type, amount, description, reason, asset_id?, liability_id?)
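Since the chat json_valid_rate and tool_accuracy are below target, a caller may want to check a tool call's arguments before executing it. The sketch below transcribes required and optional parameters from a few of the signatures above (a trailing `?` marks an optional parameter). The `TOOL_PARAMS` table and the `check_tool_call` function are hypothetical names for illustration, not part of the Patrimo codebase, and only four tools are covered.

```python
# (required, optional) parameter names, transcribed from the signatures above.
TOOL_PARAMS = {
    "check_duplicate": (("type", "amount", "date"), ()),
    "add_expense": (
        ("amount", "date", "description", "category"),
        ("payment_method", "provider", "notes"),
    ),
    "add_income": (
        ("gross_amount", "date", "description", "category"),
        ("source", "notes"),
    ),
    "close_month": (("period",), ()),
}


def check_tool_call(name: str, args: dict) -> list[str]:
    """Return a list of problems with a tool call; an empty list means it looks valid."""
    if name not in TOOL_PARAMS:
        return [f"unknown tool: {name}"]
    required, optional = TOOL_PARAMS[name]
    problems = [f"missing argument: {p}" for p in required if p not in args]
    allowed = set(required) | set(optional)
    problems += [f"unexpected argument: {a}" for a in args if a not in allowed]
    return problems


# An add_expense call that lacks description and category.
print(check_tool_call("add_expense", {"amount": 45900, "date": "2026-05-07"}))
```

The check covers argument names only; value types and formats (for example the date string) would need their own validation.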
Auto-Tracker schema
type Verdict = "APPROVED" | "DISCARDED" | "REVIEW"
type Approved = {
  verdict: "APPROVED"
  amount: number
  type: "income" | "expense"
  merchant: string
  category: string
  confidence: number // 0.0-1.0
  reasoning: string
  // Optional:
  amount_inferred?: boolean
  amount_inference_reason?: string
  confidence_amount?: number
  currency_original?: string // "USD"|"EUR"|"MXN"|...
  amount_original?: number
  category_uncertain?: boolean
  detected_account_mask?: string // "**XXXX"
}
type Discarded = {
  verdict: "DISCARDED"
  discard_reason_code: "OTP_OR_AUTH" | "MARKETING" | "LOGIN_ALERT" |
    "BILL_REMINDER" | "OWN_ACCOUNT_TRANSFER" |
    "DUPLICATE" | "REFUND_OR_REVERSAL" | "OTHER_NON_FINANCIAL"
  discard_reason_text: string
  reasoning: string
}
type Review = {
  verdict: "REVIEW"
  suggested_type: "income" | "expense"
  suggested_merchant: string
  suggested_category: string
  value_uncertainty_reason: string
  reasoning: string
}
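The model returns its verdict as a single line of JSON, so a consumer can check that line against the schema before trusting it. The following is a minimal sketch using only the standard library. The required keys and discard codes are transcribed from the types above; the function name `validate_verdict` and the wording of the error messages are illustrative assumptions, and optional fields are not checked.

```python
import json

# Required keys per verdict, transcribed from the schema above.
REQUIRED_KEYS = {
    "APPROVED": {"verdict", "amount", "type", "merchant", "category",
                 "confidence", "reasoning"},
    "DISCARDED": {"verdict", "discard_reason_code", "discard_reason_text",
                  "reasoning"},
    "REVIEW": {"verdict", "suggested_type", "suggested_merchant",
               "suggested_category", "value_uncertainty_reason", "reasoning"},
}
DISCARD_CODES = (
    "OTP_OR_AUTH", "MARKETING", "LOGIN_ALERT", "BILL_REMINDER",
    "OWN_ACCOUNT_TRANSFER", "DUPLICATE", "REFUND_OR_REVERSAL",
    "OTHER_NON_FINANCIAL",
)


def validate_verdict(line: str) -> list[str]:
    """Return schema problems for one line of model output; an empty list means valid."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not an object"]
    verdict = obj.get("verdict")
    if not isinstance(verdict, str) or verdict not in REQUIRED_KEYS:
        return [f"unknown verdict: {verdict!r}"]
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS[verdict] - set(obj))]
    if verdict == "APPROVED":
        if obj.get("type") not in ("income", "expense"):
            problems.append("type must be 'income' or 'expense'")
        conf = obj.get("confidence")
        if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
            problems.append("confidence must be a number in [0.0, 1.0]")
    if verdict == "DISCARDED" and obj.get("discard_reason_code") not in DISCARD_CODES:
        problems.append("unknown discard_reason_code")
    return problems


line = ('{"verdict":"APPROVED","amount":45900,"type":"expense",'
        '"merchant":"Starbucks Chapinero","category":"Restaurantes",'
        '"confidence":0.95,"reasoning":"Compra clara en cafeteria"}')
print(validate_verdict(line))  # prints []
```

A non-empty result is a signal to retry, apply a fallback mapping, or route the notification to manual review; which of these the production app does is not specified here.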
Phase 1 Pre-screener schema
type Phase1 = {
  f: 0 | 1 // financial flag
  loc?: string // locale (es-CO, en-US, ...)
  amt?: number // amount
  cur?: string // currency (COP, USD, EUR, ...)
  t?: "income" | "expense" // type hint
}
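The Phase 1 object is small enough to parse into a typed record. The sketch below mirrors the fields of the type above in a dataclass and rejects output that violates it. The JSON string in the usage lines is a hypothetical model output written for illustration, and the gating rule (run Phase 2 only when `f` is 1) is an assumption about how the two phases are chained, not something this README states.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phase1:
    f: int                       # financial flag: 0 or 1
    loc: Optional[str] = None    # locale, e.g. "es-CO"
    amt: Optional[float] = None  # amount
    cur: Optional[str] = None    # currency, e.g. "COP"
    t: Optional[str] = None      # type hint: "income" or "expense"


def parse_phase1(line: str) -> Phase1:
    """Parse one line of Phase 1 output; raise ValueError if it violates the schema."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(obj, dict) or obj.get("f") not in (0, 1):
        raise ValueError("f must be 0 or 1")
    if obj.get("t") not in (None, "income", "expense"):
        raise ValueError("t must be 'income' or 'expense'")
    unknown = set(obj) - {"f", "loc", "amt", "cur", "t"}
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    return Phase1(**obj)


# Hypothetical Phase 1 output for a purchase notification.
result = parse_phase1('{"f":1,"loc":"es-CO","amt":45900,"cur":"COP","t":"expense"}')
if result.f == 1:
    # Assumed gating: only notifications flagged as financial go on to Phase 2.
    print("run Phase 2 for", result.amt, result.cur)
```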
Supported categories
Expenses: Alimentacion, Transporte, Vivienda, Salud, Entretenimiento, Educacion, Ropa, Tecnologia, Restaurantes, Mercado, Gasolina, Farmacia, Otros
Income: Salario, Freelance, Arriendo, Dividendos, Venta, Otros_ingresos
Supported locales (Phase 1)
- es-CO, es-MX, es-AR, es-PE, es-CL (LATAM Spanish)
- pt-BR (Brazilian Portuguese)
- en-US, en-CA, en-GB, en-AU, en-IN (English variants)
- eu (Eurozone)
Known limitations
- Chat tool selection: tool_accuracy of 48.8% means that roughly 50% of chat logging requests ("registra/anota gasto", i.e. log or note an expense) go straight to add_expense and skip check_duplicate. Mitigation in production: Android can add a fallback in OnDeviceAgentLoop that injects check_duplicate before add_*.
- AT verdict_accuracy 70.8%: the main failures are:
  - Transfers between the user's own accounts classified as APPROVED (instead of DISCARDED + OWN_ACCOUNT_TRANSFER)
  - Refunds treated as income
  - Mitigation: the Android AutoTrackerVerdictParser.kt already has a fallback mapping for invalid verdicts (EXPENSE→APPROVED+type=expense, BILL_REMINDER→DISCARDED+code).
- Limited base model: Gemma 4 E2B has 2B parameters. For complex cases (multi-step queries, ambiguous intent), a larger model (E4B) would be more accurate.
- Occasional schema regression: the model sometimes emits verdict=OWN_ACCOUNT_TRANSFER instead of the correct schema. This affects approximately 2-3% of AT cases.
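The two mitigations mentioned above can be sketched in a few lines. This is not the code in OnDeviceAgentLoop or AutoTrackerVerdictParser.kt, which is not shown in this README; it is a Python illustration under stated assumptions. The tool-call representation (`{"name": ..., "arguments": ...}`), both function names, and the INCOME mapping (an extension of the documented EXPENSE example) are hypothetical. The guard only covers add_expense and add_income, because check_duplicate accepts a type of "income" or "expense".

```python
DISCARD_CODES = (
    "OTP_OR_AUTH", "MARKETING", "LOGIN_ALERT", "BILL_REMINDER",
    "OWN_ACCOUNT_TRANSFER", "DUPLICATE", "REFUND_OR_REVERSAL",
    "OTHER_NON_FINANCIAL",
)


def normalize_verdict(obj: dict) -> dict:
    """Map an invalid verdict onto the schema, following the fallback examples above."""
    fixed = dict(obj)  # copy so the caller's object is not mutated
    verdict = fixed.get("verdict")
    if verdict in ("EXPENSE", "INCOME"):  # INCOME is an assumed extension
        fixed["verdict"] = "APPROVED"
        fixed["type"] = verdict.lower()
    elif verdict in DISCARD_CODES:
        # A discard reason code emitted where the verdict belongs.
        fixed["verdict"] = "DISCARDED"
        fixed.setdefault("discard_reason_code", verdict)
    return fixed


def ensure_duplicate_check(calls: list[dict]) -> list[dict]:
    """Insert check_duplicate before add_expense/add_income calls that lack one."""
    guarded = []
    for call in calls:
        name, args = call["name"], call.get("arguments", {})
        needs_check = name in ("add_expense", "add_income")
        already_checked = bool(guarded) and guarded[-1]["name"] == "check_duplicate"
        if needs_check and not already_checked:
            guarded.append({
                "name": "check_duplicate",
                "arguments": {
                    "type": "expense" if name == "add_expense" else "income",
                    # add_income carries gross_amount instead of amount.
                    "amount": args.get("amount", args.get("gross_amount")),
                    "date": args.get("date"),
                },
            })
        guarded.append(call)
    return guarded


calls = [{"name": "add_expense",
          "arguments": {"amount": 45900, "date": "2026-05-07",
                        "description": "cafe", "category": "Restaurantes"}}]
print([c["name"] for c in ensure_duplicate_check(calls)])
print(normalize_verdict({"verdict": "OWN_ACCOUNT_TRANSFER"}))
```

The guard here only reorders a list of planned calls. A real agent loop would also execute check_duplicate and decide whether to proceed with the add_* call based on its result.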
Training data composition
| Source | Examples |
|---|---|
| v1 (base SFT) | 890 |
| v3 batches (chat + AT) | 562 |
| v3.1 (anti-pattern fixes) | 290 |
| v3.2 (check_duplicate flooding + AT fixes) | 350 |
| Total | 2,092 |
Distribution:
- Chat tool calls: 1,177 (56%)
- Auto-tracker (phase 1+2+legacy+edge): 857 (41%)
- Natural chat (refusal/education): 58 (3%)
Hyperparameters
- LoRA rank: 64
- LoRA alpha: 128
- LoRA dropout: 0
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Learning rate: 1.5e-4 (cosine schedule)
- Warmup steps: 15
- Weight decay: 0.01
- Optimizer: adamw_8bit
- Batch size: 1, gradient_accumulation_steps=4 (effective batch=4)
- Epochs: 4
- Max seq length: 1024
- Quantization: 4-bit QLoRA (bnb)
- Hardware: RTX 4070 12GB (108 min training)
- Final training loss: 0.2589
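The batch settings above imply a step count that puts the warmup length in context. The arithmetic below is a rough sketch that assumes all 2,092 examples were used for training with no held-out split, which this README does not confirm; the variable names are illustrative.

```python
import math

num_examples = 2092      # total from the training data table
batch_size = 1
grad_accum_steps = 4
epochs = 4
warmup_steps = 15
training_minutes = 108

effective_batch = batch_size * grad_accum_steps
# Assumption: a partial final accumulation group still counts as one optimizer step.
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps
seconds_per_step = training_minutes * 60 / total_steps

print(effective_batch, steps_per_epoch, total_steps)
print(f"warmup: {warmup_fraction:.2%} of steps, about {seconds_per_step:.1f} s per step")
```

Under that assumption the run has 523 optimizer steps per epoch and 2,092 in total, so the 15 warmup steps cover well under 1% of training.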
Files
gemma4-e2b-patrimo-v3.3/
├── README.md # this file
├── adapter_config.json # LoRA config
├── adapter_model.safetensors # LoRA weights (~580 MB)
├── tokenizer_config.json
├── tokenizer.model
├── special_tokens_map.json
├── chat_template.jinja
├── prompts/ # Production system prompts
│   ├── auto_tracker_phase1.txt
│   ├── auto_tracker_phase2.txt
│   └── chat_ia.txt
└── benchmarks/ # Benchmark results
    └── v3_3_results.json
Citation
@misc{patrimo2026gemma,
title={Patrimo Gemma 4 E2B Fine-tuned},
author={Erwin Gomez and Patrimo.AI},
year={2026},
publisher={HuggingFace},
howpublished={\url{https://huggingface.co/patrimo-ai/gemma4-e2b-patrimo-v3.3}},
}
License
Apache 2.0 (inherited from the Gemma 4 base model)
Contact
- Project: https://patrimo.app
- Issues: https://github.com/patrimo-ai/issues
- Email: erwingm10@gmail.com