Patrimo Gemma 4 E2B — Fine-tuned (v3.3)

Model: patrimo-ai/gemma4-e2b-patrimo-v3.3
Base: google/gemma-4-E2B-it (2B params, multimodal)
Method: QLoRA r=64, lora_alpha=128, 4-bit quantization
Training data: 2,092 examples (chat + auto-tracker + phase-1 pre-screener)

LoRA fine-tune of Gemma 4 E2B specialized in conversational personal finance and bank-notification classification for the Patrimo app (Android).

Use cases

Financial AI chat: logging expenses and income via tool calling, balance queries, reports, month-end close
Notification auto-tracker: classifies bank notifications as APPROVED/DISCARDED/REVIEW and extracts amount, merchant, and category
Phase 1 pre-screener: fast extraction of locale and amount before full classification

Optimized to run on-device on Android via the LiteRT-LM SDK (also available as GGUF for llama.cpp).

Metrics (benchmark v3, n=208 cases)

Auto-Tracker (bank notifications)

Metric	Score	Minimum target
Phase 1 accuracy	100% ✅	95%
Phase 1 financial F1	100% ✅	95%
AT json_valid_rate	100% ✅	98%
AT amount_accuracy	100% ✅	90%
AT merchant_clean	98% ✅	85%
AT verdict_accuracy	70.8%	88%
AT BILL_REMINDER	100% ✅	-
AT DUPLICATE	100% ✅	-
AT OWN_ACCOUNT_TRANSFER	100% ✅	-
AT MARKETING	100% ✅	-
AT OTP_OR_AUTH	100% ✅	-

AI Chat (tool calling)

Metric	Score	Minimum target
Chat refusal_accuracy	92.3% ✅	85%
Chat check_duplicate_rate	66.7%	95%
Chat tool_accuracy	48.8%	85%
Chat json_valid_rate	54.4%	98%

Note on chat metrics: The benchmark uses a minimal system prompt to evaluate the worst case. With the full (verbose) production system prompt, chat metrics improve significantly. In production on Android with OnDeviceSystemPrompt.kt, expect +20-30% on check_duplicate_rate.
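
To make it explicit which metrics fall short of their minimum target, the scores from the two tables above can be compared programmatically. This is a minimal sketch that uses only the numbers reported in this card; the names `METRICS` and `below_target` are illustrative and not part of the Patrimo codebase. Per-category AT rows are omitted because they have no target.

```python
# Scores and minimum targets copied from the benchmark tables in this card (percent).
METRICS = {
    "Phase 1 accuracy": (100.0, 95.0),
    "Phase 1 financial F1": (100.0, 95.0),
    "AT json_valid_rate": (100.0, 98.0),
    "AT amount_accuracy": (100.0, 90.0),
    "AT merchant_clean": (98.0, 85.0),
    "AT verdict_accuracy": (70.8, 88.0),
    "Chat refusal_accuracy": (92.3, 85.0),
    "Chat check_duplicate_rate": (66.7, 95.0),
    "Chat tool_accuracy": (48.8, 85.0),
    "Chat json_valid_rate": (54.4, 98.0),
}


def below_target(metrics):
    """Return {name: gap in points} for metrics whose score is under the target."""
    return {
        name: round(target - score, 1)
        for name, (score, target) in metrics.items()
        if score < target
    }


if __name__ == "__main__":
    for name, gap in below_target(METRICS).items():
        print(f"{name}: {gap} points below target")
```

With the reported numbers, four metrics are flagged: AT verdict_accuracy and the three chat metrics other than refusal_accuracy.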

Quickstart

Inference with Transformers + Unsloth

from unsloth import FastModel
from unsloth.chat_templates import get_chat_template

model, tokenizer = FastModel.from_pretrained(
    model_name="patrimo-ai/gemma4-e2b-patrimo-v3.3",
    max_seq_length=1024,
    load_in_4bit=True,
)
tokenizer = get_chat_template(tokenizer, chat_template="gemma-4")
FastModel.for_inference(model)

# Auto-tracker example
messages = [
    {
        "role": "system",
        "content": "Eres el revisor de transacciones de Patrimo. Responde con UNA SOLA LINEA de JSON valido.\n\nSCHEMA:\n(A) APPROVED: {\"verdict\":\"APPROVED\",\"amount\":NUM,\"type\":\"income\"|\"expense\",\"merchant\":\"STR\",\"category\":\"STR\",\"confidence\":0.0-1.0,\"reasoning\":\"1 linea\"}\n(B) DISCARDED: {\"verdict\":\"DISCARDED\",\"discard_reason_code\":\"CODE\",\"discard_reason_text\":\"STR\",\"reasoning\":\"1 linea\"}\n(C) REVIEW: {\"verdict\":\"REVIEW\",...}",
    },
    {
        "role": "user",
        "content": '<NOTIF>Titulo:"Bancolombia" Cuerpo:"Compra aprobada por COP 45900 en STARBUCKS CHAPINERO" App:co.com.bancolombia.personas.superapp Fecha:2026-05-07</NOTIF>\n\nClasifica.',
    },
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
# Expected: {"verdict":"APPROVED","amount":45900,"type":"expense","merchant":"Starbucks Chapinero","category":"Restaurantes","confidence":0.95,"reasoning":"Compra clara en cafeteria"}
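
The user turn in the example above follows a fixed `<NOTIF>` layout. A small helper can build it from the notification fields. This is a sketch: `build_notif_message` is a hypothetical name, and it assumes the fields are inserted verbatim with no escaping, since the card does not specify how embedded double quotes should be handled.

```python
def build_notif_message(title: str, body: str, app: str, date: str) -> dict:
    """Build the auto-tracker user turn in the <NOTIF> layout used in this card.

    Assumption: values are inserted as-is; escaping rules are not documented.
    """
    content = (
        f'<NOTIF>Titulo:"{title}" Cuerpo:"{body}" App:{app} Fecha:{date}</NOTIF>'
        "\n\nClasifica."
    )
    return {"role": "user", "content": content}


if __name__ == "__main__":
    message = build_notif_message(
        title="Bancolombia",
        body="Compra aprobada por COP 45900 en STARBUCKS CHAPINERO",
        app="co.com.bancolombia.personas.superapp",
        date="2026-05-07",
    )
    print(message["content"])
```

The returned dict can replace the hand-written user entry in `messages`.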

Inference with llama.cpp (GGUF)

# Download GGUF Q4_K_M
huggingface-cli download patrimo-ai/gemma4-e2b-patrimo-v3.3 \
    --include "*.gguf" --local-dir ./model

# Run with llama.cpp
./llama-cli -m ./model/gemma4-e2b-patrimo-q4_k_m.gguf \
    -p "<start_of_turn>user\n<NOTIF>...</NOTIF><end_of_turn>\n<start_of_turn>model\n" \
    -n 200 --temp 0

On-device Android (LiteRT-LM)

Convert GGUF → .litertlm using Google's official LiteRT-LM tools, then load it via:

import com.google.ai.edge.litertlm.Backend
import com.google.ai.edge.litertlm.Engine
import com.google.ai.edge.litertlm.EngineConfig

val config = EngineConfig(
    modelPath = "/path/to/gemma4-e2b-patrimo-v3.3.litertlm",
    backend = Backend.GPU(),
)
val engine = Engine(config)
engine.initialize()

Production system prompts

For maximum accuracy, use the full production system prompts (included in prompts/):

prompts/auto_tracker_phase1.txt: locale + amount pre-screener (Phase 1)
prompts/auto_tracker_phase2.txt: full classification (Phase 2)
prompts/chat_ia.txt: chat with tools

The prompts include anti-hallucination rules and few-shot examples that the model was trained to follow.

Tool schema (AI Chat)

Total: 33 tools. Critical subset:

check_duplicate(type: "income"|"expense", amount: number, date: string)
add_expense(amount, date, description, category, payment_method?, provider?, notes?)
add_income(gross_amount, date, description, category, source?, notes?)
list_expenses(period?)
list_income(period?)
get_balance_sheet()
generate_report(period?, type?)
get_budget_status(period?)
add_asset(name, asset_type, current_value, liquidity)
update_asset(asset_id, current_value)
add_liability(name, liability_type, current_balance, creditor)
close_month(period)
add_adjustment(period, type, amount, description, reason, asset_id?, liability_id?)

Schema Auto-Tracker

type Verdict = "APPROVED" | "DISCARDED" | "REVIEW"

type Approved = {
  verdict: "APPROVED"
  amount: number
  type: "income" | "expense"
  merchant: string
  category: string
  confidence: number  // 0.0-1.0
  reasoning: string
  // Optional fields:
  amount_inferred?: boolean
  amount_inference_reason?: string
  confidence_amount?: number
  currency_original?: string  // "USD"|"EUR"|"MXN"|...
  amount_original?: number
  category_uncertain?: boolean
  detected_account_mask?: string  // "**XXXX"
}

type Discarded = {
  verdict: "DISCARDED"
  discard_reason_code: "OTP_OR_AUTH" | "MARKETING" | "LOGIN_ALERT" |
                      "BILL_REMINDER" | "OWN_ACCOUNT_TRANSFER" |
                      "DUPLICATE" | "REFUND_OR_REVERSAL" | "OTHER_NON_FINANCIAL"
  discard_reason_text: string
  reasoning: string
}

type Review = {
  verdict: "REVIEW"
  suggested_type: "income" | "expense"
  suggested_merchant: string
  suggested_category: string
  value_uncertainty_reason: string
  reasoning: string
}
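
Because the model occasionally emits off-schema verdicts (see the known limitations), a client may want to normalize and validate each output line before using it. The following is a minimal Python sketch of that idea, not the Android `AutoTrackerVerdictParser.kt` implementation. The required-key sets are taken from the types above; the function names are hypothetical, and the `INCOME` mapping is an assumption added by symmetry with the documented `EXPENSE` fallback.

```python
import json

DISCARD_CODES = {
    "OTP_OR_AUTH", "MARKETING", "LOGIN_ALERT", "BILL_REMINDER",
    "OWN_ACCOUNT_TRANSFER", "DUPLICATE", "REFUND_OR_REVERSAL",
    "OTHER_NON_FINANCIAL",
}

# Non-optional fields of each type in the schema above (besides "verdict").
REQUIRED_KEYS = {
    "APPROVED": {"amount", "type", "merchant", "category", "confidence", "reasoning"},
    "DISCARDED": {"discard_reason_code", "discard_reason_text", "reasoning"},
    "REVIEW": {"suggested_type", "suggested_merchant", "suggested_category",
               "value_uncertainty_reason", "reasoning"},
}


def normalize_verdict(obj: dict) -> dict:
    """Map off-schema verdict values onto the three valid ones (illustrative fallback)."""
    out = dict(obj)
    verdict = out.get("verdict")
    if verdict in DISCARD_CODES:
        # e.g. verdict=OWN_ACCOUNT_TRANSFER becomes DISCARDED with that reason code
        out["verdict"] = "DISCARDED"
        out.setdefault("discard_reason_code", verdict)
    elif verdict in ("EXPENSE", "INCOME"):  # INCOME is an assumption
        out["verdict"] = "APPROVED"
        out.setdefault("type", verdict.lower())
    return out


def parse_verdict(raw: str) -> dict:
    """Parse one line of model output; raise ValueError if it violates the schema."""
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict):
        raise ValueError("model output must be a JSON object")
    obj = normalize_verdict(obj)
    verdict = obj.get("verdict")
    if verdict not in REQUIRED_KEYS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    missing = REQUIRED_KEYS[verdict] - obj.keys()
    if missing:
        raise ValueError(f"{verdict} is missing keys: {sorted(missing)}")
    return obj
```

A caller could treat any `ValueError` as a signal to route the notification to manual review rather than recording it automatically.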

Schema Phase 1 Pre-screener

type Phase1 = {
  f: 0 | 1                    // financial flag
  loc?: string                // locale (es-CO, en-US, ...)
  amt?: number                // amount
  cur?: string                // currency (COP, USD, EUR, ...)
  t?: "income" | "expense"    // type hint
}
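
The compact Phase 1 keys can be expanded into a typed record before deciding whether to run the full classification. This is a sketch under the schema above; the `Phase1` dataclass field names and `parse_phase1` are illustrative, not taken from the Patrimo source.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phase1:
    financial: bool                  # from "f"
    locale: Optional[str] = None     # from "loc"
    amount: Optional[float] = None   # from "amt"
    currency: Optional[str] = None   # from "cur"
    type_hint: Optional[str] = None  # from "t"


def parse_phase1(raw: str) -> Phase1:
    """Parse a Phase 1 JSON line; raise ValueError on schema violations."""
    obj = json.loads(raw)
    if not isinstance(obj, dict) or obj.get("f") not in (0, 1):
        raise ValueError("Phase 1 output must be an object with f set to 0 or 1")
    type_hint = obj.get("t")
    if type_hint not in (None, "income", "expense"):
        raise ValueError(f"invalid type hint: {type_hint!r}")
    return Phase1(
        financial=obj["f"] == 1,
        locale=obj.get("loc"),
        amount=obj.get("amt"),
        currency=obj.get("cur"),
        type_hint=type_hint,
    )


if __name__ == "__main__":
    result = parse_phase1('{"f":1,"loc":"es-CO","amt":45900,"cur":"COP","t":"expense"}')
    if result.financial:
        print("run full classification for", result.amount, result.currency)
```

Notifications with `f` equal to 0 can be dropped at this point without invoking the full classifier.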

Supported categories

Expenses: Alimentacion, Transporte, Vivienda, Salud, Entretenimiento, Educacion, Ropa, Tecnologia, Restaurantes, Mercado, Gasolina, Farmacia, Otros

Income: Salario, Freelance, Arriendo, Dividendos, Venta, Otros_ingresos

Supported locales (Phase 1)

es-CO, es-MX, es-AR, es-PE, es-CL (LATAM Spanish)
pt-BR (Brazilian Portuguese)
en-US, en-CA, en-GB, en-AU, en-IN (English variants)
eu (Eurozone)

Known limitations

Chat tool selection: a tool_accuracy of 48.8% means that ~50% of chat logging requests ("registra/anota gasto", i.e. "log an expense") go straight to add_expense, skipping check_duplicate. Production mitigation: Android can add a fallback in OnDeviceAgentLoop that injects check_duplicate before add_*.
AT verdict_accuracy 70.8%: the main failures are:
- Transfers between the user's own accounts classified as APPROVED (instead of DISCARDED + OWN_ACCOUNT_TRANSFER)
- Refunds treated as income
- Mitigation: Android's AutoTrackerVerdictParser.kt already has a fallback mapping for invalid verdicts (EXPENSE→APPROVED+type=expense, BILL_REMINDER→DISCARDED+code).
Limited base model: Gemma 4 E2B has 2B parameters. For complex cases (multi-step queries, ambiguous intent), a larger model (E4B) would be more accurate.
Occasional schema regression: the model sometimes emits verdict=OWN_ACCOUNT_TRANSFER instead of the correct schema. This affects approximately 2-3% of AT cases.
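
The check_duplicate mitigation mentioned for chat tool selection can be sketched as a guard over the model's tool calls. This is an illustration in Python, not the Android OnDeviceAgentLoop code. It assumes tool calls are dicts with `name` and `arguments` keys (the actual wire format is not documented here) and covers only add_expense and add_income, since check_duplicate accepts only the "income" and "expense" types.

```python
# Maps each guarded tool to (check_duplicate type, name of its amount argument),
# following the tool signatures listed in this card.
ADD_TOOLS = {
    "add_expense": ("expense", "amount"),
    "add_income": ("income", "gross_amount"),
}


def inject_check_duplicate(calls: list) -> list:
    """Return a copy of calls where each add_expense/add_income call is
    immediately preceded by a check_duplicate call."""
    guarded = []
    for call in calls:
        if call["name"] in ADD_TOOLS:
            previous = guarded[-1]["name"] if guarded else None
            if previous != "check_duplicate":
                tx_type, amount_key = ADD_TOOLS[call["name"]]
                args = call["arguments"]
                guarded.append({
                    "name": "check_duplicate",
                    "arguments": {
                        "type": tx_type,
                        "amount": args[amount_key],
                        "date": args["date"],
                    },
                })
        guarded.append(call)
    return guarded


if __name__ == "__main__":
    calls = [{
        "name": "add_expense",
        "arguments": {"amount": 45900, "date": "2026-05-07",
                      "description": "Cafe", "category": "Restaurantes"},
    }]
    for call in inject_check_duplicate(calls):
        print(call["name"], call["arguments"])
```

The guard is idempotent: running it on an already guarded sequence leaves it unchanged. A real agent loop would also need to act on the check_duplicate result before executing the add call.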

Training data composition

Source	Examples
v1 (base SFT)	890
v3 batches (chat + AT)	562
v3.1 (anti-pattern fixes)	290
v3.2 (check_duplicate flooding + AT fixes)	350
Total	2,092

Distribution:

Chat tool calls: 1,177 (56%)
Auto-tracker (phase 1+2+legacy+edge): 857 (41%)
Natural chat (refusal/education): 58 (3%)

Hyperparameters

LoRA rank: 64
LoRA alpha: 128
LoRA dropout: 0
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Learning rate: 1.5e-4 (cosine schedule)
Warmup steps: 15
Weight decay: 0.01
Optimizer: adamw_8bit
Batch size: 1, gradient_accumulation_steps=4 (effective batch=4)
Epochs: 4
Max seq length: 1024
Quantization: 4-bit QLoRA (bnb)
Hardware: RTX 4070 12GB (108 min training)
Final training loss: 0.2589
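
A few derived quantities follow from the hyperparameters above. The sketch below computes them under two assumptions that the card does not state: all 2,092 examples are used for training (no held-out split), and the adapter uses the standard LoRA scaling factor alpha / r.

```python
import math

# Values copied from the hyperparameter list in this card.
NUM_EXAMPLES = 2092
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 4
EPOCHS = 4
LORA_RANK = 64
LORA_ALPHA = 128

# Effective batch size is the per-device batch times the accumulation steps.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

# Optimizer steps, assuming every example is seen once per epoch.
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = LORA_ALPHA / LORA_RANK

if __name__ == "__main__":
    print(f"effective batch: {effective_batch}")
    print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
    print(f"LoRA scaling factor: {lora_scaling}")
```

Under these assumptions the run takes 523 optimizer steps per epoch (2,092 in total), so the 15 warmup steps cover under 1% of training.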

Files

gemma4-e2b-patrimo-v3.3/
├── README.md                    # this file
├── adapter_config.json          # LoRA config
├── adapter_model.safetensors    # LoRA weights (~580 MB)
├── tokenizer_config.json
├── tokenizer.model
├── special_tokens_map.json
├── chat_template.jinja
├── prompts/                     # Production system prompts
│   ├── auto_tracker_phase1.txt
│   ├── auto_tracker_phase2.txt
│   └── chat_ia.txt
└── benchmarks/                  # Benchmark results
    └── v3_3_results.json

Citation

@misc{patrimo2026gemma,
  title={Patrimo Gemma 4 E2B Fine-tuned},
  author={Erwin Gomez and Patrimo.AI},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/patrimo-ai/gemma4-e2b-patrimo-v3.3}},
}

License

Apache 2.0 (inherited from the Gemma 4 base model)

Contact

Project: https://patrimo.app
Issues: https://github.com/patrimo-ai/issues
Email: erwingm10@gmail.com
