Poker & Blackjack AI - Gemma 4 E4B LoRA
Fine-tuned Gemma 4 E4B (7.5B dense) for poker and blackjack decision-making.
What This Model Does
Given a poker or blackjack game state, the model outputs the optimal action as JSON.
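For illustration, a minimal sketch of the input/output shape. The exact prompt schema and the field names used here (`hole_cards`, `to_call`, `action`, `amount`, and so on) are assumptions for this example, not the model's documented format:

```python
import json

# Hypothetical poker game state (field names are illustrative assumptions).
state = {
    "game": "poker",
    "street": "flop",
    "hole_cards": ["Ah", "Kd"],
    "board": ["Kc", "7s", "2h"],
    "pot": 12.0,
    "to_call": 4.0,
}
prompt = json.dumps(state)

# A response in the JSON action format the model is trained to emit
# (keys here are assumed for illustration).
raw_response = '{"action": "raise", "amount": 12.0}'
action = json.loads(raw_response)
print(action["action"], action["amount"])
```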
Training Details
- Base model: unsloth/gemma-4-E4B-it (7.5B dense)
- Method: LoRA (r=16, alpha=32)
- Data: 12,848 examples (3,072 poker + 9,776 blackjack)
- Training: 3 epochs on NVIDIA RTX 3090 24GB
- Final metrics: Loss 0.099, Token accuracy 96.4%
- Cost: ~$1.32 on RunPod
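The LoRA setup above (r=16, alpha=32) adds a trainable low-rank update, scaled by alpha/r, on top of each frozen base weight matrix. A pure-Python sketch of that update rule, with tiny toy dimensions and made-up values for clarity:

```python
# LoRA update sketch: W' = W + (alpha / r) * B @ A,
# where A is (r x in_dim) and B is (out_dim x r).
r, alpha = 2, 4          # toy values; this model card uses r=16, alpha=32
scaling = alpha / r      # = 2.0

def matmul(X, Y):
    """Plain nested-list matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2x2)
A = [[0.1, 0.2], [0.3, 0.4]]   # trainable, r x in_dim
B = [[1.0, 0.0], [0.0, 1.0]]   # trainable, out_dim x r

delta = matmul(B, A)
W_adapted = [[w + scaling * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
print(W_adapted)
```

At r=16 and alpha=32 the scaling factor is likewise 2.0; only A and B are trained, which is what keeps the fine-tune cheap.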
Arena Results (1,000 poker hands)
- BB/100: -0.1 (effectively breakeven over 1,000 hands)
- VPIP: 80.5% (plays too many hands; a GRPO fix is planned)
- Beats CallingStation and survives against ExploitBot and NitBot
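For reference, BB/100 (big blinds won per 100 hands) and VPIP (percentage of hands where the player voluntarily puts money in preflop, excluding blinds) can be computed from per-hand records like this; the record format and values are a made-up illustration:

```python
# Hypothetical per-hand records: net big blinds won/lost, and whether the
# player voluntarily put money in preflop (call/raise, not posting blinds).
hands = [
    {"bb_won": 2.5, "vpip": True},
    {"bb_won": -1.0, "vpip": True},
    {"bb_won": 0.0, "vpip": False},
    {"bb_won": -0.5, "vpip": True},
]

bb_per_100 = 100 * sum(h["bb_won"] for h in hands) / len(hands)
vpip_pct = 100 * sum(h["vpip"] for h in hands) / len(hands)
print(f"BB/100 = {bb_per_100:.1f}, VPIP = {vpip_pct:.1f}%")
```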
Usage
# Serve the Q4_K_M GGUF quant (~5 GB) with llama.cpp:
llama-server --model gemma4-poker-q4_k_m.gguf --port 8080 --n-gpu-layers 999 --jinja
Disable thinking mode by adding {"chat_template_kwargs": {"enable_thinking": false}} to the request body.
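Putting it together, a sketch of querying the running llama-server through its OpenAI-compatible /v1/chat/completions endpoint. The port matches the command above; the blackjack prompt content and the response handling are illustrative assumptions:

```python
import json
import urllib.request

# Request body for llama-server's OpenAI-compatible chat endpoint.
# "chat_template_kwargs" disables thinking mode, as noted above.
payload = {
    "messages": [
        {"role": "user",
         "content": '{"game": "blackjack", "player": 16, "dealer_up": "10"}'},
    ],
    "chat_template_kwargs": {"enable_thinking": False},
    "temperature": 0.0,
}

def query(url="http://localhost:8080/v1/chat/completions"):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The model is trained to answer with a JSON action object.
    return json.loads(body["choices"][0]["message"]["content"])

# Example: action = query()  # requires llama-server running on port 8080
```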