Poker & Blackjack AI - Gemma 4 E4B LoRA
Fine-tuned Gemma 4 E4B (7.5B dense) for poker and blackjack decision-making.
What This Model Does
Given a poker or blackjack game state, the model outputs the optimal action as JSON.
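For illustration, a minimal sketch of the input/output shape. The exact prompt schema and the field names used here (`hole_cards`, `to_call`, `action`, `amount`, and so on) are assumptions for this example, not the model's documented format:

```python
import json

# Hypothetical poker game state (field names are illustrative assumptions).
state = {
    "game": "poker",
    "street": "flop",
    "hole_cards": ["Ah", "Kd"],
    "board": ["Kc", "7s", "2h"],
    "pot": 12.0,
    "to_call": 4.0,
}
prompt = json.dumps(state)

# A response in the JSON action format the model is trained to emit
# (keys here are assumed for illustration).
raw_response = '{"action": "raise", "amount": 12.0}'
action = json.loads(raw_response)
print(action["action"], action["amount"])
```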
Training Details
- Base model: unsloth/gemma-4-E4B-it (7.5B dense)
- Method: LoRA (r=16, alpha=32)
- Data: 12,848 examples (3,072 poker + 9,776 blackjack)
- Training: 3 epochs on NVIDIA RTX 3090 24GB
- Final metrics: Loss 0.099, Token accuracy 96.4%
- Cost: ~$1.32 on RunPod
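The LoRA setup above (r=16, alpha=32) adds a trainable low-rank update, scaled by alpha/r, on top of each frozen base weight matrix. A pure-Python sketch of that update rule, with tiny toy dimensions and made-up values for clarity:

```python
# LoRA update sketch: W' = W + (alpha / r) * B @ A,
# where A is (r x in_dim) and B is (out_dim x r).
r, alpha = 2, 4          # toy values; this model card uses r=16, alpha=32
scaling = alpha / r      # = 2.0

def matmul(X, Y):
    """Plain nested-list matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2x2)
A = [[0.1, 0.2], [0.3, 0.4]]   # trainable, r x in_dim
B = [[1.0, 0.0], [0.0, 1.0]]   # trainable, out_dim x r

delta = matmul(B, A)
W_adapted = [[w + scaling * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
print(W_adapted)
```

At r=16 and alpha=32 the scaling factor is likewise 2.0; only A and B are trained, which is what keeps the fine-tune cheap.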
Arena Results (1,000 poker hands)
- BB/100: -0.1 (effectively breakeven over 1,000 hands)
- VPIP: 80.5% (plays too many hands; a GRPO fix is planned)
- Beats CallingStation and survives against ExploitBot and NitBot
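For reference, BB/100 (big blinds won per 100 hands) and VPIP (percentage of hands where the player voluntarily puts money in preflop, excluding blinds) can be computed from per-hand records like this; the record format and values are a made-up illustration:

```python
# Hypothetical per-hand records: net big blinds won/lost, and whether the
# player voluntarily put money in preflop (call/raise, not posting blinds).
hands = [
    {"bb_won": 2.5, "vpip": True},
    {"bb_won": -1.0, "vpip": True},
    {"bb_won": 0.0, "vpip": False},
    {"bb_won": -0.5, "vpip": True},
]

bb_per_100 = 100 * sum(h["bb_won"] for h in hands) / len(hands)
vpip_pct = 100 * sum(h["vpip"] for h in hands) / len(hands)
print(f"BB/100 = {bb_per_100:.1f}, VPIP = {vpip_pct:.1f}%")
```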
Usage
# Serve the Q4_K_M GGUF quant (~5 GB) with llama.cpp:
llama-server --model gemma4-poker-q4_k_m.gguf --port 8080 --n-gpu-layers 999 --jinja
Disable thinking mode by adding {"chat_template_kwargs": {"enable_thinking": false}} to the request body.
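Putting it together, a sketch of querying the running llama-server through its OpenAI-compatible /v1/chat/completions endpoint. The port matches the command above; the blackjack prompt content and the response handling are illustrative assumptions:

```python
import json
import urllib.request

# Request body for llama-server's OpenAI-compatible chat endpoint.
# "chat_template_kwargs" disables thinking mode, as noted above.
payload = {
    "messages": [
        {"role": "user",
         "content": '{"game": "blackjack", "player": 16, "dealer_up": "10"}'},
    ],
    "chat_template_kwargs": {"enable_thinking": False},
    "temperature": 0.0,
}

def query(url="http://localhost:8080/v1/chat/completions"):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The model is trained to answer with a JSON action object.
    return json.loads(body["choices"][0]["message"]["content"])

# Example: action = query()  # requires llama-server running on port 8080
```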