gemma-4-E2B-it solitaire-advisor LoRA
A LoRA adapter that distils a 31B Gemma Klondike Solitaire advisor into the ~2B-effective Gemma 4 E2B text model (4-bit, MLX), runnable locally on a 16 GB Apple Silicon Mac.
This is the Gemma 4 E2B successor to
chayuto/gemma-3n-e2b-it-solitaire-advisor-lora.
That earlier card promised a Gemma 4 E2B variant "once mlx-lm ships the
missing architecture support"; the base is now loadable via a small local
sanitize() patch (see Usage), so this is that promised release,
same project, same teacher, same data pipeline.
It is the project's strongest and lead student: it beats the untuned base under a trusted full-game evaluation both on held-out decks and on fresh, never-seen decks (it generalizes), not just on a single-turn bench. Headline numbers, all adjudicated by exact engine replay plus a sound solver:
- In-distribution (13 held-out solver-winnable decks): 5 wins vs the base's 1, mean foundation cards 27.7 vs 14.2.
- Out-of-distribution (12 fresh, solver-winnable decks with zero corpus overlap): 5 wins vs the base's 1, +12.9 mean paired foundation-card delta, better than base on 9 of 12 decks.
The teacher is gemma-4-31b-it (Google's Gemma 4 31B, accessed through a
separate harvester app). The teacher itself wins roughly 31% of games, so this
adapter's ceiling is teacher-level imitation, not optimal play.
Model details
| Field | Value |
|---|---|
| Base model | mlx-community/Gemma4-E2B-IT-Text-int4 (Gemma 4 E2B IT, text-only, int4) |
| Adapter type | LoRA over a 4-bit-quantised base |
| LoRA rank | 16 (scale 2.0, dropout 0.05) |
| LoRA target modules | self_attn.{q,k,v,o}_proj, mlp.{gate,up,down}_proj |
| LoRA layers | top 16 layers |
| Training framework | mlx-lm |
| Hardware | Apple M5, 16 GB unified memory (Metal GPU) |
| Adapter size on disk | ~51 MB per checkpoint (bf16 LoRA weights over the 4-bit base) |
| Iterations trained | 1,000 (shipped weights = iter 1,000) |
| Quantisation | base remains int4; LoRA weights bfloat16 |
| Decoding used for eval | greedy |
The shipped adapters.safetensors is byte-identical to
checkpoints/0001000_adapters.safetensors (iter 1,000), selected over the
earlier checkpoints by the checkpoint-selection pass
(below).
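The byte-identity claim can be checked locally. The following is a minimal sketch, assuming a local snapshot of this repo with the two files at the paths named above; the helper names `sha256_of` and `files_identical` are illustrative and not part of the source repo.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(a: Path, b: Path) -> bool:
    """True when both files have the same size and the same SHA-256 digest."""
    return a.stat().st_size == b.stat().st_size and sha256_of(a) == sha256_of(b)


if __name__ == "__main__":
    # Assumed layout of a local snapshot of this repo; adjust the root as needed.
    root = Path(".")
    shipped = root / "adapters.safetensors"
    checkpoint = root / "checkpoints" / "0001000_adapters.safetensors"
    if shipped.exists() and checkpoint.exists():
        print("byte-identical:", files_identical(shipped, checkpoint))
    else:
        print("adapter files not found under", root.resolve())
```

Comparing sizes first avoids hashing two files that cannot match.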
Intended use
In scope. Acting as a move-selection advisor inside a Klondike Solitaire client that already enforces game rules:
- Imperfect-information draw-1 Klondike (one card flipped from stock per draw); the advisor is shown the full visible state plus the count of face-down cards.
- Per-turn decisions: given the harvester prompt (with a LEGAL MOVES block), emit a single JSON object choosing one offered legal move by index, or move_index: -1 to resign.
- Local inference on Apple Silicon via mlx-lm.
Out of scope.
- Open-ended chat or general text generation. The model is fine-tuned to a narrow JSON-emitting role and is expected to be worse than the base at unrelated tasks.
- Game-rule enforcement. It selects from a legalMoves array supplied in the prompt; it does not verify legality from first principles.
- Optimal play. The teacher wins ~31% of games; this adapter inherits that ceiling.
- Other Solitaire variants (Spider, FreeCell), out of distribution.
Evaluation
Method
Unlike the 3n predecessor (scored on a 20-state single-turn tier bench), this
adapter is evaluated on full games, which is the eval the project now
trusts. Each game is played turn-by-turn under the faithful production prompt
(hybrid-v1.6) on a fixed deck, greedy decoding, cap 200 turns, with a tiered
JSON parse-rescue (temp-0.3 retry) matching production. Every finished or
resigned game is then exact-adjudicated: the recorded decisions are replayed
through the engine with zero drift, and the final or resign position is handed
to a sound best-first solver (SOLVED / UNSOLVABLE / UNKNOWN at a 300k-node cap).
This distinguishes a real win from a cap-truncated one, and a correct resign on
a dead board from a quit on a winnable one. The two eval sets:
- In-distribution: 13 winnable decks held out by seed from the harvester pool that the training data was drawn from.
- Generalization: 12 freshly dealt, solver-confirmed-winnable decks (seeds 9000002..9000026) with verified zero overlap with the benchmark or any training corpus, so they are guaranteed unseen by teacher and student.
meanFC below is the mean number of cards on foundations (out of 52) at game
end across the deck set; 52 is a win.
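The adjudication step described in Method can be pictured as a small classifier over the replayed final position. This is a sketch only: the label strings, the `FinishedGame` record, and the assumption that a game which was neither won nor resigned stopped at the turn cap are illustrative, and the real logic lives in adjudicate_final_position.py in the source repo.

```python
from dataclasses import dataclass
from enum import Enum


class SolverVerdict(Enum):
    SOLVED = "SOLVED"          # a winning line exists from this position
    UNSOLVABLE = "UNSOLVABLE"  # search exhausted: the position is dead
    UNKNOWN = "UNKNOWN"        # node cap hit before a proof either way


@dataclass(frozen=True)
class FinishedGame:
    foundation_cards: int   # cards on foundations at game end (0..52)
    resigned: bool          # True if the model emitted move_index == -1
    verdict: SolverVerdict  # solver result on the final or resign position


def adjudicate(game: FinishedGame) -> str:
    """Label a finished game using only the replayed final position."""
    if game.foundation_cards == 52:
        return "win"
    if game.resigned:
        return {
            SolverVerdict.UNSOLVABLE: "correct_resign",    # board already dead
            SolverVerdict.SOLVED: "premature_resign",      # quit a winnable board
            SolverVerdict.UNKNOWN: "resign_undetermined",  # neither claimed
        }[game.verdict]
    # Not won and not resigned: assumed here to have stopped at the turn cap.
    return {
        SolverVerdict.SOLVED: "cap_truncated_winnable",
        SolverVerdict.UNSOLVABLE: "cap_truncated_dead",
        SolverVerdict.UNKNOWN: "cap_truncated_undetermined",
    }[game.verdict]
```

The point of the three-way verdict is that UNKNOWN is kept as its own outcome rather than being folded into either a correct or an incorrect resign.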
In-distribution (13 held-out winnable decks)
| arm | corpus | meanFC | wins | resigns |
|---|---|---|---|---|
| base (untuned) | none | 14.2 | 1 | 0 |
| gate | 2,492 rows, 100% won | 27.5 | 4 | 0 |
| allsucc | 2,500 rows, 38% won | 25.5 | 3 | 1 |
| volume (this adapter) | 6,859 rows, 36% won | 27.7 | 5 | 1 |
This established two facts. First, training beats the untuned base under trusted
eval (the first time in the project). Second, the won-only filter is not the
lever; data volume is. At matched size the won-only gate and the natural-mix
allsucc are roughly tied (27.5 vs 25.5 meanFC, 4 vs 3 wins), and the full-volume
arm here is the strongest student, so "collect and train on more data" is the
validated recipe. The one in-distribution resign (#4221577640) came on a deck
that was winnable from the deal; whether volume's board at the moment it
resigned was already dead is UNKNOWN at the 300k-node cap, so the resign is
claimed neither correct nor incorrect.
Generalization (12 fresh, never-seen, solver-winnable decks)
The decisive question was whether the student learned to play Klondike or memorized the harvester's deck distribution. On fresh decks with zero corpus overlap, paired against base:
| arm | wins/12 | meanFC | mean paired delta vs base | better than base |
|---|---|---|---|---|
| base | 1 | 15.6 | --- | --- |
| gate (won-only) | 3 | 21.2 | +5.6 | 7/12 |
| volume (this adapter) | 5 | 28.5 | +12.9 | 9/12 |
Verdict: GENERALIZES, decisively. Both trained arms show a positive paired delta and win multiple fresh, never-seen decks that base cannot. The student learned to play; it did not memorize the deck pool. Win counts are exact (no cap-truncated win above fc=40). Read the paired delta, not absolute fc: the fresh set is biased easy (only deals the solver cracked under a 200k-node cap were kept), which lifts both arms and compresses the gap.
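A minimal sketch of how a paired comparison like the one in this table can be computed, assuming each arm's results are stored as a mapping from deck seed to final foundation count. The function name and the four example decks are made up for illustration and are not the card's per-deck data.

```python
from statistics import mean


def paired_summary(student_fc: dict[int, int], base_fc: dict[int, int]) -> dict[str, float]:
    """Compare two arms deck by deck; both dicts map deck seed -> final foundation count."""
    if student_fc.keys() != base_fc.keys():
        raise ValueError("both arms must be scored on the same decks")
    deltas = [student_fc[seed] - base_fc[seed] for seed in student_fc]
    return {
        "mean_paired_delta": mean(deltas),
        "better_than_base": sum(d > 0 for d in deltas),
        "decks": len(deltas),
        "student_wins": sum(fc == 52 for fc in student_fc.values()),
        "base_wins": sum(fc == 52 for fc in base_fc.values()),
    }


# Made-up foundation counts for four hypothetical decks (not the card's data).
student = {1: 52, 2: 30, 3: 11, 4: 52}
base = {1: 20, 2: 30, 3: 15, 4: 52}
print(paired_summary(student, base))
```

Because both arms play the same decks, the mean paired delta equals the difference of the two meanFC values (28.5 - 15.6 = 12.9 for volume). The pairing adds the per-deck view, which is where the "better than base" count comes from.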
Resigns on the fresh set are informative because every fresh deck is winnable
from the deal. Volume resigned 2 games, and both were adjudicated correct: the
solver returned UNSOLVABLE for #9000021 (fc18/fd3) and for #9000024 (fc11/fd12).
Both boards were already structurally dead at the moment of resignation, so the
resigns saved a 200-turn flail rather than throwing away a winnable game. Any
error lies in earlier play, not in the resign itself (judge resign-correctness
on the board at resign, not the deck at deal).
Checkpoint selection (why iter 1,000)
Quality across the four saved checkpoints is non-monotonic, so the final checkpoint is not automatically the best; it was selected on the 13 held-out decks. JSON discipline is measured as temp-0.3 parse-rescues (lower is cleaner):
| checkpoint | wins | meanFC | temp parse-rescues |
|---|---|---|---|
| iter 250 | 5 | 31.7 | 102 |
| iter 500 | 2 | 21.3 | 126 |
| iter 1,000 (shipped) | 5 | 27.7 | 34 |
Iter 1,000 keeps the win count of the best early checkpoint while being roughly
3x cleaner on JSON, and iter 500 is a deep trough. The remaining JSON-discipline
gap is best closed by constrained decoding at inference, not by an earlier
checkpoint (see Limitations). The 250/500/750 checkpoints are
included under checkpoints/ for reproducibility of this table.
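One selection rule consistent with the table is "most wins, then fewest parse-rescues". The sketch below applies that rule to the three rows shown; the tie-break is an assumption made for illustration, and the card does not state the exact criterion used.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    name: str
    wins: int
    mean_fc: float
    parse_rescues: int


# Rows copied from the checkpoint table above.
CHECKPOINTS = [
    Checkpoint("iter 250", 5, 31.7, 102),
    Checkpoint("iter 500", 2, 21.3, 126),
    Checkpoint("iter 1,000", 5, 27.7, 34),
]


def select(checkpoints: list[Checkpoint]) -> Checkpoint:
    """Most wins first; among ties, fewest parse-rescues (an assumed tie-break)."""
    return max(checkpoints, key=lambda c: (c.wins, -c.parse_rescues))


best = select(CHECKPOINTS)
print(best.name)  # iter 1,000
print(CHECKPOINTS[0].parse_rescues / best.parse_rescues)  # 3.0
```

Under this rule iter 250 and iter 1,000 tie on wins, and the 102 vs 34 rescue counts decide it. A rule that ranked on meanFC alone would pick iter 250 instead.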
Note on comparing to the 3n predecessor
The 3n adapter's headline numbers come from a 20-state single-turn tier bench; this adapter's come from full-game play with exact adjudication. They are not directly comparable. The shift to full-game adjudicated eval is deliberate: in this project, raw harness scores have repeatedly disagreed with what an exact replay plus a sound solver show, so single-turn or unadjudicated numbers are treated as unreliable. The full-game result here is the stronger and more honest claim.
Training data
Source. Per-decision play logs from a Klondike Solitaire client where the
31B gemma-4-31b-it teacher chose moves turn-by-turn, published as the
chayuto/klondike-llm-decisions
dataset (CC-BY-4.0). Logs span multiple app builds and prompt-template versions.
Selection (this adapter, the "volume" arm). The entire non-eval success
pool: 6,859 decisions across 77 games (36% of which were won), with the 13
eval seeds held out. Split at the game level into 5,663 train / 531 validation /
665 test. At iters=1,000 with batch size 1 the model sees 1,000 examples
(0.18 epoch), so this arm tests "more unique, diverse data at a fixed gradient
budget", and it won: full volume beat a matched 2,500-row arm on wins (5 vs 3).
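The game-level split and the epoch arithmetic above can be sketched as follows. The `game_id` field name, the 80/10/10 fractions, and the seed are assumptions for illustration; the card's actual split sizes are the 5,663 / 531 / 665 figures quoted above.

```python
import random
from collections import defaultdict


def split_by_game(rows, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split decision rows into train/valid/test so that no game spans two splits.

    `rows` is a list of dicts with a hypothetical "game_id" key.
    """
    by_game = defaultdict(list)
    for row in rows:
        by_game[row["game_id"]].append(row)

    game_ids = sorted(by_game)
    random.Random(seed).shuffle(game_ids)

    n_train = int(len(game_ids) * fractions[0])
    n_valid = int(len(game_ids) * fractions[1])
    buckets = {
        "train": game_ids[:n_train],
        "valid": game_ids[n_train:n_train + n_valid],
        "test": game_ids[n_train + n_valid:],
    }
    return {name: [r for g in ids for r in by_game[g]] for name, ids in buckets.items()}


def epoch_fraction(iters: int, batch_size: int, train_rows: int) -> float:
    """Fraction of the training set seen after `iters` optimiser steps."""
    return iters * batch_size / train_rows


# Figures from this card: 5,663 / 531 / 665 rows, 1,000 iterations at batch size 1.
assert 5663 + 531 + 665 == 6859
print(round(epoch_fraction(1000, 1, 5663), 2))  # 0.18
```

Splitting by game rather than by row keeps consecutive turns of one game from landing on both sides of the train/validation boundary.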
Known data-quality issues the adapter inherits:
- Mixed prompt-template formats across the corpus (v1.0 through v1.6); eval is on the v1.6 production template only, so cross-template generalization is a confound that the held-out and fresh-deck results only partially control for.
- Lost-game trajectories in the corpus teach a resign capability. On the evidence above this is net useful (correct resigns on dead boards), but it is a behaviour the won-only gate arm does not have.
- The teacher itself wins ~31% of games, so imitation targets are imperfect.
Training procedure
model: mlx-community/Gemma4-E2B-IT-Text-int4
max_seq_length: 2048
batch_size: 1
num_layers: 16
grad_checkpoint: true
learning_rate: 2.0e-4
iters: 1000
save_every: 250
val_batches: 25
lora_parameters:
  rank: 16
  scale: 2.0
  dropout: 0.05
  keys:
    - self_attn.q_proj
    - self_attn.k_proj
    - self_attn.v_proj
    - self_attn.o_proj
    - mlp.gate_proj
    - mlp.up_proj
    - mlp.down_proj
Hyperparameters are identical to the project's other Gemma 4 E2B arms (gate, allsucc, v2, v5), so arm-to-arm differences are attributable to the corpus, not the optimiser.
Usage
Install
# Apple Silicon, Python 3.12 venv recommended
python3.12 -m venv venv && source venv/bin/activate
pip install mlx mlx-lm huggingface-hub
Loading the base (important)
The Gemma4-E2B-IT-Text-int4 base needs a small sanitize() patch to load on
current mlx-lm (it strips the alternating-attention k_norm/k_proj/v_proj
keys that the loader does not yet implement). The 6-line patch used here is
gemma4_finetune/gemma4_text_patch.py in the source repo
(github.com/chayuto/solitaire-analytics);
apply it (or use an mlx-lm version that has since merged the support) before
loading. Peak memory to load the patched base is ~2.7 GB.
Quick start
from huggingface_hub import snapshot_download
from mlx_lm import load, generate

# import gemma4_text_patch  # apply the base-loading patch first (see above)

# Download only the shipped adapter weights and their config.
adapter_path = snapshot_download(
    repo_id="chayuto/gemma-4-e2b-it-solitaire-advisor-lora",
    allow_patterns=["adapters.safetensors", "adapter_config.json"],
)

# Load the int4 base and attach the LoRA adapter.
model, tokenizer = load(
    "mlx-community/Gemma4-E2B-IT-Text-int4",
    adapter_path=adapter_path,
)

# The v1.6 harvester prompt, including its legal-moves block.
with open("your_solitaire_prompt.txt") as f:
    solitaire_prompt = f.read()

wrapped = tokenizer.apply_chat_template(
    [{"role": "user", "content": solitaire_prompt}],
    tokenize=False,
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=wrapped, max_tokens=512))
The model emits a JSON object whose final_decision.move_index is a 0-based
index into the prompt's legalMoves array (or -1 to resign). Trust
move_index, not the prose. The exact prompt renderer, the play harness
(tournament_A.py, play_deck_with_student.py), and the adjudicator
(adjudicate_final_position.py) that produced every number in this card are in
the source repo.
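Since the client should act on move_index and ignore the prose, a small validator is useful on the consuming side. The following is a minimal sketch, assuming only the final_decision.move_index field described above; the function name and the "first brace to last brace" extraction heuristic are illustrative and not the project's parser.

```python
import json


def extract_move_index(raw: str, n_legal_moves: int) -> int:
    """Return the chosen 0-based move index, or -1 for a resign.

    Raises ValueError if no JSON object with a valid
    final_decision.move_index can be recovered from `raw`.
    """
    # Heuristic: take everything from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    try:
        payload = json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc

    try:
        index = payload["final_decision"]["move_index"]
    except (KeyError, TypeError) as exc:
        raise ValueError("missing final_decision.move_index") from exc

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(index, bool) or not isinstance(index, int):
        raise ValueError(f"move_index must be an integer, got {index!r}")
    if index != -1 and not 0 <= index < n_legal_moves:
        raise ValueError(f"move_index {index} outside 0..{n_legal_moves - 1}")
    return index
```

Passing the length of the legalMoves array that was actually shown in the prompt lets the client reject an out-of-range index before it reaches the game engine.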
Limitations
- Small n. 13 in-distribution and 12 fresh decks, with high per-deck variance. Read the paired deltas; do not over-interpret a single deck.
- Easy-deck bias on the fresh set. Only deals the solver cracked under a 200k-node cap were kept, so harder-but-winnable deals are under-represented.
- JSON discipline is imperfect. The eval used a tiered parse-rescue (temp-0.3 retry); raw greedy output is not 100% valid JSON. For production, use constrained decoding or a JSON grammar at inference. This is a known gap, not yet fixed in the weights.
- Teacher ceiling. Trained to imitate a teacher that wins ~31% of games; this is an advisor, not a solver, for imperfect-information draw-1 only.
- Greedy eval only. Behaviour under temperature sampling is untested.
- Base-loading patch required. See Usage; the int4 Gemma 4 E2B text base does not load on stock mlx-lm without the sanitize() patch.
- Apple-Silicon / MLX only. CUDA/CPU inference via transformers/PEFT is not validated here.
- Loads only onto this exact base. The LoRA was trained against mlx-community/Gemma4-E2B-IT-Text-int4; applying it to a different quant or the full-precision base is unvalidated.
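Until constrained decoding is in place, the JSON-discipline gap noted above can be worked around with a retry loop in the spirit of the eval's parse-rescue. This is a sketch under assumptions: the `sample` callable, the retry count, and the return shape are hypothetical, and the card specifies only that the rescue retries at temperature 0.3.

```python
import json
from typing import Callable, Optional


def decide_with_rescue(
    sample: Callable[[float], str],
    retry_temperature: float = 0.3,
    max_retries: int = 2,
) -> tuple[Optional[dict], int]:
    """Try greedy output first, then re-sample at a low temperature on parse failure.

    `sample(temperature)` is a caller-supplied function returning raw model text;
    temperature 0.0 stands for greedy decoding. Returns (parsed_object, rescues_used).
    """
    temperatures = [0.0] + [retry_temperature] * max_retries
    for attempt, temperature in enumerate(temperatures):
        raw = sample(temperature)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable: fall through to the next attempt
        if isinstance(parsed, dict):
            return parsed, attempt
    return None, max_retries
```

With mlx-lm, `sample` could wrap generate() with a temperature sampler (for example one built by make_sampler from mlx_lm.sample_utils, assuming a version that provides it). Counting rescues per game gives a JSON-discipline measure of the same kind as the "temp parse-rescues" column in the checkpoint table.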
License
The adapter is released under the Gemma Terms of Use (inherited from the
base model). Use, redistribution, and modification require compliance with the
Gemma Prohibited Use Policy.
The training and evaluation code in the source repository is MIT. The
training data (chayuto/klondike-llm-decisions)
is CC-BY-4.0.
Citation
@misc{orapinpatipat2026solitaireadvisorgemma4,
title = {Distilling a 31B Klondike Solitaire advisor into Gemma 4 E2B via MLX LoRA},
author = {Orapinpatipat, Chayut},
year = {2026},
month = jun,
howpublished = {\url{https://huggingface.co/chayuto/gemma-4-e2b-it-solitaire-advisor-lora}},
note = {LoRA adapter; volume arm, iter-1000 checkpoint},
}
Acknowledgements
- Base model mlx-community/Gemma4-E2B-IT-Text-int4 from the mlx-community team.
- Training framework mlx-lm from Apple Machine Learning Research.
- Teacher model gemma-4-31b-it from Google DeepMind.
Project status
This is the Gemma 4 E2B milestone of an ongoing project (solitaire-analytics):
the strongest student so far, beating the untuned base under trusted full-game
eval and generalizing to fresh, unseen decks. It supersedes the
gemma-3n
adapter as the project's lead student; the 3n repo remains as the v1 baseline.
Next directions: constrained decoding for JSON robustness, a resign-calibration pass (loop-compressed training reaches winnable endgames but can resign too early), and solver-as-teacher trajectories to break the ~31% teacher imitation ceiling.