# Pokemon Red Strategic Commander (Qwen3-Coder-Next 80B Merged)
An AI-powered strategic brain for Pokemon Red, fine-tuned from Qwen3-Coder-Next (80B total / 3B active MoE) using QLoRA — full-precision merged weights.
This is the full merged model (BFloat16 safetensors). For a quantized version, see the GGUF / 4B variant.
## Model Description
This model is a QLoRA fine-tune of Qwen/Qwen3-Coder-Next with LoRA adapters merged back into the full-precision weights. It provides expert-level Pokemon Red gameplay guidance — analyzing game state and providing actionable strategic recommendations.
Rather than playing the game directly, it acts as an expert advisory system for Gen 1 Pokemon battles, team building, route planning, and overall strategy.
## Architecture
| Parameter | Value |
|---|---|
| Architecture | Qwen3-Coder-Next (Hybrid MoE) |
| Total Parameters | 80B |
| Active Parameters | 3B (MoE routing) |
| Hidden Dimension | 2048 |
| Layers | 48 (Hybrid: Gated DeltaNet + Gated Attention + MoE) |
| Experts | 512 total, 10 active + 1 shared |
| Context Length | 262,144 tokens |
| Precision | BFloat16 |
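The "3B active of 80B" figure can be sanity-checked with some back-of-the-envelope arithmetic from the table above. The numbers below are assumptions derived from the table, not values read from the model config: 10 routed plus 1 shared expert fire per token out of 512, and non-expert weights (attention, embeddings) are always active, which is why the overall active share exceeds the raw expert fraction.

```python
# Figures from the architecture table above (assumed arithmetic, not config values).
total_experts = 512
active_experts = 10 + 1  # 10 routed + 1 shared per token

routed_fraction = active_experts / total_experts
print(f"{routed_fraction:.1%} of experts active per token")

# Attention, embeddings, and other non-expert weights are always active,
# pushing the overall active share up toward the advertised ~3B of 80B.
total_params, active_params = 80e9, 3e9
print(f"{active_params / total_params:.1%} of parameters active overall")
```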
## Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-Coder-Next (80B total, 3B active MoE) |
| Method | QLoRA (4-bit quantization during training) |
| LoRA Rank | 8 |
| LoRA Alpha | 16 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable Parameters | 2,064,384 / 79,676,455,680 (0.003%) |
| Training Examples | ~1,000 (903 train / 53 val / 48 test) |
| Epochs | 3 |
| Batch Size | 16 effective (1 per device × 16 gradient accumulation) |
| Learning Rate | 2e-4 (cosine schedule) |
| Optimizer | Paged AdamW 8-bit |
| Precision | BFloat16 |
| Hardware | NVIDIA H100 80GB HBM3 |
| Framework | Unsloth 2026.2.1 + PyTorch 2.6.0 |
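The trainable-parameter fraction in the table follows directly from the two parameter counts; a two-line check using the table's own figures:

```python
# Figures from the training table above
trainable = 2_064_384
total = 79_676_455_680

fraction = trainable / total
print(f"Trainable: {fraction:.4%} of all parameters")
# rounds to the table's "0.003%"
```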
### Loss Curve
| Step | Loss | Epoch |
|---|---|---|
| 50 | 0.3827 | 0.89 |
| 60 | 0.3216 | 1.05 |
| 70 | 0.2321 | 1.23 |
| 80 | 0.2227 | 1.41 |
| 90 | 0.2546 | 1.58 |
| 110 | 0.1795 | 1.94 |
| 120 | 0.2046 | 2.11 |
| 130 | 0.2135 | 2.28 |
| 150 | 0.2212 | 2.64 |
| 160 | 0.1703 | 2.82 |
## Training Data
Trained on ~1,000 curated instruction-response pairs (903 train / 53 val / 48 test) covering:
- Pokedex Knowledge — Stats, types, evolution chains for all 151 Gen 1 Pokemon
- Move Knowledge — Move stats, type effectiveness, PP management
- Battle Strategy — Type matchups, damage calculation, switch decisions
- Team Building — Optimal team compositions, coverage analysis
- Route Planning — Efficient progression through Kanto
- Gym Strategy — Leader teams, weaknesses, recommended counters
- Elite Four — Championship preparation and strategy
- Game Mechanics — Gen 1 quirks (badge boosts, Ghost/Psychic bug, crit formula, etc.)
- Speedrun Tactics — Optimized routing and execution
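The Gen 1 damage calculation mentioned under Game Mechanics follows a heavily floor-based formula documented by the community (including the pret/pokered disassembly). A simplified sketch, ignoring critical hits, the random 217–255/255 roll, and the 997-damage cap; the stat values in the example are illustrative, not taken from the game:

```python
def gen1_damage(level, power, attack, defense, stab=False, type_mult=1.0):
    """Simplified Gen 1 damage formula (no crit, no random roll, no cap)."""
    base = (2 * level // 5 + 2) * power * attack // defense // 50 + 2
    if stab:
        base = base * 3 // 2  # same-type attack bonus = 1.5x, integer math
    return int(base * type_mult)  # type effectiveness: 0, 0.25, 0.5, 1, 2, 4

# Illustrative: a STAB Ember (power 40) into a target that resists Fire (0.5x)
print(gen1_damage(level=22, power=40, attack=60, defense=50,
                  stab=True, type_mult=0.5))
```

Note how every division truncates, which is why low-level damage rolls in Gen 1 cluster around small integers.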
Data was sourced from PokeAPI, Bulbapedia, and the pret/pokered disassembly project, then converted into ChatML-formatted instruction pairs.
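The ChatML conversion step can be illustrated with a minimal sketch; the helper and example strings below are illustrative, not the dataset's actual schema:

```python
def to_chatml(system: str, user: str, assistant: str) -> str:
    """Render one instruction-response pair in ChatML."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>\n"
    )

example = to_chatml(
    "You are the Strategic Commander for a Pokemon Red playthrough.",
    "What type is Gyarados and what are its weaknesses?",
    "Gyarados is Water/Flying. It takes 4x damage from Electric moves ...",
)
print(example)
```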
## Usage

### With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "clarkkitchen22/Pokemon-Red-Qwen3-80B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("clarkkitchen22/Pokemon-Red-Qwen3-80B")

messages = [
    {"role": "system", "content": "You are the Strategic Commander for a Pokemon Red playthrough. Analyze the game state and provide optimal decisions."},
    {"role": "user", "content": "I'm about to fight Misty. My team is Charmeleon (lv 22) with Ember, Slash, Leer, Rage. What should I do?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With vLLM

```bash
vllm serve clarkkitchen22/Pokemon-Red-Qwen3-80B \
  --port 8000 \
  --tensor-parallel-size 2 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
```
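Once the server is up, it exposes an OpenAI-compatible API on the chosen port. A minimal client sketch using only the standard library (the prompt text is illustrative):

```python
import json
from urllib import request

payload = {
    "model": "clarkkitchen22/Pokemon-Red-Qwen3-80B",
    "messages": [
        {"role": "user", "content": "Best counter for Brock's Onix at level 12?"}
    ],
    "temperature": 0.3,
    "max_tokens": 256,
}

req = request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Requires the `vllm serve` command above to be running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```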
### With SGLang

```bash
python -m sglang.launch_server \
  --model clarkkitchen22/Pokemon-Red-Qwen3-80B \
  --port 30000 \
  --tp-size 2 \
  --tool-call-parser qwen3_coder
```
### Convert to GGUF

You can quantize this model yourself using llama.cpp. Note that `convert_hf_to_gguf.py` does not emit K-quants directly; convert to a full-precision GGUF first, then quantize with `llama-quantize`:

```bash
# Pull the model
git lfs install
git clone https://huggingface.co/clarkkitchen22/Pokemon-Red-Qwen3-80B

# Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build -j

# Convert to a BF16 GGUF
python convert_hf_to_gguf.py ../Pokemon-Red-Qwen3-80B \
  --outtype bf16 --outfile ../pokemon-red-bf16.gguf

# Quantize to Q4_K_M
./build/bin/llama-quantize ../pokemon-red-bf16.gguf ../pokemon-red-q4_k_m.gguf Q4_K_M
```
## Related Models
| Variant | Description | Link |
|---|---|---|
| Full Merged (this) | BFloat16 safetensors, 80B params | Pokemon-Red-Qwen3-80B |
| 4B + GGUF | Smaller model + Q4_K_M quantized | pokemon-red-commander-qwen3-4b |
## Intended Use
- Pokemon Red gameplay advisory and strategy analysis
- Educational demonstration of QLoRA fine-tuning on large MoE models
- Game AI research
## Limitations
- Trained exclusively on Gen 1 (Pokemon Red/Blue) data — may hallucinate about later generations
- Small training set (~1,000 examples) — responses may lack diversity
- Strategic advice quality depends on accurate game state description
- Not designed for direct game control — provides text-based recommendations only
- Full model requires significant VRAM (~160 GB for BF16, or use quantization/offloading)
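The VRAM figure follows directly from the parameter count (weights only, excluding KV cache and activations):

```python
params = 79_676_455_680  # from the training table
bytes_per_param = 2      # BFloat16

gb = params * bytes_per_param / 1e9
print(f"~{gb:.0f} GB of weights in BF16")  # ~159 GB, i.e. the ~160 GB figure
```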
## Citation

```bibtex
@misc{pokemon-red-commander-2026,
  title={Pokemon Red Strategic Commander},
  author={clarkkitchen22},
  year={2026},
  url={https://huggingface.co/clarkkitchen22/Pokemon-Red-Qwen3-80B}
}
```
## License
MIT