Pokemon Red Strategic Commander (Qwen3-Coder-Next 80B Merged)

An AI-powered strategic brain for Pokemon Red, fine-tuned from Qwen3-Coder-Next (80B total / 3B active MoE) using QLoRA — full-precision merged weights.

This is the full merged model (BFloat16 safetensors). For a quantized version, see the GGUF / 4B variant.

Model Description

This model is a QLoRA fine-tune of Qwen/Qwen3-Coder-Next with LoRA adapters merged back into the full-precision weights. It provides expert-level Pokemon Red gameplay guidance — analyzing game state and providing actionable strategic recommendations.

Rather than playing the game directly, it acts as an expert advisory system for Gen 1 Pokemon battles, team building, route planning, and overall strategy.

Architecture

Parameter Value
Architecture Qwen3-Coder-Next (Hybrid MoE)
Total Parameters 80B
Active Parameters 3B (MoE routing)
Hidden Dimension 2048
Layers 48 (Hybrid: Gated DeltaNet + Gated Attention + MoE)
Experts 512 total, 10 active + 1 shared
Context Length 262,144 tokens
Precision BFloat16
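
These values can be verified directly from the model's config.json without downloading any weights. A minimal sketch (hidden_size and num_hidden_layers are standard config fields; the expert-count attribute names are assumptions and may differ for this architecture):

from transformers import AutoConfig

# Loads only config.json, not the 80B weights.
cfg = AutoConfig.from_pretrained("clarkkitchen22/Pokemon-Red-Qwen3-80B")

print(cfg.hidden_size)        # expected: 2048
print(cfg.num_hidden_layers)  # expected: 48
# Expert counts live under architecture-specific keys; the names below are assumptions.
print(getattr(cfg, "num_experts", None), getattr(cfg, "num_experts_per_tok", None))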

Training Details

Parameter Value
Base Model Qwen/Qwen3-Coder-Next (80B total, 3B active MoE)
Method QLoRA (4-bit quantization during training)
LoRA Rank 8
LoRA Alpha 16
Target Modules q_proj, k_proj, v_proj, o_proj
Trainable Parameters 2,064,384 / 79,676,455,680 (0.003%)
Training Examples ~1,000 (903 train / 53 val / 48 test)
Epochs 3
Batch Size 16 (1 x 16 grad accum)
Learning Rate 2e-4 (cosine schedule)
Optimizer Paged AdamW 8-bit
Precision BFloat16
Hardware NVIDIA H100 80GB HBM3
Framework Unsloth 2026.2.1 + PyTorch 2.6.0
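
For readers who want to reproduce a similar run, the table maps onto a standard QLoRA recipe. The following is a minimal sketch using PEFT and Hugging Face TrainingArguments; the actual run used Unsloth, so treat the exact argument names and the output path as assumptions rather than the original training script:

from peft import LoraConfig
from transformers import TrainingArguments

# LoRA configuration matching the table: rank 8, alpha 16, attention projections only.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)

# Effective batch size 16 = 1 per device x 16 gradient-accumulation steps.
training_args = TrainingArguments(
    output_dir="pokemon-red-qlora",  # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    bf16=True,
)

In a full QLoRA run the base model would also be loaded in 4-bit (e.g. via bitsandbytes) before the adapters are attached; only the adapter weights listed above are trained.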

Loss Curve

Step Loss Epoch
50 0.3827 0.89
60 0.3216 1.05
70 0.2321 1.23
80 0.2227 1.41
90 0.2546 1.58
110 0.1795 1.94
120 0.2046 2.11
130 0.2135 2.28
150 0.2212 2.64
160 0.1703 2.82

Training Data

Trained on ~1,000 curated instruction-response pairs (903 train / 53 val / 48 test) covering:

  • Pokedex Knowledge — Stats, types, evolution chains for all 151 Gen 1 Pokemon
  • Move Knowledge — Move stats, type effectiveness, PP management
  • Battle Strategy — Type matchups, damage calculation, switch decisions
  • Team Building — Optimal team compositions, coverage analysis
  • Route Planning — Efficient progression through Kanto
  • Gym Strategy — Leader teams, weaknesses, recommended counters
  • Elite Four — Championship preparation and strategy
  • Game Mechanics — Gen 1 quirks (badge boosts, Ghost/Psychic bug, crit formula, etc.)
  • Speedrun Tactics — Optimized routing and execution

Data was sourced from PokeAPI, Bulbapedia, and the pret/pokered disassembly project, then converted into ChatML-formatted instruction pairs.
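
As an illustration of that format, a single training pair looks roughly like the following after ChatML conversion. The example below is hypothetical, written for this card rather than taken from the dataset:

<|im_start|>system
You are the Strategic Commander for a Pokemon Red playthrough. Analyze the game state and provide optimal decisions.<|im_end|>
<|im_start|>user
Which Gym does Lt. Surge lead, and what type should I bring to counter his team?<|im_end|>
<|im_start|>assistant
Lt. Surge leads the Vermilion City Gym with Electric-type Pokemon. Ground-type attackers are the strongest counter: Ground is immune to Electric and hits back super-effectively. A Diglett or Dugtrio caught in Diglett's Cave is the standard answer at this point in the game.<|im_end|>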

Usage

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "clarkkitchen22/Pokemon-Red-Qwen3-80B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "clarkkitchen22/Pokemon-Red-Qwen3-80B"
)

messages = [
    {"role": "system", "content": "You are the Strategic Commander for a Pokemon Red playthrough. Analyze the game state and provide optimal decisions."},
    {"role": "user", "content": "I'm about to fight Misty. My team is Charmeleon (lv 22) with Ember, Slash, Leer, Rage. What should I do?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With vLLM

vllm serve clarkkitchen22/Pokemon-Red-Qwen3-80B \
    --port 8000 \
    --tensor-parallel-size 2 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder

With SGLang

python -m sglang.launch_server \
    --model clarkkitchen22/Pokemon-Red-Qwen3-80B \
    --port 30000 \
    --tp-size 2 \
    --tool-call-parser qwen3_coder
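
Both vLLM and SGLang expose an OpenAI-compatible API once the server is up, so the model can be queried with the standard openai client. A minimal sketch against the vLLM command above (for SGLang, change base_url to port 30000):

from openai import OpenAI

# The api_key value is unused by a local server but required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="clarkkitchen22/Pokemon-Red-Qwen3-80B",
    messages=[
        {"role": "system", "content": "You are the Strategic Commander for a Pokemon Red playthrough."},
        {"role": "user", "content": "Best lead against Brock with a level 12 Squirtle?"},
    ],
    temperature=0.3,
    max_tokens=512,
)
print(response.choices[0].message.content)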

Convert to GGUF

You can quantize this model yourself using llama.cpp:

# Pull the model
git lfs install
git clone https://huggingface.co/clarkkitchen22/Pokemon-Red-Qwen3-80B

# Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release -j
pip install -r requirements.txt

# Convert to a BF16 GGUF, then quantize to Q4_K_M with llama-quantize
# (convert_hf_to_gguf.py itself only emits f32/f16/bf16/q8_0-style outputs)
python convert_hf_to_gguf.py ../Pokemon-Red-Qwen3-80B --outtype bf16 --outfile ../Pokemon-Red-Qwen3-80B-BF16.gguf
./build/bin/llama-quantize ../Pokemon-Red-Qwen3-80B-BF16.gguf ../Pokemon-Red-Qwen3-80B-Q4_K_M.gguf Q4_K_M

Related Models

  • Full Merged (this model): BFloat16 safetensors, 80B parameters (Pokemon-Red-Qwen3-80B)
  • 4B + GGUF: smaller model plus a Q4_K_M quantized build (pokemon-red-commander-qwen3-4b)

Intended Use

  • Pokemon Red gameplay advisory and strategy analysis
  • Educational demonstration of QLoRA fine-tuning on large MoE models
  • Game AI research

Limitations

  • Trained exclusively on Gen 1 (Pokemon Red/Blue) data — may hallucinate about later generations
  • Small training set (~1,000 examples) — responses may lack diversity
  • Strategic advice quality depends on accurate game state description
  • Not designed for direct game control — provides text-based recommendations only
  • Full model requires significant VRAM (~160 GB for BF16, or use quantization/offloading)
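
One way to address the last point is to load the merged weights in 4-bit with bitsandbytes, which cuts the weight footprint to roughly 40-45 GB plus activations. A minimal sketch, untested on this checkpoint, so whether the hybrid MoE layers quantize cleanly under bitsandbytes is an assumption:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization with BF16 compute, the same scheme used for QLoRA training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "clarkkitchen22/Pokemon-Red-Qwen3-80B",
    quantization_config=bnb_config,
    device_map="auto",  # spreads layers across available GPUs, with CPU offload if needed
)
tokenizer = AutoTokenizer.from_pretrained("clarkkitchen22/Pokemon-Red-Qwen3-80B")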

Citation

@misc{pokemon-red-commander-2026,
  title={Pokemon Red Strategic Commander},
  author={clarkkitchen22},
  year={2026},
  url={https://huggingface.co/clarkkitchen22/Pokemon-Red-Qwen3-80B}
}

License

MIT
