alexchapin/portkit-coder-8b-grpo7

Self-reflection RL fine-tuned model for Minecraft Java (Forge) to Bedrock Add-on conversion.

Model Details

  • Base model: Qwen/Qwen3-8B
  • Training method: GRPO with self-reflection rewards (inspired by ReflexiCoder)
  • Checkpoint: GRPO7 final (100 steps, group_size=12)
  • Learning rate: 1e-6

Reward Components

Component Weight Description
manifest_completeness 0.20 format_version, header, uuid, version validation
structure_building 0.20 JSON structure, module types
api_correctness 0.20 @minecraft/server API usage
js_syntax 0.20 JavaScript syntax validity
self_reflection 0.20 Correction pattern detection

Training Details

  • Steps: 100
  • Group size: 12
  • Reward function: Self-reflection rewards inspired by ReflexiCoder
  • Key insight: Lower LR (1e-6) for stability; slightly better JS API correctness (72.7% vs 72.5%)

Usage

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("alexchapin/portkit-coder-8b-grpo7")

Framework Versions

  • tinker-cookbook: 0.4.1
  • transformers: 5.5.3
  • torch: 2.11.0+rocm7.2
Downloads last month
27
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for alexchapin/portkit-coder-8b-grpo7

Finetuned
Qwen/Qwen3-8B
Finetuned
(1609)
this model