---
license: apache-2.0
language:
- en
base_model: unsloth/Qwen2.5-14B-Instruct-bnb-4bit
pipeline_tag: text-generation
library_name: transformers
tags:
- reinforcement-learning
- grpo
- openenv
- blast-radius
- qwen2.5
- unsloth
- sft
- rl-environment
metrics:
- type: reward
  value: 0.72
  name: Mean Episode Reward
- type: reward
  value: 0.75
  name: Format Reward
- type: kl_divergence
  value: 0.48
  name: KL Divergence
---

# BlastRadius: GRPO Model Checkpoints

This repository contains the trained model checkpoints.

## Live Demo

https://huggingface.co/spaces/Idred/BlastRadius-OpenEnv

## Training Notebook

https://huggingface.co/spaces/Idred/BlastRadius-OpenEnv/blob/main/BlastRadius_A100_Training_v2.ipynb

## Training Details

- **Hardware:** Hugging Face Jobs (H200 GPU)
- **Framework:** PyTorch 2.6 (CUDA 12.4)
- **Approach:** Supervised fine-tuning (SFT) + Group Relative Policy Optimization (GRPO) reinforcement learning
- **Experiment Tracking:** Weights & Biases (WandB)

## Note

- The Space provides the complete working demo.
- The notebook contains the full training pipeline and the steps needed to reproduce it.
- This repository hosts model checkpoints only.
- Hugging Face Jobs runs are not publicly accessible by design, so the notebook serves as the verifiable training record.
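
The **Approach** entry above names GRPO, and the metadata reports a KL divergence alongside the rewards. As a rough illustration of what those quantities mean, here is a minimal sketch, not code taken from the training notebook, of two things GRPO-style training typically computes: group-relative advantages, where each sampled completion's reward is normalized against the other completions for the same prompt, and a per-token KL estimate of the policy against a frozen reference model. The function names, the `eps` value, the use of the population standard deviation, and the choice of the "k3" KL estimator are assumptions for illustration; the notebook may implement these differently.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Score each completion relative to its own sampling group.

    Hypothetical helper: subtracts the group mean and divides by the group's
    population standard deviation (plus eps, so a group with identical rewards
    yields all-zero advantages instead of a division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_k3(logp_policy: float, logp_ref: float) -> float:
    """Per-token estimate of KL(policy || reference).

    Hypothetical helper using the "k3" estimator exp(d) - d - 1 with
    d = logp_ref - logp_policy; it is zero when both log-probs match and
    never negative, since exp(d) >= 1 + d for every d.
    """
    d = logp_ref - logp_policy
    return math.exp(d) - d - 1.0


if __name__ == "__main__":
    # Four completions sampled for the same prompt, with made-up rewards.
    rewards = [1.0, 0.0, 0.5, 0.5]
    print(group_relative_advantages(rewards))
    # Identical log-probs give zero KL; diverging ones give a positive value.
    print(kl_k3(-1.0, -1.0), kl_k3(-1.0, -2.0))
```

With the sample rewards, the best completion gets a positive advantage, the worst a negative one, and the two average completions get zero; the advantages sum to zero within the group. Normalizing within the group is what lets GRPO skip a separately trained value model.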