Files changed (1)
  1. README.md +33 -1
README.md CHANGED
@@ -1,21 +1,53 @@
+---
+license: apache-2.0
+language:
+- en
+base_model: unsloth/Qwen2.5-14B-Instruct-bnb-4bit
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- reinforcement-learning
+- grpo
+- openenv
+- blast-radius
+- qwen2.5
+- unsloth
+- sft
+- rl-environment
+metrics:
+- type: reward
+  value: 0.72
+  name: Mean Episode Reward
+- type: reward
+  value: 0.75
+  name: Format Reward
+- type: reward
+  value: 0.48
+  name: KL Divergence
+---
+
 # BlastRadius — GRPO Model Checkpoints
 
 This repository contains the trained model checkpoints.
 
 ## Live Demo
+
 https://huggingface.co/spaces/Idred/BlastRadius-OpenEnv
 
 ## Training Notebook
+
 https://huggingface.co/spaces/Idred/BlastRadius-OpenEnv/blob/main/BlastRadius_A100_Training_v2.ipynb
 
 ## Training Details
+
 - **Hardware:** Hugging Face Jobs (H200 GPU)
 - **Framework:** PyTorch 2.6 (CUDA 12.4)
 - **Approach:** SFT + GRPO (Reinforcement Learning)
 - **Experiment Tracking:** Weights & Biases (WandB)
 
 ## Note
+
 - The Space provides the complete working demo
 - The notebook contains the full training pipeline and reproducible steps
 - This repository is for model checkpoints only
-- HF Jobs are not publicly accessible by design the notebook serves as the verifiable training record
+- HF Jobs are not publicly accessible by design - the notebook serves as the verifiable training record
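The `+33 -1` summary and the `@@ -1,21 +1,53 @@` hunk header describe one hunk spanning all 21 old lines and all 53 new lines. As a rough cross-check, they can be reproduced with Python's standard-library `difflib`. The sketch below assumes the change consists of exactly three edits inferred from this diff (not taken from any tooling in the repository): prepending the 28-line YAML front matter, adding a blank line under each `##` heading, and inserting a separator dash into the last bullet. The helper name `build_new` is hypothetical.

```python
import difflib

# Old README.md (left side of the diff), one list entry per line.
OLD = """# BlastRadius — GRPO Model Checkpoints

This repository contains the trained model checkpoints.

## Live Demo
https://huggingface.co/spaces/Idred/BlastRadius-OpenEnv

## Training Notebook
https://huggingface.co/spaces/Idred/BlastRadius-OpenEnv/blob/main/BlastRadius_A100_Training_v2.ipynb

## Training Details
- **Hardware:** Hugging Face Jobs (H200 GPU)
- **Framework:** PyTorch 2.6 (CUDA 12.4)
- **Approach:** SFT + GRPO (Reinforcement Learning)
- **Experiment Tracking:** Weights & Biases (WandB)

## Note
- The Space provides the complete working demo
- The notebook contains the full training pipeline and reproducible steps
- This repository is for model checkpoints only
- HF Jobs are not publicly accessible by design the notebook serves as the verifiable training record""".split("\n")

# Front matter added at the top; the trailing "\n" yields the blank 28th line.
FRONT_MATTER = """---
license: apache-2.0
language:
- en
base_model: unsloth/Qwen2.5-14B-Instruct-bnb-4bit
pipeline_tag: text-generation
library_name: transformers
tags:
- reinforcement-learning
- grpo
- openenv
- blast-radius
- qwen2.5
- unsloth
- sft
- rl-environment
metrics:
- type: reward
  value: 0.72
  name: Mean Episode Reward
- type: reward
  value: 0.75
  name: Format Reward
- type: reward
  value: 0.48
  name: KL Divergence
---
""".split("\n")


def build_new(old):
    """Hypothetical rebuild of the new README, assuming only the three edits named above."""
    new = list(FRONT_MATTER)
    for line in old:
        if line.startswith("- HF Jobs"):
            # The one modified line: a separator dash is added after "by design".
            line = line.replace("by design the", "by design - the")
        new.append(line)
        if line.startswith("## "):
            # Each second-level heading gains a blank line below it.
            new.append("")
    return new


NEW = build_new(OLD)
diff = list(difflib.unified_diff(OLD, NEW, "a/README.md", "b/README.md", lineterm=""))
body = diff[2:]  # skip the "---"/"+++" file header lines
hunks = [line for line in body if line.startswith("@@")]
added = sum(1 for line in body if line.startswith("+"))
removed = sum(1 for line in body if line.startswith("-"))

print(hunks)                    # ['@@ -1,21 +1,53 @@']
print(f"+{added} -{removed}")   # +33 -1
```

Because the unchanged runs between edits are never longer than six lines, `difflib`'s default three lines of context merge every change into a single hunk, which matches the one hunk shown here.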