How to use `PhaseOfCode/sevzero-llama3-8b-grpo-primary` with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model the adapter was trained on
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Meta-Llama-3.1-8B-Instruct")
# Attach the GRPO-primary LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(base_model, "PhaseOfCode/sevzero-llama3-8b-grpo-primary")
```
---
base_model: unsloth/Meta-Llama-3.1-8B-Instruct
library_name: peft
license: llama3.1
tags:
- sevzero
- openenv
- grpo
- lora
- sre
---
# SevZero GRPO-primary adapter

LoRA adapter produced by the primary GRPO run for the SevZero OpenEnv India Hackathon 2026 submission.
## Training recipe

- Initialization: SFT checkpoint `PhaseOfCode/sevzero-llama3-8b-sft-primary`
- Base model: `unsloth/Meta-Llama-3.1-8B-Instruct`
- RL method: GRPO via TRL, trained against the live SevZero FastAPI/OpenEnv interface
- Steps: 120
- Learning rate: `7e-6`
- Group size: 4 generations per prompt
- Sampling temperature: 0.85
- Beta (KL penalty coefficient): 0.04
- LR scheduler: cosine
- vLLM: colocate mode, GPU memory utilization 0.55
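A minimal sketch of how the recipe above might map onto keyword arguments for TRL's `GRPOConfig`. The field names assume a recent TRL release that supports `vllm_mode="colocate"` (roughly 0.18 or later); `output_dir` is a hypothetical placeholder, and settings this card does not list (batch size, prompt and completion lengths, reward functions, warmup) are left out. The helper shows the shape of a plain cosine decay over the 120 steps with no warmup, which is itself an assumption about the actual run.

```python
import math

# Recipe values keyed by TRL GRPOConfig field names.
# Assumption: a TRL version with vLLM colocate support; "output_dir" is hypothetical.
grpo_kwargs = {
    "output_dir": "sevzero-grpo-primary",  # hypothetical path
    "max_steps": 120,
    "learning_rate": 7e-6,
    "num_generations": 4,            # group size per prompt
    "temperature": 0.85,
    "beta": 0.04,                    # KL penalty coefficient
    "lr_scheduler_type": "cosine",
    "use_vllm": True,
    "vllm_mode": "colocate",
    "vllm_gpu_memory_utilization": 0.55,
}
# With TRL installed: from trl import GRPOConfig; config = GRPOConfig(**grpo_kwargs)


def cosine_lr(step: int, total_steps: int = 120, peak_lr: float = 7e-6) -> float:
    """Cosine decay from peak_lr to 0 over total_steps, assuming no warmup."""
    progress = min(step, total_steps) / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 30, 60, 90, 120):
        print(step, f"{cosine_lr(step):.3e}")
```

Under these assumptions the learning rate starts at `7e-6`, passes `3.5e-6` at step 60, and reaches zero at step 120, so the later steps make progressively smaller updates.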
Training produced nonzero reward variance, nonzero gradients, and measurable KL divergence from the reference policy, so the optimization loop was active. However, the held-out evaluation showed no score lift.
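Reward variance matters because GRPO scores each completion relative to the other completions for the same prompt (4 here): a completion's advantage is its reward minus the group mean, typically divided by the group standard deviation. If all four completions earn the same reward, every advantage is zero and that prompt contributes no policy-gradient signal. The sketch below is a simplified, self-contained illustration of that normalization and of the "k3" per-token KL estimator commonly used for the beta-weighted penalty. It is not the SevZero training code; the epsilon and standard-deviation convention are assumptions that vary between implementations, and the rewards are illustrative values, not SevZero logs.

```python
import math
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Group-relative advantages: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mean) / (std + eps) for r in rewards]


def k3_kl(logp: float, ref_logp: float) -> float:
    """Per-token KL estimate exp(ref - logp) - (ref - logp) - 1; zero when policy == reference."""
    delta = ref_logp - logp
    return math.exp(delta) - delta - 1.0


if __name__ == "__main__":
    # Illustrative rewards for one group of 4 generations (not real SevZero data).
    flat_group = [0.8, 0.8, 0.8, 0.8]    # no variance: all advantages are zero
    mixed_group = [0.2, 0.9, 0.5, 0.8]   # variance: nonzero advantages
    print(group_advantages(flat_group))
    print([round(a, 3) for a in group_advantages(mixed_group)])
    # KL is zero until the policy's log-probs move away from the reference.
    print(k3_kl(-1.0, -1.0), round(k3_kl(-1.0, -1.2), 4))
```

Nonzero advantages and a growing KL term only show that the policy is moving; they say nothing about whether the movement improves held-out scores.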
## Eval summary

Held-out seeds: `13`, `99`, `777`. Tasks: Easy, Medium, Hard.

| Model | Easy | Medium | Hard | Mean |
|---|---:|---:|---:|---:|
| Untrained Llama-3.1-8B-Instruct | 0.8199 | 0.9419 | 0.6369 | 0.7996 |
| GRPO-primary | 0.8199 | 0.9419 | 0.6369 | 0.7996 |
The honest conclusion: 120 GRPO steps were not enough to change the deterministic held-out outcomes, and GRPO-primary matches the untrained baseline exactly on every task. SevZero's contribution is the environment, the training harness, and the reproducible failure surface.
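The Mean column is the unweighted average of the Easy, Medium, and Hard scores. A small sketch that recomputes it from the table values and takes the per-task difference between the two rows, making the zero lift explicit; the dictionary layout and function names are just for illustration.

```python
# Per-task scores copied from the eval table (held-out seeds 13, 99, 777).
SCORES = {
    "Untrained Llama-3.1-8B-Instruct": {"Easy": 0.8199, "Medium": 0.9419, "Hard": 0.6369},
    "GRPO-primary": {"Easy": 0.8199, "Medium": 0.9419, "Hard": 0.6369},
}


def task_mean(scores: dict[str, float]) -> float:
    """Unweighted mean across tasks, as in the table's Mean column."""
    return sum(scores.values()) / len(scores)


def lift(baseline: dict[str, float], trained: dict[str, float]) -> dict[str, float]:
    """Per-task score change of the trained adapter relative to the baseline."""
    return {task: round(trained[task] - baseline[task], 4) for task in baseline}


if __name__ == "__main__":
    base = SCORES["Untrained Llama-3.1-8B-Instruct"]
    grpo = SCORES["GRPO-primary"]
    print(f"baseline mean: {task_mean(base):.4f}")
    print(f"GRPO mean:     {task_mean(grpo):.4f}")
    print("lift:", lift(base, grpo))
```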
## Links

- Final mirrored adapter: https://huggingface.co/Mist-ic/sevzero-llama3-8b-grpo
- Environment Space: https://huggingface.co/spaces/Mist-ic/sevzero-env
- Blog: https://huggingface.co/spaces/Mist-ic/sevzero-env/blob/main/BLOG.md
- Eval dataset: https://huggingface.co/datasets/Mist-ic/sevzero-eval-results