SevZero SFT-primary adapter
LoRA adapter for unsloth/Meta-Llama-3.1-8B-Instruct, trained for the SevZero OpenEnv India Hackathon 2026 submission.
What this is
This is the supervised fine-tuning stage of the SevZero pipeline. It was trained on curated incident-response trajectories collected from frontier teachers against the SevZero SRE simulator.
- Base model:
unsloth/Meta-Llama-3.1-8B-Instruct - Training stack:
transformers + peft + trl.SFTTrainer - Adapter: LoRA, rank around 64 over attention and MLP modules
- Precision: bf16 adapter training on a single H200 HF Job
- Steps: 200
- Learning rate:
1e-5 - Max sequence length: 1024
Unsloth is used here as the ungated base-model mirror. The GRPO stage did not use Unsloth as the trainer.
Eval summary
Held-out seeds: 13, 99, 777. Tasks: Easy, Medium, Hard.
| Model | Easy | Medium | Hard | Mean |
|---|---|---|---|---|
| Untrained Llama-3.1-8B-Instruct | 0.8199 | 0.9419 | 0.6369 | 0.7996 |
| SFT-primary | 0.8199 | 0.9419 | 0.6269 | 0.7962 |
SFT improved formatting and tool-call priors, but did not improve held-out decision quality. That flat result is discussed in the SevZero blog.
Links
- Environment Space: https://huggingface.co/spaces/Mist-ic/sevzero-env
- Blog: https://huggingface.co/spaces/Mist-ic/sevzero-env/blob/main/BLOG.md
- Eval dataset: https://huggingface.co/datasets/Mist-ic/sevzero-eval-results
- Training data: https://huggingface.co/datasets/Mist-ic/sevzero-expert-trajectories
- Downloads last month
- 49
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐ Ask for provider support
Model tree for PhaseOfCode/sevzero-llama3-8b-sft-primary
Base model
meta-llama/Llama-3.1-8B Finetuned
meta-llama/Llama-3.1-8B-Instruct Finetuned
unsloth/Meta-Llama-3.1-8B-Instruct