---
base_model: unsloth/Meta-Llama-3.1-8B-Instruct
library_name: peft
license: llama3.1
tags:
- sevzero
- openenv
- sft
- lora
- sre
---

# SevZero SFT-primary adapter

LoRA adapter for `unsloth/Meta-Llama-3.1-8B-Instruct`, trained for the SevZero OpenEnv India Hackathon 2026 submission.

## What this is

This is the supervised fine-tuning stage of the SevZero pipeline. It was trained on curated incident-response trajectories collected from frontier teachers against the SevZero SRE simulator.

- Base model: `unsloth/Meta-Llama-3.1-8B-Instruct`
- Training stack: `transformers + peft + trl.SFTTrainer`
- Adapter: LoRA, rank around 64 over attention and MLP modules
- Precision: bf16 adapter training on a single H200 HF Job
- Steps: 200
- Learning rate: `1e-5`
- Max sequence length: 1024

Unsloth is used here as the ungated base-model mirror. The GRPO stage did not use Unsloth as the trainer.

## Eval summary

Held-out seeds: `13`, `99`, `777`. Tasks: Easy, Medium, Hard.

| Model | Easy | Medium | Hard | Mean |
|---|---:|---:|---:|---:|
| Untrained Llama-3.1-8B-Instruct | 0.8199 | 0.9419 | 0.6369 | 0.7996 |
| SFT-primary | 0.8199 | 0.9419 | 0.6269 | 0.7962 |

SFT improved formatting and tool-call priors, but did not improve held-out decision quality. That flat result is discussed in the SevZero blog.

## Links

- Environment Space: https://huggingface.co/spaces/Mist-ic/sevzero-env
- Blog: https://huggingface.co/spaces/Mist-ic/sevzero-env/blob/main/BLOG.md
- Eval dataset: https://huggingface.co/datasets/Mist-ic/sevzero-eval-results
- Training data: https://huggingface.co/datasets/Mist-ic/sevzero-expert-trajectories
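The adapter setup described above (LoRA, rank around 64 over attention and MLP modules) can be sketched as a `peft` config. Only the rank comes from this card; `lora_alpha`, `lora_dropout`, and the exact Llama projection names are assumptions, since the card does not state them.

```python
from peft import LoraConfig

# Sketch of the SFT-primary adapter config. r=64 matches the card's
# "rank around 64"; alpha, dropout, and the specific projection-module
# names below are assumptions, not values stated on the card.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,        # assumption: not stated on the card
    lora_dropout=0.05,     # assumption: not stated on the card
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```

A config like this would be passed to `trl.SFTTrainer` via its `peft_config` argument, matching the training stack listed above.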
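For readers checking the eval table: the Mean column is consistent with an unweighted average over the three task tiers, which the baseline row's numbers reproduce.

```python
# Unweighted average over the three held-out task tiers reproduces the
# table's Mean column (untrained baseline row).
easy, medium, hard = 0.8199, 0.9419, 0.6369
mean = round((easy + medium + hard) / 3, 4)
print(mean)  # 0.7996
```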