SevZero SFT-primary adapter

LoRA adapter for unsloth/Meta-Llama-3.1-8B-Instruct, trained for the SevZero OpenEnv India Hackathon 2026 submission.

What this is

This is the supervised fine-tuning (SFT) stage of the SevZero pipeline. It was trained on curated incident-response trajectories generated by frontier teacher models acting in the SevZero SRE simulator.

Base model: unsloth/Meta-Llama-3.1-8B-Instruct
Training stack: transformers + peft + trl.SFTTrainer
Adapter: LoRA, rank around 64 over attention and MLP modules
Precision: bf16 adapter training on a single H200 HF Job
Steps: 200
Learning rate: 1e-5
Max sequence length: 1024
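The settings above can be collected into a small configuration sketch. This is not the actual training script. The rank is taken as exactly 64 although the card only says "around 64", the module names are the standard Llama projection layers and are an assumption about what "attention and MLP modules" covers, and values the card does not state (such as LoRA alpha or dropout) are left at library defaults.

```python
# Hyperparameters as listed in this card, gathered in plain dicts.
LORA_KWARGS = {
    "r": 64,  # card says "around 64"; exact value is an assumption
    "target_modules": [
        # attention projections (assumed Llama module names)
        "q_proj", "k_proj", "v_proj", "o_proj",
        # MLP projections (assumed Llama module names)
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

SFT_HPARAMS = {
    "max_steps": 200,
    "learning_rate": 1e-5,
    "bf16": True,
}

# The name of the sequence-length field has changed across trl releases,
# so it is kept separate here instead of guessing the keyword.
MAX_SEQ_LEN = 1024


def make_lora_config():
    """Build a peft LoraConfig from the values above (requires peft)."""
    from peft import LoraConfig  # imported lazily so the dicts work without peft

    return LoraConfig(**LORA_KWARGS)
```

The entries in `SFT_HPARAMS` are standard training-argument fields, so they can be passed to whichever trl configuration class is installed alongside the sequence-length setting.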

The unsloth/ repository is used here only as an ungated mirror of the base-model weights. Training for this stage ran through trl.SFTTrainer, and the GRPO stage did not use Unsloth as the trainer either.
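Because the base weights come from an ungated mirror, the adapter should load with plain transformers and peft. A minimal loading sketch follows. It assumes the adapter is published as PhaseOfCode/sevzero-llama3-8b-sft-primary, that `accelerate` is installed for `device_map="auto"`, and that bf16 is appropriate for the target hardware. The helper name `load_adapter` is hypothetical.

```python
BASE_ID = "unsloth/Meta-Llama-3.1-8B-Instruct"
ADAPTER_ID = "PhaseOfCode/sevzero-llama3-8b-sft-primary"  # assumed repo id


def load_adapter(base_id=BASE_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the LoRA adapter (downloads weights)."""
    # Heavy imports are kept inside the function so the module imports cheaply.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # matches the bf16 training precision
        device_map="auto",           # assumption: accelerate is available
    )
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter()` downloads roughly the full 8B base model plus the adapter, so it is best run on a GPU machine.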

Eval summary

Held-out seeds: 13, 99, 777. Tasks: Easy, Medium, Hard.

Model	Easy	Medium	Hard	Mean
Untrained Llama-3.1-8B-Instruct	0.8199	0.9419	0.6369	0.7996
SFT-primary	0.8199	0.9419	0.6269	0.7962

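SFT improved formatting and tool-call priors but did not improve held-out decision quality: Easy and Medium scores are unchanged, and Hard dropped slightly (0.6369 to 0.6269), which moves the mean from 0.7996 to 0.7962. That flat result is discussed in the SevZero blog.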
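The Mean column is consistent with an unweighted average of the three task scores. A quick check, assuming that is how it was computed (the card does not say so explicitly):

```python
# Per-task scores copied from the eval table above.
scores = {
    "Untrained Llama-3.1-8B-Instruct": {"Easy": 0.8199, "Medium": 0.9419, "Hard": 0.6369},
    "SFT-primary": {"Easy": 0.8199, "Medium": 0.9419, "Hard": 0.6269},
}


def mean_score(per_task):
    """Unweighted mean over tasks (assumed aggregation)."""
    return sum(per_task.values()) / len(per_task)


# Round to four decimals to match the table's precision.
means = {name: round(mean_score(tasks), 4) for name, tasks in scores.items()}

# Difference between the SFT adapter and the untrained base model.
delta = round(means["SFT-primary"] - means["Untrained Llama-3.1-8B-Instruct"], 4)

for name, value in means.items():
    print(f"{name}: {value:.4f}")
print(f"SFT minus untrained: {delta:+.4f}")
```

The whole gap comes from the Hard task, since the Easy and Medium scores are identical for both models.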
