---
base_model: unsloth/Meta-Llama-3.1-8B-Instruct
library_name: peft
license: llama3.1
tags:
- sevzero
- openenv
- sft
- lora
- sre
---
# SevZero SFT-primary adapter
LoRA adapter for `unsloth/Meta-Llama-3.1-8B-Instruct`, trained for the SevZero OpenEnv India Hackathon 2026 submission.
## What this is
This adapter is the supervised fine-tuning (SFT) stage of the SevZero pipeline. It was trained on curated incident-response trajectories collected by running frontier teacher models against the SevZero SRE simulator.
- Base model: `unsloth/Meta-Llama-3.1-8B-Instruct`
- Training stack: `transformers + peft + trl.SFTTrainer`
- Adapter: LoRA, rank ≈64, applied to the attention and MLP projection modules
- Precision: bf16 adapter training on a single H200 HF Job
- Steps: 200
- Learning rate: `1e-5`
- Max sequence length: 1024
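The hyperparameters above map onto a `trl` + `peft` setup roughly like this. This is a sketch, not the exact training script: the LoRA alpha, target-module list, and dataset split are assumptions not stated on this card (the dataset id comes from the Links section below).

```python
# Hedged sketch of the SFT stage: LoRA rank ~64 over attention/MLP
# projections, bf16, 200 steps, lr 1e-5, max sequence length 1024.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

peft_config = LoraConfig(
    r=64,
    lora_alpha=64,  # assumption: alpha is not stated on this card
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",       # MLP projections
    ],
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="sft-primary",
    max_steps=200,
    learning_rate=1e-5,
    bf16=True,
    max_length=1024,  # called `max_seq_length` in older trl releases
)

trainer = SFTTrainer(
    model="unsloth/Meta-Llama-3.1-8B-Instruct",
    # assumption: a "train" split exists in the trajectories dataset
    train_dataset=load_dataset("Mist-ic/sevzero-expert-trajectories", split="train"),
    args=args,
    peft_config=peft_config,
)
trainer.train()
```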
Unsloth is used here only as an ungated mirror of the base model; the downstream GRPO stage did not use Unsloth as its trainer.
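## How to use

A minimal inference sketch with `peft`. The adapter repo id `Mist-ic/sevzero-sft-primary` is a placeholder (substitute the id of the repo this card lives in), and the example prompt is illustrative only.

```python
# Hedged sketch: load the base model, then apply this LoRA adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Placeholder adapter repo id -- replace with this repo's actual id.
model = PeftModel.from_pretrained(base, "Mist-ic/sevzero-sft-primary")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Meta-Llama-3.1-8B-Instruct")

messages = [{"role": "user", "content": "Triage: checkout p99 latency spiked."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```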
## Eval summary
Held-out seeds: `13`, `99`, `777`. Tasks: Easy, Medium, Hard.
| Model | Easy | Medium | Hard | Mean |
|---|---:|---:|---:|---:|
| Untrained Llama-3.1-8B-Instruct | 0.8199 | 0.9419 | 0.6369 | 0.7996 |
| SFT-primary | 0.8199 | 0.9419 | 0.6269 | 0.7962 |
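The Mean column is the unweighted average across the three task tiers, which can be checked directly from the table:

```python
# Per-tier scores copied from the table above; Mean = unweighted average.
scores = {
    "Untrained Llama-3.1-8B-Instruct": [0.8199, 0.9419, 0.6369],
    "SFT-primary": [0.8199, 0.9419, 0.6269],
}
means = {name: round(sum(v) / len(v), 4) for name, v in scores.items()}
print(means)
# → {'Untrained Llama-3.1-8B-Instruct': 0.7996, 'SFT-primary': 0.7962}
```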
SFT improved formatting and tool-call priors, but did not improve held-out decision quality. That flat result is discussed in the SevZero blog.
## Links
- Environment Space: https://huggingface.co/spaces/Mist-ic/sevzero-env
- Blog: https://huggingface.co/spaces/Mist-ic/sevzero-env/blob/main/BLOG.md
- Eval dataset: https://huggingface.co/datasets/Mist-ic/sevzero-eval-results
- Training data: https://huggingface.co/datasets/Mist-ic/sevzero-expert-trajectories