Arijit-07/aria-devops-llama8b

Text Generation

reinforcement-learning

incident-response


ARIA — DevOps Incident Response Agent

Llama-3.1-8B-Instruct fine-tuned with GRPO (Group Relative Policy Optimization) on the ARIA DevOps Incident Response live RL environment.

Training Results

Task	Baseline	Fine-tuned	Improvement
easy	0.320	0.685	+0.365
medium	0.050	0.378	+0.328
hard	0.190	0.869	+0.679
bonus	0.152	0.682	+0.530
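
The Improvement column is the absolute difference between the fine-tuned and baseline scores. A minimal sketch that recomputes it from the table follows; the unweighted means across the four tasks are an illustration added here, not figures reported by the card, and the helper names are hypothetical.

```python
# Scores copied from the Training Results table: task -> (baseline, fine-tuned).
SCORES = {
    "easy": (0.320, 0.685),
    "medium": (0.050, 0.378),
    "hard": (0.190, 0.869),
    "bonus": (0.152, 0.682),
}


def improvement(baseline: float, finetuned: float) -> float:
    """Absolute gain, rounded to three decimals like the table."""
    return round(finetuned - baseline, 3)


def mean(values) -> float:
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Per-task gains; these should match the Improvement column.
gains = {task: improvement(b, f) for task, (b, f) in SCORES.items()}

# Unweighted means across tasks (illustrative only, not stated in the card).
mean_baseline = mean(b for b, _ in SCORES.values())
mean_finetuned = mean(f for _, f in SCORES.values())

for task, gain in gains.items():
    print(f"{task:<7} +{gain:.3f}")
print(f"mean    {mean_baseline:.4f} -> {mean_finetuned:.4f}")
```

Every task is weighted equally here, which matches the 40-episodes-per-task split listed under Setup.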

Setup

Algorithm: GRPO
Base: Llama-3.1-8B-Instruct
LoRA rank: 32, alpha: 64
Episodes: 160 (40 per task)
GPU: NVIDIA L4 (training time: 162 minutes)
Framework: Unsloth + HuggingFace TRL
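
GRPO scores each sampled completion relative to the other completions drawn for the same prompt, instead of using a learned value function, and LoRA scales its low-rank update by alpha divided by rank. The sketch below illustrates both in plain Python. The reward values, the `eps` constant, the choice of population standard deviation, and the function name are assumptions made for illustration; this is not the ARIA training code, and it does not reproduce the Unsloth + TRL stack listed above.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical episode rewards for four completions of one incident prompt.
rewards = [0.2, 0.9, 0.4, 0.5]
advantages = group_relative_advantages(rewards)

# LoRA scaling factor from the Setup list: alpha / rank.
LORA_RANK, LORA_ALPHA = 32, 64
lora_scale = LORA_ALPHA / LORA_RANK

print([round(a, 2) for a in advantages])
print(lora_scale)
```

Completions that beat their group's mean reward get a positive advantage and are reinforced; those below it are pushed down. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.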

Links

Environment: https://huggingface.co/spaces/Arijit-07/devops-incident-response
GitHub: https://github.com/Twilight-13/devops-incident-response


Safetensors

Model size: 8B params
Tensor type: BF16

Model tree for Arijit-07/aria-devops-llama8b

Base model: meta-llama/Llama-3.1-8B
Finetuned: meta-llama/Llama-3.1-8B-Instruct
Finetuned: unsloth/Meta-Llama-3.1-8B-Instruct
Adapter: this model