ARIA β€” DevOps Incident Response Agent

Llama-3.1-8B fine-tuned with GRPO

Trained on the ARIA DevOps Incident Response live RL environment using GRPO.

Training Results

Task Baseline Fine-tuned Improvement
easy 0.320 0.685 +0.365
medium 0.050 0.378 +0.328
hard 0.190 0.869 +0.679
bonus 0.152 0.682 +0.530

Training Curve

Setup

  • Algorithm: GRPO
  • Base: Llama-3.1-8B-Instruct
  • LoRA rank: 32, alpha: 64
  • Episodes: 160 (40 per task)
  • GPU: NVIDIA L4, 162 minutes
  • Framework: Unsloth + HuggingFace TRL

Links

Downloads last month
-
Safetensors
Model size
8B params
Tensor type
BF16
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Arijit-07/aria-devops-llama8b

Spaces using Arijit-07/aria-devops-llama8b 2