aria-devops-llama8b / README.md
metadata
base_model: unsloth/Meta-Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
  - grpo
  - reinforcement-learning
  - devops
  - incident-response
  - openenv
  - unsloth

ARIA — DevOps Incident Response Agent

Llama-3.1-8B fine-tuned with GRPO

Trained with GRPO (Group Relative Policy Optimization) on the live ARIA DevOps Incident Response reinforcement-learning environment.
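For readers unfamiliar with GRPO, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows one common form of that group-relative normalization. The reward values, the `eps` constant, and the function name are illustrative assumptions, not taken from the ARIA training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Center and scale each reward against its own sampling group.

    Uses the population standard deviation; some implementations use the
    sample standard deviation instead. eps avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one incident prompt.
rewards = [0.2, 0.9, 0.4, 0.5]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f}  advantage={advantage:+.3f}")
```

Completions that score above the group mean get a positive advantage and are reinforced; those below it get a negative one. A group with identical rewards yields zero advantage everywhere and so contributes no learning signal.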

Training Results

Task     Baseline   Fine-tuned   Improvement
easy     0.320      0.685        +0.365
medium   0.050      0.378        +0.328
hard     0.190      0.869        +0.679
bonus    0.152      0.682        +0.530
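The Improvement column is the fine-tuned score minus the baseline score. The short check below recomputes it from the table values and adds an unweighted mean across the four tasks. That mean is a convenience summary introduced here, not a figure reported by the author.

```python
# Scores copied from the results table: task -> (baseline, fine-tuned).
results = {
    "easy": (0.320, 0.685),
    "medium": (0.050, 0.378),
    "hard": (0.190, 0.869),
    "bonus": (0.152, 0.682),
}


def improvement(baseline, tuned):
    """Absolute score gain, rounded to the table's three decimal places."""
    return round(tuned - baseline, 3)


gains = {task: improvement(b, t) for task, (b, t) in results.items()}

# Unweighted means over the four tasks (a summary added for illustration).
mean_baseline = sum(b for b, _ in results.values()) / len(results)
mean_tuned = sum(t for _, t in results.values()) / len(results)

for task, gain in gains.items():
    print(f"{task:<7} {gain:+.3f}")
print(f"mean    {mean_baseline:.4f} -> {mean_tuned:.4f}")
```

The recomputed gains match the table, and the largest gain is on the hard task.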

Training Curve

Setup

  • Algorithm: GRPO
  • Base: Llama-3.1-8B-Instruct
  • LoRA rank: 32, alpha: 64
  • Episodes: 160 (40 per task)
  • GPU: NVIDIA L4, 162 minutes of training time
  • Framework: Unsloth + HuggingFace TRL
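To put the LoRA settings in context, the sketch below works out what rank 32 and alpha 64 imply under the standard LoRA formulation, where the low-rank update is scaled by alpha / rank. The per-layer parameter count assumes a square 4096 x 4096 projection (4096 is the Llama-3.1-8B hidden size). Which modules were actually adapted is not stated in this card, so treat that part as an assumption.

```python
def lora_scaling(rank, alpha):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters for one adapted linear layer.

    LoRA adds two matrices: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


RANK, ALPHA = 32, 64      # from the Setup list
HIDDEN = 4096             # Llama-3.1-8B hidden size
TOTAL_EPISODES = 160      # from the Setup list
NUM_TASKS = 4             # easy, medium, hard, bonus

scaling = lora_scaling(RANK, ALPHA)
# Assumption: a square HIDDEN x HIDDEN projection is one of the adapted layers.
params_per_layer = lora_param_count(HIDDEN, HIDDEN, RANK)
episodes_per_task = TOTAL_EPISODES // NUM_TASKS

print(f"scaling={scaling}, params_per_layer={params_per_layer}, "
      f"episodes_per_task={episodes_per_task}")
```

Setting alpha to twice the rank is a common choice; it makes the adapter update count double relative to alpha equal to rank.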

Links