base_model: unsloth/Meta-Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
  - grpo
  - reinforcement-learning
  - devops
  - incident-response
  - openenv
  - unsloth

ARIA — DevOps Incident Response Agent

Llama-3.1-8B fine-tuned with GRPO

Trained with Group Relative Policy Optimization (GRPO) on the live ARIA DevOps Incident Response reinforcement-learning environment.
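
For readers unfamiliar with GRPO, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows that group-relative normalization in plain Python. The reward values and the `group_relative_advantages` helper are hypothetical illustrations, not this model's training code, and the population standard deviation with a small `eps` is an assumption; the TRL trainer used for this run handles the step internally.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up environment rewards for four completions of one incident prompt.
rewards = [0.2, 0.9, 0.4, 0.5]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Completions that score above their group's mean receive a positive advantage and are reinforced; those below the mean are discouraged. When every completion in a group earns the same reward, all advantages are zero and the group contributes no learning signal.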

Training Results

Task	Baseline	Fine-tuned	Improvement
easy	0.320	0.685	+0.365
medium	0.050	0.378	+0.328
hard	0.190	0.869	+0.679
bonus	0.152	0.682	+0.530
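
The Improvement column is the fine-tuned score minus the baseline score for each task. The short check below reproduces that column using only the figures from the table above. The unweighted means across the four tasks are derived convenience figures, not values reported by the author.

```python
# (baseline, fine-tuned) scores copied from the Training Results table.
results = {
    "easy": (0.320, 0.685),
    "medium": (0.050, 0.378),
    "hard": (0.190, 0.869),
    "bonus": (0.152, 0.682),
}

# Absolute improvement per task, rounded to the table's three decimals.
improvements = {
    task: round(fine_tuned - baseline, 3)
    for task, (baseline, fine_tuned) in results.items()
}

# Unweighted means across tasks (derived here, not reported in the table).
mean_baseline = sum(b for b, _ in results.values()) / len(results)
mean_fine_tuned = sum(f for _, f in results.values()) / len(results)

for task, delta in improvements.items():
    print(f"{task:<7} {delta:+.3f}")
print(f"mean baseline={mean_baseline:.3f}  mean fine-tuned={mean_fine_tuned:.3f}")
```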

Setup

Algorithm: GRPO
Base: Llama-3.1-8B-Instruct
LoRA rank: 32, alpha: 64
Episodes: 160 (40 per task)
GPU: NVIDIA L4 (162 minutes of training)
Framework: Unsloth + Hugging Face TRL
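
A few quantities follow directly from the settings listed above. The sketch below records them in a small dataclass and derives the LoRA scaling factor, the episodes per task, and the average wall-clock time per episode. The `TrainingSetup` class is a hypothetical container, not part of the training code. The scaling factor assumes the standard LoRA formulation (alpha divided by rank), and the task count of four is taken from the results table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical record of the settings listed in the Setup section."""

    lora_rank: int = 32            # corresponds to `r` in peft's LoraConfig
    lora_alpha: int = 64           # corresponds to `lora_alpha` in LoraConfig
    total_episodes: int = 160
    num_tasks: int = 4             # easy, medium, hard, bonus
    wall_clock_minutes: int = 162

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    @property
    def episodes_per_task(self) -> int:
        return self.total_episodes // self.num_tasks

    @property
    def minutes_per_episode(self) -> float:
        return self.wall_clock_minutes / self.total_episodes


setup = TrainingSetup()
print(f"LoRA scaling (alpha / rank): {setup.lora_scaling}")
print(f"Episodes per task: {setup.episodes_per_task}")
print(f"Average minutes per episode: {setup.minutes_per_episode:.2f}")
```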

Arijit-07/aria-devops-llama8b