Arijit-07
/

aria-devops-llama8b

Text Generation

reinforcement-learning

incident-response

Model card Files Files and versions

aria-devops-llama8b / README.md

Arijit-07's picture

Update README.md

e774b09 verified 28 days ago

|

history blame contribute delete

1.04 kB

	---
	base_model: unsloth/Meta-Llama-3.1-8B-Instruct
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- grpo
	- reinforcement-learning
	- devops
	- incident-response
	- openenv
	- unsloth
	---

	# ARIA — DevOps Incident Response Agent
	### Llama-3.1-8B fine-tuned with GRPO

	Trained on the [ARIA DevOps Incident Response](https://huggingface.co/spaces/Arijit-07/devops-incident-response)
	live RL environment using GRPO.

	## Training Results

	\| Task \| Baseline \| Fine-tuned \| Improvement \|
	\|---\|---\|---\|---\|
	\| easy \| 0.320 \| 0.685 \| +0.365 \|
	\| medium \| 0.050 \| 0.378 \| +0.328 \|
	\| hard \| 0.190 \| 0.869 \| +0.679 \|
	\| bonus \| 0.152 \| 0.682 \| +0.530 \|

	![Training Curve](training_curve_8b.png)

	## Setup
	- Algorithm: GRPO
	- Base: Llama-3.1-8B-Instruct
	- LoRA rank: 32, alpha: 64
	- Episodes: 160 (40 per task)
	- GPU: NVIDIA L4, 162 minutes
	- Framework: Unsloth + HuggingFace TRL

	## Links
	- Environment: https://huggingface.co/spaces/Arijit-07/devops-incident-response
	- GitHub: https://github.com/Twilight-13/devops-incident-response