danish1423/sre-agent-training-data
How to use princeuser/llama-3.2-3b-sre-agent with Unsloth Studio:
# macOS / Linux
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for princeuser/llama-3.2-3b-sre-agent to start chatting
# Windows (PowerShell)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for princeuser/llama-3.2-3b-sre-agent to start chatting
# Hosted Space: no setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for princeuser/llama-3.2-3b-sre-agent to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="princeuser/llama-3.2-3b-sre-agent",
    max_seq_length=2048,
)
This is a fine-tuned version of Llama 3.2 3B Instruct, trained to act as an autonomous Site Reliability Engineering (SRE) agent. It is designed to navigate the SRE Decision Environment, a Dec-POMDP simulator for incident response.
The model was trained using Unsloth with a two-phase approach:
You can load this model for inference with Unsloth. Because this repository contains LoRA adapters, Unsloth loads the base model and applies the adapters on top of it automatically.
pip install unsloth
from unsloth import FastLanguageModel
import torch
# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="princeuser/llama-3.2-3b-sre-agent",
    max_seq_length=2048,
    dtype=None,  # Auto-detects bfloat16/float16
    load_in_4bit=True,  # Optimizes for consumer GPUs like T4
)
# Switch to inference mode
FastLanguageModel.for_inference(model)
# Define your incident scenario
system_prompt = """You are the Lead SRE Manager.
Available services: [api_gateway, auth_service, user_db, frontend_service, product_db, cache_service]
Root Causes: [cpu_saturation, memory_leak, db_connection_leak, cascading_failure]
Fix Map: cpu_saturation→scale api_gateway | memory_leak→restart auth_service | db_connection_leak→restart user_db | cascading_failure→restart cache_service"""
scenario = """INCIDENT ACTIVE.
Logs show: auth_service → FATAL: OutOfMemoryError in auth_service
Metrics show: auth_service cpu=0.95, latency=0.03, error_rate=0.0
All other services are running normally.
What is the root cause and how do you fix it?"""
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": scenario},
]
# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,  # Low temperature for near-deterministic actions
        do_sample=True,
    )
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
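The decoded `response` is plain text. The fragment near the end of this card suggests the model emits an `Action:` line followed by an `Action Input:` JSON object. Assuming that format, a minimal sketch for turning the text into a structured call might look like the following. `parse_action` and `ACTION_PATTERN` are hypothetical names for illustration and are not part of this repository or of Unsloth.

```python
import json
import re

# Hypothetical helper: assumes the reply contains
#   Action: <name>
#   Action Input: {<flat JSON object>}
ACTION_PATTERN = re.compile(
    r"Action:\s*(?P<action>\w+)\s*Action Input:\s*(?P<args>\{.*?\})",
    re.DOTALL,
)


def parse_action(response: str):
    """Return (action_name, arguments), or None if no valid action block is found."""
    match = ACTION_PATTERN.search(response)
    if match is None:
        return None
    try:
        arguments = json.loads(match.group("args"))
    except json.JSONDecodeError:
        return None
    return match.group("action"), arguments


example = (
    "Thought: the logs point to a memory leak.\n"
    'Action: execute_fix\nAction Input: {"service_name": "auth_service"}'
)
print(parse_action(example))
```

The non-greedy `\{.*?\}` stops at the first closing brace, so this sketch only handles flat JSON arguments. Nested objects would need a real JSON scanner.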
The model was tested against the four standard failure scenarios defined in the sre-env-triage project:
| Scenario | Root Cause | Fix Action | Service Target | Success Rate |
|---|---|---|---|---|
| Memory Leak | ✅ Correct | ✅ Correct | ✅ Correct | 100% |
| CPU Saturation | ✅ Correct | ✅ Correct | ✅ Correct | 100% |
| DB Connection Leak | ✅ Correct | ✅ Correct | ✅ Correct | 100% |
| Cascading Failure | ✅ Correct | ✅ Correct | ✅ Correct | 100% |
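The evaluation script itself is not included in this card. As an illustration of how the three per-scenario columns could be scored, here is a rough sketch that checks a prediction against the fix map from the system prompt in the usage example. `FIX_MAP`, `score_prediction`, and `success_rate` are hypothetical names, and counting an episode as a success only when all three checks pass is an assumption, not the documented method.

```python
# Fix map copied from the system prompt in the usage example:
# root cause -> (fix action, target service)
FIX_MAP = {
    "cpu_saturation": ("scale", "api_gateway"),
    "memory_leak": ("restart", "auth_service"),
    "db_connection_leak": ("restart", "user_db"),
    "cascading_failure": ("restart", "cache_service"),
}


def score_prediction(expected_cause, predicted_cause, predicted_fix, predicted_service):
    """Compare one prediction with the expected root cause, fix action, and target."""
    expected_fix, expected_service = FIX_MAP[expected_cause]
    return {
        "root_cause": predicted_cause == expected_cause,
        "fix_action": predicted_fix == expected_fix,
        "service_target": predicted_service == expected_service,
    }


def success_rate(results):
    """Fraction of episodes in which all three checks passed (assumed definition)."""
    if not results:
        return 0.0
    passed = sum(1 for result in results if all(result.values()))
    return passed / len(results)


results = [
    score_prediction("memory_leak", "memory_leak", "restart", "auth_service"),
    score_prediction("cpu_saturation", "cpu_saturation", "restart", "api_gateway"),
]
print(success_rate(results))
```

In this toy run the second prediction picks the wrong fix action, so only one of the two episodes counts as a success.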
Action: execute_fix\nAction Input: {"service_name": "auth_service"}).
This model follows the llama3.2 license. Please ensure compliance with Meta's Acceptable Use Policy.
Base model: meta-llama/Llama-3.2-3B-Instruct