Spaces: Arijit-07/devops-incident-response

Arijit-07 committed on Apr 2

Commit 77eea12, 1 parent: 3a48bc7

Improvements: partial log obs, search_logs action, CoT inference, dashboard UI, README motivation

Files changed (10)

README.md +59 -16
api.py +147 -0
inference.py +58 -10
models.py +2 -0
openenv.yaml +2 -0
tasks/base.py +31 -3
tasks/task_bonus.py +2 -0
tasks/task_easy.py +3 -3
tasks/task_hard.py +3 -0
tasks/task_medium.py +3 -3

README.md CHANGED

@@ -30,22 +30,64 @@ remediation, while penalising collateral damage and blind actions.
 ---
-## Why This Environment?
-Every software company runs incident response. On-call engineers spend hours
-each week reading logs, correlating metrics, and executing precise remediations
-under time pressure. This is exactly the kind of multi-step, information-sparse,
-high-stakes reasoning task that separates strong AI agents from weak ones.
-**What makes it a rigorous benchmark:**
-- The hard task fires **no standard alerts** — the signal is buried in WARN-level
-  logs and business metric anomalies across 6 services
-- The reward function gives **dense partial credit** so training signal is never sparse
-- **SLA degradation** — services worsen each step if unresolved, creating real time pressure
-- **Service dependency map** — exposes call topology so agents can trace cascades
-- **Evidence log** — accumulated across steps so agents can reason over gathered data
-- **Collateral damage penalty** — restarting healthy services reduces the score
-- **Blind remediation penalty** — acting without diagnosing first is penalised
 ---
@@ -83,6 +125,7 @@ and exact metric values.
 | `read_logs` | `service` (str) | Fetch recent log lines for a service |
 | `read_metrics` | `service` (str) | Fetch CPU, memory, error rate, P99 latency |
 | `read_runbook` | `runbook` (str) | Read an operational runbook |
 | `restart_service` | `service` (str) | Restart a service (clears memory/connections) |
 | `rollback` | `service`, `version` | Roll back to a previous artifact version |
 | `scale_up` | `service` (str) | Increase replica count |

 ---
+## Motivation
+Existing agent benchmarks focus on software engineering (SWE-bench),
+web navigation (WebArena), or general tool use (AgentBench). None
+model **operational intelligence** — the ability to reason under
+uncertainty about live production systems.
+Yet incident response is one of the highest-stakes, highest-frequency
+tasks in software organizations. Every company running microservices
+faces this daily. The skills required are exactly what distinguishes
+capable AI agents from weak ones:
+- **Multi-step information gathering** under time pressure
+- **Causal reasoning** over dependent systems
+- **Precise action selection** where wrong actions cause additional damage
+- **Signal vs noise discrimination** (red-herring alerts, silent failures)
+This environment fills that gap. It is the first OpenEnv-compliant RL
+environment specifically designed to benchmark agent performance on
+production incident response.
+### Comparison to Existing Benchmarks
+| Benchmark | Domain | Multi-step | Real-world | Partial obs | Dense reward |
+|---|---|---|---|---|---|
+| SWE-bench | Code repair | ✓ | ✓ | ✗ | ✗ |
+| WebArena | Web navigation | ✓ | ✓ | ✓ | ✗ |
+| AgentBench | General tools | ✓ | Partial | ✗ | ✗ |
+| **DevOps-IR (ours)** | **Incident response** | **✓** | **✓** | **✓** | **✓** |
+### Episode Architecture
+```mermaid
+graph TD
+    A[Agent] -->|Action| B[DevOpsIncidentEnv]
+    B -->|Observation| A
+    B --> C[ServiceStatus x N]
+    B --> D[AlertList]
+    B --> E[EvidenceLog]
+    B --> F[DependencyMap]
+    B --> G[SLAStatus]
+    H[Grader] -->|score 0-1| I[Episode Analytics]
+    B -->|done=True| H
+    I --> J[steps_to_diagnosis]
+    I --> K[info_gathering_ratio]
+    I --> L[collateral_damage_events]
+```
+### What Makes This Hard
+The four tasks are designed to require qualitatively different
+reasoning strategies:
+- **Easy**: Direct signal reading — logs clearly show OOM, fix is obvious
+- **Medium**: Dependency tracing — must follow the call chain to find root
+- **Hard**: Anomaly correlation — zero error alerts, signal buried in WARN
+  logs and business metrics across 6 services
+- **Bonus**: Parallel diagnosis — two unrelated failures, agent must
+  decompose and fix independently
 ---
 | `read_logs` | `service` (str) | Fetch recent log lines for a service |
 | `read_metrics` | `service` (str) | Fetch CPU, memory, error rate, P99 latency |
 | `read_runbook` | `runbook` (str) | Read an operational runbook |
+| `search_logs` | `service`, `query` | Search log lines matching a keyword |
 | `restart_service` | `service` (str) | Restart a service (clears memory/connections) |
 | `rollback` | `service`, `version` | Roll back to a previous artifact version |
 | `scale_up` | `service` (str) | Increase replica count |
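The action table above pairs each action with its parameters, including the new `search_logs` row. As a minimal sketch of how a client might assemble such an action before posting it to `/step`, the snippet below validates the documented parameters. `REQUIRED_PARAMS` and `build_action` are hypothetical names introduced for illustration; they are not part of this repository, and the service name and query are made-up sample values.

```python
# Hypothetical helper: REQUIRED_PARAMS and build_action are illustrative names,
# not part of the repository. The parameter lists mirror the README action table.
REQUIRED_PARAMS = {
    "read_logs": ("service",),
    "read_metrics": ("service",),
    "read_runbook": ("runbook",),
    "search_logs": ("service", "query"),
    "restart_service": ("service",),
    "rollback": ("service", "version"),
    "scale_up": ("service",),
}


def build_action(action_type: str, **params: str) -> dict:
    """Return an Action-shaped dict, raising if a documented parameter is missing."""
    if action_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    missing = [p for p in REQUIRED_PARAMS[action_type] if not params.get(p)]
    if missing:
        raise ValueError(f"{action_type} requires: {', '.join(missing)}")
    return {"action_type": action_type, **params}


# Sample values only; real service names depend on the task and seed.
payload = build_action("search_logs", service="inventory-service", query="timeout")
```

Checking parameters on the client side is a design choice, not a requirement: the environment itself already returns an error string when `search_logs` is called without a query.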

api.py CHANGED

@@ -1,5 +1,6 @@
 from __future__ import annotations
 from fastapi import FastAPI, HTTPException
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel
 from typing import Optional
@@ -33,6 +34,152 @@ class ResetRequest(BaseModel):
     seed: Optional[int] = None
 @app.get("/health")
 def health():
     return {"status": "ok", "env": "devops-incident-response", "version": "1.0.0"}

 from __future__ import annotations
 from fastapi import FastAPI, HTTPException
+from fastapi.responses import HTMLResponse
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel
 from typing import Optional
     seed: Optional[int] = None
+@app.get("/", response_class=HTMLResponse)
+def dashboard():
+    env_state = None
+    if _env is not None:
+        try:
+            s = _env.state()
+            env_state = s
+        except Exception:
+            pass
+    task_info = ""
+    if env_state:
+        task_info = f"""
+        <div class="stat">
+            <span class="label">Current Task</span>
+            <span class="value">{env_state.task_id.upper()}</span>
+        </div>
+        <div class="stat">
+            <span class="label">Step</span>
+            <span class="value">{env_state.step} / {env_state.current_observation.max_steps}</span>
+        </div>
+        <div class="stat">
+            <span class="label">Score So Far</span>
+            <span class="value">{env_state.info.get('current_score', 0):.3f}</span>
+        </div>
+        <div class="stat">
+            <span class="label">Resolved</span>
+            <span class="value">{'YES' if env_state.incident_resolved else 'NO'}</span>
+        </div>
+        <div class="stat">
+            <span class="label">Evidence Gathered</span>
+            <span class="value">{len(env_state.current_observation.evidence_log)} items</span>
+        </div>
+        """
+    else:
+        task_info = '<div class="stat"><span class="label">Status</span><span class="value">No active episode — call /reset to start</span></div>'
+    html = f"""<!DOCTYPE html>
+<html>
+<head>
+    <title>DevOps Incident Response — OpenEnv</title>
+    <meta charset="utf-8">
+    <meta http-equiv="refresh" content="10">
+    <style>
+        body {{ font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
+               background: #0f1117; color: #e0e0e0; margin: 0; padding: 2rem; }}
+        h1 {{ color: #ff6b35; font-size: 1.8rem; margin-bottom: 0.25rem; }}
+        h2 {{ color: #888; font-size: 1rem; font-weight: 400; margin-bottom: 2rem; }}
+        .grid {{ display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 1rem; margin-bottom: 2rem; }}
+        .stat {{ background: #1a1d27; border: 1px solid #2d3148; border-radius: 8px; padding: 1.25rem; }}
+        .label {{ display: block; color: #888; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; margin-bottom: 0.5rem; }}
+        .value {{ display: block; font-size: 1.4rem; font-weight: 600; color: #fff; }}
+        .tasks {{ display: grid; grid-template-columns: repeat(auto-fit, minmax(280px, 1fr)); gap: 1rem; margin-bottom: 2rem; }}
+        .task {{ background: #1a1d27; border: 1px solid #2d3148; border-radius: 8px; padding: 1.25rem; }}
+        .task h3 {{ margin: 0 0 0.5rem; color: #ff6b35; font-size: 1rem; }}
+        .task p {{ margin: 0; color: #aaa; font-size: 0.85rem; line-height: 1.5; }}
+        .badge {{ display: inline-block; padding: 0.2rem 0.6rem; border-radius: 4px; font-size: 0.7rem; font-weight: 600; margin-bottom: 0.5rem; }}
+        .easy {{ background: #1a3a1a; color: #4caf50; }}
+        .medium {{ background: #3a2a1a; color: #ff9800; }}
+        .hard {{ background: #3a1a1a; color: #f44336; }}
+        .bonus {{ background: #1a1a3a; color: #9c27b0; }}
+        .endpoints {{ background: #1a1d27; border: 1px solid #2d3148; border-radius: 8px; padding: 1.25rem; margin-bottom: 2rem; }}
+        .endpoints h3 {{ margin: 0 0 1rem; color: #fff; }}
+        .endpoint {{ display: flex; align-items: center; gap: 0.75rem; margin-bottom: 0.5rem; }}
+        .method {{ background: #1e3a5f; color: #64b5f6; padding: 0.15rem 0.5rem; border-radius: 4px; font-size: 0.75rem; font-weight: 600; font-family: monospace; }}
+        .path {{ color: #81c784; font-family: monospace; font-size: 0.85rem; }}
+        .desc {{ color: #888; font-size: 0.8rem; }}
+        .footer {{ color: #555; font-size: 0.8rem; text-align: center; margin-top: 2rem; }}
+    </style>
+</head>
+<body>
+    <h1>DevOps Incident Response</h1>
+    <h2>OpenEnv — Meta x PyTorch x Hugging Face Hackathon Submission</h2>
+    <div class="grid">
+        {task_info}
+    </div>
+    <div class="tasks">
+        <div class="task">
+            <span class="badge easy">EASY</span>
+            <h3>Single Service OOM</h3>
+            <p>One service crash-loops from a memory leak. Which service varies by seed. Max 15 steps.</p>
+        </div>
+        <div class="task">
+            <span class="badge medium">MEDIUM</span>
+            <h3>Cascading Failure</h3>
+            <p>Bad deployment cascades through 3 services. One red-herring alert included. Max 20 steps.</p>
+        </div>
+        <div class="task">
+            <span class="badge hard">HARD</span>
+            <h3>Silent Data Corruption</h3>
+            <p>All services green. No error alerts. Requires correlating subtle business metric signals. Max 25 steps.</p>
+        </div>
+        <div class="task">
+            <span class="badge bonus">BONUS</span>
+            <h3>Dual Simultaneous Failure</h3>
+            <p>Two independent failures at once. Both must be fixed for full credit. Max 25 steps.</p>
+        </div>
+    </div>
+    <div class="endpoints">
+        <h3>API Endpoints</h3>
+        <div class="endpoint">
+            <span class="method">GET</span>
+            <span class="path">/health</span>
+            <span class="desc">Health check</span>
+        </div>
+        <div class="endpoint">
+            <span class="method">POST</span>
+            <span class="path">/reset</span>
+            <span class="desc">Start new episode — body: {{"task_id": "easy", "seed": 42}}</span>
+        </div>
+        <div class="endpoint">
+            <span class="method">POST</span>
+            <span class="path">/step</span>
+            <span class="desc">Take one action — body: Action JSON</span>
+        </div>
+        <div class="endpoint">
+            <span class="method">GET</span>
+            <span class="path">/state</span>
+            <span class="desc">Full state with ground truth and analytics</span>
+        </div>
+        <div class="endpoint">
+            <span class="method">GET</span>
+            <span class="path">/validate</span>
+            <span class="desc">Self-validation report for all 4 tasks</span>
+        </div>
+        <div class="endpoint">
+            <span class="method">GET</span>
+            <span class="path">/docs</span>
+            <span class="desc">Interactive API documentation (Swagger UI)</span>
+        </div>
+    </div>
+    <div class="footer">
+        Auto-refreshes every 10 seconds &nbsp;|&nbsp;
+        <a href="/docs" style="color:#ff6b35;">API Docs</a> &nbsp;|&nbsp;
+        <a href="/validate" style="color:#ff6b35;">Run Validation</a> &nbsp;|&nbsp;
+        <a href="/health" style="color:#ff6b35;">Health Check</a>
+    </div>
+</body>
+</html>"""
+    return html
 @app.get("/health")
 def health():
     return {"status": "ok", "env": "devops-incident-response", "version": "1.0.0"}
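The dashboard interpolates values such as `env_state.task_id` directly into the HTML f-string. If any of those values could ever contain markup, escaping them first would be safer. The sketch below shows one possible way to render a stat card with the standard-library `html.escape`; `stat_card` is a hypothetical helper and not a function in `api.py`.

```python
from html import escape


def stat_card(label: str, value: object) -> str:
    """Render one dashboard stat card with both fields HTML-escaped."""
    return (
        '<div class="stat">'
        f'<span class="label">{escape(label)}</span>'
        f'<span class="value">{escape(str(value))}</span>'
        "</div>"
    )


# A value containing markup is rendered as text rather than as HTML.
card = stat_card("Current Task", "<b>easy</b>".upper())
```

The class names `stat`, `label`, and `value` match the ones used by the dashboard stylesheet, so a helper like this could slot into the existing page without CSS changes.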

inference.py CHANGED

@@ -37,15 +37,17 @@ dependency map, and a log of all evidence you have gathered so far.
 Your strategy:
 1. Read logs and metrics for the most suspicious services BEFORE acting
-2. Use the dependency map to trace cascades to their ROOT cause
-3. Issue a DIAGNOSE action once you have enough evidence
-4. Apply the precise fix — wrong service or wrong action loses points
-5. On hard incidents: both rollback AND alert_oncall may be required
 Respond with ONLY a valid JSON object — no markdown, no commentary:
 {
-  "action_type": "<diagnose|read_logs|read_metrics|read_runbook|restart_service|rollback|scale_up|alert_oncall|acknowledge|noop>",
   "service": "<service name or null>",
   "root_cause": "<diagnosis string if action_type is diagnose, else null>",
   "runbook": "<runbook filename if action_type is read_runbook, else null>",
   "version": "<version string if action_type is rollback, else null>",
@@ -56,6 +58,19 @@ Available runbooks: high_cpu.md, memory_leak.md, db_connection.md,
 deployment_rollback.md, cascade_failure.md, data_corruption.md
 """).strip()
 def observation_to_text(obs: Observation) -> str:
     lines = [
@@ -158,6 +173,7 @@ def parse_action(response_text: str) -> Action:
         return Action(
             action_type=ActionType(at_str),
             service=data.get("service"),
             root_cause=data.get("root_cause"),
             runbook=data.get("runbook"),
             version=data.get("version"),
@@ -183,18 +199,49 @@ def run_task(client: OpenAI, task_id: str, seed: int = 42) -> dict:
         prompt = observation_to_text(obs)
         try:
-            completion = client.chat.completions.create(
                 model=MODEL_NAME,
                 messages=[
                     {"role": "system", "content": SYSTEM_PROMPT},
                     {"role": "user", "content": prompt},
                 ],
-                temperature=TEMPERATURE,
-                max_tokens=MAX_TOKENS,
             )
-            response_text = completion.choices[0].message.content or ""
         except Exception as exc:
             print(f"  Step {step:02d}: API error — {exc}")
             response_text = ""
         action = parse_action(response_text)
@@ -213,7 +260,8 @@ def run_task(client: OpenAI, task_id: str, seed: int = 42) -> dict:
         reward_str = f"  reward={result.reward:+.3f}" if result.reward != 0 else ""
         resolution_str = f"  *** {result.info.get('resolution', '')} ***" if result.done and result.info.get("resolution") else ""
-        print(f"  Step {step:02d}: {action_label}{reward_str}{resolution_str}")
         if obs.last_action_error:
             print(f"           ⚠ {obs.last_action_error[:80]}")

 Your strategy:
 1. Read logs and metrics for the most suspicious services BEFORE acting
+2. Use search_logs to find specific error patterns efficiently instead of reading all logs when you know what to look for.
+3. Use the dependency map to trace cascades to their ROOT cause
+4. Issue a DIAGNOSE action once you have enough evidence
+5. Apply the precise fix — wrong service or wrong action loses points
+6. On hard incidents: both rollback AND alert_oncall may be required
 Respond with ONLY a valid JSON object — no markdown, no commentary:
 {
+  "action_type": "<diagnose|read_logs|search_logs|read_metrics|read_runbook|restart_service|rollback|scale_up|alert_oncall|acknowledge|noop>",
   "service": "<service name or null>",
+  "query": "<search keyword if action_type is search_logs, else null>",
   "root_cause": "<diagnosis string if action_type is diagnose, else null>",
   "runbook": "<runbook filename if action_type is read_runbook, else null>",
   "version": "<version string if action_type is rollback, else null>",
 deployment_rollback.md, cascade_failure.md, data_corruption.md
 """).strip()
+REASONING_PROMPT = """
+You are a senior DevOps engineer responding to a production incident.
+Before deciding your next action, think through what you know:
+1. What services are affected and what is their status?
+2. What evidence have you gathered so far?
+3. What is the most likely root cause based on your evidence?
+4. What is the single most valuable piece of information still missing?
+5. What action would best close that information gap?
+Respond in plain text with your reasoning. Be concise (3-5 sentences).
+Do NOT output a JSON action yet — just your analysis.
+""".strip()
 def observation_to_text(obs: Observation) -> str:
     lines = [
         return Action(
             action_type=ActionType(at_str),
             service=data.get("service"),
+            query=data.get("query"),
             root_cause=data.get("root_cause"),
             runbook=data.get("runbook"),
             version=data.get("version"),
         prompt = observation_to_text(obs)
         try:
+            reasoning_completion = client.chat.completions.create(
+                model=MODEL_NAME,
+                messages=[
+                    {"role": "system", "content": REASONING_PROMPT},
+                    {"role": "user", "content": prompt},
+                ],
+                temperature=0.3,
+                max_tokens=256,
+            )
+            reasoning = reasoning_completion.choices[0].message.content or ""
+            action_prompt = f"""
+            Based on your analysis:
+            {reasoning}
+            Now output your action as a JSON object:
+            {{
+              "action_type": "...",
+              "service": "...",
+              "query": "...",
+              "root_cause": "...",
+              "runbook": "...",
+              "version": "...",
+              "reason": "one sentence summary"
+            }}
+            Output ONLY the JSON object.
+            """.strip()
+            action_completion = client.chat.completions.create(
                 model=MODEL_NAME,
                 messages=[
                     {"role": "system", "content": SYSTEM_PROMPT},
                     {"role": "user", "content": prompt},
+                    {"role": "assistant", "content": reasoning},
+                    {"role": "user", "content": action_prompt},
                 ],
+                temperature=0.1,
+                max_tokens=200,
             )
+            response_text = action_completion.choices[0].message.content or ""
         except Exception as exc:
             print(f"  Step {step:02d}: API error — {exc}")
+            reasoning = "(error)"
             response_text = ""
         action = parse_action(response_text)
         reward_str = f"  reward={result.reward:+.3f}" if result.reward != 0 else ""
         resolution_str = f"  *** {result.info.get('resolution', '')} ***" if result.done and result.info.get("resolution") else ""
+        print(f"  Step {step:02d} reasoning: {reasoning[:100]}...")
+        print(f"  Step {step:02d} action:    {action_label}{reward_str}{resolution_str}")
         if obs.last_action_error:
             print(f"           ⚠ {obs.last_action_error[:80]}")
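The change to `run_task` replaces a single completion with two: a free-text reasoning call, then an action call that sees the reasoning as a prior assistant turn. The sketch below isolates just the message construction for both calls so the conversation shape is easier to see. The two function names are hypothetical and the string arguments are placeholders; no model client is involved.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_reasoning_messages(reasoning_prompt: str, observation_text: str) -> List[Message]:
    """First call: the model analyses the observation in plain text."""
    return [
        {"role": "system", "content": reasoning_prompt},
        {"role": "user", "content": observation_text},
    ]


def build_action_messages(
    system_prompt: str,
    observation_text: str,
    reasoning: str,
    action_prompt: str,
) -> List[Message]:
    """Second call: the earlier reasoning is replayed as an assistant turn,
    followed by a user turn asking for the JSON action."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": observation_text},
        {"role": "assistant", "content": reasoning},
        {"role": "user", "content": action_prompt},
    ]


# Placeholder strings stand in for the real prompts and observation text.
first = build_reasoning_messages("REASONING_PROMPT", "observation")
second = build_action_messages("SYSTEM_PROMPT", "observation", "analysis", "Output ONLY the JSON object.")
```

Note that the two calls use different system prompts in the diff (`REASONING_PROMPT` for the first, `SYSTEM_PROMPT` for the second), so the replayed assistant turn was generated under instructions the second call does not see.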

models.py CHANGED

@@ -15,6 +15,7 @@ class ActionType(str, Enum):
     ALERT_ONCALL = "alert_oncall"
     ACKNOWLEDGE = "acknowledge"
     NOOP = "noop"
 class Action(BaseModel):
@@ -24,6 +25,7 @@ class Action(BaseModel):
     runbook: Optional[str] = None
     version: Optional[str] = None
     reason: Optional[str] = None
 class Alert(BaseModel):

     ALERT_ONCALL = "alert_oncall"
     ACKNOWLEDGE = "acknowledge"
     NOOP = "noop"
+    SEARCH_LOGS = "search_logs"
 class Action(BaseModel):
     runbook: Optional[str] = None
     version: Optional[str] = None
     reason: Optional[str] = None
+    query: Optional[str] = None  # used with search_logs
 class Alert(BaseModel):

openenv.yaml CHANGED

@@ -86,6 +86,8 @@ action_space:
       description: Record the agent's root cause hypothesis
     - name: read_logs
       description: Read recent log lines for a named service
     - name: read_metrics
       description: Read CPU, memory, error rate, latency for a named service
     - name: read_runbook

       description: Record the agent's root cause hypothesis
     - name: read_logs
       description: Read recent log lines for a named service
+    - name: search_logs
+      description: Search log lines for a service matching a query string
     - name: read_metrics
       description: Read CPU, memory, error rate, latency for a named service
     - name: read_runbook

tasks/base.py CHANGED

@@ -62,6 +62,8 @@ class InternalState:
     ground_truth_root_cause: str
     ground_truth_fix: str
     incident_start_time: str
     rewards_given: Set[str] = field(default_factory=set)
     healthy_services: List[str] = field(default_factory=list)
     evidence_log: List[dict] = field(default_factory=list)
@@ -131,6 +133,8 @@ class InternalState:
         last_action_result: Optional[str] = None,
         last_action_error: Optional[str] = None,
     ) -> Observation:
         services = []
         for name, s in self.services.items():
             services.append(ServiceStatus(
@@ -160,13 +164,16 @@ class InternalState:
             task_description=TASK_DESCRIPTIONS.get(self.task_id, ""),
             services=services,
             active_alerts=alerts,
-            recent_logs=self.logs,
             available_runbooks=AVAILABLE_RUNBOOKS,
             service_dependencies=deps,
             evidence_log=evidence,
             sla_status=sla,
-            last_action_result=last_action_result,
-            last_action_error=last_action_error,
             incident_start_time=self.incident_start_time,
             elapsed_minutes=self.step * 2,
         )
@@ -224,6 +231,27 @@ class BaseTask(ABC):
                 return result, None
             return None, f"No logs found for service '{svc}'"
         if at == "read_metrics":
             svc = action.service
             if svc and svc in state.services:

     ground_truth_root_cause: str
     ground_truth_fix: str
     incident_start_time: str
+    last_action_result: Optional[str] = field(default=None)
+    last_action_error: Optional[str] = field(default=None)
     rewards_given: Set[str] = field(default_factory=set)
     healthy_services: List[str] = field(default_factory=list)
     evidence_log: List[dict] = field(default_factory=list)
         last_action_result: Optional[str] = None,
         last_action_error: Optional[str] = None,
     ) -> Observation:
+        if last_action_result is not None: self.last_action_result = last_action_result
+        if last_action_error is not None: self.last_action_error = last_action_error
         services = []
         for name, s in self.services.items():
             services.append(ServiceStatus(
             task_description=TASK_DESCRIPTIONS.get(self.task_id, ""),
             services=services,
             active_alerts=alerts,
+            recent_logs={
+                svc: lines[-2:] + ([f"[... {len(lines)-2} more lines — use read_logs to see full history]"] if len(lines) > 2 else [])
+                for svc, lines in self.logs.items()
+            },
             available_runbooks=AVAILABLE_RUNBOOKS,
             service_dependencies=deps,
             evidence_log=evidence,
             sla_status=sla,
+            last_action_result=self.last_action_result,
+            last_action_error=self.last_action_error,
             incident_start_time=self.incident_start_time,
             elapsed_minutes=self.step * 2,
         )
                 return result, None
             return None, f"No logs found for service '{svc}'"
+        if at == "search_logs":
+            svc = action.service
+            query = (action.query or "").lower()
+            if not svc or svc not in state.logs:
+                return None, f"Unknown service '{svc}'"
+            if not query:
+                return None, "search_logs requires a query parameter"
+            lines = state.logs[svc]
+            matches = [l for l in lines if query in l.lower()]
+            if not matches:
+                result = f"No lines matching '{query}' in {svc} logs."
+            else:
+                result = f"Found {len(matches)} lines matching '{query}':\n" + "\n".join(matches)
+            state.evidence_log.append({
+                "step": state.step,
+                "source": f"search:{svc}",
+                "summary": f"Searched {svc} for '{query}': {len(matches)} matches",
+                "raw": result,
+            })
+            return result, None
         if at == "read_metrics":
             svc = action.service
             if svc and svc in state.services:
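Two behaviours introduced in this file can be illustrated in isolation: the observation now carries only the last two log lines per service plus a marker, and `search_logs` does a case-insensitive substring match. The sketch below re-implements both as standalone functions under hypothetical names (`preview_logs`, `search_lines`); the marker wording is lightly adapted and the sample log lines are invented.

```python
from typing import Dict, List


def preview_logs(logs: Dict[str, List[str]], keep: int = 2) -> Dict[str, List[str]]:
    """Keep the last `keep` lines per service and append a marker if more exist."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    preview: Dict[str, List[str]] = {}
    for svc, lines in logs.items():
        tail = lines[-keep:]
        if len(lines) > keep:
            hidden = len(lines) - keep
            tail = tail + [f"[... {hidden} more lines, use read_logs to see full history]"]
        preview[svc] = tail
    return preview


def search_lines(lines: List[str], query: str) -> List[str]:
    """Return the lines that contain `query`, ignoring case."""
    needle = query.lower()
    return [line for line in lines if needle in line.lower()]


# Invented sample data for illustration.
sample = {"api": ["a", "b", "c", "d"], "db": ["x"]}
preview = preview_logs(sample)
hits = search_lines(["ERROR OOM killed", "info ok", "warn oom risk"], "oom")
```

Because the match is a plain substring test, a query like `oom` also matches words that merely contain it (for example `room`); regular expressions or word-boundary matching would be a different trade-off.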

tasks/task_bonus.py CHANGED

@@ -141,7 +141,9 @@ class BonusTask(BaseTask):
         gather_map = {
             ("read_logs", "log-aggregator"):       ("rl_agg", 0.05),
             ("read_logs", "ml-inference-service"): ("rl_ml", 0.05),
             ("read_metrics", "log-aggregator"):    ("rm_agg", 0.05),
             ("read_metrics", "ml-inference-service"): ("rm_ml", 0.05),
         }

         gather_map = {
             ("read_logs", "log-aggregator"):       ("rl_agg", 0.05),
+            ("search_logs", "log-aggregator"):     ("rl_agg", 0.05),
             ("read_logs", "ml-inference-service"): ("rl_ml", 0.05),
+            ("search_logs", "ml-inference-service"):("rl_ml", 0.05),
             ("read_metrics", "log-aggregator"):    ("rm_agg", 0.05),
             ("read_metrics", "ml-inference-service"): ("rm_ml", 0.05),
         }

tasks/task_easy.py CHANGED

@@ -177,10 +177,10 @@ class EasyTask(BaseTask):
         result_text, error_text = self._apply_action_to_logs(state, action)
-        if at == ActionType.READ_LOGS and svc == failing:
-            if "read_logs" not in state.rewards_given:
                 reward += 0.15
-                state.rewards_given.add("read_logs")
         if at == ActionType.READ_METRICS and svc == failing:
             if "read_metrics" not in state.rewards_given:

         result_text, error_text = self._apply_action_to_logs(state, action)
+        if at in (ActionType.READ_LOGS, ActionType.SEARCH_LOGS) and svc == failing:
+            if "logs_investigated" not in state.rewards_given:
                 reward += 0.15
+                state.rewards_given.add("logs_investigated")
         if at == ActionType.READ_METRICS and svc == failing:
             if "read_metrics" not in state.rewards_given:

tasks/task_hard.py CHANGED

@@ -161,8 +161,11 @@ class HardTask(BaseTask):
         gather_map = {
             ("read_logs", "price-validation-service"): ("rl_price", 0.05),
             ("read_logs", "analytics-service"):         ("rl_analytics", 0.05),
             ("read_logs", "data-pipeline-service"):     ("rl_pipeline", 0.05),
             ("read_metrics", "analytics-service"):      ("rm_analytics", 0.10),
             ("read_metrics", "data-pipeline-service"):  ("rm_pipeline", 0.10),
         }

         gather_map = {
             ("read_logs", "price-validation-service"): ("rl_price", 0.05),
+            ("search_logs", "price-validation-service"): ("rl_price", 0.05),
             ("read_logs", "analytics-service"):         ("rl_analytics", 0.05),
+            ("search_logs", "analytics-service"):       ("rl_analytics", 0.05),
             ("read_logs", "data-pipeline-service"):     ("rl_pipeline", 0.05),
+            ("search_logs", "data-pipeline-service"):   ("rl_pipeline", 0.05),
             ("read_metrics", "analytics-service"):      ("rm_analytics", 0.10),
             ("read_metrics", "data-pipeline-service"):  ("rm_pipeline", 0.10),
         }

tasks/task_medium.py CHANGED

@@ -207,9 +207,9 @@ class MediumTask(BaseTask):
         result_text, error_text = self._apply_action_to_logs(state, action)
-        if at == ActionType.READ_LOGS and svc == "inventory-service":
-            if "read_logs_inv" not in state.rewards_given:
-                reward += 0.10; state.rewards_given.add("read_logs_inv")
         if at == ActionType.READ_METRICS and svc == "inventory-service":
             if "read_metrics_inv" not in state.rewards_given:
                 reward += 0.10; state.rewards_given.add("read_metrics_inv")

         result_text, error_text = self._apply_action_to_logs(state, action)
+        if at in (ActionType.READ_LOGS, ActionType.SEARCH_LOGS) and svc == "inventory-service":
+            if "logs_investigated" not in state.rewards_given:
+                reward += 0.10; state.rewards_given.add("logs_investigated")
         if at == ActionType.READ_METRICS and svc == "inventory-service":
             if "read_metrics_inv" not in state.rewards_given:
                 reward += 0.10; state.rewards_given.add("read_metrics_inv")
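In the easy and medium tasks, `read_logs` and `search_logs` now share a single `logs_investigated` key in `rewards_given`, so an agent earns the log-investigation credit once regardless of which action it uses. The sketch below shows that deduplication as a standalone function; `log_investigation_reward` is a hypothetical name, and plain strings stand in for the `ActionType` enum members.

```python
from typing import Set


def log_investigation_reward(
    action_type: str,
    service: str,
    target: str,
    rewards_given: Set[str],
    amount: float,
) -> float:
    """Grant `amount` once for the first log read or search on the target service."""
    if action_type in ("read_logs", "search_logs") and service == target:
        if "logs_investigated" not in rewards_given:
            rewards_given.add("logs_investigated")
            return amount
    return 0.0


# Searching first earns the credit; a later read on the same service earns nothing.
given: Set[str] = set()
first = log_investigation_reward("search_logs", "inventory-service", "inventory-service", given, 0.10)
second = log_investigation_reward("read_logs", "inventory-service", "inventory-service", given, 0.10)
other = log_investigation_reward("read_logs", "payment-service", "inventory-service", set(), 0.10)
```

Sharing one key prevents an agent from collecting the same evidence twice through two different actions, which keeps the dense reward from being inflated by redundant steps.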