Spaces:

omm7
/

CausalOps-Env

Sleeping

App Files Files Community

CausalOps-Env / openenv.yaml

omm7

Upload folder using huggingface_hub

bc2ead7 verified about 2 months ago

raw

history blame contribute delete

3.27 kB

	name: NovaTechIncidentCommand
	description: >
	Seeded OpenEnv incident-response benchmark built from a realistic NovaTech log corpus.
	Agents operate under partial observability: they must query logs, inspect dependencies,
	update a structured causal hypothesis, choose safe containment, and submit a final report.

	tasks:
	- id: easy
	description: Detect a clear login outage caused by auth-service heap exhaustion.
	- id: medium
	description: Resolve competing hypotheses during a payment confirmation outage.
	- id: hard
	description: Reconstruct a cascading multi-service incident under partial observability.

	action_space:
	type: structured
	fields:
	session_id: string
	action_type: "query_logs \| inspect_dependencies \| update_hypothesis \| execute_containment \| submit_report \| request_more \| no_anomalies"
	query: "optional structured filter with service_name, server_id, levels, start_time, end_time, text_contains, limit"
	target_service: "optional service name"
	hypothesis: "optional structured tuple: primary_service, failure_mode, dependency, customer_impact, confidence"
	containment_plan: "optional list of containment action names"
	report: "optional structured report with evidence_log_ids, impacted_services, root_cause, containment_plan, summary"

	observation_space:
	type: structured
	fields:
	session_id: string
	task_id: string
	task_title: string
	briefing: "structured incident briefing with incident window, objective, suspected_services, customer_statement, operational_constraints"
	dependency_graph: "service dependency map"
	visible_logs: "list of currently revealed log entries only"
	revealed_log_count: integer
	visited_services: "list of services explored so far"
	submitted_containment: "list of chosen containment actions"
	last_hypothesis: "optional structured root-cause hypothesis"
	step_number: integer
	max_steps: integer
	feedback: string
	done: boolean
	notes:
	- "Observations expose only agent-revealed logs."
	- "The dependency graph is visible, but hidden logs and gold evidence remain private."
	- "The latest structured hypothesis is included so agents can reason iteratively."

	reward_definition:
	type: scalar
	range: [0.0, 1.0]
	components:
	signal_reward: "Rewards newly discovered relevant signals and evidence quality."
	hypothesis_reward: "Rewards improvement toward the gold causal tuple and safe containment alignment."
	efficiency_reward: "Rewards solving within the action budget."
	penalty: "Penalizes unseen evidence, contradictions, forbidden containment, loops, and empty queries."
	techniques:
	- "Information-gain shaping: focused discovery beats broad noisy retrieval."
	- "Best-hypothesis tracking: reward is tied to causal improvement across the episode."
	- "Observation-consistent grading: unseen evidence references are rejected."
	- "Contradiction penalties: evidence, cause, impact, and timeline must agree."
	- "Safety shaping: destructive containment is penalized even if diagnosis is partially correct."

	interfaces:
	reset: "reset() -> initial observation"
	step: "step(action) -> observation, reward, done, info"
	state: "state() -> non-leaking public session state"