	---
	title: CausalStream
	emoji: 🌊
	colorFrom: indigo
	colorTo: blue
	sdk: docker
	pinned: false
	---

	# CausalStream: High-Fidelity SRE Logic Environment

	CausalStream is a specialized Reinforcement Learning (RL) environment designed to benchmark and train agents in temporal causal reasoning and resource allocation under uncertainty. It simulates a complex streaming data infrastructure (Kafka/Flink pattern) where agents must diagnose production incidents.

	## Core Engine Architecture

	### 1. The Stochastic World Clock (Ticked Execution)
	Unlike traditional "static" debugging environments, CausalStream operates on a discrete Tick system.
	- Temporal Drift: Every action that consumes compute or network resources (e.g., sampling a raw stream) advances the world clock.
	- Event Physics: Events are generated with a stochastic latency model: `arrival_time = event_time + base_latency + jitter`.
	- Incident Injection: Incidents are not just boolean flags; they modify the distribution of the stream. For example, a `LATENCY_SPIKE` incident increases the variance of `base_latency`, so more events arrive after their aggregation window has closed and are dropped ("late arrival" data loss).
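The latency formula and the effect of a `LATENCY_SPIKE` can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the environment's actual code: the `LatencyModel` class, the Gaussian jitter, the `allowed_lateness` parameter, and the choice to model the spike as a wider jitter distribution.

```python
import random
from dataclasses import dataclass


@dataclass
class LatencyModel:
    """Hypothetical latency model: arrival = event_time + base_latency + jitter."""
    base_latency: float  # mean delay, in ticks
    jitter_std: float    # standard deviation of the jitter term

    def arrival_time(self, event_time: float, rng: random.Random) -> float:
        jitter = rng.gauss(0.0, self.jitter_std)
        # An event cannot arrive before it was produced.
        return max(event_time, event_time + self.base_latency + jitter)


def late_fraction(model: LatencyModel, n_events: int = 1000,
                  window_end: float = 10.0, allowed_lateness: float = 2.0,
                  seed: int = 0) -> float:
    """Fraction of events that arrive after the window stops accepting data."""
    rng = random.Random(seed)
    late = 0
    for _ in range(n_events):
        event_time = rng.uniform(0.0, window_end)
        if model.arrival_time(event_time, rng) > window_end + allowed_lateness:
            late += 1
    return late / n_events


# Assumed parameters: the spike keeps the mean delay but widens its spread.
normal = LatencyModel(base_latency=1.0, jitter_std=0.2)
spike = LatencyModel(base_latency=1.0, jitter_std=3.0)

print("late fraction (normal):", late_fraction(normal))
print("late fraction (spike): ", late_fraction(spike))
```

Under these assumed parameters the spike scenario drops a visibly larger share of events from the window, which is the kind of signal an agent would have to trace back to the incident.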

	### 2. Tiered Observation Model
	Information is not free. Agents interact with a hierarchical observation space:
	- Dashboard (L0): Free to read. Provides aggregated metrics (Revenue, Error Rate). High signal but low granularity.
	- Stream Samples (L1): Costs 1 Tick. Provides raw JSON event snippets. Necessary for detecting jitter and schema mismatches.
	- Lineage Graph (L2): Costs 1 Tick. Provides the SQL logic and dependency chain of the data pipeline.
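As a rough sketch of how the tick costs above might be accounted for, the following uses hypothetical names (`Tier`, `TICK_COST`, `WorldClock`) that are not taken from the environment's code; only the costs themselves come from the list above.

```python
from enum import Enum


class Tier(Enum):
    DASHBOARD = "L0"
    STREAM_SAMPLES = "L1"
    LINEAGE_GRAPH = "L2"


# Tick costs as described above: the dashboard is free, the other tiers cost 1 tick.
TICK_COST = {
    Tier.DASHBOARD: 0,
    Tier.STREAM_SAMPLES: 1,
    Tier.LINEAGE_GRAPH: 1,
}


class WorldClock:
    """Hypothetical clock that advances whenever a paid observation is made."""

    def __init__(self) -> None:
        self.tick = 0

    def observe(self, tier: Tier) -> int:
        self.tick += TICK_COST[tier]
        return self.tick


clock = WorldClock()
clock.observe(Tier.DASHBOARD)       # free: clock stays at 0
clock.observe(Tier.STREAM_SAMPLES)  # costs 1 tick
clock.observe(Tier.LINEAGE_GRAPH)   # costs 1 tick
print(clock.tick)
```

An agent that reads the dashboard first and pays for deeper tiers only when needed keeps the world clock, and therefore the incident, from advancing unnecessarily.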

	### 3. Counterfactual Query Engine (Causal Discovery)
	To prove a hypothesis, agents can execute Counterfactual Actions.
	- Action: `ask_counterfactual(window_offset=X)`
	- Mechanism: The engine forks the internal state and simulates what the metrics would have been if a variable (such as the late-arrival window) were modified.
	- Significance: This forces agents to move beyond pattern matching and perform true scientific experimentation.
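The fork-and-simulate idea can be sketched as follows. This is an illustrative assumption, not the engine's implementation: `PipelineState`, its fields, the `windowed_revenue` metric, and the extra `state` argument to `ask_counterfactual` are all hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    """Hypothetical engine state: one aggregation window plus its events."""
    window_end: float
    allowed_lateness: float
    # Each event is (event_time, arrival_time, revenue).
    events: list = field(default_factory=list)

    def windowed_revenue(self) -> float:
        cutoff = self.window_end + self.allowed_lateness
        return sum(
            revenue
            for event_time, arrival_time, revenue in self.events
            if event_time < self.window_end and arrival_time <= cutoff
        )


def ask_counterfactual(state: PipelineState, window_offset: float) -> float:
    """Fork the state, widen the late-arrival window, and recompute the metric."""
    forked = copy.deepcopy(state)  # the real state is never mutated
    forked.allowed_lateness += window_offset
    return forked.windowed_revenue()


state = PipelineState(
    window_end=10.0,
    allowed_lateness=1.0,
    events=[(2.0, 3.0, 5.0), (9.0, 10.5, 7.0), (9.5, 13.0, 4.0)],
)

observed = state.windowed_revenue()                            # 12.0: third event is late
counterfactual = ask_counterfactual(state, window_offset=3.0)  # 16.0: third event now counted
print(observed, counterfactual)
```

The gap between the observed and counterfactual values is what lets an agent attribute the missing revenue to late arrivals rather than to, say, a schema mismatch.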

	## ⚖️ Deterministic Grading & F1 Evidence Scoring

	CausalStream solves the "Subjective Reasoning" problem by requiring agents to submit a structured Theory. The environment enforces strict, deterministic grading based on the F1 score of evidence retrieval.

	- The Theory: The agent submits a `RootCauseEnum` (e.g., `OUT_OF_ORDER`) and a `List[Evidence]` (specific JSON tracking keys or IDs).
	- The Formula:
	  - `Precision = True_Positives / (True_Positives + False_Positives)`
	  - `Recall = True_Positives / (True_Positives + False_Negatives)`
	  - `F1_Score = 2 * (Precision * Recall) / (Precision + Recall)`
	- Scoring Bounds:
	  - Incorrect cause: `0.0`.
	  - Correct cause: the score equals the evidence F1 score, which falls in `(0.0, 1.0]`.
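A minimal sketch of the grading math, assuming evidence items are compared as sets of string IDs. The function names `evidence_f1` and `grade` and the example IDs are hypothetical. How the environment treats a correct cause with no matching evidence is not specified here, so this sketch simply returns `0.0` in that case to avoid a division by zero.

```python
def evidence_f1(submitted, ground_truth) -> float:
    """F1 score of submitted evidence IDs against the ground-truth IDs."""
    submitted, ground_truth = set(submitted), set(ground_truth)
    true_positives = len(submitted & ground_truth)
    false_positives = len(submitted - ground_truth)
    false_negatives = len(ground_truth - submitted)
    if true_positives == 0:
        return 0.0  # assumption: precision and recall are both zero or undefined
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * (precision * recall) / (precision + recall)


def grade(submitted_cause, true_cause, submitted_evidence, true_evidence) -> float:
    """Wrong root cause scores 0.0; otherwise the score is the evidence F1."""
    if submitted_cause != true_cause:
        return 0.0
    return evidence_f1(submitted_evidence, true_evidence)


# Worked example: 2 true positives, 1 false positive, 2 false negatives.
# Precision = 2/3, Recall = 1/2, F1 = 4/7.
score = grade(
    "OUT_OF_ORDER",
    "OUT_OF_ORDER",
    ["evt-1", "evt-2", "evt-9"],
    ["evt-1", "evt-2", "evt-3", "evt-4"],
)
print(round(score, 4))
```

Because the score depends only on set overlap, two agents that submit the same cause and the same evidence receive the same grade regardless of how they phrase their reasoning.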

	### 🛡️ Anti-Reward Hacking (Exploit Guardrails)
	To prevent agents from "loop farming" API endpoints (reward hacking), every tool call is checked against a stateful tracking set: a specific semantic action type is rewarded only once per episode. If an agent calls `sample_stream` in a loop, the reward for each repeated call collapses to +0.00.
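The once-per-episode rule can be sketched with a plain set. The class name `EpisodeRewardTracker` and the `0.1` reward value are hypothetical; only the "reward each action type once, then +0.00" behavior comes from the description above.

```python
class EpisodeRewardTracker:
    """Hypothetical guardrail: each semantic action type is rewarded once per episode."""

    def __init__(self, reward_per_action: float = 0.1) -> None:
        self.reward_per_action = reward_per_action  # assumed value, for illustration
        self._rewarded = set()

    def reward_for(self, action_type: str) -> float:
        if action_type in self._rewarded:
            return 0.0  # repeated action type: no further reward
        self._rewarded.add(action_type)
        return self.reward_per_action

    def reset(self) -> None:
        """Call at the start of each episode."""
        self._rewarded.clear()


tracker = EpisodeRewardTracker()
rewards = [tracker.reward_for("sample_stream") for _ in range(3)]
print(rewards)  # [0.1, 0.0, 0.0]
```

Keying the set on the action type rather than on the full call arguments means that varying a parameter is not enough to farm the same reward again.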

	## 🚀 Technical Setup

	### Local Development
	```bash
	pip install -r requirements.txt
	python server.py --port 7860
	```

	### Docker Deployment
	The environment is optimized for a 2 vCPU / 8 GB RAM budget and exposes a REST API via FastAPI.
	```bash
	docker build -t causal-stream .
	docker run -p 7860:7860 causal-stream
	```

	---
	Developed for the Meta PyTorch OpenEnv Hackathon 2026. Focus: Causal Intelligence.